[Various] Ashes of the Singularity DX12 Benchmarks

 
post #1171 of 2682 (permalink) Old 08-28-2015, 03:57 PM
New to Overclock.net
 
Themisseble's Avatar
 
Join Date: Oct 2013
Posts: 2,001
Rep: 38 (Unique: 29)
Quote:
Originally Posted by sugarhell View Post

Who the heck uses a TAA duration of 12? It's so blurry. It's impractical, and no one uses that huge an amount

Especially for a strategy game... just use res scaling or MSAA.

@Mahigan, maybe you could reply to Brad (or whoever from Stardock you were talking to) with those two benchmarks and show that, in this benchmark, the FX 8350 clearly bottlenecks the R9 290X.
Themisseble is offline  
post #1172 of 2682 (permalink) Old 08-28-2015, 04:37 PM
New to Overclock.net
 
Mahigan's Avatar
 
Join Date: Aug 2015
Location: Ottawa, Canada
Posts: 1,749
Rep: 874 (Unique: 233)
Quote:
Originally Posted by Themisseble View Post

Especially for a strategy game... just use res scaling or MSAA.

@Mahigan, maybe you could reply to Brad (or whoever from Stardock you were talking to) with those two benchmarks and show that, in this benchmark, the FX 8350 clearly bottlenecks the R9 290X.

I don't think it is CPU related.

PontiacGTX shared this link with me and I believe he is onto something: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-hypertransport-bus/4/
Quote:
HyperTransport 3.0 adds the following new clock rates, keeping compatibility with HT1 and HT2 rates (transfer rates assuming 16-bit links, which is the configuration used by AMD processors):
1,800 MHz = 3,600 MT/s = 7,200 MB/s
2,000 MHz = 4,000 MT/s = 8,000 MB/s
2,400 MHz = 4,800 MT/s = 9,600 MB/s
2,600 MHz = 5,200 MT/s = 10,400 MB/s
Sometimes you will see the MT/s numbers published as MHz, as already discussed.
Socket AM2+ and AM3 processors and their companion chipsets, however, are limited to the 8,000 MB/s transfer rate. Only socket AM3+ CPUs and chipsets are capable of using all the speeds published above. Of course, all CPUs and chipsets are compatible with the lower transfer rates available.
Keep in mind that socket AM2+ processors can still be installed on socket AM2 motherboards, however, their HyperTransport bus will be limited to HT2 speeds.
Once again, the transfer rates announced by the HyperTransport consortium are highly exaggerated. They announce HyperTransport 3.0 as having a maximum transfer rate of 41.6 GB/s. To reach this number they considered 32-bit links (and not 16-bit links) and doubled the number found by two because there are two links available. The math used was 2,600 MHz x 32 x 2 / 8 x 2 links. As we have already explained, AMD processors use 16-bit links, not 32-bit ones, and we don’t agree with the methodology of doubling the transfer rate, done because there is one link for transmitting and another for receiving data. We would only agree with this if the links were in the same direction.

Now, granted, the AMD 990FX uses a 3200 MHz HT 3.1 link, which results in 6,400 MT/s or 12,800 MB/s. Now look at the schematic below:


The AMD FX processor communicates with the 990FX northbridge at 12.8 GB/s, which in turn talks to the PCIe 2.0 ports at 16 GB/s. Therefore, for all intents and purposes, the AMD FX processor talks to the graphics card at 12.8 GB/s, even if the graphics card is running in a PCIe 2.0 x16 slot.

Now we know that a PCIe 2.0 x8 slot (8 GB/s) bottlenecks an AMD R9 290 under Ashes of the Singularity. Therefore the culprit for the poor AMD performance could very well be the HyperTransport link.
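For reference, here is a minimal sketch (Python, assuming the 16-bit DDR links described in the Hardware Secrets quote; figures are per direction) that reproduces the HyperTransport numbers being used above:

Code:
# Reproduces the HyperTransport figures quoted above.
# Assumes 16-bit links and DDR signalling (two transfers per clock), per direction.

def ht_gbs(clock_mhz, link_bits=16):
    mts = 2 * clock_mhz                     # MT/s: two transfers per clock (DDR)
    return mts * (link_bits // 8) / 1000    # GB/s: transfers x bytes per transfer

for mhz in (1800, 2000, 2400, 2600, 3200):
    print(f"{mhz} MHz -> {2 * mhz} MT/s -> {ht_gbs(mhz):.1f} GB/s")

# 1800 MHz -> 3600 MT/s -> 7.2 GB/s
# 2600 MHz -> 5200 MT/s -> 10.4 GB/s
# 3200 MHz -> 6400 MT/s -> 12.8 GB/s  (the HT 3.1 figure cited for the 990FX)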

Take Battlefield 4; it's a DX11 title that is heavy on draw calls (for a DX11 game):

PCIe 2.0 x8 is saturated already (8 GB/s). Now imagine having all those CPU cores, now usable under DX12, making draw calls on top of the textures etc. travelling over the bus. For an AMD system, this is further compounded by the slow HT 3.1 link (12.8 GB/s), and that's in the best-case scenario (990FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0, or 10.4 GB/s. The 3DMark API Overhead test isn't sending textures either (or any other heavy commands); it's only sending draw calls. So it really wouldn't show up in that test.

Again... just a theory.

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Arthur Conan Doyle (Sherlock Holmes)
Mahigan is offline  
post #1173 of 2682 (permalink) Old 08-28-2015, 05:25 PM
New to Overclock.net
 
Serandur's Avatar
 
Join Date: Feb 2014
Posts: 1,648
Rep: 217 (Unique: 122)
@Mahigan

This is just my opinion/takeaway based on a lot of what you said across these forums as well as some others (on the partisanship thing, on theoretical capabilities, and on the role of DX12 in Maxwell vs GCN).


First off, on the partisanship thing: I agree initiatives like GameWorks are harmful and detrimental... for users of any GPU vendor. Regardless of whether I've been using flagship Radeon or GeForce parts, a lot of GameWorks effects seem unduly demanding (obviously geared more against the former, though at least Radeons get the benefit of a CCC tessellation slider that can lessen the insanity). Sometimes even PhysX (in older titles) gives me wonky performance for little or no discernible reason on GeForce cards. Marketing is nasty too, imo, but that's always kind of been there. It can be somewhat countered by educated reasoning and discussion on sites such as this one, but the tech media will often gloss over major issues if they threaten a favored hardware manufacturer.

However, if you're talking about partisanship at the level of consumers, it cannot be eradicated, imho. Even a perfectly neutral poster pointing out scenarios where there is an objective disparity between two products can be lending ammo to one biased party or be perceived as a biased threat by the other (as I'm sure you've already noticed; it's an instinct everyone has to some degree). To change that, you'd have to alter the fundamental nature of human psychology, in which one's own viewpoint is the most salient and one's own interests are perceived as the most important. The mere perception of partisanship (by one self-classified group regarding another) is enough to invoke or strengthen it on the other side as a defense.




On theoretical capabilities and DX12, I'll first note that extremely in-depth technical examination (especially over the course of dozens of pages) can make for a convoluted discussion that most people will not fully follow. It is with great pride for my hobby that I say microprocessors are easily among the most sophisticated technologies mankind produces. So most people, even on a site like this, will not get much out of a lot of technical jargon explaining obscure specifications, considering this is largely not a professional microarchitecture engineering forum.

There's been a lot of discussion and speculation over a still-limited set of data. To put it simply, ACEs help keep GCN's shaders properly fed and make efficient use of GCN's cycles, whereas Maxwell is designed well enough to do so regardless. Now here's where the theoretical capabilities come in.



One very important factor in any discussion of GCN vs Maxwell is clock speed, and I feel the comparisons made in this thread are unfair to understanding the distinction on both a theoretical and a realistic level:



If there is any area where I feel both AMD's and Nvidia's marketing teams have screwed up in recent years, it would be the reference coolers on Hawaii and GM200 respectively, given the effect they seem to have on how people judge the chips' capabilities. There is reason to believe, given the 980 Ti's poor scaling in the computerbase.de DX12 benchmark, that the reference model wasn't even maintaining its default boost speed (which is already quite conservative). Pretty much every 980 Ti (particularly the ones educated enthusiasts tend to buy, i.e. aftermarket/custom models) can consistently do significantly more than the 1076 MHz the reference model may fall back to by default. ~1450-1500 MHz is both a realistic goal and a realistic limit for the 980 Ti on air, whereas Hawaii and Fiji both have a more conservative realistic range on air of about 1100-1180 MHz. Naturally, many things are amplified by that specification, so it is fairly important.

From experience, aftermarket 980 Tis (such as the one pcgameshardware.de used) often consistently maintain ~1350-1400 MHz out of the box. Mine certainly does (technically, 1405-1418 MHz with Gigabyte's OC mode and 1367-1380 MHz without). Aftermarket Hawaii cards tend to be around 1050-1100 MHz out of the box (with a significant memory overclock), and Fiji XT is simply 1050 MHz. How this affects theoretical capabilities is huge (the small sketch after the numbers below reproduces the arithmetic):



390X (at ~1.05 GHz):

2816 shaders x 2 FLOPs/clock x 1.05 GHz = ~5914 GFLOPS

4 rasterizers x 1.05 GHz = 4.2 Gtri/s

64 ROPs x 1.05 GHz = ~67 Gpixel/s

176 TMUs x 1.05 GHz = ~185 Gtexel/s (~93, i.e. half that, with fp16/int16)


Fury X (at ~1.05 GHz):

4096 shaders x 2 FLOPs/clock x 1.05 GHz = ~8601 GFLOPS

4 rasterizers x 1.05 GHz = 4.2 Gtri/s

64 ROPs x 1.05 GHz = ~67 Gpixel/s

256 TMUs x 1.05 GHz = ~269 Gtexel/s (~135, i.e. half that, with fp16/int16)


980 Ti (at ~1.35 GHz):

2816 shaders x 2 FLOPs/clock x 1.35 GHz = ~7603 GFLOPS (~29% higher than the 390X, ~12% less than the Fury X)

6 rasterizers x 1.35 GHz = 8.1 Gtri/s (~93% more than, or nearly double, the 390X/Fury X)

96 ROPs x 1.35 GHz = ~130 Gpixel/s (~93% more than, or nearly double, the 390X/Fury X)

176 TMUs x 1.35 GHz = ~238 Gtexel/s (maintains the same rate at fp16/int16)
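For anyone who wants to check the arithmetic, here is a quick sketch (Python; the unit counts and the ~1.05/~1.35 GHz clocks are the same assumptions as above) that reproduces these figures:

Code:
# Theoretical throughput from unit counts and clock speed.
# GFLOPS = shaders x 2 FLOPs/clock (fused multiply-add) x clock in GHz;
# triangle, pixel and texel rates are simply units x clock.

def rates(name, shaders, rasterizers, rops, tmus, clock_ghz):
    print(f"{name}: {2 * shaders * clock_ghz:.0f} GFLOPS, "
          f"{rasterizers * clock_ghz:.1f} Gtri/s, "
          f"{rops * clock_ghz:.0f} Gpixel/s, "
          f"{tmus * clock_ghz:.0f} Gtexel/s")

rates("390X",   2816, 4, 64, 176, 1.05)   # assumed ~1.05 GHz aftermarket clock
rates("Fury X", 4096, 4, 64, 256, 1.05)   # assumed ~1.05 GHz stock clock
rates("980 Ti", 2816, 6, 96, 176, 1.35)   # assumed ~1.35 GHz aftermarket boost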



Visual proof of GCN vs Kepler/Maxwell's fp16/int16 texture filtering rates; not sure about relevance:

Additional overclocking capability of all three parts tends to be about another ~10% on top of that, so their relative positioning stays more or less consistent going even further. I know it's a long post and I thank anyone for reading the entire thing, but this is why I find it highly difficult to believe that, even with both at theoretical peak performance, GCN will show any significant advantage that would allow it to outlast its Maxwell competitors in any meaningful way. Even the PCPer review supposedly utilizing the ACEs did not show an unspecified 390X (for which there is no reference model) more than ~5-13% ahead of a conservatively clocked reference 980 (against which the 980 Ti has between 37.5% and 50% more of everything while being able to clock nearly every bit as high; in other words, it should be untouchable against the 390X).






Accordingly, even an unspecified 390 with DX12 in the computerbase.de review only shows about ~10% more performance than a 970 with DX11 (which really should not be faster than DX12 for that card), and the 280X exhibits a similarly small lead over the GTX 770.










Speculation is all well and good, but drawing conclusions about GCN's and Maxwell's longevity based on these results is pretty weak, and that's where some of the disagreement is coming from, imo. The only conclusion I can draw from this is inconclusiveness... that, and don't buy reference 980 Tis if you can avoid them; aftermarket ones are in a whole other league when the reference ones choke on their own heat.

Serandur is offline  
post #1174 of 2682 (permalink) Old 08-28-2015, 05:35 PM
Zen
 
Kpjoslee's Avatar
 
Join Date: Jan 2013
Location: Somewhere in US.
Posts: 961
Rep: 50 (Unique: 33)
Interesting info and all that... but the game is still in an alpha state. Right now the game runs pretty badly on AMD CPUs, but that might improve later on. We just have to take the benchmark for what it currently is and wait until they release another benchmark in a more mature state. We definitely need more samples before we can find out what the limiting factor is, whether on the CPU or the GPU side.

My home PC (15 items)
CPU: AMD Threadripper 1950X
Motherboard: Gigabyte Aorus X399 Gaming 7
GPU: EVGA GeForce RTX 2080 Ti XC Ultra
RAM: G.Skill DDR4 3600 CL16
Hard Drive: Samsung Evo 840 500GB
Hard Drive: Samsung 960 Pro 500GB
Power Supply: EVGA SuperNova G2 1300W
Cooling: Noctua NH-U14S TR4
Case: Corsair Carbide Air 540
Operating System: Windows 10 Pro
Monitor: Dell U2711
Monitor: Samsung 55" 4K
Keyboard: Corsair K70
Mouse: Logitech G502
Audio: Denon AVR-X3300W
Kpjoslee is offline  
post #1175 of 2682 (permalink) Old 08-28-2015, 05:44 PM
New to Overclock.net
 
Mahigan's Avatar
 
Join Date: Aug 2015
Location: Ottawa, Canada
Posts: 1,749
Rep: 874 (Unique: 233)
While I agree with you that overclocking and swapping out reference coolers is something some of us do, the majority of the market does not. Reviewers generally test reference cards first, to establish their performance as recommended by the manufacturer, moving on to overclocking and other factors later in the review cycle (reviewing individual factory-overclocked products as well). Overclocking, though amusing (I often run my 290Xs at 1,250 MHz each) and capable of yielding considerable results, is not based on the manufacturer's recommended settings.

I acknowledge your point, but given that overclocking returns vary, one cannot use overclocked cards to give the majority of consumers an idea of the performance they can expect.

I understand what you're saying, but I was attempting to explain the results people were seeing in the reviews they were reading about the Ashes of the Singularity DX12 benchmark. The reviewers weren't using overclocked cards, so I explained what caused the performance levels people were seeing.

As for going forward: I cannot factor in overclocked cards, for the reasons I mentioned earlier in this post. You just cannot predict, with any degree of certainty, what overclock a user will achieve. Therefore it is far more prudent to go by the manufacturer's recommended settings. That means quoting nVIDIA and AMD. They designed their GPUs with certain clock speeds in mind, and those are the clock speeds on which one ought to base a recommendation. You can mention that one card has a propensity to overclock higher than another, as reviewers do, but you can't promise a degree of performance based on how well your particular card overclocks.

One thing is for certain: I would have liked to see pcgameshardware test those factory-overclocked cards at the recommended benchmark settings rather than attempt to derive a particular result. It would have added to the discussion, rather than rendering their results unusable.

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Arthur Conan Doyle (Sherlock Holmes)
Mahigan is offline  
post #1176 of 2682 (permalink) Old 08-28-2015, 05:58 PM
Hey I get one of these!
 
KyadCK's Avatar
 
Join Date: Aug 2011
Location: Chicago
Posts: 7,209
Rep: 301 (Unique: 212)
Quote:
Originally Posted by Mahigan View Post

I don't think it is CPU related.

PontiacGTX shared this link with me and I believe he is onto something: http://www.hardwaresecrets.com/everything-you-need-to-know-about-the-hypertransport-bus/4/

Now, granted, the AMD 990FX uses a 3200 MHz HT 3.1 link, which results in 6,400 MT/s or 12,800 MB/s. Now look at the schematic below:


The AMD FX processor communicates with the 990FX northbridge at 12.8 GB/s, which in turn talks to the PCIe 2.0 ports at 16 GB/s. Therefore, for all intents and purposes, the AMD FX processor talks to the graphics card at 12.8 GB/s, even if the graphics card is running in a PCIe 2.0 x16 slot.

Now we know that a PCIe 2.0 x8 slot (8 GB/s) bottlenecks an AMD R9 290 under Ashes of the Singularity. Therefore the culprit for the poor AMD performance could very well be the HyperTransport link.

Take Battlefield 4; it's a DX11 title that is heavy on draw calls (for a DX11 game):

PCIe 2.0 x8 is saturated already (8 GB/s). Now imagine having all those CPU cores, now usable under DX12, making draw calls on top of the textures etc. travelling over the bus. For an AMD system, this is further compounded by the slow HT 3.1 link (12.8 GB/s), and that's in the best-case scenario (990FX chipset). If you're using a 970 chipset, you're knocked down to HT 3.0, or 10.4 GB/s. The 3DMark API Overhead test isn't sending textures either (or any other heavy commands); it's only sending draw calls. So it really wouldn't show up in that test.

Again... just a theory.

Some incorrect math in there.

990FX stock HT clock is 2.6 GHz, though on some motherboards it can be overclocked to ~3 GHz. They are not stock-clocked at 3.2 GHz. They are also 16-bit links in each direction, not 32-bit: 10.4 GB/s unidirectional.

PCIe 2.0 is 500 MB/s per lane pre-encode. That's 8 GB/s pre-encoded on an x16, and 6.4 GB/s post-encode (8b/10b encoding). PCIe 3.0 is 1 GB/s per lane with 128b/130b encoding, for just under 16 GB/s.

The 990FX boards with PCIe 3.0 add a PLX chip on top of that. It takes 32 lanes of 2.0 and splits them into either x16 or x8/x8 of 3.0, including the encoding changes. Hence the latency.
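One way to sanity-check the encoding arithmetic is to start from the raw line rates (5 GT/s per lane for PCIe 2.0, 8 GT/s for PCIe 3.0, the standard published figures) and apply the encoding overhead; a minimal sketch, per direction:

Code:
# Per-direction PCIe link bandwidth from raw line rate and encoding efficiency.
# Raw line rates: PCIe 2.0 = 5 GT/s per lane (8b/10b), PCIe 3.0 = 8 GT/s (128b/130b).

def pcie_gbs(line_rate_gts, encode_ratio, lanes):
    usable_gbit = line_rate_gts * encode_ratio   # Gbit/s per lane after encoding
    return usable_gbit / 8 * lanes               # GB/s for the whole link

print(f"PCIe 2.0 x8 : {pcie_gbs(5, 8 / 10, 8):.2f} GB/s")      # 4.00
print(f"PCIe 2.0 x16: {pcie_gbs(5, 8 / 10, 16):.2f} GB/s")     # 8.00
print(f"PCIe 3.0 x16: {pcie_gbs(8, 128 / 130, 16):.2f} GB/s")  # 15.75, just under 16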

Forge (18 items)
CPU: AMD Threadripper 1950X
Motherboard: Gigabyte X399 Designare
GPU: EVGA 1080ti SC2 Hybrid
GPU: EVGA 1080ti SC2 Hybrid
RAM: 32GB G.Skill TridentZ RGB (4x8GB 3200MHz 14-14-14)
Hard Drive: Intel 900P 480GB
Hard Drive: Samsung 950 Pro 512GB
Power Supply: Corsair AX1200
Cooling: EK Predator 240
Case: Corsair Graphite 780T
Operating System: Windows 10 Enterprise x64
Monitor: 2x Acer XR341CK
Keyboard: Corsair Vengeance K70 RGB
Mouse: Corsair Vengeance M65 RGB
Audio: Sennheiser HD700
Audio: Sound Blaster AE-5
Audio: Audio Technica AT4040
Audio: 30ART Mic Tube Amp

Forge-LT (7 items)
CPU: i7-4720HQ
Motherboard: UX501JW-UB71T
GPU: GTX 960m
RAM: 16GB 1600 9-9-9-27
Hard Drive: 512GB PCI-e SSD
Operating System: Windows 10 Pro
Monitor: 4K IPS
KyadCK is offline  
post #1177 of 2682 (permalink) Old 08-28-2015, 06:05 PM
new to OCN?
 
PontiacGTX's Avatar
 
Join Date: Aug 2011
Location: Venezuela
Posts: 26,368
Rep: 1536 (Unique: 924)
Quote:
Originally Posted by sugarhell View Post

No, it doesn't increase the draw calls. How can an API increase the draw calls?
I thought that draw calls were set by the API/driver, because AMD suggested the bottleneck on DX11 was the draw calls. But I had seen that AMD suggested Dying Light required 40k or 70k draw calls.

And now explain this:


It might be based on the graphics engine code, the driver, and maybe the API as well?
PontiacGTX is offline  
post #1178 of 2682 (permalink) Old 08-28-2015, 06:05 PM
New to Overclock.net
 
Mahigan's Avatar
 
Join Date: Aug 2015
Location: Ottawa, Canada
Posts: 1,749
Rep: 874 (Unique: 233)
Quote:
Originally Posted by KyadCK View Post

Some incorrect math in there.

990FX stock HT clock is 2.6Ghz but can on some motherboards be overclocked to ~3Ghz. They are not stock clocked at 3.2Ghz. They are also 16-bit links in each direction, not 32-bit. 10.4GB/s unidirectional.

PCI-e 2.0 is 500MB/s/lane pre-encode. That's 8GB/s pre-encoded on an x16, and 6.4GB/s post encode (8/10 encode rate). PCI-e 3.0 is 1GB/s per lane and 128/130 encode rate for just under 16GB/s.

The 990FX board with 3.0 adds a PLX chip on top of that. It takes 32 lanes of 2.0 and splits them into either x16 or x8/x8 of 3.0, including the encoding changes. Hance the latency.

Well, if anyone did the math incorrectly, it would be AMD. I took the 6.4 GT/s 990FX figure from their PR material. Based on Hardware Secrets' information, I simply did 6.4 x 2 for 12.8 GB/s. If the boards only function at 2,600 MHz, rather than the 3,200 MHz given in the PR material, then I am not sure why they would have placed that slide in their material. It would therefore appear to be rather dishonest on their part.

"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Arthur Conan Doyle (Sherlock Holmes)
Mahigan is offline  
post #1179 of 2682 (permalink) Old 08-28-2015, 06:09 PM
New to Overclock.net
 
HalGameGuru's Avatar
 
Join Date: Aug 2015
Location: Houston, TX
Posts: 25
Rep: 12 (Unique: 4)
It could explain some of the performance back-and-forth we see with APUs and Athlon X4s relative to the FX CPUs.

HalGameGuru is offline  
post #1180 of 2682 (permalink) Old 08-28-2015, 06:15 PM
New to Overclock.net
 
Mahigan's Avatar
 
Join Date: Aug 2015
Location: Ottawa, Canada
Posts: 1,749
Rep: 874 (Unique: 233)
Quote:
Originally Posted by HalGameGuru View Post

It could explain some of the performance back and forths we see with APU's and Athlon X4's in relation to the FX CPUs.

Yes... very good point.


"Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." - Arthur Conan Doyle (Sherlock Holmes)
Mahigan is offline  