[Various] Futuremark's Time Spy DirectX 12 "Benchmark" Compromised. Less Compute/Parallelism than Doom/AotS. Also... - Overclock.net

post #1 of 253 Old 07-18-2016, 11:38 AM - Thread Starter
Randomdude
It seems to be coded strictly to favor Pascal's hack-job async implementation, namely compute preemption, as per Nvidia's DX12 "Do's" and "Don'ts":
Do's

- Minimize the use of barriers and fences
  - We have seen redundant barriers and the associated wait-for-idle operations as a major performance problem for DX11-to-DX12 ports
  - The DX11 driver does a great job of reducing barriers; under DX12 you need to do it yourself
  - Any barrier or fence can limit parallelism
- Make sure to always use the minimum set of resource usage flags
  - Stay away from D3D12_RESOURCE_USAGE_GENERIC_READ unless you really need every single flag that is set in this combination of flags
  - Redundant flags may trigger redundant flushes and stalls and slow down your game unnecessarily
  - To reiterate: we have seen redundant and/or overly conservative barrier flags and their associated wait-for-idle operations as a major performance problem for DX11-to-DX12 ports
- Specify the minimum set of targets in ID3D12CommandList::ResourceBarrier
  - Adding false dependencies adds redundancy
- Group barriers in one call to ID3D12CommandList::ResourceBarrier
  - This way the worst case can be picked once instead of sequentially going through all barriers
- Use split barriers when possible
  - Use the _BEGIN_ONLY/_END_ONLY flags
  - This helps the driver do a more efficient job
- Do use fences to signal events/advance across calls to ExecuteCommandLists

Don'ts

- Don't insert redundant barriers
  - This limits parallelism
  - A transition from D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE to D3D12_RESOURCE_STATE_RENDER_TARGET and back without any draw calls in between is redundant
- Avoid read-to-read barriers
  - Get the resource into the right state for all subsequent reads
- Don't use D3D12_RESOURCE_USAGE_GENERIC_READ unless you really need every single flag
- Don't sequentially call ID3D12CommandList::ResourceBarrier with just one barrier
  - This doesn't allow the driver to pick the worst case of a set of barriers
- Don't expect fences to trigger signals/advance at a finer granularity than once per ExecuteCommandLists call
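The "group barriers in one call" point can be sketched with a toy cost model. This is purely illustrative, not real driver behavior: assume each ResourceBarrier call forces one sync whose cost is the worst (max) among the barriers in that call, so issuing barriers one by one pays every sync in full while one grouped call pays only the single worst case.

```python
# Toy cost model (assumption, not measured driver behavior) of why
# grouping barriers into one ResourceBarrier call can beat issuing
# them through sequential single-barrier calls.

def cost_sequential(barrier_costs):
    # One call per barrier: every barrier pays its own full sync.
    return sum(barrier_costs)

def cost_grouped(barrier_costs):
    # One call for all barriers: the driver syncs once, at the worst case.
    return max(barrier_costs)

costs = [3, 5, 2]  # hypothetical per-barrier flush costs
print(cost_sequential(costs))  # 10
print(cost_grouped(costs))     # 5
```

With the same three hypothetical barriers, the grouped call is half the cost of the sequential calls in this model, which is the intuition behind both the Do and the matching Don't above.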

This has come to my attention thanks to a few members of this board, namely @Doothe, @Mahigan, @Slomo4shO, @JackCY, @PontiacGTX among others.

These are some of the more interesting posts:
Quote:
Originally Posted by Doothe

I took three screenshots, one of each game, in GPUView. From left to right: DOOM, AOTS, and Time Spy. Each timeline covers roughly the same length of time. I'm still learning how to read and interpret this information, but I figured I'd share some of the images with you guys and maybe get a better understanding of what's going on.

s51q4IX.jpg

The image is 4800x2560; I recommend opening it in a separate tab.

Quote:
Originally Posted by Doothe

Time Spy has a Pre-Emption Packet (black rectangle) in the 3D Queue that shows up every time a compute queue is processed.


From Nvidia’s whitepaper:
"Compute Preemption is another important new hardware and software feature added to GP100 that allows compute tasks to be preempted at instruction-level granularity, rather than thread block granularity as in prior Maxwell and Kepler GPU architectures. Compute Preemption prevents long-running applications from either monopolizing the system (preventing other applications from running) or timing out."


BTW, DOOM is Vulkan. I don't know whether Vulkan is properly picked up by GPUView, so disregard it if you want.
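The granularity difference in the whitepaper quote can be sketched with a toy model. The assumption (mine, not Nvidia's numbers): worst-case preemption latency scales with the work remaining in the smallest unit that cannot be interrupted, so instruction-level preemption (Pascal GP100) can stop almost immediately, while thread-block-level preemption (Maxwell/Kepler) must finish the current block first.

```python
# Toy sketch of preemption granularity (illustrative, not vendor-accurate):
# worst-case latency, in instructions, before a compute task can yield.

def worst_case_preempt_latency(instructions_per_block, granularity):
    if granularity == "instruction":   # Pascal-style: stop after any instruction
        return 1
    if granularity == "thread_block":  # Maxwell/Kepler-style: finish the block
        return instructions_per_block
    raise ValueError(granularity)

print(worst_case_preempt_latency(1024, "instruction"))   # 1
print(worst_case_preempt_latency(1024, "thread_block"))  # 1024
```

Under this sketch, long-running thread blocks are exactly what makes coarse-grained preemption able to monopolize the GPU, which is the problem the whitepaper says Compute Preemption addresses.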

Quote:
Originally Posted by Slomo4shO

Compute queues as a % of total run time:

Doom: 43.70%
AOTS: 90.45%
Time Spy: 21.38%
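The percentages above come from reading queue activity off GPUView timelines. A minimal sketch of that calculation, using made-up busy intervals (the real traces are in the screenshots, not reproduced here):

```python
# Given (start, end) busy intervals for a compute queue from a
# GPUView-style capture, compute queue time as a % of total run time.
# The intervals below are invented for illustration only.

def compute_queue_percent(intervals, total_ms):
    busy = sum(end - start for start, end in intervals)
    return round(100.0 * busy / total_ms, 2)

# Hypothetical capture: three compute bursts in a 100 ms trace.
bursts = [(0.0, 10.0), (30.0, 38.0), (70.0, 73.38)]
print(compute_queue_percent(bursts, 100.0))  # 21.38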

Quote:
Originally Posted by JackCY

That's what I keep saying. They simply reused their older, DX11-like approach with DX12, and the features they use are quite limited as well, so that they can support old hardware; new hardware features that older hardware doesn't have go unused. I bet they also want one engine with one path that runs on all GPUs to make their benchmark "valid" to them, but that makes it invalid to me, since it doesn't use each piece of hardware to its maximum potential, be it NV or AMD or some other GPU.

Figuratively: say there are two architectures, one with 1 thread to do the work and the other with 16. Now they make an engine that only uses 1 thread and tries to compute parallel work on that single thread, context switching like mad to get it done. Of course this engine works on both the 1- and 16-threaded hardware and in theory runs at the same speed, but the 16-threaded hardware is underutilized: it could do 16 times more work in the same time if driven in parallel with 16 submission threads. Context switching is expensive, and so on.
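The analogy above can be put in numbers with a toy model (all figures invented): 16 units of work run on 1 worker with a context-switch cost between units, versus 16 workers running them all at once.

```python
# Toy model of the 1-thread vs 16-thread analogy. Unit costs and the
# context-switch cost are illustrative assumptions, not measurements.

def run_time(work_units, unit_cost, workers, switch_cost):
    if workers >= work_units:
        return unit_cost  # every unit runs in parallel, one "round"
    # Serialized: every unit in turn, plus a switch between consecutive units.
    return work_units * unit_cost + (work_units - 1) * switch_cost

print(run_time(16, 1.0, 1, 0.25))   # 19.75
print(run_time(16, 1.0, 16, 0.25))  # 1.0
```

In this sketch the single-threaded path is nearly 20x slower: 16x from serialization plus the extra context-switch overhead, which is the "underutilized hardware" point the quote is making.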

This article has a bit of explanation of the differences between architectures and their features.


Quote:
Originally Posted by Slomo4shO

So even lower at 18.76%...

The bench definitely isn't compute heavy.

Quote:
Originally Posted by PontiacGTX

Time Spy's compute queues are fewer than AotS's; most of the work is graphics, and it seems they use double fences, which could be throttling AMD's compute+graphics performance and/or parallelism.


So 3DMark could run a single path that fits most hardware, with preemption.

Quote:
Originally Posted by PontiacGTX

It can be used on GCN, but it won't take advantage of parallelism and its performance gains. Maxwell can do some degree of preemption and doesn't get negative performance (given how the fences limit the context switching), and it works on Pascal given its improved preemption.

When people compare them, Maxwell seems to show some degree of async compute (the benchmark is aimed at it, but it's doing preemption), GCN can do preemption but it doesn't deliver the same gains as async compute, and Pascal shows its improved preemption gains.


The devs say they use a single path, but this single path only favors one side.


I love big boards like this, because you can call everyone's attention to a problem when it's noticed. And a lot of people here are quite capable of noticing such problems. Glad to be a part of such a community.

Anyway, the logical conclusion from this is that Futuremark's benchmark is BOTCHED and biased: not indicative of DX12 capabilities, as it should be, but instead restricting them. Thus it arguably has no credibility as a BENCHMARK suite.

Benchmark: a standard, or set of standards, used as a point of reference for evaluating performance or level of quality. Benchmarks may be drawn from a firm's own experience, from the experience of other firms in the industry, or from legal requirements such as environmental regulations.

Example: a new benchmark was set for the football team when the weakest member benched 200 pounds, setting the expectation for all other teammates to bench at least that amount.

In this case we have the weakest member benching 200 pounds, but he happens to be sponsoring the gym... and the gym has 2 members. So the bench press standard goes to 200 pounds.

I am incredibly disappointed and that's why I am giving voice to this in NEWS.

post #2 of 253 Old 07-18-2016, 12:11 PM
huzzug
This is not news. If someone from the tech media decides to investigate further and write about it, with backing from Nvidia and/or Futuremark themselves, then a case can be made. You can move it to the AMD / Graphics Cards subforums if you like.

post #3 of 253 Old 07-18-2016, 12:20 PM
cranfam

Is this surprising to anyone?
post #4 of 253 Old 07-18-2016, 12:24 PM
 
Xuper
Quote:
Originally Posted by cranfam

Is this surprising to anyone?

 

Well, do you think I should take this bench as valid evidence of async compute? I say no.


post #5 of 253 Old 07-18-2016, 12:27 PM
 
Slomo4shO
They should rename this "benchmark" to the Nvidia pricing validation tool.

post #6 of 253 Old 07-18-2016, 12:30 PM
cranfam
Quote:
Originally Posted by Xuper

Well, do you think I should take this bench as valid evidence of async compute? I say no.

No, but if I recall, similar complaints were made about Firestrike favoring Nvidia hardware. So my question was whether this surprised anyone, considering the accusations against the previous 3DMark benchmark.
post #7 of 253 Old 07-18-2016, 12:30 PM
 
Exeed Orbit
Quote:
Originally Posted by Slomo4shO

They should rename this "benchmark" to the Nvidia pricing validation tool.

And even then, the 1060 is merely 10% ahead of the RX 480. Doesn't bode well right now.
post #8 of 253 Old 07-18-2016, 12:38 PM
 
Banko
All this does is highlight one of the issues with Vulkan/DX12: because of things like this, each vendor really needs its own rendering path.

However, that is most likely not going to happen, so you are either going to have someone coding for an AMD-preferred path or vice versa.

post #9 of 253 Old 07-18-2016, 12:40 PM
kylzer
Nvidia-endorsed benchmark

made to look better for Nvidia results

Ohh wow, I'm so surprised I can't even.

post #10 of 253 Old 07-18-2016, 12:45 PM
Farih
Hmmm, dunno what to think of this, lol.

Also don't know if this should be in the news section already.
