Originally Posted by PontiacGTX
It is a synthetic benchmark, and it is quite a bit different from DX11 games. Even games which have overhead don't show it on all levels/maps, and some DX11 games don't have issues with the DX11 draw call limit at all.
What I liked was hearing the 3DMark spokesperson claim that the "driver" is responsible for the behavior we see in a DX12 benchmark. DX12 and driver... let that sink in.
This may be true for nVIDIA (due to their inclusion of static scheduling) but it most certainly is not as true for AMD (due to their hardware scheduling). Kollock stated the same thing regarding AMD's hardware scheduler.
It seems to me that optimizations are lacking for the AMD path (if there are actually separate AMD and nVIDIA paths to begin with). The programmer is the one responsible for "marking up" the tasks he/she wants executed in parallel (as the Microsoft sample code I shared shows and as Kollock explained). So if the programmer did not mark up many of these tasks for the AMD hardware, then of course you are not going to receive all of the potential performance. A small amount of marked-up work would fit nVIDIA's Pascal architecture well but would end up under-utilizing GCN, for the reasons I mentioned in a previous post (Pascal GPCs and Dynamic Load Balancing explanation).
All of the games we have seen "mark up" a lot more work to be executed in parallel than what 3DMark stated with their "10-20%" claim. It seems to me that 3DMark should have gone for 40% of a frame being executed in parallel for AMD (which is what AotS does) and stuck to 10-20% for nVIDIA. That way they would have two optimized paths, one for each architecture. This is how games are being programmed (like Ashes of the Singularity), with separate optimized paths for AMD and nVIDIA. The kicker is that it is nVIDIA's driver which is responsible for scheduling such tasks onto the nVIDIA hardware. This means that nVIDIA would incur a larger CPU overhead (as we have seen under AotS). We also see that this will be the case for nVIDIA hardware under Doom Vulkan: absent Asynchronous Compute + Graphics, the nVIDIA hardware is tied with the AMD hardware in terms of CPU overhead. Once the Async path is implemented, nVIDIA's CPU overhead will be higher, as I had mentioned in my initial coverage of nVIDIA's Async Compute capabilities.
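To put rough numbers on the 10-20% versus 40% point, here is a toy Python sketch. Everything in it is an assumption for illustration only: the `Task` type, the `async_ok` flag, the task names and the millisecond costs are all made up, and this is not how 3DMark or AotS actually schedule work. It just computes what share of a frame was "marked up" and an idealized best-case frame time if the marked work overlapped perfectly with the graphics work, which real hardware will not fully reach.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str        # hypothetical task name
    cost_ms: float   # hypothetical GPU cost when run on its own
    async_ok: bool   # True if the programmer marked it for parallel execution


def marked_fraction(tasks):
    """Share of the frame's total cost that was marked for parallel execution."""
    total = sum(t.cost_ms for t in tasks)
    marked = sum(t.cost_ms for t in tasks if t.async_ok)
    return marked / total


def serial_frame_ms(tasks):
    """Frame time if everything runs back to back on one queue."""
    return sum(t.cost_ms for t in tasks)


def best_case_frame_ms(tasks):
    """Idealized lower bound: marked work overlaps perfectly with the rest."""
    graphics = sum(t.cost_ms for t in tasks if not t.async_ok)
    compute = sum(t.cost_ms for t in tasks if t.async_ok)
    return max(graphics, compute)


# Made-up frame with 40% of its cost marked up.
heavy_markup = [
    Task("geometry + shading", 6.0, False),
    Task("lighting compute", 2.0, True),
    Task("post-process compute", 2.0, True),
]

# Same made-up frame with only the lighting pass marked (20%).
light_markup = [
    Task("geometry + shading", 6.0, False),
    Task("lighting compute", 2.0, True),
    Task("post-process compute", 2.0, False),
]

for label, frame in (("heavy", heavy_markup), ("light", light_markup)):
    print(label, marked_fraction(frame), serial_frame_ms(frame), best_case_frame_ms(frame))
```

The only point of the sketch is that the ceiling on what Async Compute can give back is set by how much work the programmer marks up in the first place. How close a given GPU gets to that ceiling depends on the hardware and driver.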
We will likely end up with a version of 3DMark which will not at all represent the performance we will be seeing in upcoming DX12 titles for AMD. I think that the nVIDIA performance is perfectly optimized though... so what we see in 3DMark perfectly highlights what we can expect from Pascal.
As for Maxwell... when Async Compute is enabled, we should be seeing a drop in performance due to the GPU stalls caused by the fences. Even if the nVIDIA driver says "No Async Compute", the fences remain. This is what Kollock mentioned and what we have seen thus far in actual games making use of the technology.