Let's start with the OP. I'm not going to go through the whole thread, since it mostly dissolves into one part of folks pretending it matters, one part pretending it's a conspiracy, and one part thinking the previous two groups are crazy (needless to say, that's where I land).
It seems to be coded to strictly favor Pascal's hack-job async implementation, namely compute preemption, as per nVidia's DX12 "do's" and "don'ts".
First and foremost, we obviously get a totally understanding and informed opinion on Pascal's async implementation.
Next, it cites a rather well-thought-out list of do's and don'ts that actually applies, in similar form, to every concurrent application. Yes, the "GCN-optimized" ones too.
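To illustrate what I mean by "applies to every concurrent application", here's a minimal, hypothetical C++ sketch (the WorkItem type, names, and numbers are all mine, not from any vendor doc) of the most generic rule on such lists: batch your submissions instead of paying the synchronization cost per item. The GPU flavor of this shows up in vendor guidance as batching command lists into fewer submission calls, but the same advice holds for any lock-protected queue on a CPU:

```cpp
#include <cstdio>
#include <mutex>
#include <vector>

// Hypothetical work item; stands in for a recorded command list.
struct WorkItem { int payload; };

std::mutex submitLock;              // stands in for the queue's synchronization cost
std::vector<WorkItem> submitQueue;  // stands in for the hardware queue

// Anti-pattern: pay the synchronization cost once per item.
void SubmitOneByOne(const std::vector<WorkItem>& items) {
    for (const WorkItem& w : items) {
        std::lock_guard<std::mutex> g(submitLock);
        submitQueue.push_back(w);
    }
}

// The "do": batch the work, pay the cost once.
void SubmitBatched(const std::vector<WorkItem>& items) {
    std::lock_guard<std::mutex> g(submitLock);
    submitQueue.insert(submitQueue.end(), items.begin(), items.end());
}

int main() {
    std::vector<WorkItem> work(1000, WorkItem{42});
    SubmitBatched(work);
    std::printf("queued %zu items\n", submitQueue.size());
    return 0;
}
```

Nothing GCN- or Pascal-specific about it; it's the same "amortize your synchronization" rule any concurrent code lives by.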
Onto the quotes inside. Look, I won't waste much time parsing GPUView output, since that's actually getting off-track, but I will question the stuff done on the fly.
Time Spy has a Pre-Emption Packet (black rectangle) in the 3D Queue that shows up every time a compute queue is processed
I have heard confusing stuff about what GPUView actually tracks, so I'll attack on both fronts:
1) Where in D3D12 is it written down how to put a pre-emption packet into a compute queue, huh?
2) If it's not written down, and it is as such done by the GPU itself on a whim (and tracked by GPUView accordingly), what is the whole fuss about? It's definitely not Time Spy doing the job; it's the driver deciding that pre-emption would be the approach here. Now, some mention excessive usage of fences and barriers, but well... they are needed at times, to avoid disasters. I'd know; that was part of my diploma (though in application to CPUs).
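To make point 1) concrete, here is roughly everything a D3D12 application actually controls around async compute: a minimal sketch, assuming `device` already exists and the two command lists are already recorded (the function name and structure are mine, not Time Spy's). Note the complete absence of any pre-emption knob, and note that the one fence in it is doing exactly the legitimate job fences exist for:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: one 3D (direct) queue, one compute queue, one fence
// ordering them. Nothing in the D3D12 API mentions pre-emption packets.
void SubmitWithCrossQueueFence(ID3D12Device* device,
                               ID3D12CommandList* computeList,
                               ID3D12CommandList* gfxList)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> gfxQueue, computeQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // Kick off async compute, then signal the fence when it completes.
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence.Get(), 1);

    // The 3D queue waits on the GPU timeline until the compute work is
    // done: the legitimate, necessary use of a fence when the graphics
    // pass consumes the compute results.
    gfxQueue->Wait(fence.Get(), 1);
    gfxQueue->ExecuteCommandLists(1, &gfxList);

    // How the hardware interleaves or pre-empts work between these two
    // queues is entirely the driver/GPU's decision.
}
```

If GPUView shows pre-emption packets around that Wait, that's the driver's scheduling decision being logged after the fact, not something Futuremark wrote into the benchmark.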
Compute queues as a % of total run time:
Time Spy: 21.38%
Yet Oxide has claimed AotS spends only about a third of its frametime on the compute queue? Where da truth at?
I bet they also want 1 engine with 1 path to run on all GPUs to make their benchmark "valid" to them, but it makes it invalid to me since it doesn't use each HW to its maximum potential, be it NV or AMD or some other GPU.
Well, I forgot that the other half of the discussion on the matter was arguing precisely that. What if I told you that a tech demo and a benchmark are different things? A benchmark has to run an identical workload on every GPU, or the scores stop being comparable at all; a tech demo is free to chase each vendor's maximum potential.
Anyway, I can say that the logical conclusion from this is that Futuremark's benchmark is BOTCHED and biased, not indicative of DX12 capabilities as it should be, but instead restricting them - thus it has arguably no credibility as a BENCHMARK suite.
Onto the quotes about pre-emption:
Half-correct, but they miss the kernel of truth in the other half. They could at least waste some time reading the two paragraphs in the Pascal whitepaper right before the pre-emption description, to learn what is done, how it's done, why the pre-emption improvements matter, and where they matter. (Spoiler: those paragraphs describe dynamic load balancing, which is Pascal's actual async compute mechanism; improved pre-emption is a separate feature aimed at time-critical work like async time warp.)
The logical conclusion from reading the OP and the quotes in it is that the OP is BOTCHED and biased, and also lacks understanding of how async compute in Pascal works. Any understanding of it, actually.