Overclock.net › Forums › Industry News › Video Game News › [computerbase.de] DOOM + Vulkan Benchmarked.

[computerbase.de] DOOM + Vulkan Benchmarked. - Page 52

post #511 of 632
Quote:
Originally Posted by Mahigan View Post

Maxwell drops in performance once Async Compute + Graphics is enabled under AotS as seen here...


In Time Spy.. we see this odd behavior where the performance stays the same..




That is the point of contention here.
Shall we solve this riddle now?

As we know, for Maxwell cards, the OS will apply CPU side cooperative scheduling to merge the command buffers submitted to multiple software queues into a single one for submission to the GPU.

When we compare async on and off: with async off, the application essentially does the same merging itself, or rather the developer did it manually when designing the application. All the work gets queued into the same queue.

With async on, and the OS performing the scheduling, we can run into two different cases though:
  1. The OS queues the command buffers in precisely the same order as the developer would have.
  2. The OS finds a different valid order for the command buffers, so the execution order differs.

I suspect what we see, whenever Maxwell suffers from async on, is actually the second case. The OS made a bad choice when scheduling, and induced some type of stall / wait on barrier / memory transfer which would have been hidden with the hand tuned schedule the developer specified in the case of async off.

When the OS doesn't stumble, it's either a sign that the order of execution didn't matter (no additional stalls induced), or that the OS coincidentally arrived at the very same schedule the application would have used.


All of this obviously assumes that when you tell an application not to use async, that it won't perform additional optimizations internally instead (such as e.g. eliminating redundant barriers and fences by performing manual, application side state tracking, hence reducing the effective overhead).

For 3DMark at least, no such optimization happens on the application side. So if either a mismatching order is still stall-free, or if the software scheduler chose the same order, you just won't see a difference.
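
To make the two cases concrete, here is a toy model of the argument, not of the real Windows scheduler. Everything in it is made up for illustration: the `Cmd` type, the costs and latencies, and the assumption that a single hardware queue runs commands strictly back to back and stalls when a command needs a result that is not yet usable. It enumerates every valid merge of a two-command graphics queue and a one-command compute queue and compares the finish times.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Cmd:
    name: str
    cost: int                       # time units the command occupies the hardware queue
    latency: int = 0                # extra delay before its result is usable (e.g. a transfer)
    waits_on: Optional[str] = None  # name of an earlier command whose result is needed


def simulate(order: List[Cmd]) -> int:
    """Run commands back to back on one hardware queue; return the finish time."""
    t = 0
    ready: Dict[str, int] = {}
    for cmd in order:
        if cmd.waits_on is not None:
            t = max(t, ready[cmd.waits_on])  # stall until the dependency is usable
        t += cmd.cost
        ready[cmd.name] = t + cmd.latency
    return t


def interleavings(a: List[Cmd], b: List[Cmd]) -> Iterator[List[Cmd]]:
    """Yield every merge of two queues that keeps each queue's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


# Hypothetical workload: the draw needs the upload, the compute job is independent.
UPLOAD = Cmd("upload", cost=1, latency=3)
DRAW = Cmd("draw", cost=2, waits_on="upload")
SIM = Cmd("sim", cost=3)

GRAPHICS = [UPLOAD, DRAW]         # software queue 1
COMPUTE = [SIM]                   # software queue 2
HAND_TUNED = [UPLOAD, SIM, DRAW]  # what a developer might submit with async off

if __name__ == "__main__":
    for order in interleavings(GRAPHICS, COMPUTE):
        print([c.name for c in order], simulate(order))
```

In this toy, the hand-tuned order finishes after 6 time units because the compute job hides the upload latency, while the other two valid merges both take 9. A scheduler that happens to pick the hand-tuned order shows no difference between async on and off; one that picks either of the others shows a regression.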



Oh, and why did I say OS? Because the software scheduler is apparently provided by Microsoft, it's NOT part of the driver.

Nvidia can't fix it, or tune it better. It's simply not in their domain. That poor performance is a bug in Windows 10, not in Nvidia's driver.
Edited by Ext3h - 7/18/16 at 12:30pm
post #512 of 632
or

old AotS bench where Maxwell was trying Async but got a performance hit

....
so Nvidia forced Async in drivers to be off for Maxwell
....

Time Spy on Maxwell is not using async even if u have async on in app ..


MAGIC
post #513 of 632
Quote:
Originally Posted by ZealotKi11er View Post

DX12 ... makes it possible to achieve scenes never possible with DX11.

I wasn't aware of this. Is there an example?
post #514 of 632
Quote:
Originally Posted by kaosstar View Post

I wasn't aware of this. Is there an example?

Not really because there are no true DX12 only games.
post #515 of 632
Quote:
Originally Posted by Kravicka View Post

Time Spy on Maxwell is not using async whenever u have async on/off in app ..

You can't see that based on GPUView or the like. That shows you the hardware queues after the OS has merged them, not the software queues the application sees.
The application can't tell whether the queues are just emulated or mapped directly to hardware.

Nvidia wasn't wrong when they said that they had never "enabled" async compute in the driver for Maxwell. They didn't, and it isn't enabled today either. That's the operating system forcibly adding the emulation and the (partially subpar) software scheduling when it detects that the application is requesting multiple queues but the driver is providing fewer than requested.
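
A rough sketch of that emulation idea, purely illustrative: `map_queues` and `hardware_view` are hypothetical helpers, not Windows or driver APIs, and the round-robin assignment is an assumption made only to keep the example short. The point is that the application submits to the same software queues either way, while a tool looking at the hardware side sees only the merged result.

```python
from typing import Dict, List, Tuple


def map_queues(requested: List[str], hardware_queues: int) -> Tuple[Dict[str, int], bool]:
    """Assign each requested software queue to a hardware queue index.

    Returns the mapping and a flag telling whether any sharing (emulation)
    was needed. Round-robin assignment is an assumption for illustration.
    """
    if hardware_queues < 1:
        raise ValueError("need at least one hardware queue")
    mapping = {name: i % hardware_queues for i, name in enumerate(requested)}
    emulated = len(requested) > hardware_queues
    return mapping, emulated


def hardware_view(mapping: Dict[str, int],
                  submissions: List[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Group submitted commands by hardware queue, in submission order.

    This is roughly what a hardware-side trace would show: the software
    queue a command came from is no longer visible.
    """
    view: Dict[int, List[str]] = {}
    for software_queue, command in submissions:
        view.setdefault(mapping[software_queue], []).append(command)
    return view


# The application's submissions are identical in both scenarios below.
SUBMISSIONS = [("graphics", "draw"), ("compute", "dispatch"), ("graphics", "present")]

if __name__ == "__main__":
    for hw in (1, 2):
        mapping, emulated = map_queues(["graphics", "compute"], hw)
        print(hw, emulated, hardware_view(mapping, SUBMISSIONS))
```

With one hardware queue, both software queues land on index 0 and the trace shows a single merged stream; with two, each software queue gets its own stream. From the application's side nothing changes between the two runs.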
post #516 of 632
Quote:
Originally Posted by Bidz View Post

So now it's ok to drop objectivity in favor of sales?

You know, in cases of highly important stuff like measuring safety, contamination, etc, dropping objectivity in favor of "sales" can send you to prison.

Good thing video card benchmarks aren't important, eh? I mean, no one has died or had their health seriously affected because a benchmark favored one chipset over another.

Quote:
It's not neutral if it's not really measuring the full extent of async capabilities by limiting em to fit one side's capabilities.

Sort of like it not really being neutral because it runs tessellation on the CPU rather than the GPU because Nvidia cards would kill the AMD cards if tessellation was performed by the GPU, right?

I mean, let's be totally "neutral" here and run the Physics test using GPU based tessellation. If you're not doing that then you're limiting your test to fit AMD's capabilities, right?

Or does that testing bias only apply when the benchmark would favor AMD?
Edited by moustang - 7/18/16 at 1:34pm
post #517 of 632
Quote:
Originally Posted by moustang View Post

Good thing video card benchmarks aren't important, eh? I mean, no one has died or had their health seriously affected because a benchmark favored one chipset over another.
Sort of like it not really being neutral because it runs tessellation on the CPU rather than the GPU because Nvidia cards would kill the AMD cards if tessellation was performed by the GPU, right?

I mean, let's be totally "neutral" here and run the Physics test using GPU based PhysX. If you're not doing that then you're limiting your test to fit AMD's capabilities, right?

Or does that testing bias only apply when the benchmark would favor AMD?

Tessellation is not exclusive to one GPU. PhysX is. Nice try.
post #518 of 632
Quote:
Originally Posted by ZealotKi11er View Post

Quote:
Originally Posted by kaosstar View Post

I wasn't aware of this. Is there an example?

Not really because there are no true DX12 only games.

So...what qualifies as a "true DX12 only game"? I'm curious since there are at least a couple of them on the Windows Store. Quantum Break, Forza 6 Apex to name two.
post #519 of 632
Quote:
Originally Posted by Ext3h View Post

Shall we solve this riddle now?

As we know, for Maxwell cards, the OS will apply CPU side cooperative scheduling to merge the command buffers submitted to multiple software queues into a single one for submission to the GPU.

When we compare async on and off: with async off, the application essentially does the same merging itself, or rather the developer did it manually when designing the application. All the work gets queued into the same queue.

With async on, and the OS performing the scheduling, we can run into two different cases though:
  1. The OS queues the command buffers in precisely the same order as the developer would have.
  2. The OS finds a different valid order for the command buffers, so the execution order differs.

I suspect what we see, whenever Maxwell suffers from async on, is actually the second case. The OS made a bad choice when scheduling, and induced some type of stall / wait on barrier / memory transfer which would have been hidden with the hand tuned schedule the developer specified in the case of async off.

When the OS doesn't stumble, it's either a sign that the order of execution didn't matter (no additional stalls induced), or that the OS coincidentally arrived at the very same schedule the application would have used.


All of this obviously assumes that when you tell an application not to use async, that it won't perform additional optimizations internally instead (such as e.g. eliminating redundant barriers and fences by performing manual, application side state tracking, hence reducing the effective overhead).

For 3DMark at least, no such optimization happens on the application side. So if either a mismatching order is still stall-free, or if the software scheduler chose the same order, you just won't see a difference.



Oh, and why did I say OS? Because the software scheduler is apparently provided by Microsoft, it's NOT part of the driver.

Nvidia can't fix it, or tune it better. It's simply not in their domain. That poor performance is a bug in Windows 10, not in Nvidia's driver.

I am curious... does this depend on the number of scheduling requests/instructions being made? Meaning that under heavier loads... does the second case become more likely?

Seems to me that Vulkan likely will not exhibit this behavior then. It appears to be a Windows 10 DX12 issue.
post #520 of 632
For real though?? +50% boost at 1080p with Fury, that's completely insane lol. Seems like all this hard work on implementing a new API with AMD is finally paying off.. curious to see future titles etc!