Overclock.net › Forums › Graphics Cards › Graphics Cards - General › DirectX 12: Asynchronous Compute (An exercise in Crowd-sourcing)

DirectX 12: Asynchronous Compute (An exercise in Crowd-sourcing) - Page 17

post #161 of 252
Quote:
Originally Posted by drSeehas


And still no working Asynchronous Compute (concurrent graphics and compute)?

Maybe Async Compute wasn't that major a feature of DX12 till AMD came along, so Nvidia are a little sore over AMD 'hijacking' DX12.

Remember Nvidia did suggest some DX12 features that are present in hardware on Nvidia cards but not on AMD. They probably worked on that with MS.
Edited by Tivan - 10/4/15 at 1:47am
post #162 of 252
Quote:
Originally Posted by Mahigan

Software, but the context switch is processed by the SMM/SMX, which requires a complete flush. That means all work the SMM/SMX was working on is lost when a high priority/preemption request is sent, which is worse than I had originally thought. This is what results in the added latency. Multi-engine concurrency (working on both graphics and compute tasks concurrently) is also problematic, and if this is true, then pure D3D12 programming is not well suited to mixed-mode loads under Maxwell 2. You'd be better off resorting to CUDA/HyperQ. This last part could perhaps still be fixed in the driver by assigning/converting DX12 requests to CUDA requests, and this is probably what nVIDIA are working on in their driver (as mentioned by Kollock).

In many ways, I was being kind to nVIDIA.

 

Quote:
Originally Posted by Ext3h

No, asynchronicity refers to scheduling work ahead of time, blocked only by fences and nothing else, with the GPU/driver deciding autonomously when to execute it. Nvidia does this part just fine, even though there are some mode transitions involved which can add up quite badly if you provoke them.

Efficient asynchronicity is when concurrency gets involved. This part isn't working, or rather it only works for work committed to different queues. Those additional queues are only available to CUDA-based applications, though, which PhysX is.

Normally, the compute engines should have had independent queues, so the hardware would have been able to use at least draw-call-level preemption. But since everything goes through the 3D queue, and even involves a full pipeline flush, there is no concurrency with pure DX12. That can cause significant differences between AMD's and Nvidia's performance levels if the engine designer was expecting concurrency.

 

So Nvidia was able to use async compute with CUDA (PhysX) but couldn't with pure DX12? If it's true that Nvidia was able to use async, can I say that async compute is done at the hardware level?
I'm still confused. Is it possible to use async compute without context switching, or is that still required?

 

Quote:
Originally Posted by Mahigan

With Arkham Knight, you have two "engines" running concurrently: the Arkham Knight engine and the PhysX libraries. It is evident that both would be running concurrently, though executed sequentially, from two separate sources. Asynchronous Compute is a single command within a batch of commands, which instructs the GPU to execute two workloads at once (like Hyperthreading). This command would, for example, instruct the GPU to execute a graphics task and concurrently execute a compute task within the same command line. It could just as well instruct the GPU to execute two compute commands concurrently: two commands from the same source. This requires a context switch (when executing a graphics and a compute command in parallel). The commands are sent in parallel to the GPU.

 

I read somewhere that "async isn't a single command, it's multiple commands spread across multiple command queues"
 

and here https://msdn.microsoft.com/en-us/library/windows/desktop/dn899124%28v=vs.85%29.aspx

 

I think:
 

Quote:
1. A resource can be accessed for read and write from multiple command queues simultaneously, including across processes, only if it is in the state D3D12_RESOURCE_STATE_UNORDERED_ACCESS.

2. A resource can be read from multiple command queues simultaneously, including across processes, only if it is in the state D3D12_RESOURCE_STATE_GENERIC_READ.

3. All write operations, except for the unordered access case described in case 1 above, must be done exclusively by a single command queue at a time. When a resource has transitioned to a writeable state on a queue, it is considered exclusively owned by that queue and it must transition to D3D12_RESOURCE_STATE_GENERIC_READ before it can be accessed by another queue.
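To make the three rules above concrete, here is a toy checker in Python. It is only a model: the `State` enum and `concurrent_access_allowed` are hypothetical names mirroring the D3D12_RESOURCE_STATE_* values quoted above, not part of the D3D12 API, and it only covers the cross-queue rules (not barriers within a single queue).

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical stand-ins for the D3D12_RESOURCE_STATE_* values in the quote
    UNORDERED_ACCESS = auto()
    GENERIC_READ = auto()
    RENDER_TARGET = auto()  # example of a state writable by one queue only
    COPY_DEST = auto()      # another exclusively writable state


def concurrent_access_allowed(state, accesses):
    """Return True if the simultaneous accesses are allowed by rules 1-3.

    accesses: list of (queue_name, kind) pairs, kind being "read" or "write".
    Only the cross-queue rules are modelled; one queue is ordered with itself.
    """
    queues = {queue for queue, _ in accesses}
    if len(queues) <= 1:
        return True
    if state is State.UNORDERED_ACCESS:
        # Rule 1: reads and writes from several queues at once are allowed
        return True
    kinds = {kind for _, kind in accesses}
    if state is State.GENERIC_READ and kinds == {"read"}:
        # Rule 2: several queues may read at the same time
        return True
    # Rule 3: any other write must be exclusive to a single queue until the
    # resource transitions back to GENERIC_READ
    return False


print(concurrent_access_allowed(State.GENERIC_READ, [("3d", "read"), ("compute", "read")]))    # True
print(concurrent_access_allowed(State.RENDER_TARGET, [("3d", "write"), ("compute", "read")]))  # False
```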

post #163 of 252
Quote:
Originally Posted by Xuper

So Nvidia was able to use async compute with CUDA (PhysX) but couldn't with pure DX12? If it's true that Nvidia was able to use async, can I say that async compute is done at the hardware level?

I'm still confused. Is it possible to use async compute without context switching, or is that still required?
Context switching can't be avoided entirely.

It happens on two levels: once in the scheduling frontend, where the GPU as a whole switches between 3D and pure compute mode, and once again at the SMM/SMX level, where each individual unit has to switch as well.

That first mode switch could be avoided if it were working correctly. The second one can't be, not yet. Not with Maxwell, at least.
Quote:
Originally Posted by Xuper

I read somewhere that "async isn't a single command, it's multiple commands spread across multiple command queues"
One step further, and it would be complete: async also requires that the queues can move freely relative to each other. There is no "happens before" or "happens after" relation unless it is explicitly modelled with signals and fences; the execution order is only fixed within each queue.
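A rough way to picture this: each queue keeps its own internal order, and the only cross-queue ordering comes from signal/wait pairs on a fence. The Python sketch below is a hypothetical simulation (the op tuples and `run_queues` are made up for illustration, not the ID3D12CommandQueue API). It executes queue heads round-robin and skips any queue whose head is a wait on a fence that hasn't reached its value yet.

```python
from collections import deque


def run_queues(queues):
    """Simulate several command queues that only synchronise through fences.

    queues: dict mapping a queue name to a list of ops:
      ("work", label)           some GPU work
      ("signal", fence, value)  set the fence to value
      ("wait", fence, value)    block this queue until fence >= value
    Queue heads are executed round-robin; returns the order in which work ran.
    """
    fences = {}  # fence name -> last signaled value (starts at 0)
    pending = {name: deque(ops) for name, ops in queues.items()}
    trace = []
    while any(pending.values()):
        progressed = False
        for ops in pending.values():
            if not ops:
                continue
            op = ops[0]
            if op[0] == "wait" and fences.get(op[1], 0) < op[2]:
                continue  # blocked: only a fence orders work across queues
            ops.popleft()
            progressed = True
            if op[0] == "work":
                trace.append(op[1])
            elif op[0] == "signal":
                fences[op[1]] = op[2]
        if not progressed:
            raise RuntimeError("deadlock: every remaining queue waits on a fence")
    return trace


graphics = [("work", "shadows"), ("wait", "f", 1), ("work", "lighting")]
compute = [("work", "ssao"), ("signal", "f", 1)]
print(run_queues({"3d": graphics, "compute": compute}))
# ['shadows', 'ssao', 'lighting']
```

Here `ssao` is guaranteed to finish before `lighting` because of the fence, while `shadows` and `ssao` have no relation at all; the round-robin just happens to pick one order for them.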

But that alone doesn't get you much, except for some freedom in the execution schedule. To take full advantage of that freedom, you need to be responsive and use every possible idle phase for interleaved execution. If you don't, your possible gains are limited to reducing context switching, nothing else.
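As a back-of-the-envelope illustration of why the idle phases matter, assume (very simplistically) that the graphics workload either fully occupies the shader array or leaves it idle in each time slot, and that independent compute work comes in slot-sized chunks. The hypothetical `frame_time` function below compares running compute after graphics with letting compute fill the idle slots. Real GPUs share units far less cleanly, so treat this as a toy model only.

```python
def frame_time(graphics_slots, compute_chunks, interleave):
    """Toy model of one frame, measured in time slots.

    graphics_slots: list of 1/0 per slot (1 = shader array busy with
                    graphics, 0 = idle bubble).
    compute_chunks: number of slot-sized, independent compute jobs.
    interleave: False means compute runs after graphics (no concurrency);
                True means compute first fills the idle graphics slots.
    """
    if not interleave:
        return len(graphics_slots) + compute_chunks
    idle = graphics_slots.count(0)
    leftover = max(0, compute_chunks - idle)  # compute that didn't fit in bubbles
    return len(graphics_slots) + leftover


slots = [1, 1, 0, 1, 0, 0, 1]
print(frame_time(slots, 4, interleave=False))  # 11
print(frame_time(slots, 4, interleave=True))   # 8
```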
Quote:
Originally Posted by Mahigan

Software but the context switch is processed by the SMM/SMX which requires a complete flush, meaning that all work the SMM/SMX was working on is lost when a high priority/preemption request is sent, which is worse than I had originally thought. This is what results in the added latency.
A flush doesn't necessarily imply data loss, since it can simply wait until the SMM underruns its current queue; plus, this only applies to mixing compute and 3D kernels. It's really just a minor issue compared to the rest.

Maxwell doesn't even have preemption fine-grained enough to evict running jobs in any way. It can only preempt jobs which haven't started execution yet. That's what NV meant by "preemption at draw call boundaries" in this year's GDC talk.
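If preemption only happens at draw call boundaries, a high-priority request can't evict a draw that has already started; it has to wait for that draw to finish. A minimal sketch of that latency, assuming draws run back to back from t = 0 with known durations (`preemption_latency` is a made-up helper, not anything from NV's driver):

```python
def preemption_latency(draw_durations, request_time):
    """How long a high-priority request waits when preemption is only
    possible at draw call boundaries.

    Draws run back to back starting at t = 0. A draw that has already started
    is never evicted, so the request waits for it to finish; a request landing
    exactly on a boundary (or after all draws) can run immediately.
    """
    start = 0
    for duration in draw_durations:
        end = start + duration
        if start < request_time < end:
            return end - request_time
        start = end
    return 0


draws = [2, 5, 1]  # hypothetical draw durations, arbitrary time units
print(preemption_latency(draws, 3))  # 4: waits for the draw ending at t = 7
print(preemption_latency(draws, 2))  # 0: arrives exactly between two draws
```

The longer the individual draws, the worse the worst case gets, which is why this kind of coarse preemption hurts latency-sensitive work.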

It's not losing any progress either; in fact, it looks like Nvidia is abusing regular signals to communicate to the GPU how far the work in each queue has advanced. You can even see these signals in GPUView, in the device context. You will notice that there is an additional fence placed on all command buffers whose value correlates with the length of that specific command buffer.
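The fence-as-progress-counter idea can be modelled roughly like this. This is an assumption about how such a scheme could work, not a description of Nvidia's driver: each command buffer gets a fence value equal to the running total of commands submitted so far, so the last value the GPU signaled tells you how many command buffers have fully completed. `ProgressFence` and its methods are hypothetical.

```python
import bisect


class ProgressFence:
    """Hypothetical model of a per-queue progress fence.

    Each submitted command buffer is tagged with a fence value equal to the
    running total of commands submitted so far, so the last value the GPU has
    signaled shows how far the queue has advanced.
    """

    def __init__(self):
        self.targets = []    # fence value signaled at the end of each buffer
        self.submitted = 0   # total commands submitted so far
        self.completed = 0   # last value signaled by the (simulated) GPU

    def submit(self, num_commands):
        self.submitted += num_commands
        self.targets.append(self.submitted)
        return self.submitted  # value the GPU signals when this buffer is done

    def gpu_signal(self, value):
        # Fence values only move forward; a stale signal does not roll back progress
        self.completed = max(self.completed, value)

    def finished_buffers(self):
        # Number of buffers whose end-of-buffer value has been reached
        return bisect.bisect_right(self.targets, self.completed)


fence = ProgressFence()
for length in (10, 4, 6):
    fence.submit(length)  # targets become 10, 14, 20
fence.gpu_signal(14)
print(fence.finished_buffers())  # 2
```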

Well, actual "preemption" with data loss does happen: whenever the driver decides to commit suicide because something timed out. That even causes duplicated work. But it's by no means the regular case; it's more like a reset button for the GPU.
post #164 of 252
post #165 of 252
Thread Starter 
It would also appear that we've been comparing apples to oranges all along in AotS. AMD cards are producing more effects and rendering more content than nvidia cards.
http://forums.anandtech.com/showthread.php?t=2462951&page=6

Dynamic lighting is missing on nvidia cards. Dynamic lighting uses Async Compute. Smoke effects, like the smoke clouds, are also missing on nvidia.
post #166 of 252
No way! Not surprising, though.

Edit:

Tech sites doing their jobs as usual...

As great as this thread is, I think that info needs a new one.
Edited by GorillaSceptre - 2/12/16 at 10:24am
post #167 of 252
post #168 of 252

@Mahigan

 

http://www.anandtech.com/show/10067/ashes-of-the-singularity-revisited-beta

 

They specifically tested the effects of Async, which has been increased in the latest bench apparently:
 

 

 

 

Things are looking good for us GCN card owners :D

post #169 of 252
Quote:
Originally Posted by Roboyto

@Mahigan


http://www.anandtech.com/show/10067/ashes-of-the-singularity-revisited-beta

They specifically tested the effects of Async, which has been increased in the latest bench apparently:

 







Things are looking good for us GCN card owners :D
GCN2.0 and 3.0 more than 1.0
post #170 of 252
Quote:
Originally Posted by PontiacGTX


GCN2.0 and 3.0 more than 1.0

 

1.1 & 1.2?  
