
AMD GPU Drivers: The Real Truth.

post #1 of 454
Thread Starter 
For the past five years, I've been very vocal about CPU inefficiency, commonly known as a CPU bottleneck. Back then, I could not convince a single soul that such a thing even existed. Every time, I was told that something was wrong with my system. Five systems later, I still have the problem! Often, CPU usage will increase substantially while GPU usage drops. FPS suffers, mouse input becomes delayed and frame latency increases. Windows Task Manager reports high usage on all cores. Although this indicates a bottleneck, Task Manager cannot display the actual usage properly. What actually happens is that Core #0 runs at 99% usage, whilst the remaining cores show less usage. We are quick to blame the Direct-X 11 API for its shortcomings, but how does the AMD driver fare in all this?
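
To illustrate how a single saturated core can hide behind a summary reading, here is a minimal Python sketch. The per-core figures are made-up illustrative numbers, not measurements, and the helper name `find_saturated_cores` and the 95% threshold are assumptions chosen for this example.

```python
from statistics import mean

def find_saturated_cores(per_core_usage, threshold=95.0):
    """Return indices of cores whose usage (percent) is at or above threshold."""
    return [i for i, usage in enumerate(per_core_usage) if usage >= threshold]

# Illustrative numbers only: core 0 is pinned while the others have headroom.
usage = [99.0, 45.0, 40.0, 38.0, 35.0, 30.0]

overall = mean(usage)
saturated = find_saturated_cores(usage)

print(f"overall CPU usage: {overall:.1f}%")
print(f"saturated cores: {saturated}")
```

With numbers like these, the averaged figure looks moderate even though the thread feeding the GPU has no headroom left on core 0.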

Let us start with the Direct-X 11 API. In short, it does have the ability to spread its workload across multiple CPU cores more efficiently:
Quote:
DX11 adds multi-threading support that allows applications to simultaneously create resources or manage state and issue draw commands, all from an arbitrary number of threads. This may not significantly speed up the graphics subsystem (especially if we are already very GPU limited), but this does increase the ability to more easily explicitly massively thread a game and take advantage of the increasing number of CPU cores on the desktop.

Source

The API introduced multi-threading to the rendering pipeline, allowing work to be prepared in parallel:
Quote:
The major benefit I'm talking about here is multi-threading. Yes, eventually everything will need to be drawn, rasterized, and displayed (linearly and synchronously), but DX11 adds multi-threading support that allows applications to simultaneously create resources or manage state and issue draw commands, all from an arbitrary number of threads

Source

Direct-X 11 can use deferred contexts and command lists as its main means of spreading the workload across threads:
Quote:
A deferred contexts is a special ID3D11DeviceContext that can be called in parallel on a different thread than the main thread which is issuing commands to the immediate context. Unlike the immediate context, calls to a deferred contexts are not sent to the GPU at the time of call and must be marshalled into a command list which is then executed at a later date. It is also possible to execute a command list multiple times to replay a sequence of GPU work against different input data.

Source
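
The record-then-replay idea in the quote can be sketched with a toy model. This is a conceptual Python illustration only, not the Direct3D API: in real code the corresponding calls are `ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList` and `ID3D11DeviceContext::ExecuteCommandList`. The classes, the `record_chunk` helper and the 12-object scene below are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

class DeferredContext:
    """Toy stand-in: records commands instead of submitting them."""
    def __init__(self):
        self._commands = []

    def draw(self, object_id):
        # Nothing reaches the "GPU" here; the call is only recorded.
        self._commands.append(("draw", object_id))

    def finish_command_list(self):
        # Hand back the recorded commands and reset the context.
        commands, self._commands = self._commands, []
        return commands

class ImmediateContext:
    """Toy stand-in for the single context that actually submits work."""
    def __init__(self):
        self.submitted = []

    def execute_command_list(self, command_list):
        self.submitted.extend(command_list)

def record_chunk(object_ids):
    # Each worker thread records into its own deferred context.
    ctx = DeferredContext()
    for object_id in object_ids:
        ctx.draw(object_id)
    return ctx.finish_command_list()

# Split 12 hypothetical scene objects across 3 worker threads.
chunks = [range(0, 4), range(4, 8), range(8, 12)]
with ThreadPoolExecutor(max_workers=3) as pool:
    command_lists = list(pool.map(record_chunk, chunks))

# Submission stays serial: the main thread replays each list in turn.
immediate = ImmediateContext()
for command_list in command_lists:
    immediate.execute_command_list(command_list)

print(len(immediate.submitted), "draw calls submitted")
```

The sketch shows the structure rather than any speed-up: recording happens on several threads, while the final submission remains a single ordered stream.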

Nvidia explain why a developer would want to take advantage of deferred contexts within the API:
Quote:
The entire reason for using or not using deferred contexts revolves around performance. There is a potential to parallelize CPU load onto idle CPU cores and improve performance.

You will be interested in using deferred context command lists if:
• Your game is CPU bottlenecked.
• You have a significant # of draw calls (>3000).
• Your CPU bottleneck is from render thread load or Direct3D API calls.
• You have a threaded renderer but serialize to a main render thread for mapping incurring sync point costs.

Source
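
As a rough way of applying the checklist quoted above, the criteria can be written as a small helper. This is only a sketch: the function name `matching_criteria` and its parameters are hypothetical, the quote does not say whether all criteria must hold at once (so the helper just reports which ones match), and the draw-call figure is assumed here to be per frame.

```python
def matching_criteria(cpu_bottlenecked, draw_calls,
                      bottleneck_in_render_thread_or_api,
                      serializes_to_main_render_thread):
    """Return the quoted criteria that a given game profile satisfies."""
    checks = [
        ("game is CPU bottlenecked", cpu_bottlenecked),
        # Assumption: the >3000 figure is counted per frame.
        ("significant number of draw calls (>3000)", draw_calls > 3000),
        ("bottleneck is render thread load or Direct3D API calls",
         bottleneck_in_render_thread_or_api),
        ("threaded renderer that serializes to a main render thread",
         serializes_to_main_render_thread),
    ]
    return [label for label, satisfied in checks if satisfied]

# Hypothetical profile: CPU bound, 5000 draw calls, API-call heavy,
# with no serialization to a main render thread.
matched = matching_criteria(True, 5000, True, False)
for label in matched:
    print("matches:", label)
```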

After this quick explanation of the API and some of its core functions, let's look at AMD's side of the story. To put things simply, deferred contexts and command lists are not mandatory, and AMD have specifically chosen not to support them in their driver. The drawback is significantly higher CPU overhead, which heavily chokes the GPU(s) in CPU-intensive applications. The positive is a simple and stable driver which can work with almost any newly released game without compatibility issues or the need for game-specific driver tweaks. It also gives consistent performance in most games across the board, even if that performance is relatively low. Hence you can run pretty much any indie game on an AMD card without a "not supported" error message. In contrast, Nvidia's drivers require tweaks for each game in order for that game to remain stable. This is a lot of work for the driver team, but at least you get the performance boost from the multi-threaded rendering capabilities of Direct-X 11. I would not be surprised if AMD chose compatibility over performance either to reduce staff workload, or because the driver team are not able to produce a fully stable driver which uses the features discussed.

AMD have shown promise in the past in tackling their CPU overhead issues with "Sid Meier's Civilization: Beyond Earth", where they worked closely with the developer to ensure the game used good API optimization methods for enhanced CPU performance. AMD also enabled command list support for this title (such a rare moment), which allowed proper usage of all threads.



Source

Here is more information on how AMD's driver and the API's capabilities interact, taken from an earlier discussion of Civ V:
Quote:
Traditionally, rendering is a very serial process. The program needs to setup a bunch of objects and then pass that on to the video drivers and finally to the GPU. There's a high degree of submission overhead, meaning it's possible to choke the CPU while submitting a large number of objects to the GPU. In DirectX 11, multi-threaded rendering is achieved by turning the D3D pipeline into a 3 step process: the Device, the Immediate Context, and the Deferred Context. The important bit here is that the deferred context is full of things that have yet to be sent to the GPU, and that you can have a deferred context for each thread. When developers talk about multi-threaded rendering with DX11, this is what they're referring to. When you use DX11s multi-threaded rendering capabilities correctly, you can have several threads assemble their deferred contexts, and then combine them into a single command list once it comes time to render the scene.
Quote:
But let's be clear here: multi-threaded rendering is a massive undertaking on the driver and hardware side. You're doing the GPU equivalent of inventing the multi-tasking operating system. NVIDIA and AMD have not until this point supported multi-threaded rendering in their drivers, as they have needed time to implement this feature correctly in their drivers.
Quote:
Anyhow, as far as I know, AMD does not currently offer fully support for multi-threaded rendering (I don't have an AMD card plugged in right now to run the DX Caps Viewer against). I'm not sure where they are on it, though I doubt they're very far behind.
Quote:
So in conclusion, the reason NVIDIA beats AMD in Civ V is that NVIDIA currently offers full support for multi-threaded rendering/deferred contexts/command lists, while AMD does not. Civ V uses massive amounts of objects and complex terrain, and because it's multi-threaded rendering capable the introduction of multi-threaded rendering support in NVIDIA's drivers means that NVIDIA's GPUs can now rip through the game.

Source

Will we see an improvement in AMD's Direct-X 11 overhead? I think not, as suggested in this interview with AMD's Richard Huddy at Bit-tech:
Quote:
AMD says DX11 multi-threaded rendering can double object/draw-call throughput, and they want to go well beyond that by bypassing the DX11 API.

Source One
Source Two

Not long ago, Nvidia decided to fully support Direct-X 11's capabilities with their "wonder driver", which promised the following:


Source

It would seem Nvidia had fully enabled all the bells and whistles associated with the Direct-X 11 API, putting an end to CPU bottlenecks in most games for those running an Nvidia GPU. AMD still refuse to make such changes even now, in Q3 2015. Their main focus is clearly Direct-X 12, which does not help those who wish to play a game using Direct-X 11, as shown here:



Source

Clearly, the image above shows that AMD's driver had a maximum output of 1.1 million draw calls regardless of the hardware being used at the time of testing. This has now improved to around 1.3 million, depending on the hardware configuration.

To conclude, this is only a basic analysis of Direct-X and AMD's drivers. It is clear that AMD are not utilizing the full potential of Direct-X 11, causing CPU limitations for AMD customers. To reiterate, I spoke about all this 4-5 years ago, yet I was labelled as the crazy guy in the corner of the room who's making stuff up because he's out of his mind, and the bottleneck was dismissed as nothing more than a problem with my system. Instead, it turns out those involved simply lacked understanding. Hopefully, with these findings, we can show people the true state of AMD GPU drivers and demand change once and for all. Last but not least, I'd like to leave you with some Nvidia marketing benchmarks (which turned out to be completely true) to solidify the fact that AMD's driver performance is in a diabolical situation in comparison.





Source

UPDATE 24/02/2016: More evidence of AMD's DX11 single-threaded performance issues. Look at the boost DX12 gives over DX11, and over the 980 Ti.

New Ashes of the Singularity build benchmark results:


Quote:
Originally Posted by Mahigan View Post

Well lets take it a step further shall we?

SM200 (GTX 980 Ti):
22 SMMs
Each SMM contains 128 SIMD cores.
Each SMM can execute 64 warps concurrently.
Each Warp is comprised of 32 threads.
So that's 2,048 threads per SMM or 128 SIMD cores.
2,048 x 22 = 45,056 threads in flight (executing concurrently).

GCN3:
64 CUs
Each CU contains 64 SIMD cores.
Each CU can execute 40 wavefronts concurrently.
Each Wavefront is comprised of 64 threads.
So that's 2,560 threads per CU or 64 SIMD cores.
2,560 x 64 = 163,840 threads in flight (executing concurrently).

Now factor this:


GCN3 SIMDs are more powerful than SM200 SIMDs core for core. It takes a GCN3 SIMD less time to process a MADD.

So what is the conclusion?

1. GCN3 is far more parallel.

2. GCN3 has less SIMD cores dedicated towards doing more work. SM200 has more SIMD cores dedicated towards doing less work.

3. If you're feeding your GPU with small amounts of compute work items, SM200 will come out on top. If you're feeding your GPU with large amounts of compute work, GCN3 will come out on top.

Now this is just the tip of the Iceberg, there's also this to consider (Multi-Engine support):

GCN3 stands to benefit more from Asynchronous compute + graphics than SM200 does because GCN3 has more threads idling than SM200. So long as you feed both architectures with optimized code, they perform as expected.

What is as expected?


Hitman, under DX12, will showcase GCN quite nicely I believe.
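
The thread-count arithmetic in the quoted post can be checked with a few lines of Python. The unit counts are taken directly from the quote; the function name `threads_in_flight` is just a label chosen for this sketch.

```python
def threads_in_flight(units, groups_per_unit, threads_per_group):
    """Compute units x concurrent warps/wavefronts per unit x threads per group."""
    return units * groups_per_unit * threads_per_group

# GTX 980 Ti figures from the quote: 22 SMMs, 64 warps each, 32 threads per warp.
gtx_980_ti = threads_in_flight(units=22, groups_per_unit=64, threads_per_group=32)

# GCN3 figures from the quote: 64 CUs, 40 wavefronts each, 64 threads per wavefront.
gcn3 = threads_in_flight(units=64, groups_per_unit=40, threads_per_group=64)

print(gtx_980_ti)                   # 45056
print(gcn3)                         # 163840
print(round(gcn3 / gtx_980_ti, 2))  # 3.64
```

On these figures, GCN3 keeps roughly 3.6 times as many threads in flight, which is the basis for the "far more parallel" conclusion in the quote.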

Edited by BradleyW - 2/24/16 at 10:30am
X79-GCN (22 items)
CPU: Intel 3930K 4.5GHz HT
Motherboard: GIGABYTE GA-X79-UP4
Graphics: AMD R9-290X
RAM: GEil Evo Potenza DDR3 2400MHz CL10 (4x4GB)
Hard Drive: Samsung 840 Pro 120GB
Cooling: EK Supremacy (CPU)
Cooling: NF F12's P/P (360 Rad)
Cooling: NF A14's (420 Rad)
Cooling: XSPC Chrome Compression Fittings
Cooling: EK RES X3 150
Cooling: Primochill PremoFlex Advanced LRT Clear 1/2 ID
Cooling: EK-FC (R9 290X)
Cooling: EK D5 Vario Top-X
Cooling: Phobya G-Changer V2 360mm
Cooling: Phobya G-Changer V2 420mm
OS: Win 10 x64 Pro
Monitor: BenQ XR3501 35" Curved
Keyboard: Corsair Vengeance K90
Power: Seasonic X-1250 Gold (v2)
Case: Corsair 900D
Mouse: Logitech G400s
Audio: Senn HD 598
post #2 of 454
Very good read, reserving my comment real estate for more in-depth commentary, since I've preached about inefficiency as well, for the past few years.

My Rig (14 items)
CPU: Core i5 4460
Motherboard: AsRock H81M-DG4
Graphics: Sapphire Rx470 Platinum
RAM: KVR 1600 16Gb
Hard Drive: 2x Seagate 3Tb
Hard Drive: Samsung 850 EVO 120
Cooling: Scythe Ninja 3 Rev.B
OS: Windows 10 Pro
Monitor: Fujitsu Siemens A17-2A
Keyboard: Logitech K280e
Power: SuperFlower SF-550K12XP
Case: Thermaltake Versa H25
Mouse: Logitech G402
Audio: Sony MDR XD150

Ex-wife's Rig (15 items)
CPU: Athlon 750K 4.0Ghz
Motherboard: AsRock FM2A75 Pro4+
Graphics: Sapphire R9 270X Dual-X
RAM: Kingston 2x4Gb 1600
Hard Drive: Samsung 850 EVO 120
Hard Drive: Western Digital 320Gb
Optical Drive: LiteON DVD-RW
Cooling: CoolerMaster Hyper Z600
OS: Windows 7 Pro x64
Monitor: Toshiba 32" FullHD TV
Keyboard: Logitech
Power: FSP Hexa 550
Case: DeLUX
Mouse: Logitech
post #3 of 454
Use 1440p. Now the game is GPU bound again.
post #4 of 454
Thread Starter 
Quote:
Originally Posted by PontiacGTX View Post

Use 1440p. Now the game is GPU bound again.

Upping the resolution is the biggest misconception of the CPU bottleneck issue. At 1440p, I can still find myself bottlenecked depending on the application.
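
A simple frame-time model shows how this can happen. The sketch below assumes the CPU and GPU work fully in parallel (so the slower of the two sets the frame time) and that GPU time scales with pixel count; both are simplifications, and the millisecond figures are illustrative, not measurements.

```python
def frame_rate(cpu_ms, gpu_ms):
    """FPS when the slower of CPU and GPU sets the frame time (assumed model)."""
    return 1000.0 / max(cpu_ms, gpu_ms)

def scale_gpu_time(gpu_ms_1080p, width, height):
    """Scale GPU time linearly with pixel count relative to 1920x1080 (assumption)."""
    return gpu_ms_1080p * (width * height) / (1920 * 1080)

# Illustrative numbers: a CPU-heavy game where the CPU side of each
# frame costs 20 ms and the GPU needs 10 ms at 1080p.
cpu_ms = 20.0
gpu_ms_1080p = 10.0
gpu_ms_1440p = scale_gpu_time(gpu_ms_1080p, 2560, 1440)

fps_1080p = frame_rate(cpu_ms, gpu_ms_1080p)
fps_1440p = frame_rate(cpu_ms, gpu_ms_1440p)
limited_by = "CPU" if cpu_ms >= gpu_ms_1440p else "GPU"

print(f"1080p: {fps_1080p:.0f} FPS, 1440p: {fps_1440p:.0f} FPS, limited by {limited_by}")
```

In this example the GPU time rises to about 17.8 ms at 1440p, which is still below the 20 ms CPU cost, so the frame rate does not change and the game remains CPU bound. With a lighter CPU load the same resolution change would shift the limit to the GPU, which is why the result depends on the application.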
post #5 of 454
Very interesting, I've been saying this for the past half year to a year myself as well (especially to friends with a small budget looking to upgrade or build a new PC).
I really hope AMD will still resolve this and release proper DX11 drivers.
My PC (17 items)
CPU: Intel i7 - 5820k
Motherboard: MSI X99S Plus SLI
Graphics: Sapphire R9 Fury NITRO
RAM: Corsair Vengeance 16GB DDR4 2800Mhz
Hard Drive: Samsung SM951 512GB
Hard Drive: Seagate Barracuda 500GB
Optical Drive: (not listed)
Cooling: Noctua NH-U14S
OS: Windows 7 Ultimate 64Bit
Monitor: Asus MG279Q
Keyboard: Logitech G510
Power: Corsair RM750
Case: Corsair Obsidian 700D
Mouse: Logitech G700
Mouse Pad: Outplay
Audio: Sennheiser HD598
Audio: Tritton PC510HDA (Microphone use only)
post #6 of 454
Quote:
Originally Posted by BradleyW View Post

Upping the resolution is the biggest misconception of the CPU bottleneck issue. At 1440p, I can still find myself bottlenecked depending on the application.
The bottleneck is still far smaller than the one you get at 1080p, where the performance loss is bigger regardless of whether the drivers are optimized for DX11 or not. You can see people getting bottlenecked mostly in DX11 at 1080p with a powerful GPU setup.

This works only if the game makes proper use of multicore processors...
post #7 of 454
Who cares how big the bottleneck is, if the frame rate remains the same? People usually play at their native resolution, and upping to 1440p means a big investment. And why? To get the same crappy performance, just knowing your CPU no longer bottlenecks that much. It's pointless and expensive.
post #8 of 454
Quote:
Originally Posted by ronnin426850 View Post

Who cares how big the bottleneck is, if the frame rate remains the same? People usually play at their native resolution, and upping to 1440p means a big investment. And why? To get the same crappy performance, just knowing your CPU no longer bottlenecks that much. It's pointless and expensive.
Those who have a far lower framerate, even with stuttering, care.

You know there is downsampling via the drivers, which doesn't add any cost.

You get a lower framerate, but a playable one...
post #9 of 454
Haven't read the wall of text yet, but as a critical person and a sceptic I'm wary of article titles that are not from the most respected review sites yet include the words "The Real Truth".
Ambitious, pretentious, or close to the truth? I don't know.
But I do know that it is not possible on this forum for registered members to change thread titles to more appropriate wording, which is very impractical.

I'll read the post tonight.
post #10 of 454
Ok.
Workstation (4 items)
CPU: Xeon E5-2690
Motherboard: Supermicro 2011
Graphics: Nvidia GP100/ Vega FE
Monitor: Dell ultrasharp 4k