

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #1
For the past five years, I've been very vocal and outspoken about CPU inefficiency, commonly known as a CPU bottleneck. Back then, I could not convince a single soul that such a thing even existed; every time, I was told that something was wrong with my system. Five systems later, I still have the problem! Often, CPU usage will increase substantially, causing GPU usage to drop overall. FPS often suffers, mouse input becomes delayed and frame latency increases. Windows Task Manager reports high usage on all cores. Although this indicates a bottleneck, Task Manager cannot display the actual usage properly: what really happens is that Core #0 runs at 99% usage whilst the remaining cores show less usage. We are quick to blame the Direct-X 11 API for its shortcomings, but how does the AMD driver fare in all this?

Let us start with the Direct-X 11 API. In short, it does have the ability to use multiple CPU cores for a more efficient workload:
Quote:
DX11 adds multi-threading support that allows applications to simultaneously create resources or manage state and issue draw commands, all from an arbitrary number of threads. This may not significantly speed up the graphics subsystem (especially if we are already very GPU limited), but this does increase the ability to more easily explicitly massively thread a game and take advantage of the increasing number of CPU cores on the desktop.
Source

The API introduced multi-threaded capabilities to the rendering pipeline, allowing workloads to be submitted in parallel:
Quote:
The major benefit I'm talking about here is multi-threading. Yes, eventually everything will need to be drawn, rasterized, and displayed (linearly and synchronously), but DX11 adds multi-threading support that allows applications to simultaneously create resources or manage state and issue draw commands, all from an arbitrary number of threads
Source

Direct-X 11 offers deferred contexts and command lists as its main mechanism for multi-threaded workloads:
Quote:
A deferred contexts is a special ID3D11DeviceContext that can be called in parallel on a different thread than the main thread which is issuing commands to the immediate context. Unlike the immediate context, calls to a deferred contexts are not sent to the GPU at the time of call and must be marshalled into a command list which is then executed at a later date. It is also possible to execute a command list multiple times to replay a sequence of GPU work against different input data.
Source
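To make the deferred context/command list idea a bit more concrete, here is a minimal sketch of the pattern (my own illustration, not code from any of the sources above; 'device', 'immediateContext' and the shader/input-layout objects are assumed to have been created elsewhere, and error handling is omitted):

Code:
// Record draw commands on a deferred context, then replay them on the
// immediate context. Only the immediate context actually submits to the GPU.
ID3D11DeviceContext* deferredContext = nullptr;
ID3D11CommandList*   commandList     = nullptr;

// One deferred context per worker thread.
device->CreateDeferredContext(0, &deferredContext);

// On the worker thread: issue state changes and draw calls as usual.
// Nothing reaches the GPU yet; the calls are only recorded.
deferredContext->IASetInputLayout(inputLayout);
deferredContext->VSSetShader(vertexShader, nullptr, 0);
deferredContext->PSSetShader(pixelShader, nullptr, 0);
deferredContext->Draw(vertexCount, 0);

// Close the recording into a command list.
deferredContext->FinishCommandList(FALSE, &commandList);

// Back on the main render thread: replay the recorded work.
immediateContext->ExecuteCommandList(commandList, FALSE);

commandList->Release();
deferredContext->Release();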

Nvidia explains the reasons why a developer would want to take advantage of deferred contexts within the API:
Quote:
The entire reason for using or not using deferred contexts revolves around performance. There is a potential to parallelize CPU load onto idle CPU cores and improve performance.

You will be interested in using deferred context command lists if:
•Your game is CPU bottlenecked.
•You have a significant # of draw calls (>3000).
•Your CPU bottleneck is from render thread load or Direct3D API calls.
•You have a threaded renderer but serialize to a main render thread for mapping incurring sync point costs
Source

After this quick explanation of the API and some of its core functions, let's look at AMD's side of the story. Again, to put things simply, deferred contexts and command lists are not mandatory, and AMD have specifically chosen not to implement them in their driver. The drawback is that significantly higher CPU overhead is observed, which heavily chokes the GPU(s) in CPU-intensive applications. The positive is a simple and stable driver which works with almost any newly released game, without compatibility issues or the need for game-specific driver tweaks. It also gives consistent performance across the board in most games, even if that performance is relatively low. This is why you can run pretty much any indie game on an AMD card without a "not supported" error message. Nvidia's drivers, by contrast, often require per-game tweaks for a title to remain stable. This is a lot of work for the driver team, but at least you get the performance boosts from the multi-threaded rendering capabilities of Direct-X 11. I would not be surprised if AMD chose compatibility over performance either to reduce staff workload, or because the driver team don't have the ability to ship a fully stable driver that uses the features discussed.
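Incidentally, whether a driver implements command lists natively is something you can query yourself. Here is a small sketch (my own, assuming a valid ID3D11Device* named 'device' and the usual <d3d11.h>/<cstdio> headers); if DriverCommandLists comes back FALSE, the D3D11 runtime emulates command lists in software rather than the driver handling them:

Code:
// Sketch: query whether the installed driver natively supports the D3D11
// multi-threading features discussed above.
D3D11_FEATURE_DATA_THREADING threadingCaps = {};
HRESULT hr = device->CheckFeatureSupport(D3D11_FEATURE_THREADING,
                                         &threadingCaps, sizeof(threadingCaps));
if (SUCCEEDED(hr))
{
    // TRUE if resources can be created concurrently from multiple threads.
    printf("DriverConcurrentCreates: %s\n",
           threadingCaps.DriverConcurrentCreates ? "yes" : "no");
    // TRUE if the driver itself builds command lists; FALSE means the D3D
    // runtime emulates them in software on the application's behalf.
    printf("DriverCommandLists:      %s\n",
           threadingCaps.DriverCommandLists ? "yes" : "no");
}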

AMD have shown promise in the past with their CPU overhead issues in "Sid Meier's Civilization: Beyond Earth", where they worked closely with the developer to ensure the game used good API optimization methods for enhanced CPU performance. AMD also enabled command list support (such a rare moment) for this title, which allowed proper usage of all threads.



Source

Here is more background on AMD's driver and the API's multi-threading capabilities, from the earlier Civilization V case:
Quote:
Traditionally, rendering is a very serial process. The program needs to setup a bunch of objects and then pass that on to the video drivers and finally to the GPU. There's a high degree of submission overhead, meaning it's possible to choke the CPU while submitting a large number of objects to the GPU. In DirectX 11, multi-threaded rendering is achieved by turning the D3D pipeline into a 3 step process: the Device, the Immediate Context, and the Deferred Context. The important bit here is that the deferred context is full of things that have yet to be sent to the GPU, and that you can have a deferred context for each thread. When developers talk about multi-threaded rendering with DX11, this is what they're referring to. When you use DX11s multi-threaded rendering capabilities correctly, you can have several threads assemble their deferred contexts, and then combine them into a single command list once it comes time to render the scene.
Quote:
But let's be clear here: multi-threaded rendering is a massive undertaking on the driver and hardware side. You're doing the GPU equivalent of inventing the multi-tasking operating system. NVIDIA and AMD have not until this point supported multi-threaded rendering in their drivers, as they have needed time to implement this feature correctly in their drivers.
Quote:
Anyhow, as far as I know, AMD does not currently offer fully support for multi-threaded rendering (I don't have an AMD card plugged in right now to run the DX Caps Viewer against). I'm not sure where they are on it, though I doubt they're very far behind.
Quote:
So in conclusion, the reason NVIDIA beats AMD in Civ V is that NVIDIA currently offers full support for multi-threaded rendering/deferred contexts/command lists, while AMD does not. Civ V uses massive amounts of objects and complex terrain, and because it's multi-threaded rendering capable the introduction of multi-threaded rendering support in NVIDIA's drivers means that NVIDIA's GPUs can now rip through the game.
Source
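In other words, the pattern described above boils down to something like the following rough sketch (my own illustration under the same assumptions as the earlier snippet; RecordScenePart is a hypothetical stand-in for whatever issues the draw calls for one chunk of the scene):

Code:
#include <d3d11.h>
#include <thread>
#include <vector>

// Hypothetical helper: in a real renderer this would issue the draw calls
// for one chunk of the scene on the given deferred context.
void RecordScenePart(ID3D11DeviceContext* deferred, int chunkIndex) { /* ... */ }

void RenderFrameMultithreaded(ID3D11Device* device,
                              ID3D11DeviceContext* immediateContext,
                              int workerCount)
{
    std::vector<ID3D11CommandList*> commandLists(workerCount, nullptr);
    std::vector<std::thread> workers;

    // Each worker records its part of the frame into its own command list.
    for (int i = 0; i < workerCount; ++i)
    {
        workers.emplace_back([&, i]()
        {
            ID3D11DeviceContext* deferred = nullptr;
            device->CreateDeferredContext(0, &deferred);
            RecordScenePart(deferred, i);
            deferred->FinishCommandList(FALSE, &commandLists[i]);
            deferred->Release();
        });
    }
    for (auto& t : workers) t.join();

    // Only the immediate context talks to the driver/GPU: replay in order.
    for (ID3D11CommandList* cl : commandLists)
    {
        immediateContext->ExecuteCommandList(cl, FALSE);
        cl->Release();
    }
}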

Will we see an improvement in AMD's DirectX 11 overhead? I think not, as suggested here in an interview with AMD's Richard Huddy at Bit-tech:
Quote:
AMD says DX11 multi-threaded rendering can double object/draw-call throughput, and they want to go well beyond that by bypassing the DX11 API.
Source One
Source Two

Not too long back, Nvidia decided to fully support Direct-X 11's capabilities with their "wonder driver", which promised the following:


Source

It would seem Nvidia had fully enabled all the bells and whistles associated with the Direct-X 11 API, putting an end to CPU bottlenecks in most games for those running an Nvidia GPU. AMD still refuse to make such changes even now, in Q3 2015. Their main focus is clearly Direct-X 12, which does not help those who wish to play a game using Direct-X 11, as shown here:



Source

Clearly, the image above shows that AMD's driver had a maximum output of 1.1m draw calls regardless of the hardware used at the time of testing. This has now improved to around 1.3m, depending on the hardware configuration.

To conclude, this is only a basic analysis of Direct-X and AMD's drivers. It is clear that AMD are not utilizing the full potential of Direct-X 11, causing CPU limitations for AMD customers. To reiterate, I spoke about all this 4-5 years ago, yet I was labelled as the crazy guy in the corner of the room making things up because he's out of his mind, and the bottleneck was dismissed as nothing more than a problem with my system. Instead, it turns out those involved simply had a lack of understanding. Hopefully, with these findings, we can show people the true state of AMD's GPU drivers and demand change once and for all. Last but not least, I'd like to leave you with some Nvidia marketing benchmarks (which turned out to be completely true) to solidify the fact that AMD's driver performance is in a diabolical state by comparison.





Source

UPDATE 24/02/2016: More evidence of AMD's DX11 single-threaded performance issues. Look at the boost over DX11, and over the 980 Ti, in DX12.

New Ashes of the Singularity build benchmark results:


Quote:
Originally Posted by Mahigan View Post

Well lets take it a step further shall we?

SM200 (GTX 980 Ti):
22 SMMs
Each SMM contains 128 SIMD cores.
Each SMM can execute 64 warps concurrently.
Each Warp is comprised of 32 threads.
So that's 2,048 threads per SMM or 128 SIMD cores.
2,048 x 22 = 45,056 threads in flight (executing concurrently).

GCN3:
64 CUs
Each CU contains 64 SIMD cores.
Each CU can execute 40 wavefronts concurrently.
Each Wavefront is comprised of 64 threads.
So that's 2,560 threads per CU or 64 SIMD cores.
2,560 x 64 = 163,840 threads in flight (executing concurrently).

Now factor this:


GCN3 SIMDs are more powerful than SM200 SIMDs core for core. It takes a GCN3 SIMD less time to process a MADD.

So what is the conclusion?

1. GCN3 is far more parallel.

2. GCN3 has less SIMD cores dedicated towards doing more work. SM200 has more SIMD cores dedicated towards doing less work.

3. If you're feeding your GPU with small amounts of compute work items, SM200 will come out on top. If you're feeding your GPU with large amounts of compute work, GCN3 will come out on top.

Now this is just the tip of the Iceberg, there's also this to consider (Multi-Engine support):

GCN3 stands to benefit more from Asynchronous compute + graphics than SM200 does because GCN3 has more threads idling than SM200. So long as you feed both architectures with optimized code, they perform as expected.

What is as expected?


Hitman, under DX12, will showcase GCN quite nicely I believe.
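(For anyone who wants to sanity-check the threads-in-flight arithmetic quoted above, it is just a product of the per-architecture figures Mahigan gives; a trivial snippet using only those quoted numbers:)

Code:
#include <cstdio>

int main()
{
    // Figures as quoted above, not independently verified here.
    const int maxwellThreads = 22 * 64 * 32; // SMMs x warps/SMM x threads/warp
    const int gcn3Threads    = 64 * 40 * 64; // CUs x wavefronts/CU x threads/wavefront
    printf("GTX 980 Ti threads in flight: %d\n", maxwellThreads); // 45,056
    printf("GCN3 threads in flight:       %d\n", gcn3Threads);    // 163,840
    return 0;
}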
 

·
Old dog, old tricks
Joined
·
9,755 Posts
Very good read. Reserving my comment real estate for more in-depth commentary, since I've preached about inefficiency as well for the past few years.

 

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #4
Quote:
Originally Posted by PontiacGTX View Post

Use 1440p. Now the game is GPU bound again
Upping the resolution is the biggest misconception of the CPU bottleneck issue. At 1440p, I can still find myself bottlenecked depending on the application.
 
  • Rep+
Reactions: Cyro999

·
Registered
Joined
·
653 Posts
Very interesting. I've been saying this myself for the past half year to a year as well (especially to friends with a small budget looking to upgrade/build a new PC).
I really hope AMD will still resolve this and release proper DX11 drivers.
 

·
New to OCN?
Joined
·
26,919 Posts
Quote:
Originally Posted by BradleyW View Post

Upping the resolution is the biggest misconception of the CPU bottleneck issue. At 1440p, I can still find myself bottlenecked depending on the application.
The bottleneck is still way lower than what you get at 1080p, and at 1080p the performance loss is bigger regardless of whether the drivers are optimized for DX11 or not. You can see people getting bottlenecked mostly in DX11 at 1080p with powerful GPU setups.

That only works if the game makes proper use of multi-core processors...
 

·
Old dog, old tricks
Joined
·
9,755 Posts
Who cares how big the bottleneck is if the frame rate remains the same? People usually play at their native resolution, and upping to 1440p means a big investment. And why? To get the same crappy performance, just knowing your CPU no longer bottlenecks as much. It's pointless and expensive.
 

·
New to OCN?
Joined
·
26,919 Posts
Quote:
Originally Posted by ronnin426850 View Post

Who cares how big the bottleneck is if the frame rate remains the same? People usually play at their native resolution, and upping to 1440p means a big investment. And why? To get the same crappy performance, just knowing your CPU no longer bottlenecks as much. It's pointless and expensive.
To those who have a much lower framerate, with stuttering on top of it...

You know there is downsampling via the drivers, which doesn't add any cost.

By getting a lower but playable framerate...
 

·
Fan Man
Joined
·
674 Posts
Haven't read the wall of text yet, but as a critical person and a sceptic, I'm wary of article titles that include the words "The Real Truth" yet are not from the most respected review sites.
Ambitious, pretentious, or close to the truth? I don't know.
But I do know that it is not possible on this forum for registered members to change thread titles to more appropriate wording, which is very impractical.

I'll read the post tonight.
 
  • Rep+
Reactions: Datsun

·
Premium Member
Joined
·
9,327 Posts
Ok.
 

·
Premium Member
Joined
·
1,810 Posts
Have you thought about asking AMD? I mean, you could send a Twitter message to @CatalystMaker. AFAIK, AMD opted to simply move ahead with Mantle, which is now Vulkan and which put pressure on MS to release DX12.

Would be nice to see what Mr Makedon says.
 

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #12
I advise all readers to research the issue discussed in the OP. The evidence is overwhelming now that people have begun testing and exposing the CPU overhead fiasco on AMD drivers.

I understand that most people learning about this for the first time will probably be in denial for a short time before taking it on board (fanboys).

Those who are most likely to be affected by AMD's overhead are CrossFire users.
 

·
To The Game
Joined
·
7,674 Posts

 
  • Rep+
Reactions: BradleyW

·
Premium Member
Joined
·
20,159 Posts
Quote:
Originally Posted by BradleyW View Post

I advise all readers to research the issue discussed in the OP. The evidence is overwhelming now that people have begun testing and exposing the CPU overhead fiasco on AMD drivers.

I understand that most people learning about this for the first time will probably be in denial for a short time before taking it on board.

Those who are most likely to be affected by AMD's overhead are CrossFire users.
i am a crossfire user ( 2 290s). i just keep my HT on my i7 and i use 4K. heck even nvidia owners with mid-high end setups with an i5 experience bottleneck.
 

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #15
Quote:
Originally Posted by Chargeit View Post

This is how it is in every game on AMD GPUs due to their overhead. There are countless videos showing tests like this. Overwhelming evidence! Thanks for posting this.
 
  • Rep+
Reactions: Cyro999

·
To The Game
Joined
·
7,674 Posts
NP. Read your title post and thought of that video, which is pretty good proof of what you're saying. It also made me stop suggesting AMD GPUs for budget systems.
 

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #17
Quote:
Originally Posted by rdr09 View Post

i am a crossfire user ( 2 290s). i just keep my HT on my i7 and i use 4K. heck even nvidia owners with mid-high end setups with an i5 experience bottleneck.
It goes like this. The lowest FPS in Tomb Raider on my system is 71 (Shanty Town).
Increasing the resolution from 1080p to 1440p makes no change to the fps. I am still sat at 71 fps because the GPUs are still holding back due to a CPU limitation at the driver level.
Setting the resolution to 3200x1800 drops the fps to around 68. Overclocking the GPUs as far as I want won't get me past 71! I hit the limitation again. Only increasing the CPU core speed then raised the 71 to 73. Running a game at 4K, you'll most likely be between 35 and 55 fps depending on the game and settings, so you are far less likely to run into a scenario where you'll recognise a CPU limitation.
 

·
Premium Member
Joined
·
20,159 Posts
Quote:
Originally Posted by BradleyW View Post

It goes like this. The lowest FPS in Tomb Raider on my system is 71 (Shanty Town).
Increasing the resolution from 1080p to 1440p makes no change to the fps. I am still sat at 71 fps because the GPUs are still holding back due to a CPU limitation at the driver level.
Setting the resolution to 3200x1800 drops the fps to around 68. Overclocking the GPUs as far as I want won't get me past 71! I hit the limitation again. Only increasing the CPU core speed then raised the 71 to 73. Running a game at 4K, you'll most likely be between 35 and 55 fps depending on the game and settings, so you are far less likely to run into a scenario where you'll recognise a CPU limitation.
i don't play that game. i play BF4 just fine using all Medium no AA. Both my gpus pegged in utilization. i don't even oc my 290s. my 4K is 60Hz. your 1080 is 144Hz?

edit: maybe you need to recalibrate your cpu oc. OR, just get an nvidia card. you've been suffering for a number of years.
 

·
Programmer
Joined
·
28,680 Posts
Discussion Starter #19
Quote:
Originally Posted by rdr09 View Post

i don't play that game. i play BF4 just fine using all Medium no AA. Both my gpus pegged in utilization. i don't even oc my 290s. my 4K is 60Hz. your 1080 is 144Hz?

edit: maybe you need to recalibrate your cpu oc.
I was just using Tomb Raider as an example to show you how things behave when a CPU limitation comes into play. Nothing wrong with the overclock. In fact, as I suggested, increasing the CPU speed raised the minimum FPS because the AMD driver was already saturated on its single-threaded rendering performance (AMD doesn't make good use of multi-threaded rendering at the driver level and relies on a single thread, which is why a stronger CPU is needed when using an AMD card). The reasons behind all this are explained in the OP.

I've done programming for five years. During my early days at university, I had the chance to play around with hardware in our facilities for programming projects. I used a system similar to my own and clearly demonstrated the exact driver overhead issues. I switched AMD for Nvidia in that very test rig and the overhead was gone, in every application! To back up the findings, I built several systems under the inspection of Computer Science tutors as part of the project. The findings all matched across the range of hardware used. The conclusion: the CPU is starved of its full potential when AMD is the GPU. Why? The AMD driver uses single-threaded submission; Nvidia's uses multi-threaded. Simple as that.
 

·
Premium Member
Joined
·
20,159 Posts
Quote:
Originally Posted by BradleyW View Post

I was just using Tomb Raider as an example to show you how things behave when a CPU limitation comes into play. Nothing wrong with the overclock. In fact, as I suggested, increasing the CPU speed raised the minimum FPS because the AMD driver was already saturated on its single-threaded rendering performance (AMD doesn't make good use of multi-threaded rendering at the driver level and relies on a single thread, which is why a stronger CPU is needed when using an AMD card).
5 years is a lifetime in gaming. It is time to move on. Try the other side. So, your 1080 is 144Hz?

Edit: I recommend the 980 Ti

 