Originally Posted by sdlvx
It depends on what the product is. For a high-end setup where cost is not as important, it will be fine. My point is that APUs can get a lot more bandwidth than they currently have
in an effort to address the fact that you're so hung up on APUs being bandwidth starved.
However, you're forgetting the key concept that "APUs are bandwidth starved" applies primarily to using the iGPU to play games when compared to dGPU performance with GDDR.
So, the point I'm making, again, is that APUs can scale with lots of bandwidth or they can remain with 2 channels or whatever.
But I don't think you understand the concept of HSA. iGPU gaming performance has absolutely nothing to do with HSA. And I do believe that this is where you are getting rather confused.
Between CPU, GPU, and system memory, PCIe 3.0 x16 is by far the slowest link. PCIe speed is usually rated in GT/s (effectively bits per second per lane), while memory bandwidth is quoted in bytes per second, so the headline numbers aren't directly comparable. I mentioned this earlier and clearly it fell on deaf ears, but PCIe 3.0 x16 only has about 16GB/s of bandwidth total. That's not even comparable to a dual-channel DDR3-1600 setup, which manages 25.6GB/s. And you don't even need to spend a ton of money on RAM to get a ton of bandwidth.
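To put rough numbers on that comparison, here's a back-of-the-envelope sketch using the standard nominal figures for PCIe 3.0 and DDR3-1600 (theoretical peaks, not measured throughput):

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, 16 lanes.
pcie3_x16_gbs = 16 * 8e9 * (128 / 130) / 8 / 1e9   # bits -> bytes -> GB/s

# Dual-channel DDR3-1600: 2 channels x 64-bit (8-byte) bus x 1600 MT/s.
ddr3_1600_dual_gbs = 2 * 8 * 1600e6 / 1e9

print(f"PCIe 3.0 x16:           {pcie3_x16_gbs:.2f} GB/s")      # ~15.75 GB/s
print(f"Dual-channel DDR3-1600: {ddr3_1600_dual_gbs:.1f} GB/s")  # 25.6 GB/s
```

Even cheap dual-channel desktop memory has well over half again the bandwidth of the full x16 link.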
I feel sort of bad picking on you since you don't even understand what HSA is, even though you're so vigorously attacking it, but if APUs are horribly memory bandwidth bottlenecked and can't compete with dGPUs, how do you explain this?
I think you are failing to get the point that there are just things that GPUs simply are unable to do, and the purpose of HSA is to be able to use the GPU in situations where you normally couldn't, or wouldn't want to because doing so would mean you'd have to move the entire application from system memory into video memory for GPGPU, then back into system memory to use it.
And do you recall the part earlier about PCIe 3.0 being the slowest link? That's the bus all of that data has to cross.
I have an extensive grasp of the issue, and you're interpreting it incorrectly.
Now the obligatory thing to point out: people always try to point to something else. If it's a non-gaming discussion, people cite gaming in hopes of getting their inaccurate points across, and vice versa. So obviously it must be addressed on both fronts.
I understand all of how HSA works. The primary thing being shared memory space.
When you say "the purpose of HSA is to be able to use the GPU in situations where you normally couldn't," that's simply not true. You can do everything with a standalone GPU that you can with an APU, and do it better at the same price point or less.
HSA just provides functions programs can call so they can use the GPU inside the APU without being written specifically for an APU environment (for example, programs don't have to handle APU memory management themselves; that's all done automatically). When it comes to HSA, its benefits only really apply against APUs without HSA. HSA does not do things you can't already do on any modern GPU. It doesn't magically process x86 code on the GPU, for example. So no, it's not doing anything a GPU can't already do.
And again, PCIe doesn't need to be that fast. It's not really a bottleneck; even a $1,000 GPU can safely run on PCIe 3.0 with a huge amount of performance. I can't make this any clearer: the PCIe bandwidth requirements really aren't all that much in the grand scheme of things. In reality, you're probably not even maxing your SYSRAM bandwidth in a CPU/GPU system when either needs to manipulate something in the other's RAM, let alone your PCIe bandwidth.
GPUs are all about massive amounts of I/O: input and output. A GPU needs fast memory to move not just large amounts of data, but to move it rapidly. Break up, say, a 100GB/s bus, and you get about 10GB every tenth of a second. It's much like how SSDs win on high IOPS: even the first generation of SSDs had mediocre sequential read/write performance, but the high IOPS made the difference. It's less about what happens in a whole second and more about what happens in a fraction of a second.
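That slicing idea can be sketched directly; the helper name here is my own, and it's pure arithmetic assuming the full sustained rate is available in every slice:

```python
def budget_per_interval(bandwidth_gb_s, interval_s):
    """Data (in GB) a bus can move in one time slice,
    assuming the sustained rate holds for that slice."""
    return bandwidth_gb_s * interval_s

# A 100GB/s bus broken into tenth-of-a-second slices:
print(budget_per_interval(100, 0.1))  # ~10 GB per tenth of a second
```

The whole-second number only matters as an average; the per-slice budget is what the GPU actually lives on.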
For example, in order to get 60fps, your GPU has to be able to process each frame in 1/60th of a second. That requires enough bandwidth to be available in that time frame to achieve that (assuming the GPU power is up to snuff). Now keep in mind, when I talk about a frame, I'm not talking about the data that makes up the picture sent to your monitor.
Now let's think about it: how much bandwidth does an APU on 20GB/s SYSRAM have in 1/60th of a second? About 340MB. That's not much in the grand scheme of things, and 20GB/s is on the high end of SYSRAM speeds. More common is 10GB/s, which brings it down to just 170MB. Obviously that's not much I/O to work out a frame in that amount of time. There's another downside too: RAM is better suited to larger chunks over a longer period than to tons of small chunks over a shorter period, so your transfer rate on those short timescales can actually be worse and uneven in practice, i.e. you might get more or less bandwidth in successive fractions of a second. The more you have in a given fraction of a second, the better. But if you get less than you need to work out that frame, you have to eat into the bandwidth of the next fraction of a second to get enough data in or out of memory.
Now, how much bandwidth does 150GB/s of GDDR5 have in 1/60th of a second? A whopping 2.5GB, about 7.5 times more in that same 1/60th of a second. That's clearly a lot more data to work with.
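The per-frame arithmetic above works out like this (a toy calculation, not a benchmark; 20GB/s and 150GB/s are the round numbers used above):

```python
def gb_per_frame(bandwidth_gb_s, fps=60):
    """Bandwidth budget (GB) available within a single frame interval."""
    return bandwidth_gb_s / fps

apu_budget = gb_per_frame(20)     # ~0.33 GB (~340MB) per frame
gddr5_budget = gb_per_frame(150)  # 2.5 GB per frame
print(f"ratio: {gddr5_budget / apu_budget:.1f}x")  # prints "ratio: 7.5x"
```

Note the ratio is just 150/20 regardless of frame rate; the per-frame framing matters because the GPU has to finish its work inside each 1/60th-second slice, not on average over a second.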
So if you don't have enough bandwidth in that fraction of time, you won't have enough I/O to feed in data or push out a frame, so your GPU just ends up waiting on input or output before it can do anything else, which on a GPU's processing timescale is an eternity. It doesn't matter how powerful a GPU is: once it's limited by RAM bandwidth, it's not going any faster than that bandwidth lets it swap input/output data. The GPU is effectively starved of data to work with.
This same thing also applies to non-gaming GPU usage, since again, the GPU relies heavily on massive amounts of I/O. The GPU takes in some data, processes it, and outputs it back to memory. The faster it gets data out, the faster it can take new data in, and so on and so forth. This is a bedrock fact that simply cannot be ignored nor downplayed, regardless of the usage scenario.

Edited by AMDATI - 3/26/14 at 2:40am