
[NH] AMD to move floating point calculation to the graphics circuit in 3-5 years

1560 Views 16 Replies 13 Participants Last post by Ethatron
Quote:


AMD to move floating point calculation to the graphics circuit in 3-5 years

When we first reported on AMD's coming Bulldozer processor architecture, we pointed out that future CPUs will transfer a lot of the workload onto the graphics circuit. AMD has efficiently increased multi-threading performance with its new processor architecture simply by expanding its integer capacity. Despite having two integer units, the floating point unit is still on its own, and instead AMD will transfer floating point calculations to the graphics circuit.

This is something AMD will develop further over the coming years, and it told AnandTech that it will handle the majority of floating point calculations with the graphics circuit in 3-5 years...

Source
Seems like the right way to do things; it should improve performance a great deal with relatively little added complexity.
How would they serve the mass market with this strategy? Would this mean that AMD would start selling CPU-GPU combo deals to the mass market? Would they retain a line of "old style" CPUs for mass consumption and only move the floating point calcs to the GPU for the high-end performance market?

Hopefully they know what they're doing, but this looks like a gamble to me. However, it could be one way that they catch up to Intel's performance lead.
Quote:

Originally Posted by TehStone View Post
How would they serve the mass market with this strategy? Would this mean that AMD would start selling CPU-GPU combo deals to the mass market? Would they retain a line of "old style" CPUs for mass consumption and only move the floating point calcs to the GPU for the high-end performance market?

Hopefully they know what they're doing, but this looks like a gamble to me. However, it could be one way that they catch up to Intel's performance lead.
You're missing the big picture.... In the future, it won't be CPU+GPU combos. It will just be a GPGPU, CPUGPU, CPU, or whatever they want to call it. The CPU will be handling the GPU tasks. The CPU and GPU will be one and the same.

This would be better for mass consumption as a single chip would handle more tasks.
Quote:

Originally Posted by DuckieHo View Post
The CPU and GPU will be one and the same.
Sort of like integrated graphics? I know what you're getting at, but it's not exactly a revolutionary idea... it wouldn't be long before they start dual-coring and quad-coring those and we're back where we started. Imagine putting a dedicated GPU on such a system to take the load off the CPU-GPU. Oh, what performance!
Quote:


Originally Posted by TehStone View Post

Sort of like integrated graphics? I know what you're getting at, but it's not exactly a revolutionary idea... it wouldn't be long before they start dual-coring and quad-coring those and we're back where we started. Imagine putting a dedicated GPU on such a system to take the load off the CPU-GPU. Oh, what performance!

We used to have removable floating-point co-processors. If you're around long enough, you see that the more things change, the more they stay the same. Another example would be server technology... single mainframe systems, to multiple server systems, back to a single virtualized server.

This can provide some incredible increases in performance for comparatively little cost in power consumption and die area. I don't think the pure GPU will ever be replaced, simply because on-chip graphics will never reach the levels of performance that would let enthusiasts and HPC forgo a discrete card (in games and in servers). But for home users and workstations, a single chip may just be able to "do it all" with very acceptable levels of performance. AMD is in the best position to implement this in a big way... a 22nm CPU with a specialized graphics core will be very formidable.
Gamers will love this thing as well. Think of all the tests done with PhysX, where even GTX 285s benefited from having a 9600GT daughter card handling the PhysX calcs; except in this case, the physics co-processor is built right onto your CPU.

When Fusion gains ground is when we will begin to see GPGPU computing really start stretching its legs, as a sizable number of computers will be able to handle GPGPU out of the box. What's more, computer gaming might begin to see a semi-competent baseline in integrated graphics performance, opening up a much wider market to computer game developers.
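
As a rough sketch of what "GPGPU out of the box" could look like in practice, here's a minimal OpenCL host program in C that offloads a simple floating-point kernel to the GPU. This is an illustration only: the kernel, names, and sizes are made up, and error checking is omitted for brevity.

Code:

/* Minimal OpenCL sketch: offload a floating-point multiply-add to the GPU.
 * Hypothetical example; build against an OpenCL SDK: gcc saxpy.c -lOpenCL */
#include <stdio.h>
#include <CL/cl.h>

#define N 1048576

static const char *src =
    "__kernel void saxpy(float a, __global const float *x,\n"
    "                    __global float *y) {\n"
    "    size_t i = get_global_id(0);\n"
    "    y[i] = a * x[i] + y[i];\n"
    "}\n";

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Buffers live in GPU memory; the host arrays are copied in. */
    cl_mem bx = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof(x), x, NULL);
    cl_mem by = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                               sizeof(y), y, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "saxpy", NULL);

    float a = 3.0f;
    clSetKernelArg(k, 0, sizeof(a), &a);
    clSetKernelArg(k, 1, sizeof(bx), &bx);
    clSetKernelArg(k, 2, sizeof(by), &by);

    size_t global = N;   /* one work-item per array element */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, by, CL_TRUE, 0, sizeof(y), y, 0, NULL, NULL);

    printf("y[0] = %f (expect 5.0)\n", y[0]);
    return 0;
}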
I don't want to sound like a jackass, but I thought we knew this?
Quote:

Originally Posted by xeeki View Post
I don't want to sound like a jackass, but I thought we knew this?

This is more of a clarification that all FP calculations will be done by the GPU in the future.
Quote:


Originally Posted by TehStone View Post

How would they serve the mass market with this strategy? Would this mean that AMD would start selling CPU-GPU combo deals to the mass market? Would they retain a line of "old style" CPUs for mass consumption and only move the floating point calcs to the GPU for the high-end performance market?

Hopefully they know what they're doing, but this looks like a gamble to me. However, it could be one way that they catch up to Intel's performance lead.

As Duckie said, in a few years nobody will remember it as a separate unit, because it will be integrated onto the CPU. People probably don't remember way back in the day when there used to be a separate math co-processor, the x87. As CPU dies got smaller and math functions became more important, the x87 processor simply became fully integrated into the CPU. The same thing will happen now with the CPU and GPU, where the GPU will be used for FP operations.

Check out the wiki for x87.
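
For the curious, the x87 interface is still present in every x86 chip today. A tiny sketch in C with GCC inline assembly, using a made-up helper purely for illustration:

Code:

/* Driving the legacy x87 FPU directly via GCC inline asm (x86 + GCC only).
 * x87_sqrt() is a hypothetical helper, just to show the interface exists. */
#include <stdio.h>

static double x87_sqrt(double v)
{
    double r;
    /* "t" = top of the x87 register stack; "0" ties the input to it */
    __asm__ ("fsqrt" : "=t"(r) : "0"(v));
    return r;
}

int main(void)
{
    printf("%f\n", x87_sqrt(2.0));   /* ~1.414214 */
    return 0;
}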
I think the future will lead to "system on a chip" PCs, but for enthusiasts there will always be a dedicated graphics processing unit.

Just as for music enthusiasts there will always be a dedicated sound processor.
This is just hype, or a child's fantasy.

Compiler development takes too long to even think about offering generic compiler suites that automatically parallelize and offload inline FP code onto another architecture. In addition, it would need to be proprietary to AMD, because Intel will not adopt it the same way; but Intel is the one with the supreme compiler support (its own compiler), not AMD.
You can take Roadrunner as an example: it is a three-architecture build (x86, PPC, Cell), and they needed two years just to get output near the promised 1+ PFLOPS.

Not to mention that far from all problems are asynchronous: you can't parallelize everything, and we all know that running a scalar complex formula on a GPU is a 100-fold slowdown.
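
A trivial sketch of the kind of scalar FP code I mean, in plain C: every iteration depends on the previous one, so a thousand GPU lanes buy you nothing (hypothetical example, illustration only):

Code:

/* Newton's method for sqrt(a): a serial dependency chain.
 * No iteration can start before the previous one finishes,
 * so wide SIMD/GPU hardware gains nothing here. */
#include <stdio.h>

static double newton_sqrt(double a)
{
    double x = a;                      /* initial guess */
    for (int i = 0; i < 20; i++)
        x = 0.5 * (x + a / x);         /* depends on the previous x */
    return x;
}

int main(void)
{
    printf("%f\n", newton_sqrt(2.0));  /* ~1.414214 */
    return 0;
}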

It's delusional [speaking as a programmer and assembler freak].
Works for the Cell already....
Quote:


Originally Posted by DuckieHo View Post

We used to have removable floating-point co-processors. If you're around long enough, you see that the more things change, the more they stay the same. Another example would be server technology... single mainframe systems, to multiple server systems, back to a single virtualized server.



Ah, that explains why I couldn't install Windows 98 on a 386 or 486 CPU I had back when I was a kid. It kept telling me about the missing FPU.

Quote:


Originally Posted by chemicalfan View Post

Works for the Cell already....

Yes, I believe this is why they are not making CPUs using the Cell architecture anymore. GPUs are much more efficient at this kind of calculation, and it is definitely the way to go.
Quote:


Originally Posted by chemicalfan View Post

Works for the Cell already....

It's not about making it "work", it's all about making it "usable". You don't want to offer Fusion chips with 5% FP performance, right?

Whichever direction you go with this architecture, offloading _all_ FP calcs to a SIMD co-processor and removing the scalar-and-branch-optimized FP unit will face very severe real-world problems.

In our current CPUs we have 2-4 interfaces for the FP unit alone (x87, 3DNow!, SSE and AVX); you have multiple FP ALUs in the core, whose FP scheduler is a monstrous abomination, almost a hardware compiler in itself, just to deal with scalar-vs.-vector and arch-vs.-arch utilization (or underutilization). Not to mention the op decoder, which gets longer and longer the more op encodings you introduce (one cache line for one single op is near!).
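
To make the scalar-vs.-vector split concrete, here is a minimal sketch in C doing the same four adds through the plain scalar path and through packed SSE intrinsics (illustration only):

Code:

/* The same FP work through two of the interfaces the scheduler juggles:
 * plain scalar C (historically x87 on 32-bit) vs. packed SSE. */
#include <stdio.h>
#include <xmmintrin.h>                 /* SSE intrinsics */

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8}, c[4], d[4];

    /* Scalar path: one add per operation. */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    /* Vector path: four adds in a single SSE instruction. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(d, _mm_add_ps(va, vb));

    printf("%f %f\n", c[0], d[0]);     /* both print 6.0 */
    return 0;
}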
And finally, you are simply unable to map some FP features even onto SSE; FP-calc status flags are the most obvious example.
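
A small sketch of those status flags through C99's fenv.h: a scalar FPU records them as a side effect of every operation, and any SIMD/GPU replacement would have to emulate that behavior (illustration only):

Code:

/* FP status flags via C99 fenv.h: set implicitly by every operation.
 * Build: gcc -std=c99 flags.c -lm */
#include <stdio.h>
#include <fenv.h>

int main(void)
{
    feclearexcept(FE_ALL_EXCEPT);

    volatile double x = 1.0, y = 0.0;
    volatile double r = x / y;         /* raises FE_DIVBYZERO, r = inf */
    (void)r;

    if (fetestexcept(FE_DIVBYZERO))
        printf("divide-by-zero flag set\n");
    return 0;
}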

If you replace the current FP units with the current GPU-style architecture (which is not even mentioned here, as that would be a fully complete integration of a GPU fused into the architecture), and you try to map the current interfaces onto it, you will fall flat on your face, hard.