
[SemiAccurate] AMD talks about Vega at a high level

10K views 111 replies 58 participants last post by  C2H5OH 
#1 ·
Quote:
AMD's new Vega GPU architecture changes things on two fronts, on-die and off. Of the two, the off-die change seems the most fundamental to SemiAccurate, but both bring ground-up new tech to GPUs.



Vega at a high level

First on the list of big changes is a really big bang, on the order of DX9 or the addition of geometry shaders. It is called the Primitive Shader and it is lumped under the heading of New Programmable Geometry Pipeline. The old way of doing things was to have separate pixel, vertex, and geometry shaders fed by a compute engine (ACE) or geometry command processor (GCP). These fed the Geometry Processor and then the various pipelines, Vertex Shader (VS) then Geometry Shader (GS).



Pick your path wisely, young learner

With the new Geometry Pipeline things are different. You can still do things the old way, or you can take a new path, the Primitive Shader (PS). As you can see above, the PS is a separate path from the normal VS->GS path. While low-level details are going to wait a bit longer for disclosure, the PS operates on higher-level objects than the old path and can discard primitives that will never be drawn at a much higher rate than before.

The PS path is much more programmable than the older pair of shaders, which were less flexible. Programmers will have to opt into the new way of doing things, but once they do they have much more ability than before to use features from both domains to get the job done. In theory this allows new ways of doing the work of the old shaders, presumably more efficient ones too. I wonder what could catalyze the game devs to change their ways? *COUGH* consoles *COUGH*
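
To make the early-discard idea concrete, here is a toy Python sketch of culling primitives before they ever reach the rasterizer. It is purely illustrative; the tests and names are our own invention, not AMD's shader model.

Code:
def signed_area(tri):
    # Twice the signed screen-space area; counter-clockwise winding is positive.
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)

def cull_primitives(triangles):
    """Drop back-facing and zero-area triangles before rasterization."""
    return [tri for tri in triangles if signed_area(tri) > 0]

if __name__ == "__main__":
    tris = [
        [(0, 0), (4, 0), (0, 4)],  # front-facing: kept
        [(0, 0), (0, 4), (4, 0)],  # back-facing (clockwise): culled
        [(1, 1), (2, 2), (3, 3)],  # zero-area (collinear): culled
    ]
    print(len(cull_primitives(tris)), "of", len(tris), "triangles survive")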

One Achilles' heel of the older GCN devices was that they had a lot of ACEs but only one GCP. AMD isn't talking about the fine structures of Vega yet, but they are saying that the Geometry Pipeline can launch threads at the same rate as the ACEs launch compute threads. This implies multiple GCPs, or at least a massively threaded GCP.



Cross-draw call logic is the important bit

This implication is bolstered by the slide above: it is likely that the Intelligent Workload Distributor (IWD) is the new GCP or GCPs, but it is also more than that. The key advance here is that the IWD does not blindly schedule a pipeline or three, it is intelligent. Instead of looking at a single draw call, it can look across multiple draw calls to schedule optimally for the device, not just for the hot thread of the moment. This also implies one big IWD, much like the one big GCP it replaces.

As always the devil is in the details, but this could be a major change for the better in the utilization of the shaders on a GPU. AMD has always had a brute-force advantage in shader math, but real-world utilization isn't always optimal. If the new IWD changes this, it will be interesting to see how much of the peak performance Vega can extract in actual games. It should be higher than pre-Vega devices, but by how much?
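
As a rough mental model, and only that, scheduling across draw calls lets many small draws be packed onto the shader array together instead of each one leaving units idle. A toy Python sketch with made-up numbers:

Code:
def schedule_per_draw(draws, units=4):
    """Old-style: each draw call is scheduled alone; idle units are wasted."""
    cycles = 0
    for wavefronts in draws:
        cycles += -(-wavefronts // units)  # ceiling division
    return cycles

def schedule_across_draws(draws, units=4):
    """IWD-style: wavefronts from different draw calls can share a cycle."""
    return -(-sum(draws) // units)

if __name__ == "__main__":
    draws = [1, 2, 1, 3, 1]  # many small draw calls (wavefronts each)
    print("per-draw cycles:  ", schedule_per_draw(draws))     # 5
    print("cross-draw cycles:", schedule_across_draws(draws)) # 2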

Now we come to the Next-Generation Compute Unit, or NCU. The Geometry Pipeline was new; this is Next Generation (cue scary music from a 1950s TV monster show). AMD gave out some top-level specs on this unit, starting with a peak rate of 128 32-bit ops per clock. More precisely, they listed it at 512 8b ops or 256 16b ops per clock, with a configurable DP rate. This one is interesting because if they are calling the NCU the replacement for an ACE, that would mean the listed rate is per unit. If you recall, Fiji/Fury had eight ACEs, and Vega will likely have more. That number puts the device in perspective, unless AMD is calling an NCU a shader group, in which case the numbers make a lot more sense but the higher-level bits don't.
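
Taking those per-NCU rates at face value, the scaling is simple multiplication. The unit count and clock in this sketch are placeholders we made up, not announced Vega specs:

Code:
# Back-of-the-envelope throughput from the quoted per-NCU rates.
OPS_PER_CLOCK = {"32b": 128, "16b": 256, "8b": 512}  # per NCU, as listed

def peak_ops_per_second(ncu_count, clock_ghz, width="32b"):
    return OPS_PER_CLOCK[width] * ncu_count * clock_ghz * 1e9

if __name__ == "__main__":
    ncus, clock = 64, 1.5  # hypothetical configuration, not a Vega spec
    for width in ("32b", "16b", "8b"):
        tops = peak_ops_per_second(ncus, clock, width) / 1e12
        print(width, "peak:", round(tops, 1), "Tops/s")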



Pack 'em into the shaders

There are two interesting bits about the NCU that likely necessitated a ground-up redesign. The first and more minor is the packed math operations we talked about for Boltzmann/ROCm. Like its Polaris predecessors, Vega can do 2x 16b ops per clock or one 32b op; the non-packed 16b math is kind of useless, but legacy code likely demands its use of opcode space. There is also a limited but not called out ability to do 2x 8b ops, but not 4x 8b packed ops. These packed functions are aimed directly at AI work but could have limited use in gaming until packed-16b cards become the majority of the installed base.
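
If packed math is a new idea to you, here is a minimal Python illustration of the concept: two independent 16-bit results carried in a single 32-bit word per operand. We use integer lanes for clarity; the hardware's packed FP16 works on the same two-lanes-per-register principle.

Code:
MASK16 = 0xFFFF

def pack(hi, lo):
    return ((hi & MASK16) << 16) | (lo & MASK16)

def unpack(word):
    return (word >> 16) & MASK16, word & MASK16

def packed_add(a, b):
    """Add the two 16-bit lanes of a and b independently (wrap-around)."""
    hi = ((a >> 16) + (b >> 16)) & MASK16
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    return pack(hi, lo)

if __name__ == "__main__":
    a = pack(1000, 2000)
    b = pack(30, 40)
    print(unpack(packed_add(a, b)))  # (1030, 2040): two adds, one "op"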

The next one is the biggie: AMD claims the "NCU is optimized for higher clock speeds and higher IPC". Remember that ground-up redesign we harped on earlier? It looks like AMD is going down the same path as Nvidia did on clocks, and for much the same reason: the 'old way' is less efficient on modern processes with modern memories. Higher GPU clocks are going to be the norm now, and you can point a finger at energy efficiency as the cause. Please note that this is a good thing for all the right reasons.

Higher IPC and clocks lead to much higher performance, but graphics are not solely about brute force; you can't render a modern multi-million-polygon scene with all the effects turned on anymore without cheating. That cheating, or optimizing as most would call it, essentially means the GPU or the game engine does only the least work possible to render the scene. Anything that would not end up as a visible pixel, visible being the important part, is work that does not need to be done. Worse yet, it takes cycles away from useful work that does need to be done.

One of the biggies in this area is the key point of the other next-gen feature of Vega, the Next Generation Pixel Engine or NPE. Cue the same scary music as last time. The main addition to the NPE is something AMD calls the Draw Stream Binning Rasterizer (DSBR). It is a smart rasterizer with a newly added cache called the on-chip bin cache, and it does about what its name suggests: it caches primitives.

The idea here is to save fetches to off-chip memory to pull in polygons as needed for rasterization. Going off-chip is very expensive in both time and energy. With a smart cache AMD can likely avoid fetching a polygon or primitive more than once, or at least vastly cut down on the number of repeat fetches. This is a classic case of not doing unneeded work, and like backface culling and hidden triangle removal, it makes a GPU do a lot less busywork. You can think of the DSBR as a cache-aware scheduler that can do non-coherent pixel and texture memory accesses from a dedicated cache.
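
Binning itself is an old idea, and a toy version is easy to show. The sketch below sorts primitives into screen-space tiles so each tile can be rasterized out of an on-chip bin; the tile size and data layout are assumptions for illustration, not Vega's real parameters.

Code:
from collections import defaultdict

TILE = 32  # pixels per tile edge (made-up value)

def bbox(tri):
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return min(xs), min(ys), max(xs), max(ys)

def bin_primitives(triangles):
    """Map tile coordinates -> indices of primitives overlapping that tile."""
    bins = defaultdict(list)
    for idx, tri in enumerate(triangles):
        x0, y0, x1, y1 = bbox(tri)
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                bins[(tx, ty)].append(idx)
    return bins

if __name__ == "__main__":
    tris = [
        [(2, 2), (30, 4), (10, 28)],     # fits in one tile
        [(20, 20), (90, 25), (40, 70)],  # spans several tiles
    ]
    for tile, prims in sorted(bin_primitives(tris).items()):
        print("tile", tile, "-> primitives", prims)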

How do you feed this beast? That steps us out farther, to the HBC or High-Bandwidth Cache and the HBCC or High-Bandwidth Cache Controller. AMD was a bit vague on what does what and why. If you look at the system diagram above, it becomes pretty clear that the HBC is a large dedicated cache for the HBCC. It may seem counterintuitive to have a second large cache beside the L2, but the HBCC really needs it.
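
Functionally you can think of the HBCC as a page-level controller keeping hot data in fast local memory and spilling the rest to a larger, slower pool. The minimal LRU sketch below is our own mental model with made-up page counts; AMD has not disclosed the real policy.

Code:
from collections import OrderedDict

class PageCache:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.resident = OrderedDict()  # page -> None, kept in LRU order
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.resident:
            self.resident.move_to_end(page)    # refresh LRU position
            self.hits += 1
            return "hit"
        self.misses += 1                       # would fetch from the slow pool
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used page
        self.resident[page] = None
        return "miss"

if __name__ == "__main__":
    cache = PageCache(capacity_pages=4)
    for page in [0, 1, 2, 0, 3, 4, 0, 1]:
        cache.access(page)
    print("hits:", cache.hits, "misses:", cache.misses)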
Source
 
#2 ·
Quote:
Originally Posted by PontiacGTX View Post
Oh boy, great stuff! The new NCU sounds like AMD has finally found the magic sauce for "Nvidia"-class compression, or "cheating" as the author calls it.
Not to mention that large HBM "L3" cache.
 
#3 ·
I agree, Vega is beginning to sound very promising and will hopefully bring AMD back into the power game. Polaris was great for feeding the bank, but we want more.
 
#5 ·
Well, it sounds great on paper, but from what I understand now, everything on the GPU is programmable. The question is, who is going to program the GPU to get the most out of the arch? It's going to be the same problem all over again. No one will program the GPU to run primitive shaders or any other new hardware on the GPU, and it will run like crap.
 
#7 ·
Quote:
Originally Posted by dir_d View Post

Well, it sounds great on paper, but from what I understand now, everything on the GPU is programmable. The question is, who is going to program the GPU to get the most out of the arch? It's going to be the same problem all over again. No one will program the GPU to run primitive shaders or any other new hardware on the GPU, and it will run like crap.
The beauty is that you don't need to program for it... it is done automatically by the GPU (just like on Maxwell and newer architectures). Any primitives which are covered by another primitive are automatically culled before they even hit the memory subsystem. The savings ought to considerably boost performance by saving on cache and memory bandwidth.

On top of this, Vega will have 512GB/s of memory bandwidth, so not only will the architecture require less memory bandwidth to operate, it will also come equipped with 512GB/s. For some, this does not sound surprising because Fiji had 512GB/s, but remember: Fiji wasted a lot of bandwidth by drawing covered primitives.

There really is no solid basis for attempting to predict Vega performance. It ought to be quite different from previous GCN architectures, as it is not likely to suffer from a CPU-driven bottleneck (fewer GPU stalls thanks to the ability to stream in textures through a dedicated on-board processor rather than through the GPU) or from memory bottlenecks.
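
To put a rough number on why culling covered primitives matters, here is some back-of-the-envelope Python. The overdraw factors and per-pixel traffic are invented for illustration, not measured Fiji or Vega figures.

Code:
def frame_bandwidth_gb(pixels, bytes_per_pixel, overdraw):
    """Approximate per-frame memory traffic for shaded pixels."""
    return pixels * bytes_per_pixel * overdraw / 1e9

if __name__ == "__main__":
    pixels = 3840 * 2160        # one 4K frame
    bpp = 8                     # assumed colour + depth traffic per shaded pixel
    for overdraw in (3.0, 1.5): # before vs after culling covered work
        gb = frame_bandwidth_gb(pixels, bpp, overdraw)
        print("overdraw", overdraw, "-> about", round(gb, 2), "GB per frame")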
 
#8 ·
Quote:
Originally Posted by JackCY View Post

Yada yada da. Just launch it AMD and get it to stores.
This^

All talk and no hardware makes you zero profit.
 
#10 ·
AMD back in the enthusiast segment, in order to counter a simple GP104 that came out in May 2016?
Well, time will tell, but we all know that the bla bla bla didn't keep Polaris from being a bad arch compared to Pascal. We're still waiting for the laptop Polaris wave Lisa Su promised, too.
So I hope that we won't see the same hot-air talk and that Vega will really deliver and offer a serious alternative. We need it.
 
#11 ·
If I recall, much like the upcoming Zen release, all the nuts-and-bolts stuff on their slides is what the architecture is "capable" of, not what is going to be available and optimized in the initial release.

Basically, you won't see all this stuff on these slides working simultaneously until Big Vega (600mm² instead of the ~450mm² part being released) and Zen+ are out. Not saying this is a bad thing, mind you. When you redesign from the ground up it takes a lot of time and R&D to get it all working perfectly, so if an incomplete version can be sold to help recoup some of the losses and buy you the time to release a fully functioning part, I say go for it.
Quote:
Originally Posted by Olivon View Post

AMD back in the enthusiast segment, in order to counter a simple GP104 that came out in May 2016?
Well, time will tell, but we all know that the bla bla bla didn't keep Polaris from being a bad arch compared to Pascal. We're still waiting for the laptop Polaris wave Lisa Su promised, too.
So I hope that we won't see the same hot-air talk and that Vega will really deliver and offer a serious alternative. We need it.
What are you talking about? Polaris wasn't a bad arch at all. Its sole purpose was to gain market share by targeting the budget builds, and it was successful at doing just that.

It was especially not bad when compared to Paxwell. At least Polaris had improvements built into it in some form, unlike Pascal, which was just a die shrink with a fancy new GPU Boost layer of software thrown on top (ironically, it locks the chips down, forcing enthusiasts to hard-mod the cards to maximize performance rather than just flash a modded BIOS).

P.S. I'm running SLI TXP, so not an Nvidia hater in the least.
 
#13 ·
Quote:
Originally Posted by Mahigan View Post

The beauty is that you don't need to program for it... it is done automatically by the GPU (just like on Maxwell and newer architectures). Any primitives which are covered by another primitive are automatically culled before they even hit the memory subsystem. The savings ought to considerably boost performance by saving on cache and memory bandwidth.

On top of this.. Vega will have 512GB/s of memory bandwidth, so not only will the architecture require less memory bandwidth in order to operate but will also come equipped with 512GB/s. For some, this does not sound surprising because Fiji had 512GB/s but remember... Fiji wasted a lot of bandwidth by drawing covered primitives.
There are like two paragraphs saying you do have to change the way you program for it, so I don't understand:

"Programmers will have to pick the new way of doing things but once they do, they have much more ability than before to use features from both domains to get the job done. In theory this allows new ways of doing the work of the old shaders, presumably more efficient ones too. I wonder what could catalyze the game devs to change their ways?"
 
#14 ·
Quote:
Originally Posted by Ultracarpet View Post

There are like two paragraphs saying you do have to change the way you program for it, so I don't understand:

"Programmers will have to pick the new way of doing things but once they do, they have much more ability than before to use features from both domains to get the job done. In theory this allows new ways of doing the work of the old shaders, presumably more efficient ones too. I wonder what could catalyze the game devs to change their ways?"
Does this mean programming in OpenCL will change too? I would love to learn how.
 
#15 ·
To use primitive shaders you need to change your shaders to a more general type.

Right now we have surface, geometry, vertex and fragment, and compute shaders on PC. The primitive shader is probably an influence from console programming.

Primitive shader != primitive culling.
 
#17 ·
Quote:
Originally Posted by DNMock View Post

What are you talking about? Polaris wasn't a bad arch at all. Its sole purpose was to gain market share by targeting the budget builds, and it was successful at doing just that.
Polaris didn't gain AMD any market share.

All of AMD's market share increase was from prior to its release. They actually very slightly lost market share in Q3 2016 which is when Polaris was in the spotlight.

Fail? Maybe not, but definitely not all that impressive either.
 
#18 ·
Quote:
Originally Posted by TokenBC View Post

Polaris didn't gain AMD any market share.

All of AMD's market share increase was from prior to its release. They actually very slightly lost market share in Q3 2016 which is when Polaris was in the spotlight.

Fail? Maybe not, but definitely not all that impressive either.
How much more would they have lost without Polaris? You got info that shows Polaris didn't sell?
 
#19 ·
Quote:
Originally Posted by TokenBC View Post

Polaris didn't gain AMD any market share.

All of AMD's market share increase was from prior to its release. They actually very slightly lost market share in Q3 2016 which is when Polaris was in the spotlight.

Fail? Maybe not, but definitely not all that impressive either.
I think market share is based on how many cards are being sold. Nvidia had 3 cards while AMD had 1 card. That's 75/25, which is right where it should be.
 
#20 ·
Quote:
Originally Posted by TokenBC View Post

Polaris didn't gain AMD any market share.

All of AMD's market share increase was from prior to its release. They actually very slightly lost market share in Q3 2016 which is when Polaris was in the spotlight.

Fail? Maybe not, but definitely not all that impressive either.
Maybe it had to do with Nvidia breaking every single record in their quarterly results? You can have a stellar launch and still be overshadowed by someone doing even better.

Just think of anyone that played against Roger Federer at his prime.
 
#21 ·
Quote:
Originally Posted by TokenBC View Post

Polaris didn't gain AMD any market share.

All of AMD's market share increase was from prior to its release. They actually very slightly lost market share in Q3 2016 which is when Polaris was in the spotlight.

Fail? Maybe not, but definitely not all that impressive either.
Beating Nvidia's mindshare is a massive problem that will take time. I still meet guys in my IT department who think AMD has terrible/buggy drivers.
 
#22 ·
Quote:
Originally Posted by Robenger View Post

Beating Nvidia's mindshare is a massive problem that will take time. I still meet guys in my IT department who think AMD has terrible/buggy drivers.
IT knows what they're talking about. They have certifications in a lot of things.
 
#23 ·
Quote:
Originally Posted by bigjdubb View Post

If it requires game devs to do things differently we may not see anything from it until consoles have the same capabilities. It could mean that AMD hardware may start winning in performance on Gaming Evolved titles.
Built with that in mind.
Having all the console GPU contracts helps.
 