
[TT] NVIDIA should launch its next-gen Pascal GPUs with HBM2 in 2H 2016 - Page 67

post #661 of 724
I think the root problem here is Mahigan's implication that the higher available bandwidth of HBM allows faster system-memory-related transactions. That would come down to a back-end change on the GPU and not to HBM itself, though, since the data has to pass through the GPU on its way to system memory and the GPU handles the memory access requests. Of course, the higher bandwidth HBM allows may let this transfer be "masked" more effectively, since there is more bandwidth to spare before the VRAM bus saturates. That also makes some sense of why even small overclocks on HBM help a lot and seem to remove some of the "latency" that the low effective clock speed appears to produce.
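Rough numbers to illustrate the masking idea (every figure here is an assumption on my part, not a measurement):

Code:
# Back-of-envelope: how much VRAM-bus headroom is left over to "mask"
# spill-over traffic to system RAM. All figures are illustrative guesses.
hbm_bw_gbs    = 512.0  # Fury X: 4096-bit bus @ 500 MHz DDR = 512 GB/s
frame_use_gbs = 380.0  # hypothetical bandwidth a frame actually consumes
spill_gb      = 0.25   # hypothetical data spilled to system RAM per frame
fps           = 60

headroom = hbm_bw_gbs - frame_use_gbs  # spare GB/s on the VRAM bus
spill_bw = spill_gb * fps              # GB/s the spill traffic needs
print(f"headroom {headroom:.0f} GB/s vs spill {spill_bw:.0f} GB/s")
print("masked" if headroom >= spill_bw else "stalls likely")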


It is certainly possible AMD did make a change to the GPU to allow faster DMA and/or driver-optimized pre-transfers, or they are doing it all through software, which I guess would be less efficient.

Basically, HBM vs. GDDR5 makes no difference here; it's the GPU and/or software alone creating a performance difference when the capacity of the VRAM buffer is insufficient. That requires pre-buffering data, because fetching on the fly would incur large latency penalties. A good example of this is CallsignVega's quad-SLI benchmarks on GTX 480s back in the day (or was that 580s?), which saturated the PCI-E bus when VRAM limits were reached on PCI-E 2.0. Using PCI-E 3.0 brought performance levels back up.
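For reference, the per-frame budget math behind that PCI-E 2.0 vs 3.0 difference (effective bandwidths are approximate):

Code:
# Per-frame transfer budget over PCI-E once VRAM overflows.
# Effective x16 bandwidths are approximate (after encoding overhead).
fps = 60
for name, gbs in (("PCI-E 2.0 x16", 8.0), ("PCI-E 3.0 x16", 15.75)):
    print(f"{name}: {gbs / fps * 1024:.0f} MB per frame at {fps} fps")
# ~137 MB/frame vs ~269 MB/frame: 2.0 saturates first as overflow grows.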
Edited by STEvil - 3/1/16 at 11:13pm
post #662 of 724
They can't really do it through hardware, since most Fury users are on Intel platforms.
What I'm implying is that they're bound to the same hardware specifications, which they can't change at all without risking incompatibility issues.

So I'm pretty sure what they're doing is entirely driver and firmware optimization.
Edited by epic1337 - 3/1/16 at 11:21pm
post #663 of 724
Quote:
Originally Posted by epic1337 View Post

They can't really do it through hardware, since most Fury users are on Intel platforms.
What I'm implying is that they're bound to the same hardware specifications, which they can't change at all without risking incompatibility issues.

So I'm pretty sure what they're doing is entirely driver and firmware optimization.

Well, that's my understanding of it as well, in caveman terms, from reading Mahigan's posts. I guess my question was either stupid or difficult to answer directly, but that's what I'm getting from reading these posts.

From what I understand, it's reasonable to assume the Furies are going to be more limited by VRAM going forward, since it's being done in software, particularly when AMD no longer has as much time to spend optimizing game by game for them.

I do hope that's not the case; it could end up looking like AMD's own Kepler, except maybe worse than 5 or 10% lost. However, if Polaris beats Pascal I'm sure it will be forgiven fairly quickly. I hope that's the case for the industry's sake.
post #664 of 724
You could have the driver or GPU just scan VRAM and monitor usage: lower-use, smaller items go to system RAM, while larger items like textures stay in VRAM. To a degree this is already done, but typically it is not handled by the driver or GPU.
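Conceptually it could look something like this toy sketch (just the small-and-cold-first idea; a real driver tracks far more state than this):

Code:
from dataclasses import dataclass

@dataclass
class Alloc:
    name: str
    size_mb: int
    uses: int  # access count over a sliding window of frames

def demote_candidates(allocs, need_mb):
    # Demote small, rarely-touched allocations to system RAM first;
    # big hot textures stay in VRAM.
    victims, freed = [], 0
    for a in sorted(allocs, key=lambda a: (a.uses, a.size_mb)):
        if freed >= need_mb:
            break
        victims.append(a)
        freed += a.size_mb
    return victims

allocs = [Alloc("shadow_atlas", 512, 60), Alloc("ui_font", 4, 2),
          Alloc("old_lightmap", 64, 0), Alloc("env_probe", 32, 1)]
print([a.name for a in demote_candidates(allocs, 90)])
# -> ['old_lightmap', 'env_probe']: coldest items spill to system RAM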

Also, as I noted before, this could be done partly in hardware. Changing the GPU to let it do more VRAM prediction wouldn't hurt compatibility at all. Maybe this is something Polaris will use in hardware, with Fiji as a test mule (it was just that, for HBM).
post #665 of 724
Quote:
Originally Posted by Slink3Slyde View Post

Well, that's my understanding of it as well, in caveman terms, from reading Mahigan's posts. I guess my question was either stupid or difficult to answer directly, but that's what I'm getting from reading these posts.

From what I understand, it's reasonable to assume the Furies are going to be more limited by VRAM going forward, since it's being done in software, particularly when AMD no longer has as much time to spend optimizing game by game for them.

I do hope that's not the case; it could end up looking like AMD's own Kepler, except maybe worse than 5 or 10% lost. However, if Polaris beats Pascal I'm sure it will be forgiven fairly quickly. I hope that's the case for the industry's sake.

It might actually be the opposite; that is to say, there's potentially more room for improvement.
Fury cards are limited by "something", and it's not just the VRAM capacity limitation; even when VRAM isn't saturated, they still don't perform as fast as expected.
So on that note, the Fury cards could in the long run turn out to be among the better cards that won't simply be displaced by newer ones, and existing Fury users wouldn't need to upgrade anytime soon.
Quote:
Originally Posted by STEvil View Post

Also, as I noted before, this could be done partly in hardware. Changing the GPU to let it do more VRAM prediction wouldn't hurt compatibility at all. Maybe this is something Polaris will use in hardware, with Fiji as a test mule (it was just that, for HBM).

I do wonder about that, whether it's possible to make the VRAM an LL cache, that is to say, using the entire VRAM as a last-level cache with cache prediction.
What I'm implying is that it would be very similar to what Intel did with Broadwell-C, which uses its eDRAM as an LL cache, but in this case AMD would use HBM.
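Very crudely, the concept would be something like this (direct-mapped and page-granular for simplicity; purely hypothetical, nothing confirmed for any real GPU):

Code:
PAGE = 64 * 1024                  # 64 KB cache line / page
VRAM_PAGES = 4 * 1024**3 // PAGE  # 4 GB of HBM acting as the cache

tags = [None] * VRAM_PAGES        # which system-RAM page each slot holds
hits = misses = 0

def access(addr):
    global hits, misses
    page = addr // PAGE
    slot = page % VRAM_PAGES      # direct-mapped placement
    if tags[slot] == page:
        hits += 1                 # served at HBM speed
    else:
        misses += 1               # fetched over PCI-E, then cached
        tags[slot] = page

for addr in (0, PAGE, 0, 5 * PAGE, 0):
    access(addr)
print(f"hits={hits} misses={misses}")  # hits=2 misses=3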
Edited by epic1337 - 3/2/16 at 12:51am
post #666 of 724
Quote:
Originally Posted by STEvil View Post

You could have the driver or GPU just scan VRAM and monitor usage: lower-use, smaller items go to system RAM, while larger items like textures stay in VRAM. To a degree this is already done, but typically it is not handled by the driver or GPU.

Also, as I noted before, this could be done partly in hardware. Changing the GPU to let it do more VRAM prediction wouldn't hurt compatibility at all. Maybe this is something Polaris will use in hardware, with Fiji as a test mule (it was just that, for HBM).

If it was mostly in hardware, it would seem to me to be somewhat redundant for Polaris, considering it's probably going to have 8GB+ of RAM?
Quote:
Originally Posted by epic1337 View Post

It might actually be the opposite; that is to say, there's potentially more room for improvement.
Fury cards are limited by "something", and it's not just the VRAM capacity limitation; even when VRAM isn't saturated, they still don't perform as fast as expected.
So on that note, the Fury cards could in the long run turn out to be among the better cards that won't simply be displaced by newer ones, and existing Fury users wouldn't need to upgrade anytime soon.

I had thought that was speculated to be down to the pixel fill rate/ROP count not increasing over Hawaii?
post #667 of 724
Quote:
Originally Posted by Slink3Slyde View Post

I had thought that was speculated to be down to the pixel fill rate/ROP count not increasing over Hawaii?
There are numerous hypotheses, actually:
*HBM's low clock causing high access latencies
*insufficient front-end throughput / ROPs unable to keep up
*driver issues

Quote:
Originally Posted by Slink3Slyde View Post

If it was mostly in hardware, it would seem to me to be somewhat redundant for Polaris, considering it's probably going to have 8GB+ of RAM?
Not quite; they would still hold value for extreme scenarios where 8GB of VRAM is insufficient, e.g. poorly made games.
Plus they could make lesser HBM cards with only 4GB of HBM, placed at the same price point as the R9-390X, e.g. $450; it would probably be a rebranded Fury Nano.

Or even a single-HBM2-chip card (2GB) with a low-power die, something like a 14nm FinFET [ 120mm² 1536:128:32 ] with a 1024-bit bus and 1x 2GB HBM2 stack (256GB/s).
Imagine a single-slot card that doesn't require auxiliary power (75W), priced at $150, while performing quite close to the R9-380.
Not only would this be popular in the entry segment of desktops; that low-power, high-performance combination would be highly sought after in laptops.
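The 256GB/s figure checks out against the HBM2 spec numbers:

Code:
# One HBM2 stack on a 1024-bit bus; HBM2 tops out around 2 Gb/s per pin.
bus_bits     = 1024
pin_rate_gbs = 2.0  # Gb/s per pin
print(f"{bus_bits * pin_rate_gbs / 8:.0f} GB/s")  # 256 GB/s, as above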
Edited by epic1337 - 3/2/16 at 1:46am
post #668 of 724
Quote:
Originally Posted by epic1337 View Post

There are numerous hypotheses, actually:
*HBM's low clock causing high access latencies
*insufficient front-end throughput / ROPs unable to keep up
*driver issues
Not quite; they would still hold value for extreme scenarios where 8GB of VRAM is insufficient, e.g. poorly made games.
Plus they could make lesser HBM cards with only 4GB of HBM, placed at the same price point as the R9-390X, e.g. $450; it would probably be a rebranded Fury Nano.

Or even a single-HBM-chip card (2GB) with a low-power die; imagine a single-slot card that doesn't require auxiliary power (75W), priced at $150, while performing quite close to the R9-380.
Not only would this be popular in the entry segment of desktops; that low-power, high-performance combination would be highly sought after in laptops.

I see. I have seen benchmarks where overclocking the HBM seems to give a much bigger performance benefit than you would imagine given the already high bus width, so that makes sense to me. The high CPU overhead of the DX11 drivers looks to be something that will never be fully resolved, but it will hopefully become irrelevant once DX12/Vulkan take over. Unfortunately DX11 is with us for a while longer at least. Also, given the possible concerns over MS's shady-seeming DX12 practices, DX11 might even hang on longer if there's a backlash of some kind.
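The raw bandwidth math shows why even a small HBM bump matters so much (Fury X figures; the overclock value is just an example):

Code:
# On a 4096-bit bus, every extra MHz is a lot of absolute bandwidth.
bus_bits = 4096
for mhz in (500, 545):  # stock vs a modest ~9% overclock (example value)
    gbs = bus_bits * (mhz * 2) / 8 / 1000  # DDR: two transfers per clock
    print(f"{mhz} MHz -> {gbs:.0f} GB/s")
# 500 -> 512 GB/s, 545 -> 558 GB/s: +46 GB/s from a 45 MHz bump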


I feel they would go with 8GB of GDDR5X for the upper-mid/mid-range cards, and with the consoles staying at the same level for a few more years, I can't see 8GB of VRAM becoming insufficient any time soon. I don't think it will be feasible for them to go with HBM on lower-end desktop cards because of the cost? What has happened over the last couple of generations makes a Nano rebrand look like a possibility, but that's going to be a high-production-cost, binned chip at that die size for a mid-range-priced card, which makes it feel unlikely to me.

The low-power laptop chip would be interesting and is surely in the works, but I'm sure both sides have plans for that, and currently Nvidia is doing quite well without the power savings of HBM. So the Polaris architecture itself would need to be better per watt than Pascal for this to succeed, given AMD's already non-existent mobile dGPU market share.

Of course, a lot of this is my (relatively speaking, compared to some here) low-tech opinion. :)
post #669 of 724
Quote:
Originally Posted by Slink3Slyde View Post

I feel they would go with 8GB of GDDR5X for the upper-mid/mid-range cards, and with the consoles staying at the same level for a few more years, I can't see 8GB of VRAM becoming insufficient any time soon. I don't think it will be feasible for them to go with HBM on lower-end desktop cards because of the cost? What has happened over the last couple of generations makes a Nano rebrand look like a possibility, but that's going to be a high-production-cost, binned chip at that die size for a mid-range-priced card, which makes it feel unlikely to me.

The low-power laptop chip would be interesting and is surely in the works, but I'm sure both sides have plans for that, and currently Nvidia is doing quite well without the power savings of HBM. So the Polaris architecture itself would need to be better per watt than Pascal for this to succeed, given AMD's already non-existent mobile dGPU market share.

Of course, a lot of this is my (relatively speaking, compared to some here) low-tech opinion. :)

The reason Fiji chips (Fury X / Fury Nano / Fury) are expensive is that Fiji is a very large die (596mm²); take note that the expense is two-fold due to the interposer it uses.
Not to mention the manufacturing maturity of HBM and possible yield issues with the Fiji die itself; after HBM2 comes out they should be able to release the same Fiji chips at a much more reasonable price.



Now, as for the low-power chip: if they can make a smaller die using a combination of a slimmed-down core config and 14nm FinFET, then the cost would be dramatically lower.
E.g. if 14nm FinFET incurs a 1.5x production cost but the die shrinks to 120mm² (120mm² / 596mm² = 0.201), then it would be roughly $196 (0.201 × 1.5 × $649 = $196).
And that's not factoring in the size reduction of the interposer, which by itself would lessen the cost by a big chunk.
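Spelling that estimate out (the 1.5x multiplier and the $649 baseline are assumptions on my part, and launch price is only a rough proxy for cost):

Code:
fury_x_price = 649.0  # $ launch price, used as a rough cost proxy
old_die_mm2  = 596.0  # Fiji at 28nm
new_die_mm2  = 120.0  # hypothetical 14nm FinFET die
cost_factor  = 1.5    # assumed 14nm cost premium
print(f"~${new_die_mm2 / old_die_mm2 * cost_factor * fury_x_price:.0f}")
# -> ~$196, matching the figure above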

As for power consumption: considering desktop Tonga is already 190W at 28nm, we can expect an HBM counterpart, on top of a 14nm FinFET die shrink, to be pushed well below 100W.
Even the laptop version of Tonga has an assumed 120W TDP, which means there's potential to get a Fiji die down to a drastically reduced power consumption.
Binning the two would be something like "more efficient = laptop / less efficient = desktop"; I can imagine them binning those ultra-low-power HBM cards down to 40~50W, which would be awesome.



On a side note, I forgot this thread was about Pascal.
So, speaking of Pascal, the same thing can be done with it: an ultra-low-power Pascal HBM card. :)
Edited by epic1337 - 3/2/16 at 2:19am
post #670 of 724
Quote:
Originally Posted by epic1337 View Post

The reason Fiji chips (Fury X / Fury Nano / Fury) are expensive is that Fiji is a very large die (596mm²); take note that the expense is two-fold due to the interposer it uses.
Not to mention the manufacturing maturity of HBM and possible yield issues with the Fiji die itself; after HBM2 comes out they should be able to release the same Fiji chips at a much more reasonable price.

I understand that a lot of the cost is due to the large die and the new HBM tech; that's why I was thinking a direct rebrand unlikely this time around, even given the maturing process, and also the possible introduction of GDDR5X at lower cost than HBM? Could be wrong.
Quote:
Originally Posted by epic1337 View Post

Now, as for the low-power chip: if they can make a smaller die using a combination of a slimmed-down core config and 14nm FinFET, then the cost would be dramatically lower.
E.g. if 14nm FinFET incurs a 1.5x production cost but the die shrinks to 120mm² (120mm² / 596mm² = 0.201), then it would be roughly $196 (0.201 × 1.5 × $649 = $196).
And that's not factoring in the size reduction of the interposer, which by itself would lessen the cost by a big chunk.

O.K., but Nvidia will benefit in exactly the same way from the shrink (Edit: as you said), and Pascal should be even more efficient than Maxwell.
Quote:
Originally Posted by epic1337 View Post

As for power consumption: considering desktop Tonga is already 190W at 28nm, we can expect an HBM counterpart, on top of a 14nm FinFET die shrink, to be pushed well below 100W.
Even the laptop version of Tonga has an assumed 120W TDP, which means there's potential to get a Fiji die down to a drastically reduced power consumption.

Binning the two would be something like "more efficient = laptop / less efficient = desktop"; I can imagine them binning those ultra-low-power HBM cards down to 40~50W, which would be awesome.

Tonga is no doubt more efficient than Tahiti, but they don't yet match Nvidia. Anand has the 380X drawing the same power as the superior-performing 970, for example, and from what I've seen the 960 uses quite a bit less than the 380/285 at similar performance.

Things can change dramatically in one generation though, as we've seen before. Exciting year ahead all round. :thumb: