Overclock.net › Forums › Industry News › Rumors and Unconfirmed Articles › [VC]GTX 1060 specifications leaked - faster than RX 480

[VC]GTX 1060 specifications leaked - faster than RX 480 - Page 58  

post #571 of 735


GTX 1060 iChill Gaming OC X2

Base clock: 1784 MHz
Boost clock: 1860 MHz

Memory: 8200 MHz (effective)

Wow, this thing should break 2 GHz out of the box with a custom cooler.
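A quick check of that expectation, using only the clocks from the leak: the rated boost clock sits about 7.5% below 2 GHz. This is a minimal sketch. How far a card actually boosts past its rated clock depends on cooling and on the individual chip, which the leak does not cover.

```python
# Clocks from the leaked GTX 1060 iChill Gaming OC X2 spec (MHz).
BASE_MHZ = 1784
BOOST_MHZ = 1860
TARGET_MHZ = 2000  # the "2 GHz out of the box" threshold from the post


def headroom_pct(from_mhz: float, to_mhz: float) -> float:
    """Percentage clock increase needed to go from from_mhz to to_mhz."""
    return (to_mhz / from_mhz - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Needed above rated boost: {headroom_pct(BOOST_MHZ, TARGET_MHZ):.1f}%")  # 7.5%
    print(f"Needed above base clock:  {headroom_pct(BASE_MHZ, TARGET_MHZ):.1f}%")   # 12.1%
```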
post #572 of 735
Quote:
Originally Posted by EightDee8D View Post

Compare the 280X (2011, 28nm GPU) vs the 480 (2016, 14nm GPU): there's barely any IPC increase, and that's with 4 revisions. IMO that's the most pathetic thing about GCN.

Why don't you compare the bus width and bandwidth between the two uarchs?
(shaders:TMUs:ROPs | bus width @ bandwidth = relative performance)
280X (Tahiti) = 2048:128:32 | 384-bit @ 288 GB/s = 100% perf
380X (Tonga) = 2048:128:32 | 256-bit @ 182.4 GB/s = 100% perf
480 (Polaris) = 2304:144:32 | 256-bit @ 256 GB/s = ~110% perf

You see the point here? Tonga has a much narrower bus, yet performs roughly the same as Tahiti.
This is because of multiple design changes: one is the introduction of delta color compression, the other is ROP optimizations.
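The bandwidth column can be rebuilt from bus width and memory data rate, which also makes the "performance per GB/s" point explicit. A minimal sketch: the per-pin data rates (6.0, 5.7 and 8.0 Gbps) are not stated in the post and are assumed here as the reference memory speeds that reproduce its GB/s figures, and the performance percentages are the post's rough estimates.

```python
# (bus width in bits, assumed per-pin data rate in Gbps, relative perf in %)
# Bus widths and perf come from the post; the data rates are an assumption
# (reference memory speeds) chosen because they reproduce the quoted GB/s.
CARDS = {
    "280X (Tahiti)": (384, 6.0, 100),
    "380X (Tonga)": (256, 5.7, 100),
    "480 (Polaris)": (256, 8.0, 110),
}


def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: bus width x per-pin rate / 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


def perf_per_gbs(perf_pct: float, bw_gbs: float) -> float:
    """Relative performance delivered per GB/s of memory bandwidth."""
    return perf_pct / bw_gbs


if __name__ == "__main__":
    bus, rate, perf = CARDS["280X (Tahiti)"]
    tahiti = perf_per_gbs(perf, bandwidth_gbs(bus, rate))
    for name, (bus, rate, perf) in CARDS.items():
        bw = bandwidth_gbs(bus, rate)
        rel = perf_per_gbs(perf, bw) / tahiti  # normalised to Tahiti = 1.00
        print(f"{name:14s} {bw:6.1f} GB/s  perf per GB/s: {rel:.2f}x Tahiti")
```

Tonga comes out at roughly 1.58x Tahiti's performance per GB/s, which is the efficiency gain attributed here to compression and ROP changes; the model itself cannot say how much each contributes.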

You're too biased toward raw CU throughput and forget that the front end also needs optimization.
With front-end optimizations, they can fit more CUs and thus increase card performance without dramatically increasing die size.
Edited by epic1337 - 7/7/16 at 10:45pm
post #573 of 735
Quote:
Originally Posted by epic1337 View Post

Why don't you compare the bus width and bandwidth between the two uarchs? [...]
You're too biased toward raw CU throughput and forget that the front end also needs optimization.

Run Tahiti at 1266 MHz and I guarantee it will be just 10% behind the 480, because the 480 has ~12% more CUs (36 vs 32). Five years, and that's what they have achieved? Nobody cares if you can achieve the same performance with a 32-bit bus and 4 ROPs, because you still need more CUs. Where the hell is the progression in perf/CU/MHz? You know they can't just keep adding CUs, right?
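The argument here rests on a linear scaling model: if perf/CU/MHz had not changed, a Tahiti at the RX 480's 1266 MHz boost clock would land about 11% behind, purely from having 32 CUs to Polaris's 36. A minimal sketch of that model; it deliberately ignores bandwidth, ROPs and the front end, which is exactly what the replies dispute.

```python
# Simplified model (an assumption, not a measurement): performance scales
# linearly with CU count x clock, i.e. identical perf/CU/MHz on both chips.
TAHITI_CUS = 32
POLARIS_CUS = 36
CLOCK_MHZ = 1266  # RX 480 boost clock, the clock proposed for Tahiti


def relative_perf(cus_a: int, mhz_a: float, cus_b: int, mhz_b: float) -> float:
    """Performance of GPU A as a fraction of GPU B under the linear model."""
    return (cus_a * mhz_a) / (cus_b * mhz_b)


if __name__ == "__main__":
    r = relative_perf(TAHITI_CUS, CLOCK_MHZ, POLARIS_CUS, CLOCK_MHZ)
    print(f"Tahiti @ {CLOCK_MHZ} MHz: {r:.1%} of RX 480, {1 - r:.1%} behind")  # 88.9%, 11.1%
    print(f"RX 480 CU advantage: {POLARIS_CUS / TAHITI_CUS - 1:.1%}")          # 12.5%
```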

Move to the 480 thread if you want to continue this; it's way off-topic at the moment.
post #574 of 735
Quote:
Originally Posted by EightDee8D View Post

[...] Where the hell is the progression in perf/CU/MHz? You know they can't just keep adding CUs, right?

The progress is in performance per die cost. Are you so short-sighted as not to see this as progress?

Furthermore, as I've mentioned, you're biased toward raw CU throughput. Previous GCN could only handle 32 CUs per 32 ROPs, while Polaris can now handle 36 CUs per 32 ROPs.
Or, to put it literally so you can understand: the ROPs gained a roughly 12% increase in IPC (36 vs 32 CUs fed per 32 ROPs). Oh hey, that's an increase in IPC, imagine that surprise.
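Taking the poster's CU-per-ROP figures at face value, the ratio change works out as follows, and it also shows where a hypothetical 36-ROP configuration would come from under the old ratio. A sketch under that assumption only; real ROP utilisation depends on the workload.

```python
import math

# CUs that a group of 32 ROPs can keep fed, as claimed in the post
# (the poster's figures, not vendor-documented numbers).
OLD_GCN_CUS_PER_32_ROPS = 32
POLARIS_CUS_PER_32_ROPS = 36


def rops_needed(cus: int, cus_per_32_rops: int) -> int:
    """ROPs required to keep `cus` compute units fed at a given CU:ROP ratio."""
    return math.ceil(cus * 32 / cus_per_32_rops)


if __name__ == "__main__":
    gain = POLARIS_CUS_PER_32_ROPS / OLD_GCN_CUS_PER_32_ROPS - 1
    print(f"CUs fed per ROP: +{gain:.1%}")                                  # +12.5%
    print("36 CUs, old ratio:", rops_needed(36, OLD_GCN_CUS_PER_32_ROPS))   # 36
    print("36 CUs, Polaris:  ", rops_needed(36, POLARIS_CUS_PER_32_ROPS))   # 32
```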

So overall, instead of making a die that's ~20% larger using a 2304:144:36 | 320-bit configuration and gaining only ~10% in performance,
they've managed to make a die that's only ~10% larger using a 2304:144:32 | 256-bit configuration and gaining the same ~10% in performance.
That's an improvement in the uarch itself, as the die will cost less for the same increase in performance.
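The die-cost argument reduces to performance per unit of area. Normalising an older GCN die to 1.0 (an assumed baseline; the ~20% and ~10% area figures are the post's rough estimates), the two options compare like this:

```python
# Relative (die area, performance) vs. an older GCN die = (1.0, 1.0).
# Both the baseline and the area figures are rough estimates from the post;
# this ignores wafer price, yield and power.
OPTIONS = {
    "2304:144:36 | 320-bit (brute force)": (1.20, 1.10),
    "2304:144:32 | 256-bit (Polaris)": (1.10, 1.10),
}


def perf_per_area(area: float, perf: float) -> float:
    """Performance per unit of die area, relative to the baseline die."""
    return perf / area


if __name__ == "__main__":
    for name, (area, perf) in OPTIONS.items():
        print(f"{name:36s} perf/area = {perf_per_area(area, perf):.3f}")
```

Same ~10% performance gain, but roughly 9% better performance per unit of area for the Polaris-style layout (1.000 vs 0.917), which is the cost point being made.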
Edited by epic1337 - 7/7/16 at 11:19pm
post #575 of 735
Quote:
Originally Posted by gamervivek View Post


Considering 32 ROPs on Ellesmere are matching 64 on Hawaii, this bodes fairly well for Vega, unless of course the bottleneck was elsewhere.

Since they are nearly doubling the shader count again, they are going to run into similar bottlenecks with Vega if Vega only gets 64 ROPs.

It would have been better if Polaris had started with 64 ROPs and Vega got 128.
Quote:
Originally Posted by epic1337 View Post

[...] You see the point here? Tonga has a much narrower bus, yet performs roughly the same as Tahiti.
This is because of multiple design changes: one is the introduction of delta color compression, the other is ROP optimizations. [...]

If we're going to fawn over those improvements, what are we going to say about Pascal? It has 50% more performance than the RX 480 with the same bandwidth as Polaris, and this is with 25% of its shaders disabled.

The problem with marveling at these accomplishments is that Tonga didn't actually shrink the die or decrease the number of transistors vs Tahiti/Hawaii. Tonga is actually a bit bigger and has about 16% more transistors than Tahiti (5.0 vs 4.3 billion), yet performs maybe 5% better overall. I would be more impressed if the changes in Tonga, and similarly in Polaris (excluding gains from the node shrink), had decreased the transistor count for the same performance, or increased performance per transistor or per mm² of die. This is what happened in the transitions from Kepler to Maxwell and from Maxwell to Pascal, and this is what AMD needs to do to catch up.

The reason AMD has been letting so many people down is that their changes to GCN are not increasing performance per transistor or per unit of die area; that comes with good architectural changes. If their changes actually increased IPC, then for the number of transistors the RX 480 has, it should be 20% faster than it is. Hawaii has double precision, which occupies a lot of transistors, is on 28nm, and still has better overall performance than the RX 480 with only ~9% more transistors.

All these changes are just not getting the job done. For a pure gaming part with no double precision (unlike Hawaii), on a superior node, Polaris's 5.7 billion transistors should outperform Hawaii's 6.2 billion.

Pascal was able to beat the Titan X by 30% using 10% fewer transistors.

The RX 480 loses to the 390X by 10% despite giving double precision the boot and using ~8% fewer transistors, and AMD was the one that made the bigger architectural changes.
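These transistor comparisons are easiest to read as performance per transistor. A minimal sketch using the post's figures; GM200's 8.0 billion transistors is added here (the post only says "10% fewer"), and "loses by 10%" is read as RX 480 = 0.90x the 390X. Note that "X% more" and "X% fewer" are not symmetric: Hawaii has ~9% more transistors than Polaris 10, while Polaris 10 has ~8% fewer than Hawaii.

```python
# Transistor counts (billions) and relative performance as used in the post.
# GM200's 8.0 billion is an added figure; "loses to the 390X by 10%" is
# interpreted as RX 480 = 0.90x the 390X.


def pct_change(new: float, old: float) -> float:
    """Percentage change going from `old` to `new`."""
    return (new / old - 1.0) * 100.0


def perf_per_transistor_ratio(perf_ratio: float, t_new: float, t_old: float) -> float:
    """How much performance per transistor changed, new chip vs old chip."""
    return perf_ratio / (t_new / t_old)


if __name__ == "__main__":
    pascal = perf_per_transistor_ratio(1.30, 7.2, 8.0)    # GP104 vs GM200
    polaris = perf_per_transistor_ratio(0.90, 5.7, 6.2)   # Polaris 10 vs Hawaii
    print(f"GP104 vs GM200 perf/transistor:       {pascal:.2f}x")
    print(f"Polaris 10 vs Hawaii perf/transistor: {polaris:.2f}x")
    # The same 5.7B vs 6.2B gap reads differently depending on the base:
    print(f"Hawaii has {pct_change(6.2, 5.7):.1f}% more transistors")
    print(f"Polaris 10 has {-pct_change(5.7, 6.2):.1f}% fewer transistors")
```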


This is the most disappointing thing: AMD should have gained more from the architectural changes.

What we have is AMD shuffling its transistors around to beef some things up while making other things weaker, which, for all their changes, adds up to the same performance. That is a waste of R&D. What we need from GCN, if they are to keep using it, are changes that increase performance across the board, not weakening some areas while others improve so that overall performance stays the same.

We have known about the 232 mm² part for a while now, but when you take the lowest performance expectation anyone on this forum had and combine it with worse power consumption than anyone predicted, you get a mostly disappointing feeling about Polaris as an architecture. Anyone saying otherwise is lowering the bar for their favorite company.
Edited by tajoh111 - 7/7/16 at 11:52pm
post #576 of 735
Quote:
Originally Posted by tajoh111 View Post

Since they are nearly doubling the shader count again, they are going to run into similar bottlenecks with Vega if Vega only gets 64 ROPs.

It would have been better if Polaris had started with 64 ROPs and Vega got 128.

It won't be as drastic as before, as each ROP can sustain more CUs.
Quote:
Originally Posted by tajoh111 View Post

[...] Hawaii has double precision, which occupies a lot of transistors, is on 28nm, and still has better overall performance than the RX 480 with only ~9% more transistors.
[...]
Pascal was able to beat the Titan X by 30% using 10% fewer transistors. [...]

The "perf gain" is mostly attributable to the difference in clock speed, although it can be argued otherwise, as Pascal uses somewhat fewer transistors.

Pascal GP104 doesn't have much double-precision throughput: it has only a handful of DP units, so FP64 runs at 1/32 of its FP32 rate.
By comparison, GP100 (Tesla P100) has 40% more CUDA cores but 15.3 billion transistors, largely because it carries far more DP units (FP64 at 1/2 of its FP32 rate).
GP106 = 1280:80:48 @ 1708 MHz | ??? mm² @ ??? transistors = 0.14 TFLOPS double precision
GP104 = 2560:160:64 @ 1733 MHz | 314 mm² @ 7.2 billion transistors = 0.28 TFLOPS double precision
GP100 = 3584:???:?? @ 1480 MHz | 610 mm² @ 15.3 billion transistors = 5.30 TFLOPS double precision
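The double-precision column follows from peak FP32 throughput times each chip's FP64:FP32 rate. A sketch assuming NVIDIA's published rates (1/32 on GP106 and GP104, 1/2 on GP100) and the clocks listed above; it reproduces the 0.14 and 0.28 TFLOPS figures and gives about 5.3 TFLOPS for GP100.

```python
# Peak TFLOPS = 2 FLOPs per core per clock (fused multiply-add) x cores x clock.
# FP64:FP32 rates are NVIDIA's published ratios, not figures from the post.
CHIPS = {
    "GP106": (1280, 1708, 1 / 32),
    "GP104": (2560, 1733, 1 / 32),
    "GP100": (3584, 1480, 1 / 2),
}


def fp32_tflops(cores: int, mhz: float) -> float:
    """Peak FP32 TFLOPS: cores x MHz x 2 FLOPs, converted from MFLOPS to TFLOPS."""
    return 2 * cores * mhz / 1e6


def fp64_tflops(cores: int, mhz: float, fp64_rate: float) -> float:
    """Peak FP64 TFLOPS at the given FP64:FP32 rate."""
    return fp32_tflops(cores, mhz) * fp64_rate


if __name__ == "__main__":
    for name, (cores, mhz, rate) in CHIPS.items():
        print(f"{name}: FP32 {fp32_tflops(cores, mhz):5.2f} TFLOPS, "
              f"FP64 {fp64_tflops(cores, mhz, rate):4.2f} TFLOPS")
```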


I can't say much about transistor count, as there's a notable difference between FinFET and planar.
On a side note, the reason Tonga has such a high transistor count is probably that it has a disabled 128-bit portion of its memory bus, for whatever reason.
Edited by epic1337 - 7/8/16 at 12:29am
post #577 of 735
Quote:
Originally Posted by tajoh111 View Post

[...] Pascal was able to beat the Titan X by 30% using 10% fewer transistors. [...]

Yeah, a 1900 MHz Pascal beating a 1000 MHz Titan X by 30% is clearly an architectural win :D Well, it's still an improvement, and I agree AMD needed a bit more gain from Polaris, especially higher clocks.
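The sarcasm can be made concrete by normalising for clock speed. With these round numbers (an assumption; real boost clocks vary from card to card) plus the two chips' shader counts (2560 vs 3072, added here), the 1080 does less work per clock, and per shader per clock, than the Titan X:

```python
# Figures from the post: ~1900 MHz Pascal (GTX 1080) vs ~1000 MHz Titan X,
# with the 1080 about 30% faster. Shader counts (2560 vs 3072) are added here.
def perf_per_clock(perf_ratio: float, mhz_new: float, mhz_old: float) -> float:
    """Relative work done per clock cycle, new GPU vs old GPU."""
    return perf_ratio / (mhz_new / mhz_old)


def perf_per_core_clock(perf_ratio: float, mhz_new: float, mhz_old: float,
                        cores_new: int, cores_old: int) -> float:
    """Relative work per shader per clock, new GPU vs old GPU."""
    return perf_per_clock(perf_ratio, mhz_new, mhz_old) / (cores_new / cores_old)


if __name__ == "__main__":
    print(f"per clock:        {perf_per_clock(1.30, 1900, 1000):.2f}x Titan X")
    print(f"per core & clock: {perf_per_core_clock(1.30, 1900, 1000, 2560, 3072):.2f}x Titan X")
```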
Edited by Catscratch - 7/8/16 at 12:05am
Intel Evilnow (18 items)
CPU: i5 2500k 4ghz @ Offset -0.015
Motherboard: Asus P8P67 Evo (bios 3207)
Graphics: Sapphire 280x Tri-x 3GB OC (Stock 1020/1500 Non...
RAM: G.Skill RipjawsX 2x4gb 1866mhz 9-10-9-28-2n @ 1.5v
Hard Drive: SHSS37A120G
Hard Drive: WD5000AAKX-001CA0
Hard Drive: WD20EARX
Hard Drive: WD20EZRZ
Hard Drive: WD5001AALS-00L3B2 (Now External)
Optical Drive: ASUS DRW-1814BLT
Cooling: Noctua NH-u12p SE2
OS: Windows 10 Pro
Monitor: Asus VH242H Wobbly Stand :)
Keyboard: Microsoft Ergo 4000
Power: Enermax Infiniti 650 (28a,28a,30a)
Case: Cooler Master haf 912 Advanced
Mouse: A4tech x7 F3
Other: Sunbeam RHK-EX-BA Rheobus-Extreme Fan Controlle...

CPU: Phenom II x6 1090t BE 3.6/4.0 Turbo@def.volt
Motherboard: MSI K9A2 Platinum v1
Graphics: Sapphire HD6850 1GB 850/1100@def.volt
RAM: Kingston 2x2gb Hyperx 1066 5-5-5-15
Hard Drive: Western Digital WD5001AALS
Hard Drive: Seagate Barracuda ST3250410AS
Optical Drive: Asus DRW-1814BLT
OS: Windows 7 Ultimate x64 SP1
Monitor: Asus VH242H 23.6" Wobbly Stand :D
Keyboard: Microsoft Ergo 4000
Power: Enermax Infiniti 650w (28a,28a,30a)
Case: Thermaltake Kandalf SuperTower
Mouse: A4 tech Swop-3
post #578 of 735
Quote:
Originally Posted by Catscratch View Post

Yeah, a 1900 MHz Pascal beating a 1000 MHz Titan X by 30% is clearly an architectural win :D Well, it's still an improvement, and I agree AMD needed a bit more gain from Polaris, especially higher clocks.

If that didn't come with an increase in power consumption or die size, which would actually cost Nvidia something, then it is.

As far as engineering goes, there are two concerns, cost and performance; power consumption adds to the cost, as does die size.

If the cards were clocked at 10000 MHz with no increase in power consumption or die size and gained 40% more performance, the clock speed itself wouldn't matter in engineering terms. The cards would still cost the same for that level of performance, and that is what matters most to the company. Where Maxwell and Pascal triumph is that they can clock so high without the wattage climbing much. Add in that they have fewer cores than AMD, and that's part of the reason they need high frequencies to compete with AMD.

In addition, Pascal really runs at about 1750 MHz, while the Titan X is clocked at 1130 MHz and has 20% more cores, which is mostly why there is only a 30% difference in performance. Also, Pascal at the top end is heavily bandwidth-limited, as shown by the performance gap between the 1070 and the 1080 being smaller than their difference in shader throughput.
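Both claims in this post can be checked with a simple cores x clock model. A sketch, not a benchmark: with the clocks given here, the 1080 has about 29% more raw shader throughput than the Titan X, close to the quoted 30% performance gap. For the 1070/1080 comparison, the reference boost clocks (1683 and 1733 MHz) and bandwidths (256 and 320 GB/s) are assumptions added from NVIDIA's reference specs: the 1080 gets about 37% more shader throughput but only 25% more bandwidth, consistent with the bandwidth-limited argument.

```python
# Simple shader-throughput model: cores x clock (MHz). Clocks for the Titan X
# comparison are the post's; the GTX 1070/1080 boost clocks and bandwidth
# figures are NVIDIA reference specs added here as assumptions.
def shader_throughput(cores: int, mhz: float) -> float:
    """Relative shader throughput (cores x MHz); only meaningful as a ratio."""
    return cores * mhz


if __name__ == "__main__":
    gtx1080_vs_titanx = shader_throughput(2560, 1750) / shader_throughput(3072, 1130)
    print(f"GTX 1080 vs Titan X shader throughput: {gtx1080_vs_titanx:.2f}x")  # 1.29x

    compute_gap = shader_throughput(2560, 1733) / shader_throughput(1920, 1683)
    bandwidth_gap = 320 / 256  # GB/s: GDDR5X at 10 Gbps vs GDDR5 at 8 Gbps, both 256-bit
    print(f"GTX 1080 vs 1070: {compute_gap:.2f}x shader throughput, "
          f"{bandwidth_gap:.2f}x bandwidth")
```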
post #579 of 735
Quote:
Originally Posted by epic1337 View Post

They don't actually need to abandon GCN; they'll just have to revise it a bit more than they have so far.
They've made multiple changes so far to make each CU more efficient in bandwidth usage, and they've made quite a bit of progress.

The last and most game-changing step would be pushing for 16-bit compute (half-precision) support, the same as what NV is currently doing.
......

Actually, there is a bit of a misconception about FP16.

AMD RX Polaris does support native FP16, while Nvidia GTX Pascal doesn't (fast FP16 is reserved for Tesla cards), and this has been discussed at great length at Beyond3D.

If I'm not mistaken, they managed to test it and found that GTX Pascal can execute FP16, but nowhere near fast enough to matter.
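To put "nowhere near fast enough to matter" in numbers: peak FP16 throughput is peak FP32 times the FP16 rate. A sketch assuming the commonly reported launch figures of 1/64 rate on GP104 (GTX 1080) and full 1:1 rate on Polaris 10 (RX 480) at reference boost clocks; neither rate comes from the linked thread.

```python
# Peak FP16 throughput = peak FP32 x FP16:FP32 rate.
# Assumed rates (commonly reported at launch, not from this thread):
# GTX 1080 (GP104) 1/64; RX 480 (Polaris 10) 1:1 (native FP16, no double-rate packing).
def fp32_tflops(cores: int, mhz: float) -> float:
    return 2 * cores * mhz / 1e6  # FMA = 2 FLOPs per core per clock


def fp16_tflops(cores: int, mhz: float, fp16_rate: float) -> float:
    return fp32_tflops(cores, mhz) * fp16_rate


if __name__ == "__main__":
    print(f"GTX 1080 FP16: {fp16_tflops(2560, 1733, 1 / 64):.2f} TFLOPS")  # 0.14
    print(f"RX 480 FP16:   {fp16_tflops(2304, 1266, 1.0):.2f} TFLOPS")     # 5.83
```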

I will quote Andrew Lauritzen here:
Quote:
TBH it doesn't really bother me that they don't have fast fp16 on desktop, but they should have been a lot more upfront about it when they launched consumer Pascal. Lots of folks in the games industry are still working under the incorrect assumption that fp16 is supported and faster on NVIDIA.

Link to the full quote and thread:
https://forum.beyond3d.com/threads/nvidia-pascal-announcement.57763/page-80

But as we know, Nvidia are masters at this PR game.
post #580 of 735
Quote:
Originally Posted by tajoh111 View Post

If that didn't come with an increase in power consumption or die size, which would actually cost Nvidia something, then it is. [...]

Maxwell was already capable of reaching 1500 MHz, so maybe it's a slight tweak along with TSMC's process:
Quote:
TSMC's 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology.

In the end, most of the gains in Pascal are due to the process.
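A rough sense of scale for this argument: the clock gain Pascal shows over an overclocked Maxwell is only a fraction of TSMC's headline speed figure. A sketch using clocks already mentioned in this thread (1733 MHz is the GTX 1080 reference boost, 1900 MHz the figure Catscratch used); TSMC's number describes transistor speed at the process level, so treating it as a GPU clock ceiling is a simplifying assumption.

```python
# Maxwell overclock from the post vs Pascal clocks mentioned in this thread.
MAXWELL_OC_MHZ = 1500
TSMC_SPEED_CLAIM = 0.65  # "above 65 percent higher speed" vs 28HPM


def clock_gain(new_mhz: float, old_mhz: float) -> float:
    """Fractional clock increase going from old_mhz to new_mhz."""
    return new_mhz / old_mhz - 1.0


if __name__ == "__main__":
    for pascal_mhz in (1733, 1900):
        g = clock_gain(pascal_mhz, MAXWELL_OC_MHZ)
        print(f"{pascal_mhz} MHz: +{g:.1%} over {MAXWELL_OC_MHZ} MHz "
              f"({g / TSMC_SPEED_CLAIM:.0%} of TSMC's headline figure)")
```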