Overclock.net › Forums › Industry News › Rumors and Unconfirmed Articles › [VC]GTX 1060 specifications leaked - faster than RX 480

[VC]GTX 1060 specifications leaked - faster than RX 480 - Page 57  

post #561 of 735
Quote:
Originally Posted by EightDee8D View Post

You are talking about architecture gains; I'm talking about the RX 480 card, not GCN 4. Looking at its TFLOPS (5.8 for the 480 vs 5.9 for the 390X), it doesn't show any improvement. If you still can't see that, I can't explain it any better. It's already way too off-topic, so I'll leave it here. redface.gif

Yup.

I don't think there was much of an IPC gain either.

That's why we have to use TFLOPS as the basis: it takes into account both core count and frequency. IPC means instructions per cycle, but because GPUs have such widely varying core configurations, you have to integrate the core count into the equation, which gives TFLOPS, i.e. TFLOPS = frequency * cores * 2 (the 2 counts a fused multiply-add as two floating-point operations).
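The formula above can be sketched in Python as a quick sanity check. The shader counts and clocks used here (2304 shaders at a 1266 MHz boost clock for the RX 480, 2816 shaders at 1050 MHz for the R9 390X) are reference specs supplied for illustration, not figures from the post; they reproduce the 5.8 vs 5.9 TFLOPS numbers quoted above.

```python
def peak_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS = shaders * clock * 2, counting a fused multiply-add as 2 FLOPs."""
    flops = shaders * clock_mhz * 1e6 * 2  # clock converted from MHz to Hz
    return flops / 1e12

# Reference shader counts and clocks (assumed here for illustration).
rx_480 = peak_tflops(2304, 1266)
r9_390x = peak_tflops(2816, 1050)

print(f"RX 480:  {rx_480:.2f} TFLOPS")   # RX 480:  5.83 TFLOPS
print(f"R9 390X: {r9_390x:.2f} TFLOPS")  # R9 390X: 5.91 TFLOPS
```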

Even Mahigan was using this metric before his prediction went off the rails.

For GPUs, IPC is best translated this way, because it is normally a term reserved for CPUs, where it describes the work done per clock. Since GPU core counts vary so much, we have to include the core count. The result is how much gaming performance a TFLOP translates into, i.e. a ratio of real performance to theoretical performance.

Because of this, IPC measured this way is a great measure of core occupancy, i.e. how efficiently the cores are being used for actual work.

If we look at the performance of Polaris, most of the per-CU gains have come from clocks. The 15% just seems like the difference in clocks, not improvements in the architecture.

What is strange is that IPC appears to have gone down, because 5.8 TFLOPS of Polaris performs roughly like 5.2-5.4 TFLOPS of the 390X.
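The "real vs theoretical" ratio described above can be sketched as a simple division. This assumes, as the post does, that performance scales linearly with a reference card's TFLOPS, so "performs like 5.2 TFLOPS of 390X" is treated as 5.2 Hawaii-equivalent TFLOPS. The function name is made up for illustration, and the 7.0 case is the hypothetical raised at the end of the post.

```python
POLARIS_TFLOPS = 5.8  # RX 480 peak figure quoted in the thread

def hawaii_relative_efficiency(hawaii_equiv_tflops: float, own_tflops: float) -> float:
    """Real performance (in Hawaii-equivalent TFLOPS) divided by theoretical TFLOPS.
    Below 1.0 means each TFLOP does less work than a Hawaii TFLOP."""
    return hawaii_equiv_tflops / own_tflops

for equiv in (5.2, 5.4, 7.0):
    ratio = hawaii_relative_efficiency(equiv, POLARIS_TFLOPS)
    print(f"{equiv:.1f} Hawaii TFLOPS -> {ratio:.2f}")
# 5.2 Hawaii TFLOPS -> 0.90
# 5.4 Hawaii TFLOPS -> 0.93
# 7.0 Hawaii TFLOPS -> 1.21
```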

The problem for Vega, besides pure bandwidth, is that if it is mostly the same architecture, they are going to run into the same bottlenecks. We are likely to get 64 ROPs like Fiji, which is simply a limit of the architecture. And considering how inefficiently Fiji's TFLOPS turned into real-world performance, this doesn't bode well for Vega.

If Polaris had made IPC gains where 5.8 TFLOPS was equal to 7 TFLOPS of Hawaii, I would have much higher hopes for it.
post #562 of 735
rolleyes.gif

Considering that the 32 ROPs of Ellesmere are matching the 64 on Hawaii, this bodes fairly well for Vega, unless of course the bottleneck was elsewhere.
    
CPU: e2140 @ 3.2GHz | Motherboard: abit IP35-E | Graphics: HIS IceQ4 4850 | RAM: 4GB 667@800 5-5-5-15
OS: Win XP 32-bit | Power: Corsair 450VX @ stock
    
post #563 of 735
Quote:
Originally Posted by gamervivek View Post

rolleyes.gif

Considering that the 32 ROPs of Ellesmere are matching the 64 on Hawaii, this bodes fairly well for Vega, unless of course the bottleneck was elsewhere.

I mentioned it before: the bottleneck was bandwidth, which means they could afford to reduce the ROP count if they can make each individual ROP perform closer to its peak throughput.

http://www.anandtech.com/show/5261/amd-radeon-hd-7970-review/4
Quote:
ROP operations are extremely bandwidth intensive, so much so that even when pairing up ROPs with memory controllers, the ROPs are often still starved of memory bandwidth.

The solution to that was rather counter-intuitive: decouple the ROPs from the memory controllers. By servicing the ROPs through a crossbar AMD can hold the number of ROPs constant at 32 while increasing the width of the memory bus by 50%. The end result is that the same number of ROPs perform better by having access to the additional bandwidth they need.

So in AMD's case, they had to reduce the ROP count whenever they reduced bus width and overall bandwidth; otherwise their ROPs would just end up running in a less efficient state, making more ROPs simply pointless.

This also indicates one thing: Tonga and Hawaii are mostly bandwidth-starved when it comes to their ROPs.
Hawaii = 64 ROPs @ 384 GB/s = 6 GB/s per ROP
Tonga = 32 ROPs @ 182.4 GB/s = 5.7 GB/s per ROP
Polaris = 32 ROPs @ 256 GB/s = 8 GB/s per ROP
* Note that Polaris also has much higher effective bandwidth than these raw figures suggest, due to more efficient compression.
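The per-ROP figures above are just raw bandwidth divided by ROP count; a short sketch reproduces them. It uses raw (uncompressed) memory bandwidth only, so it deliberately ignores the compression advantage noted for Polaris.

```python
# (ROP count, raw memory bandwidth in GB/s), as listed in the post.
CARDS = {
    "Hawaii": (64, 384.0),
    "Tonga": (32, 182.4),
    "Polaris": (32, 256.0),
}

def bandwidth_per_rop(rops: int, bandwidth_gbs: float) -> float:
    """Raw GB/s available to each ROP if bandwidth were shared evenly."""
    return bandwidth_gbs / rops

for name, (rops, bw) in CARDS.items():
    print(f"{name}: {bandwidth_per_rop(rops, bw):.1f} GB/s per ROP")
# Hawaii: 6.0 GB/s per ROP
# Tonga: 5.7 GB/s per ROP
# Polaris: 8.0 GB/s per ROP
```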
Edited by epic1337 - 7/7/16 at 9:11pm
post #564 of 735
I was right: there's no IPC increase, just a frequency increase. What they did was compare the 290 vs the 480 in several games and divide their fps by CU count. Read the actual footnote 1 here:


Source - https://forums.overclockers.co.uk/showthread.php?t=18737717

So as I said, there's no IPC gain per CU at the same frequency. wink.gif And it's kind of pathetic, IMO.
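To show why dividing fps by CU count alone can hide a clock increase, here is a sketch built on a purely hypothetical scenario with no benchmark data: the R9 290 (40 CUs, 947 MHz) and RX 480 (36 CUs, 1266 MHz) are given identical per-CU, per-clock throughput, so fps is simply proportional to CUs times clock. The CU counts and clocks are reference specs assumed here, and the function names are made up for illustration.

```python
# Reference specs (assumed): (compute units, engine clock in MHz).
SPECS = {"R9 290": (40, 947), "RX 480": (36, 1266)}

def fps_per_cu(fps: float, cus: int) -> float:
    """Per-CU normalization only: divide frame rate by CU count."""
    return fps / cus

def fps_per_cu_per_ghz(fps: float, cus: int, clock_mhz: float) -> float:
    """Also divide by clock, isolating per-clock (IPC-like) throughput."""
    return fps / cus / (clock_mhz / 1000)

# HYPOTHETICAL: no IPC change, so fps is just proportional to CUs x clock.
fps = {name: cus * mhz / 1000 for name, (cus, mhz) in SPECS.items()}

(cu_old, mhz_old), (cu_new, mhz_new) = SPECS["R9 290"], SPECS["RX 480"]
per_cu_gain = fps_per_cu(fps["RX 480"], cu_new) / fps_per_cu(fps["R9 290"], cu_old) - 1
per_clock_gain = (fps_per_cu_per_ghz(fps["RX 480"], cu_new, mhz_new)
                  / fps_per_cu_per_ghz(fps["R9 290"], cu_old, mhz_old) - 1)

print(f"per-CU gain: {per_cu_gain:+.1%}")                     # per-CU gain: +33.7%
print(f"per-CU, per-clock gain: {abs(per_clock_gain):.1%}")   # per-CU, per-clock gain: 0.0%
```

In this constructed case the whole +33.7% per-CU "gain" is the clock ratio (1266/947), which is the distinction the post is drawing.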

Also, kiss goodbye to the 2.7x perf/W for the 470: they are comparing against the wrong TDP for the 270X (180W). Basically it's a meh architecture aside from tessellation and the 34% effective memory bandwidth increase from DCC (delta color compression), and even that isn't enough.
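To see why the baseline TDP matters, here is a sketch of a generation-over-generation perf/W multiplier. It shows that the claimed multiplier is directly proportional to the old card's wattage, so overstating the baseline power by some percentage inflates the claim by the same percentage. The 1.5x performance ratio, 100 W, and 150 W values are hypothetical placeholders (chosen so the 180 W baseline reproduces the 2.7x figure), not the real specs of either card.

```python
def perf_per_watt_gain(perf_ratio: float, new_watts: float, old_watts: float) -> float:
    """(perf_new / W_new) / (perf_old / W_old), where perf_ratio = perf_new / perf_old."""
    return perf_ratio * old_watts / new_watts

# HYPOTHETICAL inputs, for illustration only.
PERF_RATIO = 1.5  # assumed new/old performance ratio
NEW_WATTS = 100   # assumed power of the new card

for baseline_watts in (180, 150):  # quoted baseline vs an arbitrary lower one
    gain = perf_per_watt_gain(PERF_RATIO, NEW_WATTS, baseline_watts)
    print(f"baseline {baseline_watts} W -> {gain:.2f}x perf/W")
# baseline 180 W -> 2.70x perf/W
# baseline 150 W -> 2.25x perf/W
```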
Edited by EightDee8D - 7/7/16 at 9:40pm
post #565 of 735
Well, neither did Pascal: the performance increase is mostly attributable to its vastly higher clocks, and worse, perf per CU has slightly regressed due to diminishing returns.
post #566 of 735
Quote:
Originally Posted by epic1337 View Post

Well, neither did Pascal: the performance increase is mostly attributable to its vastly higher clocks, and worse, perf per CU has slightly regressed due to diminishing returns.
I know that, but they are so far behind frequency-wise, and they can't just increase the CU count or they will hit the die-size wall. They really need an IPC increase this time, or it's goodbye to GCN and time to start a new architecture. It's just not good enough to battle future NVIDIA architectures.
post #567 of 735


:x If you buy the Founders Edition, you'd better like that cooler...

And there's no water-cooling compatibility, unless you leave the cables hanging over...
post #568 of 735
Quote:
Originally Posted by tajoh111 View Post

Yup.

I don't think there was much of an IPC gain either.

That's why we have to use TFLOPS as the basis: it takes into account both core count and frequency. IPC means instructions per cycle, but because GPUs have such widely varying core configurations, you have to integrate the core count into the equation, which gives TFLOPS, i.e. TFLOPS = frequency * cores * 2 (the 2 counts a fused multiply-add as two floating-point operations).

Even Mahigan was using this metric before his prediction went off the rails.

For GPUs, IPC is best translated this way, because it is normally a term reserved for CPUs, where it describes the work done per clock. Since GPU core counts vary so much, we have to include the core count. The result is how much gaming performance a TFLOP translates into, i.e. a ratio of real performance to theoretical performance.

Because of this, IPC measured this way is a great measure of core occupancy, i.e. how efficiently the cores are being used for actual work.

If we look at the performance of Polaris, most of the per-CU gains have come from clocks. The 15% just seems like the difference in clocks, not improvements in the architecture.

What is strange is that IPC appears to have gone down, because 5.8 TFLOPS of Polaris performs roughly like 5.2-5.4 TFLOPS of the 390X.

The problem for Vega, besides pure bandwidth, is that if it is mostly the same architecture, they are going to run into the same bottlenecks. We are likely to get 64 ROPs like Fiji, which is simply a limit of the architecture. And considering how inefficiently Fiji's TFLOPS turned into real-world performance, this doesn't bode well for Vega.

If Polaris had made IPC gains where 5.8 TFLOPS was equal to 7 TFLOPS of Hawaii, I would have much higher hopes for it.

The first bolded statement is incorrect, as I've already shown that per-shader performance is up vs Hawaii by at least ~8.5%. That was in a synthetic test, which, since you like to cling to the idea that TFLOPS is a good representation of real performance, should best represent IPC gains.

Second, if we take the TPU (TechPowerUp) averaged scores that were referenced earlier and assume the 4% average difference, by your standards that would put P10 at 5.68 TFLOPS of "Hawaii performance". There is also evidence from reviews and users that, depending on resolution and game, the reference P10 is throttling. Taking into account the minimum core clock of 1120 MHz, this brings P10 down to ~5.16 TFLOPS. Now, within this range (5.16-5.83 TFLOPS), there are games in which the RX 480 is faster than the 390X in both DX11 and DX12, games where performance is almost identical, and games in which it is behind. How could P10 beat Hawaii with significantly less hardware and lower IPC in ANY game?
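The arithmetic in the paragraph above can be checked with a short sketch. It assumes reference specs (2304 shaders for the RX 480 at a 1266 MHz boost clock, 2816 shaders at 1050 MHz for the 390X) and reads "the 4% difference" as the 390X being 4% ahead, i.e. the RX 480 delivering 96% of the 390X's ~5.91 TFLOPS.

```python
def peak_tflops(shaders: int, clock_mhz: float) -> float:
    """Peak FP32 TFLOPS: shaders * clock (MHz) * 2 FLOPs per FMA, scaled to tera."""
    return shaders * clock_mhz * 2 / 1e6

RX_480_SHADERS = 2304               # assumed reference spec
r9_390x = peak_tflops(2816, 1050)   # assumed reference spec, ~5.91 TFLOPS

hawaii_equiv = r9_390x * (1 - 0.04)            # RX 480 at 96% of the 390X
throttled = peak_tflops(RX_480_SHADERS, 1120)  # minimum core clock cited in the post
boost = peak_tflops(RX_480_SHADERS, 1266)      # reference boost clock

print(f"Hawaii-equivalent: {hawaii_equiv:.2f} TFLOPS")      # Hawaii-equivalent: 5.68 TFLOPS
print(f"RX 480 range: {throttled:.2f}-{boost:.2f} TFLOPS")  # RX 480 range: 5.16-5.83 TFLOPS
print("inside range:", throttled <= hawaii_equiv <= boost)  # inside range: True
```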

You'd have to be a moron to say CU performance has regressed when P10 is Hawaii (or GCN 1) on a HUGE diet, yet performance is almost identical. P10 is not an enthusiast part. It's not a big chip at a premium price with a crazy cooler. I really can't fathom how people expect so much of this card. For AMD to actively produce a product that has regressed in performance when they have all their chips in the GCN bag would be the STUPIDEST thing on the face of the planet. You heard it here: AMD, pack your bags, your engineers can do nothing but go backwards. rolleyes.gif
Steins Gate (17 items)
CPU: i7 7800x @ TBD | Motherboard: ASRock OC Formula X299 | Graphics: XFX Vega 64 | RAM: 4x4 Crucial Ballistix Elite @ TBD
Hard Drive: 960 Evo | Hard Drive: 850 Evo | Hard Drive: WD Blue 1TB | Cooling: Thermalright True Spirit 140 Direct
Monitor: Acer XR342CK | Keyboard: G413 | Power: EVGA 750w G2 | Case: CM MC5
Mouse: G703 | Mouse Pad: Logitech Powerplay | Audio: Onboard | Audio: Logitech Z906
Audio: HD518
post #569 of 735
Quote:
Originally Posted by EightDee8D View Post

I know that, but they are so far behind frequency-wise, and they can't just increase the CU count or they will hit the die-size wall. They really need an IPC increase this time, or it's goodbye to GCN and time to start a new architecture. It's just not good enough to battle future NVIDIA architectures.

They don't actually need to abandon GCN; they'll just have to revise it more than they have so far. They've made multiple changes to make each CU use bandwidth more efficiently, and they've made quite a bit of progress.

The last and most game-changing step would be pushing for 16-bit compute (half-precision) support, the same as what NV is currently doing.
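A rough sketch of why half precision matters, using only Python's standard `struct` module (format `e` is IEEE 754 half precision, `f` is single precision). The 2x FP16 throughput factor in `peak_fp16_tflops` is an assumption modeled on packed two-wide FP16 math; real FP16 rates differ from chip to chip, and the function name is made up for illustration.

```python
import struct

# Storage: a half-precision value takes 2 bytes vs 4 for single precision,
# so the same data needs half the memory bandwidth.
print(struct.calcsize("e"), struct.calcsize("f"))  # 2 4

# Precision cost: 0.1 round-tripped through FP16 keeps only ~3 significant digits.
half_01 = struct.unpack("e", struct.pack("e", 0.1))[0]
print(half_01)  # 0.0999755859375

def peak_fp16_tflops(fp32_tflops: float, packed_rate: float = 2.0) -> float:
    """Peak FP16 rate ASSUMING packed math that issues two FP16 ops per FP32 lane."""
    return fp32_tflops * packed_rate
```

The trade-off is visible in the two prints: half the bytes per value, at the cost of precision, which is why FP16 only helps workloads that tolerate it.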



Quote:
Originally Posted by NvNw View Post



:x If you buy the Founders Edition, you'd better like that cooler...

And there's no water-cooling compatibility, unless you leave the cables hanging over...

Very simple solution: buy an 8-pin PCIe extension cable, cut off the male end, and solder it onto the board.


Edited by epic1337 - 7/7/16 at 10:27pm
post #570 of 735
Quote:
Originally Posted by epic1337 View Post

They don't actually need to abandon GCN; they'll just have to revise it more than they have so far. They've made multiple changes to make each CU use bandwidth more efficiently, and they've made quite a bit of progress.

The last and most game-changing step would be pushing for 16-bit compute or half-precision support, the same as what NV is currently doing.
Compare the 280X (2011 28nm GPU) vs the 480 (2016 14nm GPU): there's barely any IPC increase, and that's with 4 revisions. IMO that's the most pathetic thing about GCN.
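One way to make the 280X vs 480 comparison concrete is to compute the speedup expected from CU count and clock alone, i.e. with zero IPC change; a measured speedup above that would indicate a per-CU, per-clock gain. The specs (32 CUs at up to 1000 MHz for the 280X, 36 CUs at 1266 MHz for the 480) are reference figures assumed here, and no benchmark numbers are included: `implied_ipc_gain` is a hypothetical helper that needs a real measured speedup as input.

```python
# Reference specs (assumed): (compute units, engine clock in MHz).
R9_280X = (32, 1000)
RX_480 = (36, 1266)

def cu_clock_product(cus: int, clock_mhz: float) -> float:
    """Throughput proxy that assumes identical work per CU per clock."""
    return cus * clock_mhz

# Speedup expected from more CUs and higher clocks alone.
expected = cu_clock_product(*RX_480) / cu_clock_product(*R9_280X)
print(f"expected speedup with no IPC gain: {expected:.2f}x")  # expected speedup with no IPC gain: 1.42x

def implied_ipc_gain(measured_speedup: float) -> float:
    """Per-CU, per-clock gain implied by a measured 480/280X speedup."""
    return measured_speedup / expected - 1
```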
This thread is locked  