Overclock.net › Forums › Industry News › Hardware News › [AdoredTV] Pascal vs Maxwell at same clockspeeds, same FLOPS (1080 vs 980 Ti)

[AdoredTV] Pascal vs Maxwell at same clockspeeds, same FLOPS (1080 vs 980 Ti) - Page 31

post #301 of 305
Quote:
Originally Posted by NightAntilli

What exactly is weak about its front end and back end..?

Weaker geometry and rasterization, a lower ROP count than it should have, missing or inefficient compression, a small and less efficient L2 cache, and a compute-heavy design. From the perspective of compute workloads GCN is awesome, and when it gets to throw its weight around using its ACEs, you get to see where AMD put its focus. GCN's weakness relative to Maxwell and Pascal is that it isn't as graphics-focused. AMD has worked on this with Polaris: better compression, an improved L2, fewer pipeline stalls when the front end isn't giving the shaders anything to do (thanks to larger buffers and better prefetching), and a geometry engine improved with primitive discard. AMD knows that its front end is weaker in graphics workloads than the graphics-focused Paxwell, and in a sense its compute focus has had some payoff with DX12/Vulkan, which bring more compute to the table. This generation they've been working on improving the graphics side, and we see some of that fruit in Polaris. I can do a better, more in-depth write-up later when my motherboard's back, but typing on a phone is awful, so I'll end here.

Simply put, Nvidia and AMD have different focuses: Nvidia on graphics, AMD on compute. When AMD gets to show its ability to do compute AND graphics, it shines. When it's just graphics, we see its weakness relative to Paxwell because of a front end weighted toward compute.
Edited by Echoa - 8/1/16 at 10:51am
Toxic-DT (19 items)
CPU: i7 4770k | Motherboard: Gigabyte Z87X-UD4H | Graphics: Gigabyte GTX 1060 | RAM: Gskill Ares
Hard Drive: Mushkin Eco3 | Hard Drive: Kingdian | Hard Drive: Corsair LE | Optical Drive: Lite On
Cooling: Cryorig H5 | OS: Windows 10 pro | Monitor: Dell S2415h | Keyboard: Azio MK Retro
Power: Seasonic G Series 650w | Case: Define R5 | Mouse: Logitech G500s | Audio: Asus Xonar DSX
Audio: Monoprice Retro Headphones | Audio: Monoprice Vocal Dynamic Mic | Audio: Edifier r980t

Toxic-LT (12 items)
CPU: I7 920xm | Motherboard: Asus g73jh | Graphics: Radeon HD 6990m | RAM: 8gb Hynix DDR3 1333
Hard Drive: Seagate Laptop Thin 500gb | Hard Drive: Corsair Force LE | Cooling: Stock G73jh | OS: Windows 10
Monitor: 1600x900 Glossy G73jh Screen | Keyboard: G73jh Backlit Keyboard | Power: Delta 150w PSU | Mouse: Verbatim Laptop mouse

Toxic-SV (10 items)
CPU: Xeon e5649 | Motherboard: HP z400 | Graphics: FirePro v4800 | RAM: 12gb Gkill Ripjaw
Hard Drive: Hitachi Ultrastar | Hard Drive: Western Digital 320gb | Cooling: HP High Performance Cooler | OS: Ubuntu Gnome 16.10
Power: HP z400 | Case: HP z400
post #302 of 305
Quote:
Originally Posted by HaiderGill

I don't think he's hating on it; he just pointed out that nVidia were able to build a really efficient architecture in Maxwell, which has allowed them to clock it faster in conjunction with a die shrink and produce the goods that way...


Factoring in the reduced OC headroom of the GTX 1080, it's only about 15-20% faster in games than the 980 Ti, yet it still sells for about what the 980 Ti cost at release.

In that regard, this is disappointing, or it's Nvidia taking advantage of its market share to milk the market.

We can expect the Titan Pascal to be perhaps 30-35% faster, but at double the cost. I guess the way to describe it is that the generational leap in performance is disappointing for a die shrink, and the price:performance is underwhelming.



Quote:
Originally Posted by NightAntilli

What exactly is weak about its front end and back end..?

Echoa already covered it, but basically, there is not enough to feed the 4096 shaders. We would expect that a 4096 shader part at a comparable clock would have about 45% more performance than the 290X with 2816 shaders. In practice, we only got about 20%, so something is bottlenecking the Fury X.


  1. Needs more RBEs so that it can have more color and Z/Stencil ROPs (probably 50% more at least, and maybe even double)
  2. L2 cache was bottlenecking as well
  3. Triangle output was not improved from the 290X, which was already lagging behind the Nvidia counterparts
  4. We also did not see a massive leap in memory bandwidth with HBM1.
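The shader-scaling arithmetic in this post can be checked by comparing shader counts directly. This is a minimal sketch, assuming performance would scale linearly with shader count at comparable clocks (an idealization that ignores exactly the bottlenecks listed above); the 20% observed uplift is the post's figure, not a measurement, and the function names are hypothetical.

```python
# Idealized shader-count scaling: Fury X (4096 shaders) vs 290X (2816 shaders).
# Assumption: performance scales linearly with shader count at comparable clocks.
FURY_X_SHADERS = 4096
R9_290X_SHADERS = 2816
OBSERVED_UPLIFT = 0.20  # "about 20%", as stated in the post


def expected_uplift(new_shaders: int, old_shaders: int) -> float:
    """Fractional speedup predicted by shader count alone."""
    return new_shaders / old_shaders - 1.0


def scaling_efficiency(observed: float, expected: float) -> float:
    """Fraction of the predicted uplift that actually showed up."""
    return observed / expected


uplift = expected_uplift(FURY_X_SHADERS, R9_290X_SHADERS)
print(f"expected uplift: {uplift:.1%}")
print(f"scaling efficiency: {scaling_efficiency(OBSERVED_UPLIFT, uplift):.0%}")
```

With these inputs it reports an expected uplift of about 45.5% and a scaling efficiency of about 44%, i.e. less than half of the shader-count gain showed up in practice.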

They seem to have mitigated #3 with Polaris. I suspect, though, that the optimal layout will have double the triangle performance with the 4096-shader part and triple with the 6144-shader part.

For Vega, they will need a larger and much faster L2 cache (4 MB sounds about right). They will also need to be able to feed all of the shaders.

For memory bandwidth, I'm not overly concerned, as HBM2 is going to offer basically 1 TB/s.
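The 1 TB/s figure follows from the usual peak-bandwidth formula: bus width times per-pin data rate, divided by 8 bits per byte. A sketch, assuming a hypothetical four-stack HBM2 configuration at 2 Gbps per pin for Vega, which the post does not specify; the 290X and Fury X rows use those cards' published memory configurations, and the helper name is made up for illustration.

```python
# Peak memory bandwidth = bus width (bits) * data rate per pin (Gbps) / 8 bits per byte.
HBM_STACK_WIDTH = 1024  # interface width of one HBM/HBM2 stack, in bits


def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return bus_width_bits * gbps_per_pin / 8


configs = {
    "R9 290X (512-bit GDDR5 @ 5 Gbps)": peak_bandwidth_gbs(512, 5.0),
    "Fury X (4x HBM1 @ 1 Gbps)": peak_bandwidth_gbs(4 * HBM_STACK_WIDTH, 1.0),
    # Hypothetical Vega-class config: 4 HBM2 stacks at the 2 Gbps per-pin spec rate.
    "4x HBM2 @ 2 Gbps (assumed)": peak_bandwidth_gbs(4 * HBM_STACK_WIDTH, 2.0),
}

for name, bandwidth in configs.items():
    print(f"{name}: {bandwidth:.0f} GB/s")
```

This prints 320, 512, and 1024 GB/s respectively.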
Edited by CrazyElf - 8/1/16 at 11:03am
Trooper Typhoon (20 items)
CPU: 5960X | Motherboard: X99A Godlike | Graphics: MSI 1080 Ti Lightning | Graphics: MSI 1080 Ti Lightning
RAM: G.Skill Trident Z 32 Gb | Hard Drive: Samsung 850 Pro | Hard Drive: Samsung SM843T 960 GB | Hard Drive: Western Digital Caviar Black 2Tb
Hard Drive: Samsung SV843 960 GB | Optical Drive: LG WH14NS40 | Cooling: Cryorig R1 Ultimate | Cooling: 9x Gentle Typhoon 1850 rpm on case
OS: Windows 7 Pro x64 | Monitor: LG 27UD68 | Keyboard: Ducky Legend with Vortex PBT Doubleshot Backlit... | Power: EVGA 1300W G2
Case: Cooler Master Storm Trooper | Mouse: Logitech G502 Proteus | Audio: Asus Xonar Essence STX | Other: Lamptron Fanatic Fan Controller
post #303 of 305
Quote:
Originally Posted by CrazyElf

Echoa already covered it, but basically, there is not enough to feed the 4096 shaders. We would expect that a 4096 shader part at a comparable clock would have about 45% more performance than the 290X with 2816 shaders. In practice, we only got about 20%, so something is bottlenecking the Fury X.


  1. Needs more RBEs so that it can have more color and Z/Stencil ROPs (probably 50% more at least, and maybe even double)
  2. L2 cache was bottlenecking as well
  3. Triangle output was not improved from the 290X, which was already lagging behind the Nvidia counterparts
  4. We also did not see a massive leap in memory bandwidth with HBM1.

They seem to have mitigated #3 with Polaris. I suspect, though, that the optimal layout will have double the triangle performance with the 4096-shader part and triple with the 6144-shader part.

For Vega, they will need a larger and much faster L2 cache (4 MB sounds about right). They will also need to be able to feed all of the shaders.

For memory bandwidth, I'm not overly concerned, as HBM2 is going to offer basically 1 TB/s.

I don't think there is much of a ROP bottleneck in the Fury line. Here's why. This is the expected theoretical ideal:

And here is the practical:

247/269 (92%) is not that far off, unlike the Polaris chip, which reaches only 65% of optimal. That chip should have had 48 ROPs.
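The ratio above and the ROP argument can be expressed as two small helpers. This is a sketch under stated assumptions: the units of 247 and 269 come from charts that are not preserved here, so they are treated as a plain ratio; the fill-rate helper assumes one pixel per ROP per clock; the Polaris 10 values (32 ROPs, 1.26 GHz) come from a later post in this thread; and both function names are hypothetical.

```python
def efficiency(practical: float, theoretical: float) -> float:
    """Fraction of the theoretical ideal actually achieved."""
    return practical / theoretical


def pixel_fill_gpix(rops: int, clock_ghz: float) -> float:
    """Theoretical pixel fill rate in Gpixels/s, assuming one pixel per ROP per clock."""
    return rops * clock_ghz


print(f"Fury efficiency: {efficiency(247, 269):.0%}")

# Polaris 10 as shipped (32 ROPs) vs the 48-ROP configuration suggested above,
# both at the 1.26 GHz clock quoted later in the thread.
for rops in (32, 48):
    print(f"Polaris 10 with {rops} ROPs: {pixel_fill_gpix(rops, 1.26):.1f} Gpixels/s")
```

At the same clock, going from 32 to 48 ROPs raises the theoretical fill rate by 50%, from about 40.3 to about 60.5 Gpixels/s.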
The Machine (14 items)
CPU: A10 6800K | Motherboard: Asus F2A85-V | Graphics: MSI 6870 Hawx, VTX3D 5770, AMD HD6950(RIP), Sap... | RAM: G.skill Ripjaws PC12800 6-8-6-24
Hard Drive: Seagate 7200.5 1TB | Optical Drive: NEC 3540 Dvd-Rom | OS: Windows 7 x32 Ultimate | Monitor: Samsung P2350 23" 1080p
Power: Seasonic s12-600w | Case: CoolerMaster Centurion 5 | Mouse: Logitech G600 | Audio: Auzen X-Fi Raider

Nexus 7 2013 (11 items)
CPU: Quad Krait 300 at 1.5Ghz | Motherboard: Qualcomm APQ8064-1AA SOC | Graphics: Adreno 320 at 400mhz | RAM: 2GB DDR3L-1600
Hard Drive: 32GB Internal NAND | OS: Android 5.0 | Monitor: 7" 1920X1200 103% sRGB & 572 cd/m2 LTPS IPS | Keyboard: Microsoft Wedge Mobile Keyboard
Power: 3950mAh/15.01mAh Battery | Audio: Stereo Speakers
post #304 of 305
Quote:
Originally Posted by CrazyElf

Factoring in the reduced OC headroom of the GTX 1080, it's only about 15-20% faster in games than the 980 Ti, yet it still sells for about what the 980 Ti cost at release.

In that regard, this is disappointing, or it's Nvidia taking advantage of its market share to milk the market.

We can expect the Titan Pascal to be perhaps 30-35% faster, but at double the cost. I guess the way to describe it is that the generational leap in performance is disappointing for a die shrink, and the price:performance is underwhelming.

Echoa already covered it, but basically, there is not enough to feed the 4096 shaders. We would expect that a 4096 shader part at a comparable clock would have about 45% more performance than the 290X with 2816 shaders. In practice, we only got about 20%, so something is bottlenecking the Fury X.


  1. Needs more RBEs so that it can have more color and Z/Stencil ROPs (probably 50% more at least, and maybe even double)
  2. L2 cache was bottlenecking as well
  3. Triangle output was not improved from the 290X, which was already lagging behind the Nvidia counterparts
  4. We also did not see a massive leap in memory bandwidth with HBM1.

They seem to have mitigated #3 with Polaris. I suspect, though, that the optimal layout will have double the triangle performance with the 4096-shader part and triple with the 6144-shader part.

For Vega, they will need a larger and much faster L2 cache (4 MB sounds about right). They will also need to be able to feed all of the shaders.

For memory bandwidth, I'm not overly concerned, as HBM2 is going to offer basically 1 TB/s.

Intel came out with Nehalem to combat AMD; once AMD had been vanquished, they couldn't be bothered. Sandy Bridge was good. Everything since then has been disappointing in terms of gain per clock... It happens in all industries...
post #305 of 305
Rather than looking at how many SPs, TMUs, and ROPs AMD's GPUs have, we should look for the issue elsewhere.
Take Hawaii Pro for example: 2560:160:64 @ 1.00 GHz | 384 GB/s.
Its SP:TMU:ROP layout is identical to the GTX 1080's: 2560:160:64 @ 1.70 GHz | 320 GB/s.
The difference is simply a 70% clock speed advantage, yet the GTX 1080 is literally 100% faster than Hawaii Pro.
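Since both chips share the same 2560:160:64 layout, dividing the observed speedup by the clock ratio isolates how much more the GTX 1080 gets done per clock. A back-of-the-envelope sketch, taking the post's "100% faster" at face value and ignoring the bandwidth difference; the helper name is hypothetical.

```python
def per_clock_advantage(speedup: float, clock_fast_ghz: float, clock_slow_ghz: float) -> float:
    """Speedup left over after dividing out the clock ratio, as a fraction."""
    clock_ratio = clock_fast_ghz / clock_slow_ghz
    return speedup / clock_ratio - 1.0


# Figures from the post: GTX 1080 at 1.70 GHz vs Hawaii Pro at 1.00 GHz,
# with the GTX 1080 described as "literally 100% faster" (speedup = 2.0).
advantage = per_clock_advantage(2.0, 1.70, 1.00)
print(f"per-clock advantage: {advantage:.1%}")
```

This comes out to roughly 17.6% more performance per clock from identical unit counts, which is the gap the post argues must come from somewhere other than the SP:TMU:ROP ratio.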

Furthermore, let's compare Polaris 10 to Hawaii Pro and Hawaii XT:
Polaris 10: 2304:144:32 @ 1.26 GHz | 256 GB/s.
Hawaii Pro: 2560:160:64 @ 1.00 GHz | 384 GB/s.
Hawaii XT: 2816:176:64 @ 1.05 GHz | 384 GB/s.
In real-world results, Polaris 10 can be seen swinging between Hawaii Pro and Hawaii XT depending on the workload.
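Turning the three SP:TMU:ROP configurations above into theoretical throughput makes the comparison easier to read. A sketch under common simplifying assumptions: 2 FP32 FLOPs (one fused multiply-add) per shader per clock, one texel per TMU per clock, and one pixel per ROP per clock; the `GpuConfig` type is made up for illustration.

```python
from typing import NamedTuple


class GpuConfig(NamedTuple):
    name: str
    shaders: int
    tmus: int
    rops: int
    clock_ghz: float
    bandwidth_gbs: float

    def tflops(self) -> float:
        # FP32: 2 FLOPs (one FMA) per shader per clock
        return self.shaders * 2 * self.clock_ghz / 1000

    def texel_rate(self) -> float:
        # GTexels/s: one texel per TMU per clock
        return self.tmus * self.clock_ghz

    def pixel_rate(self) -> float:
        # GPixels/s: one pixel per ROP per clock
        return self.rops * self.clock_ghz


# Configurations exactly as listed in the post.
GPUS = [
    GpuConfig("Polaris 10", 2304, 144, 32, 1.26, 256.0),
    GpuConfig("Hawaii Pro", 2560, 160, 64, 1.00, 384.0),
    GpuConfig("Hawaii XT", 2816, 176, 64, 1.05, 384.0),
]

for g in GPUS:
    print(f"{g.name:<11}{g.tflops():6.2f} TFLOPS {g.texel_rate():7.1f} GT/s "
          f"{g.pixel_rate():6.1f} GP/s {g.bandwidth_gbs:5.0f} GB/s")
```

On these paper numbers Polaris 10 lands between the two Hawaii parts in FLOPS and texel rate while having half their ROPs and two-thirds of their bandwidth, which is in line with the real-world placement described in the post.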

This shows that the current SP:TMU:ROP ratios aren't that much of an issue.
Rather, something else has gone wrong.
Edited by epic1337 - 8/2/16 at 6:11am