
[CB] AMD Radeon Polaris architecture performance - Page 6

post #51 of 175
Quote:
Originally Posted by Myst-san View Post

AMD's drivers have bigger CPU overhead than Nvidia's, meaning the CPU has to do more work to saturate the GPU, whereas for Nvidia it is less noticeable. A big part of this problem is that in DX11 only one core can talk to the GPU at a time. DX12 and Vulkan can have more than one core feeding the GPU. This is why there is a performance gain on AMD.

That's not exactly what is happening...

What is happening is that AMD's DX11 driver is single-threaded, so it interacts with only a single CPU thread when it comes to feeding commands to the GPU. nVIDIA's DX11 drivers are multi-threaded, so you have multiple threads being used to feed the GPU.

So the issue is that AMD's drivers hit the primary CPU thread harder than nVIDIA's.

AMD wanted to push the onus onto developers. They wanted developers to keep the primary CPU thread free so it could serve as the GPU-feeding mechanism, while the other CPU threads could be used for AI, physics, complex simulations, etc.

Developers did not budge, so AMD lost out during the DX11 days. So AMD came back with their next idea... a multi-threaded API (Mantle), which pretty much pushed the next-gen APIs in the direction AMD wanted to head.

So DX12/Vulkan is looking pretty good on AMD hardware so far, while DX11 is still not capable of fully driving AMD's GPUs due to the single-threaded driver.
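
A toy sketch of the difference (plain Python, not real driver or API code; the per-draw-call driver overhead is a made-up sleep so the threading effect is visible):

Code:
# Toy model of DX11-style vs DX12/Vulkan-style command submission.
# NOT real driver/API code; per-call "driver overhead" is simulated
# with a sleep so parallel recording actually shows up in wall time.
import time
from concurrent.futures import ThreadPoolExecutor

DRAW_CALLS = 400
OVERHEAD_S = 0.0005  # hypothetical CPU cost per draw call in the driver

def record_and_validate(call_id):
    """Stand-in for the CPU work the driver does per draw call."""
    time.sleep(OVERHEAD_S)
    return call_id

def dx11_style():
    # One thread does all the per-call driver work.
    start = time.perf_counter()
    for i in range(DRAW_CALLS):
        record_and_validate(i)
    return time.perf_counter() - start

def dx12_style(workers=4):
    # Several threads build command lists in parallel; only the
    # final queue submission is serialized.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(record_and_validate, range(DRAW_CALLS)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"DX11-style (1 thread):  {dx11_style():.3f} s")
    print(f"DX12-style (4 threads): {dx12_style():.3f} s")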
post #52 of 175
Quote:
Originally Posted by Mahigan View Post

That's not an 18% TFLOPS increase over the years... that's an 18% compute-efficiency boost over the last 4 years.

Why?

Because they tested all of the GPUs at the same theoretical TFLOPS rating, so what we're seeing are architectural improvements, not additional theoretical TFLOPS.
Check again, sir; I wrote perf/teraflop, meaning more performance at the same theoretical TFLOPS rating. And it's only an 18% increase. Compare that to the ~35% increase from Kepler to Maxwell on the same node. It's pathetic, and similar to Bulldozer's "moar cores" idea. They can't just keep adding CUs; that requires die space. And they aren't able to increase frequency either. Power efficiency I don't really care about, so that doesn't matter to me, but all of those things are bad and are going to hit AMD hard unless they increase IPC/shader efficiency. IMO, of course. :)
Edited by EightDee8D - 8/15/16 at 5:48pm
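
The arithmetic behind a perf/TFLOPS comparison like the one above. The fps and shader/clock figures below are hypothetical placeholders chosen to land on matched compute; the TFLOPS formula itself (two FLOPs per shader per clock, counting an FMA as two operations) is the standard one:

Code:
# Normalizing game performance by theoretical compute.
def tflops(shaders, clock_mhz):
    # 2 FLOPs per shader per clock (one FMA counted as two ops).
    return 2 * shaders * clock_mhz * 1e6 / 1e12

def perf_per_tflop(fps, shaders, clock_mhz):
    return fps / tflops(shaders, clock_mhz)

# Hypothetical: an old GCN part and a new one tuned to the same
# ~4.1 theoretical TFLOPS, as in the ComputerBase-style test.
old = perf_per_tflop(fps=50.0, shaders=2048, clock_mhz=1000)
new = perf_per_tflop(fps=59.0, shaders=2304, clock_mhz=889)
print(f"gain at matched compute: {new / old - 1:.1%}")  # ~18%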
post #53 of 175
Quote:
Originally Posted by EightDee8D View Post

Check again, sir; I wrote perf/teraflop, meaning more performance at the same theoretical TFLOPS rating. And it's only an 18% increase. Compare that to the ~35% increase from Kepler to Maxwell on the same node. It's pathetic, and similar to Bulldozer's "moar cores" idea. They can't just keep adding CUs; that requires die space. And they aren't able to increase frequency either. Power efficiency I don't really care about, so that doesn't matter to me, but all of those things are bad and are going to hit AMD hard unless they increase IPC/shader efficiency. IMO, of course. :)

Well, here is the thing... nVIDIA obtained their performance boost by delivering, in Maxwell, an architecture that was more GCN-code-friendly than Kepler. In older titles, prior to the console-optimization-oriented world we now live in, Kepler gave both GCN and Maxwell a run for their money. As newer titles released, people began to notice that Kepler was, by one method or another, acting crippled in newer games. These newer titles were console ports for the most part (GCN-optimized titles).

So nVIDIA had more to gain from delivering a more GCN-like architecture than AMD did, since GCN is already GCN.

An 18% boost is pretty good tbh.
post #54 of 175
The 18% boost is a pretty solid bump; what falls short is the clock speed. Only 1266MHz versus the 1050/1100MHz of 28nm GCN is pretty disappointing.
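
For scale, the gain those quoted clocks actually represent:

Code:
# Clock gain from the 14nm shrink over the 28nm GCN clocks quoted above.
polaris = 1266  # MHz, RX 480 boost clock
for old in (1050, 1100):  # MHz, 28nm GCN reference clocks
    print(f"{old} -> {polaris} MHz: +{polaris / old - 1:.1%}")
# ~+20.6% and +15.1%: modest for a full node shrink.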
post #55 of 175
Quote:
Originally Posted by Mahigan View Post

Well, here is the thing... nVIDIA obtained their performance boost by delivering, in Maxwell, an architecture that was more GCN-code-friendly than Kepler. In older titles, prior to the console-optimization-oriented world we now live in, Kepler gave both GCN and Maxwell a run for their money. As newer titles released, people began to notice that Kepler was, by one method or another, acting crippled in newer games. These newer titles were console ports for the most part (GCN-optimized titles).

So nVIDIA had more to gain from delivering a more GCN-like architecture than AMD did, since GCN is already GCN.

An 18% boost is pretty good tbh.

But compare perf/TFLOP on the 390 vs. the 1080 by running them at the same core frequency; the 1080 is still ahead, even in the latest APIs (the 390 has the same core count/ROPs/bandwidth as the 1080). Once nVIDIA moves to a 64-cores-per-SM architecture, they will increase that lead again. That's what I'm saying: GCN's perf/TFLOPS is way behind nVIDIA's, and it lags even more once we consider that nVIDIA is still using 128 cores per SM and has more frequency.

An 18% boost would be good if it were GCN1 -> GCN2 on the same node, but it has taken four generations and a node shrink to achieve. Which isn't good. :(

I like AMD, but I can't deny the fact that they are falling behind. I guess the low R&D budget is showing its consequences.
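
A sketch of that matched comparison. Both cards really do have 2560 shaders, so at the same clock their theoretical TFLOPS are identical, and any fps gap is purely per-TFLOPS efficiency; the fps numbers below are made-up placeholders:

Code:
# Matched-clock 390 vs 1080 comparison: identical theoretical compute,
# so the fps ratio directly measures architectural efficiency.
SHADERS = 2560    # both the R9 390 and the GTX 1080
CLOCK_MHZ = 1000  # hypothetical common locked clock

tflops = 2 * SHADERS * CLOCK_MHZ * 1e6 / 1e12  # 5.12 for both cards
fps_390, fps_1080 = 45.0, 58.0  # made-up matched-clock results
print(f"theoretical compute, both cards: {tflops:.2f} TFLOPS")
print(f"1080 per-TFLOPS lead: {fps_1080 / fps_390 - 1:.1%}")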
post #56 of 175
Quote:
Originally Posted by Mahigan View Post

Well, here is the thing... nVIDIA obtained their performance boost by delivering, in Maxwell, an architecture that was more GCN-code-friendly than Kepler. In older titles, prior to the console-optimization-oriented world we now live in, Kepler gave both GCN and Maxwell a run for their money. As newer titles released, people began to notice that Kepler was, by one method or another, acting crippled in newer games. These newer titles were console ports for the most part (GCN-optimized titles).

So nVIDIA had more to gain from delivering a more GCN-like architecture than AMD did, since GCN is already GCN.

An 18% boost is pretty good tbh.
Wasn't Maxwell using more than 64 cores per SM? Because then it would be Pascal that is more GCN-like.

Then again, Maxwell wasn't so similar to GCN, and it didn't get crippled; instead, in some games it sat around GCN performance while Kepler was slower, especially after Maxwell's drivers were "improved": the 780 Ti ended up close to the 980 and the Titan to the 970, so for users there was no reason to sidegrade, even though their DX11 drivers had a similar level of performance.
Edited by PontiacGTX - 8/15/16 at 6:51pm
post #57 of 175
Quote:
Originally Posted by EightDee8D View Post

An 18% perf/TFLOP increase in 4 years... lol, another Bulldozer in waiting. All those features and APIs won't save you when you hit a die-size limit and cannot add moar cores. And they aren't able to increase frequency either. Time to ditch this architecture and start fresh, keeping only the good parts of GCN.

The only way to get a dramatic IPC increase is to change the architecture, that is, to abandon GCN.

On the other hand, things like improving the clock ceiling, power density, core density, and cost per die are all considered progress.
Edited by epic1337 - 8/15/16 at 7:36pm
post #58 of 175
I'm pretty sure AMD designed the GCN architecture with scalability in mind; as the workload changes, so can the way the architecture processes data. This is why we have seen GCN continue to increase in efficiency as time has progressed. AMD examines the way games are coded and adapts the way GCN processes them to be as efficient as possible. Nvidia, unfortunately, took a much more linear approach with their cards: they design the architecture, and then they give devs tools (GameWorks?) to make the most of it.

The problem with Nvidia's approach is that it only really helps the current architecture; once Nvidia has moved on to the next one, the previous architecture doesn't get a whole lot of optimization. AMD, on the other hand, sticks with the developers, and it's much easier to optimize for a game when the general architecture is exactly the same as it was 4-5 years ago.


tl;dr:

Nvidia's architectures are very lean, with most of the fat cut out (very power-efficient).
AMD's architectures are designed to scale with time (lots of extra compute hardware; extra headroom in the future at the cost of efficiency).
post #59 of 175
Quote:
Originally Posted by Mad Pistol View Post

I'm pretty sure AMD designed the GCN architecture with scalability in mind; as the workload changes, so can the way the architecture processes data. This is why we have seen GCN continue to increase in efficiency as time has progressed.

More like AMD's control of the console market has shifted developers toward writing GCN-friendly game engines that take advantage of the compute-heavy architecture. This was the obvious consequence of almost every PC game being a port, and of developers not wanting to write two engines for one game.

Kepler got wrecked since it can't do mixed-mode compute/graphics. Maxwell is dependent on the driver correctly partitioning the GPU, which is never going to be optimal.

It's not a surprise, then, that Pascal does much better in Far Cry Primal and Hitman compared to Maxwell (and Kepler's waaaaay down there).

Is GCN an endless tree of low-hanging fruit for console optimizations? Probably not. Will Maxwell take a dive in a few months? Maybe. :P While the consoles remain the majority market and run on the GCN 1.x (or is it GCN 1-4 now?) family of GPUs, I expect games to be kind to AMD for the most part.
post #60 of 175
Quote:
Originally Posted by Mahigan View Post

That's not exactly what is happening...

What is happening is that AMD's DX11 driver is single-threaded, so it interacts with only a single CPU thread when it comes to feeding commands to the GPU. nVIDIA's DX11 drivers are multi-threaded, so you have multiple threads being used to feed the GPU.

I remember there were some benchmarks comparing the Nvidia driver before and after it became multi-threaded. There wasn't a big difference. Also, isn't Nvidia's architecture better at serial tasks, compared to GCN, which was massively parallel from the beginning and is much harder to feed with the serial tasks that are the nature of DX11?

As far as I know, multi-threading in DX11 just moves command-list creation onto different cores (deferred contexts), freeing the main rendering thread for dispatching, and it also has to be supported by the game.
Edited by Myst-san - 8/16/16 at 2:19am
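
A toy sketch of that deferred-context pattern (plain Python stand-ins, not the real D3D11 API): workers record command lists in parallel, but only the main thread's immediate context submits, so submission still serializes:

Code:
# Toy model of D3D11 deferred contexts: recording parallelizes,
# submission does not. Plain Python, not actual D3D11 calls.
from concurrent.futures import ThreadPoolExecutor

def record_command_list(chunk):
    """Worker: like a deferred context, builds commands but cannot submit."""
    return [f"draw(object_{i})" for i in chunk]

def main_thread_execute(command_lists):
    """Immediate context: the one place commands actually reach the GPU."""
    submitted = []
    for cl in command_lists:  # serialized on the main thread, as in DX11
        submitted.extend(cl)
    return submitted

objects = list(range(12))
chunks = [objects[i::4] for i in range(4)]  # split recording four ways
with ThreadPoolExecutor(max_workers=4) as pool:
    lists = list(pool.map(record_command_list, chunks))
print(f"{len(lists)} command lists recorded in parallel, "
      f"{len(main_thread_execute(lists))} commands submitted serially")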