Overclock.net › Forums › Industry News › Video Game News › [WCCF] HITMAN To Feature Best Implementation Of DX12 Async Compute Yet, Says AMD

[WCCF] HITMAN To Feature Best Implementation Of DX12 Async Compute Yet, Says AMD - Page 42

post #411 of 799
Quote:
Originally Posted by Themisseble View Post

Maxwell cannot run async shaders. Who cares? That NVIDIA lied? That a few games will run better on AMD's Fury X?... Nobody even cared when the GTX 970 wasn't able to use its whole 4GB of VRAM. Nobody!
People still recommend the GTX 970 over the R9 390, or the GTX 980 over the Fury/Nano, and people are actually buying these cards. So who will care? Nobody! (Oh, I forgot, a few people on the forum will care about it... and that's it.)

People are still buying NVIDIA over AMD just because of PhysX! Well, Just Cause 3 runs PhysX on the CPU only!

That's a true story.
People don't buy Nvidia because of PhysX lol. That is just a silly statement.
People buy Nvidia because they are simply the bigger brand. When people come to me I will usually recommend an Nvidia card. Why? Because I have experience with those cards, and in all those years it has never been bad. Then those people do the same, and it becomes a nice chain. Combine that with the ton of games Nvidia partners up with, and yeah.
Brand recognition is important, and 80% of the market shows it.
post #412 of 799
Quote:
Originally Posted by Assirra View Post

People don't buy Nvidia because of PhysX lol. That is just a silly statement.
People buy Nvidia because they are simply the bigger brand. When people come to me I will usually recommend an Nvidia card. Why? Because I have experience with those cards, and in all those years it has never been bad. Then those people do the same, and it becomes a nice chain. Combine that with the ton of games Nvidia partners up with, and yeah.
Brand recognition is important, and 80% of the market shows it.
Instead of being part of the problem, you should be trying to solve it by recommending cards based on what's actually better, not the name on the box.

The 390 is better than the 970. If more people recommended it over the 970, we wouldn't have a bunch of people complaining that their cards stutter on max settings because they only have 3.5GB of VRAM.
post #413 of 799
Quote:
Originally Posted by Dargonplay View Post

Well, the GTX 960 still matches and surpasses the 780, a Kepler card, which is absolute bollocks; the 970 also hammers the 780 Ti and GTX Titan in several games, and that shouldn't happen.

Ironically, Project Cars is one of the most Gameworks-intensive games out there. Check how the GTX 960 matches or beats the GTX 780 while the 970 looks down on the best of Kepler.

Project Cars is certainly a special case. And yes, initially it was terrible, and Roy Taylor had some special accusations for Slightly Mad Studios on Twitter. Magically, a couple of days later the post was removed and Roy and Slightly Mad were best friends again, with Roy promising they were going to work together to make it better.

http://www.hardocp.com/article/2016/01/25/xfx_r9_380_dd_black_edition_oc_4gb_review/5

So now here we have the 380 beating the 960 in Project Cars at 1080p with the exact same settings. What's important is that they also compared GTA V, and the 380 beats the 960 in that game as well, by the same margin.

So who's at fault here? It appears the promise Roy made about AMD working with SMS paid off. AMD worked with Rockstar on GTA V as well. The fact is we don't know for sure who's doing what here. Is the developer or AMD telling us fibs? I don't have proof to condemn either one. The only thing we can take from this is that when AMD and the developer do work together, things get fixed.
Quote:
Originally Posted by Dargonplay View Post


Also in Fallout 4 the trend continues.

A GTX 960 beating a GTX 780 and a GTX 970 hammering the best of Kepler including the Titan, tell me that this isn't wrong.

Fair point. It is wrong. But let's dissect this a little further by looking at the other settings Tom's ran in that test.

You're showing us God Rays at Ultra. We already know Maxwell handles tessellation better than Kepler, and tessellation is what God Rays are built on.

Tom's dropped from Ultra to High and the GTX 780 went from 1 fps behind the 960 to 14 fps ahead.

I'm not really sure I trust Tom's numbers there though. One thing that stuck out to me was the Titan Black and the 780 Ti Windforce numbers. They are the same card with the Titan clocked 31 MHz higher. At high the Titan is 8 fps higher and at ultra, the Titan is only 2 fps higher. That doesn't make any sense.

If you look at the TPU review for the 980 Ti Matrix, the 780 Ti is eating the 970 for breakfast in fps.
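The Titan Black vs. 780 Ti WindForce objection above can be sanity-checked with a bit of arithmetic. A rough sketch, where the ~1000 MHz core clock and linear fps scaling with clock are my assumptions, not numbers from the thread:

```python
# Rough sanity check of the Titan Black vs. 780 Ti WindForce gap discussed above.
# Assumption (not from the thread): both cards run near a ~1000 MHz core clock,
# and fps scales at most linearly with core clock for the same chip.

def max_expected_gain(fps, clock_mhz, clock_delta_mhz):
    """Upper bound on fps gained from a small core-clock bump."""
    return fps * clock_delta_mhz / clock_mhz

# At ~60 fps, a 31 MHz bump on a ~1000 MHz clock buys at most ~1.9 fps.
print(round(max_expected_gain(60, 1000, 31), 1))
```

Under those assumptions the 2 fps gap at Ultra is plausible, while the 8 fps gap at High is not, which is exactly why those numbers look suspicious.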
Quote:
Originally Posted by Dargonplay View Post


The same seems to happen with The Witcher 3 with the 970 Hammering the Titan and the rest of Kepler, with the GTX 960 slightly behind the 780.

A GTX 970 beating the Titan by almost 50%, that's disgusting, let alone the fact that it's also beating the 295X2 for some reason, and the GTX 960 is 3 frames away from the Titan.

Let's look at some other conclusions for the Witcher 3 with patches and driver updates.

http://www.techpowerup.com/reviews/ASUS/R9_380X_Strix/18.html

The 960 ends up behind the 780 at all resolutions, and the 970 is just slightly ahead of the 780 Ti at all resolutions in that one game.

At release, the 980 was slightly faster than 780 Ti and the 970 was slightly slower. Since then the 980 has jumped a little further ahead of the 780 Ti due to a combination of driver maturity and tess rendering advantages. That will also mean the 970 is going to inch closer to the 780 Ti as well.

http://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/23.html

So in TPU's latest test with a wide range of games, the 970 ties the 780 Ti at 1080p and loses at 1440p and 4K.

So when we determine what is beating what, it's really best to have a bunch of games in the loop to make that decision. Using 1 or 2 games to draw a conclusion will often show things that are true at that sole point in time but get fixed later.
post #414 of 799
Quote:
Originally Posted by mcg75 View Post

Project Cars is certainly a special case. And yes, initially it was terrible, and Roy Taylor had some special accusations for Slightly Mad Studios on Twitter. Magically, a couple of days later the post was removed and Roy and Slightly Mad were best friends again, with Roy promising they were going to work together to make it better.

http://www.hardocp.com/article/2016/01/25/xfx_r9_380_dd_black_edition_oc_4gb_review/5

So now here we have the 380 beating the 960 in Project Cars at 1080p with the exact same settings. What's important is that they also compared GTA V, and the 380 beats the 960 in that game as well, by the same margin.

So who's at fault here? It appears the promise Roy made about AMD working with SMS paid off. AMD worked with Rockstar on GTA V as well. The fact is we don't know for sure who's doing what here. Is the developer or AMD telling us fibs? I don't have proof to condemn either one. The only thing we can take from this is that when AMD and the developer do work together, things get fixed.
Fair point. It is wrong. But let's dissect this a little further by looking at the other settings Tom's ran in that test.

You're showing us God Rays at Ultra. We already know Maxwell handles tessellation better than Kepler, and tessellation is what God Rays are built on.

Tom's dropped from Ultra to High and the GTX 780 went from 1 fps behind the 960 to 14 fps ahead.

I'm not really sure I trust Tom's numbers there though. One thing that stuck out to me was the Titan Black and the 780 Ti Windforce numbers. They are the same card with the Titan clocked 31 MHz higher. At high the Titan is 8 fps higher and at ultra, the Titan is only 2 fps higher. That doesn't make any sense.

If you look at the TPU review for the 980 Ti Matrix, the 780 Ti is eating the 970 for breakfast in fps.
Let's look at some other conclusions for the Witcher 3 with patches and driver updates.

http://www.techpowerup.com/reviews/ASUS/R9_380X_Strix/18.html

The 960 ends up behind the 780 at all resolutions, and the 970 is just slightly ahead of the 780 Ti at all resolutions in that one game.

At release, the 980 was slightly faster than 780 Ti and the 970 was slightly slower. Since then the 980 has jumped a little further ahead of the 780 Ti due to a combination of driver maturity and tess rendering advantages. That will also mean the 970 is going to inch closer to the 780 Ti as well.

http://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/23.html

So in TPU's latest test with a wide range of games, the 970 ties the 780 Ti at 1080p and loses at 1440p and 4K.

So when we determine what is beating what, it's really best to have a bunch of games in the loop to make that decision. Using 1 or 2 games to draw a conclusion will often show things that are true at that sole point in time but get fixed later.

Tessellation isn't that much better going from Kepler to Maxwell:



What is optimized is shader occupancy and efficiency. Each SMM in Maxwell is far more powerful than each SMX in Kepler. Maxwell v2 contains the fp32 compute optimizations that 20nm Maxwell was supposed to have (now split into two SKUs: Maxwell v2 on 28nm and Pascal on 16nm FinFET+). These result in 35% more fp32 performance per core. So while a GTX 780 has a theoretical performance of 4 TFLOPs, a Maxwell-based GPU only needs around 2.6 TFLOPs to match it compute-wise. A GTX 960 pulls around 2.4 TFLOPs. Pretty close. How?
Quote:
The end result is that in an SMX the 4 warp schedulers would share most of their execution resources and work out which warp was on which execution resource for any given cycle. But on an SMM, the warp schedulers are removed from each other and given complete dominion over a far smaller collection of execution resources. No longer do warp schedulers have to share FP32 CUDA cores, special function units, or load/store units, as each of those is replicated across each partition. Only texture units and FP64 CUDA cores are shared.
(Hint: Pascal will push this further and change the way fp64 CUDA cores are shared thus boosting fp64 performance).
Quote:
Moving on, along with the SMM layout changes NVIDIA has also made a number of small tweaks to improve the IPC of the GPU. The scheduler has been rewritten to avoid stalls and otherwise behave more intelligently. Furthermore by achieving higher utilization of their existing hardware, NVIDIA doesn’t need as many functional units to hit their desired performance targets, which in turn saves on space and ultimately power consumption.

Add the various compression algorithms found on Maxwell v2 and you have the technology to make a low-to-mid-range 900 series card compete with a high-end 780 series card under certain conditions.

GeForce GTX 780
Pixel fill rate: 43 GP/s
Texel fill rate: 173 GT/s
Memory bandwidth: 288 GB/s

GeForce GTX 960
Pixel fill rate: 38 GP/s
Texel fill rate: 75 GT/s
Memory bandwidth: 112 GB/s

Memory bandwidth?
Quote:
While on the subject of performance efficiency, NVIDIA has also been working on memory efficiency too. From a performance perspective GDDR5 is very powerful, however it’s also very power hungry, especially in comparison to DDR3. With GM107 in particular being a 128-bit design that would need to compete with the likes of the 192-bit GK106, NVIDIA has massively increased the amount of L2 cache they use, from 256KB in GK107 to 2MB on GM107. This reduces the amount of traffic that needs to cross the memory bus, reducing both the power spent on the memory bus and the need for a larger memory bus altogether.

Increasing the amount of cache always represents an interesting tradeoff since cache is something of a known quantity and is rather dense, but it’s only useful if there are memory stalls or other memory operations that it can cover. Consequently we often see cache implemented in relation to whether there are any other optimizations available. In some cases it makes more sense to use the transistors to build more functional units, and in other cases it makes sense to build the cache. After staying relatively stagnant on their cache sizes for so long, it looks like the balance has finally shifted and the cache increase makes the most sense for NVIDIA.
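The bus-traffic argument in that quoted passage can be put into a toy model: every request served from L2 never touches the memory bus, so a bigger cache stretches a narrow bus. A minimal sketch, where the hit rates are illustrative guesses rather than measured values:

```python
# Toy model of the L2-cache argument: a fraction `hit_rate` of memory traffic
# is absorbed by L2 and never crosses the bus, so the remaining misses see the
# bus as if it were proportionally wider. Hit rates below are illustrative.

def effective_bandwidth(bus_gbps, hit_rate):
    """Apparent bandwidth when a fraction `hit_rate` of traffic stays in L2."""
    return bus_gbps / (1.0 - hit_rate)

# GTX 960: 112 GB/s bus. If the 2MB L2 absorbed half the traffic,
# it would behave like a 224 GB/s bus for the misses.
print(effective_bandwidth(112, 0.5))
# A small 256KB L2 absorbing only 10% leaves the design
# leaning almost entirely on raw bus width.
print(round(effective_bandwidth(112, 0.1), 1))
```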

I think that tessellation culling (saving cache and bandwidth), coupled with the fp32 performance optimizations, the larger 2MB L2 cache, and the pixel fill rate plus pixel compression algorithms, explains the differences between Kepler, which lacks these, and Maxwell v2.
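The per-core efficiency arithmetic above can be sketched numerically, taking the ~35% fp32 figure and the TFLOPs numbers from the post at face value (neither is independently verified here):

```python
# Sketch of the per-core efficiency argument: Maxwell's claimed ~35% more
# delivered fp32 work per theoretical flop puts a 960's 2.4 TFLOPs in the
# neighborhood of a 780's 4.0 TFLOPs. Both figures are taken from the post.

def effective_tflops(theoretical_tflops, efficiency):
    """Theoretical throughput scaled by per-core delivered efficiency."""
    return theoretical_tflops * efficiency

kepler_780 = effective_tflops(4.0, 1.00)   # Kepler as the baseline
maxwell_960 = effective_tflops(2.4, 1.35)  # ~3.24 "Kepler-equivalent" TFLOPs

print(round(maxwell_960 / kepler_780, 2))  # fraction of a 780, compute-wise
```

Under those assumptions the 960 lands at roughly four-fifths of a 780's compute, close enough that bandwidth, cache, and compression differences decide the rest.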

Oh, and part of the Gameworks agreement is that the developer needs NVIDIA's consent to work with AMD, and consent on what the dev can communicate to AMD (the Gameworks black box). This consent is often given late, too close to launch, forcing AMD to miss launch driver optimizations and end up looking bad all over the media. That image stays with consumers, hurting AMD graphics sales and the Radeon brand as a whole. AMD often fixes problems after launch, but the damage is already done.

Someone else already communicated this and it is the truth.
Edited by Mahigan - 2/16/16 at 6:06pm
post #415 of 799
@mcg75:

As you guessed, when he said 780 Ti he likely meant the 780. If so there are two more examples I can remember off the top of my head:

Call of Duty: Advanced Warfare


Far Cry 4


Neither of them has the 960 "stomping" the 780, but it does come dangerously close (within 10%), when usually the 780 is a good 20-25% ahead of the 960.
post #416 of 799
Quote:
Originally Posted by Yvese View Post

Instead of being part of the problem, you should be trying to solve it by recommending cards based on what's actually better, not the name on the box.

The 390 is better than the 970. If more people recommended it over the 970, we wouldn't have a bunch of people complaining that their cards stutter on max settings because they only have 3.5GB of VRAM.
Part of the problem?
What problem exactly?

I really don't find this too hard to understand.
For myself, I do extensive research and see what is actually best. If troubles come from going a new route, I take them with it. At the time of purchase, this card was the best for my liking.

For other people, I tend to recommend the brands I use myself and have a good opinion of. I am not going to throw somebody into a jungle of god-knows-what when that person knows less about this whole subject than I do; otherwise they wouldn't need my recommendation in the first place. Imagine if I recommended someone a different brand and that person had nothing but trouble, then what? I'm not going to risk someone else's hardware for that.

If that makes me "part of the problem" so be it.
post #417 of 799
Quote:
Originally Posted by Mahigan View Post

Tessellation isn't that much better going from Kepler to Maxwell:



What is optimized is shader occupancy and efficiency. Each SMM in Maxwell is far more powerful than each SMX in Kepler. Maxwell v2 contains the fp32 compute optimizations that 20nm Maxwell was supposed to have (now split into two SKUs: Maxwell v2 on 28nm and Pascal on 16nm FinFET+). These result in 35% more fp32 performance per core. So while a GTX 780 has a theoretical performance of 4 TFLOPs, a Maxwell-based GPU only needs around 2.6 TFLOPs to match it compute-wise. A GTX 960 pulls around 2.4 TFLOPs. Pretty close.

Add the various compression algorithms found on Maxwell v2 and you have the technology to make a low-to-mid-range 900 series card compete with a high-end 780 series card under certain conditions.

I think that tessellation culling (saving cache and bandwidth), coupled with the fp32 performance optimizations, explains the differences between Kepler, which lacks these, and Maxwell v2.

Mahigan, your understanding of how these things work puts the vast majority of us to shame. Thank you for the insight.

Big Maxwell vs. big Kepler still shows about a 40% improvement in that test, though. Like anything these companies claim, the end result ends up being less than what marketing states. 40% certainly isn't the 3x claimed.
Quote:
Originally Posted by Mahigan View Post

Oh and part of the gameworks agreement is that the developer needs the consent of nvidia to work with AMD and the consent on what can be communicated by the dev to AMD (gameworks black box). This consent is often given late or too close to launch. Forcing AMD to miss the launch driver optimizations and end up looking bad all over the media. This image stays with consumers affecting AMD graphics sales and the Radeon brand as a whole. AMD often fix problems after the launch though but the damage is already done.

Someone else already communicated this and it is the truth.

This is for actual Gameworks-labeled games, correct?

For a title such as GTA V, AMD worked with the developer despite the presence of Gameworks features. But it wasn't a Gameworks game. Same goes for Fallout 4.

Is that why titles that have Gameworks features but don't fall under the Gameworks banner seem to allow AMD to optimize quicker?
post #418 of 799
Quote:
Originally Posted by mcg75 View Post

Mahigan, your understanding of how these things work puts the vast majority of us to shame. Thank you for the insight.

Big Maxwell vs. big Kepler still shows about a 40% improvement in that test, though. Like anything these companies claim, the end result ends up being less than what marketing states. 40% certainly isn't the 3x claimed.
This is for actual Gameworks-labeled games, correct?

For a title such as GTA V, AMD worked with the developer despite the presence of Gameworks features. But it wasn't a Gameworks game. Same goes for Fallout 4.

Is that why titles that have Gameworks features but don't fall under the Gameworks banner seem to allow AMD to optimize quicker?

NVIDIA marketing tends to multiply the various architectural speedups together.
So if, say, memory performance has increased 3-fold and fp64 2-fold, they will claim the new architecture is 6 times faster.

It makes no sense but that's marketing.
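A minimal sketch of why that multiplication makes no sense: each subsystem speedup only helps the fraction of the workload bound by that subsystem. The workload fractions below are illustrative assumptions, not measurements:

```python
# The "multiply the speedups" marketing arithmetic, next to a weighted
# (Amdahl-style) model where each speedup only applies to its own share
# of the workload. Fractions are illustrative assumptions.

def marketing_speedup(*factors):
    """Naive claim: multiply every subsystem speedup together."""
    out = 1.0
    for f in factors:
        out *= f
    return out

def amdahl_speedup(fractions_and_factors):
    """Each workload fraction is sped up only by its own subsystem factor."""
    return 1.0 / sum(frac / factor for frac, factor in fractions_and_factors)

print(marketing_speedup(3.0, 2.0))  # the 6x slide number
# If 40% of frame time is memory-bound (3x faster), 20% fp64-bound (2x faster),
# and 40% is untouched, the real gain is far more modest.
print(round(amdahl_speedup([(0.4, 3.0), (0.2, 2.0), (0.4, 1.0)]), 2))
```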

As for your comments on Gameworks, that makes sense, though neither AMD, developers, nor NVIDIA have commented on that aspect. The mud-slinging has been focused on the Gameworks banner, probably because those titles run NVIDIA IP from start to finish. Other titles with Gameworks effects, but not under the banner, have the dev's own in-house code spiced up with Gameworks effects; AMD and the dev can work on optimizing that in-house code without explicit permission from NVIDIA.
Edited by Mahigan - 2/16/16 at 6:38pm
post #419 of 799
Quote:
Originally Posted by Mahigan View Post

Tessellation isn't that much better going from Kepler to Maxwell:



What is optimized is shader occupancy and efficiency. Each SMM in Maxwell is far more powerful than each SMX in Kepler. Maxwell v2 contains the fp32 compute optimizations that 20nm Maxwell was supposed to have (now split into two SKUs: Maxwell v2 on 28nm and Pascal on 16nm FinFET+). These result in 35% more fp32 performance per core. So while a GTX 780 has a theoretical performance of 4 TFLOPs, a Maxwell-based GPU only needs around 2.6 TFLOPs to match it compute-wise. A GTX 960 pulls around 2.4 TFLOPs. Pretty close. How?
(Hint: Pascal will push this further and change the way fp64 CUDA cores are shared thus boosting fp64 performance).
Add the various compression algorithms found on Maxwell v2 and you have the technology to make a low-to-mid-range 900 series card compete with a high-end 780 series card under certain conditions.

GeForce GTX 780
Pixel fill rate: 43 GP/s
Texel fill rate: 173 GT/s
Memory bandwidth: 288 GB/s

GeForce GTX 960
Pixel fill rate: 38 GP/s
Texel fill rate: 75 GT/s
Memory bandwidth: 112 GB/s

Memory bandwidth?
I think that tessellation culling (saving cache and bandwidth), coupled with the fp32 performance optimizations, the larger 2MB L2 cache, and the pixel fill rate plus pixel compression algorithms, explains the differences between Kepler, which lacks these, and Maxwell v2.

Oh, and part of the Gameworks agreement is that the developer needs NVIDIA's consent to work with AMD, and consent on what the dev can communicate to AMD (the Gameworks black box). This consent is often given late, too close to launch, forcing AMD to miss launch driver optimizations and end up looking bad all over the media. That image stays with consumers, hurting AMD graphics sales and the Radeon brand as a whole. AMD often fixes problems after launch, but the damage is already done.

Someone else already communicated this and it is the truth.

Complete, stunning, eye-opening explanation. I don't know anyone who could have said it better.
post #420 of 799
Quote:
Originally Posted by Dargonplay View Post




A GTX 970 beating the Titan by almost 50%, that's disgusting, let alone the fact that it's also beating the 295X2 for some reason, and the GTX 960 is 3 frames away from the Titan.

To be fair, reviewers always bench at stock clocks, which are very low on the Titan. Mine will do 1320 MHz, which would more than make up that difference (though it would still lose to a max-OC 970). No question Kepler driver optimizations are nonexistent at this point, which is pretty lame...