Overclock.net › Forums › Industry News › Video Game News › [Anand] Fable Legends DX12 Benchmark Analysis

[Anand] Fable Legends DX12 Benchmark Analysis - Page 33  

post #321 of 443
Quote:
Originally Posted by Devnant

hahahaha what?

I'm just saying that some games might heavily tax the CUs on consoles, but not the CUs on modern PC GPUs at all. All thanks to poor ports, lazy devs, console parity, etc.
How would a console game be affected if it's programmed to run on a console?
post #322 of 443
Quote:
Originally Posted by Mahigan

Well, the R9 390 is already a better buy than the GTX 970 under DX11 titles. The R9 390X is close to the GTX 980 in DX11 titles and, so far, all indications show that it will surpass the GTX 980 under DX12 titles. Several DX12 titles are releasing soon. We have Fable Legends by year's end and 3-4 more titles around Jan/Feb/March of 2016. All of these titles are AMD partnered (Hitman, NFS, Deus Ex, Tomb Raider etc).

Therefore, if I were in the market to buy a card right now, I would likely be looking at the R9 290 series or 390 series, unless I wanted to spend more, in which case it would be either the Fury (non-X) or the GTX 980 Ti. These appear to be the logical buys at the aforementioned price points.

New cards are arriving around June/July of 2016 (Pascal) as well as August/September (Arctic Islands).

EDIT: Since no card, with more than 4GB VRAM, can play at 4K with decent framerates... then I see no point in worrying about the Framebuffer sizing over 4GB at this point in time. For Pascal or Arctic Islands, then I'd be looking at larger than 4GB framebuffer being valid. If people are looking to pair two cards then with DX12 and shared memory pooling, any 4GB cards will do for 4K (as they will in effect represent a graphics card array of 8GB).
It is still too early to say what you should buy now. Who knows what Nvidia might pull off in the coming months. All this is speculation, and I really wouldn't put hundreds of whatever currency you use on the line over it.
post #323 of 443
Quote:
Originally Posted by Assirra

It is still too early to say what you should buy now. Who knows what Nvidia might pull off in the coming months. All this is speculation, and I really wouldn't put hundreds of whatever currency you use on the line over it.
Where are the improvements in Nvidia's driver for Ashes of the Singularity? Do you have any update?
post #324 of 443
Quote:
Originally Posted by PontiacGTX

Where are the improvements in Nvidia's driver for Ashes of the Singularity? Do you have any update?
They are probably working on it, but compared to a certain other GPU maker they don't yell "don't buy from x, wait for our stuff, it will be better!" every other week.
Let's at least wait until a couple of games come out, along with the drivers intended for them.
post #325 of 443
Quote:
Originally Posted by PontiacGTX

How would a console game be affected if it's programmed to run on a console?

Say a DX12 game takes full advantage of async compute capabilities and of all 12 CUs an XBO has, but when porting to PC the devs figure there is hardly any benefit to be gained from taxing more than 20 CUs, or they are so lazy that they won't even take advantage of more than 12 CUs even if a GPU has 32. Correct me if I'm wrong, but I still think console parity will hurt DX12 multiplatform games.
Edited by Devnant - 9/26/15 at 8:30am
post #326 of 443
Quote:
Originally Posted by gamervivek

IIRC zlatan has posted earlier on the AT forums as well about how GCN can do conservative rasterization and ROVs by emulating them. If so, AMD would've been shouting it from the rooftops, so I don't find this quip of his credible either. Though I might be confusing him with somebody else.

As for Fiji bottleneck, unlike 980Ti it isn't a scaled up version so the drivers probably have some more work to be done. The ROPs bottleneck would've meant quite a poor showing at 4k.
hahahaha

ROVs (rasterizer ordered views) aid with multiple transparencies (OIT, order-independent transparency), according to MS.
Quote:
This enables Order Independent Transparency (OIT) algorithms to work, which give much better rendering results when multiple transparent objects are in line with each other in a view.

https://msdn.microsoft.com/en-us/library/windows/desktop/dn914601(v=vs.85).aspx

i.e. ordering transparent objects correctly in 3D space. AMD already had this capability with the 5000 series cards

http://developer.amd.com/resources/documentation-articles/gpu-demos/ati-radeon-hd-5000-series-graphics-real-time-demos/
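For anyone wondering why the ordering matters at all, here is a rough Python sketch. It is not taken from the MS or AMD material linked above; the fragment values and function names are made up for illustration, and ROVs themselves are not modeled. It only shows that ordinary alpha blending gives different results depending on draw order, and that sorting the fragments per pixel by depth removes that dependence.

```python
# Hypothetical illustration: the "over" alpha-blend operator is order-dependent.

def blend_over(dst, src_color, src_alpha):
    """Blend one translucent fragment over the current pixel color."""
    return tuple(src_alpha * s + (1.0 - src_alpha) * d
                 for s, d in zip(src_color, dst))

def shade_pixel(background, fragments):
    """Blend fragments in the order given, the way plain draw order would."""
    color = background
    for _depth, rgb, alpha in fragments:
        color = blend_over(color, rgb, alpha)
    return color

def shade_pixel_sorted(background, fragments):
    """Sort the pixel's fragments back-to-front first, then blend."""
    ordered = sorted(fragments, key=lambda f: f[0], reverse=True)  # farthest first
    return shade_pixel(background, ordered)

background = (0.0, 0.0, 0.0)
red_near = (1.0, (1.0, 0.0, 0.0), 0.5)   # (depth, rgb, alpha), made-up values
blue_far = (2.0, (0.0, 0.0, 1.0), 0.5)

draw_order_a = shade_pixel(background, [red_near, blue_far])
draw_order_b = shade_pixel(background, [blue_far, red_near])
sorted_a = shade_pixel_sorted(background, [red_near, blue_far])
sorted_b = shade_pixel_sorted(background, [blue_far, red_near])

print(draw_order_a, draw_order_b)  # differ: result depends on submission order
print(sorted_a, sorted_b)          # identical: depth sort fixes the order
```

The sorted version is only one way to get an order-independent result; hardware and software OIT schemes differ in how they get there.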

Conservative rasterization can be done in software and seems to be a performance reducing feature anyway.

The effects can be done in other ways or emulated. They aren't THAT important right now. All AMD has said is that the effects enabled by the 12.1 features can be done well in other ways. Zlatan is not wrong.
Edited by semitope - 9/26/15 at 8:30am
post #327 of 443
Quote:
Originally Posted by ku4eto

The 390X overclocks around 15% on core and 10% on memory. The 980 Ti does ~25% on core and ~15% on memory. I wouldn't say that this is an easy win, for the memory at least. On core, yes, 10% seems like a win.


Remember though, Nvidia's GPUs once you reach a point don't seem to scale linearly. They seem to be bottlenecked somewhere. Mahigan thinks it's the VRAM. Could be. We'll know soon enough (when Pascal comes out with HBM2). I've suggested elsewhere that it could be the graphics drivers - this will end with DX12.

Quote:
Originally Posted by Mahigan

I think that Arctic Islands did get the desired revamp. Since AMD feel confident in claiming it is Post-GCN or GCN Next, this would indicate that, unlike GCN1.0/GCN1.1/GCN1.2, Arctic Islands will in fact be based on a more efficient design. I'm not sure why, but I have this feeling that we won't necessarily see a large increase in ALUs; rather, I see an increase in the IPC throughput of several elements in both the compute and graphics pipelines.

I see AMD sticking with 8 ACEs or maybe even dropping down to 4 ACEs (staying at 8 ACEs if there is an increase in ALUs and dropping down to 4 if there is little to no increase in ALUs but rather an increase in computational performance per ALU). The current ACE organization is not truly being tapped yet, therefore investing in this department probably wouldn't lead to much, if any, gains in the immediate term.

One way of achieving higher performance, all around, would be to increase the memory pools on GCN. Increasing the size and speed of the LDS and GDS pools would have a rather large impact on both triangle and computational throughput under parallel workloads (unless AMD engineers re-worked HyperZ and improved Z-Culling, in which case they would need less of an investment in the on-die cache).

More ROPs makes sense but more RBEs as well (or improved Z-Culling, as mentioned before, like a new version of HyperZ which could help to remove unnecessary pixels from being rendered, thus boosting Tessellation performance by saving on memory and compute resources), perhaps an increase in TMU efficiency (particularly as it pertains to int16 performance by moving to 64-bit/FP16 @ 4 Texels/clk, which newer games are making use of). 16nm FinFET should allow for Greenland to achieve higher clocks, which would help in terms of improving the speed of the on-die cache.

AMD don't need to make drastic changes to GCN in order to have an architecture which fits the mold of what many devs will utilize because of their design wins in the console markets.

If AMD simply add:
  1. Improved Z-Culling (New HyperZ)
  2. 64-bit/FP16 @ 4 Texels/clk
  3. Boost RBEs and ROPs
  4. Improved Cache (LDS/GDS)

They'd go a long way in rectifying some of the shortcomings in their front end graphics pipeline performance. That being said, they do need to beef up the front end, and that may mean cutting down on the die space being used for the computational units. Perhaps 16nm FinFET will allow AMD's engineers to retain the computational advantage and beef up the front end as well, but we'll see, I suppose. Unless AMD continues with the forward-thinking approach, which in terms of sales is quite foolish even though it does lend itself to a higher return on investment from a consumer's perspective, and which could mean a more powerful compute pipeline with the front end getting little to no revamp.

The front end no doubt needs an update:


I would agree that this should address the issues, regardless of whether or not the bottleneck is the ROPs.

  • Do you think it'd be worth splitting into more CEs? Right now they've got 4 CEs, with 1024 SP, 16 ROPs, and 4 RBE groups per CE. Perhaps with more CEs that could be addressed.
  • HBM2 I think will actually benefit AMD more right now, because they are less efficient with their color compression. Either way, the bandwidth should not be a limit for either. Greenland needs to ship with at least 8GB of VRAM and perhaps 16GB would be ideal, if not overkill. I guess the key is to have the point where you have enough VRAM so that you don't run out of VRAM before you run out of Core, but too much decreases performance too.
  • Compute for the professional cards (like FireGL and Quadro/Tesla) is of course different. We'll see a lot more VRAM in those and it will be ECC HBM2. We'll also likely see FP64 performance on both Nvidia and AMD GPUs pushed back up. Both Maxwell and Fiji gimped their DP performance.

Personally I think splitting into 8 or even 10 CEs with a higher ROP and RBE to SP ratio should address the issue: perhaps 512 SP per CE, while keeping 16 ROPs and 4 RBE groups per CE. Plus, with a more expanded front end, the capacity is 2 - 2.5x as good, and that's not taking into account the new HyperZ.

End result:
  • 10 CEs with 5120 SP, 160 ROPs, 40 RBE groups
  • HBM2, so 1024 GB/s of memory bandwidth, at least 8GB of HBM2 and perhaps 16GB
  • Front end, most importantly, has 2.5x the triangle/tessellation performance x whatever improvement HyperZ has (let's say it doubles it), so in that case 2.5 x 2 = 5x as powerful a triangle output. If it's not double, then it's 2.5 x the HyperZ improvement.
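The arithmetic behind that end result can be written out as a quick Python sketch. The class and variable names are made up for illustration. The 4 RBE groups per CE come from the current layout described above, and the idea that front-end throughput scales linearly with the number of CEs is the post's own assumption, not a measured fact. The HyperZ factor of 2 is likewise just the "let's say it doubles it" guess.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GpuLayout:
    """Hypothetical per-CE layout; 'CE' follows the post's terminology."""
    engines: int
    sp_per_engine: int
    rops_per_engine: int
    rbe_groups_per_engine: int

    @property
    def total_sp(self) -> int:
        return self.engines * self.sp_per_engine

    @property
    def total_rops(self) -> int:
        return self.engines * self.rops_per_engine

    @property
    def total_rbe_groups(self) -> int:
        return self.engines * self.rbe_groups_per_engine

current = GpuLayout(engines=4, sp_per_engine=1024,
                    rops_per_engine=16, rbe_groups_per_engine=4)
proposed = GpuLayout(engines=10, sp_per_engine=512,
                     rops_per_engine=16, rbe_groups_per_engine=4)

# Assumption from the post: one front end per CE, scaling linearly.
front_end_scale = proposed.engines / current.engines
hyperz_gain = 2.0  # pure guess ("let's say it doubles it")
triangle_scale = front_end_scale * hyperz_gain

print(proposed.total_sp, proposed.total_rops, proposed.total_rbe_groups)
print(front_end_scale, triangle_scale)
```

Swapping in a different `hyperz_gain` gives the "2.5 x HyperZ improvement" case directly.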

You don't think the ACEs are a bottleneck? I had been advocating more for a "hyper parallel" sort of GPU - perhaps as many as 16 or even 20 (with the 10 CE configuration) combined with a vastly improved cache. Can you go into details on the cache ideas?

The question is, how parallel can a GPU get before we reach the limits? Or is the workload something close to embarrassingly parallel?

Regardless of the method, I think though that by far the most urgent thing AMD has to do is to get the front end of that GPU vastly upgraded. We are in agreement here.
Quote:
Originally Posted by Blameless

NVIDIA has a greater marketshare than AMD, but there is almost certainly more GCN hardware in circulation than there is Maxwell 2 hardware in circulation.

I would also be astounded if Pascal wasn't vastly better at handling async compute than Maxwell 2.

Maxwell 2 is going to be the outlier product, not the status quo.

I still think it's an ROP limitation. All the memory bandwidth and color compression in the world can't change final technical limits on pixel fill rate.

My Hawaii parts, back when I was mining on them, saw about a 50% reduction in frame rate by using custom firmware that substantially improved memory performance, but cut active ROPs (and only ROPs) by half...implying that even Hawaii was close to an ROP bottleneck.

Fiji improved fill rate by 5%, but shader horsepower by ~45%. Even with its superior compression and loads of bandwidth, everything is pointing to a bottleneck in this area.
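The 5% and ~45% figures can be reproduced with a back-of-the-envelope Python sketch. The numbers below are assumptions based on reference specs as commonly listed (64 ROPs on both chips, 2816 vs 4096 SPs, 1000 MHz vs 1050 MHz), and theoretical pixel fill rate is taken as simply ROPs x clock. None of this is measured data.

```python
def pixel_fill_rate_gpix(rops: int, clock_mhz: float) -> float:
    """Theoretical pixel fill rate in Gpixels/s, assuming 1 pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0

# Assumed reference specs (not from the post): R9 290X and Fury X.
hawaii = {"rops": 64, "sps": 2816, "clock_mhz": 1000.0}
fiji = {"rops": 64, "sps": 4096, "clock_mhz": 1050.0}

hawaii_fill = pixel_fill_rate_gpix(hawaii["rops"], hawaii["clock_mhz"])
fiji_fill = pixel_fill_rate_gpix(fiji["rops"], fiji["clock_mhz"])

# Same ROP count, so the only fill-rate gain comes from the clock bump.
fill_gain = fiji_fill / hawaii_fill - 1.0

# Shader gain at equal clocks is just the ratio of SP counts.
shader_gain_same_clock = fiji["sps"] / hawaii["sps"] - 1.0

print(f"fill rate gain: {fill_gain:.1%}")
print(f"shader gain at equal clocks: {shader_gain_same_clock:.1%}")
```

With these assumed inputs the fill rate moves by about 5% while the shader count moves by about 45%, which is the imbalance being described.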

On the note of Nvidia, I think we will see gains with Pascal (it is also very Compute oriented after all), but a truly "parallel" GPU may actually have to wait until Volta. I think that they may have taped out Pascal before realizing the true extent of AMD's intentions.

On the note of AMD, they've got to address that bottleneck. Well, pulling up the hard specs again (from TechReport):


From what we can see:
  • Front end of Fury X was very similar to 290X
  • Rasterization seems unlikely to be the bottleneck (Fury X can actually support more draw calls than a 980Ti)
  • That leaves the triangles on the front end, or perhaps as you've noted the ROPs

The fill rate and the triangle output did not improve from Hawaii to Fiji. It's gotta be one of those two.

Another possibility is that it is both the ROPs and the triangles at different areas. So we might both be right.

I guess the way to describe this would be to imagine a factory with different steps in a manufacturing process. First the raw materials come in.
  1. Step 1 does 2000 units/day.
  2. Step 2 does 1000 units/day.
  3. Step 3 does 3000 units/day.
  4. Step 4 does 2500 units/day.
  5. Step 5 does 1000 units/day.
  6. Step 6 does 1500 units/day.
  7. Step 7 does 2700 units/day.

Out goes a finished product.

Steps 2 and 5 are the bottlenecks (so in this analogy, that would be the triangle output and the ROPs). I guess you could argue that by adding more shaders, AMD has done the equivalent of upgrading step 3.

Not a perfect analogy, but I think you get what I am trying to say here.
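To put the factory analogy in code form, here is a small Python sketch using the seven step capacities listed above. The function names are made up, and treating a pipeline as "output equals the slowest step" is a simplification that ignores buffering and overlap.

```python
# Step capacities from the factory analogy, in units/day.
steps = {1: 2000, 2: 1000, 3: 3000, 4: 2500, 5: 1000, 6: 1500, 7: 2700}

def throughput(capacities: dict) -> int:
    """A serial pipeline can only output as much as its slowest step."""
    return min(capacities.values())

def bottlenecks(capacities: dict) -> list:
    """Return every step that sits at the minimum capacity."""
    slowest = throughput(capacities)
    return sorted(step for step, cap in capacities.items() if cap == slowest)

print(throughput(steps), bottlenecks(steps))

# Upgrading step 3 (the "more shaders" move) changes nothing.
more_shaders = {**steps, 3: 6000}
print(throughput(more_shaders))

# Fixing only one of the two bottlenecks also changes nothing.
fix_step_2 = {**steps, 2: 2000}
print(throughput(fix_step_2), bottlenecks(fix_step_2))

# Fixing both moves the limit to the next-slowest step.
fix_both = {**steps, 2: 2000, 5: 2000}
print(throughput(fix_both), bottlenecks(fix_both))
```

The last case is why a "balanced" design matters: once steps 2 and 5 are fixed, step 6 becomes the new limit.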

Regardless, I believe that adding more CEs with fewer shaders/clusters per CE should address this. If what you are saying is true, then even 704 SP per 16 ROPs may be too many. In that case the optimal ratio (by optimal, to use the factory analogy, I mean all steps having about the same maximum capacity) may be much lower, perhaps ~512 SP per 16 ROPs. I'm making an educated guess here; if you have any better ideas, I would love for you to share them.
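For comparison, the three SP-to-ROP ratios mentioned in this post can be lined up in a few lines of Python. The labels are mine, the 704 and 1024 figures are the per-16-ROP numbers quoted above for Hawaii and Fiji, and 512 is only the educated guess.

```python
# SPs per group of 16 ROPs, as quoted in the post.
sp_per_16_rops = {"Hawaii": 704, "Fiji": 1024, "proposed": 512}

# Shaders that each ROP has to serve.
sp_per_rop = {name: sp / 16 for name, sp in sp_per_16_rops.items()}

# Ratio relative to Hawaii, which was reportedly already near a ROP limit.
relative_to_hawaii = {name: ratio / sp_per_rop["Hawaii"]
                      for name, ratio in sp_per_rop.items()}

for name in sp_per_16_rops:
    print(f"{name}: {sp_per_rop[name]:.0f} SP per ROP, "
          f"{relative_to_hawaii[name]:.2f}x Hawaii")
```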

On the note of the tessellation and memory compression, it's a lesser issue and not the bottleneck per se, in the sense of limiting the frame rates, but it's a weak point that AMD should address, assuming it has the resources. That, and things like HairWorks will no longer work very well, assuming AMD can address these. Memory bandwidth is the least important matter, I think, because HBM2 will be doubling the amount of bandwidth on top of HBM and the 4GB VRAM bottleneck will be gone.

Anyways, I presume you've read Mahigan's post which I've quoted. The key is to build a "balanced GPU", which the Fury X is clearly not, and although we may disagree on what the causes may be, it's clear that it's being bottlenecked somewhere or we'd see a 45% increase compared to Hawaii (assuming the same core clock of course).



By balanced, I'm referring to something like each step in that factory being able to pull off say, 2000 units a day. You're only as good as the weakest link in that chain. I think we both agree on the same goal, we just at this point are in disagreement over where the bottlenecks are.
Edited by CrazyElf - 9/26/15 at 9:13am
post #328 of 443
Here, come to DirectX 12, they said,
100% support will be fun, they said...
-DirectX 12: "You have no power here!"
post #329 of 443
Quote:
Originally Posted by gamervivek

IIRC zlatan has posted earlier on the AT forums as well about how GCN can do conservative rasterization and ROVs by emulating them. If so, AMD would've been shouting it from the rooftops, so I don't find this quip of his credible either. Though I might be confusing him with somebody else.

As for Fiji bottleneck, unlike 980Ti it isn't a scaled up version so the drivers probably have some more work to be done. The ROPs bottleneck would've meant quite a poor showing at 4k.
hahahaha

You can do conservative rasterization with geometry shaders.
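For readers who haven't run into the term, here is a rough CPU-side Python sketch of what conservative rasterization changes. This is not the geometry shader technique itself (that is usually described as enlarging each triangle on the GPU before normal rasterization). It only illustrates the coverage rule: a pixel counts as covered if the triangle may touch any part of it, not just its center. The function names and the test triangle are made up, and the overestimating edge-offset test used here can flag a few extra pixels near sharp corners.

```python
def edge(ax, ay, bx, by, px, py):
    """Edge function: >= 0 when p lies on the inner side of edge a->b (CCW triangle)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered_pixels(tri, width, height, conservative=False):
    """List (i, j) pixels covered by a counter-clockwise triangle in pixel units."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    edges = [(x0, y0, x1, y1), (x1, y1, x2, y2), (x2, y2, x0, y0)]
    xs, ys = (x0, x1, x2), (y0, y1, y2)
    hits = []
    for j in range(height):
        for i in range(width):
            cx, cy = i + 0.5, j + 0.5  # pixel center
            if conservative and (i + 1 < min(xs) or i > max(xs)
                                 or j + 1 < min(ys) or j > max(ys)):
                continue  # pixel square misses the triangle's bounding box
            inside = True
            for ax, ay, bx, by in edges:
                e = edge(ax, ay, bx, by, cx, cy)
                if conservative:
                    # Evaluate at the pixel corner most favourable to this edge.
                    e += 0.5 * (abs(bx - ax) + abs(by - ay))
                if e < 0:
                    inside = False
                    break
            if inside:
                hits.append((i, j))
    return hits

# A sliver that sits inside pixel (1, 1) without touching its center.
sliver = [(1.1, 1.1), (1.4, 1.1), (1.1, 1.4)]
standard_hits = covered_pixels(sliver, 4, 4)
conservative_hits = covered_pixels(sliver, 4, 4, conservative=True)
print(standard_hits, conservative_hits)
```

Standard center sampling misses the sliver entirely, while the conservative rule reports the pixel it sits in.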

And increasing resolution doesn't shift more load onto ROPs, it increases the shader invocation rate just as much.

Learn a little before you start making stuff up please.

BTW, Zlatan is an engineer working on a PS4 VR project.
post #330 of 443
Quote:
Originally Posted by PontiacGTX

what about a rasterizer or RBE bottleneck?

RBEs are the same as ROPs so I don't think they are the bottleneck either. Rasterizers could be but you would expect that bottleneck to ease up at 4k.


Quote:
Originally Posted by Devnant

hahahaha what?

I'm just saying that some games might heavily tax the CUs on consoles, but not the CUs on modern PC GPUs at all. All thanks to poor ports, lazy devs, console parity, etc.

Well, the way you worded your previous comment was pretty amusing.
Quote:
Originally Posted by semitope

ROVs (rasterizer ordered views) aid with multiple transparencies (OIT, order-independent transparency), according to MS.
https://msdn.microsoft.com/en-us/library/windows/desktop/dn914601(v=vs.85).aspx

i.e. ordering transparent objects correctly in 3D space. AMD already had this capability with the 5000 series cards

http://developer.amd.com/resources/documentation-articles/gpu-demos/ati-radeon-hd-5000-series-graphics-real-time-demos/

Conservative rasterization can be done in software and seems to be a performance reducing feature anyway.

The effects can be done in other ways or emulated. They aren't THAT important right now. All AMD has said is that the effects enabled by the 12.1 features can be done well in other ways. Zlatan is not wrong.

They've said that? I doubt it. And I should've made it clearer, if it was worth doing them that way then AMD would have been shouting it from the rooftops and his posts meant to imply that.
    