Intro
The purpose of this thread is to further discuss the new information about Ryzen and CPU performance in games. As most OCNers know, when Ryzen came out, it was very good at workstation tasks, but a bit behind Kaby Lake (maybe 10-20%), clock for clock, in games at 1080p.
With faster RAM, the gap that Ryzen has is mostly gone.
This is a follow-up to my earlier thread:
http://www.overclock.net/t/1624566/theories-on-why-the-smt-hurts-the-performance-of-gaming-in-ryzen-and-some-recommendations-for-the-future/0_100
In my previous thread, there was a discussion about possible theories and what to do to mitigate these problems.
Ryzen loves memory
In that thread, I hypothesized in the OP that Zen might love memory because of its unique topology, the CCXs. DRAM is effectively used as a cache for communication between the 2 CCXs because of the way Infinity Fabric works: Infinity Fabric first checks the L3 cache of the other CCX and, if the information is not there, checks DRAM.
From eTeknix, we have:
http://www.eteknix.com/memory-speed-large-impact-ryzen-performance/
From the Finnish website IO Tech, we have:
https://www.io-tech.fi/artikkelit/amdn-uusi-zen-x86-arkkitehtuuri-clock-to-clock-suorituskyky/
There is also this - Elmor from HWBOT:
http://forum.hwbot.org/showpost.php?p=479666&postcount=22
With faster RAM, Ryzen clearly sees gains.
Potential Hypotheses as to Why
Hypothesis 1: Because there is no L4 cache, the CCXs use DRAM to communicate with each other, and they love the higher speeds. Unlike on Intel, the unique topology effectively makes DRAM the last level cache. If so, an L4 or even an eDRAM-like solution would see huge gains. Memory clocks have a disproportionate effect because they are the bottleneck. Keep in mind that right now on X370, you've got 8 cores being fed by dual-channel RAM, while on X99, you've got 6-10 cores being fed by quad-channel RAM. Even when the quad-channel RAM is clocked a bit slower, it still offers a lot more bandwidth because there are twice as many channels.
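To put rough numbers on the bandwidth point in Hypothesis 1, here is a minimal sketch of theoretical peak DRAM bandwidth. It assumes standard 64-bit (8-byte) DDR4 channels and uses the speeds from the ExtremeTech setup discussed in this thread (DDR4-3200 dual channel on Ryzen, DDR4-2666 quad channel on Broadwell E). The function name is my own, and these are paper peaks, not measured numbers.

```python
# Theoretical peak DRAM bandwidth, assuming 64-bit (8-byte) DDR4 channels.
# This is a paper calculation only; real-world throughput is lower.

def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s = transfers per second x bytes per transfer x channels."""
    return mt_per_s * bus_bytes * channels / 1000

CORES = 8  # both the 1800X and the 6900K are 8-core parts

ryzen_bw = peak_bandwidth_gbs(3200, channels=2)  # dual-channel DDR4-3200
bdwe_bw = peak_bandwidth_gbs(2666, channels=4)   # quad-channel DDR4-2666

for name, bw in (("Ryzen, 2ch DDR4-3200", ryzen_bw), ("Broadwell E, 4ch DDR4-2666", bdwe_bw)):
    print(f"{name}: {bw:.1f} GB/s total, {bw / CORES:.1f} GB/s per core")
```

On paper that works out to about 51 GB/s for the Ryzen setup against about 85 GB/s for Broadwell E, so the quad-channel platform keeps a clear bandwidth lead even with slower sticks.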
Hypothesis 2: The Infinity Fabric speed is tightly related to the memory clock. In my previous thread, I noted that Skylake has a separate core clock, uncore clock, and RAM clock, and that its base clock is decoupled from the PCIe functions. Ryzen does not have these separations. Overclocking the RAM may also be overclocking the Infinity Fabric, which itself is believed to be based on HyperTransport. (Thanks Looncraz).
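If Hypothesis 2 holds, the fabric clock would simply follow the memory clock. Here is a minimal sketch, assuming (this is not confirmed anywhere in this thread) that the fabric runs at the real DRAM clock, which for DDR memory is half the MT/s rating. The `ratio` parameter is hypothetical and only there in case the link turns out not to be 1:1.

```python
# Sketch of Hypothesis 2: fabric clock tied to the memory clock.
# ASSUMPTION: fabric clock = real DRAM clock x ratio, with ratio = 1.0 by default.

def memclk_mhz(ddr_rate_mts: float) -> float:
    """Real DRAM clock in MHz. DDR transfers twice per clock, so it is half the MT/s rating."""
    return ddr_rate_mts / 2

def fabric_clock_mhz(ddr_rate_mts: float, ratio: float = 1.0) -> float:
    """Hypothetical fabric clock for a given DDR4 speed."""
    return memclk_mhz(ddr_rate_mts) * ratio

baseline = fabric_clock_mhz(2133)
for rate in (2133, 2666, 3200, 3466):
    clk = fabric_clock_mhz(rate)
    gain = (clk / baseline - 1) * 100
    print(f"DDR4-{rate}: fabric at {clk:.0f} MHz ({gain:+.0f}% vs DDR4-2133)")
```

Under that assumption, going from DDR4-2133 to DDR4-3200 would also speed up the fabric by about 50%, which would explain why memory clocks matter so much more than usual.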
There is one reason to believe Hypothesis 1: the Geekbench results show that secondary timings do matter at 3466. We will know more in a month or two, once boards allow RAM clocks and timings to be altered.
Of course, to test these 2 hypotheses we need the RAM settings to be unlocked and more people with boards.
What we need to see
We also need to test how far this can scale:
- What is the optimal point between timings and clocks? As you go faster, the timings get looser, but it seems Ryzen still benefits. For example, on X99 with Haswell E, 2666 MHz with tight timings was often faster than 3200 MHz with loose timings. With Ryzen, faster speeds with loose timings might win if Hypothesis 2 is true.
- How fast can Ryzen's memory controller go in raw clocks? Top G.Skill kits today come in 4266 @ 19-19-19-39 for Z270.
- Are there diminishing returns at some point?
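The timings versus clocks trade-off above can be sketched with the usual first-word latency formula: CAS latency in cycles times the clock period. The 4266 CL19 kit is the G.Skill kit mentioned in the list; the 2666 CL13 and 3200 CL16 entries are hypothetical examples of "tight" and "loose" kits that I picked for illustration, not kits anyone tested here.

```python
# First-word (CAS) latency in nanoseconds.
# One DRAM clock lasts 2000 / (MT/s) ns, because DDR transfers twice per clock.

def cas_latency_ns(ddr_rate_mts: int, cl: int) -> float:
    return cl * 2000 / ddr_rate_mts

kits = [
    (2666, 13),  # hypothetical "slow but tight" kit
    (3200, 16),  # hypothetical "fast but loose" kit
    (4266, 19),  # the G.Skill Z270 kit mentioned above
]

for rate, cl in kits:
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")
```

With these example numbers the 2666 kit has slightly lower latency than the 3200 kit, which is the Haswell E situation described above. If Ryzen still prefers the 3200 kit, that points to raw clock (and possibly the fabric) mattering more than latency.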
AIDA64, for example, has not been updated for Ryzen yet.
We'll need to test how far Zen can go with fast RAM and tight timings.
Are these typical of Ryzen?
ExtremeTech recently published a review of the GTX 1080 Ti. They used 3200 MT/s RAM on Ryzen and benchmarked it against a 6900K.
https://www.extremetech.com/gaming/245604-review-gtx-1080-ti-first-real-4k-gpu-drives-better-amd-intel
From the review, we can conclude that the following games scale with RAM.
- Witcher 3 (from above)
- ARMA 3 (from above)
- Company of Heroes
- Metro Last Light Redux - AMD 1800X is actually faster than Intel 6900K at 4k
- Ashes of the Singularity
- Shadow of Mordor - AMD 1800X is actually faster than Intel 6900K at 4k
- Rise of the Tomb Raider - AMD 1800X is actually faster than Intel 6900K at 4k
- DIRT Rally - AMD 1800X is actually faster than Intel 6900K at 4k
Interestingly, Hitman shows AMD's 1800X lagging the 6900K. There is also a gap in Civ VI. Civ VI is a notable exception because it appears to be CPU bottlenecked, so it may be that turning on SMT would actually help. Further testing is required.
Averaged out, Broadwell E was just 1% faster than the 1800X at 4k with a 1080Ti.
Keep in mind ET's testing methods, of course: they used slower 2666 MHz RAM for Broadwell E, although it is in quad channel, so I doubt there would be a RAM bandwidth bottleneck. I do not believe that any CPUs were OC'ed. There were a few games where Ryzen was faster than Broadwell E!
Broadwell E of course has a bit of OC headroom (4.2 to 4.4 GHz), but keep in mind that in testing, ET kept Simultaneous Multithreading (SMT) off. It will be very interesting to see how the 4-core Zen scales with RAM, as it has only 1 CCX and thus would not suffer from any inter-CCX bottleneck. It may very well give the 7700K a fair fight, even if the 7700K is a faster CPU.
I'd say though that many, if not most games love faster RAM with Ryzen. With fast RAM, the gaming gap basically goes away for most games.
What we need to see
Clock for clock review of Ryzen vs Broadwell E, with Ryzen at top RAM speeds and Broadwell E overclocked with best possible RAM speeds, with Ryzen's SMT off.
I don't expect it will change much. Even though Broadwell E has more OC headroom than Zen (base of 3.2 GHz, turbo of 3.7 GHz, single core turbo of 4.0 GHz, with overclocks in the 4.2 to 4.4 GHz range, against an 1800X at 3.6 GHz with overclocks in the 3.9 - 4.1 GHz range), most of Intel's advantage should be recouped with AMD's SMT disabled. It will however be interesting to see if we can find a CPU limited game that uses all 16 threads on both CPUs to see who is the winner. Basically, with faster RAM, AMD should not come out behind, and may even come out on top in some titles.
Since the GPU, not the CPU, will be the bottleneck, I don't expect much variation between titles, which, considering the lower cost of Zen, is a win for AMD.
AMD Binning
This is from Silicon Lottery:
https://siliconlottery.com/collections/all
Note that this is as of March 2017. Processes tend to mature with time.
Ryzen 7 1700
- It's presumed all CPUs go to 3.7 GHz (Silicon Lottery please correct me if I am wrong about this assumption)
- 93% can do 3.8GHz @ 1.376V
- 70% can do 3.9GHz @ 1.408V
- 20% can do 4.0GHz @ 1.440V
Ryzen 7 1700X
- It's presumed all CPUs go to 3.8 GHz
- 77% can do 3.9GHz @ 1.392V
- 33% can do 4.0GHz @ 1.424V
Ryzen 7 1800X
- It's presumed all CPUs go to 3.8 GHz
- 97% can do 3.9GHz @ 1.376V
- 67% can do 4.0GHz @ 1.408V
- 20% can do 4.1GHz @ 1.440V
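To compare the three SKUs with one number each, here is a rough sketch that turns the Silicon Lottery percentages above into an expected maximum overclock. It assumes the percentages are cumulative (a chip that passes 4.0 GHz also passes 3.9 GHz), treats the "presumed all" entries as 100%, and assumes no chip goes past the highest listed bin. It also ignores the fact that each bin was tested at a different voltage. The names `BINS` and `expected_max_clock` are mine.

```python
# Share of chips reaching each clock (GHz), from the Silicon Lottery figures above.
# ASSUMPTION: the "presumed all" entries are treated as 1.00.
BINS = {
    "1700":  {3.7: 1.00, 3.8: 0.93, 3.9: 0.70, 4.0: 0.20},
    "1700X": {3.8: 1.00, 3.9: 0.77, 4.0: 0.33},
    "1800X": {3.8: 1.00, 3.9: 0.97, 4.0: 0.67, 4.1: 0.20},
}

def expected_max_clock(bins: dict) -> float:
    """Expected top bin in GHz, where P(top bin == c) = P(reach c) - P(reach next step)."""
    clocks = sorted(bins)
    expected = 0.0
    for i, clock in enumerate(clocks):
        reach_next = bins[clocks[i + 1]] if i + 1 < len(clocks) else 0.0
        expected += clock * (bins[clock] - reach_next)
    return expected

for sku, bins in BINS.items():
    print(f"Ryzen 7 {sku}: expected max around {expected_max_clock(bins):.2f} GHz")
```

Under those assumptions the averages come out to roughly 3.88 GHz for the 1700, 3.91 GHz for the 1700X, and 3.98 GHz for the 1800X, so the better bins buy you about 100 MHz on average.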
Keep in mind that Silicon Lottery uses RealBench as their testing platform. That is "game stable", but not "server grade stable". They do not filter by IMC (memory controller) quality either, and that is very important now on Zen.
There is very clearly a binning process going on with AMD and the higher numbers clearly are better binned.
Does it matter? Well, keep in mind that if you play games that are CPU intensive, then it is well worth it.
Examples:
- Battlefield series
- Cities: Skylines
- Total War series
- Many flight simulators are CPU limited
- Starcraft 2 is single thread limited, as are many other strategy games
- Unsurprisingly, Civ 6 is faster on Intel right now
You should take into account what you play. If you play strategy or simulation or the Battlefield series, you will frequently be CPU bottlenecked.
The Hitman game was also faster with Intel rather than Ryzen.
Is there anything else I should know?
This is not about Ryzen, but considering that the Vega GPUs are using Infinity Fabric, this may affect how we overclock them. Worst case, the Infinity Fabric might be a bottleneck, in which case we may be limited by either the fabric or the VRAM overclocks. HBM1 had about 10-15% of overclocking headroom on the Fury X, although it was driver locked. Why is this an issue? It may be that overclocking the Infinity Fabric involves overclocking the HBM2 VRAM, much like overclocking the memory might be doing on Ryzen. We have no way of knowing at this time what the overclocking headroom of HBM2 will be either.
It will likely be that the Infinity Fabric is what the NCUs (Next Generation Compute Units) use to communicate, so this is fairly important. Of course without knowing the speed of the Fabric, nor any other details, we have no way of knowing if it is even a bottleneck at all.
However, one problem with the Fury X was that the use of HDLs (high density libraries) limited its overclocking headroom. I expect that AMD will use HDLs again on Vega, but have no further information.
Without further details, there is no way to know.
I just want the maximum gaming performance!
Ok first thing to do is to wait. We need to see which boards are best for RAM overclocking.
Buy good RAM and a motherboard competent at overclocking RAM
First thing to do is to buy top tier RAM and depending on how things look after the RAM timings are unlocked, a Base Clock Generator might be useful. Some people are saying it's just going to be marketing soon, while others say it is key.
Watch Buildzoid's video:
Keep in mind that the following boards have Bclk generators:
- Asrock X370 Taichi (my recommendation currently)
- Asrock X370 Fatal1ty Professional Gaming
- Asus X370 Crosshair Hero
- Gigabyte X370 Gaming K7
However, what is certain is that you'll want to invest in some top binned RAM. You could even buy a lower end board (maybe the cheapest with the Bclk if it turns out to be useful, if not, then your choices widen). Why? Without much OC headroom, no point in overkill VRMs. Only time you need a flagship board is if you need the other neat features. Just make sure though that the board is good at overclocking RAM. If you don't believe me, scroll back up and look at that Witcher 3 and Arma 3 benchmark!
Overclock the RAM! We will need to see what the best combination is though.
Disable the SMT
Most games do not use more than 8 threads. Only for games using more than 8 threads should you enable SMT.
It is a pain to reboot, but this is how you get the most performance.
There are some things that you cannot control
If you read my previous post, I discussed how I felt that AMD should add more resources to the queues, so that the performance penalty in SMT is minimized. It's something that will wait for Zen+ or Zen++.
On the software side, we still need Windows and Linux to have schedulers that make the most of AMD's CCX topology. Programmers also need to optimize their software for AMD.
Conclusions
As we can see, the gaming gap is largely gone. AMD is even able to pull a few wins! That holds if we combine the following:
- Faster RAM speeds and tighter timings from an unlocked RAM multiplier in a month or two
- Highly binned RAM
- Disabling SMT
- A motherboard with good RAM OCing ability
For Zen+, what do we want?
First of all, faster clocks! There's no OC headroom. Hopefully they can get it onto Samsung's 14LPU (their fourth generation process on 14nm). 14LPU is dedicated to faster clocks judging by the description, which helps. I suspect that Skylake E will be faster. A 5960X did 4.4 to 4.6 GHz typically, which is roughly what a 4770k did. On average, the 4790K was around 300 MHz faster. If so, Skylake E should be a decent leap - 6700K speeds, but at Skylake IPC. Zen+ will need to be good to respond to that.
Unlink everything like on Skylake. On Skylake of course, you have the core clock, uncore, RAM clock, base clock, which in turn is separate from the CPU and PCIe, along with a Strap function on K CPUs.
Next we want more queue resources. That allows for no performance penalty with SMT on. AMD probably had inadequate performance there.
An L4 cache would be very helpful for inter-CCX communications. Even a small one could see huge gains. Failing that, something like the eDRAM of Broadwell or even an HBM solution for high end CPUs would help.
We need the Infinity Fabric bandwidth to be higher so that it is less of a bottleneck for communication between cores.
A wider core would also help, although it is going to bring diminishing returns like on Intel CPUs. Only certain apps can take advantage of it. We also want better AVX2 performance (scaling is not as good as Intel's).
Beef up the memory controller. Even with an L4 cache, this might help things out. AMD is trying to feed 8 fast cores with dual channel, versus Intel, which is feeding its cores with quad channel.
Maybe AMD should consider offering an HEDT solution to fight the 6950X. Such a solution could be 2 1800X dies together in a Multi-Chip Module. At 1800X clocks, that would be 190W, and a lot less if they lower the clocks. It would feature the ability to have 2 M.2 PCIe x4 SSDs, quad channel RAM, and 32 PCIe 3.0 lanes.
Closing thoughts
When I look at the resources that AMD had versus the resources that Intel had, yes it's a solid architecture. We may not even have the gaming penalty that we had initially feared. We may be pretty close to having our cake and getting to eat it.
That said, Zen+ has a lot of options for improvement.