
[Official] Zen 5 Owner's Club (9600X / 9700X / 9900X / 9950X)

279K views 2.5K replies 181 participants last post by  MrFox  
#1 · (Edited)
Edit 11.09.24:
Some more links to posts made in this thread:

8400MT/s + 2200MHz FCLK done stable with Hynix M-die
Pictures from delidded 9950X ES
Core CPU binning results for 3x 9950X retail vs ES
Blameless 4x 9700X binning results
8200MT/s + 2200MHz FCLK done stable with Hynix A-die

--------------------------------------------------------------------------------------------------------------------------------------------------------------



The review embargo for the 12- and 16-core Zen 5 CPUs has passed, and I can finally share my 16-core Zen 5 ES results, which I have been fiddling with for 2 months already :)
And since there is no Zen 5 owners thread on this forum yet, I guess we can share our results, feedback, and benchmark/review videos here.

Can start with some stability-tested memory numbers in AIDA and Clem's cache/memory benchmark (y)

Hardware used:
  • 16-core Zen 5 ES
  • ROG CROSSHAIR X670E GENE
  • 2x16GB Hynix A-die -> Teamgroup UD5 7800MT/s
  • 2200MHz FCLK
  • Large custom watercooling (moh-ra on a different floor)
It's pretty strange that single-CCD Zen 5 shows lower bandwidth than single-CCD Zen 4 CPUs, but this changes when running with all 16 cores active :cool:

First off we have the following: 6600/2200 = 322.6 MB/s in Karhu
  • 6600MT/s @ 1.8 VDD
  • 2200MHz FCLK
  • 26-37-32-30
Image

Fullscreen with Clem's and HWiNFO


Next up we have my tried-and-tested 8000/2200 profile from my Zen 4 systems = 349.70 MB/s in Karhu
  • 8000MT/s @ 1.74 VDD
  • 2200MHz FCLK
  • 32-45-38-44
Image

Fullscreen with Clem's and HWiNFO


And lastly we have my new (stable) 8300MT/s profile. Sadly it seems the AGESA is not done cooking yet, since I'm getting more and more VDD limited the higher I go on memory speed above 8000MT/s. As a result the memory timings suffer, so the end performance actually ends up even with or below my 8000MT/s profile above = 348.70 MB/s in Karhu
Image

Fullscreen with Clem's and HWiNFO


Last time I tried DR 2x32GB A-die it didn't go so well; it maxed out at 7400MT/s, so AGESA is not done cooking yet for these setups
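
For anyone comparing the profiles above, here is a rough sketch of the usual back-of-the-envelope numbers: first-word CAS latency in nanoseconds and the theoretical peak DRAM bandwidth. It assumes the first value in each timing string is tCL and that the platform is dual-channel (128-bit, 16 bytes per transfer). The function names are made up for illustration and none of this replaces the measured Karhu/AIDA results.

```python
def cas_latency_ns(mt_per_s: float, cl: int) -> float:
    """First-word CAS latency in nanoseconds.

    DDR transfers twice per clock, so one memory clock period is
    2000 / (MT/s) nanoseconds.
    """
    return cl * 2000.0 / mt_per_s


def peak_bandwidth_gbs(mt_per_s: float, bus_bytes: int = 16) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_bytes=16 is an assumption: dual-channel, 128-bit total bus width.
    """
    return mt_per_s * bus_bytes / 1000.0


# (MT/s, tCL) for the two profiles whose timings are listed in the post.
profiles = {"6600/2200": (6600, 26), "8000/2200": (8000, 32)}

for name, (speed, cl) in profiles.items():
    print(f"{name}: tCL = {cas_latency_ns(speed, cl):.2f} ns, "
          f"peak = {peak_bandwidth_gbs(speed):.1f} GB/s")
```

By this estimate CL26 at 6600 and CL32 at 8000 land at almost the same absolute CAS latency (about 7.9 ns vs 8.0 ns), while the theoretical peak rises from 105.6 to 128 GB/s.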
 
#2 · (Edited)
Have also done some benchmarking for HWBOT while I was playing around, and got the following in PYPrime, which I would say is pretty good for a non-X3D CPU 💪

6600/2200 = 7.663s 🥇
Image


In Geekbench 3 I'm getting around 14.6k memory score @ 8000MT/s, 2200MHz FCLK
Image


And some minimum latency numbers I've managed to squeeze out of this ES 🧐
(not stability tested)

6600/2200 = 52.2ns in AIDA
Image


8000/2200 = 53.3ns
CL30 @ 8000 required a maxed-out 2.07 VDD :ROFLMAO:
Image


Also got very close to beating a 50k multi-thread score in Cinebench R23, but since I'm limited by watercooling in the middle of the summer I didn't quite reach my goal tt
Image


For those interested, I have uploaded results for the full HWBOT suite here
(can't really compete with the LN2 guys)
Image
 
#3 · (Edited)
The only thing left now is to butter up the F5 key so I'm ready to order myself a few retail 9950X's when they go on sale tomorrow
(I fear that stock will be low)
Image


Feel free to ask any questions, make suggestions, etc. :)
I'm running a semi-defective 3090, so I don't know how much use gaming benchmarks will be from my setup
 
#5 ·
Good work Dom. How does gaming performance compare to your 7950X3D?
 
#4 ·
Got a 9950X, but I'm going to stick with the 7950X3D in my gaming machine until the 9950X3D. I will test whether my 2x32GB DDR5-6000 CL40 EXPO kit works better with the 9950X than with my 7950X3D, which for the most part struggles to POST with it.
 
#7 ·
Thank you for sharing this info with us!

Confirmation that Derbauer's delidder works identically to how it does with the 7950X, nice!!

Does it matter what Silicon Prediction (SP) an ES CPU has? What is this CPU's SP?
 
#10 ·
I can't figure out why my TimeSpy CPU score is so low. At first, with no EXPO, it was 21K, but now it's doing 15.5K.
 
#12 ·
Probably for the better, and they can just release a 9600/9700 with a lower bin at 65W for cheaper.
 
#13 ·
Thank you for sharing this info with us!

Confirmation that Derbauer's delidder works identically to how it does with the 7950X, nice!!

Does it matter what Silicon Prediction (SP) an ES CPU has? What is this CPU's SP?
My ES sample has an SP rating of 115, but it has an unfinished V/F curve, so this value can't be compared to retail
FMAX is 5650/5350MHz (100MHz lower than a real 9950X)
If you could test one or two more games that'd be great and compare to either your 8 or 16 core Vcache CPU.
I really don't want to change to my 7800X3D, have lots of other testing to do! :)
New memory incoming 😇
Are all your benchmarks above with a delidded CPU? Just trying to figure out how a stock CPU with an AIO will compare to yours.
No, I run with the stock IHS; temps are really good on Zen 5
This is baseline stock Cinebench R23 on this ES, with HWiNFO open (my first boot on Zen 5 😅)
Image
 
#16 ·
The review embargo for the 12- and 16-core Zen 5 CPUs has passed, and I can finally share my 16-core Zen 5 ES results, which I have been fiddling with for 2 months already :)
And since there is no Zen 5 owners thread on this forum yet, I guess we can share our results, feedback, and benchmark/review videos here.

Can start with some stability-tested memory numbers in AIDA and Clem's cache/memory benchmark (y)

Hardware used:
  • 16-core Zen 5 ES
  • ROG CROSSHAIR X670E GENE
  • 2x16GB Hynix A-die -> Teamgroup UD5 7800MT/s
  • 2200MHz FCLK
  • Large custom watercooling (moh-ra on a different floor)
That's a good start for early BIOS/AGESA and ES CPU.
 
#21 ·
+200 and PBO works really good, -45 all core YC, -100 all core hwbotX265
That +200 and PBO sounds really good, if the voltage doesn't get too high.
What's the voltage and TDP with +200 and PBO for 9950X?
 
#25 · (Edited)
Ryzen 9 9950X: All the drawbacks of the dual-CCD X3D CPUs, and none of the 3D V-Cache benefits!

AMD could stand to improve their infinity fabric latency, and do an overhaul of their I/O die. This totally seems like a minimum viable product situation where the real target for these Zen 5 dies was Epyc CPUs that they'll be making a killing from, and the scraps went to the "DIY Enthusiasts" for these afterthought Ryzen 9000 CPUs.
 
#26 · (Edited)
I received a pair of 9700Xes from Newegg on Saturday. Have been comparing them in my B650M-HDV/M.2 and trying to decide which one I should delid and keep in my new primary system.

Both parts are 2429SUY, but one is a significantly lower serial number (by almost 600) than the other. Despite both being ordered from Newegg on the same day, they were not from the same order, and I wasn't even expecting the second one (it was a gift).

My first step in comparing them was to dial in some baseline settings (fully stock on the CPU) and record max VIDs and frequency with BoostTester.

Low serial sample:
Core(physical/CPPC)/VID/peak boost MHz
Core 0(0) 1.249 5525
Core 1(1) 1.241 5525
Core 2(4) 1.314 5525
Core 3(2) 1.295 5525
Core 4(5) 1.325 5525
Core 5(3) 1.310 5525
Core 6(7) 1.385 5505
Core 7(6) 1.370 5525

High serial sample:
Core 0(5) 1.301 5550
Core 1(1) 1.294 5550
Core 2(4) 1.299 5550
Core 3(3) 1.306 5550
Core 4(2) 1.309 5550
Core 5(1) 1.281 5550
Core 6(7) 1.360 5550
Core 7(6) 1.327 5550

First sample runs warm and never reaches FMAX; no matter what I do it's always falling a quarter multiplier short. VID spread is also pretty bad. Second sample runs quite cool, has a tight VID spread, and boosts higher and more reliably with stock power limits.
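
To put a number on "VID spread", here is a small sketch that takes the per-core max VIDs from the two lists above and computes the gap between the best and worst core on each sample. The helper name `vid_spread` is just something made up for this illustration, and max-minus-min is only one possible way to define spread.

```python
from statistics import mean

# Per-core max VID (volts) as reported in the post, cores 0..7.
low_serial = [1.249, 1.241, 1.314, 1.295, 1.325, 1.310, 1.385, 1.370]
high_serial = [1.301, 1.294, 1.299, 1.306, 1.309, 1.281, 1.360, 1.327]


def vid_spread(vids):
    """Difference between the highest and lowest per-core VID, in volts."""
    return max(vids) - min(vids)


for name, vids in (("low serial", low_serial), ("high serial", high_serial)):
    print(f"{name}: min {min(vids):.3f} V, max {max(vids):.3f} V, "
          f"mean {mean(vids):.3f} V, spread {vid_spread(vids) * 1000:.0f} mV")
```

With these figures the low serial sample spans about 144 mV between its best and worst core, against about 79 mV for the high serial sample, which matches the "pretty bad" vs "tight" description.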

However, once power limits are loosened up, the lower serial part proves to be superior. It takes much steeper negative COs (the newest Zen 5 AVX512 y-cruncher VT3 test is proving to fail most quickly with unstable curves), leading it to need less peak voltage for any given stable clocks. It is evidently a higher leakage part as it pulls more current and runs hotter at the same clocks/volts in the same tests.

The lower leakage part may or may not prove better with more extreme cooling as it will probably tolerate higher voltages, but I'm mostly looking for the peak stable performance on air as this CPU will be going in a compact build and all my potential radiator area is going to be devoted to my RTX 4090 loop. I'll probably reserve it for a friend's build as they are much more likely to use it at or near stock.

Both parts will boot and run fine with 2200 FCLK, but not at all with 2233. I'm not entirely certain 2200 is an improvement over 2133 though...memory copy improves (as that's purely FCLK limited), but few other benches show a difference and I cannot be sure 2200 is free of retransmission errors without a lot more testing.
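
As a rough sanity check on how much 2200 over 2133 can buy in a purely FCLK-limited test, the sketch below computes the relative clock gain and a bandwidth ceiling per fabric link. The 32 bytes per FCLK cycle figure is an assumption used for illustration, not something stated in this thread, and both function names are hypothetical.

```python
def relative_gain_pct(new_mhz: float, old_mhz: float) -> float:
    """Percentage increase going from old_mhz to new_mhz."""
    return (new_mhz / old_mhz - 1.0) * 100.0


def fclk_limited_gbs(fclk_mhz: float, bytes_per_clk: int = 32) -> float:
    """Bandwidth ceiling in GB/s for a link moving bytes_per_clk per cycle.

    bytes_per_clk=32 is an assumed link width, adjust it if you know better.
    """
    return fclk_mhz * bytes_per_clk / 1000.0


for fclk in (2133, 2200):
    print(f"FCLK {fclk} MHz -> ceiling {fclk_limited_gbs(fclk):.1f} GB/s")

print(f"2133 -> 2200 is a {relative_gain_pct(2200, 2133):.2f}% clock increase")
```

A clock bump of roughly 3% is the most a fabric-bound result can improve by, so a gain that small is easy to lose in run-to-run noise or to retransmissions if the fabric is marginal.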

I'm also encountering some oddities/teething issues with firmware on my B650M-HDV/M.2. Most notably the PMIC sensor info only shows up intermittently and the board refuses to post with more than 1.43 vDIMM, despite the memory I'm testing with (Team 8200 with 24Gb M-die ICs) having an unlocked PMIC. There are also no loadline calibration settings, and I'm sure excessive droop is holding back CPU overclocking slightly. Otherwise, I've been pretty impressed with the board.

PSA: if you use Revo Uninstaller to remove all traces of the AMD chipset drivers before a fresh install of them, you can get away with keeping your existing Windows installation.
Personally, I prefer to manually install/uninstall all chipset drivers via device manager. This is only absolutely necessary on OSes AMD's installer doesn't support (e.g. Windows Server), but it does leave less clutter when swapping drivers.

AMD is implementing this X3D CCD scheduling not as a fix; instead it's a workaround that tries to mitigate the problem of ~175ns cross-CCD latency
I'm doubtful that v-cache can mitigate inter-CCX latency in any meaningful way, which is why I'm skeptical of a dual-vcache consumer part. It's not going to address any of the underlying issues that make dual-CCX CPUs hard to schedule for games or other highly interdependent workloads...any time there is any cross-CCX traffic, you get nearly main memory levels of latency anyway, huge cache or not.

AMD might still release one, but I think it would set up a lot of people for disappointment and prompt a lot of negative reviews when the hypothetical 9950X3DX2 is no faster than the 9800X3D in games for the exact same reason the 7950X3D is rarely faster than the 7800X3D.

Multiple v-cache CCDs on the same chip make sense for server CPUs, where a bunch of independent tasks that don't really need to talk to each other much can still benefit from large pools of local memory, but not so much for most client uses. Most of the stuff we are likely to do either needs to stay in the same CCX, or doesn't really benefit from the cache at all.

AMD could stand to improve their infinity fabric latency, and do an overhaul of their I/O die.
I suspect major changes here are going to need to wait for new substrate tech. Having to go off die at all is a problem that could be partially remedied (without taking a step back to monolithic parts) by moving the IOD closer, increasing fabric clock, and widening the connection to the CCDs...but that would need a drastically different layout and a much more complex physical interconnect. Might not see large improvements until AM6 in this regard.
 
#29 ·
Reserved for future, looking for 9800X3D and deciding if chipset will matter.

Also, as a note for Yuri.
 
#31 ·
This increased logical core latency is the most interesting difference so far as it could explain some of Zen 5's non-power related performance anomalies. Normally SMT logical cores on a single physical core have very low latency to each other, which is one advantage of SMT. This increase in latency means that advantage is lost, while retaining all of the cache contention.

At first I thought this was some security mitigation/feature at work, but looking back at the Zen 5 architectural slides, it's clear there is more to it. With Zen 5 they widened almost everything, but they also added more hardware segmentation to each thread; when SMT is active the decode clusters, fetch pipes, and ROB are all statically partitioned. This probably makes SMT more advantageous in some scenarios, but is evidently very bad for inter-thread, intra-core, latency. Disabling SMT can't fix that and a lot of these resources (the extra fetch) go unused without a second thread, but each thread still gets all local cache and the entire ROB to itself.


Still not sure what's going on with the terrible inter-CCX latency, but, if it turns out not to be a power management issue, I speculate it might be related to increased buffering to make peak FCLK more consistent. Either that or a firmware/microcode issue. I did see a BIOS option that looked like it could be relevant...let me look that up.
 
#32 ·
That BIOS option I mentioned is "Latency Under Load (LUL)" on my ASRock board. It's located in the AMD CBS/-> CPU menu.

Could someone with a dual-CCX Zen 5 see if enabling/disabling this option changes inter-CCX core-to-core latency?
 
#33 ·
Zen 5 cores are extremely efficient, which probably matters more at the enterprise level running Zen 5 cores in Epyc. Still, Level1Techs seems to suggest that there are some missing Zen 5 optimizations in Windows compared to Linux which also affect gaming (his comments on CP2077 / run as admin are worth checking out...).

Also, some 'interesting' comments on memory (notwithstanding the much faster and stable results w/ 2 and 4 DIMMs many of us are running). @domdtxdissar - around the 11 min+ mark, some non-traditional ways to mount DDR5, at least for DR. Your thoughts?

 
#35 ·
CP2077 / run as admin is worth checking out...
Only thing I can think of that running as admin may help with is lock pages in memory/large memory pages. This almost universally reduces memory latency because of the enormous reduction in TLB entries needed.
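
To illustrate the TLB point, here is a minimal sketch that counts how many page mappings are needed to cover a given working set with standard 4 KiB pages versus 2 MiB large pages on x86-64. The 8 GiB working set is an arbitrary example size, not a measurement of any game, and `pages_needed` is a hypothetical helper.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def pages_needed(working_set_bytes: int, page_bytes: int) -> int:
    """Number of pages (and so translations) needed to map the working set."""
    return -(-working_set_bytes // page_bytes)  # ceiling division


working_set = 8 * GIB  # example size only

small = pages_needed(working_set, 4 * KIB)
large = pages_needed(working_set, 2 * MIB)

print(f"4 KiB pages: {small:,} translations")
print(f"2 MiB pages: {large:,} translations")
print(f"reduction factor: {small // large}x")
```

Each 2 MiB page covers the same range as 512 small pages, so far more of the working set stays covered by the same number of TLB entries and fewer accesses have to pay for a page walk.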

Windows has tons of power/scheduling issues with the default balanced power profile and dual-CCX parts. Cyberpunk is also very well-threaded, for a non-RTS game.

I generally disable all Windows security features except the basic NX and ASLR stuff, plus whatever Spectre/Meltdown mitigations I can confirm to have negligible overhead. I also use the 'ultimate performance' or a fully custom power plan and typically run games with large pages enabled, unless I know it causes issues (which is incredibly rare, unless doing something stupid, like disabling page file).