Finally making progress with MSI MEG Creation X399 + 2950X 24/7 overclock - Overclock.net - An Overclocking Community

post #1 of 2 (permalink) Old 05-06-2019, 12:52 AM - Thread Starter

For a while now I've been working on my new custom-loop MSI MEG Creation X399-based 2950X sort-of-silent 4U rackmount rig. I just wanted to report that I finally got PBO to do something useful on this board. I am using the latest 1usmus BIOS.
Because I run Linux I endeavored to do as much as possible in BIOS, so (almost) everything I've accomplished was achieved via BIOS tweaking -- that is, no Windows software allowed, as I won't be booting Windows except to verify/test.

I have 64 GB of RAM in 4 dual-rank modules from a Trident Z F4-3200C14-16GTZ Glo-Stick kit. Getting this to run at an actual 3200 was hard AF but ultimately boiled down to this:

  • XMP profile 2 (3200)
  • ProcODT=53
  • SOC voltage ~1.09 (not quite... it's whatever is one tick down from 1.1 IIRC)
  • Adjust DRAM voltage to 1.39V as reported in BIOS§.
DRAM voltage seems to be quite a sensitive frob. If I go down to just 1.38 prime95 instanta-fails, yet I have successfully run prime95 for hours at 1.39 with no failures (once I set ProcODT 53; otherwise it eventually fails at any voltage).

For RAM testing I usually run prime95 until I want to scream, and then compile chromium with full debug symbols and high parallelism on a compressed ramdisk. Probably I am doing this in the wrong order: I find the latter test is often faster at uncovering RAM instability, which manifests as unexplained "ninja" failures, with an error message something like "ninja stopped working for no reason." Due to the huge time cost of avoiding Type II errors (hmm, so H0="RAM stable"? Not really... so maybe they're Type I errors then), these RAM "over"clocking tasks take forever and I kind-of hate doing them.
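As a much cheaper first-pass sanity check than either test above, here is a minimal sketch of a repeated-checksum consistency check: it hashes the same in-memory buffer several times and flags any pass whose digest differs. This is not what prime95 or a chromium build does, and it exercises far less memory; the buffer size and pass count are arbitrary assumptions.

```python
from __future__ import annotations

import hashlib
import os


def checksum_passes(size_bytes: int, passes: int) -> list[str]:
    """Hash one random in-memory buffer `passes` times and return all digests."""
    buf = os.urandom(size_bytes)  # random data that stays resident in RAM
    return [hashlib.sha256(buf).hexdigest() for _ in range(passes)]


def looks_consistent(digests: list[str]) -> bool:
    """True when every pass produced the same digest."""
    return len(set(digests)) <= 1


if __name__ == "__main__":
    # 64 MiB and 8 passes are illustrative values, not a recommendation.
    digests = checksum_passes(size_bytes=64 * 1024 * 1024, passes=8)
    if looks_consistent(digests):
        print("all passes agree")
    else:
        print("MISMATCH: possible RAM/CPU instability")
```

A clean result here proves very little; a mismatch, on the other hand, would be a strong hint that something is unstable.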

I also have several options changed in the "DigitALL Power" menu (or whatever it's really called -- I'm referring to the menu where Spread Spectrum is found) which might affect RAM and/or system stability. I'm typing on the system in question and don't want to reboot but I'll endeavor to document all my BIOS changes (and correct any mistakes I made, working from memory) in a subsequent post, by reverting to stock, loading a stored OC profile, and photographing the "changes" dialog. Might take me a few days to get around to this.

As for the CPU, I'm surely forgetting a lot of stuff but the main changes I made were:

  • Enable PBO using the 400W limits
  • Select manual PBO level and set it to "2"
  • Enable a negative voltage offset of (IIRC) 0.075V
  • LLC level "7"

I took inspiration from
on the interaction between XFR/PBO and offset voltage in MSI Ryzen BIOSen.

One thing that tripped me up for a long time: options in the MSI BIOS without the "[..]" markers are not necessarily read-only! Those markers only mean that a list of discrete values is available, but if you just highlight the option in question and start typing numbers, most of these can, in fact, be changed. The voltage offset value is one such frob.

This almost got me to stable, but not quite. Under Linux it was rock-solid under load, but occasionally I would return to the machine after leaving it idle for several hours and find it frozen, with no signs of life whatsoever (three-finger salute does nothing, network interfaces unresponsive, etc.). Turns out I could solve this problem (so far, at least...) by disabling C6 using zenstates.py on boot. Since I made that change it's been several days without a lockup; before C6-disablement, they were more-than-daily events. Maybe I should just disable C6 in BIOS, assuming that's possible, but since it apparently ain't broke, I might just not try to fix it.
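For anyone wanting to reproduce the C6 workaround, a rough sketch of a boot-time helper follows. It assumes zenstates.py is the ZenStates-Linux script (which documents a `--c6-disable` flag and needs the `msr` kernel module), that it is executable, and that it lives at the hypothetical path shown; the post does not say how the script was actually invoked at boot.

```python
import subprocess

# Hypothetical install location; point this at wherever zenstates.py really lives.
ZENSTATES = "/usr/local/sbin/zenstates.py"


def c6_disable_commands(zenstates_path: str = ZENSTATES) -> list:
    """Commands to run as root at boot: load the msr module, then disable C6."""
    return [
        ["modprobe", "msr"],               # zenstates.py reads/writes /dev/cpu/*/msr
        [zenstates_path, "--c6-disable"],  # flag as documented by ZenStates-Linux
    ]


def run_all(commands, dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(c6_disable_commands(), dry_run=True)
```

Hooking this into a systemd unit or rc.local is left out on purpose, since that part depends on the distribution.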

As a yard-stick, with these changes my multi-thread Cinebench 15 scores under Windows go from something under 3000 to something over 3500, and I don't see any terrifying voltages (except on idle cores, which I've decided to treat as mostly harmless artifacts of XFR), things stay reasonably cool... basically, I don't see any mortifying crap going on. Power efficiency is clearly a casualty but I don't get the impression that my CPU will wind up extra-crispy anytime soon. I should probably spend some time physically probing motherboard thermals though; there could easily be scary hotspots on the mobo that I have no clue about yet.

One power-related thing that's kind-of curious: under linux, my wall-power draw never drops below 140W or so, whereas in Windows, I see it idling at something more like 90W. I have no idea what Windows is doing differently to cause such a dramatic difference. I'm using the ondemand frequency governor and haven't tried obvious experiments like using the powersave governor, however. [Edit: come to think of it, this could well be entirely due to inferior power management under the amdgpu drivers vs. Windows AMD gaming/consumer GPU drivers, which sounds like a plausible and testable hypothesis.]
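Before trying the powersave experiment it helps to confirm which governor each core is really using. The sketch below reads the standard cpufreq sysfs files; the function name is made up for illustration, and on systems without cpufreq support it simply reports nothing.

```python
from collections import Counter
from pathlib import Path


def governor_counts(cpu_root: str = "/sys/devices/system/cpu") -> Counter:
    """Count how many CPUs are using each cpufreq scaling governor."""
    counts = Counter()
    # Each logical CPU exposes its governor in cpuN/cpufreq/scaling_governor.
    for f in Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_governor"):
        counts[f.read_text().strip()] += 1
    return counts


if __name__ == "__main__":
    for governor, n in governor_counts().items():
        print(f"{governor}: {n} CPUs")
```

Comparing wall power with every core on ondemand versus powersave would help separate a CPU governor effect from the amdgpu hypothesis.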

Would be interested in hearing any thoughts about this stuff. Overclocking Threadripper is confusing AF compared to normal CPUs. It's hard to figure out what's going on, especially under Linux*. It doesn't help that, for some reason, AMD seems determined not to properly document Ryzen's MSRs... maybe because they haven't finalized the interface yet, or because they don't want to expose frobs able to blow up an otherwise highly nerfed platform that provides "safe-ish" OC capabilities?... Really not sure.

Obviously these "secrets" are going to partners and leaking to the usual SMEs, so why not tell the rest of us? My theory: a conspiracy to provide more datapoints to the Cortana spyware

--
§ For some reason if I set voltage v₀ then when I loop back to view the results, the bios reports voltage ~(v₀ + 0.01V) for channels A/B and ~(v₀) for channels C/D. Taking this as a hint, I set DRAM voltages of 1.40V for A/B and 1.39V for C/D, to achieve a BIOS-reported value of ~1.39V on all channels.

* Check out CoreFreq from GitHub; insmod with the Experimental=1 option -- those guys/gals are slowly reverse-engineering the undocumented MSRs, which is closing the information gap that currently exists between Linux and Windows on Threadripper (and Ryzen and many more, btw). Sadly it doesn't provide normal lm_sensors read-outs, so it is useful for troubleshooting, but probably not for general-purpose monitoring/tuning. Hopefully these capabilities will eventually find their way into lm_sensors and/or other standard Linux platform introspection/power-tuning places. Also, for the MEG Creation (and likely other boards), manually loading the nct6775 kernel module will reveal some of the missing sensors -- although, unsurprisingly, the provided information seems incomplete/wrong to some extent.
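A quick way to check whether those modules are actually loaded is to parse /proc/modules, as in this sketch. The module name `corefreqk` for CoreFreq's kernel driver is an assumption not stated in the post, and drivers built into the kernel never appear in /proc/modules, so a "missing" result is only a hint.

```python
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules content (name is the first field per line)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(wanted, proc_modules_text: str) -> list:
    """Return the wanted module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in wanted if name not in loaded]


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():
        # "corefreqk" is assumed to be CoreFreq's module name.
        print("not loaded:", missing_modules(["nct6775", "corefreqk"], proc.read_text()))
```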

Last edited by golfmiketango; 05-07-2019 at 02:39 AM.
post #2 of 2 (permalink) Old 05-06-2019, 09:33 AM - Thread Starter
Quote: Originally Posted by golfmiketango
It doesn't help that, for some reason, AMD seems determined not to properly document Ryzen's MSR's...
Another guess: perhaps this has something to do with Ryzen consumer products sharing more architecture with datacenter products than prior architecture generations? I.e., if they reveal this stuff to the unwashed masses, it would cut into the competitive edge of datacenter OEMs who AMD is betting on, or threaten EPYC market segmentation somehow? I can only speculate; there does not seem to be much information available... basically nobody is talking about it much. The answer could, of course, be totally uninteresting, i.e., they just haven't gotten around to it and not enough people are complaining to motivate them to try harder.

Last edited by golfmiketango; 05-07-2019 at 02:46 AM.