
Registered · 107 Posts · Discussion Starter #1 (Edited)
2/28/21 - Uploaded new version.
Refactored the script into a few functions, which should make it easier to add more tests in the future.
Added a second test that keeps the same p95 process running and quickly moves it between cores, picking the next core at random.
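The core-hopping step of the second test can be sketched as below. This is Python rather than the script's PowerShell, and `pick_next_core` is a hypothetical helper, not taken from the script. The rule that the next core must differ from the current one is my assumption; the changelog only says the next core is picked at random.

```python
import random
from typing import Optional


def pick_next_core(cores: list[int], current: Optional[int], rng: random.Random) -> int:
    """Pick the next core to move the running p95 process to.

    Assumption: never re-pick the core that was just tested, unless it is
    the only core in the list.
    """
    candidates = [c for c in cores if c != current] or cores
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a run can be replayed
    core = None
    for _ in range(10):
        core = pick_next_core(list(range(8)), core, rng)
        print(f"moving p95 to core {core}")
```

Seeding the generator is optional, but it makes it possible to replay the exact core sequence that led to a crash.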

2/24/21 - Uploaded new version.
Added support for systems with SMT disabled (disable it in the BIOS and the script will adapt automatically).
Added a logging routine with timestamps, so if the machine crashes you can see how far the test got (thanks @helsyeah for this part).
Switched to $PSScriptRoot so the script also runs when it is not launched via "Run with PowerShell" (thanks @helsyeah).
Added a check for whether the p95 directory already exists (thanks @helsyeah).
Added detection of prime95 exiting unexpectedly without reporting an error.
Implemented a cleaner way to exit prime95.
Added the option $use_smt in case anyone wants to experiment with it.
Now on GitHub: jasonpoly/per-core-stability-test-script
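A crash log is only useful if each line actually reaches the disk before the machine goes down, so the write has to be flushed and synced rather than left in a buffer. A minimal Python sketch of that idea, assuming a plain append-only text file; the helper name `log_line` and the timestamp format are my own, not the script's:

```python
import os
from datetime import datetime


def log_line(path: str, message: str) -> str:
    """Append a timestamped line and force it to disk before returning."""
    line = f"{datetime.now():%Y-%m-%d %H:%M:%S} {message}\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # move Python's buffer into the OS
        os.fsync(f.fileno())  # ask the OS to commit it to disk now
    return line
```

Calling something like `log_line("p95_core_cycle.log", "Testing core 3")` right before each core starts means the last line in the file points at the core that was running when the system crashed.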

2/21/21 - Uploaded new version.
Fixed errors reported by @domdtxdissar.
Added faster error detection.
Added the ability to continue testing the remaining cores after an error (on by default).
Added configuration options to limit the range of cores tested.


I decided to write a simple script to help with stability testing when tuning curve offset on Zen 3 CPUs.

The script runs a single prime95 thread with AVX disabled and a fixed FFT size of 84, automatically moves it from core to core by changing its affinity, and flags any core that hits an error (whether it then stops or moves on is controlled by $stop_on_error).
There is a short cooldown period before each core is tested, which lets the core boost higher when its test starts and can expose instability at higher boost clocks and lower temperatures.
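The loop described above (one core at a time, a cooldown before each test, and stopping on error only if configured) can be sketched in Python as follows. This is not the author's PowerShell: `test_core` is a hypothetical hook standing in for "pin prime95 to this core, run it, then check for errors", and the parameter names simply mirror the defaults listed in the optional-settings section of this post.

```python
import time
from typing import Callable, Iterator


def core_order(loops: int, first_core: int, last_core: int,
               n_cores: int) -> Iterator[tuple[int, int]]:
    """Yield (loop, core) pairs in test order, skipping cores that don't exist."""
    last = min(last_core, n_cores - 1)
    for loop in range(1, loops + 1):
        for core in range(first_core, last + 1):
            yield loop, core


def run_cycle(
    test_core: Callable[[int, int], bool],  # hypothetical: run p95 pinned to core for N s, True = no error
    n_cores: int,
    loops: int = 3,
    cycle_time: int = 180,
    cooldown: int = 15,
    first_core: int = 0,
    last_core: int = 31,
    stop_on_error: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[int, bool]:
    """Return {core: True if it passed in every loop it was tested}."""
    results: dict[int, bool] = {}
    for _loop, core in core_order(loops, first_core, last_core, n_cores):
        sleep(cooldown)  # idle first so the core starts its test cool and boosts higher
        passed = test_core(core, cycle_time)
        results[core] = results.get(core, True) and passed
        if not passed and stop_on_error:
            break  # mirrors $stop_on_error=1
    return results
```

Passing `sleep=lambda s: None` and a fake `test_core` lets you dry-run the ordering without prime95 running at all.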

Suggested usage:

Start with some BIOS settings that should be stable:
Set fan speeds etc. as you like.
Set CPU voltage to Auto.
Set memory & SoC voltages, timings and frequency to defaults (or a known stable setting).
Enable PBO.
Set power limits to Motherboard.
Set curve offset to 0 for all cores.
Set PBO boost override to 0.
Leave PBO Scaler at Auto, or start with a conservative 1x or 2x setting (you can raise this later if you wish, but I have not found it to make much difference).
Optionally set the PBO temperature limit to 85 (or some other value below the default of 90 that you are comfortable stress testing at).

Before you start:
Exit all background applications in Windows. Note: this is important since other software can stop a single core from reaching its highest possible boost.
Run prime95 manually, launch a single-thread stress test and check that temps and voltages are acceptable to you.
Launch a multi-core stress test and do the same.
Exit prime95.
Use Cinebench, CPU-Z, or some other benchmark to record baseline values for both single and multi-core.

Download the zip file containing the script
Extract to a folder of your choice. Inside you should see 4 files:
  • local.txt - prime95 configuration file
  • prime.txt - prime95 configuration file
  • p95_core_cycle.ps1 - the powershell script you need to run
  • ReadMe.txt
Check the contents of each file in a text editor.

Download p95v303b6.win64.zip from the "GIMPS - Free Prime95 software downloads - PrimeNet" page and place it in the same folder you extracted the zip into.

Exit all background applications in Windows.
Note: this is important since some background applications can stop a single core from reaching its highest possible boost.

I would recommend leaving a single instance of HWInfo or another temperature and voltage monitoring application open. This does two things:
  1. It lets you monitor temps, clocks, voltages to make sure they are at safe levels.
  2. It provides an occasional interrupt which might expose instability.
Now you are ready to use the script
AT YOUR OWN RISK

Right click on p95_core_cycle.ps1 and select "Run with PowerShell"
It should open a PowerShell window and launch prime95 in the background:

[Screenshot: PowerShell window running the script]


If you close the above window for any reason, make sure you also exit or stop any running copies of Prime95 (otherwise they will continue to run until you reboot):

[Screenshot]


When running the script for the first time (or after getting a new version or making any changes), use Windows Task Manager to check that the affinity is being set correctly. Note that due to SMT (Hyperthreading), each "core" corresponds to two CPUs in Task Manager. So when the script is testing "core 0", you should see prime95.exe assigned to "CPU 0" and "CPU 1", as below:

[Screenshot: Task Manager showing prime95.exe's affinity set to CPU 0 and CPU 1]
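The core-to-CPU mapping above is easiest to reason about as a bitmask with one bit per logical CPU. A Python sketch, assuming SMT siblings are numbered adjacently (core n maps to CPUs 2n and 2n+1, which is what Task Manager shows here); `core_affinity_mask` and `cpus_in_mask` are hypothetical helpers. For reference, .NET's `Process.ProcessorAffinity` takes a bitmask like this and psutil's `Process.cpu_affinity()` takes a list of CPU numbers; I have not checked which mechanism the script itself uses.

```python
def core_affinity_mask(core: int, threads_per_core: int = 2, use_smt: bool = True) -> int:
    """Bitmask of the logical CPUs belonging to one physical core.

    Assumes SMT siblings are numbered adjacently. Use threads_per_core=1
    when SMT is disabled in the BIOS; use_smt=False pins to the first thread only.
    """
    first_cpu = core * threads_per_core
    width = threads_per_core if use_smt else 1
    return ((1 << width) - 1) << first_cpu


def cpus_in_mask(mask: int) -> list[int]:
    """Decode a mask back into logical CPU numbers, e.g. 0x3 -> [0, 1]."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    for core in range(8):
        mask = core_affinity_mask(core)
        print(f"core {core}: mask 0x{mask:08X} -> CPUs {cpus_in_mask(mask)}")
```

Under these assumptions "core 0" is mask 0x00000003 (CPU 0 and CPU 1), core 3 is 0xC0 (CPUs 6 and 7), and a single-thread run on core 3 would use 0x40.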



After running the script:
Make a note of any passing or failing cores.
Reboot and lower the curve offset for any cores that pass.
Raise the curve offset for any cores that fail.

I typically start by adjusting in steps of 10. If any fail, then back off by 5.

Example scenario
In the initial test, all cores are stable with a curve offset of 0, so set all of them to -10 and re-run.
Now cores 2&3 fail, but all other cores pass.
Back off by 5 on cores 2&3, so set them both to -5.
Since all other cores passed, set them to -20.
Core 2 still fails, core 3 passes, cores 6 & 7 fail.
Set core 2 back to 0. Leave core 3 at -5 (you have found the stable limits for cores 2 & 3). Set cores 6 & 7 to -15. Set all remaining cores to -30.
Continue until you have found a stable setting for each core.
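The stepping rule in this scenario (go down 10 while a core passes, split the difference after a failure, stop once the pass/fail gap is 5) can be written as a small per-core state machine. A Python sketch under my own assumptions: the -30 floor is the Curve Optimizer limit I am assuming for Zen 3, and the "fails before any pass" branch (raise by 5) is my reading of "raise the curve offset for any cores that fail".

```python
from dataclasses import dataclass
from typing import Optional

COARSE_STEP = 10   # step while a core keeps passing
FINE_STEP = 5      # stop once the pass/fail bracket is this narrow
MIN_OFFSET = -30   # assumed Curve Optimizer floor on Zen 3


@dataclass
class CoreState:
    offset: int = 0                   # curve offset to test next
    last_pass: Optional[int] = None   # most negative offset that passed so far
    first_fail: Optional[int] = None  # least negative offset that failed so far
    done: bool = False


def next_offset(state: CoreState, passed: bool) -> CoreState:
    """Record one test result for a core and choose the offset to test next."""
    if state.done:
        return state
    if passed:
        state.last_pass = state.offset
    else:
        state.first_fail = state.offset

    if state.first_fail is None:
        # Every test so far passed: keep going down in coarse steps.
        if state.offset <= MIN_OFFSET:
            state.done = True
        else:
            state.offset = max(state.offset - COARSE_STEP, MIN_OFFSET)
    elif state.last_pass is None:
        # Failed with no passing result yet: add voltage back.
        state.offset += FINE_STEP
    elif state.last_pass - state.first_fail > FINE_STEP:
        # Bracketed but still wide: test the midpoint ("back off by 5").
        state.offset = (state.last_pass + state.first_fail) // 2
    else:
        # Bracket is FINE_STEP wide: settle on the last passing offset.
        state.offset = state.last_pass
        state.done = True
    return state


def tune(results: list[bool]) -> CoreState:
    """Replay a sequence of pass/fail results for one core, starting at 0."""
    state = CoreState()
    for passed in results:
        next_offset(state, passed)
    return state
```

Replaying the scenario: core 2 (pass, fail, fail) settles at 0, core 3 (pass, fail, pass) settles at -5, cores 6 & 7 (pass, pass, fail) go to -15 next, and the remaining cores (pass, pass, pass) go to -30.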

Now re-run your original benchmarks and record any improvements.
If you were able to set a negative offset on all cores, you can try a PBO boost override of ~50-100 MHz.
Keep the curve offsets the same and re-check stability.
You might find you need to add some voltage back to individual cores at this point.
Re-run your benchmarks. If you did not gain anything (or now have lower results), set PBO boost override back to 0 and your curve offsets back to the previous settings.
Well done. You have successfully tuned your PBO curve offset and reached a (reasonably) stable limit for your CPU, motherboard and PSU.

Now do some longer stress tests, or just use the PC for a while to make sure it really is stable.

[Optional] Edit the default values in p95_core_cycle.ps1
Open p95_core_cycle.ps1 in a text editor and edit the top few lines in case you want to change anything from the defaults:

$p95path="p95v303b6.win64.zip"; # path to p95 .zip you want to extract and use
$loops=3; # Number of times to loop around all cores
$cycle_time=180; # Approx time in s to run on each core (180s matches included p95 config of 3 min test time)
$cooldown=15; # Time in s to cool down between testing each core
$stop_on_error=0; # If "0" continue to next core if error found. If "1" stop.

# adjust next two values to limit testing to a specific range of cores
$first_core=0; # First core to test in each loop. Any cores lower than this number will not be tested.
$last_core=31; # Last core to test in each loop. MAX value=31. Any cores (that exist) higher than this number will not be tested.

You can reduce the loops to 2 and cycle time to 120s if you want to speed up initial iterations.
A fixed FFT size of 84 seems to make my CPU cores fail quickly when they are unstable, but you should experiment with different values and post your feedback.
Once you appear to have stable settings, I would recommend increasing the cycle time and number of loops. I have seen cores that appeared to be stable fail after 1 hr or more.
Passing 2 hrs on each core would indicate a very stable setting.
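Because every tested core gets roughly $cycle_time plus $cooldown per loop, it is worth estimating the total run time before raising these values. A rough Python estimate, assuming the cooldown applies to every core and ignoring prime95 start-up time; both helper names are hypothetical:

```python
def cores_tested(first_core: int, last_core: int, n_cores: int) -> int:
    """Cores covered by one loop; last_core is clipped to cores that exist."""
    return max(0, min(last_core, n_cores - 1) - first_core + 1)


def total_runtime_s(n_cores: int, loops: int = 3, cycle_time: int = 180,
                    cooldown: int = 15, first_core: int = 0, last_core: int = 31) -> int:
    """Approximate wall-clock seconds for a full run of the script."""
    return loops * cores_tested(first_core, last_core, n_cores) * (cycle_time + cooldown)
```

With the defaults on an 8-core 5800X this is 3 × 8 × 195 s = 4,680 s (78 minutes). Two loops of 120 s cut that to 2,160 s (36 minutes), while 2 hours per core on a 16-core 5950X comes to 115,440 s, about 32 hours for a single loop.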

You might want to try testing with AVX or AVX2 enabled.

You should also use some other software to confirm complete stability.

Once you are happy with your CPU stability, you can work on memory & FCLK.
 

Registered · 24 Posts
I was planning to do this exact thing after I wrapped up tuning my memory on my 5900X. I'm glad someone else beat me to it, though! I'll give it a run on my not-so-great 5900X and see where it gets to!
 

Robotic Chemist · 3,317 Posts
Clever! If I ever get that 5950X I preordered the day of release I will definitely try this.

Thank you very much for sharing your work. :)
 

Registered · 107 Posts · Discussion Starter #5

Registered · 107 Posts · Discussion Starter #7
I'm going to call this stable...

5800X, Asus B550-F, BIOS 1805
LLC set to Auto
CPU switching frequency set to 300kHz (not sure if this helps)
PBO set to motherboard
PBO boost override to 0.
PBO Scaler to 2x.

The interesting thing here is that core 4 is rated as my worst core by Windows, but after tuning the curve offset it looks like my best one!
I'm disappointed that I still need +10 on core 3 after upgrading to a non-beta BIOS.

Core#  Perf  Curve offset  Max effective clock (MHz)
0      3/4   -15           4850
1      4/5   -15           4850
2      5/6   -10           4837
3      1/1   +10           4799
4      7/8   -30           4850
5      6/7   -20           4850
6      1/2   +3            4788
7      2/3   -5            4789

[Screenshots, including HWiNFO min/max/average readings captured while running the script]
 

Registered · 107 Posts · Discussion Starter #8
Windows seems to leave icons for p95 processes that have already been killed in the "hidden icons" section of the taskbar. When you click there, it wakes up and removes any that are no longer running. I don't think this is an issue, but if anyone knows how to clean it up, let me know.

[Screenshot: leftover prime95 icons in the taskbar's hidden-icons area]
 

Registered · 1,294 Posts
Great!

Will give this a go on my current stable 24/7 settings

:)

Just to add, for some reason prime95 is not maximising core frequency on any of the cores except the first one; instead it splits the load between the real core and the hyperthreaded core so that the sum of the two equals the "expected" effective frequency.

So I've set

CpuNumHyperthreads=0

Let's see what changes.....

And.....did not see any changes. Are others seeing the same thing? Any way to force affinity to just the real core?
 

Registered · 107 Posts · Discussion Starter #11
> In your PowerShell output, "Testing cores 0 through 8" should be "Testing cores 0 through 7". Minor bug...

Well spotted. I should fix that in the next iteration. Thanks.

> Just to add, for some reason prime95 is not maximising core frequency on any of the cores except the first one; instead it splits the load between the real core and the hyperthreaded core so that the sum of the two equals the "expected" effective frequency.
>
> So I've set
>
> CpuNumHyperthreads=0
>
> Let's see what changes.....
>
> And.....did not see any changes. Are others seeing the same thing? Any way to force affinity to just the real core?
Can you post a screenshot of HWinfo similar to mine above with min/max/average values captured while running the script? I want to see if I can spot any other possible issue before going down the path of running on just a single thread on each core.

Right now the script assumes 2 available threads per core and uses both of them to better stress each core. If you disable SMT (or Hyperthreading) in the BIOS, or want to run on just a single thread on each core, the script will need to be updated accordingly.
 

Registered · 23 Posts
> Well spotted. I should fix that in the next iteration. Thanks.

Thank you for sharing your script. If I'm ever able to find a 5950X in stock, it will make performance tuning much easier.

Question: Why do you suggest setting PBO Scaler to 1x or 2x? Does that improve performance over leaving it set to Auto?
 

Overclock the World · 2,243 Posts
> And.....did not see any changes. Are others seeing the same thing? Any way to force affinity to just the real core?

Same here, sadly: they are not reaching max boost, but the error testing seems to work.
Sadly, though, we cannot tell whether the CO values are fine with this method, as the cores do not reach maximum boost.
Maybe a better FFT value would help.

> Thank you for sharing your script. If I'm ever able to find a 5950X in stock, it will make performance tuning much easier.
>
> Question: Why do you suggest setting PBO Scaler to 1x or 2x? Does that improve performance over leaving it set to Auto?

X1 doesn't add any voltage on top; X2 is used on Asus boards.
At the moment I verify with X5/X6, but
 

Registered · 107 Posts · Discussion Starter #14 (Edited)
> Thank you for sharing your script. If I'm ever able to find a 5950X in stock, it will make performance tuning much easier.
>
> Question: Why do you suggest setting PBO Scaler to 1x or 2x? Does that improve performance over leaving it set to Auto?

Auto should be fine too. I actually have not noticed this making a big difference on my motherboard & CPU combination. The reason for suggesting 1x or 2x is just to keep it at a conservative value, at least while initially running this test, to limit the time you spend running at high voltages and temperatures. I don't know whether this is a real issue; I was mostly just erring on the side of caution.

EDIT: added this info to the original post. Thanks for the good question.
 

Registered · 107 Posts · Discussion Starter #15
> Same here, sadly: they are not reaching max boost, but the error testing seems to work.
> Sadly, though, we cannot tell whether the CO values are fine with this method, as the cores do not reach maximum boost.
> Maybe a better FFT value would help.

A higher FFT value should allow it to reach higher boost clocks, but I have not tested that. The value I picked seems to work well for me in finding errors, but I didn't really consider different chips and different cooling.

Another way to reach higher boost clocks would be to do something similar to this script with TM5. I've done that by hand, and in my case it reaches higher boost clocks but finds the same unstable settings as p95.
 

Overclock the World · 2,243 Posts
> A higher FFT value should allow it to reach higher boost clocks, but I have not tested that. The value I picked seems to work well for me in finding errors, but I didn't really consider different chips and different cooling.
>
> Another way to reach higher boost clocks would be to do something similar to this script with TM5. I've done that by hand, and in my case it reaches higher boost clocks but finds the same unstable settings as p95.

I was wondering how they hit it, and whether it would be possible to combine both
(recompile BoostTester and run it at the same time, like the script does).

Sadly, if we take, for example, CTR's AVX light test vs y-cruncher:
CTR's voltages always end up too low and could almost never pass the y-cruncher suite.
The method in the script is good; it works for sure,
but we have to see whether it is also sensitive to memory OC, like the typical Linpack-based benchmarks.

I would surely prefer an AVX2 test over normal AVX or TM5 SSE :)
 

Registered · 1,294 Posts
> Well spotted. I should fix that in the next iteration. Thanks.

> Can you post a screenshot of HWinfo similar to mine above with min/max/average values captured while running the script? I want to see if I can spot any other possible issue before going down the path of running on just a single thread on each core.
>
> Right now the script assumes 2 available threads per core and uses both of them to better stress each core. If you disable SMT (or Hyperthreading) in the BIOS, or want to run on just a single thread on each core, the script will need to be updated accordingly.
Here you go.

LLC @6
BO @375
Scaler @10X
Current CO: +5, -2, -7, -7, +6, -11
Previous CO: +5, -8, -5, -7, +5, -14 (these values passed all y-cruncher tests for 5+ hrs, y-cruncher tests 15/16 for 2+ hrs, 25 cycles of TM5, RealBench, and any Cinebench multi/single run)

@Veii, yes, it's a little strange: with all the other workloads I've tested, the cores and hyperthreaded cores all get taxed equally, so I don't understand why this is occurring.

Regarding core 0, Task Manager shows the same thing, though HWInfo64's effective clocks show it hitting what I expect it to hit.

HWInfo64 is set to update every 500 ms.

I noticed this in the Prime95 window:

Code:
Error setting affinity to cpuset 0x00000003: No such file or directory.
** EDIT ** the above error message is no longer appearing

I will test the Prime95 settings the script is using by entering them into a manual all-core run to see if there are any differences....
** EDIT ** only when doing an all-core manual prime95 run do all the cores get loaded equally; when running anything less than all cores, some of the cores don't get fully loaded. Must be something to do with the boost algorithm and power envelope....

[Screenshot: HWInfo64 readings while running the script]
 

Registered · 1,171 Posts
@blu3dragon Okay, I usually test my 5950X with a one-hour OCCT stress test, and it would pass with a curve offset of -25 on all cores. When I ran your Prime95 single-core test, cores 1, 7, 9 and 12 failed, but the others passed without an issue. I should mention that I did not run with vcore set to AUTO; I actually use a negative offset of -0.0875 V, as that is how it has always worked with a -25 curve offset on all cores.

I decided that I wanted to try to get all cores to pass, so I went back to a -10 curve offset on all cores and it worked without an issue, after which I tried -15 on all cores and it also ran fine. Now I've run it with a -20 curve optimizer offset and it basically completed all 3 cycles without an issue. I'll test -22 next, and if a core fails I will try -21. Interestingly, the reported effective clocks are lower than the stated core clocks.
[Screenshot]
 

Registered · 107 Posts · Discussion Starter #19 (Edited)
Thanks for sharing your results.

@mongoled Just from your HWinfo screenshot, I think everything is working as expected. The idea here is to run only a single thread of p95 so that testing can be isolated to a single core while stressing it as much as possible with a relatively high load. Each core will be capable of boosting to a different frequency and effective clock with this load. It looks like you had to raise your COs a little compared with previous testing, so I think the script is doing its job :) However, I do agree that you are not hitting the maximum boost values on several threads, and given the high core offset you have (which makes sense with a 5600X), adding further testing with a different load that lets the cores boost higher would make sense.
 

Registered · 107 Posts · Discussion Starter #20
@VPII Assuming you are going for maximum performance, I would try with vcore set to AUTO and see if that raises your effective clocks.
It will probably lower the actual clocks, but that is ok since it is effective clocks that matter for performance, and you want those as close as possible to your actual clocks. Then tune for the lowest CO you can get. This should be easier than trying to tune both CO and vcore offset, and give the same best case result at the end of it (since both CO and vcore offset are trying to reduce voltage, just in different ways). If it helps, you can think of it like vcore offset was the old way to try to undervolt, but CO is the new, better way to do it :)

The second suggestion I have is then to set curve offset per core so you can tune them individually. e.g. for your run with -25 curve offset on all cores, keep the -25 on all the cores that passed, but lower the ones that didn't to -20 and retest. (actually I would even then try -30 on all the cores that passed). Probably some will pass at -30, some only at -25 and your remaining at -20. That would be a very good result and at this point you could stop tuning and run some longer or different tests to confirm everything really is stable.
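This suggestion turns one all-core result into the next per-core test: cores that passed try one step lower, cores that failed back off one step. A Python sketch with a hypothetical helper name; the step of 5 follows the advice above, and the -30 floor is an assumed Curve Optimizer limit, not something stated in the thread.

```python
def next_per_core_offsets(n_cores: int, all_core_offset: int, failed: set[int],
                          step: int = 5, floor: int = -30) -> dict[int, int]:
    """Map each core to the curve offset to test next.

    Passing cores go one step more negative (clamped at the assumed floor);
    failing cores back off by one step.
    """
    return {
        core: all_core_offset + step if core in failed else max(all_core_offset - step, floor)
        for core in range(n_cores)
    }
```

For the 5950X run above (-25 on all 16 cores, with cores 1, 7, 9 and 12 failing), this gives -20 for those four cores and -30 for the other twelve.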

I realize you have a lot of cores so this will take some time, but I think it would be better to do that and just test each core in steps of 5, rather than try to set all the same and tune down to the level of 1 step. Of course, you could go down to a single step on each core individually as well, but you should probably keep some small margin for temp and power changes over time.
 