2/28/21 - Uploaded new version.
Refactored into a few functions that should help with adding more tests in the future
Added a 2nd test that keeps the same p95 process running and quickly moves it around cores in the system, picking the next core randomly.
2/24/21 - Uploaded new version.
Added support for systems with SMT disabled (disable this in the BIOS and the script will work automatically)
Added logging routine with timestamp so if the machine crashes you can see where it was up to (thanks @helsyeah for this part)
Added the use of $PSScriptRoot so the script also runs when it is not launched via "Run with PowerShell" (thanks @helsyeah)
Added check to see if p95 dir already exists (thanks @helsyeah)
Added detection if prime95 exits unexpectedly without an error
Implemented a nicer way to exit prime95
Added option $use_smt in case anyone wants to experiment with that
Now on github: jasonpoly/per-core-stability-test-script
2/21/21 - Uploaded new version.
Fixes errors reported by @domdtxdissar
Added faster error detection
Added ability to continue testing remaining cores after an error (on by default)
Added some additional configuration to limit range of cores to test
I decided to write a simple script to help with stability testing when tuning curve offset on Zen 3 CPUs.
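This uses a single thread of Prime95, non-AVX, with a fixed FFT size of 84, and automatically changes core affinity to test one core at a time. By default it continues with the remaining cores after an error; set $stop_on_error=1 to stop at the first error.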
There is a small cooldown period between testing each core, allowing cores to boost higher when the test is started, potentially exposing issues with higher boosts at lower temps.
Suggested usage:
Start with some BIOS settings that should be stable
Set fan speeds, etc as you would like.
Set CPU voltage to auto.
Set memory & SoC voltages, timing, frequency to defaults (or a known stable setting).
Enable PBO
Set power limits to Motherboard.
Set curve offset to 0 for all cores.
Set PBO boost override to 0.
Leave PBO Scaler at auto or start with a conservative 1x or 2x setting (you can raise this later if you wish but I have not found it to make much difference)
Optionally set PBO temperature limit to 85 (or some other value below the default 90 that you are comfortable with stress testing at).
Before you start
Exit all background applications in Windows. Note: this is important since other software can stop a single core reaching its highest possible boost setting.
Run prime95 manually, launch a single thread stress test and check temps and voltages are acceptable to you.
Launch a multi-core stress test and do the same.
Exit prime95
Use Cinebench, cpu-z, or some other benchmark to record baseline values for both single and multi-core.
Download the zip file containing the script
- via GitHub: jasonpoly/per-core-stability-test-script
- via tinyupload: p95_single_core_torture.210228.zip
Extract to a folder of your choice. Inside you should see 4 files:
- local.txt - prime95 configuration file
- prime.txt - prime95 configuration file
- p95_core_cycle.ps1 - the powershell script you need to run
- ReadMe.txt
Download p95v303b6.win64.zip from GIMPS - Free Prime95 software downloads - PrimeNet and place into the same folder you extracted the zip into.
Exit all background applications in Windows.
Note: this is important since some background applications can stop a single core reaching its highest possible boost setting.
I would recommend leaving a single instance of HWInfo or another temperature and voltage monitoring application open. This does two things:
- It lets you monitor temps, clocks, voltages to make sure they are at safe levels.
- It provides an occasional interrupt which might expose instability.
AT YOUR OWN RISK
Right click on p95_core_cycle.ps1 and select "Run with PowerShell"
It should open a PowerShell window and launch Prime95 in the background:
In case you exit the above window for any reason, make sure you also exit or stop any copies of Prime95 that are running (otherwise they will continue to run until you reboot):
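One way to do this from a PowerShell prompt is sketched below. It assumes the process is named prime95 (from prime95.exe); the helper name Stop-StrayProcess is hypothetical and is not part of the script.

```powershell
# Hypothetical helper (not part of p95_core_cycle.ps1): stops every running
# process with the given name and returns how many were found.
function Stop-StrayProcess {
    param([string]$Name = 'prime95')   # assumed process name for prime95.exe

    # @() forces an array so .Count is 0 when nothing matches
    $procs = @(Get-Process -Name $Name -ErrorAction SilentlyContinue)
    foreach ($p in $procs) {
        Stop-Process -Id $p.Id -Force -ErrorAction SilentlyContinue
    }
    return $procs.Count
}

# Usage: Stop-StrayProcess    # stops any leftover copies of prime95.exe
```

Ending the prime95.exe entries in Task Manager achieves the same thing.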
While running the script for the first time (or after getting a new version or making any changes), check that the affinity is being set correctly using Windows Task Manager. Note that due to SMT (Hyperthreading), each "core" corresponds to two CPUs in Task Manager. So when the script is testing "core 0", you should see prime95.exe assigned to "CPU 0" and "CPU 1".
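The core-to-CPU mapping described here can be written as an affinity bitmask. This is only a sketch of the idea, not code taken from the script; the function name Get-CoreAffinityMask and its parameters are made up for illustration.

```powershell
# Illustrative helper (hypothetical name, not taken from the script):
# returns the affinity bitmask for one physical core.
function Get-CoreAffinityMask {
    param(
        [int]$Core,
        [bool]$UseSmt = $true   # $true: two logical CPUs per core
    )
    if ($UseSmt) {
        # binary 11 shifted to logical CPUs 2*Core and 2*Core+1
        return ([int64]3 -shl (2 * $Core))
    }
    # SMT disabled: one logical CPU per core
    return ([int64]1 -shl $Core)
}

Get-CoreAffinityMask -Core 0                  # 3  (CPU 0 and CPU 1)
Get-CoreAffinityMask -Core 1                  # 12 (CPU 2 and CPU 3)
Get-CoreAffinityMask -Core 2 -UseSmt $false   # 4  (CPU 2 only)
```

In .NET, a mask like this is applied through the ProcessorAffinity property of a process object, which is what Task Manager then shows as the ticked CPUs. A 64-bit mask covers 64 logical CPUs, which would be consistent with the maximum core number of 31 when SMT is enabled.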
After running the script
Make a note of any passing or failing cores.
Reboot and lower the curve offset (make it more negative) for any cores that pass.
Raise the curve offset (make it less negative) for any cores that fail.
I typically start by adjusting in steps of 10. If any fail, then back off by 5.
Example scenario
In the initial test, all cores are stable with a curve offset of 0, so set all to a curve offset of -10 and re-run.
Now cores 2&3 fail, but all other cores pass.
Back off by 5 on cores 2&3, so set them both to -5.
Since all other cores passed, set them to -20.
Core 2 still fails, core 3 passes, cores 6 & 7 fail.
Set core 2 back to 0. Leave core 3 at -5 (you have found the stable limits for cores 2 & 3). Set cores 6 & 7 to -15. Set all remaining cores to -30.
Continue until you have found a stable setting for each core.
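The stepping rule used in this scenario can be sketched as a small function. This is just my reading of the steps above, not something the script does for you; the name Get-NextCurveOffset and its parameters are hypothetical.

```powershell
# Hypothetical helper: suggests the next curve offset for one core.
function Get-NextCurveOffset {
    param(
        [int]$Offset,         # offset used in the run that just finished
        [bool]$Passed,        # $true if the core passed at $Offset
        $KnownFail = $null,   # least negative offset at which this core has failed, if any
        [int]$Step = 10,      # how far to lower the offset after a pass
        [int]$BackOff = 5     # how far to raise the offset after a fail
    )
    if (-not $Passed) {
        return $Offset + $BackOff
    }
    $next = $Offset - $Step
    if ($null -ne $KnownFail -and $next -le $KnownFail) {
        # Going lower would reach an offset that already failed: limit found.
        return $Offset
    }
    return $next
}

Get-NextCurveOffset -Offset 0 -Passed $true                       # -10 (all cores, first run)
Get-NextCurveOffset -Offset (-10) -Passed $false                  # -5  (cores 2 & 3)
Get-NextCurveOffset -Offset (-5) -Passed $false                   # 0   (core 2)
Get-NextCurveOffset -Offset (-5) -Passed $true -KnownFail (-10)   # -5  (core 3, limit found)
Get-NextCurveOffset -Offset (-20) -Passed $false                  # -15 (cores 6 & 7)
```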
Now re-run your original benchmarks and record any improvements.
If you were able to set a negative offset on all cores, you can try a PBO boost override of ~50-100 MHz.
Keep core offset the same and re-check stability.
You might find you need to add some voltage back to individual cores at this point.
Re-run your benchmarks. If you did not gain anything (or now have lower results), set PBO boost override back to 0 and your curve offsets back to the previous settings.
Well done. You have successfully tuned your PBO curve offset and reached a (reasonably) stable limit for your cpu, motherboard and PSU.
Now do some longer stress tests, or just use the PC for a while to make sure it really is stable.
[Optional] Edit the default values in p95_core_cycle.ps1
Open p95_core_cycle.ps1 in a text editor and edit the top few lines in case you want to change anything from the defaults:
$p95path="p95v303b6.win64.zip"; # path to p95 .zip you want to extract and use
$loops=3; # Number of times to loop around all cores
$cycle_time=180; # Approx time in s to run on each core (180s matches included p95 config of 3 min test time)
$cooldown=15; # Time in s to cool down between testing each core
$stop_on_error=0; # If "0" continue to next core if error found. If "1" stop.
# adjust next two values to limit testing to a specific range of cores
$first_core=0; # First core to test in each loop. Any cores lower than this number will not be tested.
$last_core=31; # Last core to test in each loop. MAX value=31. Any cores (that exist) higher than this number will not
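To illustrate how $first_core and $last_core limit the test range, here is a rough sketch that clamps the range to the cores that actually exist. It is an assumption about the behaviour described in the comments above, not the script's own code; Get-CoresToTest and its CoreCount parameter are hypothetical.

```powershell
# Hypothetical helper: lists the core numbers that would be tested.
function Get-CoresToTest {
    param(
        [int]$FirstCore = 0,
        [int]$LastCore = 31,
        [int]$CoreCount          # number of physical cores in the system
    )
    # Cores above the highest existing core number are ignored.
    $last = [Math]::Min($LastCore, $CoreCount - 1)
    if ($FirstCore -gt $last) {
        return @()               # nothing to test
    }
    return @($FirstCore..$last)
}

(Get-CoresToTest -CoreCount 8) -join ','                            # 0,1,2,3,4,5,6,7
(Get-CoresToTest -FirstCore 2 -LastCore 5 -CoreCount 8) -join ','   # 2,3,4,5
```

On a single-socket system the physical core count can be read with (Get-CimInstance Win32_Processor).NumberOfCores.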
You can reduce the loops to 2 and cycle time to 120s if you want to speed up initial iterations.
A fixed FFT size of 84 seems to cause my cpu cores to fail quickly when they are unstable, but you should experiment with different values and post your feedback.
Once you appear to have stable settings, I would recommend increasing the cycle time and number of loops. I have seen cores that appear to be stable fail after 1hr or more.
2hrs on each core would be a very stable setting to reach.
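To get a feel for how long a run takes with different settings, the arithmetic can be sketched as below. This is an approximation that ignores startup overhead and the second (random core) test, and the 8-core count is only an example; Get-EstimatedRunSeconds is a hypothetical helper, not part of the script.

```powershell
# Hypothetical helper: approximate run time in seconds for the per-core test.
function Get-EstimatedRunSeconds {
    param(
        [int]$Loops,       # $loops
        [int]$CycleTime,   # $cycle_time in seconds
        [int]$Cooldown,    # $cooldown in seconds
        [int]$CoreCount    # number of cores being tested
    )
    return $Loops * $CoreCount * ($CycleTime + $Cooldown)
}

Get-EstimatedRunSeconds -Loops 3 -CycleTime 180 -Cooldown 15 -CoreCount 8    # 4680 s (78 min), defaults
Get-EstimatedRunSeconds -Loops 2 -CycleTime 120 -Cooldown 15 -CoreCount 8    # 2160 s (36 min), quick pass
# Time spent on each core is Loops * CycleTime, so 2 hours per core (7200 s)
# is, for example, 40 loops of 180 s:
Get-EstimatedRunSeconds -Loops 40 -CycleTime 180 -Cooldown 15 -CoreCount 8   # 62400 s (about 17.3 hours)
```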
You might want to try testing with AVX or AVX2 enabled.
You should also use some other software to confirm complete stability.
Once you are happy with your CPU stability, you can work on memory & FCLK.