I would like to ask for help with an instability issue. I have already posted this on the G.SKILL forum and on MersenneForum, but I will summarize it here again:
Basically, if I run the Prime95 v28.9 blend stress test, sooner or later the test errors out on one worker. The time to failure unfortunately varies widely: 13h, 19h, 39h, 49h and today 14h. The FFT sizes for the last three failures were 8K, 800K and 320K. The failures are always zeroed results like "FATAL ERROR: Final result was 00000000, expected: EE20AC08" and never rounding errors.
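For anyone who wants to tally such failures from a saved log, here is a minimal sketch. It assumes the error lines look exactly like the one quoted above; the function name `classify_failures` and the sample lines are made up for illustration, and other Prime95 message formats (rounding errors, for example) are deliberately not handled.

```python
import re

# Pattern derived only from the error message quoted in this post.
# Any other message format is ignored (assumption, not Prime95 documentation).
RESULT_RE = re.compile(
    r"FATAL ERROR: Final result was ([0-9A-Fa-f]{8}), expected: ([0-9A-Fa-f]{8})"
)


def classify_failures(lines):
    """Count zeroed results versus other result mismatches in log lines."""
    counts = {"zeroed": 0, "mismatch": 0}
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a "Final result" error line
        got, expected = match.groups()
        if int(got, 16) == 0:
            counts["zeroed"] += 1
        elif got.upper() != expected.upper():
            counts["mismatch"] += 1
    return counts


# Hypothetical sample input: one real-looking error line and one unrelated line.
sample = [
    "FATAL ERROR: Final result was 00000000, expected: EE20AC08",
    "some unrelated log line",
]
print(classify_failures(sample))
```

In practice the lines would come from wherever the stress test output was saved, for example `open(path)` on a log file of your choosing.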
The system is generally not overclocked, except for the memory and thus the system agent and IMC. However, the last few tests were done completely at stock, meaning the memory was also clocked at DDR4-2133 at 1.20 V with everything else at stock voltages.
MemTest86 v7 Pro with parallel testing on all cores revealed no errors, but I have to admit that the longest I had it running was 13h, since I need the machine for work.
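To put the 13h run in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that failures arrive at a constant random rate (an exponential model) estimated from the five Prime95 failure times above, and that MemTest86 would trigger the same fault at the same rate, which is a strong assumption and may well be wrong. The function names are hypothetical.

```python
import math

# Hours until failure for the five Prime95 runs reported in this post.
failure_hours = [13, 19, 39, 49, 14]

# Sample mean as a crude estimate of the mean time to failure.
mean_ttf = sum(failure_hours) / len(failure_hours)


def prob_no_failure(run_hours, mean_hours):
    """Chance that a run of run_hours sees no failure (exponential model)."""
    return math.exp(-run_hours / mean_hours)


def hours_for_confidence(confidence, mean_hours):
    """Run length needed to see at least one failure with the given probability."""
    return -mean_hours * math.log(1.0 - confidence)


print(f"estimated mean time to failure: {mean_ttf:.1f} h")
print(f"P(no failure in a 13 h run):    {prob_no_failure(13, mean_ttf):.2f}")
print(f"run length for 95% detection:   {hours_for_confidence(0.95, mean_ttf):.0f} h")
```

Under these assumptions a clean 13h run is not strong evidence either way: the model gives roughly a 60% chance of seeing no error in that time even if the fault is present.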
In the meantime I have swapped the memory kit for another of the exact same model, and the last failed test was performed with the new kit. Before that, with the "old" kit, I also raised VDIMM, which had no effect either.
Temperatures are all in the green, since the case is well ventilated and the CPU is kept well in check by a Noctua NH-D15S. During the test, I monitor everything at 500 ms resolution through HWMonitor64 and see no anomalies whatsoever.
Here is the specification of the machine:
- ASUS Z170-Deluxe with EFI firmware 1902
- G.SKILL Trident Z F4-3200C14D-32GT
- Intel 6700K (MC 74)
- Corsair HX850i
- Zotac GTX 1080 FE
- 4x Seagate ST2000NM0033 (Intel Rapid Storage RAID10)
The "magic" performance enhancements in the UEFI firmware are all off. Everything voltage-related is on Auto. I also tried setting Vcore to Offset with SVID enabled, but that made no difference either.
I am now considering swapping the CPU and board, but since I cannot really locate the culprit, I have a bad feeling that this will turn into an endless cycle. I am still not 100% convinced that this is not another Skylake and/or Prime95 bug. Over on MersenneForum, another user reported seeing the exact same behavior on a board from a different brand, which of course could also have an entirely different cause. I know.
If anyone has any ideas or suggestions, I would really appreciate a helping hand. This is really grueling, since I need the machine to be stable and dependable, and since I am walking in the dark here, it is even more frustrating. With real software bugs, I can at least look at the sources and fix them myself, but that is simply impossible with hardware. :-(
Thanks so much in advance for any help,