Overclock.net

1 - 20 of 180 Posts

·
Registered
Joined
·
148 Posts
Discussion Starter #1 (Edited)
Yesterday I built my new PC which consists of:

Gigabyte X570 Aorus Pro (latest drivers and BIOS)
Ryzen 9 3950X
Noctua NH-D15 Chromax.black
Kingston HyperX Predator 2x16GB 3600MHz CL18
Gigabyte GTX 1080Ti Aorus Xtreme
Corsair HX750

The CPU is most definitely a BEAST. Considering that I switched from a 9-year-old i7 2600K, basically everything I usually do showed immense performance improvements, and I had no issues whatsoever.
However, today I wanted to see whether the temperatures were okay and started testing with Prime95. I started with blend, which ran for ~20 minutes before the PC simply reset itself. So I disabled XMP and thought that would be the end of it, reran blend, waited ~40 minutes, and just before I stopped the workers it reset again.

Temps were fine both times, usually at ~65°C, peaking at ~83°C a couple of times, so I guess that's not the issue. I still wasn't sure what the problem was, so I wanted to make sure that the CPU was fine and ran small FFTs, since that mostly stresses the CPU and not the memory.
Aaaand boy was I wrong to assume that the RAM was faulty. Immediately after the worker threads start, the PC just resets. And it happens every single time I run small FFTs. It doesn't even do anything, just starts the workers and bails. No BSODs, no freezing, nothing. A quick reset and I'm back at my Windows desktop.

After each of these resets, Event Viewer logs contain these errors:
A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 21
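
If the resets keep coming, it can help to pull these fields out of each event and tally which APIC IDs show up. Below is a minimal Python sketch, assuming the event text has been copied out of Event Viewer by hand. `parse_whea_event` and `tally_apic_ids` are hypothetical helper names, not part of any Windows tool, and the only sample data is the event quoted above.

```python
import re
from collections import Counter


def parse_whea_event(text):
    """Collect the 'Key: value' lines of one pasted event into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+):\s*(.+)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def tally_apic_ids(event_texts):
    """Count how often each Processor APIC ID appears across pasted events."""
    ids = (parse_whea_event(t).get("Processor APIC ID") for t in event_texts)
    return Counter(int(i) for i in ids if i is not None)


# The event text from this post; lines without a colon are simply skipped.
EVENT_TEXT = """A fatal hardware error has occurred.

Reported by component: Processor Core
Error Source: Machine Check Exception
Error Type: Cache Hierarchy Error
Processor APIC ID: 21
"""

event = parse_whea_event(EVENT_TEXT)
print(event["Error Type"], "on APIC ID", int(event["Processor APIC ID"]))
```

Feeding several pasted events into `tally_apic_ids` shows whether one APIC ID keeps failing or the errors are spread across many, which may hint at a single weak core versus a general voltage problem.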

I found a couple of similar issues, and an identical one on the AMD community forums which was resolved by getting a new CPU. A workaround was also mentioned there: switching the LLC from Auto to Medium. So I tried that as well. With LLC at Low or anything above it, small FFTs work just fine.
I ran it with LLC Low for more than an hour and didn't have any issues. All cores were at ~3580MHz, Vcore was at ~1.02V, temperature at ~60°C. Here's a pic of Ryzen Master at that time: https://i.imgur.com/cpEGc4j.png
The other reports mentioned VRM cooling and CPU overcurrent protection. VRM MOS temps were never above 47°C while I did these tests, and the Aorus Pro should have sufficient cooling for it.

I also tried setting the clock to 4200MHz and increasing the Vcore to 1.375V, which resulted in small FFTs starting and a complete shutdown after 10 seconds or so. That would indicate a power issue, so I could probably get around it by connecting the 4-pin CPU connector as well (the 8-pin is obviously already connected).

What do you guys think? Is this a case of a best-possible-time-to-get-a-faulty-CPU (considering the coronavirus and everything), or do I just need to tweak the BIOS settings a bit and forget about it?
 

·
Registered
Joined
·
1,071 Posts
Run a Blender workload and see if it is OK. A stock CPU shouldn't fail Prime95. Personally, I would return it.
 

·
Edgy & on the edge
Joined
·
1,539 Posts
Could be a weird auto LLC setting. Too low LLC => too much vdroop under load => too little voltage => instability.
 

·
Registered
Joined
·
148 Posts
Discussion Starter #5
Blender benchmark passed multiple times without any issues with Low LLC. Prime95 small FFTs also worked well for more than 2 hours, when initially I couldn't even start the test. So far the settings that work flawlessly include either setting LLC to Low (or higher) or turning on PBO, or both.

As far as other temps go, they're pretty much okay. The VRM never crossed ~50°C even with PBO, and the rest of the motherboard temps were even lower.
 

·
Edgy & on the edge
Joined
·
1,539 Posts
Glad to hear that I was right. Can you tell me what the max voltage variation is? I'm curious how much of an impact this LLC setting has.

And, by the way, you should update your BIOS to the latest version.
 

·
Registered
Joined
·
148 Posts
Discussion Starter #8
BIOS is already the latest one. I don't think I noticed any difference in max voltage between Low and Auto LLC. It goes up to ~1.475V or so with both settings in some benchmarks.
For small FFTs (with Low LLC and PBO), voltage was usually around 1.17V with clocks at ~3870MHz. Without PBO, it's ~1.02V with clocks at ~3560MHz.


The processor came defective, it should work without errors. you have to take it to the guarantee.
If the CPU is defective I would guess that the issue would be reproducible no matter the LLC/PBO setting?
 

·
Registered
Joined
·
1,810 Posts
The processor came defective, it should work without errors. you have to take it to the guarantee.
Replies like this really piss me off. OP, first make sure that your motherboard is flashed with the latest BIOS, then apply "optimized defaults", reboot, and see if the issue comes back. I really think a BIOS option dealing with a voltage setting is what's haunting you. Setting everything to defaults should fix this issue, but we'll have to teach you how to tweak this platform once the stability issue is fixed.
 

·
Registered
Joined
·
148 Posts
Discussion Starter #10
I already tried reverting to defaults in the BIOS, and the first thing I did when I assembled the PC was flash the BIOS to the latest version. Unfortunately, default settings changed nothing. The only things that help are manually setting LLC to Low (or higher), enabling PBO, or both.
 

·
Registered
Joined
·
148 Posts
Discussion Starter #11
I just tried reverting to default settings and then setting the voltage offset to +0.05V. That fixes the issue as well. Defaults are just not cutting it for some reason.
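
For what it's worth, here is a toy model of why a small positive offset and a stronger LLC setting can end up having the same effect. It's a rough Python sketch of a plain resistive load line: voltage at the CPU equals the set voltage plus the offset, minus current times load-line resistance. Every number in it is a made-up placeholder, not a measurement from this board or a Gigabyte/AMD spec, and `loaded_voltage` is a hypothetical helper.

```python
def loaded_voltage(v_set, offset_v, current_a, loadline_mohm):
    """Voltage under load for a simple resistive load-line model.

    v_set         -- voltage the board targets at idle, in volts
    offset_v      -- manual Vcore offset, in volts
    current_a     -- CPU current draw, in amps
    loadline_mohm -- effective load-line resistance, in milliohms
    """
    return v_set + offset_v - current_a * loadline_mohm / 1000.0


# Placeholder values chosen only to make the arithmetic easy to follow.
V_SET = 1.10        # volts (assumed)
HEAVY_LOAD = 100.0  # amps (assumed)

baseline = loaded_voltage(V_SET, 0.00, HEAVY_LOAD, loadline_mohm=1.0)
with_offset = loaded_voltage(V_SET, 0.05, HEAVY_LOAD, loadline_mohm=1.0)
stronger_llc = loaded_voltage(V_SET, 0.00, HEAVY_LOAD, loadline_mohm=0.5)

for label, volts in [("baseline", baseline),
                     ("+0.05 V offset", with_offset),
                     ("stronger LLC", stronger_llc)]:
    print(f"{label}: {volts:.3f} V under load")
```

In this model both changes lift the loaded voltage by the same amount, which would be consistent with either one being enough to stop the resets, though the real board behaviour is certainly more complicated.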
 

·
Edgy & on the edge
Joined
·
1,539 Posts
You should compare your low current & high current voltages with other 3950X chips and see if yours simply needs more voltage at stock or if the motherboard is just being weird and not supplying enough.
 

·
Registered
Joined
·
148 Posts
Discussion Starter #14
Maybe the motherboard or its settings aren't up to snuff.
Mobos can vary greatly in how much voltage they feed a CPU, and how stably, even at stock :(
You could try a different CPU sample but it's unlikely to help if the mobo is feeding it too low again.
Unfortunately, I'm not able to do that. Mainly because everything is on lockdown because of the virus.

You should compare your low current & high current voltages with other 3950X chips and see if yours simply needs more voltage at stock or if the motherboard is just being weird and not supplying enough.
I'd appreciate any tips on how to check this properly. So far I'm seeing regular voltages with Low LLC and/or PBO.
 

·
Edgy & on the edge
Joined
·
1,539 Posts
To check the voltages, run the Cinebench R20 CPU test and write down the voltage from CPU-Z once it has stabilized. That's your high current (high load) voltage. After that, run the Cinebench CPU (Single Core) test and write down that voltage. That's your low current (low load) voltage. Use them to compare with other 3950X chips online.
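
Once both readings are written down, the comparison is just a subtraction. A minimal Python sketch, assuming readings in volts: `load_drop` is a hypothetical helper, and the numbers plugged in are the rough figures webstar posted earlier in the thread (peak ~1.475V in lighter benchmarks, ~1.02V and ~1.17V in small FFTs), not Cinebench results.

```python
def load_drop(low_load_v, high_load_v):
    """Difference between the light-load and heavy-load voltage readings."""
    return round(low_load_v - high_load_v, 3)


# (light-load reading, heavy-load reading) in volts, taken from the
# approximate values reported earlier in the thread.
readings = {
    "Low LLC, no PBO": (1.475, 1.02),
    "Low LLC + PBO": (1.475, 1.17),
}

for name, (low, high) in readings.items():
    print(f"{name}: {low} V light, {high} V heavy, "
          f"drop {load_drop(low, high)} V")
```

Keep in mind the drop is not pure vdroop, since the CPU also requests less voltage at heavy all-core load, so it's only meaningful when compared against another 3950X measured the same way.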
 

·
Registered
Joined
·
184 Posts
@webstar, you can't imagine how happy I am to learn about this problem, because I have the exact same issue with a 3950X and an X570 Aorus Master (a pretty similar configuration). I already thought my CPU was defective, but now I'm blaming Gigabyte's default settings. I have to raise the offset to +0.05V to pass OCCT without errors, so we are in the same boat. Setting LLC to Low or enabling PBO also solves it, so it seems to be the exact same problem. It's not a BIOS issue, because I've already tried four different versions, from 9 to 12e (beta), and each of them has the same problem. It's just the board not giving enough juice at defaults, which is the first time I can recall this happening to me in all my experience with PCs.

I don't know why I switched from ASRock to Gigabyte. Never again, period.
 

·
Registered
Joined
·
285 Posts
I have exactly the same problem!!!
3950X with an X570 Aorus Pro WiFi, BIOS F12e.

Based on this thread: https://community.amd.com/thread/248934, I assumed it was the CPU, since I similarly needed to set LLC to Medium to gain stability (LLC Low still had failures). I went through the warranty return process and spent the better part of a month waiting on a replacement chip due to delays caused by the pandemic. The replacement has the same issue, though different cores tend to fail than on my original chip (I'm running Prime95 on Linux, since running it on Windows tends to end in a BSOD).

Based on this thread, it seems like it might be a Gigabyte power delivery thing, though I was starting to think it might have something to do with bad die binning, since the serial number of the replacement chip is very close to that of my original chip.
I'll hopefully be able to borrow an Asus B450 board soon to test the chip with. I will report my findings.
 

·
Registered
Joined
·
184 Posts
I can confirm LLC Low still has failures. Setting it to Medium lets OCCT run without trouble.

But leaving the voltage on "Auto" seems to be the real culprit, no matter how many other settings you add into the mix. It can even show different results after a reboot, crashes included. The way Gigabyte's Auto setting manages voltage droop on these boards is just plain wrong. Setting the voltage to an offset, even a tiny one (-0.01 or +0.01), plus LLC gets the CPU stable.

Please keep in touch with us, because going through an international RMA these days is such a hassle. Never again, Gigabyte...
 

·
Registered
Joined
·
148 Posts
Discussion Starter #19
@smonkie, @Pavelow

Good to know I'm not the only one in this boat, as this pretty much shows that it's not a faulty CPU.
Since I couldn't afford being without a PC until the RMA goes through, I'm just using one of the stable settings I've found and got no other issues so far.

Has anybody contacted Gigabyte's support?
 

·
Registered
Joined
·
184 Posts
I opened a ticket, but never got answered. Have you made any progress?

I hadn't bought anything from Gigabyte since my last Nvidia 980, and this is really the last time I buy anything Gigabyte-related.
 