
My system with a Ryzen 5000 CPU reboots with BIOS defaults

1181 - 1200 of 1293 Posts

·
Registered
Joined
·
256 Posts
It's not 2%. "There are 75k votes, so that 2% means 1500 bad CPUs versus 15750 good CPUs..." That means a 10% CPU failure rate:
15000 good and 1500 bad!!! 10%!
I was referring to this part of the article:

A third vendor provided even more information. The company said it isn’t seeing PowerGPU’s reported failure rates with its own systems. Interestingly, however, the vendor actually shared data indicating that Ryzen parts are failing the company’s internal quality screening at a higher rate compared to Intel chips—almost three times as high:

  • Ryzen 5000 series fails at 2.9 percent.
  • Ryzen 3000 series fails at 3 percent.
  • ThreadRipper 3000 series fails at 2.5 percent.
For comparison, the company's data on Intel chips:

  • Intel 9th-gen fails at 0.9 percent.
  • Intel 10th-gen fails at 1.2 percent.
But it is 3%, not 2%, sorry.
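For anyone who wants to check the numbers being argued about, here is a quick sketch in Python. The percentages are the ones from the article excerpt above, and the 1500 / 15750 figures are the ones quoted from the earlier post; the variable names are just mine.

```python
# Failure rates (percent) as quoted from the vendor's internal screening.
amd_rates = {"Ryzen 5000": 2.9, "Ryzen 3000": 3.0, "Threadripper 3000": 2.5}
intel_rates = {"Intel 9th-gen": 0.9, "Intel 10th-gen": 1.2}


def ratios_vs(amd_rate, other_rates):
    """How many times higher amd_rate is than each rate in other_rates."""
    return {name: round(amd_rate / rate, 2) for name, rate in other_rates.items()}


# Ryzen 5000 compared with each Intel generation.
ryzen_5000_vs_intel = ratios_vs(amd_rates["Ryzen 5000"], intel_rates)

# Poll figures quoted in the post being answered.
bad, good = 1500, 15750
bad_share_of_total = round(100 * bad / (bad + good), 1)  # bad CPUs as % of all votes counted
bad_per_good = round(100 * bad / good, 1)                # bad CPUs per 100 good ones

print(ryzen_5000_vs_intel)
print(bad_share_of_total, bad_per_good)
```

So "almost three times as high" depends on which Intel generation you compare against, and the "10%" from the poll is roughly 9 to 10% depending on whether you divide by the good CPUs or by the total.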
 

·
OG AMD
Joined
·
8,927 Posts
It's the same AGESA version, you won't get much difference.

Try to test stability like this: boot into windows, open chrome and some tabs like facebook, this one, imdb, epic games front page, open Cinebench R20 run MC test a few times and then close Chrome, open Chrome a bunch of times and then browse normally.

Test as much as you want but don't wait for the 14 day return period to expire. You can make the return claim to the store now and legally you have another 14 days after you make the claim to actually return the product. If you decide you don't want to return it anymore, don't return it. They won't mind.
This is pure BS. Now we know how Intel is trying to inflate return rates. Losers. A guy gets an error and your response is to return the CPU. Get real, man.
 

·
Registered
Joined
·
67 Posts
This is pure BS. Now we know how Intel is trying to inflate return rates. Losers. A guy gets an error and your response is to return the CPU. Get real, man.
Yeah, I am pretty sure the guy is an undercover Intel employee. Dude, snap out of it. Personally, I am against the idea of "omg, you got a sudden reboot, so immediately return the CPU", and I advocate troubleshooting first, but there's obviously enough evidence that it does happen a lot. And Andrei didn't even tell the guy to return it straight away; he said to try to consistently reproduce the error first while keeping his options open. And yeah, if you have everything at stock settings, you keep getting WHEA 18 errors, and you are within your return window, then returning it to the shop is the much easier thing to do, since you can get your money back. If the window passes, you are committed to this CPU forever (obviously you'll end up with a working one at some point if you persevere, but some people just prefer to switch to Intel and be done with it).
 

·
Registered
Joined
·
120 Posts
I am even working on a benchmarking tool to consistently reproduce the issue, but I still need a few more days to have it ready.

But yeah, I'm an Intel employee trying hard to make you exchange your good CPUs that constantly crash your system for other good CPUs that work perfectly once swapped in.
 

·
Registered
Joined
·
16 Posts
Just wanted to add my personal experience. I purchased my 5950X in November and it had been running WHEA-free for months, with PBO and RAM overclocking, all of which had been tested extensively. Suddenly, a little over a week ago, I started getting random reboots, and the event was the dreaded WHEA event 18 "Cache Hierarchy Error". So I searched and promptly found this thread. Over a few days, I read the entire thread, all 60 pages and 1,200 posts, from beginning to end.

I tried multiple BIOS versions, BIOS resets, a CMOS reset (on the thought that this is more thorough than an in-BIOS settings reset), VDDP/VDDG adjustments, everything. I was still getting WHEA 18 even with 100% BIOS defaults. So, based on everything I had read, I opened an RMA with AMD. That RMA is now approved and I have the shipping label in hand.

However, I will not be using it, as I have found the problem in the meantime. There is nothing wrong with my 5950X. In fact, I have restored my aggressive BIOS settings with PBO and RAM overclocking, and have been WHEA-free and 100% stable for 3 days now, including extended periods of idle time. In the end, my issue was the result of a combination of the latest RX 6800/6900 Adrenalin drivers and HWiNFO + GPU-Z, which I always have running in the background. The issue is well documented here and here, but the short version is: I updated HWiNFO to the latest beta and the WHEAs went bye-bye.

This is not to diminish any of the other folks who are suffering from a true hardware defect. I'm sure many of those are valid. However, in the interest of balance, and perhaps to save someone else from an unnecessary and time-consuming RMA process, I wanted to post this. Hope this helps someone.
 

·
Registered
Joined
·
4 Posts
I am happy you solved it. In my case I don't have HWiNFO installed. I've decided to go back to Intel: I will ask my local vendor for a refund and go full Intel.
 

·
Registered
Joined
·
8 Posts
Which, by the way, cast shade on the HWiNFO maintainer, because a lot of users jumped on the bandwagon attacking HWiNFO, saying it was the cause of the WHEA errors.
In reality, the HWiNFO developers have been working tirelessly with the community for years to provide decent and stable monitoring across many platforms.
Once again, problematic AMD hardware and/or software was the cause, mainly because AMD does not share much with developers; Ryzen monitoring on Linux still has issues even now.

And by the way, since we are citing Reddit, AMD has gone full Apple mode regarding the USB issues:
"AMD is aware of reports that a small number of users are experiencing intermittent USB connectivity issues reported on 500 Series chipsets."

AMD also seems to have fixed the Ryzen Master issue when paired with dual-CCD 5600X/5800X CPUs.


Sadly, nowadays, just because something is good on paper doesn't mean it is guaranteed to be good once it's in your hands.
That's a lot of issues all together, if you ask me!
 

·
Registered
Joined
·
120 Posts
Could be that the new drivers just use a combination of instructions that trigger the behavior in the CPU. No one can tell for sure, except AMD.

But I've read the threads and something interesting I noticed is that they say: Every single reboot reported as WHEA 18 with different APIC numbers.
This means each time a different cpu core caused the fault. So maybe this is a good indication it's not the CPU.

In my case, and for other members in this thread, we are always getting the same APIC (thread) number in the WHEA errors. That means a specific bad core.

So maybe good advice for everyone is to look at the APIC number in the WHEA events and check whether it's always the same or not (or at least limited to the same 1-2 cores).
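If you have a lot of events, counting the APIC IDs by hand gets tedious. Here is a rough Python sketch of the idea. It assumes you have copied the text of each WHEA event into a string and that each one contains a line like "Processor APIC ID: N"; the sample strings below are made up for illustration and are not real logs.

```python
import re
from collections import Counter

# Assumed wording of the field inside a WHEA-Logger event description.
APIC_RE = re.compile(r"Processor APIC ID:\s*(\d+)")


def tally_apic_ids(event_texts):
    """Count how often each APIC ID shows up across the given event texts."""
    counts = Counter()
    for text in event_texts:
        match = APIC_RE.search(text)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def same_id_every_time(counts):
    """True when every event that had an APIC ID pointed at the same one."""
    return len(counts) == 1


# Made-up examples: one machine always faulting on ID 4, one jumping around.
always_same = ["Error Type: Cache Hierarchy Error\nProcessor APIC ID: 4"] * 3
all_over = [
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 0",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 7",
    "Error Type: Cache Hierarchy Error\nProcessor APIC ID: 21",
]

print(tally_apic_ids(always_same), same_id_every_time(tally_apic_ids(always_same)))
print(tally_apic_ids(all_over), same_id_every_time(tally_apic_ids(all_over)))
```

A single repeating ID would fit the "one bad core" pattern; many different IDs would fit the other pattern described above. Neither is proof on its own.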
 

·
Registered
Joined
·
52 Posts
Hi, I just finished my 5950X build weeks ago but was kinda frustrated with the random reboots and 'CPU Over Temperature Error' prompts from my motherboard so I figured I might as well write something about it here...


My build :

Ryzen 9 5950X
Asus Crosshair VIII Dark Hero (BIOS 3204)
G.Skill Trident Z Royale 4x32GB 3600 18-22-22-42 kit
Western Digital SN850 1TB NVMe PCIe 4.0 x4 SSD ( + Intel 750 SSD, 2x 6TB HDDs from old build)
AMD Radeon 6900 XT (Reference)
Corsair AX-1000
IceGiant ProSiphon Elite CPU air cooler
Lian-li PC-O11D XL with 9x Arctic P12 PWM-PST fans

The IceGiant cooler came with Thermal Grizzly Kryonaut paste, which I believe is better than most non-conductive compounds on the market?
( well it should at least beat my MX-4 lying on the shelf.. )


So the CPU is not stable at all whenever PBO is enabled (with or without F-max or the board's Core Performance Boost enabled).
It straight out spits 0x124 across my event log, with some occasional WHEA ID 18 entries flanked by maybe one or two corrected ID 19 ones.
Even when it lets me run benchmarks or the CPU-Z bench, the PBO clocks don't match the performance, and the single-core score shows my core clocks are heavily stretched.
My CPU-Z single-core score is only ~630 - 640, which is literally what my manual OC at 4.6GHz also scores.
In Cinebench R23 with PBO, the multi-core score is about 24k, which I believe is on par with a stock 5950X.
Even accounting for the clock stretching, with stock PBO the cores don't even reach above 4.95GHz in the HWiNFO64 logs.

Funny thing is, my 5950X is only more stable when I punch a manual OC and vcore into it (nothing too crazy, only CCD1 4.6 and CCD2 4.45 at 1.26V and LLC3),
yet with the manual OC it instead gives random black-screen reboots, or the motherboard screams 'CPU Over Temperature Error' while temps are only sitting at 70 ~ 80C.

My previous build was an X99 platform with an overclockable 10-core 6950X, which I would not have needed to upgrade from if it hadn't died.
Fun fact: that 6950X died relatively slowly, going from being able to OC to 4.1GHz, down to only running at stock 3.5GHz, and finally to a click and POST code 00.
Throughout that 9-month period (yes, that chip only lived 9 months) it kept spitting out random 0x124s or system hangs, which I could never replicate with stress tools like Prime95 or OCCT.

This 5950X pretty much gives me flashbacks to that horrible chip.


--


Small update :
While typing out this reply, I have also been trying to verify the chip with all-default, no-OC settings
(aka everything on auto with clock boost disabled and the CPU at 3.4GHz, memory at 2133 with 1066 FCLK).
It still crashes with 0x124, so I'll assume RMA is my only choice.

The CPU batch number is 2047PGS (Malaysia), a fairly late chip that I thought would be free of the WHEA issues, but apparently NOPE....

As much as I loved you, my Silver Sample 5950X, nah... I need a more stable 24/7 platform than this....


(This is my 2nd AMD CPU and it kinda didn't go well; my Intel 6950X also didn't go well... the names look similar, coincidence? :P)

Also, because of how scarce and scalped these chips are, mine came from Amazon instead of a local retailer, which means I will probably have to bear the shipping cost just to do the RMA: while waiting for the other parts to come in, I had already long passed Amazon's return window.


update 2 :
Images of the WHEA and BSOD logs (attachments 2479765 and 2479766)

Small note: CPU-Z shows I'm running 4.6GHz only because at stock or PBO with everything on auto the PC would not stay up past 10 minutes, and I had to go back to my manual OC settings just to have time to take screenshots and scroll through logs.... lolz
 

·
Registered
Joined
·
16 Posts
Could be that the new drivers just use a combination of instructions that trigger the behavior in the CPU. No one can tell for sure, except AMD.

But I've read the threads and something interesting I noticed is that they say: Every single reboot reported as WHEA 18 with different APIC numbers.
This means each time a different cpu core caused the fault. So maybe this is a good indication it's not the CPU.

In my case, and for other members in this thread, we are always getting the same APIC (thread) number in the WHEA errors. That means a specific bad core.

So maybe good advice for everyone is to look at the APIC number in the WHEA events and check whether it's always the same or not (or at least limited to the same 1-2 cores).
Yes, I agree that the takeaway should not be "HWiNFO crashes Zen 3". Even though HWiNFO was able to work around the problematic code path, the root cause of the issue is apparently still present, and any other application could (in theory) trigger that code path in just the right way to result in the same WHEA 18 "Cache Hierarchy Error". So, ultimately, AMD will need to fix this with either a driver update or an AGESA update.

In the meantime, I think your guidance about taking note of the APIC IDs is sound. For me, I was seeing WHEA 18 on at least 8 different APIC IDs, corresponding to 4 different cores. It's possible that if I had waited longer, I would have seen it on even more APIC IDs. The chance of that many cores being defective is very small, especially when the CPU was otherwise stable for months. So, everyone should take note of the APIC IDs corresponding to the WHEA 18 events. If it's more than a couple of different IDs, it may point toward the issue I was experiencing (referenced in post # 1186), which, as far as I know, would only be applicable if you have an RDNA2 GPU and the latest Adrenalin drivers.
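To go from APIC IDs to cores, a small Python sketch follows. It assumes SMT is on with 2 threads per core and that the two threads of a core have adjacent APIC IDs (2k and 2k+1); that is my assumption about the numbering, so check it against your own system. The ID list is a made-up example of 8 IDs, not my actual log.

```python
def cores_from_apic_ids(apic_ids, threads_per_core=2):
    """Collapse APIC (thread) IDs into core indices.

    Assumes sibling threads of one core have consecutive APIC IDs,
    so integer division by threads_per_core gives a per-core index.
    """
    return sorted({apic_id // threads_per_core for apic_id in apic_ids})


# Hypothetical example: 8 distinct APIC IDs seen in WHEA 18 events.
seen_ids = [0, 1, 6, 7, 10, 11, 20, 21]
affected_cores = cores_from_apic_ids(seen_ids)

print(len(set(seen_ids)), "APIC IDs ->", len(affected_cores), "cores:", affected_cores)
```

With these example IDs, 8 thread IDs collapse into 4 cores, which is the kind of spread that made a single defective core look unlikely in my case.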
 

·
Registered
Joined
·
11 Posts
My first reply from AMD,

Provided here are some troubleshooting suggestions to help isolate the root cause(s) and resolve the problem. Make sure to check the system for stability after completing each step below:
1. Update the system BIOS to latest version available from motherboard manufacturer (refer to motherboard user manual for instructions on updating the BIOS).
2. Set the BIOS to use factory default settings / optimized default settings (refer to motherboard user manual for instructions on restoring BIOS default settings).
3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).
4. Update Windows to the latest version and build via Windows Update. For instructions, refer to article.
5. Update to latest chipset driver from AMD. For instructions, refer to article.
6. In Windows Control Panel, select Power Options and choose the Balanced (recommended) power plan. In Windows Settings, select Power & sleep and set the Performance and Energy slider to the middle.
7. Disable non-Microsoft services and startup items using the System Configuration Tool. For instructions, refer to article.
8. Reseat CPU, RAM, and all PSU power connections (end-to-end for modular PSUs). For more instructions, refer to the product's user manual.
   Verify RAM sticks are installed in the correct DIMM slots (for socket AM4 motherboards with 4 DIMM slots, use A2 & B2).
 

·
Registered
Joined
·
52 Posts
I'll assume this means I should expect some weeks before finally receiving RMA approval from AMD : (
 

·
Registered
Joined
·
97 Posts
Funny thing is, my 5950X is only more stable when I punch a manual OC and vcore into it (nothing too crazy, only CCD1 4.6 and CCD2 4.45 at 1.26V and LLC3),
yet with the manual OC it instead gives random black-screen reboots, or the motherboard screams 'CPU Over Temperature Error' while temps are only sitting at 70 ~ 80C.
Wait, you were getting reboots even with manual overclock? Were sleep states enabled?
 

·
Registered
Joined
·
11 Posts
Getting back with an update after performing all the tests for AMD.

Of the 8 steps, 7 passed and one did not.

3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).

After changing that setting and booting into Windows, the system rebooted in less than 3 minutes.

Is that evidence of a bad CPU or not? What do you think?
 

·
Registered
Joined
·
25 Posts
Is that evidence of a bad CPU or not? What do you think?
I think that 'Typical' for 'Power Supply Idle Control' should make the system more stable with older PSUs. It can fix so-called cold reboots, where there isn't even a BSOD: the PC just reboots as if the power had been switched off at the PSU.

So if it crashes on the 'Typical' setting, that's more evidence of a bad CPU.
 

·
Registered
Joined
·
120 Posts
Getting back with an update after performing all the tests for AMD.

Of the 8 steps, 7 passed and one did not.

3. In the BIOS, locate the Power Supply Idle Control option and set it to Typical (this option should be available in the Advanced section of the BIOS).

After changing that setting and booting into Windows, the system rebooted in less than 3 minutes.

Is that evidence of a bad CPU or not? What do you think?
Typical Idle Current just means that the CPU keeps an active core at all times even if there is no load on it just to make sure the power supply does not think your computer is sleeping due to too low power consumption. Low Idle Current (Default I think) means that all cores in the CPU can sleep at once with low voltages.

Did not make any difference in my bad 5900x, already tried it. But they did mention it because Ryzen 1xxx or 2xxx series had a bug where this setting caused them to deep sleep forever until hard reset.

For more info, I took these screenshots while trying the settings:

Typical Idle Current - You can see that even though all cores are asleep, 1 core actually stays at 0.98V and the CPU idle power consumption is 7.3W and EDC is 3% = 6A.


Low Idle Current - All voltages are under 0.54V and CPU power consumption is just 3W and EDC is 1% = 2A.


Since the SoC draws 18W at all times either way, I don't see how the extra 4W of CPU core power would make any difference for the power supply. But the current draw is indeed triple.
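For reference, the arithmetic behind those two statements, as a quick Python sketch. All numbers are the readings quoted above for my system; adding the SoC and core figures together into one total is my own simplification.

```python
# Readings from the two screenshots described above.
typical = {"core_power_w": 7.3, "edc_current_a": 6.0}
low = {"core_power_w": 3.0, "edc_current_a": 2.0}
soc_power_w = 18.0  # SoC draw, the same in both modes

# Extra core power with Typical Idle Current.
extra_core_power_w = round(typical["core_power_w"] - low["core_power_w"], 1)

# How much higher the reported current is with Typical Idle Current.
current_ratio = typical["edc_current_a"] / low["edc_current_a"]

# Combined SoC + core power in each mode (simplified total).
total_typical_w = round(soc_power_w + typical["core_power_w"], 1)
total_low_w = round(soc_power_w + low["core_power_w"], 1)
total_ratio = round(total_typical_w / total_low_w, 2)

print(extra_core_power_w, current_ratio, total_typical_w, total_low_w, total_ratio)
```

So the core current triples, but the combined idle power only goes up by about a fifth, which is why I doubt it matters much to a modern power supply.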
 

·
Registered
Joined
·
120 Posts
Also azomiss, go into Event Viewer -> Windows Logs -> System and on the right use Filter Current Log, select Event level only Error and Event sources only WHEA-Logger.
Then look into all occurrences of the WHEA errors and tell us if the APIC IDs reported are always the same or they vary.
If it's always the same it means that only 1 CPU core causes the errors.
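If you prefer the command line over clicking through Event Viewer, the same filter can probably be expressed with the built-in wevtutil tool. Below is a Python sketch that only builds the command and runs it on request. The provider name "Microsoft-Windows-WHEA-Logger" and Level=2 for Error are my assumptions about how the Event Viewer filter maps to the query, so verify them on your system; the function names are made up.

```python
import subprocess

# Assumed full provider name behind the "WHEA-Logger" source in Event Viewer.
PROVIDER = "Microsoft-Windows-WHEA-Logger"


def build_query(provider=PROVIDER, level=2):
    """XPath filter: events from one provider at one level (2 is assumed to be Error)."""
    return f"*[System[Provider[@Name='{provider}'] and (Level={level})]]"


def build_command(max_events=50):
    """wevtutil arguments: query the System log, newest first, as plain text."""
    return [
        "wevtutil", "qe", "System",
        f"/q:{build_query()}",
        "/f:text",
        f"/c:{max_events}",
        "/rd:true",
    ]


def read_whea_errors(max_events=50):
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_command(max_events), capture_output=True, text=True, check=True
    )
    return result.stdout


print(" ".join(build_command(10)))
```

On Windows, calling read_whea_errors() should give you text you can search for the APIC ID lines; nothing is executed until you call it.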

Also, I've left you a private message suggesting you try a positive Curve Optimizer offset, something like +10 all-core, and see if the reboots are gone. If they are, that is a very clear indication of a bad CPU.
 