21 - 40 of 105 Posts

·
Registered
Joined
·
21 Posts
I have no WHEA errors at 1900 FCLK or lower. They start at 2000, so I can't run 1:1 without issues unless I stay at 3800 or below.
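For anyone unsure where the "3800" comes from: DDR4 transfers data twice per memory clock, so MEMCLK is half the rated transfer speed, and 1:1 means FCLK equals MEMCLK. A minimal sketch of that arithmetic (the function names are made up for illustration and are not from any tool):

```python
def memclk_mhz(ddr_rate_mts: int) -> float:
    """MEMCLK in MHz for a DDR transfer rate in MT/s (two transfers per clock)."""
    return ddr_rate_mts / 2


def runs_one_to_one(ddr_rate_mts: int, fclk_mhz: int) -> bool:
    """True when FCLK matches MEMCLK, i.e. the 1:1 mode discussed in this thread."""
    return memclk_mhz(ddr_rate_mts) == fclk_mhz


# DDR4-3800 needs FCLK 1900 for 1:1; DDR4-4000 would need FCLK 2000.
for rate in (3200, 3800, 4000):
    print(f"DDR4-{rate}: MEMCLK {memclk_mhz(rate):.0f} MHz, 1:1 needs FCLK {memclk_mhz(rate):.0f}")
```

So a chip that only stays clean up to 1900 FCLK tops out at DDR4-3800 in 1:1 mode.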
This pretty much sums up the issue and this thread.

Let's remind people that the officially rated FCLK is only 1600. The "officially rated OC" from AMD Marketing (for which I have little love compared to AMD engineering) is only 1900.

In the end, solving WHEA 19 isn't about suppressing it, or hoping that a magical combination of knobs will fix it. It takes a simple twist of your mind: face reality.

Peace
 

·
Registered
Joined
·
73 Posts
Discussion Starter · #23 ·
This pretty much sums up the issue and this thread.

Let's remind people that the officially rated FCLK is only 1600. The "officially rated OC" from AMD Marketing (for which I have little love compared to AMD engineering) is only 1900.

In the end, solving WHEA 19 isn't about suppressing it, or hoping that a magical combination of knobs will fix it. It takes a simple twist of your mind: face reality.

Peace
So we just say everything over 3200 should be that buggy, and we stop overclocking and close the forum?


There were and are WHEA 18s/20s, USB issues, and PCIe issues at bone stock. In my opinion this is not related to us reaching instability, but to the platform being immature and inherently unstable.

AMD didn't do proper testing/validation and rushed a buggy product to market. (I think the least they owe their customers is open communication, but I don't see that happening anytime soon. Lots of issues are not publicly addressed, and every AGESA version officially fixed the USB dropouts ...)
 

·
Registered
Joined
·
739 Posts
So we just say everything over 3200 should be that buggy, and we stop overclocking and close the forum?


There were and are WHEA 18s/20s, USB issues, and PCIe issues at bone stock. In my opinion this is not related to us reaching instability, but to the platform being immature and inherently unstable.

AMD didn't do proper testing/validation and rushed a buggy product to market. (I think the least they owe their customers is open communication, but I don't see that happening anytime soon. Lots of issues are not publicly addressed, and every AGESA version officially fixed the USB dropouts ...)
 

·
Registered
Joined
·
804 Posts
There were and are WHEA 18s/20s, USB issues, and PCIe issues at bone stock.
WHEA 18 at stock most likely means your CPU qualifies for an RMA.
 

·
Iconoclast
Joined
·
31,568 Posts
Even if this eventually gets fixed, this has caused my hate for AMD to return and it is unlikely I will ever purchase anything from them again.
There is nothing to 'fix' here. You're taking a part and deliberately pushing it beyond its capabilities by the least advantageous means possible. The part may not conform to your preferences, but that doesn't mean it's broken (firmware is another matter, I admit).

It's true that Zen 3 doesn't OC as well as Intel, and doesn't benefit much from core OCs without cooling capable of handling the fairly extreme thermal densities involved, but this should not have been a surprise.

Maybe my CPU is a poor silicon sample. That might explain my disappointment with Ryzen. But, PBO and curve optimizer overclocking simply do not work worth a damn for me.
I don't think a better sample would be enough of an improvement to give you the experience you're looking for.

The overriding limit on Zen 3 is almost always going to be temperature.

I wish I could find a bare die solution for the 5950X.
What's holding you back?

All the examples I have seen of a delidded 5950X were not good (I haven't seen many). The issue is that the chiplets are not at exactly the same height, so the advantage from delidding (very thin thermal interface material) is not there. At best the same as stock. :(
Lap the dies.

Also, the advantage from running bare die isn't just a thin TIM, it's the lack of the thermal resistance from ~2.5mm of excess copper between the dies and the cooler.

So we just say everything over 3200 should be that buggy, and we stop overclocking and close the forum?
No, but anything over official spec should be seen as a bonus, rather than a given.

95%+ of Vermeer parts will do 1900+ FCLK, but one that does less, while still meeting spec, isn't defective.

There were and are WHEA 18s/20s, USB issues, and PCIe issues at bone stock.
Not something I've personally encountered, but I'm aware of plenty who have. These parts are defective.
 

·
Overclocked / Overvolted
Joined
·
237 Posts
Last night I just said to hell with it, disabled the WHEA logging and let it rip. I let the chiller run long enough to get the CPU temps down into the single digits (a couple of cores below zero), ratcheted up the FCLK to 1:1 with MEMCLK and let it rip. Took some 1st, 2nd and 3rd place spots on the HWBOT/Futuremark leaderboards by no longer caring what happens. (Magic like that doesn't happen with PBO zombie overclocking.) No more logging and no more overhead created by the pointless and idiotic logging (thousands of times, even) of a known defect. It's still Ryzen, but I like it better this way. I'd rather have fun killing it, or finding out how much abuse it can handle, than make concessions for its functional limitations. I didn't buy it to be a hardware babysitter.

What's holding you back?
Nothing, other than identifying how to mount the waterblock properly. I am looking for a solution, but haven't found one yet. It's unfortunate that the AM4 socket design presents a physical impediment with that stupid plastic bulge next to the CPU. Very poor design. An LGA implementation like Intel and Threadripper would have been a far better solution, but then a lot of people would have mitched and boaned about a socket change and whined about AMD "breaking its promises" even if the outcome would be dramatically better.
 

·
Overclock the World
Joined
·
3,283 Posts
In the end, solving WHEA 19 isn't about suppressing it, or hoping that a magical combination of knobs will fix it. It takes a simple twist of your mind: face reality.

Peace
Then explain, by twisting reality, why my sample does not throw WHEA errors at all, not at 2100 or 2133 :p
I don't need any suppressor, and neither does anyone who owns a first-batch sample.
The discussion isn't that simple. WHEA #19 has nothing to do with the IMC of the unit.
 

·
Iconoclast
Joined
·
31,568 Posts
Nothing, other than identifying how to mount the waterblock properly. I am looking for a solution, but haven't found one yet. It's unfortunate that the AM4 socket design presents a physical impediment with that stupid plastic bulge next to the CPU. Very poor design. An LGA implementation like Intel and Threadripper would have been a far better solution, but then a lot of people would have mitched and boaned about a socket change and whined about AMD "breaking its promises" even if the outcome would be dramatically better.
The socket lid can be modified so the CPU dies will be the highest part. The socket lid is also removable, so you can modify a spare, if you have access to one, without mangling the original, should you need to restore the board to like-new condition.

LGA does have better electrical and mechanical properties, and AMD is finally switching to it on their consumer parts with AM5.
 

·
Registered
Joined
·
141 Posts
Last night I just said to hell with it, disabled the WHEA logging and let it rip. I let the chiller run long enough to get the CPU temps down into the single digits (a couple of cores below zero), ratcheted up the FCLK to 1:1 with MEMCLK and let it rip. Took some 1st, 2nd and 3rd place spots on the HWBOT/Futuremark leaderboards by no longer caring what happens. (Magic like that doesn't happen with PBO zombie overclocking.) No more logging and no more overhead created by the pointless and idiotic logging (thousands of times, even) of a known defect. It's still Ryzen, but I like it better this way. I'd rather have fun killing it, or finding out how much abuse it can handle, than make concessions for its functional limitations. I didn't buy it to be a hardware babysitter.


Nothing, other than identifying how to mount the waterblock properly. I am looking for a solution, but haven't found one yet. It's unfortunate that the AM4 socket design presents a physical impediment with that stupid plastic bulge next to the CPU. Very poor design. An LGA implementation like Intel and Threadripper would have been a far better solution, but then a lot of people would have mitched and boaned about a socket change and whined about AMD "breaking its promises" even if the outcome would be dramatically better.
Look at it as a problem with the cold plate and not with the mounting?
 

·
Registered
Joined
·
21 Posts
Then explain, by twisting reality, why my sample does not throw WHEA errors at all, not at 2100 or 2133 :p
I don't need any suppressor, and neither does anyone who owns a first-batch sample.
The discussion isn't that simple. WHEA #19 has nothing to do with the IMC of the unit.
Preach to your peers/followers. I'm not one of them.

:)
 

·
Overclock the World
Joined
·
3,283 Posts
Preach to your peers/followers. I'm not one of them.
I do not understand the religion/sect remarks. You are being rude at the moment.
What I wrote was logical. My sample runs it and is stable without any cheats active.
WHEA Suppressor came to life after debugging the cause of WHEA #19 with the community here, and @ManniX-ITA had an idea for how to help users out *

I have no connection to it, except for continuing the research (which is still not entirely done, or a fix would have been published).
So far it comes down to:
"A sensor on the remaining, newer batches triggers the WHEA #19 error, through a DPM power management issue connected via (x)GMI."
"It has no connection whatsoever with memory OC and no direct connection to PCH issues or USB/PCIe dropouts."

* Again, I do not have any #19s, while they are reported correctly for me. There is no need to suppress something that doesn't exist.
#18 appears only when I do nonsense with the CPU, and #20 has no meaning if you check its error source.

EDIT:
The goal of the research is to replicate the same bug that first-batch units (November) have; AMD messed something up on the later batches.
When the research is complete, everyone will be WHEA #19 free (not disabled). So far I have a couple of samples that don't have the issue, while everyone else has it.
This is independent of mainboard or cooling ability, and has nothing to do with the silicon lottery.
(My sample is silver/gold ~ very mediocre.)

EDIT2:
Sadly, I lack the support and have no NDA to get the RSMU tools needed to override the Ryzens.
So logically this topic will take time. I currently have a clear idea of what it is, yet the research is not done, so everyone has to wait.
AMD is unlikely to fix it. I have a feeling they are clueless so far, sadly.
 

·
Robotic Chemist
Joined
·
4,314 Posts
"A sensor on the remaining, newer batches triggers the WHEA #19 error, through a DPM power management issue connected via (x)GMI."
"It has no connection whatsoever with memory OC and no direct connection to PCH issues or USB/PCIe dropouts."
I don't understand the assumption that it isn't simply unstable at those clocks, which makes some DPM feature stop working correctly. It only happens if you set the FCLK too high, so it seems very connected to the FCLK OC.

When the research is complete, everyone will be WHEA #19 free (not disabled).
I don't see why this is necessarily true, you cannot know the result of research that isn't done. Just because something changed batch to batch (Did it? What statistics do we have?) does not mean the problem is something that can be solved with software changes.

This is independent of mainboard or cooling ability, and has nothing to do with the silicon lottery.
(My sample is silver/gold ~ very mediocre.)
Your sample can do 2100/2133 FCLK without errors and you call it mediocre? :confused:

Why do you think you don't have a golden IO die? How do you know it has nothing to do with the silicon lottery? Maybe GlobalFoundries started making slightly worse IO dies later or something.
 

·
Overclock the World
Joined
·
3,283 Posts
I don't understand the assumption that it isn't simply unstable at those clocks, which makes some DPM feature stop working correctly. It only happens if you set the FCLK too high, so it seems very connected to the FCLK OC.
Because we went over this with the community in the DRAM 24/7 thread months ago.
It is not that the CPU is incapable of holding it. It passes all FCLK-focused tests and shows positive performance scaling.
It is not always package throttled (although for me it is at 2133 so far, otherwise I would run it).

Package throttling and instability are two different things. A system can be perfectly stable, yet autocorrect and throttle fully.
I wrote about it in my AMD Maximum Voltage post, 9 months ago.
Reaching 2100 took months; 2033 was easy.

The problem is not instability. Instability is instability, and people notice it very quickly when the system hard shuts down or shows cache-related test errors.
Yet close to everybody (I still think everybody) should be able to run 2000 FCLK with ease.
All the voltage prediction patterns are set up to 2067; they work perfectly fine across numerous samples and match my personal voltage research.

The #19 issue has no connection with VDDG or cLDO_VDDP.
It has no direct connection to procODT, but has a subtle connection to LCLK DPM and normal DPM link speed.
In such a case, fixed DPM settings of 2-1-1-2-2-1-1-2 (inside NBIO, SMU Common Options) can help.
Yet they are there to override DPM balancing, not to fix any big issues.

The issue is sadly a mistake made by design.
WHEA #19s on their own do not mean "errors".
Analyzing how WHEA events are reported and what they mean (they carry a device ID and an error type) led to the conclusion that they are "meaningless".
So WHEA Suppressor came to life.
The issue was not throttling, or it would have been seen. The side effect was that all these reports caused huge DPC latency spikes and throttled performance.
It reports 200 messages a second and slows everything down.
Having the global suppressor active (after the system is stable) fixed this, and performance scaled up.
Package throttling is visible once you know what to look for, but reading ManniX's thread should (hopefully) show you that performance kept scaling up until he became unstable near 2067 FCLK.

DPM is active, otherwise I would have issues with NVMe drives and PCIe devices.
I would also have issues with LCLK not functioning, which I can bug out. That means the option is active and the sensor shows it's active.
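To put a number like "200 messages a second" on your own system, you can count the logged #19 events per second. This is only a rough sketch: it assumes you have already exported the events into (timestamp, event ID) pairs by some means, and the sample data below is synthetic, not a real log.

```python
from collections import Counter
from datetime import datetime, timedelta


def peak_events_per_second(events, event_id=19):
    """Return the highest number of matching events logged within one wall-clock second.

    `events` is an iterable of (datetime, int) pairs; this layout is an
    assumption for illustration, not the format of any particular export tool.
    """
    per_second = Counter(
        ts.replace(microsecond=0) for ts, eid in events if eid == event_id
    )
    return max(per_second.values(), default=0)


# Synthetic example: one #19 event every 5 ms for two seconds, plus one unrelated #18.
start = datetime(2021, 1, 1, 12, 0, 0)
sample = [(start + timedelta(milliseconds=5 * i), 19) for i in range(400)]
sample.append((start, 18))

print(peak_events_per_second(sample))
```

A sustained rate in the hundreds per second would be consistent with the logging overhead described above; an occasional single event would not.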

I don't see why this is necessarily true, you cannot know the result of research that isn't done. Just because something changed batch to batch (Did it? What statistics do we have?) does not mean the problem is something that can be solved with software changes.
I can tell the research is done once the WHEA #19 issues are gone.
(EDIT: My sample is very mediocre: above average but not exceptional. I need 30-40mV more SOC than other, similar users' samples.)
I have multiple samples that are WHEA #19 free (not unreported, just free from the issue at 2100 FCLK).

My research has nothing to do with WHEA Suppressor, but with the hardware itself.
At first there were NIC errors, hence #19 has a link (issue) connection with the PCH.
But that wasn't the core issue, just a byproduct. The core issue is a sensor one. That's the conclusion I came to after a couple of months of fiddling with it.

EDIT:
The sensor is not a software one but a hardware one.
If anybody cares to give me RSMU access privately, I can likely fix it.
RSMU and similar tools have low-level access to the samples, close to PSP firmware access.
To fix the issue permanently, AMD has to supply a PSP firmware with the fix, yet they very likely have no clue how to resolve the issue, only that it exists.
Your sample can do 2100/2133 FCLK without errors and you call it mediocre? :confused:

Why do you think you don't have a golden IO die? How do you know it has nothing to do with the silicon lottery? Maybe GlobalFoundries started making slightly worse IO dies later or something.
Because of the sample size I have and the conclusions drawn from the Zen RAM OC sheet.
All dual-CCD bugged samples from November don't have this issue and can run higher FCLK.
(Even at XMP on a 4000C16-16 kit. It doesn't need any user voltage interaction.)
Repeating myself a bit: the voltage patterns/presets for this FCLK are set in stone. AMD has tested them and they run.
An FCLK that doesn't run will result in a hard "no-boot".

SMU 56.34 Patch-C tried to enforce a 1900 FCLK limit, but thankfully the community and I were able to show proof that such a lock is not necessary, as was done on Matisse (Matisse could also run 1966 FCLK before it got publicly locked down in order to make PCIe 4.0 look good at launch).
Now the only issue left is to fix this misreporting sensor.
I know it's misreporting and buggy, because on some samples it runs, and on those samples I have seen WHEA #19 once, by doing stupid things.
The sensor is not suppressed; the issue is a collection of sensor hardware issues. It is something fixable. Sadly, I need the RSMU tools to override the Ryzens.
Dual-CCD unlocking is another topic, and I don't want to talk about it.

EDIT 2:
Don't think about RMA'ing; I've heard that people do this.
You cannot get first-batch dual-CCD 6- or 8-core parts anymore. Some might be floating around on eBay, but otherwise you cannot get them, and AMD doesn't want them to exist.
They also have a broken V/F curve from being 12- or 16-core parts, so they are completely unstable at stock without a 10-15mV positive vCore offset or CO edits.
This warning is here because people did this and quoted me on it. I remain certain that close to every sample (even more so with the knowledgeable community here) can run 2000 FCLK. Likely higher, but 2000 is easy without any settings change.
 

·
Registered
Joined
·
2,291 Posts
All the examples I have seen of a delidded 5950X were not good (I haven't seen many). The issue is that the chiplets are not at exactly the same height, so the advantage from delidding (very thin thermal interface material) is not there. At best the same as stock. :(
Wow, I would not have expected that. I really find it difficult to believe that's the case (not doubting your info), what with all the automation in fabrication etc.

Preach to your peers/followers. I'm not one of them.

:)
Wow, really, that comment was completely unnecessary!
 

·
Registered
Joined
·
569 Posts
Preach to your peers/followers. I'm not one of them.

:)
Please, let it go.

It is not worth the discussion, to be honest!

Especially, because there is absolutely no moderation on this forum.
So it's like battling in vain, and it is not worth fighting with words.

Better to win the war by helping users who need it; that is what matters the most.
And "we" provided help and advice when it was time, so no big deal!

So just, let it go, please. :(
 

·
Registered
Joined
·
73 Posts
Discussion Starter · #37 ·
I do not understand the religion/sect remarks. You are being rude at the moment.
What I wrote was logical. My sample runs it and is stable without any cheats active.
WHEA Suppressor came to life after debugging the cause of WHEA #19 with the community here, and @ManniX-ITA had an idea for how to help users out *

I have no connection to it, except for continuing the research (which is still not entirely done, or a fix would have been published).
So far it comes down to:
"A sensor on the remaining, newer batches triggers the WHEA #19 error, through a DPM power management issue connected via (x)GMI."
"It has no connection whatsoever with memory OC and no direct connection to PCH issues or USB/PCIe dropouts."

* Again, I do not have any #19s, while they are reported correctly for me. There is no need to suppress something that doesn't exist.
#18 appears only when I do nonsense with the CPU, and #20 has no meaning if you check its error source.

EDIT:
The goal of the research is to replicate the same bug that first-batch units (November) have; AMD messed something up on the later batches.
When the research is complete, everyone will be WHEA #19 free (not disabled). So far I have a couple of samples that don't have the issue, while everyone else has it.
This is independent of mainboard or cooling ability, and has nothing to do with the silicon lottery.
(My sample is silver/gold ~ very mediocre.)

EDIT2:
Sadly, I lack the support and have no NDA to get the RSMU tools needed to override the Ryzens.
So logically this topic will take time. I currently have a clear idea of what it is, yet the research is not done, so everyone has to wait.
AMD is unlikely to fix it. I have a feeling they are clueless so far, sadly.
Hmm, what changed since November? Did they do a "silent" revision? Did they have yield issues and change the binning criteria (was it okay for this sensor to be defective in November)? It would be interesting to confirm whether all early chips are free of WHEA 19s.

I will start a poll when this thread is more active.

Preach to your peers/followers. I'm not one of them.

:)
I would like to ask you and everyone else to be part of a constructive civilized discussion. Absolutism and personal attacks have no place here.

Thank you
 

·
Registered
Joined
·
21 Posts
I do not understand any religious/sects claims. You are behaving rude atm
I think you misunderstood my post. So I feel the need to clarify further.

I'm not interested in pushing hardware to its extreme. I never was and never will be. I'm also pretty sure I'm way beyond the young demographic for such a game, and I'm not seeking any fame out of it.

As an engineer by training, I do enjoy reading good analysis with sound reasoning.

So carry on.

And peace
 

·
Registered
Joined
·
16 Posts
[...]
It cannot fix the user not having to optimize 1.8VDD rail, VDDG IOD & SOC.
[...]
So bringing the 1.8V rail as close as possible to the stated 1.8V might do the trick? Mine is between 1.808 and 1.824.

[...]
The #19 issue has no connection with VDDG or cLDO_VDDP.
It has no direct connection to procODT, but has a subtle connection to LCLK DPM and normal DPM link speed.
In such a case, fixed DPM settings of 2-1-1-2-2-1-1-2 (inside NBIO, SMU Common Options) can help.
Yet they are there to override DPM balancing, not to fix any big issues.
[...]
So those DPM settings COULD fix the #19? :unsure: Or did I get this wrong?
 

·
Registered
Joined
·
804 Posts