Originally Posted by Desolutional
Also my lawd, several hundred hours of Prime95 on an OCed chip. No wonder you keep killing these things.
That dead Haswell-E had fewer stress test hours on it than any other OCed CPU I own (and I currently have about a dozen OCed CPUs), other than this brand new part, of course. Pretty much all of my CPUs eventually see at least a few thousand hours of stress testing, and far more time than that under heavy continuous use...the overwhelming majority of them are still working without issue.
Anyway, I isolated the cause of failure, with good certainty, to be a weak uncore/excessive uncore clock/volts/VLs. 90% of the time I ran Prime on the part was within the first month I had it, and I didn't start encountering issues until after I moved the part to my SOC Champion and started pushing the uncore well past what was attainable on my OC Formula. The part remained stable with the same core OC and vcore throughout the time I had it, even when I needed significant overvolting of the uncore to keep it stable at stock. No doubt stress testing resulted in faster degradation of the uncore, but it was only fast enough to be an issue because the uncore was pushed too far.
Ultimately, my first sample was a necessary sacrifice so that I could better understand the limitations of the silicon. My goal with OCing has always been to extract the most performance possible from a part, under the most demanding scenarios, while achieving stock or better levels of stability. I'm looking for settings that will execute any code conceivable without appreciably degrading the part over the time I intend to keep it.
Maybe my first sample wasn't representative. Maybe I'll kill this new sample as well. I doubt this will be the case, but I'll learn from such an occurrence, and I'll keep refining my settings until I have discovered the best way to take advantage of the margins of whatever chips survive.
I'm not going to stop running tests because the tests get better at extracting performance from my hardware; I'll adjust my settings so that the hardware can survive. A few parts may need to die in the process of finding out where the boundary between probably safe and probably not safe is, but that's trial and error that can't really be substituted with anything else...unless I want to sacrifice even more performance or confidence in stability.