Overclock.net › Forums › Industry News › Hardware News › [Phoronix] Segmentation Faults On Zen CPUs Under Heavy Workloads
[Phoronix] Segmentation Faults On Zen CPUs Under Heavy Workloads - Page 2

post #11 of 388
Quote:
Originally Posted by EniGma1987

Can someone explain to me exactly what the issue is? All that is said in the article is "heavy workloads can cause segmentation faults". And if it is hardware related, why does it only happen in Linux and not Windows?

IIRC, a guy on 4chan's /g/ told me he ran into a similar (or the same) problem when doing HPC calculations on his lab's new system. He said that if two related threads were running on different CCXs, one thread would sometimes hang waiting for the other thread to set data, which would never happen. The implication is that it's related to the CCXs and SMT. AFAIK there are workarounds; it's an annoying but rare bug. It shouldn't affect consumers, though, considering Ryzen chips can run y-cruncher for days without issue. I only skimmed the explanation, but apparently somewhere along the line a register returns a random memory location, which causes the segfault.
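To make the "one thread waits for data another thread sets" pattern concrete, here is a minimal Python sketch of that kind of handoff with a timeout, so a stuck wait shows up as a failure instead of a hang. It is not a reproducer for the reported bug (Python's GIL alone makes it a poor stress test). The core numbers 0 and 4 are my assumption for "two cores on different CCXs", and `pin_current_thread` and `handoff` are hypothetical helper names.

```python
import os
import threading


def pin_current_thread(cpu):
    """Try to pin the calling thread to one CPU (Linux only); report success."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling thread
        return True
    except OSError:
        return False  # CPU does not exist or is not allowed for this process


def handoff(producer_cpu, consumer_cpu, timeout_s=1.0):
    """One thread publishes a value, the other waits for it with a timeout."""
    ready = threading.Event()
    box = {}

    def producer():
        pin_current_thread(producer_cpu)
        box["value"] = 42
        ready.set()

    def consumer():
        pin_current_thread(consumer_cpu)
        # False here would mean the wait timed out, i.e. the "hang" case.
        box["arrived"] = ready.wait(timeout_s)

    threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return box["arrived"], box.get("value")


if __name__ == "__main__":
    # Assumed mapping: CPU 0 and CPU 4 sit on different CCXs. Check your own topology.
    print(handoff(0, 4))
```

On a healthy system the consumer should always see the value well inside the timeout. A real test of the claim would need native threads spinning on shared memory, which this sketch does not attempt.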
New and Shiny (18 items)
CPU: Ryzen 7 1700
Motherboard: ASUS ROG Crosshair VI Hero
Graphics: Sapphire HD7950
RAM: G.SKILL TridentZ F4-3200C14D
Hard Drives: Kingston HyperX 3K, Crucial MX300, Western Digital Black, Western Digital Green, Western Digital Red
Cooling: Noctua NH-U14S
OS: Windows 10 Pro
Monitors: Dell U2414H, Dell P2414H
Keyboard: Ducky One
Power: Corsair RM650x
Case: NZXT H440 White
Mouse: Logitech G502 Proteus Spectrum
Audio: Xonar DX
post #12 of 388
This is more an OS-level bug (not keeping the work separate) than a hardware problem.
post #13 of 388
Quote:
Originally Posted by prjindigo

This is more an OS-level bug (not keeping the work separate) than a hardware problem.

I'm talking out of my backside now, but I would think what prjindigo said is correct. If a similar workload doesn't cause the same fault on other OSes, then how could this be a hardware fault? If it's truly hardware, it could be replicated on pretty much any OS, right?

Again...I know nothing of this, I'm sincerely asking.
Red Obsidian (20 items)
CPU: Intel i7 4770K
Motherboard: Asus Maximus VI Extreme
Graphics: Galaxy GTX 680 2GB DDR5
RAM: Corsair Vengeance Pro Series 4x8 32GB DDR3 CMY3...
Hard Drives: OCZ Vertex 4 256GB SSD, Crucial RealSSD C300 64GB
Cooling: Koolance CPU-370, Koolance PMP-450, Koolance HX-CU1320V 4x120 Copper Radiator, Koolance HX-CU1020V 3x120 Copper Radiator, Koolance HX-CU720V 2x120 Copper Radiator, Koolance CTR-CD1224 12/24V Pump and Fan Controller, Corsair AF120 High Performance Fans
OS: Windows 7 x64 Professional
Monitor: Dell S2409W
Power: Seasonic Platinum-1000
Case: Corsair Obsidian 900D
Mouse: Logitech G700
Audio: Sound Blaster X-Fi Titanium HD, Logitech Z5500 THX 5.1 Surround System
post #14 of 388
Interesting! It looks more like a compiler issue. But it could be one of the errata too; let's see what AMD says about this.
Haswell i3 (18 items)
CPU: Core i3-4150 @ 3.5 GHz
Motherboard: Asus B85M-G Rev 1.01, Bios: 2501
Graphics: Integrated Intel HD 4400
RAM: 2x 4GB DDR3 1600 MHz CL9
Hard Drives: Samsung 750 EVO 250GB, Seagate Barracuda 1TB 7200.14, Seagate 500 GB 2.5"
Optical Drive: Samsung DVD/RW
Cooling: Corsair H70
OS: Windows 10 64 bit
Monitor: Samsung A300N 20" 1600 x 900 60Hz 5ms 19Watt
Keyboard: PS/2 Microsoft Wired Keyboard 500
Power: Corsair TX850 V2
Case: CoolerMaster Elite 430 Black
Mouse: Logitech M170
post #15 of 388
Quote:
Originally Posted by geoxile

IIRC, a guy on 4chan's /g/ told me he ran into a similar (or the same) problem when doing HPC calculations on his lab's new system. He said that if two related threads were running on different CCXs, one thread would sometimes hang waiting for the other thread to set data, which would never happen. The implication is that it's related to the CCXs and SMT. AFAIK there are workarounds; it's an annoying but rare bug. It shouldn't affect consumers, though, considering Ryzen chips can run y-cruncher for days without issue. I only skimmed the explanation, but apparently somewhere along the line a register returns a random memory location, which causes the segfault.

That's interesting, thanks for sharing.

@gupsterg
You may want to look at this.
post #16 of 388
I guess you might have missed my reply earlier http://www.overclock.net/t/1635467/wccf-amd-ryzen-threadripper-1900x-8-core-hedt-cpu-officially-confirmed-will-cost-549-us-and-feature-64-pcie-lanes/50#post_26258423

Anyway, I had several Asteroids@home AVX WUs error out (a handful out of hundreds) in a Linux VM on a Ryzen 7 with a Windows 7 Pro host. I found that other users had the same issue on Intel i7s.

The error was SIGSEGV:
Code:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
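A side note on the numbers in that message: 193, 0xc1 and -63 are the same byte printed three ways (unsigned decimal, hex, and signed 8-bit). A small Python sketch of the conversion, with `describe_exit_code` being a made-up helper name and not anything from the BOINC client:

```python
def describe_exit_code(code):
    """Format an exit status as unsigned, hex and signed 8-bit, like the log line."""
    unsigned = code & 0xFF  # keep only the low byte
    signed = unsigned - 256 if unsigned >= 128 else unsigned  # two's complement view
    return "%d (0x%x, %d)" % (unsigned, unsigned, signed)


if __name__ == "__main__":
    print(describe_exit_code(193))
```

So the three values carry no extra information between them; the useful part of the log is the SIGSEGV line in stderr.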

Caveat: the errors were on a Linux 4.8 guest, not 4.11.

All my SkyNet POGS and Asteroids@home SSE2/SSE3 WUs validated.

Also, given the number of memory issues people have running XMP (i.e. Intel spec) timings, might it be a timing issue on certain motherboards? Running at DDR4-2133 doesn't mean the timings are correct.


TL;DR: It's not truly a Ryzen bug if stock-clocked Intel CPUs error out on the same code. This needs to be confirmed.

Edit: I also noticed that most of the motherboards erroring out are those with cheapo VRMs, like the ASUS B350 Plus.
Edited by AlphaC - 8/4/17 at 3:04pm
post #17 of 388
Quote:
Originally Posted by LancerVI

I'm talking out of my backside now, but I would think what prjindigo said is correct. If a similar workload doesn't cause the same fault on other OSes, then how could this be a hardware fault? If it's truly hardware, it could be replicated on pretty much any OS, right?

Again...I know nothing of this, I'm sincerely asking.
It probably is a hardware fault, but Windows probably manages things differently than Linux, so the bug is never encountered there. In all likelihood it isn't a serious issue, and some Linux/GCC patches will fix the problem with little or no performance penalty.
post #18 of 388
I've heard that this happens when doing kernel compiles for some people, but I've not run into it yet even when overclocked. I'd like to know more.
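For anyone who wants to check their own box, a rough sketch of a repeat-and-count harness in Python: run some heavy command many times and tally the runs that died from a signal. The command is whatever workload you choose; the `make -j16` in the comment is only an illustration, and `count_signal_deaths` is a hypothetical name. It relies on `subprocess` reporting a negative return code when a child is killed by a signal, which holds on POSIX systems.

```python
import signal
import subprocess
import sys


def count_signal_deaths(argv, runs):
    """Run argv `runs` times; return {signal name: count} for signal-killed runs."""
    deaths = {}
    for _ in range(runs):
        rc = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        ).returncode
        if rc < 0:  # on POSIX, -N means the child was killed by signal N
            try:
                name = signal.Signals(-rc).name
            except ValueError:
                name = "signal %d" % -rc
            deaths[name] = deaths.get(name, 0) + 1
    return deaths


if __name__ == "__main__":
    # Demo child that deliberately kills itself with SIGSEGV, so the harness
    # has something to count. Swap in a real workload, e.g. ["make", "-j16"].
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(count_signal_deaths(crasher, runs=3))
```

Note that a build tool usually reports a crashed compiler as an ordinary nonzero exit of its own, so for a real compile loop you would also need to look at the build log, not only the top-level return code.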
post #19 of 388
Quote:
Originally Posted by AlphaC

I guess you might have missed my reply earlier http://www.overclock.net/t/1635467/wccf-amd-ryzen-threadripper-1900x-8-core-hedt-cpu-officially-confirmed-will-cost-549-us-and-feature-64-pcie-lanes/50#post_26258423

Anyway, I had several Asteroids@home AVX WUs error out (a handful out of hundreds) in a Linux VM on a Ryzen 7 with a Windows 7 Pro host. I found that other users had the same issue on Intel i7s.

The error was SIGSEGV:
Code:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation

Caveat: the errors were on a Linux 4.8 guest, not 4.11.

All my SkyNet POGS and Asteroids@home SSE2/SSE3 WUs validated.

Also, given the number of memory issues people have running XMP (i.e. Intel spec) timings, might it be a timing issue on certain motherboards? Running at DDR4-2133 doesn't mean the timings are correct.


TL;DR: It's not truly a Ryzen bug if stock-clocked Intel CPUs error out on the same code. This needs to be confirmed.

That's weird that you mention that.

When I was testing my RAM overclocks and timings, I was passing GSAT for 1 hour but failing HCI. HCI passed once I raised (loosened) my timings, but GSAT had never shown me any errors. HCI would catch the error very quickly: it wouldn't even make it to 100% before it crapped out.

I also managed to run 16 instances of HCI at the same time. 14 of them ran non-stop with no memory errors, but 2 of them kept throwing memory errors. I used those 2 for testing, raising and lowering voltages on the fly with the MSI Command app and retesting the same memory blocks while the other 14 instances were eating the rest. I kept getting memory errors on those same 2 instances.

This was with no page file, to make sure the app wasn't testing the page file instead. I literally maxed out the memory; I had less than 50 MB free.

In the end I gave up on 14/14/14, which was GSAT-stable, and went to 15/15/15: no HCI errors.
Edited by zGunBLADEz - 8/4/17 at 3:03pm
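For a sense of scale on that 14 to 15 step, here is the usual CAS latency arithmetic as a short Python sketch: DDR transfers twice per clock, so one cycle lasts 2000 / (data rate in MT/s) nanoseconds. The 3200 MT/s figure is purely my assumption for the example; the post above doesn't say what speed the kit was running.

```python
def cas_latency_ns(cl_cycles, data_rate_mts):
    """CAS latency in ns: CL cycles at a memory clock of data_rate / 2 MHz."""
    return cl_cycles * 2000.0 / data_rate_mts


if __name__ == "__main__":
    ASSUMED_RATE = 3200  # MT/s, an assumption, not taken from the post
    for cl in (14, 15):
        print("CL%d @ DDR4-%d: %.3f ns" % (cl, ASSUMED_RATE, cas_latency_ns(cl, ASSUMED_RATE)))
```

At that assumed speed the difference is 8.75 ns versus 9.375 ns, so one extra cycle is a small latency cost for getting rid of the errors.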
post #20 of 388
From what I understand, it's the Ryzen bug. Here are workarounds from FreeBSD and DragonFly BSD:

https://svnweb.freebsd.org/base?view=revision&revision=321899

https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20