Zen segmentation fault and BOINC projects

post #1 of 16
Thread Starter 

I have been letting my AMD cards mine to build up enough cash to pay for a Ryzen system.  No sooner do I get close than this issue comes to the forefront.  Apparently it has been known since April, but it has been getting a lot more press over the past few days.

Any of you guys running Zen having issues?  If so, are they only happening in Linux, or with specific projects?

If you don't know what I am talking about, that's good, since you probably aren't having issues, but here is a link to some more information on the problem:

http://www.overclock.net/t/1635749/phoronix-segmentation-faults-on-zen-cpus-under-heavy-workloads

W10 Desktop (17 items)
CPU: Intel 5820k
Motherboard: ASRock x99 Extreme 4
Graphics: GTX-960, GTX-750ti, HD 7850
RAM: 4 x 4G DDR4
Hard Drives: 128G Corsair SSD, WD 1G, WD 2G
Optical Drive: generic CDRW/DVD
Cooling: CM 212 EVO
OS: Win 10
Monitor: ASUS 1920x1200 IPS
Keyboard: Microsoft
Power: Be Quiet 1000
Case: Antec 305
Mouse: Reaper

Linux Desktop (14 items)
CPU: AMD FX-6300
Motherboard: ASUS M5A97 R2.0
Graphics: GTX-760
RAM: Samsung 2 x 4G Wonder Ram
Hard Drives: 128G Corsair SSD, WD 500G Blue
Optical Drive: generic CDRW/DVD
Cooling: Corsair H60
OS: Linux Mint 17.2
Monitor: ASUS 1920x1200 IPS
Keyboard: Microsoft
Power: XFX 650
Case: NZXT 210
Mouse: Reaper

Win 10 HTPC (12 items)
CPU: AMD Phenom II B93
Motherboard: Gigabyte
Graphics: EVGA GT 740 FTW
RAM: Misc DD2-800 2x2G and 2x1G
Hard Drive: Seagate
Optical Drive: generic
OS: Win 10 Tech Preview 64 bit
Monitor: HDTV
Keyboard: HP
Power: Corsair cx430
Case: Aptevia HTPC
Mouse: IBM
post #2 of 16
Well, shoot. Guess I'll finish up my 1700 build and go looking for gremlins, rather than skipping to TR. Throwing away <$300 on the CPU that I still need to get would bother me, but not as much as >$500 plus a pricey board and more pricey RAM. Might get lucky and avoid the issue, or at least generate a juicy bug report.

edit: Ah, heck with it, I'll leave that to better minds. Just ordered a 1300X to avoid the whole multi-threading issue, as well as save a bit more cash for that TR build, assuming they get this thing sorted.
Edited by C4pt41n M0 R0n - 8/6/17 at 1:06pm
post #3 of 16
bfromcolo, funny you ask.

I had a few asteroids@home AVX WUs error out a month or two ago in a Linux 4.8 (Mint) guest running under VirtualBox. It wasn't a bare-metal installation. Hundreds of SSE2/SSE3 WUs all worked.

Skynet POGS: all WUs validated.

LHC@Home: all WUs validated.

Other non-BOINC tests: Prime95 Blend passed 4.5 hours, memtest86+ completed 6 passes.

----

To be more specific, the task page for the failed WU lists:
Server state Over
Outcome Computation error
Client state Compute error
Exit status 193 (0xc1) EXIT_SIGNAL

and in the log
Code:
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (7 frames):
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx(boinc_catch_signal+0x47)[0x4228c7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f45893cd0c0]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40b9bb]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x407394]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40f651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f458903d2b1]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x405761]

Exiting...

</stderr_txt>
]]>
The same failure with another client version, on Linux 4.8.0-53-generic:
Code:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (7 frames):
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx(boinc_catch_signal+0x47)[0x4228c7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f9248b12390]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40b9bb]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x407394]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40f651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f9248758830]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x405761]

Exiting...

</stderr_txt>
]]>
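
A note on the numbers in those logs: 193, 0xc1 and -63 are the same exit code shown three ways. The sketch below reproduces that arithmetic for any code from 0 to 255. The function name decode_exit_code is made up for this example, and reading the third number as the code taken as a signed byte is an assumption that happens to fit 193 (193 - 256 = -63), not something confirmed from the BOINC source.

```shell
#!/usr/bin/env bash
# Print an exit code as decimal, hex, and signed 8-bit.
# decode_exit_code is a hypothetical helper, not part of BOINC.
decode_exit_code() {
    local code=$1
    local signed=$code
    # Assumption: values above 127 wrap to negative when read as a signed byte.
    if [ "$code" -gt 127 ]; then
        signed=$((code - 256))
    fi
    printf '%d (0x%x, %d)\n' "$code" "$code" "$signed"
}

decode_exit_code 193
```

Running it prints 193 (0xc1, -63), the same triple as in the two logs.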


I PMed Kong over at asteroids@home since I had no response on the asteroids@home forum.


As far as I can tell, "libpthread" is the multithreading (POSIX threads) library.

I highly recommend Linux kernel 4.11, since Linux 4.8 is supposedly causing issues to the point that LHC stops Linux 4.8 hosts from reporting results (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4362)
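
To see where a box stands against that 4.11 recommendation, something like the following could work. It is a minimal sketch: version_at_least is a made-up helper name, and it assumes GNU sort with the -V (version sort) option is available, which is the case on Mint/Ubuntu.

```shell
#!/usr/bin/env bash
# Return success if version $1 is at least version $2.
# version_at_least is a hypothetical helper; it relies on GNU sort -V.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distro suffix, e.g. "4.8.0" from "4.8.0-53-generic".
running=$(uname -r | cut -d- -f1)
if version_at_least "$running" "4.11"; then
    echo "kernel $running is at or above 4.11"
else
    echo "kernel $running is older than 4.11"
fi
```

The threshold is just the second argument, so the same helper works for checking against any other kernel version.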

Also, if using Ryzen, try to get AGESA 1.0.0.6. Tune your memory so that it is stable; don't rely on XMP! I'm on the High Performance power plan rather than Ryzen Balanced (it still drops power regardless, since the CPU clocks aren't overclocked right now), with SOC voltage fixed at 1.1V and 1.35V or 1.4V for memory, depending on the sticks.

I had to enable SVM mode to get the 64-bit guest OS to work.
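
For the SVM part, a quick look at the CPU flags on a Linux host can at least show whether the CPU advertises SVM. This is a minimal sketch with a made-up helper name (has_svm_flag). One caveat, stated as an assumption: depending on the kernel version, the flag may still be listed when SVM is switched off in the BIOS, so treat a hit as "supported", not as proof that it is enabled.

```shell
#!/usr/bin/env bash
# Return success if the given cpuinfo file lists the "svm" CPU flag.
# has_svm_flag is a hypothetical helper; it defaults to /proc/cpuinfo.
has_svm_flag() {
    local cpuinfo=${1:-/proc/cpuinfo}
    # Take the first "flags" line and look for svm as a whole word.
    grep -m1 '^flags' "$cpuinfo" | grep -qw svm
}

if has_svm_flag; then
    echo "svm flag present"
else
    echo "svm flag not found"
fi
```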
Edited by AlphaC - 8/5/17 at 12:55pm
post #4 of 16
Thread Starter 

Thanks for the post.  I'm surprised we haven't seen more complaints from BOINC users.

Guess I will wait and see what happens with Zen for a couple of weeks.  Hopefully AMD can do something, since it seems to affect both the B1 and B2 (TR and EPYC) steppings.  I can only imagine the impact on AMD sales (especially server sales) if they can't provide a fix.

Quote:
Originally Posted by AlphaC

I highly recommend Linux kernel 4.11 , since supposedly Linux 4.8 is causing issues to the point that LHC stops Linux 4.8 from reporting results (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4362)

Funny thing about that thread is it seems to be Intel multi-threading issues and not Ryzen.  I usually run whatever kernel Mint/Ubuntu give me, but with the most recent AMDGPU-Pro driver I had to upgrade the kernel to 4.10 I think.

post #5 of 16
Quote:
Originally Posted by bfromcolo

Funny thing about that thread is it seems to be Intel multi-threading issues and not Ryzen.  I usually run whatever kernel Mint/Ubuntu give me, but with the most recent AMDGPU-Pro driver I had to upgrade the kernel to 4.10 I think.


That's the Skylake hyperthread bug (also on AVX).

https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/
https://www.extremetech.com/computing/251499-major-hyper-threading-flaw-can-destabilize-intel-cpus-based-kaby-lake-skylake

and the Debian mailing list: https://lists.debian.org/debian-devel/2017/06/msg00308.html

edit: per Distrowatch, the new Ubuntu 17.04 released this week should come with the 4.10 kernel.

Per some users, you can install kernel 4.11 manually:
Code:
cd /tmp
# Fetch the generic headers, arch-specific headers and kernel image in one wget call
wget \
  http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-headers-4.11.7-041107_4.11.7-041107.201706240231_all.deb \
  http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-headers-4.11.7-041107-generic_4.11.7-041107.201706240231_amd64.deb \
  http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-image-4.11.7-041107-generic_4.11.7-041107.201706240231_amd64.deb
Install via
Code:
sudo dpkg -i linux-headers-4.11*.deb linux-image-4.11*.deb

There's also UKUU (Ubuntu Kernel Update Utility):
https://fossbytes.com/install-linux-kernel-4-12-ubuntu-mint/
To install the 4.12 kernel manually:
Code:
cd /tmp/
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200_4.12.0-041200.201707022031_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-image-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
Install via
Code:
sudo dpkg -i linux-headers-4.12*.deb linux-image-4.12*.deb
Update grub
Code:
sudo update-grub

Edited by AlphaC - 8/5/17 at 2:04pm
post #6 of 16
Thread Starter 

Well, I finally built up enough cash mining to take the plunge and bought a 1700, a motherboard and 16G of memory; it cost me about $90 in electricity.  I was hoping to get to a Threadripper, but with my daily mining profit dropping from $8 to $1 I am about ready to stop mining, so I went ahead and got it.  Unfortunately Newegg is still (as of 2 weeks ago) shipping old stock, and I got a pre-week-25 chip that fails the kill Ryzen script, so I will probably RMA it at some point.  But I haven't seen any issues so far with BOINC projects.  Sensors don't work yet, so I haven't tried any overclocking.

post #7 of 16

I have yet to run into the seg fault bug on any BOINC projects with my 1700, which I bought at release.  The only way I am able to trigger the seg fault is with the kill_ryzen script.

I've compiled a number of kernels along with dozens of packages, big and small, using all cores for compiling: MAKEFLAGS="-j$(nproc)"
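
For anyone who wants to hammer all cores with their own workload and count failures, here is a rough sketch of the idea. It is not the kill_ryzen script; stress_loop is a made-up helper that simply runs a command in parallel for a number of rounds and reports how many runs exited non-zero. On Linux a job killed by SIGSEGV shows up to the shell as exit status 139 (128 + 11).

```shell
#!/usr/bin/env bash
# Run a command repeatedly in N parallel workers and count failed runs.
# Usage: stress_loop WORKERS ROUNDS COMMAND [ARGS...]
# stress_loop is a hypothetical helper, not the kill_ryzen script.
stress_loop() {
    local workers=$1 rounds=$2
    shift 2
    local failures=0 round w pid
    local pids
    for ((round = 1; round <= rounds; round++)); do
        pids=()
        # Start one background job per worker.
        for ((w = 0; w < workers; w++)); do
            "$@" >/dev/null 2>&1 &
            pids+=("$!")
        done
        # wait returns each job's exit status; any non-zero status counts as a failure.
        for pid in "${pids[@]}"; do
            wait "$pid" || failures=$((failures + 1))
        done
    done
    echo "$failures"
}

# Example: 2 rounds with one job per core; swap "true" for a real build command.
stress_loop "$(nproc)" 2 true
```

The number printed at the end is the count of failed runs, so anything other than 0 is worth a closer look in the job's own output.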

 

I should RMA my CPU, but I have just been too busy to get around to it.

post #8 of 16
I do get an error, but it has only recently started happening when I use this system for BOINC. I keep getting a Cache L0 Error and either a restart or a black screen (sometimes it restarts and the power button doesn't respond) every time I reach the 3.8GHz point, where I had been stable for the past 6 months. I don't know what is happening here. I thought it was BOINC, but I'm not sure about that. I'm also quite skeptical that I got degradation that fast.
Pinkybeast (27 items)
CPU: Ryzen 7 1700X @ 3.8GHz
Motherboard: Asus ROG Strix X370-F Gaming
Graphics: Inno3D GT1030 0dB
RAM: TridentZ 3200C15 2x8 @ 3600C16
Hard Drives: Samsung 950 Pro 0.5T, Western Digital Black 4T, Seagate Barracuda 7200.12 0.5T, Seagate Constellation ES SAS 2T, Western Digital RE 2T, Seagate Cheetah 15K5, Western Digital Green 2T
Optical Drive: Asus BW-16D1HT
Cooling: Be Quiet! Silent Loop 280mm, 3x Be Quiet! Silent Wings 3 140mm 1000RPM, Delta FFB1212EH, 2x Be Quiet! Pure Wings 2 140mm 1600RPM, 2x Cooler Master MasterFan Pro 120 Air Balance RGB
OS: Microsoft Windows 10
Monitor: Samsung C27F591
Keyboard: generic
Power: Enermax MaxTytan 800
Case: BeQuiet Dark Base 900 Pro
Mouse: Logitech G402
Mouse Pad: Corsair MM300
Audio: Parasound Z-DAC v.2, Bryston 8BST THX, S.AudioLab Aura7

Mobility (6 items)
CPU: AMD A6-5350M
RAM: Patriot DDR3 1600
Hard Drive: Sandisk SSD 64GB
OS: Microsoft Windows 10 Pro
Monitor: 17"
Case: HP Pavilion 17 e011nr
post #9 of 16
Perhaps a little off topic for this thread, but I was having major problems with Mint crashing unexpectedly on my 1800X. I was using the 4.8 kernel. I have now switched to 4.13 and it seems that I am stable. Just an FYI.
post #10 of 16

It's a good heads up for anyone running an older kernel.  Kernel 4.10 and above is what to run with Ryzen.
