Zen segmentation fault and BOINC projects - Overclock.net - An Overclocking Community

post #1 of 28 (permalink) Old 08-05-2017, 10:15 AM - Thread Starter
New to Overclock.net
 
Join Date: Dec 2011
Location: 7200 ft above sea level
Posts: 2,695

I have been letting my AMD cards mine to build up enough cash to pay for a Ryzen system. No sooner do I get close than this issue comes to the forefront. Apparently it has been known since April, but it has been getting a lot more press the past few days.

 

Any of you guys running Zen having issues?  If so are they only happening in Linux, or with specific projects?

 

If you don't know what I am talking about, that's good since you probably aren't having issues, but here is a link to some more information on the problem:

 

https://www.overclock.net/t/1635749/phoronix-segmentation-faults-on-zen-cpus-under-heavy-workloads


Quote:I'm gonna throw in my 2 cents. Not because I'm an expert but because I have a keyboard.


bfromcolo is offline  
post #2 of 28 (permalink) Old 08-05-2017, 10:44 AM
New to Overclock.net
 
C4pt41n M0 R0n's Avatar
 
Join Date: Jun 2017
Posts: 55
Rep: 2 (Unique: 2)
Well, shoot. Guess I'll finish up my 1700 build and go looking for gremlins, rather than skipping to TR. Throwing away <$300 on the CPU that I still need to get would bother me, but not as much as >$500 plus a pricey board and pricier RAM. Might get lucky and avoid the issue, or at least generate a juicy bug report. :P

edit: Ah, heck with it, I'll leave that to better minds. Just ordered a 1300X to avoid the whole multi-threading issue, as well as save a bit more cash for that TR build, assuming they get this thing sorted.
C4pt41n M0 R0n is offline  
post #3 of 28 (permalink) Old 08-05-2017, 11:59 AM
⤷ αC
 
AlphaC's Avatar
 
Join Date: Sep 2012
Posts: 10,288
Rep: 833 (Unique: 555)
bfromcolo, funny you ask.

I had a few Asteroids@home AVX WUs error out a month or two ago in a Linux 4.8 guest (Mint) running under VirtualBox. It wasn't a bare-metal installation. Hundreds of SSE2/SSE3 WUs all worked.

Skynet POGS: all WUs validated.

LHC@home: all WUs validated.

Other non-BOINC tests: Prime95 Blend passed 4.5 hours, Memtest86+ 6 passes.

----

To be more specific, the task page for the WU lists:
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 193 (0xc1) EXIT_SIGNAL

and in the log:
Code:
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (7 frames):
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx(boinc_catch_signal+0x47)[0x4228c7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0)[0x7f45893cd0c0]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40b9bb]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x407394]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40f651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1)[0x7f458903d2b1]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x405761]

Exiting...

</stderr_txt>
]]>
Another client version, on Linux 4.8.0-53-generic:
Code:
<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (7 frames):
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx(boinc_catch_signal+0x47)[0x4228c7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f9248b12390]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40b9bb]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x407394]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x40f651]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f9248758830]
../../projects/asteroidsathome.net_boinc/period_search_10210_x86_64-pc-linux-gnu__avx[0x405761]

Exiting...

</stderr_txt>
]]>
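
For anyone puzzled by the three numbers in "process exited with code 193 (0xc1, -63)": they appear to be the same byte shown as decimal, hex, and a signed 8-bit value. A minimal sketch of that arithmetic follows; `to_signed8` and `describe_exit` are hypothetical helper names made up for illustration, not part of BOINC.

```shell
#!/bin/sh
# Hypothetical helper: reinterpret an exit code (0-255) as a signed 8-bit value.
to_signed8() {
    code=$1
    if [ "$code" -gt 127 ]; then
        echo $((code - 256))
    else
        echo "$code"
    fi
}

# Hypothetical helper: print the code as decimal, hex, and signed 8-bit,
# in the same layout as the BOINC log line.
describe_exit() {
    printf '%d (0x%x, %d)\n' "$1" "$1" "$(to_signed8 "$1")"
}

describe_exit 193   # prints: 193 (0xc1, -63)
```

So the log line carries no extra information beyond exit status 193, which the task page labels EXIT_SIGNAL.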


I PMed Kong over at [email protected] since I had no response on the [email protected] forum.


As far as I can tell, "libpthread" is the multithreading (POSIX threads) library.

I highly recommend Linux kernel 4.11 , since supposedly Linux 4.8 is causing issues to the point that LHC stops Linux 4.8 from reporting results (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4362)

Also, if using Ryzen, try to get a BIOS with AGESA 1.0.0.6. Tune your memory until it is stable; don't rely on XMP! I'm on the High Performance power plan rather than Ryzen Balanced (it still drops power regardless, since the CPU clocks aren't overclocked right now), with SOC voltage fixed at 1.1 V and memory at 1.35 V or 1.4 V depending on the sticks.

I had to enable SVM mode to get the 64-bit guest OS to work.

► Recommended GPU Projects: [email protected] , [email protected] (FP64) (AMD moreso) ► Other notable GPU projects: [email protected] (Nvidia), GPUGrid (Nvidia) ► Project list


AlphaC is offline  
post #4 of 28 (permalink) Old 08-05-2017, 01:14 PM - Thread Starter
New to Overclock.net
 
Join Date: Dec 2011
Location: 7200 ft above sea level
Posts: 2,695

Thanks for the post. I'm surprised we haven't seen more complaints from BOINC users.

 

Guess I will wait and see what happens with Zen for a couple of weeks. Hopefully AMD can do something, since it seems to affect the B1 and B2 (TR and EPYC) steppings. I can only imagine the impact on AMD sales (especially server sales) if they can't provide a fix.

 

 

Quote:

Originally Posted by AlphaC View Post

I highly recommend Linux kernel 4.11 , since supposedly Linux 4.8 is causing issues to the point that LHC stops Linux 4.8 from reporting results (https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4362)

 

Funny thing about that thread is it seems to be Intel multi-threading issues and not Ryzen.  I usually run whatever kernel Mint/Ubuntu give me, but with the most recent AMDGPU-Pro driver I had to upgrade the kernel to 4.10 I think.




bfromcolo is offline  
post #5 of 28 (permalink) Old 08-05-2017, 01:40 PM
⤷ αC
 
AlphaC's Avatar
 
Join Date: Sep 2012
Posts: 10,288
Rep: 833 (Unique: 555)
Quote:
Originally Posted by bfromcolo View Post

Funny thing about that thread is it seems to be Intel multi-threading issues and not Ryzen.  I usually run whatever kernel Mint/Ubuntu give me, but with the most recent AMDGPU-Pro driver I had to upgrade the kernel to 4.10 I think.


That's the Skylake hyperthreading bug (also on AVX) :P

https://arstechnica.com/information-technology/2017/06/skylake-kaby-lake-chips-have-a-crash-bug-with-hyperthreading-enabled/
https://www.extremetech.com/computing/251499-major-hyper-threading-flaw-can-destabilize-intel-cpus-based-kaby-lake-skylake

& debian mailing list https://lists.debian.org/debian-devel/2017/06/msg00308.html

edit: per Distrowatch the new Ubuntu 17.04 released this week should come with 4.10 kernel

Per some users you can install kernel 4.11 manually
Code:
cd /tmp

# Download the headers (all + generic) and the image for mainline 4.11.7.
# The backslash continuations must not be separated by blank lines.
wget \
kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-headers-4.11.7-041107_4.11.7-041107.201706240231_all.deb \
kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-headers-4.11.7-041107-generic_4.11.7-041107.201706240231_amd64.deb \
kernel.ubuntu.com/~kernel-ppa/mainline/v4.11.7/linux-image-4.11.7-041107-generic_4.11.7-041107.201706240231_amd64.deb
Install via
Code:
sudo dpkg -i linux-headers-4.11*.deb linux-image-4.11*.deb

There's also UKUU (Ubuntu Kernel Update Utility)
https://fossbytes.com/install-linux-kernel-4-12-ubuntu-mint/
Manually for 4.12 kernel
Code:
cd /tmp/
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200_4.12.0-041200.201707022031_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-headers-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.12/linux-image-4.12.0-041200-generic_4.12.0-041200.201707022031_amd64.deb
Install via
Code:
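# Match only the 4.12 packages so other .deb files in /tmp are not installed too
sudo dpkg -i linux-headers-4.12*.deb linux-image-4.12*.deb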
Update grub
Code:
sudo update-grub
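
After rebooting, it may be worth confirming that the machine actually booted the newer kernel rather than the old default. A minimal sketch, assuming GNU `sort -V` is available (it is on Mint/Ubuntu); `kernel_at_least` is a hypothetical helper name, and the 4.11 threshold is just the version recommended earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel version $2 is at least $1.
# Anything after the first "-" (e.g. "-53-generic") is ignored.
kernel_at_least() {
    min=$1
    cur=${2%%-*}
    # sort -V orders version strings numerically per component,
    # so 4.8.0 sorts before 4.10 (a plain string sort would not).
    lowest=$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}

# Check the running kernel as reported by uname -r.
if kernel_at_least 4.11 "$(uname -r)"; then
    echo "running kernel $(uname -r) is 4.11 or newer"
else
    echo "running kernel $(uname -r) is older than 4.11"
fi
```

If the old kernel is still reported, the GRUB menu entry chosen at boot is the first thing to check.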



AlphaC is offline  
post #6 of 28 (permalink) Old 10-05-2017, 05:52 PM - Thread Starter
New to Overclock.net
 
Join Date: Dec 2011
Location: 7200 ft above sea level
Posts: 2,695

Well, I finally built up enough cash mining to take the plunge and bought a 1700, a motherboard and 16 GB of memory; it cost me about $90 in electricity. I was hoping to get to a Threadripper, but with my daily mining profit dropping from $8 a day to $1 I am about ready to stop mining, so I went ahead and got it. Unfortunately Newegg is still (as of 2 weeks ago) shipping old stock, and I got a pre-week-25 chip that fails the kill_ryzen script. I will probably RMA it at some point. But I haven't seen any issues so far with BOINC projects. Sensors don't work yet, so I haven't tried any overclocking.




bfromcolo is offline  
post #7 of 28 (permalink) Old 10-05-2017, 06:24 PM
2+2=5
 
tictoc's Avatar
 
Join Date: Feb 2011
Posts: 4,473

I have yet to run into the seg fault bug on any BOINC projects with my 1700, which I bought at release. The only way I am able to trigger the seg fault is with the kill_ryzen script.

 

I've compiled a number of kernels along with dozens of packages, big and small, using all cores for compiling (MAKEFLAGS="-j$(nproc)"), with no seg faults.
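
For anyone who wants to repeat a heavy job and watch for the crash, here is a minimal sketch of a retry loop. It is not the kill_ryzen script; `is_segv_status` and `run_until_segv` are hypothetical names, and it assumes the usual Linux shell convention that a child killed by SIGSEGV (signal 11) is reported as exit status 128 + 11 = 139.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an exit status means "killed by SIGSEGV".
is_segv_status() {
    [ "$1" -eq 139 ]
}

# Hypothetical helper: run a command up to MAX times, stopping at the
# first segfault. Usage: run_until_segv MAX command [args...]
run_until_segv() {
    max=$1
    shift
    i=0
    while [ "$i" -lt "$max" ]; do
        "$@" >/dev/null 2>&1
        status=$?
        if is_segv_status "$status"; then
            echo "segfault on iteration $((i + 1))"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no segfault in $max iterations"
    return 0
}

# Example (commented out because it is slow): rebuild with all cores 100 times.
# run_until_segv 100 make -j"$(nproc)"
```

Note that this only catches a segfault in the top-level command; a compiler crashing inside `make` normally surfaces as an ordinary build error, so the build log still needs checking.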

 

I should RMA my CPU, but I have just been too busy to get around to it.



tictoc is offline  
post #8 of 28 (permalink) Old 10-06-2017, 02:41 AM
New to Overclock.net
 
Join Date: Sep 2014
Location: Indonesia
Posts: 1,051
Rep: 27 (Unique: 24)
I do get errors, but they only started recently, when I began using this system for BOINC. I keep getting Cache L0 errors followed by either a restart or a black screen (sometimes the restart and power buttons don't respond) every time I reach 3.8 GHz, a point where I had been stable for the past 6 months. I don't know what is happening here. I thought it was BOINC, but I'm not sure about that. I'm also quite skeptical that the chip could have degraded that fast.

Pinkybeast (20 items)
CPU: Ryzen 7 1700X
Motherboard: Asus ROG Strix X370-F Gaming
GPU: GT1030
RAM: TridentZ 3200C15
Hard Drive: Samsung 950 Pro
Hard Drive: Western Digital Black
Hard Drive: Seagate Constellation ES SAS
Hard Drive: Western Digital RE
Hard Drive: Western Digital Green
Optical Drive: Asus BW-16D1HT
Power Supply: Enermax MaxTytan 800
Cooling: Delta FFB1212EH
Cooling: Be Quiet! Silent Loop 280mm
Cooling: Sunon Maglev
Case: BeQuiet Dark Base 900 Pro
Operating System: Microsoft Windows 10
Monitor: Samsung C27F591
Keyboard: generic
Mouse: Logitech G402
Mouse: Corsair MM300
sakae48 is offline  
post #9 of 28 (permalink) Old 12-04-2017, 06:11 PM
New to Overclock.net
 
BeerCan's Avatar
 
Join Date: Sep 2012
Location: Florida USA
Posts: 630
Rep: 25 (Unique: 14)
Perhaps a little off topic for this thread, but I was having major problems with Mint crashing unexpectedly on my 1800X. I was using the 4.8 kernel. I have now switched to 4.13 and it seems that I am stable. Just an FYI.



BeerCan is offline  
post #10 of 28 (permalink) Old 12-04-2017, 06:36 PM
2+2=5
 
tictoc's Avatar
 
Join Date: Feb 2011
Posts: 4,473

It's a good heads up for anyone running an older kernel.  Kernel 4.10 and above is what to run with Ryzen. :thumb:



tictoc is offline  