Overclock.net › Forums › Overclockers Care › Overclock.net Folding@Home Team › SMP crashing on redhat 64

SMP crashing on redhat 64

post #1 of 17
Thread Starter 
Hey Everyone

I got my first 10 WUs complete, so now I'm running the real SMP/bigadv. I was completely stable for the first 10, even when running both GPUs. Since I passed the 10 mark I have crashed 3 times: 2 on overnight runs, and once after about 30 mins. All of this in the last few days.

At first I thought it was the PSU dying due to the load (I know I'm pretty much on the edge there), but I still crash even with the GPU folding turned off. I've also found that the crash is not a hard crash: it doesn't go to the BIOS, instead it goes to the part of boot up where it's starting all the services. At that point it gives an error involving SIOCSIFFLAGS and CANNOT ALLOCATE MEMORY that isn't usually there.

As it mentioned memory I tried reducing my memory timings back to stock but that didn't help.

I looked at the fahlog and there's nothing out of the ordinary, the last entry is just a normal entry:

[21:44:38] Completed 105000 out of 500000 steps (21%)

Are the SMP runs just that much more likely to crash than the regular runs?

My ideas were:

- if it's X related - maybe the devel drivers are problematic and I should try moving back to the normal & latest drivers
- could be that despite being LinX and Prime stable (only 2 hours though), I still need to bump vcore a little more

Any other ideas?
     
CPU: i7-3930k @ 4.?GHz
Motherboard: Rampage IV Extreme
Graphics: 2x 8800GT
RAM: 32GB Dominator GT 2133 CL9
Hard Drive: 2xX25-E Raid 0, 1xC300 128GB, + Mechanicals
Cooling: DT 5Noz, EK full cover GPU blocks, EK full cove...
OS: RHEL 5.5 WS
Monitor: Dell U3011 + 2005WFP Portrait
Power: AX1200
Case: CaseLabs TX10-D + Pedestal
Mouse: Razer DeathAdder

CPU: 3930K@4.8
Motherboard: Rampage IV Extreme
Graphics: GTX580 3GB Tri SLI, GTX460 FizzyX
RAM: 16GB Samsung SuperOverclockingTiny DDR3
Hard Drive: Crucial M4
Cooling: DT 5Noz, Koolance Full Cover GPU blocks, EK ful...
OS: Win 7 64 bit Pro
Monitor: Dell U3011 + 2005WFP
Power: EVGA NEX1500
Case: CaseLabs TX10-D + Pedestal
Mouse: Logitech G5
Mouse Pad: Razer Goliathus
Audio: Asus Essence One
post #2 of 17
LinX/Prime stable means very little for Folding, and even less so for -bigadv Folding; the smallest instability, either from the CPU or the RAM, can cause the WU to fail.

Drop the clocks a bit first, and see if that stabilises things.
Megadoomer
(14 items)
 
Family Computer
(13 items)
 
 
CPU: Phenom II X6 1090T @ 4.0Ghz
Motherboard: ASUS M4A89GTD PRO
Graphics: Sparkle GTS 450
RAM: 2x4GB G-Skill Sniper
Hard Drive: Samsung F1 1TB
Cooling: CM Hyper 212+
OS: Windows 7 Professional x64
Monitor: Samsung T220
Keyboard: Logitech MX3000 Laser
Power: CM 1000M
Case: HAF 922
Mouse: Logitech VX Revolution

CPU: Q6600
Motherboard: Asus PN5-D 750i
Graphics: Evga GTS 250
RAM: 2x2GB Crucial Ballistix
Hard Drive: 750GB Hitachi
Optical Drive: Samsung Super Writemaster
OS: Windows 7 Professional x64
Monitor: 19" Dell
Power: Corsair 450VX
Case: Antec 900
post #3 of 17
Thread Starter 
Thanks for the reply zodac, and thanks for the guides too!

I bumped vcore last night and so far so good. We'll see how long it goes for; if no luck I'll have to start lowering bclk, because I'm reluctant to push vcore any higher.
     
post #4 of 17
Thread Starter 
Bumping vcore stopped the crashes, but it would sometimes cause difficulty POSTing, so I bumped bclk down instead. Again no crashes, but I worry that I might still be getting non-fatal errors. Is there any way to know that I'm not accidentally submitting bad WUs?
     
post #5 of 17
As long as the WU doesn't crash, then the WU is fine.

An indication of an OC on the edge of being unstable would be a PPD (much) lower than the usual for your CPU... if you're getting anything less than 40k PPD on a -bigadv WU, that might also be an indication.
post #6 of 17
Thread Starter 
Thanks again.

Hmm interesting, my ppd is certainly lower than that:

Best cpu only ppd I've seen: 26K
Normal ppd while doing work stuff: 16K
Normal ppd while doing work stuff and adding in a gpu: 8-9K for cpu + 9K for the gpu
Normal ppd while doing work stuff and adding in a 2nd gpu: 4-6K for cpu, 9k/gpu

I'm running 6.23 with the -smp and -bigadv options. Do you think I should reduce the OC further to see if it helps?

FYI I monitor ppd using HFM on win 7 under vmware. It was easier to do this than install it natively and I always have that vm open anyway. I had fahmon for a while on linux, but after a while it stopped working.
     
post #7 of 17
An i7 @ 4.6GHz should be much higher than that. I'd try dropping the OC a bit.

Sometimes, when an OC is almost unstable, but not quite unstable enough to crash a WU, you will instead get lower PPD than normal. So drop to 4.2GHz, and see what kind of PPD you get then.

And leave the GPU clients off for now; let's get to the bottom of the CPU, before teaching you how to deal with the GPUs too.
post #8 of 17
What's the project number? If you're running the 6.23 client that's probably not a BigAdv unit. To get BigAdv units you need to upgrade to the 6.34 client.
The Box
(13 items)
 
  
CPU: I7 920 D0 @ 3.8
Motherboard: GA-EX58-UD4P
Graphics: Radeon 4890
RAM: 6G Patriot Viper + 12G Mushkin
Hard Drive: Intel X-25E/Seagate 7200.11 1.5TB
OS: Windows 7
Power: Corsair 650TX
post #9 of 17
Thread Starter 
Ok, so I lowered the CPU to 4.2GHz and folded only on the CPU (12 threads available). I didn't run anything else except the VM, which was only running HFM. PPD slowly increased from 6K to 9K until the end of the WU. Then the next unit pretty quickly went up to 18-19K. HFM claims the unit credit was 3803 and it took about 22 mins for 8%, which calculates out to about 20K PPD. So I guess I verified that HFM can do math :-p After 8% I closed the VM to make sure it wasn't slowing anything, and the next 8% finished within 1 min of the last 8%.
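
For anyone who wants to redo that sanity check by hand, here is a rough sketch of the arithmetic in Python. It only extrapolates linearly from the figures quoted above (3803 credit, 8% in about 22 minutes) and treats the credit as a flat number, which is an assumption; the function name `estimate_ppd` is made up for illustration and is not part of HFM or the FAH client.

```python
MINUTES_PER_DAY = 24 * 60


def estimate_ppd(credit, minutes_elapsed, percent_done):
    """Extrapolate points per day from partial progress on one work unit.

    Assumes progress is linear and the credit is a flat figure.
    """
    if minutes_elapsed <= 0 or not 0 < percent_done <= 100:
        raise ValueError("need positive elapsed time and 0 < percent_done <= 100")
    # Time a whole WU would take at the observed pace.
    minutes_per_wu = minutes_elapsed * 100.0 / percent_done
    # How many such WUs fit in a day, times the credit for each.
    wus_per_day = MINUTES_PER_DAY / minutes_per_wu
    return credit * wus_per_day


# Figures from the post: 3803 credit, about 22 minutes for 8%.
ppd = estimate_ppd(credit=3803, minutes_elapsed=22, percent_done=8)
print(round(ppd))  # prints 19914
```

That lands at roughly 19.9K, in line with the "about 20K PPD" HFM reported.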

At that point I saw the message about 6.34, so I deleted the work and queue and upgraded to 6.34. My first unit with the new version is for 4057 credit and is giving me 23k ppd. The project is P6069 (R0, C105, C265). Here is the snippet from the log:

[00:00:40] + Processing work unit
[00:00:40] Core required: FahCore_a3.exe
[00:00:40] Core found.
[00:00:40] Working on queue slot 01 [March 12 00:00:40 UTC]
[00:00:40] + Working ...
[00:00:40] - Calling './FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 12 -checkpoint 5 -verbose -lifeline 7388 -version 634'

....

Note: tpx file_version 70, software version 73
Starting 12 threads
Making 2D domain decomposition 6 x 2 x 1
starting mdrun 'Mutant_scan'
133000016 steps, 266000.0 ps (continuing from step 132500016, 265000.0 ps).
[00:00:47] Mapping NT from 12 to 12
[00:00:47] Completed 0 out of 500000 steps (0%)

etc
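
If you'd rather pull the pace straight out of the log than rely on a monitoring tool, something like the sketch below could work. It is a minimal Python example that assumes progress lines always look like the "Completed N out of M steps (P%)" entries quoted in this thread; the helper names `parse_progress` and `minutes_per_percent` are invented here, and the second sample line in the usage part is hypothetical, not taken from a real log.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "[00:00:47] Completed 0 out of 500000 steps (0%)".
PROGRESS_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] Completed (\d+) out of (\d+) steps\s+\((\d+)%\)"
)


def parse_progress(line):
    """Return (time, steps_done, steps_total, percent), or None if not a progress line."""
    m = PROGRESS_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%H:%M:%S")
    return stamp, int(m.group(2)), int(m.group(3)), int(m.group(4))


def minutes_per_percent(earlier, later):
    """Average minutes per 1% of progress between two parsed entries."""
    t0, _, _, p0 = earlier
    t1, _, _, p1 = later
    if p1 <= p0:
        raise ValueError("later entry must show more progress")
    delta = t1 - t0
    if delta < timedelta(0):
        # Log stamps carry no date; assume at most one midnight rollover.
        delta += timedelta(days=1)
    return delta.total_seconds() / 60.0 / (p1 - p0)


first = parse_progress("[00:00:47] Completed 0 out of 500000 steps (0%)")
# Hypothetical later entry, for illustration only.
second = parse_progress("[00:22:47] Completed 40000 out of 500000 steps (8%)")
print(minutes_per_percent(first, second))  # prints 2.75
```

Non-progress lines such as "[00:00:40] + Working ..." simply return None, so the whole log can be fed through `parse_progress` and filtered.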

Any ideas? Is this not an SMP work unit?
Edited by stren - 3/11/11 at 4:16pm
     
post #10 of 17
P6069 is an SMP WU, but it's not a -bigadv WU. Take a look here:
http://www.overclock.net/overclock-n...v-folders.html