SMP WUs Crashing

post #1 of 8
Thread Starter 
I know it can be very frustrating to lose a WU at 96%. It happens to us all, and it can happen for a number of reasons.

I am testing some ways to recover WUs that have crashed, and I was able to recover the last two crashed WUs I had. I don't want to purposely make a rig unstable just to test my recovery process, so I was hoping to get some help from other folders.

If you have a WU crash and PM me, I will send you my email address so you can zip up your folding folder and send it to me. You will not have to wait; just continue folding on the next WU. If I can recover it, I will finish the WU on my rig and submit it under your name.

A couple of restrictions apply.

1. SMP WUs only please. All my testing and folding rigs are set up for SMP.
2. Because SMP deadlines are tight, you must identify the crashed WU within 24 hours of the crash so that I can fix it and send it off before the deadline.
3. If this occurs on a rig that is used for a folding team competition, you will need to give me an alternate name to submit the WU under, or I will just submit it under the overclock.net account so that I don't mess up the team competitions.
4. I can make no promises about recovering the WU, but it is worth a shot. Not all are recoverable, and testing will give me a better idea of what is.
5. Only send WUs that had made it at least to 50%.
6. If this is happening to you constantly, you most likely have an unstable overclock. Adjust your settings so it does not continue. I just don't want someone to get in the habit of trashing WUs nonstop and hoping they can be fixed.


If you are interested in helping me out, and have been having issues with losing WUs, let me know.

Once I get the process down I will post my results so you can do it on your own.
post #2 of 8
Sounds great, very nice thing you are doing, however I don't think that many people use SMP yet.
post #3 of 8
Quote:
Originally Posted by kennymester
Sounds great, very nice thing you are doing, however I don't think that many people use SMP yet.
? Of course people do! I know that (when my PC decides to operate) I do, along with probably 200+ other forum members here.

That's a great offer Knitelife, any insight into how you try to recover a work unit?
post #4 of 8
yeah, I'd love to know how you do it... and I had an odd crash today... my client literally says...

1:34:20 17 percent
4:34:20 at least 3 hours since checkpoint written...
F@H core shutdown : EARLY_UNIT_END
corestatus = 7B (123)
Client-core communications error :ERROR 0x7b

then it got a new unit. Any reason why? It's a stable OC...
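
For anyone who wants to scan a log for this kind of failure, here is a minimal Python sketch. It assumes the log lines look like the excerpt above (a `corestatus = 7B (123)` line, where the hex value 7B and the decimal 123 are the same number). The function name `find_core_errors` is made up for illustration and is not part of any Folding@home tool.

```python
import re

# Matches lines shaped like "corestatus = 7B (123)" from the excerpt above.
# Assumption: the first value is hexadecimal, the bracketed value is decimal.
CORESTATUS = re.compile(r"corestatus\s*=\s*([0-9A-Fa-f]+)\s*\((\d+)\)")


def find_core_errors(log_text):
    """Return a list of (hex_value_as_int, decimal_value) for each corestatus line."""
    results = []
    for line in log_text.splitlines():
        match = CORESTATUS.search(line)
        if match:
            results.append((int(match.group(1), 16), int(match.group(2))))
    return results
```

Both numbers in each tuple should agree; if they do not, the line was probably not a real corestatus line.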
post #5 of 8
Just wanted to say it's been 4 straight work units on my new RAM. I haven't had a single lost WU so far; my "dead" RAM was definitely losing the WUs.


Off Topic***
Scream...You are the only person who I have under my "threats" section. What are you pushing for folding power?
post #6 of 8
Thread Starter 
Quote:
Originally Posted by Intervention
Just wanted to say it's been 4 straight work units on my new RAM. I haven't had a single lost WU so far; my "dead" RAM was definitely losing the WUs.
That's great, Intervention. Having a string of crashed WUs can really suck!

Quote:
Originally Posted by H3||scr3am
yeah, I'd love to know how you do it... and I had an odd crash today... my client literally says...

1:34:20 17 percent
4:34:20 at least 3 hours since checkpoint written...
F@H core shutdown : EARLY_UNIT_END
corestatus = 7B (123)
Client-core communications error :ERROR 0x7b

then it got a new unit. Any reason why? It's a stable OC...
Well, no computer is perfectly stable, even at stock settings. Plus you have to factor in the OS and other apps that may be running on your system that could cause a problem. In the end, there will always be a few lost WUs.

I will put together what I have figured out so far and post it. There is no magic to it. Because most of these clients are beta, they have not been coded to recover properly from errors. They are coded to pick up where you left off if you shut down the folding; they just don't know what to do when a crash occurs. They should try to recover from the last checkpoint, and then get a different WU if the crash repeats itself.

All I do is convince the system to pick up where it left off at the last saved checkpoint. The system saves valid data at whatever interval you had set during installation of the client. Because of this, most of the time, the good data is still there, the client has just moved on to the next WU.
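
To make the idea concrete, here is a rough Python sketch of the behavior described above: resume from the last saved checkpoint after a crash, and give up on the WU only if the crash repeats. This is a generic illustration, not the Folding@home client's actual code or my recovery procedure. The JSON checkpoint file, the function names (`save_checkpoint`, `load_checkpoint`, `run_work_unit`) and the parameters are all made up for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the checkpoint to a temp file, then swap it in, so a crash
    in the middle of a write cannot destroy the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last saved state, or a fresh state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0}


def run_work_unit(path, total_steps, do_step, checkpoint_every=10, max_retries=1):
    """Run do_step(i) for each step, checkpointing periodically.

    On a RuntimeError (standing in for a crash), reload the last
    checkpoint and try again; after max_retries failed retries, give up
    so the caller can fetch a different work unit.
    """
    retries = 0
    while True:
        state = load_checkpoint(path)
        try:
            for step in range(state["step"], total_steps):
                do_step(step)
                if (step + 1) % checkpoint_every == 0:
                    save_checkpoint(path, {"step": step + 1})
            save_checkpoint(path, {"step": total_steps})
            return "finished"
        except RuntimeError:
            retries += 1
            if retries > max_retries:
                return "abandoned"
```

The point is that the work done before the last checkpoint is never thrown away; only the steps since that checkpoint get repeated.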

You can find out a lot about the inner workings of their code with a good disassembler. There is quite a bit in these clients that is either disabled, in development, or just plain not working like it should.

Version 6 of the folding clients may bring some of this code to life.
post #7 of 8
Hope you can go somewhere with this idea. I lost several SMP work units that were nearly done because I would Alt+F4 by accident, or shut down the PC and then say "ARGGHh ****, I forgot about FAH".

I'm just glad VMware doesn't give me that problem like Windows SMP did; I can shut down with it running and it will pick up at the last %!
post #8 of 8
Quote:
Originally Posted by Burn
? Of course people do! I know that (when my PC decides to operate) I do, along with probably 200+ other forum members here.
Even when my computers are fully functioning, getting SMP to function is hit and miss LOL

This is a great thing though Knitelife, good job on trying to solve some problems.