Overclock.net › Forums › Overclockers Care › Overclock.net Folding@Home Team › NANs detected on GPU - UNSTABLE_MACHINE

NANs detected on GPU - UNSTABLE_MACHINE

post #1 of 10
Thread Starter 
Just got a Project 7622 for the first time and I keep getting this error:

GPU 0 failed to complete a project 7622 WU (UNSTABLE_MACHINE)


In the Log file for the GPU it says this:
Code:
[09:19:32] Project: 7622 (Run 219, Clone 0, Gen 33)
[09:19:32] 
[09:19:32] Assembly optimizations on if available.
[09:19:32] Entering M.D.
[09:19:34] Tpr hash work/wudata_01.tpr:  4050168784 2372487422 3581637985 4100233039 2715514380
[09:19:34] calling fah_main gpuDeviceId=0
[09:19:34] Working on Protein
[09:19:34] Client config found, loading data.
[09:19:34] Starting GUI Server
[09:20:36] Setting checkpoint frequency: 400000
[09:20:36] Completed         3 out of 40000000 steps (0%).
[09:26:34] Completed    400000 out of 40000000 steps (1%).
[09:26:35] mdrun_gpu returned 52
[09:26:35] NANs detected on GPU
[09:26:35] 
[09:26:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:26:38] CoreStatus = 7A (122)
[09:26:38] Sending work to server
[09:26:38] Project: 7622 (Run 219, Clone 0, Gen 33)
[09:26:38] - Read packet limit of 540015616... Set to 524286976.
[09:26:38] - Error: Could not get length of results file work/wuresults_01.dat
[09:26:38] - Error: Could not read unit 01 file. Removing from queue.
[09:26:38] Trying to send all finished work units
[09:26:38] + No unsent completed units remaining.
[09:26:38] + -oneunit flag given and have now finished a unit. Exiting.***** Got a SIGTERM signal (2)
[09:26:38] Killing all core threads

Folding@Home Client Shutdown.

I am using FAH GPU Tracker V2.

Anyone know how to fix this?

Thanks.
Keith.
(15 items)
 
Janet.
(13 items)
 
My PC.
(4 photos)
CPUMotherboardGraphicsRAM
Intel Core i7 2600K [5.0GHz@1.45v] Asus Maximus IV Extreme EVGA GTX 680 Classified 16GB Corsair Vengeance Red 1600Mhz 9-9-9-24 
Hard DriveHard DriveHard DriveCooling
60GB Corsair Force Series 3 SSD 160GB Samsung 7200RPM 1TB Samsung 7200RPM Corsair H100 
OSMonitorMonitorKeyboard
Windows 7 Professional 64Bit 27" Acer S273HLAbmii 19" Samsung SyncMaster 943SN Logitech diNovo Edge 
PowerCaseMouse
Antec High Current Gamer 900W Coolermaster HAF 932 Logitech G300 
CPUMotherboardGraphicsRAM
AMD Phenom II X4 965 @ 4.21Ghz ASUS M4A87TD/USB3 NVIDIA GeForce GTX 460 768MB Corsair 4x2GB DDR3 1333Mhz 
Hard DriveOSMonitorPower
Samsung 160GB, Samsung 1TB Windows 7 Ultimate 64Bit Samsung SyncMaster 943SN Corsair TX650W 
post #2 of 10
I had that earlier today (or yesterday), but after 19 failed attempts I turned off advmethods on the GPU.

Hope this helps :)
The every day-er
(17 items)
 
  
CPUMotherboardGraphicsRAM
Intel Core i7 2600K P8Z68-V PRO MSI GTX570 Corsair  
RAMHard DriveHard DriveHard Drive
Corsair  OCZ Agility 3 Samsung HD103SI Hitachi HDT725032VLA360 
Optical DriveCoolingOSMonitor
Samsung DVDWBD SH-B123L Thermalright HR-02 Macho Ubuntu 11.10 DGM L-2647WDH 
KeyboardPowerCaseMouse
Logitech diNovo + generic USB OCZ ModXStream Pro 700W Fractal Design Define R3 Microsoft Natural Wireless Laser Mouse 6000 
Mouse Pad
Wooden desk 
post #3 of 10
Thread Starter 
Quote:
Originally Posted by gceclifton View Post

I had that earlier today (or yesterday), but after 19 failed attempts I turned off advmethods on the GPU.
Hope this helps :)

Doh! Completely forgot about that. I guess some GPUs just don't support it. Thanks :)
post #4 of 10
Well, I thought I had done a few WUs before that one cropped up, but perhaps not... oh well... their loss!
post #5 of 10
A NAN ("not a number") error plus an Unstable Machine error is fairly indicative of an overclock that is unstable WITH that particular work unit.

You could try avoiding them, as suggested, by turning off advmethods OR by lowering your GPU clocks.

Unfortunately, these WUs are particularly OC sensitive.

Hope this helps.
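
For anyone wondering why one bad value is enough to kill the whole work unit: under IEEE 754 floating point, any arithmetic that touches a NaN gives a NaN back, so a single corrupted result spreads through every later step. Below is a minimal sketch of that behaviour in Python. The `step` function and its numbers are made up for illustration and are not the actual FAH core's integrator.

```python
import math

def step(position, velocity, force, dt=0.002):
    """One toy integration step (hypothetical, not the FAH core's code)."""
    velocity = velocity + force * dt
    position = position + velocity * dt
    return position, velocity

pos, vel = 1.0, 0.0
# One corrupted force value at the second step, as an unstable GPU might produce.
forces = [0.5, float("nan"), 0.5, 0.5]

history = []
for f in forces:
    pos, vel = step(pos, vel, f)
    history.append(pos)

# Once a NaN appears, every later position is NaN as well.
nan_flags = [math.isnan(p) for p in history]
print(nan_flags)  # [False, True, True, True]
```

The simulation cannot recover from that point, which is presumably why the core gives up and reports the machine as unstable rather than returning a result.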
post #6 of 10
Quote:
Originally Posted by gceclifton View Post

I had that earlier today (or yesterday), but after 19 failed attempts I turned off advmethods on the GPU.
Hope this helps :)

What GPU were you folding these on? You don't have a GPU listed in your sig. What sort of OC were you running on the card? I ask because the problem is an unstable OC on the GPU. I prefer the advmethods WUs because they offer better PPD for anything above GTX 460 performance level, and they use less CPU time, so your CPU will get better PPD as well if you also fold on that. If you want to keep running them, you need to get your GPU to actually be stable. If you are unable or unwilling to make your GPU stable, then you need to remove the advmethods flag and see if you are stable on regular WUs. Whatever you do, you need to get your rig into a stable state where you are not failing 19 WUs like that... That's really bad and is detrimental to the research.
Quote:
Originally Posted by Sonics View Post

Doh! Completely forgot about that. I guess some GPUs just don't support it. Thanks :)

As far as I know, any Fermi GPU (GTX 400 and GTX 500 series) supports the advmethods WUs (P7620, P7621, and P7622), but they are only really recommended for cards at GTX 460 performance levels and above. On those cards you start to see about a 1,000 PPD increase over the regular WUs, and the WUs use less CPU time, so if you are CPU folding too you will see a significant CPU PPD increase as well.
Quote:
Originally Posted by Sethy666 View Post

A NAN ("not a number") error plus an Unstable Machine error is fairly indicative of an overclock that is unstable WITH that particular work unit.
You could try avoiding them, as suggested, by turning off advmethods OR by lowering your GPU clocks.
Unfortunately, these WUs are particularly OC sensitive.
Hope this helps.

^This. A NANs error means that your GPU is not stable, so rather than failing a whole bunch of WUs in a row or giving up on those WUs entirely, you should get your GPU stable.
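
If you want to check how often a rig is failing before it burns through a pile of WUs, one option is to scan the client log for core shutdown lines. Here is a rough sketch in Python, assuming the log uses the same `Project: ...` and `Folding@home Core Shutdown: ...` line formats shown in the first post. The function name `count_shutdowns` and the sample text are just for illustration, not part of any FAH tool.

```python
import re

# Patterns based on the log lines quoted in the first post.
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")
SHUTDOWN_RE = re.compile(r"Folding@home Core Shutdown: (\w+)")

def count_shutdowns(log_text):
    """Return {(project, shutdown_reason): count} for a client log."""
    counts = {}
    current_project = None
    for line in log_text.splitlines():
        m = PROJECT_RE.search(line)
        if m:
            current_project = int(m.group(1))
            continue
        m = SHUTDOWN_RE.search(line)
        if m:
            key = (current_project, m.group(1))
            counts[key] = counts.get(key, 0) + 1
    return counts

sample = """\
[09:19:32] Project: 7622 (Run 219, Clone 0, Gen 33)
[09:26:35] NANs detected on GPU
[09:26:35] Folding@home Core Shutdown: UNSTABLE_MACHINE
[09:26:38] Project: 7622 (Run 219, Clone 0, Gen 33)
"""
result = count_shutdowns(sample)
print(result)  # {(7622, 'UNSTABLE_MACHINE'): 1}
```

A count that keeps climbing for the same project is a good sign the clocks need to come down before fetching more work.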
Main Rig
(16 items)
 
  
CPUMotherboardGraphicsRAM
Intel i7 2700k ASUS P8P67 WS Revolution EVGA 980 Ti SC+ Samsung 4x4GB DDR3 1866MHz 
Hard DriveHard DriveOptical DriveCooling
Samsung 850 Evo 1TB Samsung Spinpoint F4 2TB Samsung BD Combo Noctua NH-D14 
OSMonitorPowerCase
Windows 10 64 bit Asus PG279Q Kingwin Lazer Platinum 1000W Silverstone Raven RV03 
post #7 of 10
Whoa! juano's back!

Where you been bro?
post #8 of 10
Quote:
Originally Posted by Sethy666 View Post

Whoa! juano's back!
Where you been bro?

Hey man, just had a whole bunch of machine problems and not very much time to sort through them. Been back for a little while though and I should have my machine pretty stable now (knock on wood).
post #9 of 10
Unstable overclock. Try backing off your OC a little.
The 5187-point work units put more stress on your GPU, and its memory, than any others. They are more stressful than running Furmark on its most extreme burn-in settings.

I had to back the OC on my water-cooled GTX560 down from 1075 to 1065MHz, and my GTX470 from 825 to 810MHz, to be stable folding them, but I still get higher PPD with the -advmethods WUs.
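
To put those clock changes in perspective, here is the arithmetic as a quick Python sketch (the helper name `percent_drop` is just for illustration). Both reductions are small, under 2% of the core clock:

```python
def percent_drop(before_mhz, after_mhz):
    """Percentage reduction in core clock."""
    return (before_mhz - after_mhz) / before_mhz * 100

# Clock values taken from the post above.
drops = {
    "GTX560": percent_drop(1075, 1065),
    "GTX470": percent_drop(825, 810),
}
for card, drop in drops.items():
    print(f"{card}: {drop:.2f}% lower core clock")
# GTX560: 0.93% lower core clock
# GTX470: 1.82% lower core clock
```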
Grog
(21 items)
 
TC Folding Gimp
(14 items)
 
 
CPUMotherboardGraphicsGraphics
i7-950 Asus P6X58D-E Asus ENGTX470 BFG 9800GT 
RAMHard DriveOptical DriveCooling
Corsair Dominator GT Velociraptor 3000HLFS Lite On DVD burner EK Supreme HF 
CoolingCoolingCoolingCooling
AquagraFX470 EK FC-88 Koolance RP402x2 MCP35x pumps 
CoolingCoolingOSMonitor
Phobya Extreme 200 and Quad 480 radiators. 9x Akasa Viper fans Windows 7 Ultimate Old LG Flatron 1680x1050 
KeyboardPowerCase
Logitech G15 + Logitech G13 game pad Thermaltake 850W Corsair 600T 
CPUMotherboardGraphicsRAM
Phenom x4 9650 Cheap MSI board EVGA GTX580 SC Junk PC-800 DDR2 
Hard DriveOptical DriveCoolingOS
WD Blue Generic DVD Cuplex Kryos HF CPU block, Swiftech MCW82 GPU b... Windows 7 Home Premium 
PowerCase
Antec True Power 650W >10 year old Antec something or other. 
post #10 of 10
That makes more sense :) Thanks for clearing it up - I now have a GTX570 OC'd to 850MHz with a 1700MHz shader clock. I haven't gotten around to stabilising that one yet!

I'll chuck it down a couple of notches and see what happens - thanks