I've lost a second rig to an endless EUE cycle, this time while working on project 5506 (other WUs seem unaffected). As each of my rigs updates to the new core, Version 1.19 (Mon Nov 3 09:34:13 PST 2008), I end up constantly resetting them for EUE limit errors.
My PPD is half what it was just a month ago.
The error codes etc. are pasted in below.
[23:18:02] + Processing work unit
[23:18:02] Core required: FahCore_11.exe
[23:18:02] Core found.
[23:18:02] Working on queue slot 09 [November 17 23:18:02 UTC]
[23:18:02] + Working ...
[23:18:02]
[23:18:02] *------------------------------*
[23:18:02] Folding@Home GPU Core - Beta
[23:18:02] Version 1.19 (Mon Nov 3 09:34:13 PST 2008)
[23:18:02]
[23:18:02] Compiler : Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.762 for 80x86
[23:18:02] Build host: amoeba
[23:18:02] Board Type: Nvidia
[23:18:02] Core :
[23:18:02] Preparing to commence simulation
[23:18:02] - Looking at optimizations...
[23:18:02] - Created dyn
[23:18:02] - Files status OK
[23:18:02] - Expanded 45481 -> 246249 (decompressed 541.4 percent)
[23:18:02] Called DecompressByteArray: compressed_data_size=45481 data_size=246249, decompressed_data_size=246249 diff=0
[23:18:02] - Digital signature verified
[23:18:02]
[23:18:02] Project: 5506 (Run 8, Clone 13, Gen 297)
[23:18:02]
[23:18:02] Assembly optimizations on if available.
[23:18:02] Entering M.D.
[23:18:09] Working on p5506_supervillin_e1
[23:18:09] Client config found, loading data.
[23:18:09] mdrun_gpu returned
[23:18:09] NANs detected on GPU
[23:18:09]
[23:18:09] Folding@home Core Shutdown: UNSTABLE_MACHINE
[23:18:12] CoreStatus = 7A (122)
[23:18:12] Sending work to server
[23:18:12] Project: 5506 (Run 8, Clone 13, Gen 297)
[23:18:12] - Read packet limit of 540015616... Set to 524286976.
[23:18:12] - Error: Could not get length of results file work/wuresults_09.dat
[23:18:12] - Error: Could not read unit 09 file. Removing from queue.
[23:18:12] EUE limit exceeded. Pausing 24 hours.
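For anyone who wants to check their own logs for the same pattern, here is a minimal sketch of a scanner. It assumes the log is plain text in the format pasted above; the function name `summarize_failures` and the marker list are my own choices for illustration, and the markers are simply the three failure lines from this log, so they may not cover every EUE variant.

```python
import re
from collections import Counter

# Failure lines copied from the log above. Treating these three strings as
# the full set of EUE indicators is an assumption, not a documented list.
FAILURE_MARKERS = (
    "NANs detected on GPU",
    "UNSTABLE_MACHINE",
    "EUE limit exceeded",
)

# Matches lines such as "Project: 5506 (Run 8, Clone 13, Gen 297)".
PROJECT_RE = re.compile(r"Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)")


def summarize_failures(log_text):
    """Count failure markers, keyed by (project number, marker)."""
    counts = Counter()
    current_project = None  # most recent project seen while scanning
    for line in log_text.splitlines():
        match = PROJECT_RE.search(line)
        if match:
            current_project = match.group(1)
        for marker in FAILURE_MARKERS:
            if marker in line:
                counts[(current_project, marker)] += 1
    return counts
```

Running it over a full log would show whether the failures really cluster on 5506 or turn up on other projects too.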
My dedicated Folding RIG
Specs -
Q6600 on EVGA 650i Tuniq Tower
Vista Ultimate x32
XFX 9800GTX stock cooling
OCZ Platinum 4x1GB PC2-6400 800
I'm running GPU2 ver 6.20 + Client Version 6.22 SMP Beta2
Using Set_Affinity_II_1.035
GPU is on core 3, SMP is on 012
config:
[settings]
username==Digger=
team=37726
passkey=
asknet=no
machineid=3
bigpackets=big
extra_parms=-local -forcegpu Nvidia_g80
local=848
[http]
active=no
host=localhost
port=8080
usereg=no
proxy_name=
proxy_passwd=
[core]
priority=96
cpuusage=100
disableassembly=no
nocpulock=0
checkpoint=15
[power]
battery=no
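To double-check which flags a client is actually picking up from a config like the one above, a small sketch along these lines may help. It assumes the file is INI-style text that Python's standard `configparser` can read; the helper name `read_client_flags` is hypothetical, and the section and key names are taken from the config pasted above.

```python
import configparser


def read_client_flags(cfg_text):
    """Return the extra_parms flags from INI-style client config text."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # "settings" and "extra_parms" are the names used in the config above.
    extra = parser.get("settings", "extra_parms", fallback="")
    return extra.split()
```

For the config above this would list `-local`, `-forcegpu` and `Nvidia_g80`, which makes it easy to compare flags across several rigs.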
So far I have:
1) Reloaded Windows
2) Upgraded to Vista Ultimate (SP1)
3) Double-checked and set the BIOS to default clocks and voltages
4) Rolled back NVIDIA drivers from 180.43 to 178.24
5) Reinstalled nForce 15.23
6) Swapped out memory with another PC
7) Verified temps (73C)
8) Put the affected GPU in another PC (core Version 1.15 ran fine)
9) Put another GPU in this PC (runs one WU OK, then EUEs)
10) Copied the old core into each GPU client (it just downloads the new one automatically)
This is not another Stanford bashing thread. I'm trying to figure out if I've missed anything.
Suggestions, anyone?