Overclock.net

1 - 5 of 5 Posts

Premium Member · 8,679 Posts · Discussion Starter #1
I've been experiencing a very odd problem with my SMP client. The client downloads an A3 WU, folds it, and then refuses to upload it. I know this is not an issue with my internet connection, because I have been downloading and uploading GPU2 WUs all day, and my other computer (a C2D running SMP2 under Linux) can also connect to the servers. I can also reach the server in Firefox.
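One way to back up the "it's not my connection" claim from this specific machine is to test whether each server address and port that appears in the log excerpt accepts a TCP connection. The sketch below is a minimal, hypothetical helper (`can_connect` and `probe_all` are illustrative names, not part of the client); the two addresses and the 8080-then-80 order are copied from the log. A successful connect only shows that the port is reachable: the log itself shows "Posted data." for 171.67.108.25 followed by a failure, so a server can accept the connection and still not take the result.

```python
import socket

# Hypothetical helper. Addresses copied from the log excerpt in this post:
# 171.64.65.56 is the Work Server that assigned the unit; 171.67.108.25 is
# the server the client falls back to (the log later calls it the Collection server).
SERVERS = ["171.64.65.56", "171.67.108.25"]
# The client tries port 8080 first, then "Retrying using alternative port" 80.
PORTS = [8080, 80]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name lookup failure, or timed out
        return False


def probe_all(servers=SERVERS, ports=PORTS, timeout: float = 5.0) -> dict:
    """Check every server/port pair, print a status line, and return {(host, port): reachable}."""
    results = {}
    for host in servers:
        for port in ports:
            ok = can_connect(host, port, timeout)
            results[(host, port)] = ok
            print(f"{host}:{port} -> {'open' if ok else 'FAILED'}")
    return results
```

Calling `probe_all()` prints one line per address/port pair; running it on both the Windows box and the Linux box would show whether the two machines see the same servers as reachable.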

Here's an excerpt from my log file; if you need more, I can post it:

Code:
--- Opening Log file [July 6 16:31:36 UTC]

# Windows SMP Console Edition #################################################

###############################################################################

                       Folding@home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################

###############################################################################

Launch directory: C:\\FAH\\SMP

Executable: C:\\FAH\\SMP\\[email protected]

Arguments: -smp -verbosity 9

[16:31:36] - Ask before connecting: No

[16:31:36] - User name: iFX (Team 37726)

[16:31:36] - User ID: 7728BA4739E8CD00

[16:31:36] - Machine ID: 1

[16:31:36]

[16:31:36] Loaded queue successfully.

[16:31:36]

[16:31:36] - Autosending finished units... [July 6 16:31:36 UTC]

[16:31:36] + Processing work unit

[16:31:36] Trying to send all finished work units

[16:31:36] Core required: FahCore_a3.exe

[16:31:36] Project: 6701 (Run 7, Clone 24, Gen 13)

[16:31:36] Core found.

[16:31:36] + Attempting to send results [July 6 16:31:36 UTC]

[16:31:36] - Reading file work/wuresults_00.dat from core

[16:31:36] Working on queue slot 01 [July 6 16:31:36 UTC]

[16:31:36]   (Read 43593369 bytes from disk)

[16:31:36] + Working ...

[16:31:36] Connecting to http://171.64.65.56:8080/

[16:31:36] - Calling '.\\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 3 -verbose -lifeline 780 -version 629'

[16:31:36]

[16:31:36] *------------------------------*

[16:31:36] Folding@home Gromacs SMP Core

[16:31:36] Version 2.22 (Mar 12, 2010)

[16:31:36]

[16:31:36] Preparing to commence simulation

[16:31:36] - Ensuring status. Please wait.

[16:31:46] - Looking at optimizations...

[16:31:46] - Working with standard loops on this execution.

[16:31:46] - Previous termination of core was improper.

[16:31:46] - Going to use standard loops.

[16:31:46] - Files status OK

[16:31:46] - Expanded 1764978 -> 2250761 (decompressed 127.5 percent)

[16:31:46] Called DecompressByteArray: compressed_data_size=1764978 data_size=2250761, decompressed_data_size=2250761 diff=0

[16:31:46] - Digital signature verified

[16:31:46]

[16:31:46] Project: 6052 (Run 0, Clone 18, Gen 64)

[16:31:46]

[16:31:46] Entering M.D.

[16:31:52] Using Gromacs checkpoints

[16:31:53] Resuming from checkpoint

[16:31:53] Verified work/wudata_01.log

[16:31:53] Verified work/wudata_01.trr

[16:31:53] Verified work/wudata_01.edr

[16:31:53] Completed 210608 out of 500000 steps  (42%)

[16:35:56] Completed 215000 out of 500000 steps  (43%)

[16:39:35] Completed 220000 out of 500000 steps  (44%)

[16:43:06] Completed 225000 out of 500000 steps  (45%)

[16:46:22] Completed 230000 out of 500000 steps  (46%)

[16:50:18] Completed 235000 out of 500000 steps  (47%)

[16:53:18] - Couldn't send HTTP request to server

[16:53:18] + Could not connect to Work Server (results)

[16:53:18]     (171.64.65.56:8080)

[16:53:18] + Retrying using alternative port

[16:53:18] Connecting to http://171.64.65.56:80/

[16:53:21] - Couldn't send HTTP request to server

[16:53:21] + Could not connect to Work Server (results)

[16:53:21]     (171.64.65.56:80)

[16:53:21] - Error: Could not transmit unit 00 (completed July 6) to work server.

[16:53:21] - 7 failed uploads of this unit.

[16:53:21] + Attempting to send results [July 6 16:53:21 UTC]

[16:53:21] - Reading file work/wuresults_00.dat from core

[16:53:21]   (Read 43593369 bytes from disk)

[16:53:21] Connecting to http://171.67.108.25:8080/

[16:54:12] Completed 240000 out of 500000 steps  (48%)

[16:57:45] Posted data.

[16:57:45] Initial: 0000; + Could not connect to Work Server (results)

[16:57:45]     (171.67.108.25:8080)

[16:57:45] + Retrying using alternative port

[16:57:45] Connecting to http://171.67.108.25:80/

[16:58:12] Completed 245000 out of 500000 steps  (49%)

[17:01:43] Completed 250000 out of 500000 steps  (50%)

[17:05:22] Completed 255000 out of 500000 steps  (51%)

[17:08:39] Completed 260000 out of 500000 steps  (52%)

[17:12:00] Completed 265000 out of 500000 steps  (53%)

[17:15:56] Completed 270000 out of 500000 steps  (54%)

[17:18:19] - Couldn't send HTTP request to server

[17:18:19] + Could not connect to Work Server (results)

[17:18:19]     (171.67.108.25:80)

[17:18:19]   Could not transmit unit 00 to Collection server; keeping in queue.

[17:18:19] Project: 6701 (Run 7, Clone 24, Gen 13)

[17:18:19] + Attempting to send results [July 6 17:18:19 UTC]

[17:18:19] - Reading file work/wuresults_00.dat from core

[17:18:19]   (Read 43593369 bytes from disk)

[17:18:19] Connecting to http://171.64.65.56:8080/

[17:20:03] Completed 275000 out of 500000 steps  (55%)

[17:23:18] Completed 280000 out of 500000 steps  (56%)

[17:26:59] Completed 285000 out of 500000 steps  (57%)

[17:30:33] Completed 290000 out of 500000 steps  (58%)

--- Opening Log file [July 6 17:37:05 UTC]

# Windows SMP Console Edition #################################################

###############################################################################

                       Folding@home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################

###############################################################################

Launch directory: C:\\FAH\\SMP

Executable: C:\\FAH\\SMP\\[email protected]

Arguments: -smp -verbosity 9

[17:37:05] - Ask before connecting: No

[17:37:05] - User name: iFX (Team 37726)

[17:37:05] - User ID: 7728BA4739E8CD00

[17:37:05] - Machine ID: 1

[17:37:05]

[17:37:05] Loaded queue successfully.

[17:37:05]

[17:37:05] - Autosending finished units... [July 6 17:37:05 UTC]

[17:37:05] + Processing work unit

[17:37:05] Trying to send all finished work units

[17:37:05] Core required: FahCore_a3.exe

[17:37:05] Project: 6701 (Run 7, Clone 24, Gen 13)

[17:37:05] Core found.

[17:37:05] + Attempting to send results [July 6 17:37:05 UTC]

[17:37:05] - Reading file work/wuresults_00.dat from core

[17:37:05] Working on queue slot 01 [July 6 17:37:05 UTC]

[17:37:06]   (Read 43593369 bytes from disk)

[17:37:06] + Working ...

[17:37:06] Connecting to http://171.64.65.56:8080/

[17:37:06] - Calling '.\\FahCore_a3.exe -dir work/ -nice 19 -suffix 01 -np 8 -checkpoint 3 -verbose -lifeline 2400 -version 629'

[17:37:06]

[17:37:06] *------------------------------*

[17:37:06] Folding@home Gromacs SMP Core

[17:37:06] Version 2.22 (Mar 12, 2010)

[17:37:06]

[17:37:06] Preparing to commence simulation

[17:37:06] - Looking at optimizations...

[17:37:06] - Files status OK

[17:37:06] - Expanded 1764978 -> 2250761 (decompressed 127.5 percent)

[17:37:06] Called DecompressByteArray: compressed_data_size=1764978 data_size=2250761, decompressed_data_size=2250761 diff=0

[17:37:06] - Digital signature verified

[17:37:06]

[17:37:06] Project: 6052 (Run 0, Clone 18, Gen 64)

[17:37:06]

[17:37:06] Assembly optimizations on if available.

[17:37:06] Entering M.D.

[17:37:13] Using Gromacs checkpoints

[17:37:13] Resuming from checkpoint

[17:37:13] Verified work/wudata_01.log

[17:37:13] Verified work/wudata_01.trr

[17:37:13] Verified work/wudata_01.edr

[17:37:14] Completed 291908 out of 500000 steps  (58%)

[17:39:33] Completed 295000 out of 500000 steps  (59%)

[17:42:51] Completed 300000 out of 500000 steps  (60%)

[17:46:26] Completed 305000 out of 500000 steps  (61%)

[17:50:12] Completed 310000 out of 500000 steps  (62%)

[17:53:44] Completed 315000 out of 500000 steps  (63%)

[17:57:07] Completed 320000 out of 500000 steps  (64%)

[18:00:39] Completed 325000 out of 500000 steps  (65%)

[18:04:07] Completed 330000 out of 500000 steps  (66%)

[18:07:54] Completed 335000 out of 500000 steps  (67%)

[18:11:19] Completed 340000 out of 500000 steps  (68%)

[18:14:41] Completed 345000 out of 500000 steps  (69%)

[18:18:04] Completed 350000 out of 500000 steps  (70%)

[18:21:40] Completed 355000 out of 500000 steps  (71%)

[18:25:11] Completed 360000 out of 500000 steps  (72%)

[18:28:47] Completed 365000 out of 500000 steps  (73%)

[18:32:20] Completed 370000 out of 500000 steps  (74%)

[18:36:13] Completed 375000 out of 500000 steps  (75%)

[18:40:29] Completed 380000 out of 500000 steps  (76%)

[18:44:05] Completed 385000 out of 500000 steps  (77%)

[18:47:51] Completed 390000 out of 500000 steps  (78%)

[18:51:29] Completed 395000 out of 500000 steps  (79%)

[18:55:42] Completed 400000 out of 500000 steps  (80%)

[18:59:43] Completed 405000 out of 500000 steps  (81%)

[19:03:27] Completed 410000 out of 500000 steps  (82%)

[19:06:45] Completed 415000 out of 500000 steps  (83%)

[19:10:02] Completed 420000 out of 500000 steps  (84%)

[19:13:31] Completed 425000 out of 500000 steps  (85%)

[19:16:55] Completed 430000 out of 500000 steps  (86%)

[19:20:28] Completed 435000 out of 500000 steps  (87%)

[19:23:24] - Couldn't send HTTP request to server

[19:23:24] + Could not connect to Work Server (results)

[19:23:24]     (171.64.65.56:8080)

[19:23:24] + Retrying using alternative port

[19:23:24] Connecting to http://171.64.65.56:80/

[19:23:46] Completed 440000 out of 500000 steps  (88%)

[19:27:29] Completed 445000 out of 500000 steps  (89%)

[19:31:44] Completed 450000 out of 500000 steps  (90%)

[19:35:50] Completed 455000 out of 500000 steps  (91%)

[19:39:02] Completed 460000 out of 500000 steps  (92%)

[19:42:15] - Couldn't send HTTP request to server

[19:42:15] + Could not connect to Work Server (results)

[19:42:15]     (171.64.65.56:80)

[19:42:15] - Error: Could not transmit unit 00 (completed July 6) to work server.

[19:42:15] - 8 failed uploads of this unit.

[19:42:15] + Attempting to send results [July 6 19:42:15 UTC]

[19:42:15] - Reading file work/wuresults_00.dat from core

[19:42:15]   (Read 43593369 bytes from disk)

[19:42:15] Connecting to http://171.67.108.25:8080/

[19:42:22] Completed 465000 out of 500000 steps  (93%)

[19:44:16] - Couldn't send HTTP request to server

[19:44:16] + Could not connect to Work Server (results)

[19:44:16]     (171.67.108.25:8080)

[19:44:16] + Retrying using alternative port

[19:44:16] Connecting to http://171.67.108.25:80/

[19:46:21] Completed 470000 out of 500000 steps  (94%)

[19:47:35] Posted data.

[19:47:35] Initial: 0000; + Could not connect to Work Server (results)

[19:47:35]     (171.67.108.25:80)

[19:47:35]   Could not transmit unit 00 to Collection server; keeping in queue.

[19:47:35] Project: 6701 (Run 7, Clone 24, Gen 13)

[19:47:35] + Attempting to send results [July 6 19:47:35 UTC]

[19:47:35] - Reading file work/wuresults_00.dat from core

[19:47:35]   (Read 43593369 bytes from disk)

[19:47:35] Connecting to http://171.64.65.56:8080/

[19:50:14] Completed 475000 out of 500000 steps  (95%)

[19:54:15] Completed 480000 out of 500000 steps  (96%)

[19:57:53] Completed 485000 out of 500000 steps  (97%)

[20:01:22] Completed 490000 out of 500000 steps  (98%)

[20:05:27] Completed 495000 out of 500000 steps  (99%)

[20:08:56] Completed 500000 out of 500000 steps  (100%)

[20:08:57] DynamicWrapper: Finished Work Unit: sleep=10000

[20:09:07]

[20:09:07] Finished Work Unit:

[20:09:07] - Reading up to 3698496 from "work/wudata_01.trr": Read 3698496

[20:09:07] trr file hash check passed.

[20:09:07] edr file hash check passed.

[20:09:07] logfile size: 66520

[20:09:07] Leaving Run

[20:09:09] - Writing 3800568 bytes of core data to disk...

[20:09:09]   ... Done.

[20:09:09] - Shutting down core

[20:09:09]

[20:09:09] Folding@home Core Shutdown: FINISHED_UNIT

[20:09:14] CoreStatus = 64 (100)

[20:09:14] Unit 1 finished with 95 percent of time to deadline remaining.

[20:09:14] Updated performance fraction: 0.930753

[20:09:14] Sending work to server

[20:09:14] - Already sending work

[20:09:14] Trying to send all finished work units

[20:09:14] - Already sending work

[20:09:14] - Already sending work

[20:09:14] + Sent 0 of 2 completed units to the server

[20:09:14] - Preparing to get new work unit...

[20:09:14] Cleaning up work directory

[20:09:14] + Attempting to get work packet

[20:09:14] Passkey found

[20:09:14] - Will indicate memory of 2046 MB

[20:09:14] - Detect CPU. Vendor: GenuineIntel, Family: 6, Model: 14, Stepping: 5

[20:09:14] - Connecting to assignment server

[20:09:14] Connecting to http://assign.stanford.edu:8080/

[20:09:16] Posted data.

[20:09:16] Initial: 40AB; - Successful: assigned to (171.64.65.56).

[20:09:16] + News From Folding@home: Welcome to Folding@home

[20:09:16] Loaded queue successfully.

[20:09:16] Connecting to http://171.64.65.56:8080/

[20:09:21] Posted data.

[20:09:21] Initial: 0000; - Receiving payload (expected size: 763807)

[20:09:31] - Downloaded at ~74 kB/s

[20:09:31] - Averaged speed for that direction ~144 kB/s

[20:09:31] + Received work.

[20:09:31] Trying to send all finished work units

[20:09:31] - Already sending work

[20:09:31] - Already sending work

[20:09:31] + Sent 0 of 2 completed units to the server

[20:09:31] + Closed connections

[20:09:31]

[20:09:31] + Processing work unit

[20:09:31] Core required: FahCore_a3.exe

[20:09:31] Core found.

[20:09:31] Working on queue slot 02 [July 6 20:09:31 UTC]

[20:09:31] + Working ...

[20:09:31] - Calling '.\\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 8 -checkpoint 3 -verbose -lifeline 2400 -version 629'

[20:09:31]

[20:09:31] *------------------------------*

[20:09:31] Folding@home Gromacs SMP Core

[20:09:31] Version 2.22 (Mar 12, 2010)

[20:09:31]

[20:09:31] Preparing to commence simulation

[20:09:31] - Looking at optimizations...

[20:09:31] - Created dyn

[20:09:31] - Files status OK

[20:09:31] - Expanded 763295 -> 1404481 (decompressed 184.0 percent)

[20:09:31] Called DecompressByteArray: compressed_data_size=763295 data_size=1404481, decompressed_data_size=1404481 diff=0

[20:09:31] - Digital signature verified

[20:09:31]

[20:09:31] Project: 6701 (Run 7, Clone 24, Gen 13)

[20:09:31]

[20:09:31] Assembly optimizations on if available.

[20:09:31] Entering M.D.

[20:09:38] Completed 0 out of 2000000 steps  (0%)

[20:11:15] Killing all core threads

[20:11:15] Could not get process id information.  Please kill core process manually

Folding@home Client Shutdown at user request.

[20:11:15] ***** Got a SIGTERM signal (2)

[20:11:15] Killing all core threads

[20:11:15] Could not get process id information.  Please kill core process manually

Folding@home Client Shutdown.

--- Opening Log file [July 6 20:19:43 UTC]

# Windows SMP Console Edition #################################################

###############################################################################

                       Folding@home Client Version 6.29

                          http://folding.stanford.edu

###############################################################################

###############################################################################

Launch directory: C:\\FAH\\SMP

Executable: C:\\FAH\\SMP\\[email protected]

Arguments: -smp -verbosity 9

[20:19:43] - Ask before connecting: No

[20:19:43] - User name: iFX (Team 37726)

[20:19:43] - User ID: 7728BA4739E8CD00

[20:19:43] - Machine ID: 1

[20:19:43]

[20:19:43] Loaded queue successfully.

[20:19:43]

[20:19:43] - Autosending finished units... [July 6 20:19:43 UTC]

[20:19:43] + Processing work unit

[20:19:43] Trying to send all finished work units

[20:19:43] Core required: FahCore_a3.exe

[20:19:43] Project: 6701 (Run 7, Clone 24, Gen 13)

[20:19:43] Core found.

[20:19:43] + Attempting to send results [July 6 20:19:43 UTC]

[20:19:43] - Reading file work/wuresults_00.dat from core

[20:19:43] Working on queue slot 02 [July 6 20:19:43 UTC]

[20:19:43]   (Read 43593369 bytes from disk)

[20:19:43] + Working ...

[20:19:43] Connecting to http://171.64.65.56:8080/

[20:19:43] - Calling '.\\FahCore_a3.exe -dir work/ -nice 19 -suffix 02 -np 8 -checkpoint 3 -verbose -lifeline 2036 -version 629'

[20:19:43]

[20:19:43] *------------------------------*

[20:19:43] Folding@home Gromacs SMP Core

[20:19:43] Version 2.22 (Mar 12, 2010)

[20:19:43]

[20:19:43] Preparing to commence simulation

[20:19:43] - Ensuring status. Please wait.

[20:19:53] - Looking at optimizations...

[20:19:53] - Working with standard loops on this execution.

[20:19:53] - Previous termination of core was improper.

[20:19:53] - Files status OK

[20:19:53] - Expanded 763295 -> 1404481 (decompressed 184.0 percent)

[20:19:53] Called DecompressByteArray: compressed_data_size=763295 data_size=1404481, decompressed_data_size=1404481 diff=0

[20:19:53] - Digital signature verified

[20:19:53]

[20:19:53] Project: 6701 (Run 7, Clone 24, Gen 13)

[20:19:53]

[20:19:53] Entering M.D.

[20:19:59] Completed 0 out of 2000000 steps  (0%)
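Reading a log this long by eye makes it easy to miss which server and port is failing. The sketch below is a hypothetical helper (`summarize_upload_failures` is an illustrative name) that tallies failed result uploads per host:port. Its patterns are based only on this excerpt: a "Could not connect to Work Server (results)" line followed by an indented "(host:port)" line, plus the client's running "- N failed uploads of this unit." counter.

```python
import re
from collections import Counter

# Patterns based only on the v6 client log excerpt in this thread.
FAIL_MARK = "Could not connect to Work Server (results)"
ADDR_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")  # e.g. (171.64.65.56:8080)
COUNT_RE = re.compile(r"- (\d+) failed uploads of this unit")


def summarize_upload_failures(log_text: str):
    """Tally failed result uploads per host:port.

    Returns (Counter mapping "host:port" to failure count,
             highest "N failed uploads of this unit" value seen, or 0).
    """
    failures = Counter()
    highest_count = 0
    pending = False  # set by a failure line; cleared by the "(host:port)" line after it
    for line in log_text.splitlines():
        if FAIL_MARK in line:
            pending = True
            continue
        addr = ADDR_RE.search(line)
        if pending and addr:
            failures[f"{addr.group(1)}:{addr.group(2)}"] += 1
            pending = False
            continue
        count = COUNT_RE.search(line)
        if count:
            highest_count = max(highest_count, int(count.group(1)))
    return failures, highest_count
```

Run over the excerpt in this post, it should report two failures for each of 171.64.65.56:8080, 171.64.65.56:80, 171.67.108.25:8080 and 171.67.108.25:80, with a highest count of 8; in other words, every server and port the client tried failed the same way.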
I'd obviously like to get this resolved ASAP, because I currently have 2 WUs that won't upload, and both their usefulness to Stanford and their bonus points keep going down.


One idea that might work, at least temporarily, is to run the client with -oneunit, copy the work folder and queue.dat over to another computer, and upload the results from there. Would that work?
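The copy step of that idea could be sketched as follows. This is a minimal, hypothetical helper (`copy_fah_state` is an illustrative name) that assumes the client has already stopped (for example after finishing with -oneunit) and that no FahCore process is still running; the log shows the client saying "Please kill core process manually" on shutdown, so that is worth checking first. It only copies `queue.dat` and the `work` folder out of the client directory (`C:\FAH\SMP` in the log). Whether a client on another computer will then upload those results is not something this thread confirms, so the copy is best treated as a backup first.

```python
import shutil
from pathlib import Path


def copy_fah_state(client_dir: str, dest_dir: str) -> list:
    """Copy queue.dat and the work/ folder from client_dir into dest_dir.

    Assumes the client and its core process are already stopped, so the
    files are not being written while they are copied.
    Returns the list of destination paths that were created.
    """
    src = Path(client_dir)
    dst = Path(dest_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []

    queue = src / "queue.dat"
    if queue.is_file():
        shutil.copy2(queue, dst / "queue.dat")  # copy2 also preserves timestamps
        copied.append(dst / "queue.dat")

    work = src / "work"
    if work.is_dir():
        # dirs_exist_ok (Python 3.8+) lets repeated backups overwrite in place
        shutil.copytree(work, dst / "work", dirs_exist_ok=True)
        copied.append(dst / "work")

    return copied
```

For example, `copy_fah_state(r"C:\FAH\SMP", r"D:\fah_backup")` would leave a copy of both items under the (hypothetical) `D:\fah_backup` folder.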
 

Premium Member · 7,161 Posts
When I read your post, the first thing I thought was, "I bet it's a 6701 he can't upload." Now I'm not sure. Which WU did you have trouble uploading? For me it was the 6701.
 

Premium Member · 8,679 Posts · Discussion Starter #3
Quote:


Originally Posted by PCCstudent

When I read your post, the first thing I thought was, "I bet it's a 6701 he can't upload." Now I'm not sure. Which WU did you have trouble uploading? For me it was the 6701.

Yep, a P6701. But it also wouldn't upload another A3 unit that it completed this afternoon.


I've reinstalled Windows on a spare HDD, but it still doesn't seem to be working.


Any suggestions?
 

Premium Member · 7,161 Posts
My feeling is that the issue is by design, not an error. Think about it: you just finished a good 3K+ WU, but you can't upload it, so it goes to the queue. You can't upload, but it sure is possible to download a 6701. Now, if you want to delete this 6701, you must also delete the 3K in the queue, so you finish the 6701. It's just a way to make sure you don't WU shop. Hey, I'm just brainstorming; I have no proof except that the same thing happened to me. Perhaps, and I say just perhaps, too many 6701s are getting deleted.

When it happened to me, it was during "server maintenance" time.
 

Premium Member · 8,679 Posts · Discussion Starter #5
First I got a P6701; it finished about 12 hours ago and refused to upload. Then I got a standard A3, which finished and also refused to upload. Now it's picked up another P6701.
 