Overclock.net - An Overclocking Community - Reply to Topic
Thread: Anyone know how to fix "there is no domain decomposition" error?
  Topic Review (Newest First)
11-16-2019 10:16 PM
Particle
I managed to get FAH limited down to 24 threads long enough to complete that WU. I had run for a couple of days before and hadn't had any WUs that ran into that issue until that odd one.

Part of the trouble is that I cannot install FAHControl. It depends on a Python package that is no longer available on Debian, so I had to scour for how to edit the config by hand. I'd done it years ago, but since then the information seems to have gotten buried. It wasn't quick to find, but I eventually stumbled over threads from many years ago about limiting the CPU count for other reasons.
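
For anyone else editing the config by hand, here is a minimal Python sketch of the idea. It assumes the v7 client's config.xml stores the thread limit as a `<cpus v='N'/>` element inside a `<slot type='CPU'>` element; that layout is from memory and is not confirmed anywhere in this thread, so check it against your own file. The helper name `set_cpu_slot_threads` and the sample config are made up for illustration, and the sketch works on a string rather than guessing where the file lives.

```python
import xml.etree.ElementTree as ET


def set_cpu_slot_threads(config_xml: str, threads: int) -> str:
    """Return the config text with every CPU slot limited to `threads`.

    Assumes (unverified) a layout like <slot type='CPU'><cpus v='N'/></slot>.
    Note: the default parser drops XML comments, so they will not survive.
    """
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("type", "").upper() != "CPU":
            continue  # leave GPU slots alone
        cpus = slot.find("cpus")
        if cpus is None:
            # No explicit limit yet, so add one.
            cpus = ET.SubElement(slot, "cpus")
        cpus.set("v", str(threads))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample config; a real one will contain more options.
EXAMPLE_CONFIG = """<config>
  <slot id='0' type='CPU'>
    <cpus v='-1'/>
  </slot>
</config>"""

updated = set_cpu_slot_threads(EXAMPLE_CONFIG, 24)
print(updated)
```

Back up the real file before changing it.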
11-16-2019 10:43 AM
mmonnin
Quote: Originally Posted by Particle View Post
Like I mentioned, the error suggests the WU can't execute on as many processors as I have in my system. I said that because it's in the documentation for gromacs. I just don't know how to limit FAH to use fewer cores.

http://www.gromacs.org/Documentation...l_size_of_x_nm

If this had been posted earlier and you had 32 threads, I could have said it's something else and not a thread count limit. Posting a log and specs is an easy thing to do and allows others to help you.

By default the client is set up as -1. Change it to a lower number that has more small factors, like 60.
https://foldingathome.org/support/fa...-expert-users/
11-16-2019 09:07 AM
tictoc
It's been a while since I played around with Folding@home on a CPU, but I just tried it on my 2970WX without manually adjusting the slots. The client attempted to run at cpu:47, but then corrected itself to ultimately run at cpu:45.

Here's the relevant section of the log:
Code:
15:44:36:WU00:FS01:0xa7:Reducing thread count from 47 to 46 to avoid domain decomposition by a prime number > 3
15:44:36:WU00:FS01:0xa7:Reducing thread count from 46 to 45 to avoid domain decomposition with large prime factor 23
Thread counts that are large primes, or that have large prime factors, will error out in GROMACS. It looks like your client tried to run at cpu:63 and, for some reason, didn't auto-correct to a thread count that works. The easiest solution is to manually set the thread count for each slot. If you want to use all the cores/threads, run two 32-thread CPU slots.
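
As a rough illustration of picking a "friendly" thread count by hand, here is a small Python sketch that walks down from the available thread count until the largest prime factor is small. The cutoff of 5 is an assumption; the log only shows the core rejecting a prime greater than 3 and a "large prime factor 23", and the exact rule it uses is not stated in this thread. The function names are made up for the example.

```python
def largest_prime_factor(n: int) -> int:
    """Return the largest prime factor of n (1 for n < 2)."""
    largest = 1
    p = 2
    while p * p <= n:
        while n % p == 0:
            largest = p
            n //= p
        p += 1
    if n > 1:
        # Whatever remains is itself a prime factor.
        largest = n
    return largest


def pick_thread_count(available: int, max_prime: int = 5) -> int:
    """Largest count <= available whose prime factors are all <= max_prime.

    max_prime=5 is a guess for illustration, not the FAH core's real rule.
    """
    for n in range(available, 0, -1):
        if largest_prime_factor(n) <= max_prime:
            return n
    return 1


for threads in (47, 63, 64):
    print(threads, "->", pick_thread_count(threads))
```

With this cutoff, 47 comes out as 45, which matches what the log shows the client settling on.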

Currently sitting at 447k PPD running a p13794 on 45 threads. The slot description still shows cpu:47, but it is actually only running on 45 threads.
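
Since the slot description can lag behind what the core actually runs, one way to check is to scan the log for the "Reducing thread count" messages. A minimal Python sketch, assuming the message format matches the two log lines quoted in this post (the function name is made up for the example):

```python
import re

# Pattern taken from the log lines quoted in this post.
REDUCE_RE = re.compile(r"Reducing thread count from (\d+) to (\d+)")


def effective_thread_count(log_lines, configured: int) -> int:
    """Return the thread count after applying every reduction in the log."""
    current = configured
    for line in log_lines:
        match = REDUCE_RE.search(line)
        if match:
            current = int(match.group(2))
    return current


LOG = [
    "15:44:36:WU00:FS01:0xa7:Reducing thread count from 47 to 46 to avoid "
    "domain decomposition by a prime number > 3",
    "15:44:36:WU00:FS01:0xa7:Reducing thread count from 46 to 45 to avoid "
    "domain decomposition with large prime factor 23",
]

print(effective_thread_count(LOG, 47))
```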
11-15-2019 07:55 PM
Particle
Quote: Originally Posted by mmonnin View Post
You mention not being able to run on as many processors as are in your system, but you do not provide the first 30 or so lines of the FAH log, which provide exactly that information.
Like I mentioned, the error suggests the WU can't execute on as many processors as I have in my system. I said that because it's in the documentation for gromacs. I just don't know how to limit FAH to use fewer cores.

http://www.gromacs.org/Documentation...l_size_of_x_nm

But if you want to see the system information lines, you're certainly welcome to:

Code:
22:39:44:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
22:39:44:WU00:FS00:0xa7:       Type: 0xa7
22:39:44:WU00:FS00:0xa7:       Core: Gromacs
22:39:44:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 39817 -checkpoint 15 -np
22:39:44:WU00:FS00:0xa7:             63
22:39:44:WU00:FS00:0xa7:************************************ CBang *************************************
22:39:44:WU00:FS00:0xa7:       Date: Nov 5 2019
22:39:44:WU00:FS00:0xa7:       Time: 06:06:57
22:39:44:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
22:39:44:WU00:FS00:0xa7:     Branch: master
22:39:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:39:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
22:39:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:39:44:WU00:FS00:0xa7:       Bits: 64
22:39:44:WU00:FS00:0xa7:       Mode: Release
22:39:44:WU00:FS00:0xa7:************************************ System ************************************
22:39:44:WU00:FS00:0xa7:        CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
22:39:44:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
22:39:44:WU00:FS00:0xa7:       CPUs: 64
22:39:44:WU00:FS00:0xa7:     Memory: 31.35GiB
22:39:44:WU00:FS00:0xa7:Free Memory: 1.76GiB
22:39:44:WU00:FS00:0xa7:    Threads: POSIX_THREADS
22:39:44:WU00:FS00:0xa7: OS Version: 5.2
22:39:44:WU00:FS00:0xa7:Has Battery: false
22:39:44:WU00:FS00:0xa7: On Battery: false
22:39:44:WU00:FS00:0xa7: UTC Offset: -6
22:39:44:WU00:FS00:0xa7:        PID: 39821
22:39:44:WU00:FS00:0xa7:        CWD: /opt/fah/work
22:39:44:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
22:39:44:WU00:FS00:0xa7:    Version: 0.0.18
22:39:44:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
22:39:44:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
22:39:44:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
22:39:44:WU00:FS00:0xa7:       Date: Nov 5 2019
22:39:44:WU00:FS00:0xa7:       Time: 06:13:26
22:39:44:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
22:39:44:WU00:FS00:0xa7:     Branch: master
22:39:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:39:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
22:39:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:39:44:WU00:FS00:0xa7:       Bits: 64
22:39:44:WU00:FS00:0xa7:       Mode: Release
22:39:44:WU00:FS00:0xa7:************************************ Build *************************************
22:39:44:WU00:FS00:0xa7:       SIMD: avx_256
22:39:44:WU00:FS00:0xa7:********************************************************************************
11-15-2019 05:43 PM
Hydroplane
"there is no domain decomposition" sounds kind of ominously threatening... :O
11-15-2019 04:48 PM
mmonnin
You mention not being able to run on as many processors as are in your system, but you do not provide the first 30 or so lines of the FAH log, which provide exactly that information.
11-15-2019 03:42 PM
Particle
Anyone know how to fix "there is no domain decomposition" error?

I'm getting an error that seems to suggest it can't run on as many processors as I've got in my system. Does anyone know how to resolve this? I can't process any work units because this one just restarts over and over again. If I could limit my client to use fewer cores or delete this work unit, I think I could get past it either way.

Code:
22:39:44:WU00:FS00:0xa7:Project: 14244 (Run 0, Clone 66, Gen 88)
22:39:44:WU00:FS00:0xa7:Unit: 0x0000006b80fccb0a5d6ee315100fc9c9
22:39:44:WU00:FS00:0xa7:Reading tar file core.xml
22:39:44:WU00:FS00:0xa7:Reading tar file frame88.tpr
22:39:44:WU00:FS00:0xa7:Digital signatures verified
22:39:44:WU00:FS00:0xa7:Calling: mdrun -s frame88.tpr -o frame88.trr -x frame88.xtc -cpt 15 -nt 63
22:39:44:WU00:FS00:0xa7:Steps: first=22000000 total=250000
22:39:44:WU00:FS00:0xa7:ERROR:
22:39:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
22:39:44:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
22:39:44:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
22:39:44:WU00:FS00:0xa7:ERROR:
22:39:44:WU00:FS00:0xa7:ERROR:Fatal error:
22:39:44:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 49 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
22:39:44:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
22:39:44:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
22:39:44:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
22:39:44:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
22:39:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
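
To get a feel for what the error is complaining about, here is a simplified Python sketch: split the rank count into an nx × ny × nz grid and check that each cell is at least the minimum cell size. The 8 nm cubic box is hypothetical (the log does not give the box dimensions), and real GROMACS considers more than this, so treat it as an illustration only. Only the 1.45733 nm minimum cell size and the 49 ranks come from the error message.

```python
import math


def grids_that_fit(ranks: int, box_nm, min_cell_nm: float):
    """List (nx, ny, nz) grids with nx*ny*nz == ranks whose cells are all
    at least min_cell_nm wide in a box of size box_nm (simplified model)."""
    # Maximum number of cells that fit along each box dimension.
    limits = [math.floor(side / min_cell_nm) for side in box_nm]
    fits = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny:
                continue
            nz = rest // ny
            if nx <= limits[0] and ny <= limits[1] and nz <= limits[2]:
                fits.append((nx, ny, nz))
    return fits


BOX_NM = (8.0, 8.0, 8.0)  # hypothetical box, not taken from the log
MIN_CELL_NM = 1.45733     # from the error message

for ranks in (49, 48, 45):
    print(ranks, grids_that_fit(ranks, BOX_NM, MIN_CELL_NM))
```

In this made-up box at most 5 cells fit per side, so 49 = 7 × 7 has no valid grid, while counts with smaller factors such as 48 or 45 do.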
