Anyone know how to fix "there is no domain decomposition" error? - Overclock.net - An Overclocking Community

Anyone know how to fix "there is no domain decomposition" error?

 
post #1 of 7 (permalink) Old 11-15-2019, 03:42 PM - Thread Starter
Debian Dude
 
Particle's Avatar
 
Join Date: Jun 2010
Location: Soviet Kansastan
Posts: 2,168
Rep: 166 (Unique: 118)
Anyone know how to fix "there is no domain decomposition" error?

I'm getting an error that seems to suggest the work unit can't run on as many processors as I've got in my system. Does anyone know how to resolve this? I can't process any work units, as this one just restarts over and over again. If I could limit my client to fewer cores or delete this work unit, I think I could get past it either way.

Code:
22:39:44:WU00:FS00:0xa7:Project: 14244 (Run 0, Clone 66, Gen 88)
22:39:44:WU00:FS00:0xa7:Unit: 0x0000006b80fccb0a5d6ee315100fc9c9
22:39:44:WU00:FS00:0xa7:Reading tar file core.xml
22:39:44:WU00:FS00:0xa7:Reading tar file frame88.tpr
22:39:44:WU00:FS00:0xa7:Digital signatures verified
22:39:44:WU00:FS00:0xa7:Calling: mdrun -s frame88.tpr -o frame88.trr -x frame88.xtc -cpt 15 -nt 63
22:39:44:WU00:FS00:0xa7:Steps: first=22000000 total=250000
22:39:44:WU00:FS00:0xa7:ERROR:
22:39:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
22:39:44:WU00:FS00:0xa7:ERROR:Program GROMACS, VERSION 5.0.4-20191026-456f0d636-unknown
22:39:44:WU00:FS00:0xa7:ERROR:Source code file: /host/debian-stable-64bit-core-a7-avx-release/gromacs-core/build/gromacs/src/gromacs/mdlib/domdec.c, line: 6902
22:39:44:WU00:FS00:0xa7:ERROR:
22:39:44:WU00:FS00:0xa7:ERROR:Fatal error:
22:39:44:WU00:FS00:0xa7:ERROR:There is no domain decomposition for 49 ranks that is compatible with the given box and a minimum cell size of 1.45733 nm
22:39:44:WU00:FS00:0xa7:ERROR:Change the number of ranks or mdrun option -rcon or -dds or your LINCS settings
22:39:44:WU00:FS00:0xa7:ERROR:Look in the log file for details on the domain decomposition
22:39:44:WU00:FS00:0xa7:ERROR:For more information and tips for troubleshooting, please check the GROMACS
22:39:44:WU00:FS00:0xa7:ERROR:website at http://www.gromacs.org/Documentation/Errors
22:39:44:WU00:FS00:0xa7:ERROR:-------------------------------------------------------
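
For readers wondering what the message means: a rough sketch, under simplifying assumptions, of the constraint it describes. The ranks have to be arranged as an nx × ny × nz grid of cells, and no cell may be thinner than the minimum cell size along any axis. The 49 ranks and the 1.45733 nm come from the log above; the 8 nm cubic box is purely hypothetical (the log does not give the box), and the real GROMACS check involves more than this.

```python
def grids(ranks):
    """All (nx, ny, nz) with nx * ny * nz == ranks."""
    result = []
    for nx in range(1, ranks + 1):
        if ranks % nx:
            continue
        rest = ranks // nx
        for ny in range(1, rest + 1):
            if rest % ny == 0:
                result.append((nx, ny, rest // ny))
    return result


def feasible_grids(ranks, box_nm, min_cell_nm):
    """Grids in which every cell is at least min_cell_nm wide on each axis."""
    return [
        g for g in grids(ranks)
        if all(length / n >= min_cell_nm for length, n in zip(box_nm, g))
    ]


BOX_NM = (8.0, 8.0, 8.0)  # hypothetical box, NOT taken from the log
MIN_CELL_NM = 1.45733     # from the error message

for ranks in (49, 48, 45):
    print(ranks, feasible_grids(ranks, BOX_NM, MIN_CELL_NM))
```

With this made-up box, at most 5 cells fit along each axis, so 49 = 7 × 7 has no valid grid, while 48 (3 × 4 × 4) and 45 (3 × 3 × 5) do. That is the general reason counts with large prime factors tend to fail.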

post #2 of 7 (permalink) Old 11-15-2019, 04:48 PM
New to Overclock.net
 
mmonnin's Avatar
 
Join Date: Nov 2012
Posts: 5,839
Rep: 283 (Unique: 135)
You mention not being able to run on as many processors as your system has, but you didn't provide the first 30 or so lines of the FAH log, which give exactly that information.


post #3 of 7 (permalink) Old 11-15-2019, 05:43 PM
Hardware Princess
 
Hydroplane's Avatar
 
Join Date: Aug 2011
Location: Buffalo
Posts: 1,730
Rep: 31 (Unique: 26)
"there is no domain decomposition" sounds kind of ominously threatening... :O

post #4 of 7 (permalink) Old 11-15-2019, 07:55 PM - Thread Starter
Debian Dude
 
Particle's Avatar
 
Join Date: Jun 2010
Location: Soviet Kansastan
Posts: 2,168
Rep: 166 (Unique: 118)
Quote: Originally Posted by mmonnin View Post
You mention not being able to run on as many processors as your system has, but you didn't provide the first 30 or so lines of the FAH log, which give exactly that information.
Like I mentioned, the error suggests the WU can't execute on as many processors as I have in my system. I said that because it's in the documentation for gromacs. I just don't know how to limit FAH to use fewer cores.

http://www.gromacs.org/Documentation...l_size_of_x_nm

But if you want to see the system information lines, you're certainly welcome to:

Code:
22:39:44:WU00:FS00:0xa7:************************** Gromacs Folding@home Core ***************************
22:39:44:WU00:FS00:0xa7:       Type: 0xa7
22:39:44:WU00:FS00:0xa7:       Core: Gromacs
22:39:44:WU00:FS00:0xa7:       Args: -dir 00 -suffix 01 -version 705 -lifeline 39817 -checkpoint 15 -np
22:39:44:WU00:FS00:0xa7:             63
22:39:44:WU00:FS00:0xa7:************************************ CBang *************************************
22:39:44:WU00:FS00:0xa7:       Date: Nov 5 2019
22:39:44:WU00:FS00:0xa7:       Time: 06:06:57
22:39:44:WU00:FS00:0xa7:   Revision: 46c96f1aa8419571d83f3e63f9c99a0d602f6da9
22:39:44:WU00:FS00:0xa7:     Branch: master
22:39:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:39:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie -fPIC
22:39:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:39:44:WU00:FS00:0xa7:       Bits: 64
22:39:44:WU00:FS00:0xa7:       Mode: Release
22:39:44:WU00:FS00:0xa7:************************************ System ************************************
22:39:44:WU00:FS00:0xa7:        CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor
22:39:44:WU00:FS00:0xa7:     CPU ID: AuthenticAMD Family 23 Model 8 Stepping 2
22:39:44:WU00:FS00:0xa7:       CPUs: 64
22:39:44:WU00:FS00:0xa7:     Memory: 31.35GiB
22:39:44:WU00:FS00:0xa7:Free Memory: 1.76GiB
22:39:44:WU00:FS00:0xa7:    Threads: POSIX_THREADS
22:39:44:WU00:FS00:0xa7: OS Version: 5.2
22:39:44:WU00:FS00:0xa7:Has Battery: false
22:39:44:WU00:FS00:0xa7: On Battery: false
22:39:44:WU00:FS00:0xa7: UTC Offset: -6
22:39:44:WU00:FS00:0xa7:        PID: 39821
22:39:44:WU00:FS00:0xa7:        CWD: /opt/fah/work
22:39:44:WU00:FS00:0xa7:******************************** Build - libFAH ********************************
22:39:44:WU00:FS00:0xa7:    Version: 0.0.18
22:39:44:WU00:FS00:0xa7:     Author: Joseph Coffland <[email protected]>
22:39:44:WU00:FS00:0xa7:  Copyright: 2019 foldingathome.org
22:39:44:WU00:FS00:0xa7:   Homepage: https://foldingathome.org/
22:39:44:WU00:FS00:0xa7:       Date: Nov 5 2019
22:39:44:WU00:FS00:0xa7:       Time: 06:13:26
22:39:44:WU00:FS00:0xa7:   Revision: 490c9aa2957b725af319379424d5c5cb36efb656
22:39:44:WU00:FS00:0xa7:     Branch: master
22:39:44:WU00:FS00:0xa7:   Compiler: GNU 8.3.0
22:39:44:WU00:FS00:0xa7:    Options: -std=c++11 -O3 -funroll-loops -fno-pie
22:39:44:WU00:FS00:0xa7:   Platform: linux2 4.19.0-5-amd64
22:39:44:WU00:FS00:0xa7:       Bits: 64
22:39:44:WU00:FS00:0xa7:       Mode: Release
22:39:44:WU00:FS00:0xa7:************************************ Build *************************************
22:39:44:WU00:FS00:0xa7:       SIMD: avx_256
22:39:44:WU00:FS00:0xa7:********************************************************************************

post #5 of 7 (permalink) Old 11-16-2019, 09:07 AM
2+2=5
 
tictoc's Avatar
 
Join Date: Feb 2011
Posts: 4,493
It's been a while since I played around with Folding@home on a CPU, but I just tried it on my 2970WX without manually adjusting the slots. The client attempted to run at cpu:47, but then corrected itself and ultimately ran at cpu:45.

Here's the relevant section of the log:
Code:
15:44:36:WU00:FS01:0xa7:Reducing thread count from 47 to 46 to avoid domain decomposition by a prime number > 3
15:44:36:WU00:FS01:0xa7:Reducing thread count from 46 to 45 to avoid domain decomposition with large prime factor 23
Thread counts that are large primes, or that have large prime factors, will error out in GROMACS. It looks like your client tried to run at cpu:63 and didn't auto-correct to a thread count that works; I'm not sure why. The easiest solution is to manually set the thread count for each slot. If you want to use all the cores/threads, run two 32-thread CPU slots.

Currently sitting at 447k PPD running a p13794 on 45 threads. The slot description still shows cpu:47, but it is actually only running on 45 threads.
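
The reduction shown in that log can be sketched roughly as follows. This is not the client's actual code: `pick_thread_count` and the `max_prime=5` cutoff are assumptions for illustration. The log only tells us that 47 and a factor of 23 were rejected, and the client evidently accepted 63 = 3 × 3 × 7 on Particle's machine, so the real rule is not known from this thread.

```python
def prime_factors(n):
    """Return the prime factors of n (with repetition), smallest first."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def pick_thread_count(requested, max_prime=5):
    """Largest count <= requested whose prime factors are all <= max_prime.

    max_prime is an assumed cutoff, not the client's documented rule.
    """
    for count in range(requested, 0, -1):
        if all(p <= max_prime for p in prime_factors(count)):
            return count
    return 1


for requested in (47, 63, 64):
    picked = pick_thread_count(requested)
    print(requested, "->", picked, prime_factors(picked))
```

With the assumed cutoff, 47 steps down through 46 (2 × 23) to 45 (3 × 3 × 5), matching the two log lines above, and 63 would step down to 60.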


post #6 of 7 (permalink) Old 11-16-2019, 10:43 AM
New to Overclock.net
 
mmonnin's Avatar
 
Join Date: Nov 2012
Posts: 5,839
Rep: 283 (Unique: 135)
Quote: Originally Posted by Particle View Post
Like I mentioned, the error suggests the WU can't execute on as many processors as I have in my system. I said that because it's in the documentation for gromacs. I just don't know how to limit FAH to use fewer cores.

If this had been posted earlier and you had 32 threads, I could have said it's something else and not a thread count limit. Posting a log and specs is an easy thing to do, and it allows others to help you.

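By default the client's CPU count is set to -1 (automatic). Change it to a lower number that breaks down into small factors, like 60. Note that 62 is 2 x 31, so it would probably hit the same large-prime-factor problem.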
https://foldingathome.org/support/fa...-expert-users/


post #7 of 7 (permalink) Old 11-16-2019, 10:16 PM - Thread Starter
Debian Dude
 
Particle's Avatar
 
Join Date: Jun 2010
Location: Soviet Kansastan
Posts: 2,168
Rep: 166 (Unique: 118)
I managed to get FAH limited down to 24 threads long enough to complete that WU. I had run for a couple of days before and hadn't had any WUs that ran into that issue until that odd one.

Part of the trouble is that I cannot install FAHControl. It depends on a python package that is not available on Debian anymore. As such, I had to scour for how to edit the config by hand. I'd done it before years ago, but between then and now the information seemed to have gotten buried. It wasn't quick to find, but I eventually stumbled over threads from many years ago talking about how to limit the CPU count for other reasons.
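
For anyone else stuck without FAHControl, here is a minimal sketch of making that hand edit with a script. It assumes the v7 client's config.xml describes each slot as a `<slot type='CPU'>` element with a `<cpus v='N'/>` child; check that against your own file before relying on it. `limit_cpu_slot` is a hypothetical helper, not part of FAH, and the sample XML is made up. The 24 is the thread count mentioned above.

```python
import xml.etree.ElementTree as ET


def limit_cpu_slot(config_xml, threads):
    """Return config_xml with every CPU slot limited to `threads` threads.

    Assumes slots look like <slot id='0' type='CPU'><cpus v='N'/></slot>.
    """
    root = ET.fromstring(config_xml)
    for slot in root.iter("slot"):
        if slot.get("type", "").upper() != "CPU":
            continue  # leave GPU slots alone
        cpus = slot.find("cpus")
        if cpus is None:
            cpus = ET.SubElement(slot, "cpus")
        cpus.set("v", str(threads))
    return ET.tostring(root, encoding="unicode")


# Made-up sample; a real config.xml will contain more settings.
SAMPLE = """<config>
  <slot id='0' type='CPU'/>
</config>"""

print(limit_cpu_slot(SAMPLE, 24))
```

It is probably safest to stop the client before changing the file and to keep a backup copy, since a running client may write its own version of the config back out.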
