Overclock.net › Overclock.net BOINC Team › Project "Headless Linux CLI Multiple GPU Boinc Server" - Ubuntu Server 12.04.4/14.04.1 64bit - Using GPUs from GeForce GT610/GT640/GTX750ti/+ to crunch data. - Page 9

post #81 of 343
Quote:
Originally Posted by DanHansenDK View Post

Hi Magic,
OK, this I didn't know. That's indeed a way to solve the problem. I didn't know it was possible to remove the chipset from the board (without a hacksaw, that is). I'll look at the chips to see whether that's possible, and thanks a lot for letting me know!
Glad you got it working!
Precious (23 items) / Intel 4P Rig (16 items) / AMD 4P Rig (9 items)

Intel 4P Rig:
CPU: 4 x Xeon E5-4650 (ES) C0
Motherboard: SuperMicro X9QRi-F+
Graphics: G200 on board
RAM: 16 x 4GB PC3-12800R
Hard Drive: OCZ Deneva
Optical Drive: Samsung CD/DVD burner
Cooling: 4 x Cooler Master Hyper 212
OS: Server 2012 Standard
Monitor: None
Power: OCZ ZX 1250
Case: Rosewill Blackhawk Ultra
Mouse: None

AMD 4P Rig:
CPU: 4 x Opteron 6376
Motherboard: H8QGi+-F
RAM: Hynix 16 x HMT151R7BFR4C-H9 4GB 2RX4 PC3-10600R
Hard Drive: Silicon Power Slim S55 2.5" 480GB SATA III SSD
Cooling: Cooler Master Hyper 212
OS: Ubuntu Server 14.04
post #82 of 343
Thread Starter 
Hi Guys,


Thanks Tex, me too. It always helps to hear from others, especially others who know what they are talking about!


Hi Tex,
I missed this post! Why, I don't know:
Quote:
You asked a lot of questions... well, you surely don't want to experience thermal runaway and about the only cure for that would be a case mod or change. Fact is, a 2U or 3U case may work better for you.
It is 2U cases! I chose 2U cases because the CPU fans needed the space and because it's impossible to get more than 2 graphics cards into a 1U case. A 1U is too small too, as you also said; it's just not right for the project. But there's no need for 3U or 4U cases, I think, unless you want to use faster and better graphics cards. As you know, I'm struggling to find a faster card than the GT640 that can endure the same heat as this Asus card.



I just rebuilt the case! As you can see in the picture, the PSU is placed in the front of the case, so that an ordinary PSU can be used. I don't like that! It's totally stu... to place the PSU there. All the heat from the PSU is routed right into the centre of the case, exactly where we are working to keep the temperature down!! Not a very smart solution. Well, smart for those who just need a 2U case and haven't got CPUs/GPUs working at 100% all the time, producing a ton of heat.
This is how I place the PSU, HD, etc. The HD sits between the two front openings and the place where the PSU was supposed to be:




Quote:
Most projects take little memory; others take 500 MB or more (NFS), up to 10 GB or more (Lattice) per WU. There would be a few projects you simply could not run, and you would learn fast which those were when the virtual HD started thrashing. If it were up to me, 2 GB per core would be nice... but likely most projects would do fine with 1 GB per core...
Please explain this to me, Tex, because I always thought 2 GB of memory was more than enough; one of the gurus at SETI talked about this some time ago. So if I'm wrong, please let me know. I'm using 4 GB in the test systems, but the new test system has 8 GB (it was a better deal for me to buy two modules at a time instead of one 4 GB module). So please explain why it's better to use 8 GB or even more instead of 4, because I was about to remove one 4 GB module to use in the next test system, test system 4, where I plan on using water cooling.
Edited by DanHansenDK - 8/12/14 at 11:02am
post #83 of 343
Thread Starter 
Hello friends!

20:33 --> Updating BIOS. Very pleased, because I read the manual: to update this Asus mobo there were some pretty funny things I had to do. Download the update, extract it, and rename the image file (.cab) to M6E.CAP.
After this I'll update the ASRock board. It's not needed now that I've changed the CPU, but it should help avoid other issues. Why not do it every time you get a new mobo? Update the BIOS. I didn't think the BIOS would be so much older. Well, let's go, and then on with the test.

23:39 --> BIOS updated on the Asus mobo (my test bench) that I "lent" the Z87 CPU from; it now runs with the "Devil's Canyon" CPU (i5-4690K). Remember I wished for a function to update the ASRock mobo without needing a Z87 CPU? Of course the Asus Maximus VI Extreme had such a function!!! Just copy the BIOS image file to a FAT32 USB key, rename it to M6E.CAP, plug it into the USB connector on the back of the mobo, and press a special "update" button. Lights were flashing, and after about 60 seconds the BIOS was updated!! Pretty fancy stuff. I guess you get some extra tools to play with when paying more than double.
OK, testing time. I've installed all 4 graphics cards and we are ready to install the "Headless Linux Multiple GPU Boinc Server". I'll use the nvidia-smi/grep command to check the GPU temperatures; it's more accurate. Then, once the system is running without card 4 getting too hot, I'll install the fan/temperature controller from AeroCool. Remember, the reason for using this piece of hardware is to avoid the issues that come with the automated fan control on the mobos. Regardless of which setup you choose, the fans are always ramping up and down, all the time; they never stop. This function is there to save energy, of course, which is pretty neat too, but not when you are running 1 CPU and 4 GPUs at 100%, 24/7/365!!!! With AeroCool we control it ourselves, and we get a visual indicator as well. My shell scripts CPUWatchDog, GPUWatchDog, HDWatchDog and FanWatchDog will monitor all of it and shut the system down if anything gets too hot or something breaks down. But it's nice to be able to see the numbers (RPM, temperature, etc.), and when something goes wrong, e.g. a fan stops working, AeroCool will raise an alarm. More about this later.
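As a taste of what the GPUWatchDog idea looks like, here's a minimal sketch. The real script does more; the 85 C limit is a placeholder, and the shutdown line is left commented out so you can arm it yourself once you trust it:

```shell
#!/bin/sh
# Minimal GPU over-temperature watchdog - a sketch of the idea only,
# not the full GPUWatchDog script. The 85 C limit is a placeholder.
LIMIT=85

# Print the highest "NN C" reading found on stdin (nvidia-smi -a format).
max_temp() {
    grep Gpu | grep -o '[0-9][0-9]* C' | cut -d' ' -f1 | sort -n | tail -1
}

if command -v nvidia-smi >/dev/null 2>&1; then
    TEMP=$(nvidia-smi -a | max_temp)
    if [ -n "$TEMP" ] && [ "$TEMP" -gt "$LIMIT" ]; then
        logger "GPU watchdog: ${TEMP} C is over the ${LIMIT} C limit"
        # shutdown -h now   # the armed version halts the box here
    fi
fi
```

Run it from cron every minute or so once it does what you expect.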

First we'll make the ToDo, so that you can build a "Headless Linux Multiple GPU Boinc Server" as well.
* Because of the issues that appeared with the Ubuntu 12.04.4 update, where the CUDA 5.5 Debian packages wouldn't work, this ToDo is still based on 12.10. There are a few things I would like to see solved, so if you have a solution, please don't hesitate to write us.
Problem: static IP. I'll show how it's done in 12.04, where it works perfectly and always has. But in 12.10 it doesn't. Please don't just google it and post the result; we have already tried several things, so it's not an easy fix. Not that easy!
Code:
# Reconfigure the network to static IP - Ubuntu Server 12.04:

Command: #1 vi /etc/network/interfaces
File:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet static
        address 192.168.1.xxx
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.1
        dns-nameservers 8.8.8.8 8.8.4.4

#2 
Command: # /etc/init.d/networking restart

#3 
Command: # vi /etc/hosts
File:

127.0.0.1       localhost.localdomain   localhost
192.168.1.xxx   xxxxxx.domain.tld               xxxxxx

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

But it's different in 12.10. We'll only need 12.10 until they fix the bug in the 12.04.4 update, but until then it would be nice to be able to use static IPs.
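One stopgap I'm considering for the 12.10 box while the interfaces method misbehaves (treat this as an untested assumption, not part of the ToDo): skip ifupdown entirely and set the address with iproute2 from /etc/rc.local. The file would look something like this; the addresses are examples, use your own:

```shell
#!/bin/sh -e
# /etc/rc.local - stopgap static IP for 12.10 (example addresses!)
# "|| true" keeps boot going if the address or route already exists.
ip addr add 192.168.1.50/24 dev eth0 || true
ip link set eth0 up || true
ip route add default via 192.168.1.1 || true
exit 0
```

DNS would still need a nameserver entry, however 12.10 happens to manage resolv.conf on your install, so check that separately.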
You can read about the "Bug" here ( look for *** ): https://developer.nvidia.com/cuda-toolkit-55-archive
Edited by DanHansenDK - 8/13/14 at 3:08am
post #84 of 343
Well, aside from the benefit of dual-channel memory operation with two modules, 2 GB per core will allow you to run the big NFS tasks, one of the best CPU-points projects out there. If you are not concerned about NFS, Lattice or other large-memory-footprint projects, feel free to run the minimum of 1 GB per core or less. Some projects take very little, to be sure.

Looks awesome BTW...

You are doing a great job!

Blue Beast (13 items):
CPU: W3670 4.0 GHz (HT on) 1.345 V 24/7
Motherboard: Asus P6X58D Premium
Graphics: dual EVGA GTX 560 Ti SC 1G in SLI
RAM: 12 GB Corsair Dominator GT 2000 MHz
Hard Drive: Kingston V300 120 GB SSD, 2 x 1 TB Barracuda HDs
Optical Drive: LG Blu-ray/LightScribe burner
OS: Windows 7 Pro 64-bit
Monitor: 3 x 24", 1 x 22" LCD
Keyboard: wireless Logitech K320
Power: Ultra X4 1200 W
Case: custom Danger Den LDR-29
Mouse: wireless Logitech M310
Mouse Pad: custom
post #85 of 343
Thread Starter 
Here's the ToDo! This only took me about 3 months to solve. Well, it's not a thing to be proud of, I know, to work 3 months on solving the problem of building a "Rack-mounted Headless CLI Ubuntu Linux Multiple GPU Boinc Server". Some solve these things in a day, I guess, but I've only been "doing" Linux for about a year and a half, so for me this was a huge step to take. I got some ideas from a few guys at SETI, a few good ideas from a guy in the Netherlands, and some help from you guys in here. Indeed I did, but I mainly solved it because I tried again, and again, and again, and again...

The idea is to build a system that is semi-professional and not too pricey, so that an ordinary guy like me can build a system like this and add a new "Boinc Server" to the rack every now and then. So that everybody can participate and be part of the solution: supporting one or more of all the great Boinc projects out there. This is my way to help increase the number of members, or computers, doing Boinc.

05:42 am --> The ToDo (short version only, showing how to install and set up a headless CLI Linux multiple-GPU Boinc cruncher)


07:51 am --> OK, I have been trying to make a temporary ToDo, with some help regarding the Linux server installation, but there are just too many unanswered issues, so here's the raw ToDo, "How To Make A Headless CLI Ubuntu Linux Multiple GPU Boinc Server". I will make a complete ToDo showing everything, but here's how to make it work if you already have an Ubuntu Server running (12.10).
Code:
HEADLESS LINUX CLI MULTIPLE GPU BOINC SERVER - RACK-MOUNTED HEADLESS BOINC SUPER CRUNCHER
OS: UBUNTU SERVER 12.04 64Bit
GPU'S: NVIDIA
VER.1.1.2 13.08.14 00:27:00

IMPORTANT! BECAUSE OF THE UBUNTU 12.04.4 UPDATE "BUG" THE CUDA5.5 DEBIAN PACKAGE WILL NOT WORK. THEREFORE 12.10 IS USED FOR NOW!
Read about it here ( look for *** ): https://developer.nvidia.com/cuda-toolkit-55-archive

- LM-sensors - need it to run my shellscripts CPUTempWatchdog.sh,GPUTempWatchdog.sh,HDDTempWatchdog,FANcontrolWatchdog.sh
- Vim-nox - for easier VI and enhanced monitor look
- Midnight Commander - to manage files in the old-fashion way
- NTP Time Server Update - to keep server time updated
- bash as default shell - not sure about this, just yet.

- Boinc-client from ppa:costamagnagianfranco/boinc
- BoincTasks to remotely view and check jobs for all servers
- AndroBoinc to remotely view and check jobs from your Android
- Setup Boinc-client for remote control/SSH
- Setup Boinc-client for using multiple GPU's --> "cc_config.xml" 

TODOS - THESE ISSUES NEED TO BE DONE/SOLVED
- CRON set to run shell scripts
- CRON for other jobs?? Backup??
- GRUB boot - change startup config --> no halt! boot no matter what's wrong!
- Static IP for 12.10 !!!
- Setup Boinc-client for special applications --> "app_info.xml" Set application-specific instructions!?
- Setup Boinc-client for special applications --> "global_prefs_override.xml" Set status "work" from the beginning!?

Install Ubuntu Server 12.10! And then:

#1 
Command: # apt-get install build-essential linux-headers-`uname -r`

#2 
Command: # wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1210/x86_64/cuda-repo-ubuntu1210_5.5-0_amd64.deb

#3
Command: # dpkg -i cuda-repo-ubuntu1210_5.5-0_amd64.deb

#4
Command: # apt-get update

#5
Command: # apt-get install cuda-5-5

#6
Command: # export CUDA_HOME=/usr/local/cuda-5.5

#7
Command: # export LD_LIBRARY_PATH=${CUDA_HOME}/lib64

#8
Command: # PATH=${CUDA_HOME}/bin:${PATH}

#9
Command: # export PATH

#10
Command: # shutdown -r now

#11
Command: # apt-get install linux-image-extra-$(uname -r) x11-xserver-utils mesa-utils

#12 
Command: # modprobe nvidia

#13
Command: # nvidia-xconfig --enable-all-gpus

#14
Command: # cp /etc/X11/XF86Config /etc/X11/xorg.conf

#15 Setup Boinc to use all GPU or selected GPU's:
Command: # vi /etc/boinc-client/cc_config.xml
<cc_config>
        <log_flags>
                
                <task>1</task>
                <sched_ops>1</sched_ops>
                <file_xfer>1</file_xfer>
                <app_msg_receive>1</app_msg_receive>
                <app_msg_send>1</app_msg_send>
                
                <cpu_sched_status>1</cpu_sched_status>
                <cpu_sched>1</cpu_sched>

                <gui_rpc_debug>0</gui_rpc_debug>
                <slot_debug>0</slot_debug>
                <std_debug>0</std_debug>
                <task_debug>0</task_debug>

        </log_flags>
        <options>

                <start_delay>10</start_delay>
                <use_all_gpus>1</use_all_gpus>
                <allow_remote_gui_rpc>0</allow_remote_gui_rpc> 
                <allow_multiple_clients>0</allow_multiple_clients>
        
        </options>
</cc_config>

#16 The file cc_config.xml is read when the client starts, so reboot the machine (or restart the boinc-client service):
Command: # shutdown -r now

#17 Install boinc-client and attach to projects using the boinccmd command line tool ;)


THIS TODO WILL BE UPDATED AND WILL IN THE END SHOW YOU HOW TO DO IT ALL.
I WILL POST AN UPDATED VERSION LATER TODAY thumb.gif
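To flesh out steps #16 and #17 a bit: the running client can actually be told to re-read cc_config.xml without a reboot, and attaching to projects from the CLI goes through boinccmd. The project URL, e-mail and key below are placeholders only, not a recommendation:

```shell
# Guarded so this is a no-op on a box without the BOINC client installed.
if command -v boinccmd >/dev/null 2>&1; then
    # Re-read cc_config.xml in place instead of rebooting.
    boinccmd --read_cc_config

    # Look up the account key for an existing project account
    # (URL, e-mail and password are placeholders).
    boinccmd --lookup_account http://setiathome.berkeley.edu you@example.com your_password

    # Attach using the key printed by the lookup.
    boinccmd --project_attach http://setiathome.berkeley.edu your_account_key

    # Confirm the client sees the project and is fetching work.
    boinccmd --get_project_status || true
fi
```

The same boinccmd calls work over SSH, which is the whole point of a headless box.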
Edited by DanHansenDK - 8/13/14 at 3:06am
post #86 of 343
Thread Starter 
12:15 pm

Hi guys!! OS installed. I'm about to start at the top of the list and build the "Headless CLI Ubuntu Linux Multiple GPU Boinc Server". But I just noticed that the Ubuntu 12.04.4 update / CUDA 5.5 .deb "bug" seems to have been solved in the new CUDA version: CUDA 6.0 on Ubuntu 12.04 doesn't appear to have issues. Well, if this is true I will switch to 12.04 Server again. But for now we will go on with the test, sticking to what we know works! The task right now is to test the heat when using 4 Asus GeForce GT640 low-profile graphics cards, using ordinary case fans and the special fans I got from my supplier.

Here are a couple of links, in case you want to see/read about the "bug" I'm talking about:
CUDA5.5/Ubuntu 12.04.4 issues (look for ***): https://developer.nvidia.com/cuda-toolkit-55-archive
CUDA6.0/Ubuntu 12.04, no remarks or warnings: https://developer.nvidia.com/cuda-downloads

Quote:
Well, aside from the benefit of having dual channel memory operation with two modules, 2 Gig for each core will allow you to run NFS big tasks that are one of the best CPU points projects out there. If you are not concerned about NFS or Lattice or other large memory footprint projects, feel free to run the minimum of 1 Gig per core or less. Some projects take very little to be sure.
OK, I see. I checked the BIOS, and it says "Dual Channel Memory" is active. Regarding NFS and Lattice, I'm not sure what these are, sorry. I'll look them up.

Let's go on with the installation. I'm making version 1.1.3 of the ToDo at the same time.
Edited by DanHansenDK - 8/13/14 at 4:44am
post #87 of 343
Thread Starter 
17:22 pm --> Even more issues...

12.10 is not supported anymore and can therefore not be used for this. I spent weeks changing from 12.04 to 12.10 because of the update "bug" mentioned earlier, and now it's "out of date"!




Well, that's life. I'll go right ahead with making a new ToDo using 12.04. It should be possible using CUDA 6.0; I haven't found any warnings or "bugs" yet. I just wanted to get going with the test, but there's no way around it, I guess.
Actually this "problem" solves another problem: the static IP issue in 12.10 goes away when returning to 12.04, because the configuration works perfectly there.




So, let's go.
(Had to rest a little, but now we are ready to go on.)


Making new ToDo - using Ubuntu Server 12.04.5 and CUDA 6.0
19:09 pm
Code:
HEADLESS LINUX CLI MULTIPLE GPU BOINC SERVER - RACK-MOUNTED HEADLESS BOINC SUPER CRUNCHER
OS: UBUNTU SERVER 12.04.5 64Bit
CUDA: CUDA 6.0
VER.1.1.4 13.08.14 17:48:00

- LM-sensors - need it to run my shellscripts CPUTempWatchdog.sh,GPUTempWatchdog.sh,HDDTempWatchdog,FANcontrolWatchdog.sh
- Vim-nox - for easier VI and enhanced monitor look
- Midnight Commander - to manage files in the old-fashion way
- NTP Time Server Update - to keep server time updated
- bash as default shell - not sure about this, just yet.

- Boinc-client from ppa:costamagnagianfranco/boinc
- BoincTasks to remotely view and check jobs for all servers
- AndroBoinc to remotely view and check jobs from your Android
- Setup Boinc-client for remote control/SSH
- Setup Boinc-client for using multiple GPU's --> "cc_config.xml" 

TODOS - THESE ISSUES NEED TO BE DONE/SOLVED
- CRON set to run shell scripts
- CRON for other jobs?? Backup??
- GRUB boot - change startup config --> no halt on error! boot no matter what's wrong!
- Setup Boinc-client for special applications --> "app_info.xml" Set application-specific instructions!?
- Setup Boinc-client for special applications --> "global_prefs_override.xml" Set status "work" from the beginning!?

NOT FINISHED YET - WORKING ON THIS PAGE

23:31 pm --> Hello friends! I can't promise anything yet, but after making a new ToDo and setting the system up with Ubuntu Server 12.04 and CUDA 6.0, I just tested one thing: I wanted to know how many graphics cards are accepted after installing the CUDA toolkit. It looks good, but let's not get excited just yet.
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 33 C
        Gpu                         : N/A
        Gpu                         : 33 C
        Gpu                         : N/A
        Gpu                         : 34 C
        Gpu                         : N/A
        Gpu                         : 33 C
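To keep a small log of those readings over time, something like this works (just how I'd script the snapshots, nothing official; adjust the count and interval to taste):

```shell
# Take three timestamped GPU temperature snapshots, 5 seconds apart.
# Guarded so it degrades to plain timestamps on a box without nvidia-smi.
for i in 1 2 3; do
    date "+%H:%M:%S"
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi -a | grep Gpu
    fi
    sleep 5
done
```

Redirect the output to a file (>> temps.log) to build a running log instead of watching the console.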


01:53 am --> Houston, we've got a problem!
As you can see above, all GPUs are accepted by the mobo. Boinc is running on the system, but the GPUs are not accepted! I got a couple of error messages when doing this step, step #11 in the ToDo from earlier:
Code:
# apt-get install linux-image-extra-$(uname -r) x11-xserver-utils mesa-utils

Reading package lists... Done
Building dependency tree
Reading state information... Done

E: Unable to locate package linux-image-extra-3.13.0-32-generic
E: Couldn't find any package by regex 'linux-image-extra-3.13.0-32-generic'

Please help me solve this issue. I think it's the only problem that needs to be solved before the test can go on. I've spent several hours editing/redoing the ToDo: 12.04.4 --> 12.10 and then back to 12.04 with the new 12.04.5 update. So if anyone has an idea how to fix the problem, please don't hesitate to share it.
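One way to narrow this down (just a sanity check, not a fix): the -extra package name is tied to the exact kernel release you booted, so check what is actually running and what the enabled repositories actually carry:

```shell
# The -extra package name must match the kernel that is actually running:
uname -r
# List the linux-image-extra packages the enabled repos really carry
# (guarded so the snippet is harmless on non-Debian systems):
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache search linux-image-extra
fi
```

If the running kernel doesn't appear in that list, apt can't find the package no matter how it's spelled, which would point at a repo/kernel mismatch rather than a typo.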


11:08 am --> Status:
It looks like the new 12.04.5 update and CUDA 6.0 have issues as well. I've asked around at Ubuntu Forums, but it's not that easy when you have problems with a totally new update (12.04.5), only 7 days old! Ohh, I was just dreaming of being able to use 12.04, which I know so well, for this project. It's just too bad there are all these issues.
I'm thinking, if nobody has a solution to the problem, that I'll go on trying the combination of 13.10 and CUDA 5.5, or 13.10 and CUDA 6.0. Maybe there aren't too many changes from 12.10 --> 13.10. Then maybe I can get this command to work:

apt-get install linux-image-extra-$(uname -r) x11-xserver-utils mesa-utils

I'm still talking to a few guys, so maybe we don't have to give in just yet. It's too bad; I never saw this coming. I totally forgot that 12.10 had a short life and that I would have to fight the same problems again. Not just yet!
Edited by DanHansenDK - 8/14/14 at 2:21am
post #88 of 343
Thread Starter 
20:29pm --> STATUS

OK, so far we've found no solutions to the issues regarding:
Code:
# apt-get install linux-image-extra-$(uname -r) x11-xserver-utils mesa-utils

Reading package lists... Done
Building dependency tree
Reading state information... Done

E: Unable to locate package linux-image-extra-3.13.0-32-generic
E: Couldn't find any package by regex 'linux-image-extra-3.13.0-32-generic'


This is why I'll now try installing Ubuntu Server 13.10 (12.10 is not supported any more) and CUDA 5.5 to see if this will work. If the difference between 12.10 and 13.10 is not too big, there's a chance it will work. Actually, I'm getting a little tired of these issues.
I'm getting to the point where another distro sounds like a good idea... or maybe even Windows! YES, of course!! Let's spend another US$200 on each Boinc Server so that we can run one piece of software...


STATUS

14.08.14 10:06:00pm
Problem! 12.10 not supported any more.
Try 12.04.5/CUDA6.0

13.08.14 11:31:00pm
12.04.5/CUDA6.0 Tested. NOT WORKING!!! Installation issues!
Try 13.10/CUDA5.5

14.08.14 10:06:00pm
13.10/CUDA5.5 Tested. NOT WORKING!!! Installation issues!
Try 13.04/CUDA6.0

15.08.14 01:41:00am
13.04/CUDA6.0 Tested. NOT WORKING!!! A LOT of installation issues!
Try 14.04.1/CUDA6.0

15.08.14 04:03:00am
Try 14.04.1/CUDA6.0. Solved the problem, but not in a very fancy way.

Try 12.04.3/CUDA5.5 (Before CUDA/Ubuntu 12.04.4 Bug!)


STATUS:

15.08.14 22:05:00 --> OK, I cracked it, but my G..!! I'm not sure about the stability of this configuration, because I really had to "cut some corners". Anyway, here's the proof:

Time & date: 15 Aug 2014, 20:01:58
Boinc version: 7.2.42
CPU: GenuineIntel Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz [Family 6 Model 60 Stepping 3](4 processors)
GPU: [4] NVIDIA GeForce GT 640 (1023MB) OpenCL: 1.1
OS: Linux 3.13.0-34-generic

I think I'll test one other thing I've been thinking about, and if that doesn't work, I'll install a desktop version instead and go on with the test. Several days have passed fighting this, and that was not the plan. So I'll try this other thing and then go on with the test. Then we can get back to the issue of making a stable version of a "Headless Linux CLI Multiple GPU Boinc Server".


STATUS:

18.08.14 06:41pm --> OK then... I had to wait for the Asteroids project's system to be up and running, so I couldn't test my new solution, 14.04 combined with CUDA 6.0. I was pretty sure it would work, but I couldn't know for sure, so I waited!! Today Asteroids was up and running again, so I was able to test it. It didn't work, and that was no surprise to me, because there were a lot of shortcuts along the way.
So I made the next and second-to-last attempt according to the "plan", which was to try the 12.04.3 version, which should work with CUDA 5.5 (according to Nvidia's own CUDA download-zone support).
I tried it, and there were only a few problems during the installation. So I tried to run Boinc without correcting these issues, and it almost works as it's supposed to. What we need is for the 3 "extra" GPUs to get into the game. But this can be due to a lot of things, or at least that's what I think.
This is why I'm writing again; maybe one of you has an idea. Is it perhaps a motherboard issue (there were plenty of those when installing this mobo from the beginning), or is it due to the issues during installation? Or maybe it's a whole other reason I haven't thought about?!

Please let me hear a few of your ideas!

During setup, following my ToDo version 1.1.5 and installing CUDA 5.5, I ran into these issues when using this command:
Code:

apt-get install linux-image-extra-$(uname -r) x11-xserver-utils mesa-utils

Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package linux-image-extra-3.8.0-29-generic
E: Couldn't find any package by regex 'linux-image-extra-3.8.0-29-generic'


Or this one. There are only these two issues in total! Not many compared to the s... I've been dealing with for the last 6 days or so:
Code:

# nvidia-xconfig --enable-all-gpus

WARNING: Unable to locate/open X configuration file.

WARNING: Unable to parse X.Org version string.



Here's how it looks:
4 jobs being chewed by the CPU and 1 being crunched by 1 out of 4 GPUs. We are almost there, guys!
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 51 C  <-------- THIS ONE IS RUNNING AT FULL SPEED ;)
        Gpu                         : N/A
        Gpu                         : 30 C  <-------- zZzzZz zZZzZ zzZz
        Gpu                         : N/A
        Gpu                         : 31 C  <-------- zZzzZz zZZzZ zzZz
        Gpu                         : N/A
        Gpu                         : 30 C  <-------- zZzzZz zZZzZ zzZz

# sensors
coretemp-isa-0000
Physical id 0:  +66.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +66.0°C  (high = +80.0°C, crit = +100.0°C)  <-------- THIS ONE IS RUNNING AT FULL SPEED ;)
Core 1:         +64.0°C  (high = +80.0°C, crit = +100.0°C)  <-------- THIS ONE IS RUNNING AT FULL SPEED ;)
Core 2:         +63.0°C  (high = +80.0°C, crit = +100.0°C)  <-------- THIS ONE IS RUNNING AT FULL SPEED ;)
Core 3:         +59.0°C  (high = +80.0°C, crit = +100.0°C)  <-------- THIS ONE IS RUNNING AT FULL SPEED ;)

fan1:            0 RPM  (min =    0 RPM)  ALARM
fan2:         8490 RPM  (min =    0 RPM)  ALARM
fan3:            0 RPM  (min =    0 RPM)  ALARM
fan4:            0 RPM  (min =    0 RPM)  ALARM
fan5:            0 RPM  (min =    0 RPM)  ALARM



Edited by DanHansenDK - 8/18/14 at 10:06am
post #89 of 343
Thread Starter 
STATUS:
20.08.14 01:55am

Hello friends! We got it!!!! Solved the b..... problem!

And now we are running Ubuntu 12.04, which means the "time of static IP trouble" is over!! Here's the proof:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 52 C
        Gpu                         : N/A
        Gpu                         : 56 C
        Gpu                         : N/A
        Gpu                         : 57 C
        Gpu                         : N/A
        Gpu                         : 54 C




OK, there are still a few issues I need to solve. This is an "elderly" version of 12.04 (the 12.04.3 update), chosen to avoid the problem with CUDA 5.5. But I'll fix that later on. Now it's on with the test.

I'm testing 1 CPU and 4 GPUs at full speed right now. In this test I'm using the "standard" fans. This is how it looks after about 10 minutes or so:

10 min. of testing at 100%:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 56 C
        Gpu                         : N/A
        Gpu                         : 57 C
        Gpu                         : N/A
        Gpu                         : 65 C
        Gpu                         : N/A
        Gpu                         : 56 C

20 min. of testing at 100%:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 67 C
        Gpu                         : N/A
        Gpu                         : 57 C

Notice that it's the card in socket d2 (third card in the row, and the second to last) that runs hottest. Actually, I would have thought the card next to the 2U PSU would be the one with issues: there's not much room between that card's heatsink and the PSU, and therefore not much air, and there's no fan pointing directly at it either! This is why tests are such a great thing.

The AeroCool temperature sensor shows the same thing, but its reading lies 5-6 degrees C below the system output. I guess I've chosen a bad spot to place the sensors. That's too bad, because I did a little extra work taking pictures of the installation/mounting of the sensors, using strips, insulators, etc. Remember, this will all be part of the complete tutorial I'll do.

30 min. of testing at 100%:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 66 C
        Gpu                         : N/A
        Gpu                         : 57 C

40 min. of testing at 100%:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 67 C
        Gpu                         : N/A
        Gpu                         : 57 C

50 min. of testing at 100%:
Code:
# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 57 C
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 66 C
        Gpu                         : N/A
        Gpu                         : 57 C

After 1 hour of testing the CPU and 4 GPUs at 100%, this is the result, using the standard fans and no modifications to the case:
Code:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +78.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +78.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +73.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +69.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +68.0°C  (high = +80.0°C, crit = +100.0°C)

# nvidia-smi -a |grep Gpu
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 58 C
        Gpu                         : N/A
        Gpu                         : 66 C
        Gpu                         : N/A
        Gpu                         : 57 C


I think we can say the temperature using the standard fans has been found!
The CPU temperature is a little high as well. I hope this will all go away when changing to the new type of fans.

Another thing: I just learned that there are 4 chassis/case fan connectors on this mobo. But again, the fans run according to CPU temperature. If only the manufacturers would leave these very nice add-ons alone and stop making all kinds of funny "setups" to make it easy! You are making it difficult to use the board for more than one kind of operation. If these 4 extra fan connectors worked independently, it would have been possible to use fan control from lm-sensors to make the most beautiful script to control and watch the fans. This is the reason I use AeroCool. Well, I've written about that several times, so let's not go down that road again, but it's just too bad...
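For reference, this is the lm-sensors route I would have scripted if the headers worked independently. It's a sketch only: pwmconfig is interactive and very hardware-dependent, so no promises it applies to this particular board:

```shell
# Check what lm-sensors can see; "|| true" because this exits non-zero
# harmlessly on a box with no configured sensors.
sensors || true
# pwmconfig probes which PWM output drives which fan header and writes
# its findings to /etc/fancontrol (interactive - run it at the console):
#   pwmconfig
# Then the fancontrol daemon enforces the temperature->speed mapping:
#   service fancontrol start
```

On boards that tie every header to CPU temperature, as this one does, fancontrol has nothing independent to drive, which is exactly why the AeroCool unit earns its place.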

On with the test. Let's change the fans and see the difference!
Edited by DanHansenDK - 8/19/14 at 6:17pm
post #90 of 343
Wow, a lot of work! Glad to see it's starting to pay off!
