
SMP Folding: Understanding and Improving

post #1 of 23
Thread Starter 
There have been many questions going around about the speed of different WUs. I mainly use SMP folding, but some of the things I have learned affect all types of folding. This will not cover how to set up folding; there are already some great FAQs on getting the actual program set up and running. Hopefully it will answer the other SMP folding questions you might have.

Questions Like:

Why is WU A so much faster than WU B?
Why is WU A worth more points than WU B?
Why does CPU A fold faster than CPU B at the same MHz?
How do different settings affect the speed of my folding?
How can I get the most out of my system?

I have been doing some testing to try and answer some of these questions. Hopefully what I have learned will help some of you. I can’t answer all folding questions, as I still have many of my own. But my goal is to learn as much as I can and then pass it on.


To understand SMP folding, you should first understand how folding works. I will try to put most of my explanation into simple terms, which may not do justice to the true complexities of folding, but will help you understand what is happening while you are folding.

When you launch the single core client, you launch what we will call the Client. The Client program can be a GUI client or a console client. The Client does not use much of any CPU time, but it keeps track of what is going on with the folding and keeps everything running smoothly.

The first thing the Client program does is check a file called queue.dat, which we will just call the Queue. The Queue stores what WU your system is currently working on and which Stanford server assigned you the WU. It also stores things like when you were assigned the WU and when it is due to be returned. If there is currently no work assigned to your system, the Queue will be empty, and the Client will then send a message to Stanford asking for a new WU to work on. Stanford has 3 main Assignment Servers. These servers receive your request and then assign you to a Work Server based on parameters such as the Client you are running and other settings you chose during the Client setup. The Work Server then assigns you a WU and keeps track of what was assigned to you and when it expects data to be returned.

Now that the Queue has a WU to work on, the Client checks to see if you have the correct Core to process the WU. The Core contains the code needed to process any WUs that are designed for the client you are running. If you don’t have the correct Core, the Client downloads it from Stanford.

Now that you have the correct Core and a WU to work on, the Client launches the Core with a set of arguments. These arguments tell the Core which WU to process by passing it the number of the current WU and the location of the WU data. The actual WU data is stored in a folder called WORK, which is a subdirectory of your folding directory. The Core program begins to process the data stored in the WU. This is where the rubber meets the road and your CPU usage spikes. The Core saves the current status of its work at a time interval you chose during the setup of the folding client. It also reports back to the Client program each time it completes 1 percent of the WU for the console client, or each step of the WU if you are running the GUI client. This continues until the Core informs the Client that it has completed 100 percent of the WU.

Once the Core has finished the current WU, it stores its final output in a file in the work directory and then shuts down. At this point your CPU usage drops and the Client program starts the submission process. The Client gets the work server's address from the Queue and then tries to upload the final output file back to the work server. If all goes well, the work server will be up, and after a short time you will get the thank you for your submission message. If the work server is not up, the submission will fail. In either case the Client then requests a new WU to work on from an available work server. If the previous WU did not upload, the Client just moves on to the next WU and keeps attempting to submit the finished WUs until they finally go through.

Once the Client gets a new WU to work on, the entire process starts over again.
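
For anyone who likes to see the cycle written out, here is a rough sketch of the steps described above. It is only an illustration: the function name, the arguments, and the step wording are made up for this example and are not taken from the real client.

```python
# Hypothetical sketch of one pass through the Client's work cycle.
# Names and step strings are illustrative, not the real client's internals.

def client_cycle(queue_has_wu, core_installed, upload_ok):
    """Return the ordered steps the Client takes for one WU."""
    steps = ["read queue.dat"]
    if not queue_has_wu:
        # Empty Queue: go through an Assignment Server to reach a Work Server.
        steps.append("ask Assignment Server for a Work Server")
        steps.append("download WU from Work Server")
    if not core_installed:
        steps.append("download Core")
    steps.append("launch Core on WU")
    steps.append("Core reports 100 percent and exits")
    if upload_ok:
        steps.append("upload result to Work Server")
    else:
        # A failed upload does not block new work; it is retried later.
        steps.append("upload failed, keep result for retry")
    steps.append("request next WU")
    return steps


for step in client_cycle(queue_has_wu=False, core_installed=True, upload_ok=True):
    print(step)
```

Note that the last step is the same whether or not the upload worked, which matches the behavior described above.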

Enter SMP folding……………..
SMP (Symmetric multiprocessing) dates back to as early as 1961. It was developed as a way to utilize multiple processors either on one physical computer, or multiple computers connected together by some form of interconnect.

SMP folding works on some of the same principles as the single folding client. The SMP folding client starts by launching the Client just like above, but from there, things get much more complex. The Client goes through the process of checking the Queue and getting a WU if needed. At this point it launches a program called MPIEXEC. This program uses a technology called MPI (Message Passing Interface). On the Windows version of MPI, a program called smpd.exe is also used, but this is bypassed by the Linux and Unix implementations of MPI.

I will not go into too much detail about MPI, as it is complex in nature and there are entire books devoted to the subject. Simply put, it allows multiple threads to monitor and communicate with each other in a very structured way.

Now that the Client program has launched the MPI program, the MPI program launches 4 individual Cores. The Cores get distributed equally between the processors on your system. These 4 Cores are much like the single Core from above, except they are in constant communication with each other. The Cores work together on the same WU. Each Core performs calculations independently, but must still communicate and stay in sync with the other 3 Cores. If 1 Core gets behind, locks up, or has an error, the other three Cores get out of sync and you get a crashed client, or worse, a crashed WU.

An interesting thing (interesting to me that is, if it is not interesting to you, please feel free to skip ahead) to note about the communications between the Cores is that they still use the network to communicate. This is why errors can occur when your system renews its DHCP lease on an IP address, or your Wi-Fi connection drops or is searching for a better host. If you look at the command line for one of the Cores in Linux you will see what I am talking about.

Code:
/mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 05 
-priority 96 -checkpoint 10 -forceasm -lifeline 3624 -version 591
Code:
/mpiexec: This is the MPI program that makes SMP possible.
 
-np 4
This tells the MPI program how many Core threads to spawn. I have actually
disassembled the folding client and changed this setting, but the WUs
themselves are only designed to work on 4 Cores, so after spawning the
number of cores I specified, the client had an error and shut down.
 
-host 127.0.0.1
This is the IP address of the internal loopback of your system. This is hard
coded into the client. If changed, you could in theory use multiple computers
to process a single WU rather than just one computer. If you monitor the
network traffic across your internal loopback, you can see the amount of
communication that is occurring between each of the Cores.
 
./FahCore_a1.exe
This is the Core. Each folding client has a different Core, so the Windows
Core will be different from the Linux Core.
 
-dir work/
This is the name of the directory where the actual WU data is stored.
 
-priority 96
This won't mean much to you unless you spend a lot of time reading about
the Linux scheduler and how it prioritizes threads. Basically the scheduler
internally uses priority values from 0 to 139, with 0 to 99 reserved for
real-time threads and 100 to 139 used for normal threads.
 
-checkpoint 10
This is the setting which tells the Core how often to save its work. This is
set when you configured your client.
 
-forceasm
This controls what types of loops to use during processing. -forceasm will
tell the client to ignore issues from the previous run, and use SSE
optimized calculations.
 
-lifeline 3624
This gives the Core the process ID of the Client so that it can keep the
Client updated on the Core's current status.
 
-version 591
This is the folding Core version. The version of the Core must match the version the WU was designed to run on.
Needless to say, there is a lot more going on under the hood with SMP folding compared to a single Core client.
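
If you want to pick a Core command line apart yourself, a small script can do the splitting. This is just a sketch: it assumes every token starting with a hyphen is a flag, and that a flag takes the next token as its value unless that token is also a flag. Those are my assumptions about the layout, not anything documented by Stanford.

```python
import shlex

# Command line as quoted above, written with plain ASCII hyphens.
CMD = ("/mpiexec -np 4 -host 127.0.0.1 ./FahCore_a1.exe -dir work/ -suffix 05 "
       "-priority 96 -checkpoint 10 -forceasm -lifeline 3624 -version 591")


def parse_core_command(cmd):
    """Split the line into MPI launcher options and Core options."""
    tokens = shlex.split(cmd)
    launcher, rest = tokens[0], tokens[1:]
    # Everything before the Core binary belongs to the MPI launcher.
    core_index = next(i for i, t in enumerate(rest) if "FahCore" in t)

    def to_dict(parts):
        opts, i = {}, 0
        while i < len(parts):
            name = parts[i].lstrip("-")
            if i + 1 < len(parts) and not parts[i + 1].startswith("-"):
                opts[name] = parts[i + 1]
                i += 2
            else:
                opts[name] = True  # bare switch such as -forceasm
                i += 1
        return opts

    return {
        "launcher": launcher,
        "mpi": to_dict(rest[:core_index]),
        "core": rest[core_index],
        "core_opts": to_dict(rest[core_index + 1:]),
    }


parsed = parse_core_command(CMD)
print(parsed["mpi"])
print(parsed["core_opts"])
```

The split makes it easy to see that only -np and -host belong to the MPI program, while everything after the Core name is passed to the Core itself.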

If you are running a dual core cpu, two of the Cores will be assigned to each cpu core. If you are running a quad core cpu, each cpu core gets its own Core process.

Now for the fun stuff….. fun being a subjective word of course.

Taking into account the complexities of SMP folding, you can see why the speed of your system and the number of cpu cores you have will directly impact your folding production.

But wait….. There’s more!!

The type of WU can also impact the performance of your folding. Each WU is designed to perform different types of calculations.

Let's dig into what makes a WU, and what makes them different.

In simple terms, a WU (Work Unit) is a set of calculations that are actually part of a much larger calculation. These calculations are used to identify how proteins respond to different types of interactions at the atomic level. The more we can learn about what makes a protein fold the way it does, and what influences its structure, the better we can understand what causes the protein to “malfunction”. These malfunctions are what lead to some of the diseases Stanford is currently studying. This may be an overly simplistic view of what exactly protein folding is, but that is beyond the realm of my expertise, so I will leave it at that.

If you have ever tracked which WUs you are working on, you have probably noticed the steps that are listed with each WU. The two WUs that I use for my testing have very different numbers of steps. WU 2608 has 500,000 steps while WU 3050 has 10,000,000 steps. That does not mean that WU 3050 should take 20 times longer than WU 2608. What it means is that the calculations included in each WU are very different. An overly simple way to look at it is that WU 2608 may have to do 500,000 calculations of complex multiplication, while WU 3050 does 10,000,000 calculations of simple addition. So comparing the steps involved with each WU is not only irrelevant, but can be misleading if taken as anything more than just the WU's way of keeping track of its own completion.

Simply put….
ALL WU ARE CREATED EQUAL!!!! ALL COMPUTERS ARE NOT!!!

What does that mean? They sure don’t take the same amount of time to complete. They don’t seem to give the same amount of points. How can it be that all WUs are created equal?

To understand this you first have to understand how a WU is created and how its points are assigned. Stanford has what they call benchmark computers. These computers are used to decide how much a WU should be worth.

For the single folding client, WUs are benchmarked on a 2.8 GHz Pentium 4 (Stanford does not overclock their benchmark rigs). The WU is then assigned 110 points for each 24 hours the WU takes to process. So a WU that is worth 550 points took 5 days to complete on the Stanford single client benchmark computer. A WU worth 330 points took 3 days to complete. Obviously if you have a faster computer, you will complete the WU in a shorter time. So if your computer can complete the 550 point WU in 1 day, you will be doing 550 PPD (Points Per Day).
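
The arithmetic in that paragraph can be written out in a couple of lines. This is just a sketch of the single-client numbers quoted above; the function names are my own.

```python
POINTS_PER_BENCHMARK_DAY = 110  # single-client figure quoted above


def wu_points(benchmark_days):
    """Points assigned to a WU that took this many days on the benchmark rig."""
    return POINTS_PER_BENCHMARK_DAY * benchmark_days


def points_per_day(points, your_days):
    """PPD for a machine that finishes the WU in your_days days."""
    return points / your_days


print(wu_points(5))            # the 550 point WU
print(wu_points(3))            # the 330 point WU
print(points_per_day(550, 1))  # finishing the 550 point WU in one day
```

The point value is fixed by the benchmark machine, so the only thing your own hardware changes is the number you divide by.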

For SMP WUs, I am not sure what baseline points were used. Whether bonus points are awarded because it is SMP is a question for Stanford to answer. What we do know is that the benchmark machine has 4 CPU cores, either 2 dual cores or a single quad, and 8 MB of L2 cache, 2 MB per core. Stanford does not release the exact specifics of their benchmark machine for a number of reasons that you can read about in their forums. But all SMP WUs are benchmarked on the same setup, and points are assigned equally across all SMP WUs based on their completion times on that machine.

So why does one computer produce more points per day than another computer?

There are a lot of things that affect the points production on a computer. Computers can be configured very differently. These changes in configurations may have a huge impact on the performance of one WU, and have little impact on the performance of a different WU.

Below I will go through some of the configurations I have tested and hopefully answer some questions you may have about the performance of WUs. All testing was done on a single computer in order to remove variations that might be seen between different systems. Although your system may not match mine, many of the things I discovered can be applied to the tuning of your system. I used my best folding rig so that I could test a larger range of settings without hitting an overclocking ceiling too early. I did test some fairly low settings as well in order to show variations in performance.

The Test Rig:

Intel Q6600 core 2 Quad with 8meg L2 Cache
Asus P5K Deluxe Motherboard
2GB G.Skill DDR2-1000 HZ Memory
Vapochill Lightspeed Phase Change CPU Cooling
Ubuntu 7.04 running without a desktop GUI such as Gnome or KDE.

Video Card, Hard Drive, and PSU are irrelevant to the testing, as they have no real effect on your SMP folding as long as they are in good working order and don’t cause instability in the system.

The Test WUs:

I used SMP projects 2608 and 3050 for the testing. There are many different WUs, and these two are just a sampling of WUs that I have found can have very different performance on the same computer depending on your settings. I plan on putting a few more WUs through the same tests, but these are enough for now to illustrate my findings.

Testing process:

For each setting I allowed the WU to process 3 percent of the WU and then calculated the average time per percent. I do find it interesting that each percent was never more than a second off from the others because there was basically no load on the system other than folding. The 1 second variation is most likely the effect of rounding to the nearest whole second when reporting the time of completion between each percent.
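
Here is a rough sketch of how the average time per percent can be worked out from log timestamps and turned into a PPD estimate. The timestamps and the 1000 point WU value below are made up for illustration; they are not my test data, and the helper names are my own. It also assumes the timestamps do not cross midnight.

```python
from datetime import datetime


def seconds_per_percent(timestamps):
    """Average gap between consecutive percent-complete timestamps (HH:MM:SS)."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


def estimated_ppd(wu_points, sec_per_percent):
    """Points per day if the whole WU runs at this pace."""
    wu_seconds = 100 * sec_per_percent
    return wu_points * 86400 / wu_seconds


# Made-up timestamps for 3 completed percents (4 log lines).
stamps = ["10:00:00", "10:05:12", "10:10:25", "10:15:37"]
avg = seconds_per_percent(stamps)
print(round(avg, 1), "seconds per percent")

# Hypothetical 1000 point WU at 864 seconds per percent takes exactly one day.
print(estimated_ppd(1000, 864))
```

Because the log only reports whole seconds, a one second spread between percents like the one in the made-up stamps is what you would expect from rounding alone.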

Test 1: FSB and CPU Multiplier
Test results of running each WU at different FSB settings and CPU multipliers. I wanted to answer the question of whether the final GHz the CPU was running at would alone determine the performance of the WU, or whether FSB would also play a part.



Test 1: Results
The FSB had much more of an effect on WU 2608 than it did on WU 3050. A good example of this is if you compare the PPD production at 3.2 GHz. Running the system at 400 X 8 vs 356 X 9 showed only about a 1% increase in PPD on WU 3050. The same settings on WU 2608 showed a ~6% increase in performance using the 400 MHz bus at the same 3.2 GHz CPU speed. The difference at 3.6 GHz was ~2% for WU 3050 vs ~7% for WU 2608.
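
To show why 400 X 8 and 356 X 9 count as the same CPU speed, and how a percent gain is worked out, here is a minimal sketch. The function names are my own, and the PPD values in the last line are placeholders, not my measured results.

```python
def cpu_mhz(fsb_mhz, multiplier):
    """Final CPU clock from the FSB and the CPU multiplier."""
    return fsb_mhz * multiplier


def percent_gain(ppd_new, ppd_old):
    """Percent increase of ppd_new over ppd_old."""
    return (ppd_new - ppd_old) * 100 / ppd_old


print(cpu_mhz(400, 8))  # 3200 MHz
print(cpu_mhz(356, 9))  # 3204 MHz, effectively the same 3.2 GHz

# Placeholder PPD figures, only to show the calculation.
print(percent_gain(106, 100))
```

The two settings land within 4 MHz of each other, so any real difference in PPD between them comes from the bus speed and not from the CPU clock.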



Test 2: FSB:RAM Divider
Test each WU at the same FSB and CPU multiplier, but change the FSB:RAM divider. I wanted to answer the question of whether the memory speed affected the performance of the WUs when the processor speed was unchanged.



Test 2: Results
Memory speed had almost no effect on WU 3050. There was only a 1 second difference between the two settings on WU 3050, which could have just been due to rounding. WU 2608, on the other hand, showed a drastic increase in performance when running the RAM at 1184 MHz vs 854 MHz. The difference was a ~12% increase in performance of WU 2608 at the higher memory speed.



Test 3: RAM Timing
Test was run in the same manner as Test 2, but instead of changing the memory speed, I left it at 4:5, or 960 MHz, and changed the memory timings. I wanted to know if the actual timing of the memory made a difference at all. I had been told that memory timing did not have an impact on real world applications using an Intel processor. That may be true, but folding is not what some would consider a real world application.



Test 3: Results
WU 3050 showed no real gain with the tighter timing beyond a margin of rounding. WU 2608, on the other hand, showed an increase in performance of almost 4% with the tighter timing.


Conclusions:
The testing shows that, for at least these two WUs, settings that may have no impact on one WU can have considerable impact on another. Finding the right balance can prove to be difficult. When searching for the highest overclock, we sometimes sacrifice FSB for a higher multiplier. Other times we sacrifice overall CPU MHz in order to hit a higher FSB. Tighter timing can also be difficult to get at higher memory speeds. Because you can't pick the WUs that will perform best on your system, the best you can do is go for the highest FSB and memory speeds, while sacrificing as little of the overall CPU clock speed and memory timing as possible.

This may not be the answer to what is the best setting to have when folding. But it does help answer some questions about why your system performs better or worse on some WUs.

I can tell you that with my FSB set at 450, CPU multiplier at 9, and memory at 1125 MHz with 4-4-4-6 timing, WU 2608 produced ~7% more PPD than WU 3050. WU 2608 is usually considered an undesirable WU to get, but was giving me 4.3K PPD at these settings.
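
For those working out their own settings, this sketch shows how the memory speed follows from the FSB and the FSB:RAM divider. It assumes the 4:5 divider from Test 3 was also in use at the 450 FSB setting, which is an assumption that happens to fit the 1125 MHz figure. The function name is made up for this example.

```python
from fractions import Fraction


def ddr2_effective_mhz(fsb_mhz, fsb_part, ram_part):
    """Effective DDR2 rate for an FSB:RAM divider of fsb_part:ram_part."""
    ram_clock = Fraction(fsb_mhz) * ram_part / fsb_part
    return float(ram_clock * 2)  # DDR transfers data twice per clock


print(450 * 9)                        # CPU clock in MHz at 450 x 9
print(ddr2_effective_mhz(450, 4, 5))  # memory speed at an assumed 4:5 divider
print(ddr2_effective_mhz(384, 4, 5))  # the FSB that gives 960 MHz at 4:5
```

A higher FSB raises both the CPU clock and the memory speed at a fixed divider, which is why the two are hard to tune separately.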

So are we done yet? Not yet.
One more set of findings that are specific to the Quad Core folders out there.

I run multiple Quad Core folding rigs. Native Linux SMP folding is hands down, without a doubt, the best performing folding client you can have. But two of my folding rigs must run Windows because they are used for some other tasks that I need Windows to perform.

The next best thing to Native Linux SMP is using VMware over Windows and then running the Linux SMP client inside VMware. One drawback of VMware is that it can only utilize two CPU cores. To get the most out of your Quad Core system you have to run two VMware sessions and then set the affinity of each to two of your processor cores. While running like this I was a bit surprised to see that the Quad Core running Windows with two VMware clients was producing the same amount of points per day as a Native Linux SMP rig running on identical settings.

How can this be?

There are a few things that contribute to this. The SMP architecture does not scale perfectly with the number of cores you have. Also the Quad Cores that are out today are not truly Native Quad Cores. They are two dual cores bound at the hip working together. There are inherent inefficiencies that multiple cores have such as how each set of shared L2 cache is available to only 2 of the 4 cores on the die.

If SMP scaled perfectly with the number of CPU cores, and the two dual cores on one die did not affect performance, running two SMP clients at once on one system would produce the same if not fewer points per day. This is not the case.

So I decided to try running 2 SMP clients directly on a Native Linux box. I launched each of the clients with a command that set their affinity to only two cores. The first client was set to cores 0 and 1, the second to cores 2 and 3. This would keep each SMP client running on just one of the dual core CPUs that make up the Intel Quad Core processor.

Test 4: Running single vs multiple SMP clients on one computer.
I set up two folding directories and installed the SMP client in each. This is done much the same way you set up two single core folding clients on a dual core rig. You must assign each client a different Machine ID.

To launch each client I used the following commands:

Code:
 
taskset -c 0,1 ./fah5 -forceasm -local
taskset -c 2,3 ./fah5 -forceasm -local
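
If you want to generate launch lines like these for a different core count, something along these lines would do it. This is only a sketch that builds the command strings; it does not start anything, and each client still has to be run from its own folding directory with its own Machine ID. The function name and defaults are made up for this example.

```python
def taskset_commands(total_cores, cores_per_client, client="./fah5",
                     flags=("-forceasm", "-local")):
    """Build one taskset command per client, each pinned to its own cores."""
    commands = []
    for first in range(0, total_cores, cores_per_client):
        cores = ",".join(str(c) for c in range(first, first + cores_per_client))
        commands.append(["taskset", "-c", cores, client, *flags])
    return commands


for cmd in taskset_commands(4, 2):
    print(" ".join(cmd))
```

On a quad core with two cores per client this prints the same two lines used in the test.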


Test 4: Results
The time taken to complete a WU with just one client running on 4 cores was 8.69 hours. If the single SMP client were perfectly efficient, each of 2 SMP clients running on half the cores should have taken at least 17.38 hours per WU. Instead they took only 14.5 hours each. This is why the 2 VMware sessions were holding their own compared to Native Linux. Running with 2 SMP clients in Native Linux allowed me to complete 3.3 WUs per day instead of the 2.8 WUs on 1 SMP client. This gave the single computer almost an 800 PPD increase in production. Depending on the WUs my computer gets, I have been running at over 5K PPD on my main folding rig.
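
The numbers above can be checked with a few lines of arithmetic. This sketch uses only the 8.69 and 14.5 hour figures from the test; the function name is my own.

```python
def wus_per_day(hours_per_wu, clients=1):
    """WUs finished per day when each client needs hours_per_wu per WU."""
    return clients * 24 / hours_per_wu


single = wus_per_day(8.69)           # one client on all 4 cores
dual = wus_per_day(14.5, clients=2)  # two clients on 2 cores each
gain = (dual - single) * 100 / single

print(round(single, 2), "WUs per day with 1 client")
print(round(dual, 2), "WUs per day with 2 clients")
print(round(gain, 1), "percent more WUs per day")

# Break-even point: two clients only help if each finishes in under this time.
print(2 * 8.69, "hours")
```

The gain works out to roughly 20 percent more completed WUs per day from the same hardware.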

Conclusions: SMP folding has tight deadlines because Stanford wants the results back quickly. All SMP WUs have a preferred and a final deadline. These two deadlines are important to understand. The final deadline is the cutoff point after which the WU will no longer give points for completion. If you were folding just for points, and did not care too much about the science, this is the only deadline you would need to worry about. But the science should matter. This is where the preferred deadline comes in. The preferred deadline will be anywhere from 36 hours to 4 days from when you start processing the WU. Some WUs that are known to be slow on many computers have the preferred and final deadline both set to 4 days.

If you don’t complete a WU before the preferred deadline, but complete it before the final deadline, you get the same points.
BUT…. IMPORTANT TO THE SCIENCE….
At the point at which the preferred deadline passes, if the WU results have not been submitted, the Stanford work server goes ahead and assigns the WU you are working on to someone else. They do this because there is a chance that your WU may have crashed, will not finish in time, or for whatever reason won't be submitted back to Stanford. It is still important to finish the WU even if you don’t hit the preferred deadline, because you will still most likely get it submitted before the next person finishes the WU. In the end, though, having two people work on the same WU is a waste of folding power and should be avoided if at all possible.

Because of this, the only time I run multiple SMP clients on one computer is if it can still complete any WU it starts within 24 hours. This is faster than even the minimum 36 hour preferred deadline, but I would rather sacrifice the points than have someone else working on the same WU I am working on. Either two VMware Linux clients running on a Windows Quad Core, or two Native Linux clients, should be able to do this on a well overclocked rig.
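
A quick way to check your own rig against that 24 hour rule is to project the full WU time from the time per percent. This is a minimal sketch; the function names are made up, and the 24 hour limit is just my personal rule from above, not a Stanford deadline.

```python
def hours_to_finish(sec_per_percent):
    """Projected hours for a full WU at the given seconds per percent."""
    return 100 * sec_per_percent / 3600


def safe_for_two_clients(sec_per_percent, limit_hours=24):
    """True if a WU at this pace still finishes inside the limit."""
    return hours_to_finish(sec_per_percent) <= limit_hours


# 522 seconds per percent corresponds to the 14.5 hour WUs from Test 4.
print(hours_to_finish(522))
print(safe_for_two_clients(522))

# At 900 seconds per percent the WU would take 25 hours, so stay with 1 client.
print(safe_for_two_clients(900))
```

Measure the time per percent with both clients running, since that is the pace that matters for the deadline.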

I hope this answered some of the questions you may have about how SMP folding works, and why some WUs seem to be worth less than others. If some of my testing helps you tweak your settings to produce more work, even better!

And remember...
Fold on whatever you can, whenever you can, and fold for team 37726!!!!
post #2 of 23
holy cow, this explains everything!

excellent work! rep +
post #3 of 23
very thorough, very good read.


Awesome job, you covered everything
post #4 of 23
Ya, very nice work knitelife, good information, + for you ! I didn't realize how much fsb and ram speed/timings could affect folding. Since many things see little performance boost with more fsb, I've just been using stock multi assuming my cpu speed was the only major factor. I get around the same performance from my rig, and I can run my memory less volts/heat etc., you know the drill.

But from seeing your test results, I think I'll have to change things here. Time to drop multi, crank it up, and tighten some timings I guess.

Couple of questions I have perhaps you can enlighten:
1) From my thread about the 2608/9 WU's you were also in on, we were talking about single vs dual channel. I was recalling that I didn't think it made much difference, I was just adding back RMA'd ram and don't remember adding the extra gig/+ dual channel helping much, but I wasn't really documenting anything, so I don't know how much diff. My question is, did you happen to note the effects running single channel vs dual? This could prove useful info, to those building barebone folding farm rigs, knowing they could just use one stick and have no ill effect.

2) OK, about the timings thing, raises another question I have. I have always heard that CAS will affect performance, by far the most in Intel rigs of all the timings. I've heard it go so far as to say that the other timings, may have almost no impact on performance. So did you try other timings? Like for example, is there significant difference between running 4-4-4-12, as opposed to say 4-4-3-5?
    
post #5 of 23
Thread Starter 
I am happy you guys are finding this informative. Your two questions are great questions. I will run those tests tonight and add the results to the first post.

I also ordered a Core 2 Duo today with only 2 MB L2 cache so I can do some tests with it as well. It won't go to waste, since it will be an upgrade to my daughter's computer, so I will put it in there when I am done with it.
post #6 of 23
Wow. Time for me to do some changing.
post #7 of 23
Thread Starter 
Quote:
Originally Posted by TaiDinh View Post
Wow. Time for me to do some changing.
I am looking forward to getting the E6400 to do some more testing with the smaller L2 cache. I suspect that because of the smaller cache, memory performance and bandwidth will be even more important than they are on CPUs with 4 MB L2. Should be interesting.
post #8 of 23
Quote:
Originally Posted by Knitelife View Post
I also ordered a Core 2 Duo today with only 2 MB L2 cache so I can do some tests with it as well. It won't go to waste, since it will be an upgrade to my daughter's computer, so I will put it in there when I am done with it.
Good idea, I am certain you will find big difference on cache, at least for some WU's. In my thread, you saw how big of a difference I was getting on a 2608. My 6600 was doing it significantly faster than my 6400, both rig same mem timings and same cpu clock, and the 6400 even had the advantage of higher fsb. But the 6600 still did the WU much faster (approx. 12min vs 18 per frame), the cache seemed to have a huge impact. I think the 2608 is gonna be a long one, for anybody with under 4mb cpu cache.

Look forward to your results.
    
post #9 of 23
Quote:
Originally Posted by Knitelife View Post
I am looking forward to getting the E6400 to do some more testing with the smaller L2 cache. I suspect that because of the smaller cache, memory performance and bandwidth will be even more important than they are on CPUs with 4 MB L2. Should be interesting.
So far, I have found that the E6600 folds better than the E6400. I am guessing it is because of the extra 2 MB of cache that the E6600 has.

I fold somewhere between 21~23 minutes per 1% on many WUs at 2.8 GHz.
I have read that many E6600 owners fold at around 13~16 minutes per 1% at 3.0 GHz.

That's just my estimate.
post #10 of 23
Wow! Good work. Very informative.
    