This post has been edited significantly to state my findings in an abbreviated, clear, and concise manner.
Post #6 contains the actual data as measured over 48 hours of folding, calculating, compiling, posting, then editing the data.
Post #8 contains theoretical data based on power savings derived from an iterative method of estimating power savings by using a more efficient PSU. Note that this introduces a margin of error, and although scientifically valid, the data are not actual measured values.
NOTE:
My intent when posting this thread, and my choice to post it in the Community Folding Project section, was to make anyone planning to donate equipment or donate their own electricity costs and time to the folding cause aware that what looks best on paper or what looks best in a competition is probably not the most economically or electrically efficient solution to folding.
The intent of this thread is not to frighten people, nor is it to discourage anyone from folding. I deliberately posted this outside of the Team Competition forums because I wanted to be sure that I was not discouraging people from folding in the competitions or detracting from the competitive nature and good morale of the competitions.
If this post helps even one person decide what to fold and how to fold it, then I'll consider the few hours of writing and editing and the last two days of data collection and measurement worth every minute of my time.
Oftentimes, folders consider only the cost of the hardware they're going to fold on and the points per day (PPD) values when deciding what system or hardware to fold on, how far to overclock, and what settings to use when folding.
Some folders take it a step further and use PPD/MHz to get an idea of clock-speed or overclock efficiency for folding.
I know that people have mentioned this before, but what truly matters is not your PPD or your PPD/MHz but your PPD/KWh, or in layman's terms, your PPD versus your cost in utility bills. And what may matter even more to you is having the lowest energy consumption possible, regardless of your PPD.
This might be a good place to collect data (voluntarily, of course) and compare notes on the most power-efficient (and thus cost-efficient) folding setups. PPD is well and good, and of course Stanford wants the fastest turnaround time on all WUs it can get, but we should be concerned with providing research in the most cost-effective and energy-efficient manner possible, and in my recent experience, Fermi GPUs are not the way to do that.
Conclusions:
After conducting two days of data collection, I have compiled a summary of my recommendations for anyone who is serious about contributing to the Folding@home cause, whether by donating equipment or by donating their own cycles and electricity.
For anyone who wishes to contribute to the Folding@home project, the most efficient way to do so is as follows (most efficient both in terms of new hardware investment and operating costs):
- Operate a "headless" (meaning no monitor, keyboard, mouse, or GPU) "satellite" folding machine networked to an ultra-high efficiency home "server" to coordinate the satellites. (The server could be incredibly low cost and low power.)
- Use a Linux distro optimized for Folding@home in a console (no GUI) format.
- Use the most power efficient motherboard possible. (Platinum 90+ is a good start.)
- Use the most efficient PSU possible. (80+ Platinum or 80+ Titanium)
- Choose a PSU wattage rating that puts its peak efficiency at the desired operating load of the system. (The best way to determine this is to build, tune, and operate the system on an old or borrowed PSU, then purchase the precise PSU to match.)
- Use ECO DIMMs for RAM (1.25 V DIMMs for Sandy Bridge / Ivy Bridge).
- Use the smallest process technology of the latest generation possible. Sandy Bridge and Ivy Bridge are both excellent choices for performance per watt. (The 2600K and 2700K SBs may have an advantage over the 2500K SBs. Further testing is required.)
- Operate without a UPS if possible, but be aware that power spikes could be catastrophic, even through a surge protector.
- Consider heavily overclocking if using an Intel K series CPU. (and most likely any of the "core" processors). The additional power cost of overclocking the CPU is minor compared to the performance gains.
- Even the most extreme overclock on a 2500K uses less power than running a reference GTX 580.
- PPD (on a CPU eligible for bonuses) scales in a positive and non-linear fashion with clock speed. The higher the clock speed, the more PPD you receive and the more efficient the PPD/W value becomes, because of how Stanford's bonus points work.
- Note that operating a Fermi card requires a copy of Windows, either running natively or in a VM or WINE environment within Linux. (No legitimate copy of Windows is free; Linux is free.) The savings from not needing to purchase an OS for the headless folding satellites and the "server" that coordinates them are considerable.
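The non-linear PPD scaling mentioned in the list above can be sketched numerically. This is only a rough illustration of the quick-return bonus scheme as I understand it (points scale with the square root of the k factor times the deadline over the elapsed time); the base points, k factor, and deadline below are hypothetical values, not those of any real WU:

```python
# Illustrate why PPD scales super-linearly with clock speed for
# bonus-eligible SMP WUs. The quick-return bonus credits roughly
#   points = base * sqrt(k * deadline / elapsed)
# so PPD = points / elapsed_days grows as elapsed_days ** -1.5.

import math

def wu_points(base, k, deadline_days, elapsed_days):
    """Credit for one WU under the quick-return bonus (never below base)."""
    bonus = math.sqrt(k * deadline_days / elapsed_days)
    return base * max(1.0, bonus)

def ppd(base, k, deadline_days, elapsed_days):
    """Points per day at a given completion time."""
    return wu_points(base, k, deadline_days, elapsed_days) / elapsed_days

# Hypothetical WU: 1000 base points, k = 2, 6-day deadline.
slow = ppd(1000, 2, 6, 2.0)  # stock clocks finish in 2 days
fast = ppd(1000, 2, 6, 1.0)  # an overclock finishes in 1 day
print(fast / slow)           # ~2.83x the PPD for 2x the speed
```

Doubling completion speed nearly triples PPD, which is why PPD/W keeps improving with clock speed as long as power draw grows more slowly than that.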
There are guides on how to set up headless folding satellites that will link to a main PC (which can be a very low power and inexpensive unit) that coordinates sending and receiving the WUs for the entire folding farm.
For the cost of building a system that can fold SMP and GPU simultaneously, one could build two satellite folding rigs and a low-cost "server" to coordinate them. In terms of operating costs, two highly overclocked headless 2500Ks can operate at the same power consumption as an overclocked SMP and reference GPU combination unit can in a complete system.
The total cost of a 2500K high-efficiency headless folding satellite unit is equal to or less than the cost of a single GTX 580 at this time and the operating costs are significantly less. (Even a highly overclocked 2500K draws 12% less power as a system than a reference clocked GTX 580 does in a system operating at idle. That's the worst-case scenario power savings. Best case is a 36% savings in operating costs comparing reference CPU to reference GPU. )
A next-generation Xeon may be even more efficient both in terms of power consumption and PPD/W, but the initial cost could be prohibitive, especially when one considers the cost of server motherboards and registered ECC DIMMs.
Below is a more detailed conclusion of my findings, supported by the data that are included in my later posts:
The -advmethods SMP client with big packets enabled at a default CPU clock uses the least electrical power.
The most efficient client in terms of PPD/W is the -advmethods SMP client with big packets enabled on a highly overclocked CPU. The more overclocked the CPU, the better the PPD/W rating. Stanford's bonus points scale inversely and non-linearly with frame time; the faster your frame time, the higher your bonus becomes. For this WU, as long as computational performance scales linearly with CPU clock, the PPD/W values will increase at a non-linear rate, and this will hold for any WU, so long as one compares the same WU at different clock values. At some point, thermal or current-limit throttling of the CPU core might make a further overclock detrimental, but as long as you can keep the CPU cool enough to avoid temperature and current/power limits, the higher the overclock, the higher the PPD and PPD/W values will be. A 2600K or 2700K should show considerably better PPD/W with minimal power increases. High overclocks on the 2500K (and most likely the 2600K, 2700K, and upcoming Ivy Bridge parts) are surprisingly electrically efficient; the additional power use is far less than I expected it to be.
The least efficient CPU (SMP) client still uses significantly less total power and delivers significantly better PPD/W than the most electrically efficient or highest-PPD/W GPU client. To be blunt, the GPU clients are wretchedly inefficient.
A mild overclock (around 4.5 GHz on my system) on either the normal-methods SMP client or the -advmethods SMP client is required to match the PPD output of the normal GPU client (non -advmethods) at reference GPU clocks. A reference-clocked 2600K or 2700K should match the normal GPU client at reference clocks.
Only the highest stable overclock (on my system) of the -advmethods SMP client is able to produce more PPD than the -advmethods GPU client when the GPU is overclocked.
The GPU client with the highest PPD, when not running SMP, is the -advmethods Fermi client with the core-affinity lock in its default enabled position and the CPU either forced into a high clock state on all cores, or locked in a high clock state on the core whose affinity matches the GPU client. SpeedStep can be left enabled and the core can remain idle, but performance in the GPU client will suffer slightly. Since there are no bonuses for early completion of WUs with the GPU clients, lower clock speeds give better PPD/W, but the difference is relatively small. A mildly overclocked 2600K or 2700K should match the highly overclocked -advmethods GPU PPD.
The -advmethods GPU client uses considerably more power than the standard-methods GPU client: 18 to 27% more! And the PPD increase from using it is only 5% when running the GPU client alone, or 8% when running alongside the SMP client. (More details on that interesting phenomenon below.)
Maximum overall PPD in the -advmethods GPU client is not impacted in the slightest by the SMP client running, provided the core-affinity lock is disabled (a non-default setting). Not only does the running SMP client have no negative impact on GPU PPD; running SMP (or keeping the CPU cores at maximum clocks by any other means, even Prime95) actually increases GPU -advmethods performance.
Just to drive home how inefficient the GPU client is: at reference CPU and GPU clocks, running the lower-intensity normal-methods GPU client, my lowest total system power was 328 W. At 4.9 GHz and 1.416 Vcore, the -advmethods SMP client was using only 245 W; reference CPU clocks on the same WU drew 183 W. The GPU client is extremely inefficient; the only reasons to use it are to fill a slot in Team Competition or to fold for high PPD values on an incredibly outdated CPU.
Below is my opinion only. I feel disheartened by how inefficient the GPU clients are and how disproportionately low the PPD values are, both in terms of initial equipment investment and annual operating costs. I feel that Stanford has done the folding community a disservice with the GPU point system as compared to the SMP system. Perhaps someone on the Folding@home advisory panel will see this post and take this information to the Folding@home team.
In my opinion, Stanford should reconsider how GPU WUs are valued (or assigned) to reflect the significantly higher electrical costs associated with them. I believe that Stanford is doing folders (and our electrical power resources, as both a local and global community) a disservice with the GPU client point values. Points are intangible and are only supposed to reflect the value of the research. As such, Stanford is tacitly stating that SMP research is worth more than GPU research, even though GPU research costs us considerably more in initial investment and operating costs than CPU SMP research does.
I believe that Stanford Folding@home needs to restructure the point system to address this disparity. Stanford Folding@home could accomplish this in a number of ways:
- Stanford Folding@home could implement a K factor bonus for GPU WUs just as they do for SMP WUs.
- Stanford Folding@home could increase the base point values on GPU WUs.
- Stanford Folding@home could implement both a K factor bonus for GPU WUs and change the base value of GPU WUs, increasing or decreasing base values as necessary for balance. (This is probably the most attractive alternative to me.)
- Stanford Folding@home could decrease the base point values on SMP WUs. (This is not my preference, and probably the least productive of the alternatives, but it is an alternative.)
- Stanford Folding@home could give up on GPUs entirely, move GPU WUs to distributed CPU projects, and accomplish the same research at a much lower electrical burden. (This is probably impossible, due to the way parallel stream processing architectures differ so greatly from SMP architectures.)
As things stand right now, I see absolutely no reason to recommend folding on a GPU to anyone, for any reason, other than to complete the Stanford Folding@home GPU projects.
In the interest of continuity, and so as not to make the responses to this post seem out of place, this portion of the post contains some of my PRELIMINARY results and analysis, which the responses relate to. Please note that these data have changed, both in scope and format, to much more presentable and scientifically accurate data sets in the following posts.
My full rig is in my signature below, but the pertinent components are listed here, along with voltages and frequencies:
- CPU: i5-2500K overclocked to 4.7 GHz / 1.336 - 1.344 V (average 1.340V) (as tested)
- RAM: 2x PC3-12800 G.Skill CL8 / 800 MHz (1600 MHz effective) / stock at 1.500 V (as tested)
- Motherboard: Asus P8P67 WS Revolution Rev B3 (92% platinum power efficiency rating) (as tested)
- HDD: WDC WD2002FAEX-007BA0 (2 TB Caviar Black 7200 RPM SATA 6.0 Gb/s) (as tested)
- Graphics: evga GeForce GTX 580 SC (1.5 GB VRAM) overclocked to 904.5 MHz core / 1809 MHz shader / 2106 MHz Memory / 1.113Vcore (as tested)
- OS: Windows 7 64 bit Professional
- PSU: Corsair TX850W (CMPSU-850TX) ( load as tested is one of three values: 235W (28% load / 83% efficient) , 450W (53% load / 84% efficient), and 570W (67% load / 83% efficient) )
- UPS: APC BX1500G 1500VA / 865W (87% efficiency at full load 865W / 86% efficiency at half load 432.5 W / unknown efficiency at 235 W, assumed to be >80% )
With standard Windows 7 64 Professional background processes running, Open Hardware Monitor, evga precision for software controlled GPU fan speeds, and NVIDIA Inspector for multi-display power saver and overclock settings running as well, I tested the following clients:
- Windows XP/2003/Vista/2008/7 SMP2 client console version 6.34 (32 bit) running -SMP 4 / -verbosity 9 / -advmethods / bigpackets / checkpoint=30 / nocpulock=1
- Windows XP/2003/Vista/7 GPU3 (required for Fermi) no-nonsense console client version 6.41 (32 bit) -advmethods / bigpackets / -verbosity 9 / priority=96 (low) / checkpoint=30 / nocpulock=1
From that, I determined that my average PPD on SMP was 26,000 and my average PPD on Fermi was 20,000. This is over 323 WUs between both clients in the last two months.
My PPD/MHz on SMP was 5.5 and my PPD/MHz on GPU was 11.
However, I measured the total power drawn at the UPS at the following values:
- 235 W total power use for SMP and no other clients.
- 450 W total power use for Fermi and no other clients.
- 570 W total power use values with both SMP and Fermi client combined.
I then applied the following simple formulas (which are more unit conversion than anything else):
( Total Power (W) / (1000 W/KW) ) * ( 24 hr / day ) * ( 365.25 day / yr ) * ( 1 yr / 12 mo ) = Total KWh / month
( Total KWh / month ) * ( utility cost / KWh ) = Monthly Cost ( cost / mo )
( cost / month ) * ( 12 mo / yr ) = Annual Cost ( cost / yr )
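For anyone who wants to plug their own wattage and rates into the formulas above, here is a minimal Python sketch of the same unit conversions (the 235 W and $0.0983/KWh inputs in the example are just the values used later in this post):

```python
# Convert a steady power draw into monthly/annual energy use and cost.
# The watts and usd_per_kwh arguments are placeholders for your own values.

HOURS_PER_YEAR = 24 * 365.25  # average year, including leap years

def monthly_kwh(watts):
    """Total KWh per month for a constant load."""
    return (watts / 1000.0) * HOURS_PER_YEAR / 12

def monthly_cost(watts, usd_per_kwh):
    return monthly_kwh(watts) * usd_per_kwh

def annual_cost(watts, usd_per_kwh):
    return monthly_cost(watts, usd_per_kwh) * 12

# Example: a 235 W SMP-only load at the 2010 US average of $0.0983/KWh
print(round(monthly_kwh(235), 2))          # ~171.67 KWh/month
print(round(annual_cost(235, 0.0983), 2))  # ~$202.50/year
```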
I then verified this at my utility meter, which measures kilowatt-hours (KWh), over an hour of operation with no cyclic loads, and compared it to the same no-cyclic-load condition with no clients running. My theoretical values based on the UPS readings were accurate at the utility meter to within +/- 2%.
Conclusion:
- SMP client in my configuration, with my system, and at my overclock, is 110.64 PPD/W
- Fermi client, in my configuration, with my system, and at my overclock is 44.46 PPD/W
- Both clients running simultaneously in my configurations, with my system, and at my overclocks are 80.70 PPD/W
The SMP client yields 30% more PPD at 47.8% less power use, making it 2.49 times as efficient as the Fermi client in terms of PPD/W.
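For transparency, the efficiency figures above can be reproduced directly from the measured PPD and wall-power values with a quick Python check:

```python
# Recompute the PPD/W and relative-efficiency figures from the measured
# average PPD and UPS wall-power values quoted in this post.

smp_ppd, smp_watts = 26_000, 235
gpu_ppd, gpu_watts = 20_000, 450

smp_eff = smp_ppd / smp_watts  # ~110.6 PPD/W
gpu_eff = gpu_ppd / gpu_watts  # ~44.4 PPD/W

print(f"PPD advantage:    {smp_ppd / gpu_ppd - 1:+.0%}")    # +30%
print(f"Power savings:    {1 - smp_watts / gpu_watts:.1%}")  # 47.8%
print(f"Efficiency ratio: {smp_eff / gpu_eff:.2f}x")         # 2.49x
```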
- SMP client in my configuration, with my system, and at my overclock, is 2060 KWh / year or 171.6675 KWh / month
- Fermi client, in my configuration, with my system, and at my overclock is 3944.7 KWh / year or 328.725 KWh / month
- Both clients running simultaneously in my configurations, with my system, and at my overclocks are 4996.62 KWh / year or 416.385 KWh / month
US average electrical cost per KWh is 9.83 cents / KWh (nationwide household average 2010 U.S. Energy Information Administration)
My total costs, if I paid the US national average for power, would be as follows: (rounded to the penny)
- SMP client in my configuration, with my system, and at my overclock, is $202.50 / year or $16.88 / month
- Fermi client, in my configuration, with my system, and at my overclock is $387.76 / year or $32.31 / month
- Both clients running simultaneously in my configurations, with my system, and at my overclocks are $491.17 / year or $40.93 / month
Furthermore, in the U.S. most electrical companies have a tiered price per KWh. They'll have something similar to the following:
0-500 KWh / month = $0.075 / KWh
501-1300 KWh / month = $0.0883 / KWh
1301+ KWh / month = $0.1102 / KWh
(The above values are simply examples, not my costs or rates, and prices vary by region and by the type of power used to supply the region; e.g., power in Hawaii is considerably more costly than power in Idaho.)
So if you're already using around 1100 KWh / month (based on an annual average) and you tack on a system like mine at 416 KWh / month, the first 200 KWh of your SMP/Fermi load will be billed at around 9 cents / KWh and the remaining 216 KWh at around 11 cents / KWh. (Using the table above and those hypothetical values, folding on my system under those utility parameters would cost an extra $41.46 / month, or $497.52 / year.) (Rather than folding, I could have bought a second GTX 580 for SLI or paid my car insurance in full...)
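That marginal-cost arithmetic can be generalized. This Python sketch uses the hypothetical tier table above (not anyone's real rate schedule) and computes what an added folding load costs at the margin:

```python
# Marginal monthly cost of adding a folding load under a tiered rate.
# The tiers below are the hypothetical example rates from the table
# above, not any real utility's schedule.

TIERS = [  # (upper bound of tier in KWh/month, $ per KWh)
    (500, 0.075),
    (1300, 0.0883),
    (float("inf"), 0.1102),
]

def bill(total_kwh):
    """Monthly bill for total_kwh under the tiered schedule."""
    cost, lower = 0.0, 0.0
    for upper, rate in TIERS:
        if total_kwh > lower:
            cost += (min(total_kwh, upper) - lower) * rate
        lower = upper
    return cost

# Baseline household at 1100 KWh/month, plus a 416 KWh/month folding rig:
extra = bill(1100 + 416) - bill(1100)
print(round(extra, 2))       # marginal monthly cost of folding (~$41.46)
print(round(extra * 12, 2))  # annualized
```

Note that the marginal cost depends on your baseline usage: the same 416 KWh costs more if it pushes you into a higher tier.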
Now these dollar amounts are not what I pay, and I'm not going to discuss what my actual electricity rates are or what my actual utility bills are, but I am posting actual power statistics of my system in a 24/7 real-world folding use state.
Electrical costs add up very fast, and unless you're a college student with free electrical power or in a unique rent situation where your electricity is covered, the costs of folding are higher than you might expect.
To make matters worse, if you are in a hot climate and you use your air-conditioning, then folding will heat up your PC room, causing your AC system to work harder to maintain a constant comfortable temperature, and the hidden costs of additional cooling are not something that I have even considered. (Of course in a cold climate, folding acts like a space heater and you might reduce your heater run-time to keep the house warm, which won't reduce your power bill, but at least will offset it some.)
