Summary
There seems to be a great deal of misconception, exaggeration, and other truth-bending ideas about water cooling multiple GPUs, so I will attempt to clarify all that I can to help anyone who is interested.
One Loop vs Multi-Loop
To start, let's clear up the reasons for doing a single or multi-loop setup in terms of performance (ignoring aesthetics).
Single Loop
The main reason you would want to run a single loop is that it uses your radiator space more efficiently: all heat and all dissipation are within the same loop, so all of your water cooling components are utilized 100% of the time. An easy way to think of this is if you have two 480 radiators, one CPU, and one GPU: in a single loop, if the CPU is not under load then the GPU receives 120x8 worth of dissipation (eight 120mm fan positions) instead of 120x4; similarly, if the GPU is not under load then the CPU receives 120x8 worth of dissipation instead of 120x4. Another reason is to save money and space on extra water cooling components; each loop requires its own pump, reservoir, and tubing.
Multi-Loop
The only case (outside aesthetics) where a multi-loop setup is preferable is when you want to isolate the CPU and GPU heat in order to run each loop at a different air/water delta (temperature difference). The reason to do this would be to maintain the lowest possible water temperature for the CPU (for overclocking) independent of GPU load. CPUs are significantly more prone to reaching the thermal limits of overclocking than GPUs, so you can run a GPU loop with a higher air/water delta without diminishing your overclocking ability.
Water Pressure, Flow Rate, and Multiple Pumps
The more water blocks you add to a water cooling loop, the more restrictive that loop becomes and the more pressure it requires in order to maintain the same flow rate. The faster your water is moving, the more restrictive each block becomes (the relationship is usually not linear). Based on a great deal of data gathered by many sources, the accepted optimal flow rate for water cooling loops is between 0.9gpm and 1.4gpm, depending on the block/radiator models.
How slow is too slow?
While the actual flow rate that counts as "too slow" is determined by the amount of heat being transferred and the design of the water block, the too-slow mark begins somewhere around 0.3gpm to 0.5gpm (depending on model). If your flow rate falls below this, then you will begin to see significant increases in the temperature of the water between components (in excess of 4C per block for average heat transfer). Remember that as your flow rate increases, the same amount of heat is spread across a greater volume of water, so the water warms up less as it passes through each block.
How many pumps do I need?
Adding additional pumps in series will add that pump's pressure to the pressure already exerted by your existing pump(s). It is ideal to use the same model of pump or more specifically a pump with a similar PQ curve (pressure vs flow), but not required. You must pay attention to the PQ curve of a pump though because adding a pump that causes the flow rate to exceed the maximum flow rate of your existing pump will cause your existing pump to actually cause restriction instead of adding any pressure. The exact number of pumps you need depends on your flow design and the PQ curves of all water cooling components within your loop versus the PQ curve of the pump.
What is the difference between running GPUs in parallel or in series?
To set up GPUs in parallel means that you split the water flow, send one branch to each GPU, then converge the branches back into one. In series, the water flows through each GPU in succession. There are two key aspects of running in parallel that you must be aware of: the first is that your restriction will be lowered; the second is that the flow rate will be divided. The amount by which the restriction and flow rate are lowered depends on the number of GPU blocks and the setup that they use. In series, the GPU restrictions are simply added together.
Multi-GPU Loop Designs
In this section, I'll comment on the most common multi-GPU setups (2, 3, and 4) and address when it is best to use series and when it is best to use parallel.
Two GPUs
With two cards, putting the blocks in series will yield a restriction of 2*R (two times the restriction of a single card) and the flow rate through each will be 1*F (the same flow rate as the overall loop). With two cards in parallel, the restriction will be R/2 (half the restriction of one card) and the flow rate through each will be F/2 (half the overall flow rate). This means that putting cards in parallel is 1/4 the restriction of putting them in series, with each card having half the flow rate.
Noting the above minimum flow rate of 0.5gpm, you would not want to put the cards in parallel unless the overall flow rate would be about 1gpm (or more) for most blocks. There are some blocks (generally high restriction) that perform well even at lower flow rates; above, we put their minimum flow rate at around 0.3gpm, which makes the minimum overall flow rate for parallel 0.6gpm. Once we've established that our overall flow rate will exceed this, we have the choice between series or parallel (otherwise, series is the only viable option).
Now, if our blocks are low-restriction (let's say 0.8psi @ 1gpm (XSPC)) then it would be silly to cut our flow rate in half just to gain 1.2psi (1.6psi vs 0.4psi); but if our blocks are high-restriction (let's say 2psi @ 1gpm (Hydrocopper)) then the benefit of gaining back 3psi (4psi vs 1psi) becomes more obvious. To give you some perspective: the D5 puts out about 2.8psi @ 1gpm (vario setting 4; D5 B on max); that means your overall flow rate would be less than 1gpm with the Hydrocoppers in series and significantly more than 1gpm with them in parallel. In the case of the low-restriction blocks with the same D5, in series the overall flow would still exceed 1gpm, and if put in parallel the overall flow rate would likely not exceed 2gpm, so you would have less than 1gpm through each block.
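The two-card arithmetic above can be sketched in a few lines of Python. This is only an illustration under the same simplification the text uses (GPU restriction only, figures taken at 1gpm); the function and variable names are my own, and the block and pump values are just the example numbers quoted above.

```python
PUMP_PSI_AT_1GPM = 2.8  # D5 vario setting 4 / D5 B, example figure from the text

# Example block restrictions in psi @ 1gpm, as quoted in the text
blocks = {"low (XSPC)": 0.8, "high (Hydrocopper)": 2.0}


def two_card_drops(r):
    """Return (series, parallel) GPU-only pressure drop in psi at 1gpm overall flow."""
    return 2 * r, r / 2


for name, r in blocks.items():
    in_series, in_parallel = two_card_drops(r)
    saved = in_series - in_parallel
    # Compare the GPU-only series drop with what the example pump offers at 1gpm
    verdict = "within" if in_series <= PUMP_PSI_AT_1GPM else "beyond"
    print(
        f"{name}: series {in_series:.1f}psi, parallel {in_parallel:.1f}psi, "
        f"parallel saves {saved:.1f}psi; series is {verdict} the pump's "
        f"{PUMP_PSI_AT_1GPM}psi @ 1gpm"
    )
```

For the Hydrocopper example this reproduces the 4psi vs 1psi comparison, and for the XSPC example the 1.6psi vs 0.4psi comparison.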
Recommended:
Low-Restriction Blocks (0.5-1psi @ 1gpm): Series
Med-Restriction Blocks (1-1.5psi @ 1gpm): Parallel
High-Restriction Blocks (1.5psi+ @ 1gpm): Parallel
Three GPUs
You have three choices of loop design here: all series, all parallel, or one in series and two in parallel. Having all three in series will now begin to add up to somewhat significant restriction. In the case of very low-restriction blocks, you're still probably fine: as with the above example, you would be at 2.4psi drop; but with a medium restriction (say 1.2psi @ 1gpm), you'd be sitting at a 3.6psi drop, which is too high for that single pump if we want to maintain at least 1gpm overall flow rate.
Having two in parallel and one in series will yield a total restriction of 1.5*R with one card having full flow and the other two having half. Our low restriction blocks would now only have a 1.2psi drop, but having them all in series wasn't too bad and allowed all three cards to have full flow; you can easily go either way on this since both ways will still remain above the minimum. In the case of our medium restriction blocks, that would make an overall 1.8psi drop. That is very comfortable, and since our overall flow rate remains above 1gpm, we know that each of the two GPUs in parallel is still getting more than the minimum flow rate. For our high restriction blocks, we are looking at 3psi, which is doable, but the overall flow will fall below 1gpm.
Putting all three cards in parallel requires us to consider other variables. Previously, I mentioned that some blocks are designed/optimized for low flow rates and perform to a satisfactory level down to about 0.3gpm, whereas with other blocks (the norm) you generally wouldn't want to fall below 0.5gpm. Having all three cards in parallel cuts the flow rate through each card to 1/3 of the overall flow. That means if we have an overall flow rate of 1gpm, each card receives only 0.33gpm. The overall restriction of the cards is all but non-existent (1/3*R, or 0.67psi for our high restriction block), but given an average card that requires 0.5gpm as a minimum, we would need an overall loop flow rate of 1.5gpm or better to keep each GPU at or above the minimum flow rate.
Recommended:
Low-Restriction Blocks (0.5-1psi @ 1gpm): 3 Series or 1 Series + 2 Parallel
Med-Restriction Blocks (1-1.5psi @ 1gpm): 1 Series + 2 Parallel
High-Restriction Blocks (1.5psi+ @ 1gpm): 1 Series + 2 Parallel or 3 Parallel (for low-flow optimized)
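To compare the three-card layouts side by side, here is a small sketch that tabulates the GPU-only restriction and the lowest per-card flow for each one. It assumes identical blocks and the near-linear restriction model used throughout this guide; the function name and the 1gpm overall flow are my own choices for illustration.

```python
def three_card_layouts(r, overall_flow_gpm=1.0):
    """Map layout name -> (GPU-only restriction in psi @ 1gpm, lowest per-card flow in gpm).

    Assumes three identical blocks, each with restriction r (psi @ 1gpm).
    """
    return {
        "3 series": (3 * r, overall_flow_gpm),
        "1 series + 2 parallel": (r + r / 2, overall_flow_gpm / 2),
        "3 parallel": (r / 3, overall_flow_gpm / 3),
    }


# Example restrictions used in the text (psi @ 1gpm)
for label, r in (("low", 0.8), ("medium", 1.2), ("high", 2.0)):
    print(f"{label} restriction blocks ({r}psi @ 1gpm):")
    for layout, (drop, flow) in three_card_layouts(r).items():
        print(f"  {layout}: {drop:.2f}psi, lowest card flow {flow:.2f}gpm")
```

The per-card flow column is what to check against the 0.5gpm (or 0.3gpm for low-flow optimized blocks) minimum, and the restriction column is what to check against your pump.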
Four GPUs
While there are several ways that this can be done, I am only going to comment on all series, all parallel, and 2x2 series/parallel. In all series, with the low restriction blocks, you are looking at a total 3.2psi drop. That is more than our D5 on setting 4 (or D5 B) can supply while maintaining 1gpm overall, but as long as we use a very low restriction CPU block and radiator and don't go overboard with fittings, we should still be above the minimum flow rate, albeit below our overall 1gpm target.
For the 2x2 series/parallel, let me first explain the setup. There are two loop designs that utilize this. In the first, each pair of cards is plumbed in parallel and the two pairs are connected by a single tube, so the pairs are in series with each other. In the second, each pair of cards is plumbed in series and the two pairs are then put in parallel (I'll post an example so it's easier to visualize). In this setup, our low restriction blocks come out with a total 0.8psi drop, medium at a total 1.2psi, and high at a total 2psi, with all four cards having half the overall flow rate. You probably noticed the pattern there, and you'd be correct: having four GPUs in 2x2 parallel/series will yield a total restriction equal to that of a single GPU. This means that regardless of whether you have low, medium, or high restriction blocks, you'll be able to meet or exceed the minimum flow rate per block as well as a 1gpm+ overall flow rate.
For having all four cards in parallel I'll just say: don't do it. In order to have all four cards in parallel and maintain a minimum flow rate per card, your overall flow rate in your loop would have to be 2gpm+. Unless you are running an Iwaki pump or have multiple pumps, you simply won't be able to have a total loop restriction low enough to allow that high of a flow rate.
Recommended:
Low-Restriction Blocks (0.5-1psi @ 1gpm): 2x2 series/parallel
Med-Restriction Blocks (1-1.5psi @ 1gpm): 2x2 series/parallel
High-Restriction Blocks (1.5psi+ @ 1gpm): 2x2 series/parallel
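A quick way to convince yourself that both 2x2 designs come out equal to a single block is to compose the series and parallel rules. The sketch below does that, and also computes the overall flow needed for an all-parallel layout. It assumes four identical blocks and the near-linear model; the helper names are hypothetical.

```python
def series(*restrictions):
    """Restrictions in series add up."""
    return sum(restrictions)


def parallel(*restrictions):
    """Restrictions in parallel combine as reciprocals."""
    return 1.0 / sum(1.0 / r for r in restrictions)


def two_by_two_variants(r):
    """Return the GPU-only restriction of both 2x2 designs for identical blocks."""
    parallel_pairs_in_series = series(parallel(r, r), parallel(r, r))
    series_pairs_in_parallel = parallel(series(r, r), series(r, r))
    return parallel_pairs_in_series, series_pairs_in_parallel


def min_overall_flow(branches, min_per_card_gpm=0.5):
    """Overall flow needed so each of `branches` equal parallel branches gets the minimum."""
    return branches * min_per_card_gpm


for r in (0.8, 1.2, 2.0):
    a, b = two_by_two_variants(r)
    print(f"R = {r}psi: parallel pairs in series {a:.2f}psi, series pairs in parallel {b:.2f}psi")

print("2x2 needs at least", min_overall_flow(2), "gpm overall")
print("4 parallel needs at least", min_overall_flow(4), "gpm overall")
```

Both variants return the single-block restriction, while the all-parallel layout needs twice the overall flow of the 2x2 to keep every card at 0.5gpm.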
Post Notes
I wanted to make sure to comment on the remaining non-GPU restrictions. Anything in your loop causes restriction: fittings, radiators, tubing, blocks, etc. In the descriptions above I only focused on the GPU restriction, but you must remember to also factor in other restrictions when wanting to stay at or above an overall flow rate of 1gpm. Notice my comment about having three low-restriction GPUs in series: at a total 2.4psi drop and a pump that offers 2.8psi, that leaves you with only 0.4psi for the rest of the loop if you want to maintain 1gpm, which is not even enough for most low-restriction CPU blocks.
I mention several times a 'low-flow optimized block', but didn't really describe it. You will have to research reviews about block performance versus flow rate in order to pick out which blocks I am specifically talking about, but as an example, Aquacomputer has a tendency to make blocks that are high-restriction and low-flow optimized.
Customizing
I know well enough that not everyone will run a D5 vario on setting 4 or use a D5 B pump, so in this section, I'll try to quantify how to go about coming up with your own numbers for your own loop designs.
Pumps
I will list a few valid data points about a couple of well known pumps, but you will want to find the PQ curve data on your own pump to get the right numbers.
D5 Set 4/B @ 1gpm = 2.8psi
D5 Set 5 @ 1gpm = 4.6psi
D5 Strong 24v @ 1gpm = 7.1psi
MCP35X 40% @ 1gpm = 3.5psi
MCP35X 50% @ 1gpm = 5.7psi
MCP35X 100% @ 1gpm = 6.4psi
Iwaki RD30 12v @ 1gpm = 4.4psi
Iwaki RD30 18v @ 1gpm = 8.7psi
Iwaki RD30 24v @ 1gpm = 13.7psi
Aquastream XT Ultra @ 1gpm = 3psi
XSPC X20 @ 1gpm = 1.8psi
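As a rough way to use these data points, the sketch below checks which of the listed pumps offer enough pressure at 1gpm for a given total loop drop. The pump figures are the ones listed above; the 1.5psi "rest of loop" value is a made-up placeholder, not a measurement, and the function name is my own. Treat it as a first-pass filter, not a substitute for reading the full PQ curve.

```python
# Pump pressure at 1gpm in psi, copied from the list above
PUMP_PSI_AT_1GPM = {
    "D5 Set 4/B": 2.8,
    "D5 Set 5": 4.6,
    "D5 Strong 24v": 7.1,
    "MCP35X 40%": 3.5,
    "MCP35X 50%": 5.7,
    "MCP35X 100%": 6.4,
    "Iwaki RD30 12v": 4.4,
    "Iwaki RD30 18v": 8.7,
    "Iwaki RD30 24v": 13.7,
    "Aquastream XT Ultra": 3.0,
    "XSPC X20": 1.8,
}


def pumps_holding_1gpm(loop_drop_psi):
    """Names of pumps whose pressure at 1gpm meets or exceeds the loop's drop at 1gpm."""
    return sorted(
        name for name, psi in PUMP_PSI_AT_1GPM.items() if psi >= loop_drop_psi
    )


gpu_drop = 2.4      # three low-restriction (0.8psi) blocks in series, from the text
rest_of_loop = 1.5  # HYPOTHETICAL: CPU block, radiators, fittings; measure your own
print(pumps_holding_1gpm(gpu_drop + rest_of_loop))
```

Swap in your own GPU drop and a realistic figure for the rest of your loop to see which pump settings keep you at or above 1gpm.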
Restriction
Above, I mentioned that the restriction of a block is not linear to the flow rate; while this is true, it is almost linear. Since it's almost linear, we can use a normal resistance formula to estimate the total restriction of multiple blocks. That is: R1 + R2 = Rt for series and 1/R1 + 1/R2 = 1/Rt for parallel. From the above example, we would have 1/2 + 1/2 = 1/Rt; Rt = 1 for two high restriction blocks in parallel. Using this formula, you can even figure out the total restriction of dissimilar blocks, but I would highly recommend that you not put blocks with dissimilar restriction in parallel.
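Here is a minimal sketch of those two formulas, plus an estimate of how the flow divides between dissimilar blocks in parallel. The flow split assumes the pressure drop is exactly proportional to flow, which is only the near-linear approximation described above, so take the numbers as rough guidance. The function names are my own.

```python
def series(*restrictions):
    """Rt = R1 + R2 + ... (psi @ 1gpm)."""
    return sum(restrictions)


def parallel(*restrictions):
    """1/Rt = 1/R1 + 1/R2 + ... (psi @ 1gpm)."""
    return 1.0 / sum(1.0 / r for r in restrictions)


def parallel_flow_split(overall_flow_gpm, *restrictions):
    """Estimated flow through each parallel branch.

    Every branch sees the same pressure drop, so under the linear
    assumption flow divides in inverse proportion to restriction.
    """
    total = parallel(*restrictions)
    return [overall_flow_gpm * total / r for r in restrictions]


print(parallel(2.0, 2.0))                  # two high-restriction blocks
print(parallel(0.8, 2.0))                  # dissimilar blocks
print(parallel_flow_split(1.0, 0.8, 2.0))  # how 1gpm divides between them
```

With the example values, roughly 0.71gpm goes through the 0.8psi block and only about 0.29gpm through the 2psi block, which shows why mixing dissimilar blocks in parallel can starve the more restrictive one.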
Flow Layout
There are many different ways you can lay out the flow of water through your components, and each will have advantages and disadvantages. For instance, if you have 3 GPUs, instead of using any of the above 3 layouts, you could add a valve to simulate the restriction of a 4th card and instead use the 2x2 series/parallel. The disadvantage to doing that is that you would need to figure out exactly how far open the valve would need to be to simulate the same restriction.
You should also be mindful when putting GPUs in parallel that you maintain equal pressure. When using a bridge, most people have bottom in and top out or top in and bottom out, but some have top in and top out or bottom in and bottom out. When just talking about two cards, the difference should not be enough to worry about and will only slightly affect the amount of flow through the cards (the one closest to the in and out will have greater flow). When you're talking about 3 or 4 cards in parallel, the issue compounds and I would not suggest having both in and out on the same side (top or bottom).
To understand this, we must realize that water flows because the pressure is higher at one point than at another. In a top-in, bottom-out scenario, the top card has the highest input pressure and the bottom card has less, by an amount depending on the restriction of the card and the pressure at the out port. In the same scenario, the top card has a higher outgoing pressure than the bottom card, so the pressure difference between in and out is the same for both cards. In the top/top or bottom/bottom scenario, the first card has either the highest in and lowest out or the lowest in and highest out, so the pressure difference between in and out is not the same for the two cards.
In most cases, people who have 3+ cards also invest in having more pump pressure than is necessary, so the effects are not noticed because all cards still have enough flow.
With regard to the loop order for GPUs and CPU, the key thing to keep in mind is that at 1gpm it takes about 250 watts (a typical high-end video card) to increase the temperature of the water by 1C. That means that as long as your flow rate meets or exceeds 1gpm, the order doesn't matter for one or two cards. If, however, your flow rate is lower or you have more than two GPUs, then I would recommend putting the CPU before the GPUs because the temperature rise of the water will begin to be noticeable, especially if you have overclocked your CPU to near Tmax.
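The 250-watt rule of thumb can be checked with the standard heat-capacity relation (temperature rise = power / (mass flow x specific heat)). The sketch below assumes plain water, US gallons, and approximate room-temperature properties; those constants and the function name are my assumptions, not figures from this guide.

```python
LITERS_PER_US_GALLON = 3.785        # assumes "gpm" means US gallons per minute
WATER_DENSITY_KG_PER_L = 1.0        # approximate, near room temperature
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate, plain water


def water_temp_rise_c(heat_watts, flow_gpm):
    """Temperature rise of the water (in C) after absorbing heat_watts at flow_gpm."""
    mass_flow_kg_per_s = flow_gpm * LITERS_PER_US_GALLON * WATER_DENSITY_KG_PER_L / 60.0
    return heat_watts / (mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KG_K)


# One, two, and three 250W cards at a few overall flow rates
for flow in (1.0, 0.5, 0.3):
    for cards in (1, 2, 3):
        rise = water_temp_rise_c(250.0 * cards, flow)
        print(f"{cards} x 250W at {flow}gpm: water warms by about {rise:.1f}C")
```

Under these assumptions a single 250W card at 1gpm warms the water by a little under 1C, which agrees with the rule of thumb, and the rise scales up as flow drops or cards are added.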
Odds and Ends
Maintenance
Although it is always important to maintain the cleanliness of your loop, it becomes especially important if you use a parallel setup. If you have gunk build up in one of your GPUs, the overall flow rate and therefore CPU cooling may not be affected, but it will cause less water to flow through the gunked up GPU block.
Dissimilar Restriction
Don't put two different water blocks with different PQ curves in parallel unless they are very close; otherwise you will have a great deal more water flowing through one instead of the other.
Crummy Pumps
If your pump is low end to begin with, be very careful about using parallel at all as it will likely cause the flow rate through each video card to reach unacceptably low levels.
Diagrams
*Did I forget anything or get something wrong? Let me know.