

GWAMM · 2,015 Posts · Discussion Starter · #1
I think we've all heard this one before: the common misconception that when you crossfire cards of differing performance levels, you get the same results as you would with a dual setup of the lower-end card, i.e. 5770+5750 = 5750+5750.

Far Cry 2, 1920x1200, FSAA 4x + AF 16x (benchmark chart in the linked article):

XBit Labs: Unusual Tandem: Asus EAH5770/2DIS/1GD5 and PowerColor PCS HD5750 1GB GDDR5 Premium Edition

I know that this article is almost 2 months old now, but this isn't the news section, and anyway I couldn't find it posted here on OCN.
 

brettjv · Premium Member · 14,038 Posts
Very interesting stuff.

For a long time I was a proponent of exactly the idea this article tests: that you should basically see performance in between the two cards when you mix/match them like this. I reasoned that with alternate frame rendering, each card would just do its thing as fast as possible, so if one card was faster than the other, the average would end up higher overall than what you'd see with two of the slower cards.

Then someone here (I forget who) posted benchies they'd personally run with two matching cards, where they overclocked only one of the two, and they saw no perf gain from doing so.

Therefore, I (apparently incorrectly) extrapolated from those results that when one card was faster than the other (for whatever reason), the driver would correct for this in order to maintain an even milliseconds-per-frame across the two cards. From the posted results, it looks like what I thought must be happening ... doesn't.

I tell you what, though: if the driver doesn't do that ... I'd not be the least bit surprised if the 5750/5770 combo has some microstutter associated with it that you'd not see if the cards were matched. After all, variation in milliseconds per frame is exactly what microstuttering is, and there must be such variation here, because the two cards in the combination don't run at the same speed.
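To picture what that would look like, here's a quick toy model in Python (frame times are made up, and it ignores in-order frame presentation, so treat it as a sketch only) of two AFR cards delivering frames at different rates:

Code:

# Toy model of AFR frame delivery with mismatched cards. Numbers are made
# up, and in-order frame presentation is ignored; this only illustrates why
# unequal per-card render times show up as uneven frame-to-frame gaps.
fast_ms = 12.0   # assumed per-frame render time of the faster card
slow_ms = 14.5   # assumed per-frame render time of the slower card

# Completion times of each card's frames, rendered back to back.
fast_frames = [fast_ms * i for i in range(1, 9)]
slow_frames = [slow_ms * i for i in range(1, 9)]

# The display sees the two streams interleaved in completion order.
timeline = sorted(fast_frames + slow_frames)
gaps = [round(b - a, 1) for a, b in zip(timeline, timeline[1:])]
print(gaps)  # -> [2.5, 9.5, 5.0, 7.0, ...] anywhere from 0.5 ms to 14.5 ms
# The *average* frame time looks fine; it's the swing between consecutive
# gaps that you perceive as microstutter.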

Honestly, the biggest thing I take from this is that 5750s in tandem are a very credible setup for what they cost. They might well beat out the 5770s in terms of bang/buck.
 

Registered · 2,097 Posts
^ They actually do beat 5770s in bang for the buck. I benched mine against my brother's 5770, and the 5750 won. Not by much, but it beat it in the $/frame category at all resolutions.

I do have to say, I think there used to be some slowing-down in crossfire configurations (well, not downclocking exactly, but one card waiting for the other), because my 4870 X2 in quadfire with my 4850 X2 barely beat the 4870 X2 alone.
 

Ihatethedukes · Premium Member · 5,390 Posts
Quote:

Originally Posted by brettjv

Very interesting stuff.

For a long time I was a proponent of exactly the idea this article tests: that you should basically see performance in between the two cards when you mix/match them like this. I reasoned that with alternate frame rendering, each card would just do its thing as fast as possible, so if one card was faster than the other, the average would end up higher overall than what you'd see with two of the slower cards.

Then someone here (I forget who) posted benchies they'd personally run with two matching cards, where they overclocked only one of the two, and they saw no perf gain from doing so.

Therefore, I (apparently incorrectly) extrapolated from those results that when one card was faster than the other (for whatever reason), the driver would correct for this in order to maintain an even milliseconds-per-frame across the two cards. From the posted results, it looks like what I thought must be happening ... doesn't.

I tell you what, though: if the driver doesn't do that ... I'd not be the least bit surprised if the 5750/5770 combo has some microstutter associated with it that you'd not see if the cards were matched. After all, variation in milliseconds per frame is exactly what microstuttering is, and there must be such variation here, because the two cards in the combination don't run at the same speed.

Honestly, the biggest thing I take from this is that 5750s in tandem are a very credible setup for what they cost. They might well beat out the 5770s in terms of bang/buck.

To be honest, with the 5 series I'm thinking that most results showing crossfire scaling less than expected are CPU bottlenecks, whether partial or whole.
 

brettjv · Premium Member · 14,038 Posts
Quote:

Originally Posted by Ihatethedukes
To be honest, with the 5 series I'm thinking that most results showing crossfire scaling less than expected are CPU bottlenecks, whether partial or whole.
It's a valid point in some cases, for sure, but it can be compensated for by making the test graphically harder to run. Using Extreme mode in Vantage rather than Performance mode, for example.

Also, I'm talking about a case where you run, say, 2x5850's three times with three different clocks, say:

A) 725/1000 + 725/1000
B) 725/1000 + 850/1200
C) 850/1200 + 850/1200

That's more or less how the test case I saw on OCN had been done, and basically tests A and B were almost identical, while test C showed the expected large gain.

My take is that these results imply there isn't a large CPU bottleneck; otherwise case C wouldn't be any better than cases A and B. A CPU bottleneck might affect how MUCH larger C is vs. A and B, but I don't really see how one could account for the results as a whole just from a CPU bottleneck.
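Just to make that logic concrete, here's the arithmetic behind the two competing theories in Python. The clocks in A/B/C are the ones above, but the single-card fps values are purely hypothetical:

Code:

# Two competing models for combined AFR throughput (hypothetical fps values).
slow_fps, fast_fps = 60.0, 70.0   # assumed: 725/1000 card vs 850/1200 card

def synced(a, b):        # theory 1: driver paces both cards to the slower one
    return 2 * min(a, b)

def independent(a, b):   # theory 2: each card just renders as fast as it can
    return a + b

for label, pair in [("A", (slow_fps, slow_fps)),
                    ("B", (slow_fps, fast_fps)),
                    ("C", (fast_fps, fast_fps))]:
    print(label, synced(*pair), independent(*pair))
# A 120.0 120.0
# B 120.0 130.0   <- "synced" predicts A == B, matching the OCN test;
# C 140.0 140.0      "independent" puts B in between, matching the review.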

It may well be that two identical cards with different clocks are treated differently than two totally different cards. As it stands now, that would have to be my assessment. But ... maybe I need to run some tests of my own with different clocks on each card, just to see what happens. And NOT using Crysis.


Going back to the results in the review, it's also interesting that in the ONE case of Crysis, the 5770+5750 really isn't any better than 2x5750. Obviously this means it's possible for the application itself to somehow come into play in this whole equation, which I personally find very interesting.
 

MrDeodorant · Premium Member · 5,037 Posts
Quote:

Originally Posted by brettjv
Then someone here (I forget who) posted benchies they'd personally run with two matching cards, where they overclocked only one of the two, and they saw no perf gain from doing so.

Therefore, I (apparently incorrectly) extrapolated from those results that when one card was faster than the other (for whatever reason), the driver would correct for this in order to maintain an even milliseconds-per-frame across the two cards. From the posted results, it looks like what I thought must be happening ... doesn't.
I had understood that they matched to the lower of the two clock speeds. Given that premise, I would first expect the overclocked card to be downclocked back to the lower speed. Second, I would expect that, at a given clock speed, two cards with 1000 shader cores each would perform worse than one card with 1000 cores plus one with 1200.

Where it would get weird would be CrossFiring a card with 1000 cores running at 800 MHz and a card with 1200 cores at 600 MHz.
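For what it's worth, a first-order cores-times-clock estimate (my own simplification; it ignores everything else about the architecture) shows why that case would be the weird one:

Code:

# First-order shader throughput estimate: cores x clock, nothing else.
def rel_throughput(cores, mhz):
    return cores * mhz

print(rel_throughput(1000, 800))  # 800000 -> the 1000-core card at 800 MHz
print(rel_throughput(1200, 600))  # 720000 -> the 1200-core card at 600 MHz
# The card with fewer cores would actually be the faster of the two here.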
 

brettjv · Premium Member · 14,038 Posts
Quote:

Originally Posted by MrDeodorant

I had understood that they matched to the lower of the two clock speeds. Given that premise, I would first expect the overclocked card to be downclocked back to the lower speed. Second, I would expect that, at a given clock speed, two cards with 1000 shader cores each would perform worse than one card with 1000 cores plus one with 1200.

Where it would get weird would be CrossFiring a card with 1000 cores running at 800 MHz and a card with 1200 cores at 600 MHz.

Nothing personal, but I have a couple of issues with your theories, and here is why:

1) If you take two 5850s in crossfire, clock them to different 3D speeds, and then run a 3D app, the GPU-Z and Afterburner graphs will both show the clocks at whatever you set them to, i.e. they do not appear to be the same on both cards.
2) The results from the review clearly suggest that it's in no way 'necessary' for two cards in crossfire to be operating at the same overall speed. Therefore, logically, there would be no cause for the driver to do what you're suggesting, i.e. downclock both cards to the slower of the two ... but only when the cards match.

In summary: if two cards can run 'asynchronously' in crossfire (i.e. it doesn't matter if one is faster than the other, for whatever reason ... architecture, clocks, whatever), which, again, the review above clearly shows is possible ... why bother writing the driver to downclock the faster of two cards, but only when the cards match? And furthermore, why would this downclocking be totally invisible to us, i.e. why does it not show up in the graphs that we normally, day in and day out, rely upon to tell us what our actual clocks are?

Edit: I'm going to have to run tests myself to confirm that there's really no difference in perf when you only OC one card of a matching set (as the OCN'er's test showed), because in light of the results in the review, it *really* doesn't make sense to me that this could be true.

But then, come to think of it, said OCN'er may have been using Crysis for those tests. Maybe that was the issue ...
 

SgtHop · Registered · 4,866 Posts
I've said it before, and I'll say it again: core clocks do not have to be the same for crossfire. That's not how it works. I ran different versions of 4890s at different speeds for months, and it ran fine. The faster card ran at its speed; the slower card ran at its speed. Memory, on the other hand, I do believe has to run at the same speed all across. I never tested it with non-matching memory speeds, though, so I have no solid proof of that.
 

rico2001 · Premium Member · 4,727 Posts
Correct, SgtHop.

Mixed crossfire only matches memory bandwidth; cores and core clocks are independent.

Example: 1000 MHz DDR3, 1000 MHz DDR5, and 1200 MHz DDR5.

DDR3 transfer rate = (memory clock rate) × 4 (bus clock multiplier: 256-bit bus = 64-bit × 4) × 2 (data rate: DDR3 is dual-piped) × 64 (bits transferred) / 8 (bits per byte)

DDR3 at 1000 MHz × 4 × 2 × 64 / 8 = 64.0 GB/s bandwidth

DDR5 transfer rate = (memory clock rate) × 4 (bus clock multiplier: 256-bit bus = 64-bit × 4) × 4 (data rate: DDR5 is quad-piped) × 64 (bits transferred) / 8 (bits per byte)

DDR5 at 1000 MHz × 4 × 4 × 64 / 8 = 128.0 GB/s bandwidth
DDR5 at 1200 MHz × 4 × 4 × 64 / 8 = 153.6 GB/s bandwidth

While the cores operate independently of each other, both cards must run at the same memory bandwidth. The faster memory bandwidth of the two cards will be limited to the bandwidth of the slower one, regardless of whether you are mixing two DDR3 cards, two DDR5 cards, or a DDR3 card with a DDR5 card.
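If you want to check that arithmetic yourself, it boils down to one little function (a Python sketch; it assumes a 256-bit bus, i.e. 4 × 64-bit channels, as in the numbers above):

Code:

# The bandwidth arithmetic above as a function (assumes a 256-bit bus,
# i.e. 4 x 64-bit channels; data_rate is 2 for DDR3, 4 for DDR5).
def bandwidth_gb_s(mem_clock_mhz, data_rate):
    bits_per_clock = 4 * data_rate * 64
    return mem_clock_mhz * 1e6 * bits_per_clock / 8 / 1e9

print(bandwidth_gb_s(1000, 2))  # 64.0  -> DDR3 @ 1000 MHz
print(bandwidth_gb_s(1000, 4))  # 128.0 -> DDR5 @ 1000 MHz
print(bandwidth_gb_s(1200, 4))  # 153.6 -> DDR5 @ 1200 MHz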
 

Registered · 2,097 Posts
^ Which explains my 4850 X2 + 4870 X2 issues.

Thanks rico. +rep for you.
 

Meaker · Registered · 325 Posts
You will get worse microstutter, however, as the render times are quite different.

They do kind of match memory bandwidth, in the sense that the cards MUST keep IDENTICAL copies of the data in memory.
 

brettjv · Premium Member · 14,038 Posts
Quote:

Originally Posted by rico2001

Correct, SgtHop.

Mixed crossfire only matches memory bandwidth; cores and core clocks are independent.

Example: 1000 MHz DDR3, 1000 MHz DDR5, and 1200 MHz DDR5.

DDR3 transfer rate = (memory clock rate) × 4 (bus clock multiplier: 256-bit bus = 64-bit × 4) × 2 (data rate: DDR3 is dual-piped) × 64 (bits transferred) / 8 (bits per byte)

DDR3 at 1000 MHz × 4 × 2 × 64 / 8 = 64.0 GB/s bandwidth

DDR5 transfer rate = (memory clock rate) × 4 (bus clock multiplier: 256-bit bus = 64-bit × 4) × 4 (data rate: DDR5 is quad-piped) × 64 (bits transferred) / 8 (bits per byte)

DDR5 at 1000 MHz × 4 × 4 × 64 / 8 = 128.0 GB/s bandwidth
DDR5 at 1200 MHz × 4 × 4 × 64 / 8 = 153.6 GB/s bandwidth

While the cores operate independently of each other, both cards must run at the same memory bandwidth. The faster memory bandwidth of the two cards will be limited to the bandwidth of the slower one, regardless of whether you are mixing two DDR3 cards, two DDR5 cards, or a DDR3 card with a DDR5 card.

So, Rico, assuming what you postulate is in fact the case: what would be your guess as to how the memory bandwidths are brought into sync with one another on two cards with differing memory bandwidths?

Are you saying the driver does all those calculations, as you just did, and then adjusts the memory clock of the faster of the two downwards? And if so, does this adjustment show up in the AB or GPU-Z graphs, or no? And if no, how can one be certain that said bandwidth adjustment is actually occurring? How would one prove or disprove this theory, since we have no other way of measuring memory bandwidth that doesn't involve multiplying a set of numbers by a clock speed, and if we have no reliable means of deriving clock speed ... I think you see where I'm going with this.


And also, what's your theory as to why it's important that the memory bandwidth (and only that number) is equal, if in fact it's not important for the two cards overall to be operating at the same speed ... something that the scores in the article clearly show to be the case? IOW, what is so special about memory bandwidth here? Why do you pick that particular spec?

I'm not telling you that you're wrong or anything, I really don't know ... but I'm way too inquisitive to just take what someone tells me at face value unless I know how they've derived their conclusions.


Lastly, if I were to guess, I'd bet the driver knows next to nothing about the two cards it's dealing with in terms of specs. All it knows is that the two cards passed their 'series' check (e.g. both of these cards are 5800 series), so it's okay to run them in crossfire. I don't think it makes any kind of adjustment whatsoever to clock speeds or bandwidth or anything. That's just my guess, though.
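If that guess is right, the whole compatibility decision would be no smarter than something like this (a purely speculative Python sketch of the 'series check', not actual driver code; the names and parsing are made up):

Code:

# Purely speculative sketch of the 'series check' guessed at above -- not
# actual AMD driver logic, just the shape of the guess.
def series_of(card_name):
    # hypothetical: "Radeon HD 5770" -> 5700 (the 5700 series)
    model = int(card_name.split()[-1])
    return model // 100 * 100

def crossfire_allowed(card_a, card_b):
    return series_of(card_a) == series_of(card_b)

print(crossfire_allowed("Radeon HD 5770", "Radeon HD 5750"))  # True
print(crossfire_allowed("Radeon HD 5850", "Radeon HD 5770"))  # False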
 

rico2001 · Premium Member · 4,727 Posts
More light reading: AFR, crossfire, tri-fire, quad-fire, microstuttering, etc.


Quote:

Originally Posted by rico2001

Quote:

Originally Posted by xgeko2

The problem with quad-fire doesn't really have to do with the ATI drivers so much as with the games and benchmarks. Most every game can only keep 3 frames at a time in the frame buffer, if I recall this info right. There are a few techniques to help increase utilization of the 4th GPU, but it's never going to be fully utilized until the software can take advantage of a bigger frame buffer. That's what I have heard on this issue; I'm not 100% sure of its accuracy, though.

Absolutely right, xgeko2! rep+

Here is an expanded explanation:

http://techreport.com/articles.x/14284/2

Quote:


Originally Posted by Crossfire X explored (techreport.com)

The multi-GPU scaling challenge
AMD claims development on CrossFire X drivers has taken a year, and that the total effort amounts to twice that of its initial dual-GPU CrossFire development effort. In order to understand why that is, I spoke briefly with Dave Gotwalt, a 3D Architect at AMD responsible for CrossFire X driver development. Gotwalt identified several specific challenges that complicated CrossFire X development.

One of the biggest challenges, of course, is avoiding CPU bottlenecks, long the bane of multi-GPU solutions. Gotwalt offered a basic reminder that it's easier to run into CPU limitations with a multi-GPU setup simply because multi-GPU solutions are faster overall. On top of that, he noted, multi-GPU schemes impose some CPU overhead. As a result, removing CPU bottlenecks sometimes helps more with multi-GPU performance than with one GPU.

In this context, I asked about the opportunities for multithreading the driver in order to take advantage of multiple CPU cores. Surprisingly, Gotwalt said that although AMD's DirectX 9 driver is multithreaded, its DX10 driver is not, neither for a single GPU nor for multiples. Gotwalt explained that multithreading the driver isn't possible in DX10 because the driver must make callbacks through the DX10 runtime to the OS kernel, and those calls must be made through the main thread. Microsoft, he said, apparently felt most DX10 applications would be multithreaded, and they didn't want to create another thread. (What we're finding now, however, noted Gotwalt, is that applications aren't as multithreaded as Microsoft had anticipated.)

With that avenue unavailable to them, AMD had to focus on other areas of potential improvement for mitigating CPU bottlenecks. One of the keys Gotwalt identified is having the driver queue up several command buffers and several frames of data, in order to determine ahead of time what needs to be rendered for the next frame.

Even with such provisions in place, Windows Vista puts limitations on video drivers that sometimes prevent CrossFire X from scaling well. The OS, Gotwalt explained, controls the "flip queue" that holds upcoming frames to be displayed, and by default, the driver can only render as far as three frames ahead of the frame being displayed. Under Vista, both DX9 and DX10 allow the application to adjust this value, so that the driver could get as many as ten frames ahead if the application allowed it. The driver itself, however, has no control over this value. (Gotwalt said Microsoft built this limitation into the OS, interestingly enough, because "a certain graphics vendor, not us" was queuing up many more frames than the apps were accounting for, leading to serious mouse lag. Game developers were complaining, so Microsoft built in a limit.)

For CrossFire X, AMD currently relies solely on a method of GPU load balancing known as alternate frame rendering (AFR), in which each GPU is responsible for rendering a whole frame and frames are distributed to GPUs sequentially. Frame 0 will go to GPU 0, frame 1 to GPU 1, frame 2 to GPU 2, and so on. Because of the three-frame limit on rendering ahead, explained Gotwalt, the fourth GPU in a CrossFire X setup will have no effect in some applications. Gotwalt confirmed that AMD is working on combining split-frame rendering with AFR in order to improve scaling in such applications. He even alluded to another possible technique, but he wasn't willing to talk about it just yet. Those methods will have to wait for a future Catalyst release.
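(Side note, not from the article: here's a toy Python model of that three-frame flip-queue limit, showing why a fourth AFR GPU can end up adding nothing. It assumes each eligible frame keeps its GPU busy for a whole display step, which is a simplification.)

Code:

# Toy model of the flip-queue limit described above: AFR gives frame i to
# GPU (i % n), but only `ahead` frames past the displayed one may render.
def utilization(num_gpus, ahead=3, frames=1000):
    busy = [0] * num_gpus
    for displayed in range(frames):
        for f in range(displayed + 1, displayed + 1 + ahead):
            busy[f % num_gpus] += 1          # this GPU has work this step
    return [round(b / frames, 2) for b in busy]

print(utilization(3))  # [1.0, 1.0, 1.0]          -> tri-fire stays busy
print(utilization(4))  # [0.75, 0.75, 0.75, 0.75] -> only ~3 GPUs' worth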


Quote:


As a result, AMD has taken over management of renaming in its drivers. Doing so isn't a trivial task, Gotwalt pointed out, because one must avoid over-allocating memory. At present, AMD has a constant buffer renaming mechanism in place in Catalyst 8.3, but it involves some amount of manual tweaking, and new applications could potentially cause problems by exhibiting unexpected behavior. However, Gotwalt said AMD has a new, more robust solution coming soon that won't involve so much tweaking, won't easily be broken by new applications, and will apply to any resource that is renamed: not just constant buffers, but vertex buffers, textures, and the like.

The final issue Gotwalt described may be the thorniest one for multi-GPU rendering: the problem of persistent resources. In some cases, an application may produce a result that remains valid across several succeeding frames. Gotwalt's example of such a resource was a shadow map. The GPU renders this map and then uses it as a reference in rendering the final frame. This sort of resource presents a problem because multiple GPUs in CrossFire X don't share memory. As a result, he said, the driver will have to track when the map was rendered and synchronize its contents between different GPUs. Dependencies must be tracked, as well, and the driver may have to replicate both a resource and anything used to create it from one GPU to the next (and the next). This, Gotwalt said, is one reason why profiled AFR ends up being superior to non-profiled AFR: the driver can turn off some of its resource tracking once the application has been profiled.

Gotwalt pointed out that "AFR-friendly" applications will simply re-render the necessary data multiple frames in a row. However, he said, the drivers must then be careful not to sync data unnecessarily when the contents of a texture have been re-rendered but haven't changed.

Curious, I asked Gotwalt whether re-rendering was typically faster than transferring a texture from one GPU to the next. He said yes, in some applications it is, but one must be careful about it. If you're re-rendering too many resources, you're not really sharing the workload, and performance won't scale. In those cases, it's faster to copy the data from GPU to GPU. Gotwalt claimed they'd found this to be the case in DirectX 10 games, whereas DX9 games were generally better off re-rendering.

Gotwalt attributed this difference more to changes in the usage model in newer games than to the API itself. (Think about the recent proliferation of post-processing effects and motion blur.) DX10 games make more passes on the data and render to textures more, creating a "cascading of resources." DX10's ability to render to a buffer via stream out also allows more room for the creation of persistent resources. Obviously, this is a big problem to manage case by case, and Gotwalt admitted as much. He qualified that admission, though, by noting that AMD learns from every game it profiles and tries to incorporate what it learns into its general "compatible AFR" implementation when possible.


 

Ihatethedukes · Premium Member · 5,390 Posts
Quote:

Originally Posted by rico2001
More light reading: AFR, crossfire, tri-fire, quad-fire, microstuttering, etc.

That in no way really explains why the bandwidths need to be the same, though. Perhaps the faster card is bottlenecked by the bandwidth of the slower card because of the data it needs to transfer from GPU to GPU. That would make sense in games that are profiled to copy from GPU to GPU.

Quote:

Originally Posted by Meaker
You will get worse microstutter, however, as the render times are quite different.

They do kind of match memory bandwidth, in the sense that the cards MUST keep IDENTICAL copies of the data in memory.
Right now, no one really understands ustutter in the consumer space. I can already tell it is not what it seems to be from my own testing (xfire improved ustutter in unigine for me over a single 5870). I'll leave it at that. (I'm leaning toward a memory

Quote:

Originally Posted by brettjv
It's a valid point in some cases, for sure, but it can be compensated for by making the test graphically harder to run. Using Extreme mode in Vantage rather than Performance mode, for example.

Also, I'm talking about a case where you run, say, 2x5850's three times with three different clocks, say:

A) 725/1000 + 725/1000
B) 725/1000 + 850/1200
C) 850/1200 + 850/1200

That's more or less how the test case I saw on OCN had been done, and basically tests A and B were almost identical, while test C showed the expected large gain.

My take is that these results imply there isn't a large CPU bottleneck; otherwise case C wouldn't be any better than cases A and B. A CPU bottleneck might affect how MUCH larger C is vs. A and B, but I don't really see how one could account for the results as a whole just from a CPU bottleneck.

It may well be that two identical cards with different clocks are treated differently than two totally different cards. As it stands now, that would have to be my assessment. But ... maybe I need to run some tests of my own with different clocks on each card, just to see what happens. And NOT using Crysis.

Going back to the results in the review, it's also interesting that in the ONE case of Crysis, the 5770+5750 really isn't any better than 2x5750. Obviously this means it's possible for the application itself to somehow come into play in this whole equation, which I personally find very interesting.
That's true. It's an idea, but I have an alternative. You could also think of it this way: the driver has to spend CPU cycles deciding how to load-balance between two GPUs. If there is a significant disparity in the performance of the two cores, that might eat more CPU cycles and therefore make the CPU bottleneck worse. Depending on how difficult the disparity is to manage and on the severity of the CPU bottleneck, it will eat into the gain, or even worsen performance relative to equal-speed cards.

I have no evidence of this. It's just a thought.
 

rico2001 · Premium Member · 4,727 Posts
My replies are inline, in [brackets].

Quote:

Originally Posted by brettjv

So, Rico, assuming what you postulate is in fact the case: what would be your guess as to how the memory bandwidths are brought into sync with one another on two cards with differing memory bandwidths?

Are you saying the driver does all those calculations, as you just did, and then adjusts the memory clock of the faster of the two downwards? And if so, does this adjustment show up in the AB or GPU-Z graphs, or no? [Well ... yes. But no about GPU-Z; it only shows what clock speed the card(s) are set at. It's just like two cars on a one-lane road: even if the car behind is faster, it still travels at the speed of the car in front of it. So it goes with the memory bandwidth of the faster-clocked card.] And if no, how can one be certain that said bandwidth adjustment is actually occurring? How would one prove or disprove this theory, since we have no other way of measuring memory bandwidth that doesn't involve multiplying a set of numbers by a clock speed, and if we have no reliable means of deriving clock speed ... I think you see where I'm going with this.
[Take two cards, say 4850s, match the core clocks and unmatch the memory clocks, and you will get almost identical scores compared to a core- and memory-matched configuration. Unfortunately, I never ran that particular test, of all the ones I did run; I'm not sure why not. I think I was confident enough in my thinking that there seemed to be no point. My closest attempts at this issue are below.]

And also, what's your theory as to why it's important that the memory bandwidth (and only that number) is equal, if in fact it's not important for the two cards overall to be operating at the same speed ... something that the scores in the article clearly show to be the case? [Well, I don't know the exact answer to that one. My best guess is that it's just the way crossfire was designed. Think of the memory bandwidth as two connecting pipelines that must match before feeding the PCI-E bus during communication. I've always seen the memory as the chalkboard the GPUs write on, where they pool their rendering before delivering their info. Although the memories are not physically connected, as someone above said, they are trying to create the same picture.] IOW, what is so special about memory bandwidth here? Why do you pick that particular spec?

I'm not telling you that you're wrong or anything, I really don't know ... but I'm way too inquisitive to just take what someone tells me at face value unless I know how they've derived their conclusions.


You, having two 5850s, can test and prove me right, close, or wrong. Everything I post is just my opinion, and yes, from time to time I'm wrong. Although we often disagree, I respect your opinions. You are one of the most knowledgeable people regarding video cards I've seen on the forum.


Quote:

Originally Posted by rico2001

Reducing the bottleneck: closing the memory bandwidth gap in a mixed crossfire configuration

Before I get started, a little background on mixed crossfire and the memory gap (bottleneck/slowing-down) between DDR3 and DDR5.

Quote:

Originally Posted by rico2001

Yes, with a mixed CF of 4870 + 4850, you can think of it as one 4850 GPU at 625 MHz + one 4870 GPU at 750 MHz, both working together with DDR3 running at 993 MHz. Pairing the two isn't a waste if you have a 4850 and come across a 4870 for cheap, but yes, it makes more sense to just get another 4850 to CF with.

The memory issue with pairing a 4850 with a 4870 is a bandwidth issue. The memory of the 4850 is dual-piped at 993 MHz = 63.6 GB/s bandwidth, and the memory of the 4870 is quad-piped at 900 MHz = 115.2 GB/s bandwidth. And there is the bottleneck. You can never overclock the memory of the 4850 enough to match the stock bandwidth of the 4870; you would have to get the 4850's DDR3 up to roughly 1800 MHz.



Today I'm going to work with my 4850 X2 and my 4870 in tri-fire and compare them with my 4870 tri-fire scores. In an attempt to close the gap in memory bandwidth, I'm going to overclock the DDR3 memory of the 4850 X2 as much as possible to bring its bandwidth up. This reduces the amount by which the 4870's DDR5 has to slow down to match the 4850 X2.



Overclock and bandwidth stats:
4850 X2 - GPU 625 MHz to 712 MHz; memory 993 MHz DDR3 to 1220 MHz = total data rate of 78.1 GB/s (up from 63.6 GB/s; a 22% increase, still far from the ~1800 MHz needed to match the 4870's 900 MHz DDR5)

Testbed:
E8300 @ 3.6 GHz & 4.0 GHz
ASUS P5Q-PRO
6 GB DDR2
ATI Catalyst 9.6
SAPPHIRE Radeon 4850 X2 2GB + Radeon 4870 1GB (tri-fire)
SAPPHIRE Radeon 4850 X2 2GB (OC'ed to 715 MHz GPU / 1220 MHz mem) + Radeon 4870 1GB (tri-fire)
SAPPHIRE Radeon 4870 X2 2GB + Radeon 4870 1GB (tri-fire)

Testing:
Crysis
Enemy Territory: QUAKE Wars
Furmark
Sanctuary

Crysis:
1600x1200, DX10
4850 X2 + 4870 = 41.2 average fps
4850 X2 OC + 4870 = 43.9 average fps
4870 X3 = 45.8 average fps



Enemy Territory: QUAKE Wars
1600 x 1200, DX10, highest settings, 16xAF, 4xAA
4850 X2 + 4870 = 140 average fps
4850 X2 OC + 4870 = 141 average fps
4870 X3 = 146 average fps



Furmark:
1680x1050, OpenGL
4850 X2 + 4870 = 209 average fps
4850 X2 OC + 4870 = 248 average fps
4870 X3 = 255 average fps



Sanctuary:
1680x1050
4850 X2 + 4870 = 163 average fps
4850 X2 OC + 4870 = 188 average fps
4870 X3 = 196 average fps



Well, to sum up this report: although it is impossible to overclock the DDR3 enough to match the speed of the 4870's DDR5, the gains were decent. Still several hundred MHz short on the 4850's memory, I fall short of closing the gap completely. But the overclocked mixed configuration did put a dent in the bandwidth gap, allowing more data through the pipes.
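For reference, the same arithmetic as in my bandwidth post earlier gives the DDR3 clock that would be needed to close the gap completely (a Python sketch; 256-bit bus assumed):

Code:

# How far short does the 1220 MHz overclock fall? (256-bit bus assumed,
# same arithmetic as the earlier bandwidth post.)
def gb_s(mhz, rate):                     # rate: 2 for DDR3, 4 for DDR5
    return mhz * 1e6 * 4 * rate * 64 / 8 / 1e9

target = gb_s(900, 4)                    # 4870 stock DDR5: 115.2 GB/s
print(round(gb_s(1220, 2), 1))           # 78.1 GB/s after the overclock
print(round(target / gb_s(1, 2)))        # ~1800 MHz of DDR3 needed to match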



@Ihatethedukes

True, sometimes it's not about memory bandwidth. Some 3D applications rely more on memory bandwidth and some lean more on GPU core speed, hence mixed crossfire with unmatched memory clocks fares very well in some of them and not in others.
 

Ihatethedukes · Premium Member · 5,390 Posts
I just ran a series of benchmarks in Far Cry 2 where I took my stock-clocked 5870 and 5970 and ramped up the 5970's memory speeds.

5870: always 875/1300

5970 GPU 1 = 750/1090, GPU 2 = 750/1090 -> 140.23 FPS
5970 GPU 1 = 750/1200, GPU 2 = 750/1090 -> 139.23 FPS
5970 GPU 1 = 750/1250, GPU 2 = 750/1090 -> 138.90 FPS
5970 GPU 1 = 750/1250, GPU 2 = 750/1250 -> 143.97 FPS
 