(Benchmarks) Performance advantages of extreme timing and motherboard setting tweaks
I know a lot of us, when we overclock our memory, tend to focus on two things above all else: the speed in megahertz and the CAS latency. After that, everything falls into a land of obscurity. I spent literally hours tweaking my memory and various motherboard settings to see how much more you can get out of your memory after you find its top speed. Here are my results.
I used RightMark's multi-threaded memory test for my results. My sig rig was used, and the memory was always clocked at 1014 MHz with CAS 5 for all tests. I first tested using the default timings set by my motherboard, then using the lowest timings I could stably achieve. Next I tested with optimized AI Clock Twister and Performance Level settings (that stuff in your BIOS nobody is quite sure about...). Finally, I combined the two to show the maximum potential after the final stable speed has been reached. The numbers below are the average bandwidth for each test. Each configuration was stable through at least one run of Memtest.
small read/write = two threads, 1024 kB in size
large read/write = two threads, 748576 kB in size
Unoptimized timings and settings (2N 5-5-5-15-3-50-6-3-8-3-5-4-6-4-6-14-5-1-6-6):
small write 30948 MB/s
small read 51701 MB/s
large write 3053 MB/s
large read 9017 MB/s
Optimized timings:
(2N 5-5-3-9-1-45-4-1-3-2-4-4-5-4-4-12-1-1-1-1)
small write 30854 MB/s
small read 51475 MB/s
large write 3178 MB/s
large read 9225 MB/s
Performance change over stock:
small write -94 MB/s (-0.3%)
small read -226 MB/s (-0.4%)
large write +125 MB/s (+4.1%)
large read +208 MB/s (+2.3%)
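Since the posts never name the 20 values that follow the 2N command rate, one way to at least see which timings were touched is to compare the default and tuned strings position by position. This is a minimal Python sketch; positions are numbered rather than labeled because any names would be guesses, and `changed_positions` is a hypothetical helper, not part of any tool used here.

```python
# Default (motherboard-set) and tuned timing strings from the post, 2N command rate omitted.
stock = "5-5-5-15-3-50-6-3-8-3-5-4-6-4-6-14-5-1-6-6"
tuned = "5-5-3-9-1-45-4-1-3-2-4-4-5-4-4-12-1-1-1-1"


def changed_positions(before: str, after: str) -> list[tuple[int, int, int]]:
    """Return (1-based position, old value, new value) for every timing that differs."""
    a = [int(x) for x in before.split("-")]
    b = [int(x) for x in after.split("-")]
    if len(a) != len(b):
        raise ValueError("timing strings have different lengths")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for pos, old, new in changed_positions(stock, tuned):
    print(f"position {pos:2d}: {old} -> {new}")
```

On these two strings it reports 15 changed positions; positions 1, 2, 12, 14 and 18 were left at their defaults.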
----------
Optimized settings:
(Performance Level 7, Static DRAM Controller enabled, AI Clock Twister Strong; default timings.)
small write 30983 MB/s
small read 51668 MB/s
large write 3281 MB/s
large read 9538 MB/s
Performance change over stock:
small write +35 MB/s (+0.1%)
small read -33 MB/s (-0.06%)
large write +228 MB/s (+7.5%)
large read +521 MB/s (+5.8%)
Optimized timings and settings:
(2N 5-5-3-9-1-45-4-1-3-2-4-4-5-4-4-12-1-1-1-1)
(AI Clock Twister Moderate, Performance Level 7, Static DRAM Controller enabled)
small write 30987 MB/s
small read 51720 MB/s
large write 3380 MB/s
large read 9712 MB/s
Optimized timings and settings, performance change over stock:
small write +39 MB/s (+0.1%)
small read +19 MB/s (insignificant)
large write +327 MB/s (+10.7%)
large read +695 MB/s (+7.7%)
So after finding the max speed and CAS latency for my memory kit, I was able to increase large-block write bandwidth by almost 11% and large-block read bandwidth by almost 8% just by tweaking minor settings, while the small-block results barely moved. I imagine the same kind of thing can be done with most motherboards.
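Each percentage above is just (tuned - stock) / stock for that test. As a check on the arithmetic, here is a short Python sketch that recomputes all three comparison tables from the raw averages posted above:

```python
# Raw averages (MB/s) from the X48 results above.
baseline = {"small write": 30948, "small read": 51701,
            "large write": 3053, "large read": 9017}
runs = {
    "optimized timings": {"small write": 30854, "small read": 51475,
                          "large write": 3178, "large read": 9225},
    "optimized settings": {"small write": 30983, "small read": 51668,
                           "large write": 3281, "large read": 9538},
    "timings + settings": {"small write": 30987, "small read": 51720,
                           "large write": 3380, "large read": 9712},
}


def relative_change(before: float, after: float) -> float:
    """Fractional change relative to the baseline, e.g. 0.107 means +10.7%."""
    return (after - before) / before


for name, result in runs.items():
    print(name)
    for test, base in baseline.items():
        delta = result[test] - base
        print(f"  {test:12s} {delta:+5d} MB/s  {relative_change(base, result[test]):+.1%}")
```

The last table reproduces the +10.7% large write and +7.7% large read figures.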
---------------------------------------------
Well, I got a new system: my Phenom BE / Foxconn A79A-S 790GX setup. So I decided to test this system the same way I tested my X48 chipset and E8500 Wolfdale combo above. Unfortunately, RightMark's multi-threaded benchmark was very finicky, so an apples-to-apples comparison was impossible. I decided to pony up the money for SiSoft Sandra Professional edition so I could run more thorough benchmarks. Because these tests are all about fine details, I listed the numbers rather than the graphs, so a more in-depth comparison can be made. Not as visually exciting, but a tad more scientific.
OK, here we go. The test system is my current sig rig; just in case, that includes 2x2GB G.Skill PI Black memory, clocked at 800 MHz in every test (my motherboard gives no way of adjusting memory clock speeds, and the kit cannot hit 1066 MHz stably). The system is Prime95-stable at 3.5 GHz, and the clock speed was not adjusted for any test.
OK, the results. This time around I decided to take a different approach and just show the before and after. The first test features my system with stock memory and CPU settings (the ones that pertain to bus frequencies and whatnot). The only thing I used was "optimal memory settings", which simply reads the memory's SPD and applies those timings. Anyway, here are the results:
Unoptimized Timings or settings, Optimal Performance Enabled Bandwidth Benchmark
Optimized Timings and settings Bandwidth Benchmark
To achieve these results I had to disable Optimal Performance, because it sets timings by SPD, which is restrictive for the best performance. I set the HT link to x13, which sets the bus speed to 2600 MHz. I then set the DCHT Bit Width to 16-bit instead of Auto. In my memory settings I enabled CLK3 and clocks to all DIMMs, as well as auto tuning. All of these settings were found to increase performance and/or reduce latency. The timings I used are listed in the Sandra benchmark.
The result was an 8% increase in bandwidth over stock settings with SPD timings, plus a 7% increase in bandwidth efficiency.
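The HT link setting is a multiplier on the reference clock. The post doesn't state the reference clock; the sketch below assumes the stock 200 MHz, which is consistent with x13 giving 2600 MHz. `ht_link_mhz` and the 210 MHz value are hypothetical, for illustration only.

```python
# Assumption: reference (HTT) clock left at the stock 200 MHz; not stated in the post.
REFERENCE_CLOCK_MHZ = 200


def ht_link_mhz(multiplier: int, ref_mhz: int = REFERENCE_CLOCK_MHZ) -> int:
    """HyperTransport link frequency as shown in the BIOS: multiplier x reference clock."""
    return multiplier * ref_mhz


print(ht_link_mhz(13))       # x13 at the assumed 200 MHz reference
print(ht_link_mhz(13, 210))  # same multiplier with a hypothetical 210 MHz reference
```

The second line is the reason the multiplier matters when overclocking by reference clock: the link speed scales with it, so a raised reference clock pushes x13 past 2600 MHz.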
Unoptimized Settings and Timings (SPD, CAS 4) Latency Benchmark
Optimized Timings and Settings Latency Benchmark
The result was a drop in latency from 87 ns to 78 ns, a 10% reduction.
Unoptimized Timings and Settings Cache and Memory Benchmarks
Optimized Timings and Settings Cache and Memory Benchmarks
The result was a Cache and Memory performance gain of nearly 5%.
Analysis
The benchmarks clearly show there is a noticeable and significant increase in performance to be had by fine-tuning timings and settings. I found in my testing that on the AMD chipset the memory timings themselves had very little effect on the system and its performance; memory latency, however, was highly dependent on timings. The greatest benefits came from fine-tuning those confusing settings that nobody really knows the purpose of.
It is interesting to note that optimizing had a huge effect on L3 cache performance in particular. In the latency benchmark, internal data cache latency dropped from 81 clocks to 63 clocks, a 22% reduction. In the cache and memory benchmark, L3 on-board cache bandwidth increased from 30.13 GB/s to 42.44 GB/s, an increase of 40%!
I went on to test the CPU's performance using the Arithmetic, Multi-Media, and Multi-Core Efficiency tests. The arithmetic test showed a very small drop in performance, which can easily be explained by the margin of error common in benchmarks. In short, there were no gains, and you can expect optimizing to have no effect on folding or number-crunching performance. The multi-media benchmark showed small but consistent gains, though nothing higher than 1 or 2%. The multi-core efficiency test, though, benefited drastically.
Unoptimized CPU Multi-Core Efficiency Benchmark
Optimized CPU Multi-Core Efficiency Benchmark
The result was an inter-core bandwidth increase of 38%, with inter-core latency dropping from 89 ns to 73 ns, a reduction of nearly 18%!
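The latency figures in this write-up are quoted as percentage reductions, (before - after) / before, while bandwidth figures are percentage increases. A minimal Python sketch that checks the AMD numbers and, for the latency rows, also prints the equivalent speedup (before / after - 1), which is always a bit larger than the reduction for the same change:

```python
def pct_reduction(before: float, after: float) -> float:
    """Lower-is-better metrics (latency): fraction removed from the original value."""
    return (before - after) / before


def pct_increase(before: float, after: float) -> float:
    """Higher-is-better metrics (bandwidth): fractional gain over the original value."""
    return (after - before) / before


# Before/after figures quoted in the AMD results.
latency = {
    "memory latency (ns)": (87, 78),
    "internal data cache (clocks)": (81, 63),
    "inter-core latency (ns)": (89, 73),
}
bandwidth = {"L3 cache (GB/s)": (30.13, 42.44)}

for name, (b, a) in latency.items():
    print(f"{name}: {pct_reduction(b, a):.1%} lower, {b / a - 1:.1%} equivalent speedup")
for name, (b, a) in bandwidth.items():
    print(f"{name}: {pct_increase(b, a):+.1%}")
```

The reductions come out at about 10%, 22% and 18%, and the L3 gain at about 41%, matching the figures in the posts.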
Can you make it clear which settings are which? On the first set of timings you start with the command rate of 2N, but on the others you leave it off. You might consider putting this into Excel to make it clearer. It would also clear up which advanced timings were tweaked and how they affect the performance increases.
Now that I've said that, +rep. I may have to look into changing some of the settings that pretty much no one touches.
Your results will only be usable for comparison with systems that have the same chipset. Since the MCH is totally different on each chipset, all of the characteristics you are seeing in your results, as well as overall latency and RAM speed, are specific to that chipset only.
Results will be different, for better or for worse, compared to other chipsets/FSBs/MCH straps [even the CPU multiplier matters for RAM bandwidth tests, as lowering the stock multiplier raises NB speed over the FSB].
I just used what I had. It's not meant to be universally applicable, just meant to show that there is a lot more room for improvement in the minor settings on some motherboards.
I am bumping this thread because I have tested my AMD platform and added it to the OP. The benchmark results are in spoilers to save space and can be seen by clicking "show the hidden text". All replies after this post are new; the earlier replies refer only to my benchmarking of my X48 chipset.