
·
Premium Member
Joined
·
6,045 Posts
Discussion Starter · #1 ·
I know a lot of us, when we attempt to overclock our memory, tend to focus on two things above all else: the speed in megahertz and the CAS timing. After that everything falls into a land of obscurity. I spent literally hours tweaking my memory and various motherboard settings to see how much more you can get out of your memory after you find its top speed. Here are my results.

I used RightMark's multi-threaded memory test for my results. My sig rig was used, and the memory was always clocked at 1014MHz with a CAS 5 timing for all tests. I first tested using default timings as set by my motherboard, then I tested using the lowest timings I could stably achieve. I then tested with optimized AI clock twister and performance level settings (that stuff in your BIOS nobody is quite sure what it does...). Finally I used a combination of the two to show the maximum potential after the final stable speed has been reached. The following numbers are the average bandwidth for each test. Each test configuration was stable through at least one run of memtest.

small read/write = two threads, 1024kB in size
large read/write = two threads, 748576kB in size

Unoptimized timings and settings (2N 5-5-5-15-3-50-6-3-8-3-5-4-6-4-6-14-5-1-6-6):
small write 30948MB/s
small read 51701MB/s
large write 3053MB/s
large read 9017MB/s

Optimized timings:
(2N 5-5-3-9-1-45-4-1-3-2-4-4-5-4-4-12-1-1-1-1)
small write 30854MB/s
small read 51475MB/s
large write 3178MB/s
large read 9225MB/s

Performance increase over stock:
small write -94MB/s (-0.3%)
small read -226MB/s (-0.4%)
large write +125MB/s (+4.1%)
large read +208MB/s (+2.3%)
----------
Optimized settings:
(Performance level set to 7, static DRAM controller enabled, AI clock twister strong. Standard timings.)
small write 30983MB/s
small read 51668MB/s
large write 3281MB/s
large read 9538MB/s

Performance increase over stock:
small write +35MB/s (+0.1%)
small read -33MB/s (-0.06%)
large write +228MB/s (+7.5%)
large read +521MB/s (+5.8%)

Optimized timings and settings:
(2N 5-5-3-9-1-45-4-1-3-2-4-4-5-4-4-12-1-1-1-1)
(AI clock twister moderate, performance level 7, static DRAM controller enabled)
small write 30987MB/s
small read 51720MB/s
large write 3380MB/s
large read 9712MB/s

Performance increase over stock:
small write +39MB/s (+0.1%)
small read +19MB/s (+0.04%, insignificant)
large write +327MB/s (+10.7%)
large read +695MB/s (+7.7%)

So after finding the max speed and CAS timing for my memory kit, I was able to increase large write performance by almost 11% and large read performance by almost 8% just by tweaking minor settings. I imagine the same kind of thing can be done with most motherboards.
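For anyone who wants to re-check the percentages, here is a minimal Python sketch. The numbers are copied from the RMMT results in this post; the dictionary and function names are just illustrative and not part of any benchmark tool.

```python
# Average bandwidth in MB/s, copied from the RMMT results in this post.
baseline = {"small write": 30948, "small read": 51701,
            "large write": 3053, "large read": 9017}
optimized = {"small write": 30987, "small read": 51720,
             "large write": 3380, "large read": 9712}


def relative_gain(before, after):
    """Fractional change relative to the baseline (0.10 means +10%)."""
    return (after - before) / before


gains = {name: relative_gain(baseline[name], optimized[name])
         for name in baseline}

for name, gain in gains.items():
    delta = optimized[name] - baseline[name]
    print(f"{name:12s} {delta:+5d} MB/s  {gain:+.2%}")
```

The small-block results barely move because they mostly measure cache, while the large-block results are the ones that actually reach main memory.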
---------------------------------------------

Well, I got a new system in my Phenom BE/Foxconn A79A-S 790GX setup, so I decided to test it similarly to the way I tested my X48 chipset and E8500 Wolfdale combo above. Unfortunately RightMark's multi-threaded benchmark was being very finicky, so an apples-to-apples comparison was impossible. I decided to pony up the money for SiSoft Sandra Professional Edition so I could run more proper benchmarks. Because these tests are all about fine details, I listed the raw information rather than the graphs, so a more in-depth comparison can be made. Not as visually exciting, but a tad more scientific.

OK, here we go. The test system is my current sig rig, but just in case, that includes 2x2GB G.Skill PI Black memory, clocked at 800MHz in every test (because my motherboard gives no way of adjusting memory clock speeds, and the kit cannot hit 1066MHz stably). The system is Prime95 tested at 3.5GHz, and the CPU clock speed was not adjusted for any test.

OK, the results. This time around I decided to take a different approach and just show the before and after. The first test features my system with stock memory and CPU settings (the ones that pertain to bus frequencies and whatnot). The only thing I enabled was "optimal memory settings", which simply reads the memory's SPD and applies the timings. Anyway, here are the results:

Unoptimized Timings and Settings (Optimal Performance Enabled) Bandwidth Benchmark

Quote:
SiSoftware Sandra

Benchmark Results
Int Buff'd iSSE2 Memory Bandwidth : 10.12GB/s
Float Buff'd iSSE2 Memory Bandwidth : 10.11GB/s
Results Interpretation : Higher index values are better.

Performance vs. Speed
Int Buff'd iSSE2 Memory Bandwidth : 12.95MB/s/MHz
Float Buff'd iSSE2 Memory Bandwidth : 12.94MB/s/MHz
Results Interpretation : Higher index values are better.

Performance vs. Power
Chipset(s)/Memory Power : 53.02W
Int Buff'd iSSE2 Memory Bandwidth : 195.36MB/s/W
Float Buff'd iSSE2 Memory Bandwidth : 195.28MB/s/W
Results Interpretation : Higher index values are better.

Capacity vs Power
Memory Capacity : 77MB/W
Results Interpretation : Higher index values are better.

Int Buff'd iSSE2 Memory Bandwidth
Assignment : 10.02GB/s
Scaling : 9.96GB/s
Addition : 10.22GB/s
Triad : 10.26GB/s
Data Item Size : 16bytes
Buffering Used : Yes
Offset Displacement Used : Yes
Bandwidth Efficiency : 80.92%

Float Buff'd iSSE2 Memory Bandwidth
Assignment : 10.01GB/s
Scaling : 9.96GB/s
Addition : 10.21GB/s
Triad : 10.27GB/s
Data Item Size : 16bytes
Buffering Used : Yes
Offset Displacement Used : Yes
Bandwidth Efficiency : 80.89%

Performance Test Status
Run ID : AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-4-11 3-26-6-3)
Platform Compliance : x64
Total Memory : 4GB
Memory Used by Test : 2.00GB
NUMA Support : No
SMP (Multi-Processor) Benchmark : No
Total Test Threads : 4
Multi-Core Test : Yes
SMT (Multi-Threaded) Benchmark : No
Processor Affinity : P0C0T0 P0C1T0 P0C2T0 P0C3T0
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Features
SSE Technology : Yes
SSE2 Technology : Yes
SSE3 Technology : Yes
Supplemental SSE3 Technology : No
SSE4.1 Technology : No
SSE4.2 Technology : No
AVX - Advanced Vector eXtensions : No
FMA - Fused Multiply Add eXtensions : No
SSE4A Technology : Yes
SSE5 Technology : No
HTT - Hyper-Threading Technology : No

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

Memory Module(s)
Memory Module : G.Skill F2-7200CL4-2GBPI-B 2GB DIMM DDR2 PC2-6400U DDR2-800 (5.0-5-5-15 3-23-6-3)
Memory Module : G.Skill F2-7200CL4-2GBPI-B 2GB DIMM DDR2 PC2-6400U DDR2-800 (5.0-5-5-15 3-23-6-3)


Optimized Timings and settings Bandwidth Benchmark


To achieve these results I had to disable optimal performance, because it sets timings by SPD, which restricts performance. I set the HT link to x13, setting the bus speed to 2600MHz. I then set the DCHT Bit Width to 16-bit instead of auto. In my memory settings I enabled CLK3 and clocks to all DIMMs, as well as auto tuning. All of these settings were found to improve bandwidth and/or latency. The timings I used are listed in the Sandra benchmark.
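As a rough sketch of where the 2600MHz figure comes from: the link speed is the multiplier times the reference clock. The 200MHz reference clock and the stock x9 multiplier below are assumptions (the post only states x13 and 2600MHz, and the Sandra output shows 1.8GHz at stock), and the names are made up for illustration.

```python
# Assumption: the board runs the stock 200 MHz reference clock.
REFERENCE_CLOCK_MHZ = 200


def link_speed_mhz(multiplier, reference_mhz=REFERENCE_CLOCK_MHZ):
    """Link speed in MHz for a given multiplier and reference clock."""
    return multiplier * reference_mhz


stock = link_speed_mhz(9)    # assumed stock multiplier, gives 1800 MHz
tuned = link_speed_mhz(13)   # x13 as set in the BIOS, gives 2600 MHz
print(f"stock: {stock} MHz, tuned: {tuned} MHz ({tuned / stock - 1:+.0%})")
```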

Quote:
SiSoftware Sandra

Benchmark Results
Int Buff'd iSSE2 Memory Bandwidth : 10.95GB/s
Float Buff'd iSSE2 Memory Bandwidth : 10.97GB/s
Results Interpretation : Higher index values are better.

Performance vs. Speed
Int Buff'd iSSE2 Memory Bandwidth : 14.01MB/s/MHz
Float Buff'd iSSE2 Memory Bandwidth : 14.04MB/s/MHz
Results Interpretation : Higher index values are better.

Performance vs. Power
Chipset(s)/Memory Power : 53.02W
Int Buff'd iSSE2 Memory Bandwidth : 211.39MB/s/W
Float Buff'd iSSE2 Memory Bandwidth : 211.86MB/s/W
Results Interpretation : Higher index values are better.

Capacity vs Power
Memory Capacity : 77MB/W
Results Interpretation : Higher index values are better.

Int Buff'd iSSE2 Memory Bandwidth
Assignment : 10.90GB/s
Scaling : 10.97GB/s
Addition : 10.94GB/s
Triad : 10.97GB/s
Data Item Size : 16bytes
Buffering Used : Yes
Offset Displacement Used : Yes
Bandwidth Efficiency : 87.56%

Float Buff'd iSSE2 Memory Bandwidth
Assignment : 10.93GB/s
Scaling : 10.98GB/s
Addition : 10.98GB/s
Triad : 10.99GB/s
Data Item Size : 16bytes
Buffering Used : Yes
Offset Displacement Used : Yes
Bandwidth Efficiency : 87.76%

Performance Test Status
Run ID : AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-3-5 2-11-3-3)
Platform Compliance : x64
Total Memory : 4GB
Memory Used by Test : 2.00GB
NUMA Support : No
SMP (Multi-Processor) Benchmark : No
Total Test Threads : 4
Multi-Core Test : Yes
SMT (Multi-Threaded) Benchmark : No
Processor Affinity : P0C0T0 P0C1T0 P0C2T0 P0C3T0
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Features
SSE Technology : Yes
SSE2 Technology : Yes
SSE3 Technology : Yes
Supplemental SSE3 Technology : No
SSE4.1 Technology : No
SSE4.2 Technology : No
AVX - Advanced Vector eXtensions : No
FMA - Fused Multiply Add eXtensions : No
SSE4A Technology : Yes
SSE5 Technology : No
HTT - Hyper-Threading Technology : No

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

The resulting increase in bandwidth was 8% over stock settings with SPD timings (10.12GB/s to 10.95GB/s). Bandwidth efficiency also rose by about 7 percentage points (80.92% to 87.56%).
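The efficiency figure can be roughly reproduced by dividing measured bandwidth by the theoretical peak. This is only a sketch: dividing by 1024 to get 12.5GB/s is my guess at how Sandra arrives at its "Maximum Memory Bus Bandwidth", and the results differ from Sandra's in the last digit because the inputs here are already rounded.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bits, channels):
    """Theoretical peak: transfers per second x bytes per transfer x channels."""
    return mega_transfers_per_s * (bus_width_bits // 8) * channels


def efficiency(measured_gb_s, peak_gb_s):
    """Fraction of the theoretical peak that the benchmark reached."""
    return measured_gb_s / peak_gb_s


# DDR2-800, 64-bit wide, dual channel (from the Sandra output).
peak_mb_s = peak_bandwidth_mb_s(800, 64, 2)   # 12800 MB/s
peak_gb_s = peak_mb_s / 1024                  # assumption: Sandra's 12.5GB/s

for label, measured in (("SPD timings", 10.12), ("optimized", 10.95)):
    print(f"{label:12s} {measured:5.2f} GB/s  {efficiency(measured, peak_gb_s):.1%}")
```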

Unoptimized Settings and Timings (SPD, CAS 4) Latency Benchmark

Quote:
SiSoftware Sandra

Benchmark Results
Memory (Random Access) Latency : 87ns
Speed Factor : 99.90
Results Interpretation : Lower index values are better.

Cache Information
Internal Data Cache : 3clocks
L2 On-board Cache : 16clocks
L3 On-board Cache : 81clocks
Results Interpretation : Lower index values are better.

Performance vs. Speed
Memory (Random Access) Latency : 0.11ns/MHz
Results Interpretation : Lower index values are better.

Performance vs. Power
Chipset(s)/Memory Power : 53.02W
Memory (Random Access) Latency : 1.64ns/W
Results Interpretation : Lower index values are better.

Detailed Benchmark Results
1kB Range : 3clocks / 1ns
4kB Range : 3clocks / 1ns
16kB Range : 3clocks / 1ns
64kB Range : 3clocks / 1ns
256kB Range : 16clocks / 5ns
1MB Range : 67clocks / 19ns
4MB Range : 96clocks / 27ns
16MB Range : 284clocks / 81ns
64MB Range : 304clocks / 87ns

Performance Test Status
Run ID : AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-4-11 3-26-6-3)
Platform Compliance : x64
System Timer : 14.32MHz
Memory Access : Random

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

Optimized Timings and Settings Latency Benchmark

Quote:
SiSoftware Sandra

Benchmark Results
Memory (Random Access) Latency : 78ns
Speed Factor : 89.40
Results Interpretation : Lower index values are better.

Cache Information
Internal Data Cache : 3clocks
L2 On-board Cache : 16clocks
L3 On-board Cache : 63clocks
Results Interpretation : Lower index values are better.

Performance vs. Speed
Memory (Random Access) Latency : 0.10ns/MHz
Results Interpretation : Lower index values are better.

Performance vs. Power
Chipset(s)/Memory Power : 53.02W
Memory (Random Access) Latency : 1.47ns/W
Results Interpretation : Lower index values are better.

Detailed Benchmark Results
1kB Range : 3clocks / 1ns
4kB Range : 3clocks / 1ns
16kB Range : 3clocks / 1ns
64kB Range : 3clocks / 1ns
256kB Range : 16clocks / 4ns
1MB Range : 51clocks / 15ns
4MB Range : 74clocks / 21ns
16MB Range : 253clocks / 72ns
64MB Range : 271clocks / 78ns

Performance Test Status
Run ID : AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-3-5 2-11-3-3)
Platform Compliance : x64
System Timer : 14.32MHz
Memory Access : Random

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

This resulted in a drop in latency from 87ns to 78ns, an improvement of about 10%.
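To put the timing changes in perspective, the cycle counts can be converted to nanoseconds at the 400MHz memory clock. A minimal sketch, assuming the first four numbers of each Sandra timing string are CL-tRCD-tRP-tRAS in the usual order (Sandra does not label them here):

```python
MEMORY_CLOCK_MHZ = 400  # from "Memory Bus Speed : 2x 400MHz (800MHz)"


def cycles_to_ns(cycles, clock_mhz=MEMORY_CLOCK_MHZ):
    """Convert a timing given in memory clock cycles to nanoseconds."""
    return cycles * 1000 / clock_mhz


# Assumption: CL-tRCD-tRP-tRAS order for the first four timings.
spd = {"CL": 4, "tRCD": 4, "tRP": 4, "tRAS": 11}    # 4-4-4-11
tuned = {"CL": 4, "tRCD": 4, "tRP": 3, "tRAS": 5}   # 4-4-3-5

for name in spd:
    print(f"{name:5s} {cycles_to_ns(spd[name]):5.1f} ns -> "
          f"{cycles_to_ns(tuned[name]):5.1f} ns")

# Measured random-access latency from the two Sandra runs.
latency_drop = (87 - 78) / 87
print(f"measured latency drop: {latency_drop:.1%}")
```

The individual timings shrink by far more than the measured latency does, since the 87ns figure also includes the trip through the caches and the memory controller.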

Unoptimized Timings and Settings Cache and Memory Benchmarks

Quote:
SiSoftware Sandra

Benchmark Results
Cache/Memory Bandwidth : 44.91GB/s
Results Interpretation : Higher index values are better.
Speed Factor : 35.50
Results Interpretation : Lower index values are better.

Cache Information
Internal Data Cache : 207.54GB/s
L2 On-board Cache : 102.52GB/s
L3 On-board Cache : 30.13GB/s
Results Interpretation : Higher index values are better.

Performance vs. Speed
Cache/Memory Bandwidth : 13.14MB/s/MHz
Results Interpretation : Higher index values are better.

Performance vs. Power
Processor(s)/Chipset(s)/Memory Power : 282.13W
Cache/Memory Bandwidth : 163.01MB/s/W
Results Interpretation : Higher index values are better.

Float SSE2 Cache/Memory Results Breakdown
Data Item Size : 16bytes
Buffering Used : No
Offset Displacement Used : Yes

Detailed Benchmark Results
2kB Blocks : 171.15GB/s
4kB Blocks : 214.66GB/s
8kB Blocks : 218.83GB/s
16kB Blocks : 222.08GB/s
32kB Blocks : 236.25GB/s
64kB Blocks : 200.92GB/s
128kB Blocks : 188.91GB/s
256kB Blocks : 169.96GB/s
512kB Blocks : 111.05GB/s
1MB Blocks : 94.00GB/s
4MB Blocks : 30.13GB/s
16MB Blocks : 7.42GB/s
64MB Blocks : 6.67GB/s
256MB Blocks : 6.67GB/s
1GB Blocks : 6.65GB/s

Performance Test Status
Run ID : AMD Phenom(tm) II X4 940 Processor (4C, 3.5GHz, 1.8GHz MC, 4x 512kB L2, 6MB L3); AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-4-11 3-26-6-3)
Platform Compliance : x64
Total Memory : 4GB
NUMA Support : No
SMP (Multi-Processor) Benchmark : No
Total Test Threads : 4
Multi-Core Test : Yes
SMT (Multi-Threaded) Benchmark : No
Processor Affinity : P0C0T0 P0C1T0 P0C2T0 P0C3T0
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing

Features
SSE Technology : Yes
SSE2 Technology : Yes
SSE3 Technology : Yes
Supplemental SSE3 Technology : No
SSE4.1 Technology : No
SSE4.2 Technology : No
AVX - Advanced Vector eXtensions : No
FMA - Fused Multiply Add eXtensions : No
SSE4A Technology : Yes
SSE5 Technology : No
HTT - Hyper-Threading Technology : No

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-4-11 3-26-6-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

Optimized Timings and Settings Cache and Memory Benchmarks

Quote:
SiSoftware Sandra

Benchmark Results
Cache/Memory Bandwidth : 47.10GB/s
Results Interpretation : Higher index values are better.
Speed Factor : 33.50
Results Interpretation : Lower index values are better.

Cache Information
Internal Data Cache : 208.79GB/s
L2 On-board Cache : 102.42GB/s
L3 On-board Cache : 42.44GB/s
(1.408x the unoptimized 30.13GB/s)
Results Interpretation : Higher index values are better.

Performance vs. Speed
Cache/Memory Bandwidth : 13.78MB/s/MHz
Results Interpretation : Higher index values are better.

Performance vs. Power
Processor(s)/Chipset(s)/Memory Power : 282.13W
Cache/Memory Bandwidth : 170.96MB/s/W
Results Interpretation : Higher index values are better.

Float SSE2 Cache/Memory Results Breakdown
Data Item Size : 16bytes
Buffering Used : No
Offset Displacement Used : Yes

Detailed Benchmark Results
2kB Blocks : 171.11GB/s
4kB Blocks : 215.89GB/s
8kB Blocks : 220.12GB/s
16kB Blocks : 223.44GB/s
32kB Blocks : 237.81GB/s
64kB Blocks : 202.52GB/s
128kB Blocks : 190.62GB/s
256kB Blocks : 170.37GB/s
512kB Blocks : 112.85GB/s
1MB Blocks : 91.99GB/s
4MB Blocks : 42.44GB/s
16MB Blocks : 8.11GB/s
64MB Blocks : 7.10GB/s
256MB Blocks : 7.10GB/s
1GB Blocks : 7.11GB/s

Performance Test Status
Run ID : AMD Phenom(tm) II X4 940 Processor (4C, 3.5GHz, 2.60GHz MC, 4x 512kB L2, 6MB L3); AMD (ATI) RD790 GFX Dual Slot; 2x 2GB G.Skill DIMM DDR2 (800MHz) PC2-6400 (4-4-3-5 2-11-3-3)
Platform Compliance : x64
Total Memory : 4GB
NUMA Support : No
SMP (Multi-Processor) Benchmark : No
Total Test Threads : 4
Multi-Core Test : Yes
SMT (Multi-Threaded) Benchmark : No
Processor Affinity : P0C0T0 P0C1T0 P0C2T0 P0C3T0
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing

Features
SSE Technology : Yes
SSE2 Technology : Yes
SSE3 Technology : Yes
Supplemental SSE3 Technology : No
SSE4.1 Technology : No
SSE4.2 Technology : No
AVX - Advanced Vector eXtensions : No
FMA - Fused Multiply Add eXtensions : No
SSE4A Technology : Yes
SSE5 Technology : No
HTT - Hyper-Threading Technology : No

Chipset
Model : AMD (ATI) RD790 GFX Dual Slot
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Chipset
Model : AMD (Family 10h) Athlon64/Opteron/Sempron HyperTransport Technology Configuration
Revision : A1
Front Side Bus Speed : 2x 1.8GHz (3.60GHz)
In/Out Width : 16-bit / 16-bit
Maximum Bus Bandwidth : 14.06GB/s

Logical/Chipset Memory Banks
Bank 2 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 3 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 10 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Bank 11 : 1GB DIMM DDR2 4-4-3-5 2-11-3-3 1T
Channels : 2
Bank Interleave : 2-way
Memory Bus Speed : 2x 400MHz (800MHz)
Multiplier : 2x
Width : 64-bit
Memory Controller in Processor : Yes
Cores per Memory Controller : 4 Unit(s)
Maximum Memory Bus Bandwidth : 12.5GB/s

This resulted in a cache and memory performance gain of nearly 5% (44.91GB/s to 47.10GB/s).

Analysis

The benchmarks clearly show there is a noticeable and significant increase in performance to be had by adjusting and fine-tuning timings and settings. I found in my testing that on the AMD chipset the memory timings themselves had very little effect on overall system performance; memory latency, however, was highly dependent on timings. The greatest benefits came from fine-tuning those confusing settings that hardly anybody really understands.

It is interesting to note that optimizing had a huge effect on L3 cache performance in particular. In the latency benchmark, optimizing dropped L3 cache latency from 81 clocks to 63 clocks, an improvement of 22%. In the cache and memory benchmark, L3 cache bandwidth increased from 30.13GB/s to 42.44GB/s, an improvement of about 40%!
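One thing to watch when comparing these numbers is that latency is lower-is-better while bandwidth is higher-is-better, so the percentage has to be computed in the right direction. A small sketch using the L3 figures from the Sandra runs (the function name is just for illustration):

```python
def improvement(before, after, lower_is_better=False):
    """Fractional improvement relative to the 'before' value."""
    if lower_is_better:
        return (before - after) / before
    return (after - before) / before


l3_latency = improvement(81, 63, lower_is_better=True)   # clocks
l3_bandwidth = improvement(30.13, 42.44)                 # GB/s

print(f"L3 latency:   {l3_latency:.1%} fewer clocks")
print(f"L3 bandwidth: {l3_bandwidth:.1%} more throughput")
```

Latency gains are quoted here as the reduction relative to the original value; expressed as a speedup (81/63) the same change would read as a larger number, so it helps to say which convention is in use.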

I went on to test the CPU's performance using the arithmetic, multi-media, and multi-core efficiency tests. The arithmetic test resulted in a very small drop in performance, which can easily be explained as the margin of error common in benchmarks. The result is no gains, so you can expect optimizing to have no effect on folding or number-crunching performance. The multi-media benchmark showed small but consistent gains, nothing higher than 1 or 2% though. The multi-core efficiency test, though, benefited drastically.

Unoptimized CPU Multi-Core Efficiency Benchmark

Quote:
SiSoftware Sandra

Benchmark Results
Inter-Core Bandwidth : 3.97GB/s
Results Interpretation : Higher index values are better.
Inter-Core Latency : 89ns
Results Interpretation : Lower index values are better.

Performance vs. Speed
Inter-Core Bandwidth : 1.16MB/s/MHz
Results Interpretation : Higher index values are better.
Inter-Core Latency : 0.03ns/MHz
Results Interpretation : Lower index values are better.

Performance vs. Power
Processor(s) Power : 229.10W
Inter-Core Bandwidth : 17.76MB/s/W
Results Interpretation : Higher index values are better.
Inter-Core Latency : 0.39ns/W
Results Interpretation : Lower index values are better.

Capacity vs Power
Total Cache Size : 13.76kB/W
Results Interpretation : Higher index values are better.

Detailed Benchmark Results
Processor Affinity : CPU0-CPU1 CPU2-CPU3
2x8kB Blocks Bandwidth : 3.08GB/s
4x8kB Blocks Bandwidth : 3.11GB/s
2x32kB Blocks Bandwidth : 3.16GB/s
4x32kB Blocks Bandwidth : 3.14GB/s
16x8kB Blocks Bandwidth : 3.08GB/s
2x128kB Blocks Bandwidth : 3.15GB/s
4x128kB Blocks Bandwidth : 3.15GB/s
16x32kB Blocks Bandwidth : 3.09GB/s
64x8kB Blocks Bandwidth : 3.07GB/s
16x128kB Blocks Bandwidth : 12.50GB/s
64x32kB Blocks Bandwidth : 10.94GB/s
64x128kB Blocks Bandwidth : 4.10GB/s

Performance Test Status
Run ID : AMD Phenom(tm) II X4 940 Processor (4C, 3.5GHz, 1.8GHz MC, 4x 512kB L2, 6MB L3)
Platform Compliance : x64
Buffering Used : Yes
NUMA Support : No
SMP (Multi-Processor) Benchmark : Yes
Total Test Threads : 4
Multi-Core Test : Yes
Cores per Processor : 4
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing


Optimized CPU Multi-Core Efficiency Benchmark


Quote:
SiSoftware Sandra

Benchmark Results
Inter-Core Bandwidth : 5.49GB/s
Results Interpretation : Higher index values are better.
Inter-Core Latency : 73ns
Results Interpretation : Lower index values are better.

Performance vs. Speed
Inter-Core Bandwidth : 1.61MB/s/MHz
Results Interpretation : Higher index values are better.
Inter-Core Latency : 0.02ns/MHz
Results Interpretation : Lower index values are better.

Performance vs. Power
Processor(s) Power : 229.10W
Inter-Core Bandwidth : 24.54MB/s/W
Results Interpretation : Higher index values are better.
Inter-Core Latency : 0.32ns/W
Results Interpretation : Lower index values are better.

Capacity vs Power
Total Cache Size : 19.68kB/W
Results Interpretation : Higher index values are better.

Detailed Benchmark Results
Processor Affinity : CPU0-CPU2 CPU1-CPU3
2x8kB Blocks Bandwidth : 4.40GB/s
4x8kB Blocks Bandwidth : 4.42GB/s
2x32kB Blocks Bandwidth : 4.53GB/s
4x32kB Blocks Bandwidth : 4.48GB/s
16x8kB Blocks Bandwidth : 4.38GB/s
2x128kB Blocks Bandwidth : 4.49GB/s
4x128kB Blocks Bandwidth : 4.48GB/s
16x32kB Blocks Bandwidth : 4.53GB/s
64x8kB Blocks Bandwidth : 4.79GB/s
16x128kB Blocks Bandwidth : 15.99GB/s
64x32kB Blocks Bandwidth : 14.26GB/s
64x128kB Blocks Bandwidth : 4.34GB/s

Performance Test Status
Run ID : AMD Phenom(tm) II X4 940 Processor (4C, 3.5GHz, 2.60GHz MC, 4x 512kB L2, 6MB L3)
Platform Compliance : x64
Buffering Used : Yes
NUMA Support : No
SMP (Multi-Processor) Benchmark : Yes
Total Test Threads : 4
Multi-Core Test : Yes
Cores per Processor : 4
System Timer : 14.32MHz
Page Size : 4kB
Use Large Memory Pages : No

Processor
Model : AMD Phenom(tm) II X4 940 Processor
Speed : 3.5GHz
Cores per Processor : 4 Unit(s)
Type : Quad-Core
Internal Data Cache : 4x 64kB, Synchronous, Write-Back, 2-way, Exclusive, 64 byte line size
L2 On-board Cache : 4x 512kB, ECC, Synchronous, Write-Back, 16-way, Exclusive, 64 byte line size
L3 On-board Cache : 6MB, ECC, Synchronous, Write-Back, 48-way, 64 byte line size, 4 threads sharing

This resulted in an inter-core bandwidth increase of 38% (3.97GB/s to 5.49GB/s), with inter-core latency dropping from 89ns to 73ns, an improvement of nearly 18%!
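The 38% headline hides quite a bit of variation between block sizes. A quick sketch that compares the detailed inter-core results from the two runs (values copied from the Sandra output; the variable names are only illustrative):

```python
# Inter-core bandwidth in GB/s per block configuration, from the two runs.
unoptimized = {
    "2x8kB": 3.08, "4x8kB": 3.11, "2x32kB": 3.16, "4x32kB": 3.14,
    "16x8kB": 3.08, "2x128kB": 3.15, "4x128kB": 3.15, "16x32kB": 3.09,
    "64x8kB": 3.07, "16x128kB": 12.50, "64x32kB": 10.94, "64x128kB": 4.10,
}
optimized = {
    "2x8kB": 4.40, "4x8kB": 4.42, "2x32kB": 4.53, "4x32kB": 4.48,
    "16x8kB": 4.38, "2x128kB": 4.49, "4x128kB": 4.48, "16x32kB": 4.53,
    "64x8kB": 4.79, "16x128kB": 15.99, "64x32kB": 14.26, "64x128kB": 4.34,
}

gains = {size: optimized[size] / unoptimized[size] - 1 for size in unoptimized}

for size, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{size:9s} {unoptimized[size]:6.2f} -> {optimized[size]:6.2f} GB/s  {gain:+.1%}")

best = max(gains, key=gains.get)
worst = min(gains, key=gains.get)
print(f"largest gain: {best}, smallest gain: {worst}")
```

Note that the two runs list different processor affinity pairings (CPU0-CPU1 CPU2-CPU3 versus CPU0-CPU2 CPU1-CPU3), so part of the difference may come from which cores were paired rather than from the settings alone.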
 

·
Premium Member
Joined
·
4,611 Posts
Can you make it clear which settings are which? On the first set of timings you start with the command rate of 2N, but on the others you leave it off. You might consider putting this into Excel to make it clearer. It would also clear up which advanced timings were tweaked and how each affects performance.

Now that I've said that, +rep. I may have to look into changing some of the settings that pretty much no one touches.
 

·
Banned
Joined
·
16,364 Posts
Your results will only be usable for comparison to systems with the same chipset. Since the MCH is totally different on each chipset, all of the characteristics you are seeing in your results, as well as overall latency and RAM speed, are specific to that chipset only.

Results will be different, for better or for worse, compared to other chipsets/FSBs/MCH straps [even the CPU multi matters for bandwidth tests on RAM, as lowering the stock multi raises NB speed over the FSB].
 

·
Premium Member
Joined
·
6,045 Posts
Discussion Starter · #4 ·
I just used what I had. Not meant to be universally applicable, just meant to show that there is a lot more room for improvement in the minor settings with some motherboards.
 

·
Iconoclast
Joined
·
31,524 Posts
Good to see someone else who uses RMMT to test multi-threaded memory performance.

And yeah, tweaking subtimings is as important as many other things people fuss over, if not more so.
 

·
Premium Member
Joined
·
6,045 Posts
Discussion Starter · #6 ·
I am bumping this thread because I have tested my AMD platform and added it to the OP. The benchmark results are in spoilers to save space, and can be seen by clicking on "show the hidden text". All replies after this post are new; the earlier replies refer only to my benchmarking of my X48 chipset.
 