The L2 & L3 caches on this ES chip are not functioning correctly, based on the chiphell comments and the AIDA memory benchmarks. I wouldn't rely too much on these ES benchmark results. Looking at the AIDA results, you can see that the L2 & L3 cache writes are problematic. The ES chip used is not representative of a production sample.
From the AMD Bulldozer optimization guide: http://support.amd.com/us/Processor_TechDocs/47414.pdf
"2.5.3 L2 Cache
The AMD Family 15h processor has one shared L2 cache per compute unit. This full-speed on-die L2 cache is mostly inclusive relative to the L1 cache. The L2 is a write-through cache. Every time a store is performed in a core, that address is written into both the L1 data cache of the core the store belongs to and the L2 cache (which is shared between the two cores). The L2 cache has an 18-20 cycle load to use latency. Size and associativity of the AMD Family 15h processor L2 cache is implementation dependent. See the appropriate BIOS and Kernel Developer's Guide for details.
2.5.4 L3 Cache
The AMD Family 15h processor supports a maximum of 8MB of L3 cache per die, distributed among four L3 sub-caches which can each be up to 2MB in size. The L3 cache is considered a non-inclusive victim cache architecture optimized for multi-core AMD processors. Only L2 evictions cause allocations into the L3 cache. Requests that hit in the L3 cache can either leave the data in the L3 cache, if it is likely the data is being accessed by multiple cores, or remove the data from the L3 cache (and place it solely in the L1 cache, creating space for other L2 victim/copy-backs), if it is likely the data is only being accessed by a single core. Furthermore, the L3 cache of the AMD Family 15h processor also features a number of micro-architectural improvements that enable higher bandwidth."
As a result, the non-functioning L2 & L3 cache writes will seriously impact performance.
From the AMD Bulldozer optimization guide:
"2.15.1 HyperTransport Assist
HyperTransport assist also increases the total coherent fabric bandwidth capability within the system by removing much probe and response traffic from the coherent HyperTransport links. It also streamlines probe and response handling throughout the L1/L2/L3 caches and elsewhere in the microarchitecture, which can lead to additional bandwidth improvements in systems with multiple processing nodes. HyperTransport assist is enabled by partitioning the L3 cache physical storage into a section used as traditional (CPU-side) L3 cache, and a separate physical section for directory storage which is inaccessible to the CPUs. In effect, from the perspective of CPUs, systems with HyperTransport assist enabled have a smaller L3 cache. Typically, 1–2MB of L3 cache is reserved for use by HyperTransport assist technology. Thus, some amount of L3 capacity is traded for reduced latency on cache refills. While the benefit of this tradeoff can be workload-dependent, it is almost universally a win on larger (4+ node) systems. If a platform runs a specific workload, it may be worth evaluating performance with and without HyperTransport assist."
Furthermore, since HyperTransport assist depends on the L3 cache, HyperTransport operation is also impacted, reducing effective memory bandwidth utilization and degrading performance further.