Since Balla is not going to answer the questions surrounding the cache, here is some information gathered from various sources.
Interlagos has a 16-way set-associative design. What does this mean?
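To answer the question: in an N-way set-associative cache, each address maps to exactly one set, and within that set the line can occupy any of N slots ("ways"). A 16-way design therefore lets 16 lines that map to the same set live side by side before one of them has to be evicted. The sketch that follows is a minimal illustration under assumed parameters (a 2 MB cache with 64-byte lines and LRU replacement); those numbers and the names `split_address` and `SetAssociativeCache` are illustrative choices here, not figures or APIs from the post.

```python
from collections import OrderedDict

# Assumed, illustrative parameters (not from the post): 2 MB, 64-byte lines, 16 ways.
CACHE_BYTES = 2 * 1024 * 1024
LINE_BYTES = 64
WAYS = 16

NUM_SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 2048 sets of 16 ways each
OFFSET_BITS = (LINE_BYTES - 1).bit_length()    # 6 bits select a byte in the line
INDEX_BITS = (NUM_SETS - 1).bit_length()       # 11 bits select the set


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class SetAssociativeCache:
    def __init__(self) -> None:
        # Each set holds at most WAYS tags, kept in LRU order (oldest first).
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, fill the line (evicting the LRU way if the set is full)."""
        tag, index, _ = split_address(addr)
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            return True
        if len(ways) == WAYS:
            ways.popitem(last=False)   # all 16 ways busy: evict the LRU line
        ways[tag] = None
        return False


# Addresses NUM_SETS * LINE_BYTES apart land in the same set with different tags.
cache = SetAssociativeCache()
conflicting = [i * NUM_SETS * LINE_BYTES for i in range(17)]
for a in conflicting:
    cache.access(a)
print(cache.access(conflicting[1]))  # True: still resident in one of the 16 ways
print(cache.access(conflicting[0]))  # False: evicted when the 17th line arrived
```

Higher associativity reduces these conflict evictions at the cost of comparing more tags on every lookup.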
"The load-to-use latency for Bulldozer's L2 is going to be surprisingly high, probably around 18-20 cycles; in comparison, the L2 caches for Nehalem and Istanbul have roughly 10 cycles of latency. The L2 cache can also have as many as 23 outstanding misses concurrently. Since the L1D is both write-through and mostly included in the L2, evicting a cache line from the L1D is silent and requires no further action. This is beneficial since evictions are typically caused by filling a cache line in response to a cache miss, and are closely tied to the critical path for a miss."
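Why a write-through L1D makes evictions silent can be shown with a toy model. This is a hedged sketch, not Bulldozer's actual logic: `TinyL1D` is a hypothetical fully associative cache with LRU replacement, and allocating lines on stores is a simplifying assumption. In write-through mode every store is forwarded to the L2 immediately, so no line is ever dirty; in write-back mode (shown only for contrast) evicting a dirty line costs an extra write to the L2 right on the miss path.

```python
from collections import OrderedDict


class TinyL1D:
    """Toy fully associative L1D with LRU replacement (hypothetical model).

    write_through=True: every store is also sent to the L2 right away, so no
    line is ever dirty and evicting one is silent.
    write_through=False (write-back, for contrast): stores only mark the line
    dirty, and evicting a dirty line requires a write-back to the L2.
    For simplicity, stores allocate lines in both modes (an assumption).
    """

    def __init__(self, capacity_lines: int, write_through: bool) -> None:
        self.capacity = capacity_lines
        self.write_through = write_through
        self.lines = OrderedDict()      # line address -> dirty flag
        self.l2_store_writes = 0        # stores forwarded to the L2 as they happen
        self.eviction_writebacks = 0    # extra L2 writes caused by evictions

    def _fill(self, line: int) -> None:
        if line in self.lines:
            self.lines.move_to_end(line)
            return
        if len(self.lines) == self.capacity:
            _, dirty = self.lines.popitem(last=False)
            if dirty:
                self.eviction_writebacks += 1  # eviction is not silent
        self.lines[line] = False

    def load(self, line: int) -> None:
        self._fill(line)

    def store(self, line: int) -> None:
        self._fill(line)
        if self.write_through:
            self.l2_store_writes += 1
        else:
            self.lines[line] = True


def run(cache: TinyL1D) -> TinyL1D:
    for line in range(8):
        cache.store(line)       # write lines 0-7
    for line in range(8, 16):
        cache.load(line)        # then read lines 8-15, forcing evictions
    return cache


wt = run(TinyL1D(4, write_through=True))
wb = run(TinyL1D(4, write_through=False))
print(wt.l2_store_writes, wt.eviction_writebacks)  # 8 0
print(wb.l2_store_writes, wb.eviction_writebacks)  # 0 8
```

The write-through cache pays for its stores up front and never does work at eviction time, which is the property the quote describes.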
It appears that the L1D is only mostly included in the L2. As a result, there will be certain situations where lines reside in the L1D without being present in the L2, which means the L1D may need to be snooped when another core misses in the L3 cache. Snooping is undesirable because it puts a large amount of traffic on the BD system. Bulldozer was designed to eliminate snoop traffic to the L1D caches and instead have the L2 cache for each module handle all the coherency snoops for that module. This is a design improvement!
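The benefit of letting the L2 handle snoops can be sketched by counting L1D probes. This is a minimal hypothetical model (the `Module` class and its fields are illustrative, not AMD's implementation): if the L2 were guaranteed to contain every L1D line, an L2 miss would prove that neither core's L1D holds the line, so the snoop could stop at the L2. Because the L1D is only *mostly* included, that shortcut is not always safe, which is exactly the corner case described above.

```python
class Module:
    """Hypothetical model of one module: a shared L2 that filters external
    invalidation snoops for the two cores' L1D caches. Line addresses are ints."""

    def __init__(self, l2_lines, l1d_lines, l2_is_inclusive: bool) -> None:
        self.l2 = set(l2_lines)
        self.l1d = [set(lines) for lines in l1d_lines]   # one set per core
        self.l2_is_inclusive = l2_is_inclusive
        self.l1d_probes = 0

    def snoop_invalidate(self, line: int) -> None:
        in_l2 = line in self.l2
        self.l2.discard(line)
        # If the L2 is guaranteed to contain every L1D line, an L2 miss proves
        # neither L1D holds the line, so the snoop never reaches the cores.
        # Without that guarantee, both L1Ds must be probed on every snoop.
        if in_l2 or not self.l2_is_inclusive:
            for l1 in self.l1d:
                self.l1d_probes += 1
                l1.discard(line)


def probes_for(inclusive: bool) -> int:
    m = Module(l2_lines={1, 2, 3}, l1d_lines=[{1}, {2}], l2_is_inclusive=inclusive)
    for line in range(1, 11):      # ten external snoops, lines 1-10
        m.snoop_invalidate(line)
    return m.l1d_probes


print(probes_for(inclusive=True))   # 6: only snoops that hit the L2 reach the L1Ds
print(probes_for(inclusive=False))  # 20: every snoop probes both L1Ds
```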
There is a disadvantage to a write-through policy, though: the L1D caches do not insulate the L2 cache from store traffic. The L2 cache must therefore have higher bandwidth to accommodate all the store traffic from two cores, plus any associated snoop traffic and responses. BD attempts to mitigate this problem by including a write coalescing cache (WCC), which is considered part of the L2.
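How a write coalescing cache cuts L2 store traffic can be illustrated with a small counting model. This is a sketch under stated assumptions, not the real WCC: `l2_writes_with_wcc` is a hypothetical function, the buffer is modeled as a fully associative LRU structure, and the 64-line capacity (4 KB at 64 bytes per line) is an assumed size. Stores to a line already held in the buffer merge with the pending write; only when a line is drained does the L2 see a write.

```python
from collections import OrderedDict


def l2_writes_with_wcc(store_lines: list[int], wcc_lines: int) -> int:
    """Count L2 write transactions when every store first passes through a
    small write-coalescing buffer (hypothetical LRU model, not AMD's design)."""
    wcc = OrderedDict()
    writes = 0
    for line in store_lines:
        if line in wcc:
            wcc.move_to_end(line)      # merge with the pending write to this line
            continue
        if len(wcc) == wcc_lines:
            wcc.popitem(last=False)    # drain the oldest buffered line to the L2
            writes += 1
        wcc[line] = None
    return writes + len(wcc)           # final flush of whatever is still buffered


# Two cores each store 8-byte values sequentially through a 512-byte array.
core0 = [addr // 64 for addr in range(0, 512, 8)]
core1 = [addr // 64 for addr in range(4096, 4096 + 512, 8)]
stores = [line for pair in zip(core0, core1) for line in pair]

print(len(stores))                      # 128: L2 writes with no coalescing
print(l2_writes_with_wcc(stores, 64))   # 16: one write per distinct 64-byte line
```

With interleaved stores from two cores, the buffer only helps if it can hold both cores' active lines at once; shrinking it to a single line in this model brings the count back to 128.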
Intel has historically focused on prefetching, an area where AMD has lagged behind. It has already been stated that Bulldozer's prefetching will include non-strided data prefetchers. Also, Bulldozer is going to have independent prefetchers at both the L1 and L2 levels, supporting a larger number of strides and larger stride sizes.
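For readers unfamiliar with stride prefetching, the classic idea can be sketched in a few lines. This is a generic textbook technique, not AMD's design: `StridePrefetcher` is a hypothetical class, the per-instruction (PC-indexed) table and the prefetch `degree` are assumptions, and Bulldozer's non-strided prefetcher is not modeled at all. The table remembers the last address and stride for each load; once the same nonzero stride repeats, it predicts the next few addresses. "Large stride sizes" simply means the stride can be big, such as the 4 KB stride used here.

```python
class StridePrefetcher:
    """Textbook per-instruction stride prefetcher (a generic sketch, not
    AMD's design; Bulldozer's non-strided prefetcher is not modeled).
    For each load PC it remembers the last address and stride; when the same
    nonzero stride repeats, it prefetches `degree` strides ahead."""

    def __init__(self, degree: int = 2) -> None:
        self.degree = degree
        self.table: dict[int, tuple[int, int]] = {}  # pc -> (last_addr, last_stride)

    def access(self, pc: int, addr: int) -> list[int]:
        if pc not in self.table:
            self.table[pc] = (addr, 0)   # first sighting: no stride yet
            return []
        last_addr, last_stride = self.table[pc]
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        if stride != 0 and stride == last_stride:
            return [addr + stride * k for k in range(1, self.degree + 1)]
        return []


pf = StridePrefetcher(degree=2)
for addr in (0, 4096, 8192, 12288):      # one load walking a 4 KB stride
    print(pf.access(pc=0x40, addr=addr))
# prints [], [], [12288, 16384], [16384, 20480] on separate lines
```

Tracking strides per instruction lets several independent streams be followed at once, which is one way a prefetcher can support "a larger number of strides".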
An architecture overview of Bulldozer is attached.