Originally Posted by ihatelolcats
What should the multiplier between RAM and NB on these FX chips be?
Generally, "leave it alone". The extra voltage, heat, and instability are not worth the almost unnoticeable benefit of a faster northbridge. Since the NB is no longer bound to the cache, it's not a simple "make it faster" deal like it was with the Phenom IIs.
Someone in the thread said the NB should be at, or above, the speed of the RAM at all times, which makes sense since the IMC is part of the CPU/NB, so we'll go with that.
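That rule of thumb is easy to sanity-check. A minimal sketch in Python (the helper name and example figures are mine, not from the thread; the 2200 MHz stock NB figure is quoted later in this thread):

```python
# Rule of thumb from the thread: keep the CPU/NB clock at or above the
# RAM's effective (DDR) speed, since the IMC lives in the CPU/NB domain.

def nb_at_or_above_ram(nb_mhz: int, ddr_rating: int) -> bool:
    """ddr_rating is the effective DDR speed, e.g. 1866 for DDR3-1866."""
    return nb_mhz >= ddr_rating

# Stock Piledriver NB (2200 MHz) comfortably covers DDR3-1600:
print(nb_at_or_above_ram(2200, 1600))   # True
# ...but sits below DDR3-2400, where the rule would suggest raising the NB:
print(nb_at_or_above_ram(2200, 2400))   # False
```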
Originally Posted by kahboom
Has anyone considered delidding an fx CPU for lowering temps or at least replacing the compound under the lid?
No, because there is no TIM under the IHS; the die is soldered directly to it. Attempting to remove the IHS will break the chip.
Originally Posted by MadGoat
On the topic here,
I know that when I keep my PhII at a constant clock but only raise the RAM speed, I see a large increase in heat output. I suppose this is the same effect as keeping the processor at the same voltage but clocking it higher... it increases heat. You're simply asking the unit to "do more work". I can't get stable at or above 1866 RAM with a 3 GHz NB... but I can decrease the NB to about 2600 MHz and increase the RAM speed... but then it's a trade-off and I'm losing NB performance (caches are clocked lower).
I can only assume the same principle applies to Bulldozer/Piledriver.
And furthermore, the changes in Piledriver show up in the L2 cache performance:
AnandTech - AMD Launches Opteron 6300 series with "Piledriver" cores
The L2 cache latency and bandwidth has not changed, but AMD did quite a few optimizations. From AMD engineering:
"While the total bandwidth available between the L2 and the rest of the core did not change from Bulldozer to Piledriver, the existing bandwidth is now used more effectively. Some unnecessary instruction decode hint data writes to the L2 that were present in Bulldozer have been removed in Piledriver. Also, some misses sent to the L2 that would get canceled in Bulldozer are prevented from being sent to the L2 at all in Piledriver. This allows the L2’s existing resources to be applied toward more useful work.”
This means that L1 misses which would normally have been sent to the L2 (a big contributor to Bulldozer's cache latency issues) are no longer allowed to take cache cycles away from the pipeline; instead, a new cycle is started. Apparently restarting the operation is faster than letting the cache continue the search, presumably because of the longer pipeline in AMD's CMT cores. (This is also why you see the L2 cache performance differences between Vishera and Zambezi; look at the L2 read numbers in the AIDA64 memory benchmark.)
This is more of an immediate performance workaround than a design improvement. I don't know if this can even be solved before the 28nm process, given the need for lower-latency caches while maintaining the high clocks necessary (which increase bandwidth) to keep the cores fed.
AMD pretty much has the beginnings of a great pipeline but can't keep it fed due to cache constraints that are ultimately limited by the fab process. Add on top of that the small, low-associativity L1 cache Piledriver is coping with, and you have the ultimate inability to get the right information to the pipeline on the fly.
This is why Piledriver is great at parallel tasks right now but falls short in random IO (or IMC-bound) operations. It's really all about the cache, prefetch, branch misprediction penalty, and lack of decode per core.
The Bulldozer Aftermath: Delving Even Deeper
The Real Shortcomings: Branch Misprediction Penalty and Instruction Cache Hit Rate
Bulldozer is a deeply pipelined CPU, just like Sandy Bridge, but the latter has a µop cache that can cut the fetching and decoding cycles out of the branch misprediction penalty. The lower than expected performance in SAP and SQL Server, plus the fact that the worst performing subbenches in SPEC CPU2006 int are the ones with hard to predict branches, all points to there being a serious problem with branch misprediction.
Our Code Analyst profiling shows that AMD engineers did a good job on the branch prediction unit: the BPU definitely predicts better than the previous AMD designs. The problem is that Bulldozer cannot hide its long misprediction penalty, which Intel does manage with Sandy Bridge. That also explains why AMD states that branch prediction improvements in "Piledriver" ("Trinity") are only modest (1% performance improvements). As branch predictors get more advanced, a few tweaks here and there cannot do much.
It will be interesting to see if AMD will adopt a µop cache in the near future, as it would lower the branch prediction penalty, save power, and lower the pressure on the decoding part. It looks like a perfect match for this architecture.
Another significant problem is that the L1 instruction cache does not seem to cope well with two threads. We have measured significantly higher miss rates once we run two threads on the 2-way 64KB L1 instruction cache. It looks like the associativity of that cache is simply too low. There is a reason why Intel has an 8-way associative cache to run two threads.
Good way to sum it up, although you have the cache order reversed: from the core outward it's L1 -> L2 -> L3 -> RAM, and a miss at each successive level carries a higher latency cost.
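To see why deeper misses hurt so much, here's a rough sketch of average access cost. The per-level cycle counts and hit rates are made-up ballpark figures for illustration, not measured Vishera numbers, and the model is simplified (it charges only the latency of the level that finally hits):

```python
# Illustrative latencies in cycles; NOT measured values for any real chip.
LATENCY = {"L1": 4, "L2": 20, "L3": 60, "RAM": 200}

def average_access_cycles(hit_rates):
    """hit_rates: per-level hit probabilities for L1/L2/L3.
    Accesses that miss all three levels go out to RAM."""
    cycles = 0.0
    remaining = 1.0  # fraction of accesses still unresolved
    for level in ("L1", "L2", "L3"):
        hits = remaining * hit_rates[level]
        cycles += hits * LATENCY[level]
        remaining -= hits
    cycles += remaining * LATENCY["RAM"]
    return cycles

# Dropping the L1 hit rate (e.g. two threads thrashing a low-associativity
# L1, as the AnandTech excerpt describes) raises the average cost a lot:
one_thread  = average_access_cycles({"L1": 0.95, "L2": 0.80, "L3": 0.50})
two_threads = average_access_cycles({"L1": 0.85, "L2": 0.80, "L3": 0.50})
print(one_thread < two_threads)  # True
```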
Originally Posted by SkateZilla
FX83xx Series IMC can do 4 DIMMs of DDR1600.
Shoot, my FX8120 runs 16GB fine at DDR1600 too.
My 970BE ran 32GB (4x8GB) at 1600 and still runs 16GB (4x4GB) at 1800; if you can deal with the heat, it's a non-issue.
Originally Posted by SkateZilla
wait a sec... so you're saying my NB should be 2400 on my FX?...
Stock NB speed on PD is 2200MHz. Stock HT is 2600MHz.
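Those numbers fall out of the 200 MHz AM3+ reference clock; the multiplier names below are mine, but the arithmetic is just the stock values above:

```python
# AM3+ clocks derive from the 200 MHz reference (base) clock.
BCLK = 200  # MHz

nb_mult, ht_mult = 11, 13       # stock Piledriver CPU/NB and HT multipliers
print(BCLK * nb_mult)  # 2200 -> stock CPU/NB clock
print(BCLK * ht_mult)  # 2600 -> stock HT link clock
```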