Originally Posted by Papas;15292409
True; it never crossed my mind that companies like Newegg and TigerDirect would carry limited quantities of an item that is such a huge release and has been touted for over a year.
On another note: people saying BD performance can't be increased should really look at the 2500K/2600K. During initial pre-release testing, the 2500K/2600K were scoring within 14 points of each other in Vantage and within 100 points in 3DMark 11 (while getting beaten by the i7 950 and i7 875K); now they are scoring hundreds of points more (almost 1000 more for the 2600K) and beating the i7 950 and i7 875K. So again, how can BD performance not increase from some magical driver when Intel released the same magical driver that made their CPUs perform better?
Yields are terrible. The launch was probably mostly a paper launch.
I don't expect some magic 50% performance increase.
Linux developers seem to have had (and may still be having) a difficult time redesigning the scheduler to suit the new architecture, and I assume Windows developers are running into similar issues. Take the issue discussed earlier in this thread, where disabling the second integer core in each module improves performance per clock. While this could point to a problem with the decode unit, the decoder is probably not the biggest problem. Data and instructions reach the integer cores only after being decoded, so a decoder bottleneck would exist regardless of which integer core or cores are working downstream (the maximum number of instructions decoded per unit of time caps the number that can be executed, no matter how many execution units are present). Rearranging the chip in software (by disabling some cores) to simulate a more traditional architecture therefore suggests the performance gains come from better scheduling and less cache thrashing, not from "getting more instructions to fewer cores".
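To make that argument concrete, here's a toy throughput model (my own sketch; the 4-wide shared decoder and 2-wide integer cores are assumptions about the module layout, not measured figures):

```python
def module_throughput(decode_width, active_core_widths):
    """Sustained instructions/cycle for one module: the shared front end
    decodes at most decode_width instructions per cycle, no matter how
    many integer cores consume them downstream."""
    return min(decode_width, sum(active_core_widths))

# Assumed: a 4-wide shared decoder feeding two 2-wide integer cores.
both_cores = module_throughput(4, [2, 2])  # ceiling of 4 instr/cycle
one_core = module_throughput(4, [2])       # ceiling of 2 instr/cycle

# Disabling a core can only lower (or leave unchanged) the module's
# ceiling; it can never raise it past the decoder. So if performance
# per clock *improves* with a core disabled, the decoder wasn't the
# limiter, and scheduling/cache behavior is the better suspect.
assert one_core <= both_cores
```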
AMD's problem is that the CPU apparently can't make effective use of all the available integer cores, due to poor scheduling by the OS, large cache latencies, too few decoders, and poor branch prediction. That is to say, the integer units (and possibly the FPUs) are being bottlenecked.
The poor scheduling can be fixed, and (based on Anandtech's not-fully-optimized Windows 8 alpha test) fixing it will decrease normal power consumption (unused cores can downclock) while increasing performance by 10% or maybe more (some 4c/4cu benches showed >20% increases). The cache latencies were probably raised above the expected numbers to improve yields, given the poor 32nm performance at Globalfoundries; they will probably improve within the next stepping or two (side note: one of AMD's goals was near-linear performance scaling with clockspeed, something SB doesn't achieve, and reaching the 30% clockspeed advantage over Deneb that was initially expected will also be a side effect of fab improvements).
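Until the schedulers catch up, the "one thread per module" placement can be approximated by hand. Below is a minimal, hypothetical Linux-only sketch (my own; it assumes logical CPUs are enumerated as consecutive pairs per module, e.g. 0-7 mapping onto four modules, which is not guaranteed on every system):

```python
import os

def one_core_per_module(n_logical_cpus, cores_per_module=2):
    """Pick the first logical CPU of each module, assuming modules are
    enumerated as consecutive pairs (an assumption, not a spec)."""
    return set(range(0, n_logical_cpus, cores_per_module))

cpus = one_core_per_module(8)  # {0, 2, 4, 6} on an assumed 8-core chip

# sched_setaffinity exists on Linux only; guard for portability.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, cpus)  # pin this process to those CPUs
```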
Improving the decoders (if necessary) and improving branch prediction require a complete reworking of the processor's front end. With normal development times for even simple chip redesigns running a couple of years, I suspect AMD knew months ago about the poor decode and branch prediction; that is the only explanation for how soon Piledriver is being released (just a few months away, rather than the couple of years a major redesign takes). AMD likely counted on the 30% higher stock clockspeed to carry them until the redesign was finished (notice that the 4.6GHz overclock benches, roughly 30% faster than Deneb designs, were fairly competitive), but AMD was screwed by the bad fab (though I believe AMD is also at fault for shipping a faulty design).
My prediction (please don't quote me later; I'm being optimistic here, but in reality I have little faith):
Between a 10-15% average performance increase from OS scheduling (this seems fairly certain), better fabs giving (I'd guess) a 20% increase in clockspeed rather than 30% (scaling almost linearly), better fabs allowing nearly 80% better cache latencies (matching Deneb latencies should be entirely possible; cache is cache), and a 10-15% IPC improvement from wider decode and better branch prediction (also fairly certain), I believe the next iteration will show more of the architecture's theoretical potential.
edit: At best, 15% from the OS and 15% from the redesign compound to about a 30% per-clock performance boost (making it 20% faster than Deneb and 20% slower than Sandybridge). Better cache latencies are a mixed bag; they may give less than 2% in some applications or more than 20% in others. If clockspeed can also be increased, the combined gains come to somewhere between 35% and 90% (a huge delta). Even with a 70-90% increase in overall performance, performance per transistor would still be worse than Sandybridge.
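For the curious, that 35-90% range comes from multiplying the individual gains together rather than adding them. A quick sketch of the back-of-the-envelope math (the factors are my guesses from this post, not measurements):

```python
from math import prod  # Python 3.8+

def compound(factors):
    """Independent speedups stack multiplicatively, not additively."""
    return prod(factors)

# Pessimistic: 15% OS + 15% IPC, cache barely helps, no clock gain.
low = compound([1.15, 1.15, 1.02])          # ~1.35, i.e. ~35%
# Optimistic: add a 20% cache win and a 20% clockspeed bump.
high = compound([1.15, 1.15, 1.20, 1.20])   # ~1.90, i.e. ~90%
```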
This seems to be the only explanation for how a chip with half Bulldozer's transistor count can perform better. As the chip currently stands, I couldn't recommend that anyone buy one (I don't think I could recommend one even if the OS problem went away).
Edited by hajile - 10/13/11 at 9:37am