Chip fabrication processes are only a few years away from the point where they can no longer shrink. A 2D chip (note: "3D transistors" are not truly 3D, just turned on edge; they are not stacked) scales only with area, the second power: once the minimum feature size is reached, the only way to increase performance is to make the die wider and longer. A 3D chip allows the same footprint (and keeps individual devices small) by stacking layers instead.
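To make the scaling argument concrete, here is a minimal sketch (purely illustrative numbers, not any real process) of how a planar die versus a stacked die adds transistors once the feature size can no longer shrink:

```python
# Illustrative only: with feature size fixed, a planar chip adds transistors by
# growing its area, while a stacked (3D) chip adds layers at the same footprint.
def planar_transistors(side_mm: float, density_per_mm2: float) -> float:
    return side_mm * side_mm * density_per_mm2

def stacked_transistors(side_mm: float, density_per_mm2: float, layers: int) -> float:
    return side_mm * side_mm * density_per_mm2 * layers

DENSITY = 10_000_000  # hypothetical transistors per mm^2 at the final node

base = planar_transistors(20, DENSITY)            # 20 mm x 20 mm planar die
doubled_2d = planar_transistors(28.3, DENSITY)    # ~2x transistors needs ~41% more width and length
doubled_3d = stacked_transistors(20, DENSITY, 2)  # same footprint, two stacked layers

print(f"base: {base:.2e}, grown 2D: {doubled_2d:.2e}, stacked 3D: {doubled_3d:.2e}")
```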
Computer-designed chips are the future. Some research shows little or no heat output from graphene chips, which makes 3D designs feasible. A cubic chip would have so many transistors and be so complicated (due to hundreds of 3D interconnects) that a computer is the only feasible way to design it.
AMD designing chips with a computer (if true) simply shows that the company is forward-thinking. Gaining experience and patents (I don't like patents, but it is a "legit" business strategy) would put AMD ahead of the curve. As experience with computer design increases, so will efficiency (note: AMD must still create and input the basic structures for a computer program to work with). It is also important to note that AMD would still examine the computer-generated designs and tweak the less efficient places by hand (as is done with computer-aided engineering in every other engineering profession).
That said, GPUs have been computer-designed for years, and it works fairly well.
edit: I would be remiss if I did not say that Intel undoubtedly uses computers for at least the more boring jobs like designing wire trace layers. If computers are not used for current designs, Intel still likely has a team doing research into this area.
To answer some other posters, I think I'll just quote myself to save typing.
Originally Posted by hajile
Yields are terrible. The launch was probably mostly a paper launch.
I don't expect some magic 50% performance increase.
Linux developers seem to have had (and may still be having) a difficult time redesigning the scheduler to suit the new architecture, and I assume that Windows developers are having similar issues. Consider the issue discussed earlier in this thread, where disabling the second integer core in each module improves performance per clock. While this could point to a problem with the decode unit, the decoder is probably not the biggest problem. Data and instructions reach the integer cores only after being decoded, so a decoder bottleneck will exist regardless of which integer cores are working downstream: the maximum number of instructions decoded per unit of time caps the number of instructions that can be executed per unit of time, no matter how many execution units are present. Rearranging the chip in software (by disabling some cores) to simulate a more traditional architecture therefore suggests the performance gains come from better scheduling optimization and less cache thrashing, not from "getting more instructions to fewer cores."
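A rough way to see the decoder argument (the widths below are illustrative placeholders, not Bulldozer's actual figures): sustained throughput is capped by the narrower of the shared decoder and the combined execution width, so disabling one integer core cannot lift a decode ceiling by itself.

```python
# Illustrative throughput model: a module's shared decoder feeds its integer cores.
# Sustained instructions per cycle can never exceed the decode width, no matter
# how many execution pipes sit downstream.
def sustained_ipc(decode_width: int, pipes_per_core: int, active_cores: int) -> int:
    execution_width = pipes_per_core * active_cores
    return min(decode_width, execution_width)

DECODE_WIDTH = 4    # hypothetical shared decode slots per module per cycle
PIPES_PER_CORE = 2  # hypothetical execution pipes per integer core

print(sustained_ipc(DECODE_WIDTH, PIPES_PER_CORE, active_cores=2))  # 4 -> decode-limited
print(sustained_ipc(DECODE_WIDTH, PIPES_PER_CORE, active_cores=1))  # 2 -> execution-limited
# Disabling a core lowers the execution ceiling but cannot raise the decode ceiling,
# so per-clock gains from doing so point at scheduling and cache effects instead.
```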
AMD's problem is that the CPU (apparently) can't effectively use all the available integer cores due to poor scheduling by the OS, large cache latencies, too few decoders, and poor branch prediction. That is to say, the integer units (and possibly the FPUs) are being bottlenecked.
The poor scheduling can be fixed, and (based on Anandtech's not-fully-optimized Windows 8 alpha test) fixing it will decrease normal power consumption (unused cores can downclock) while increasing performance by 10% or maybe more (some 4c/4cu benches showed >20% increases). The cache latencies were probably increased over the expected numbers to improve yields, given the poor 32nm performance at GlobalFoundries; they will probably improve with the next stepping or two (side note: one of AMD's goals was a near-linear performance increase with clockspeed, something SB doesn't achieve, and reaching the 30% clockspeed advantage over Deneb that was initially expected will also be a side effect of fab improvements).
Improving the decoders (if necessary) and improving branch prediction require a complete reworking of the processor's front end. With normal development times for even simple chip redesigns being a couple of years, I suspect that AMD knew months ago about the poor decode and branch prediction; that is the only explanation for how soon Piledriver is being released (just a few months, rather than a couple of years for a major redesign). AMD likely counted on the 30% higher stock clockspeed to carry them until the redesign was finished (notice that the 4.6GHz overclock benches, roughly 30% faster than Deneb designs, were fairly competitive), but AMD was screwed by the bad fab (though I believe AMD is also at fault for shipping a faulty design).
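The clockspeed argument boils down to performance being roughly IPC times frequency. A quick sketch with a made-up IPC ratio (only the ~30% clock delta comes from the post; the per-clock deficit is a hypothetical placeholder) shows why a big clock advantage could have papered over a per-clock deficit:

```python
# Rough single-thread model: relative performance ~= relative IPC * relative clock.
def relative_perf(ipc_ratio: float, clock_ratio: float) -> float:
    return ipc_ratio * clock_ratio

deneb = relative_perf(1.00, 1.00)
# Hypothetical: 20% lower IPC than Deneb, but the planned ~30% clock advantage.
bulldozer_guess = relative_perf(0.80, 1.30)

print(f"Deneb: {deneb:.2f}, hypothetical Bulldozer at +30% clock: {bulldozer_guess:.2f}")
# 0.80 * 1.30 = 1.04 -> roughly at parity, which is why the clock advantage mattered.
```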
My prediction (please don't quote me later; I am being optimistic, but in reality I have little faith):
Between a 10-15% average OS performance increase (this seems fairly definite), better fabs giving (I guess) a 20% increase in clockspeed rather than 30% (scaling almost linearly), better fabs giving nearly 80% improvement in cache latencies (matching Deneb latencies should be completely possible; cache is cache), and a 10-15% IPC improvement (also fairly definite) from wider decode and better branch prediction, I believe the next iteration will show more of the architecture's theoretical potential.
edit: At best, 15% from the OS and 15% from the redesign gives roughly a 30% per-clock performance boost (making it about 20% faster than Deneb and 20% slower than Sandy Bridge). Better cache latencies are a mixed bag; they may give less than 2% for some applications or more than 20% for others. If clockspeed can be increased, the combined gains come to between 35% and 90% more performance (that's a huge delta). Even with a 70-90% increase in overall performance, performance per transistor would still be worse than Sandy Bridge.
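Those totals come from stacking the individual gains multiplicatively; here is a minimal sketch using the estimates above as inputs (which gains you assume actually stack is what produces the wide 35-90% range):

```python
# Compound the estimated gains multiplicatively: each factor is (1 + gain).
# The inputs are this post's own estimates; the real result depends on which
# of them actually materialize and stack.
def compound(*factors: float) -> float:
    total = 1.0
    for factor in factors:
        total *= factor
    return total

best_case = compound(1.15, 1.15, 1.20, 1.20)    # OS, IPC, cache, clock (optimistic)
modest_case = compound(1.10, 1.10, 1.02, 1.10)  # OS, IPC, cache, clock (pessimistic)

print(f"best case:   +{(best_case - 1) * 100:.0f}%")    # ~ +90%
print(f"modest case: +{(modest_case - 1) * 100:.0f}%")  # ~ +36%
```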
That seems to be the only explanation for how a chip with half the transistor count of Bulldozer can have better performance. As the chip stands right now, I couldn't recommend that anyone buy one (I don't think I could recommend one even if the OS problem went away).