Originally Posted by tpi2007
So, the CPU part is still more or less the same as Bulldozer. Sorry, there is no way around that.
From the 3 GHz A8 3870K to the 3.8 GHz A10 5800K the difference is 26.7% higher clockspeed + Turbo Core, which is something the 3870K doesn't have. They are claiming a 29% increase in productivity performance, whatever that means, so almost all, if not all, of the gains are attributable to the higher clockspeed + Turbo Core. This means that with the Trinity improvements they most probably only managed to bring the IPC back to the former architecture's level.
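For anyone who wants to check the arithmetic, here is a rough sketch in Python. It assumes performance is simply clock times per-clock throughput and ignores Turbo Core entirely (both are simplifications, not AMD's figures); the function name is made up for illustration.

```python
# Back-of-the-envelope check of the clockspeed argument.
# Assumption: performance = clock * per-clock throughput (IPC).
# Turbo Core is ignored, so only base clocks are used.

def implied_ipc_ratio(old_clock_ghz, new_clock_ghz, claimed_speedup):
    """Per-clock ratio needed to explain a claimed overall speedup."""
    clock_ratio = new_clock_ghz / old_clock_ghz
    return claimed_speedup / clock_ratio

clock_ratio = 3.8 / 3.0                          # A10 5800K vs A8 3870K base clocks
ipc_ratio = implied_ipc_ratio(3.0, 3.8, 1.29)    # 29% claimed gain

print(f"clock ratio: {clock_ratio:.3f}")
print(f"implied per-clock ratio: {ipc_ratio:.3f}")
```

A per-clock ratio close to 1.0 is what "IPC back to the former architecture's level" would look like under this simple model.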
Add to that the "Optimized for Windows 8" moniker, which basically means "no, we haven't addressed the architectural flaws, we just made a few improvements to make it less bad and are relying on the better task scheduler of an OS that still hasn't been released to somehow make us look a little better".
AMD is relying on the newer manufacturing process to raise the clockspeed and make up the difference instead of making the architecture better.
AMD needs help and fast. They have no direction in the desktop market. Intel proved that you don't need to invest lots of money to get higher performance by doing a new manufacturing process and new architecture at the same time like AMD is foolishly trying to do. Intel's tick - tock cycle works. AMD on the other hand is acting like a company that has money to spare but they don't.
They could have saved money by not transitioning to 32 nm on the CPU right away and investing the money saved in improving the Phenom II architecture. If Intel did it with Nehalem on 45 nm, why can't AMD? I would say even more, Intel could technically very well release Sandy Bridge at 45 nm without the GPU. AMD is trying to do everything at the same time (new architecture + new manufacturing process) and they don't have the money to make that work. Not even Intel does that.
The saving grace is that this is good enough for general purpose computers and notebooks and they have good idle power saving technology.
But in the end you need to have a proper flagship; you can't last very long if you don't have something that inspires desire in customers. Even car makers do that. Fiat for example owns Ferrari.
There's a fundamental reason why AMD cannot follow the tick-tock model: AMD does not own any foundries, and cannot make a single batch of chips to experiment on, then make some small tweaks, and create another small batch. They have to order in bulk for anyone to make them, whereas Intel owns their own foundry and can do whatever the hell they want with it.
Also, there was an article posted a few weeks or months ago about AMD doing a "tick-tock" like approach. With two sets of platforms now, AMD is releasing the mainstream platform first (FM1/FM2), and using the things they learned from that release to improve the enthusiast platform (AM3+). The Piledriver cores that we will see in Trinity will be an indicator of Vishera performance, but it won't be the full thing, as there will be some tweaks + L3 cache.
Originally Posted by hajile
Compare 4-module Bulldozer to 6-core Thuban. Despite the extra two integer units and higher clockspeed, performance was fairly similar for most applications. In multi-threaded applications, a single Bulldozer core isn't as powerful, because part of the front-end is used to feed the second integer unit. The max theoretical output for 1 module using 2 integer units is 180% of the power of just one unit (rather than 200% as many who misunderstand Bulldozer CMT seem to expect).
Theoretically, if IPC for using only one core per module matches stars core (assuming identical frequencies), then adding a second stars core and activating the second module will give the two stars cores a 20% advantage in performance (though there are many other potential disadvantages). Doubling this to 4 cores and 2 modules (as trinity has) means that the 20% advantage in multi-tasking still exists. Edit: note that stars is 6% faster than Phenom II, so that's another 6% improvement over bulldozer that's needed.
A 26% increase in clockspeed would be needed just to break even and an additional 35% improvement is needed in addition to that to get the claimed increase.
To push this further, the front-end of bulldozer was frequently small enough to bottleneck two integer cores, so high throughput applications often only saw 100% for one core per module and 140% for two (once again, instead of 200%). If this is factored in, then one bulldozer module vs 2 stars cores has a 60% disadvantage. If clockspeed only increases by 26%, then another 60% improvement must be coming from somewhere else if the chip is 30% faster overall.
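To put numbers on the module-vs-cores comparison, here is a small Python sketch. It is a simplified model, not anything from AMD: it treats performance as units times scaling times per-clock throughput times clock, multiplies the factors instead of adding percentages (so its results will not match the additive figures quoted in the post), and leaves out the 6% note from the edit. The 1.8 and 1.4 module scaling factors are the ones stated in the post; the function name is made up.

```python
# Simplified model: how much faster per clock would one Piledriver integer
# core need to be, relative to a Stars core, for a 2-module chip at 3.8 GHz
# to beat a 4-core Stars chip at 3.0 GHz by 29%?
# Assumptions: factors multiply, workload is fully multi-threaded,
# Turbo Core is ignored.

def required_per_clock_gain(module_scaling, modules=2, stars_cores=4,
                            new_clock=3.8, old_clock=3.0, target=1.29):
    """Per-clock ratio (new core / Stars core) needed to hit the target."""
    stars_throughput = stars_cores * old_clock
    module_throughput = modules * module_scaling * new_clock
    return target * stars_throughput / module_throughput

for scaling in (1.8, 1.4):   # theoretical CMT scaling, front-end-bound scaling
    gain = required_per_clock_gain(scaling)
    print(f"module scaling {scaling:.1f}: per-clock ratio needed {gain:.3f}")
```

Whichever scaling factor is used, the needed ratio comes out above 1.0, which is the point of the post: clockspeed alone does not cover a 29% gain in multi-threaded work.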
I believe that the front end was widened by one or two instructions and the instruction cache was increased slightly. In addition, branch prediction could net huge gains in performance (branch prediction is much worse in bulldozer than in stars). More mature 32nm also means that cache latencies will finally be decent (the waits were 2x longer than the competition in some cases) and clockspeeds will likely get better.
A note to those who claim that another Phenom processor would be fine: it would be a losing strategy. AMD doesn't have the research buying power to compete with Intel at the same game. The only hope for competition is to try something different. The CMT in Bulldozer is just that. Many companies have talked about what a great idea it seems to be, but none were willing to take the risk. AMD was/is in a position desperate enough to try something new and revolutionary, but there's no incremental change between classic design and Bulldozer's design. It's all in or go home.
AMD's failure to deliver with Bulldozer doesn't mean the design is bad. Even the P4 has great usage. Almost all of the improvements from Core 2 through today have been due to Intel adding features that were dropped when moving from P4 to Core. As to IPC, it doesn't mean anything on its own. A high IPC can be had at the expense of clockspeed and thus be slower than a lower IPC chip. The only reason that higher IPC makes sense today is the silicon frequency wall, but in a few short years, silicon will be replaced and most of the replacements run at high clockspeeds. With these technologies, linear scaling in clockspeed/power (i.e. fewer bottlenecks as speed increases) is more important than IPC.
I love how tpi2007 (and anyone else for that matter) had absolutely nothing to say about this post.
4 module Bulldozer barely matched 6-core Thuban, even though it had a 9% clock speed advantage (at stock).
Now we see 2 module Piledriver exceeding 4-core Stars by 29% with a 26% boost in clock speed, which means Piledriver's cores should be faster clock per clock than Stars, which itself is faster than Thuban. This much of a performance gain is fairly impressive if you ask me, that is if these rumors actually hold true. Even if they don't, and performance is something like 15% better on average, it's still fairly impressive what AMD has been able to do to tweak the architecture.
Edited by Tsumi - 4/5/12 at 1:13am