Great post OP, I was thinking almost exactly the same thing. The new architecture changes a lot. They made very big changes, and it is only natural that those changes are not very well supported yet.
This for example:
Originally Posted by Anandtech
Compared to Sandy Bridge, Bulldozer only has two advantages in FP performance: FMA support and higher 128-bit AVX throughput. There's very little code available today that uses AMD's FMA instruction, while the 128-bit AVX advantage is tangible.
From Anandtech's review. Bulldozer's FPU is at a disadvantage for FP-heavy loads. AMD's position is that FP is supposedly not heavily used in most situations (and the FPU costs a lot of resources, which is their reasoning for cutting the count in half), and for the situations where it is, they have their new FMA and XOP instructions that should help ease the load. Unfortunately, those instructions are hardly supported by software right now, if at all.
The scheduling thing is a big one as well. The results of the 4 module/4 core tests show that the module-shared resources approach is hindering the performance of BD for some reason (see the original XtremeSystems post, which is the source of what you linked). A 30-50% increase is significant! If this is the case, IPC would be an improvement over Phenom II. Considering the changes to the pipeline (which would inherently lower IPC while increasing clock speed), that is pretty amazing.
There was also a review somewhere talking about scheduler optimization that would help as well. I looked for half an hour and can't find it right now, but I'll look again tomorrow. It talked about putting threads that share data into the same module, and giving each of the other threads its own module until there are no modules left without a thread. Of course, this is not how Windows 7 schedules threads.
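To make that placement idea concrete, here is a rough sketch in Python. It is only my own illustration of the policy as I remember it, not anything from the review or from a real OS scheduler. The function name `place_threads`, the "share group" labels, and the 4 modules x 2 cores layout are all assumptions made up for the example.

```python
from collections import defaultdict

def place_threads(threads, num_modules=4, cores_per_module=2):
    """Toy module-aware placement (hypothetical, for illustration only).

    threads: list of (name, share_group) pairs; share_group is None for
    threads that do not share data with any other thread.
    Returns a list of modules, each a list of thread names.
    """
    modules = [[] for _ in range(num_modules)]

    # Split threads into data-sharing groups and independent threads.
    groups = defaultdict(list)
    loners = []
    for name, group in threads:
        if group is None:
            loners.append(name)
        else:
            groups[group].append(name)

    # Sharing threads are placed together, in chunks that fit one module.
    units = []
    for members in groups.values():
        for i in range(0, len(members), cores_per_module):
            units.append(members[i:i + cores_per_module])
    # Independent threads are placed one at a time.
    units.extend([name] for name in loners)

    for unit in units:
        # Only modules with enough free cores are candidates.
        candidates = [m for m in modules
                      if len(m) + len(unit) <= cores_per_module]
        if not candidates:
            raise ValueError("no module has enough free cores")
        # Least-loaded module first, so empty modules fill up
        # before any module has to be shared.
        min(candidates, key=len).extend(unit)
    return modules
```

With two threads that share data plus three independent ones, the sharing pair lands in one module and every other thread gets a module to itself. Only once all modules have a thread does an independent thread have to double up with another.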
In the end, I think with everything that is stacked up against it, it is amazing that it achieved the results that it did, and when they overcome those obstacles (whether it's soon in Zambezi or in future revisions) it can turn out to be quite a decent chip. Unfortunately, in the current state of things, the final performance is underwhelming, although some people blow that out of proportion.