Originally Posted by JCPUser
That is NOT the reason. It favors Intel because SPi uses an outdated, rather obsolete instruction set called x87. x87 has basically been replaced by SSE (and its variants) in modern applications. However, Intel still uses x87 on their CPUs whereas AMD just emulates it... this is the main reason why SPi is faster on Intel processors.
That's not correct. Scientific computing sometimes needs the additional accuracy of a 64-bit mantissa, and even in real-world code, such as the code used in our company, there can be situations where it is difficult to maintain enough accuracy because significant bits cancel out.
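To illustrate the cancellation point, here is a minimal sketch that simulates rounding to a 53-bit mantissa (double precision) and to a 64-bit mantissa (x87 extended precision) with exact fractions. It does not execute any x87 instructions. The helper names `round_to_bits` and `cancel` and the test value 2^-60 are my own choices for illustration, not taken from any real code.

```python
from fractions import Fraction

DOUBLE_BITS = 53  # mantissa width of IEEE 754 double precision
X87_BITS = 64     # mantissa width of x87 extended precision


def round_to_bits(x: Fraction, bits: int) -> Fraction:
    """Round x to the nearest value with `bits` significant binary digits."""
    if x == 0:
        return Fraction(0)
    sign = -1 if x < 0 else 1
    x = abs(x)
    # Rough estimate of the exponent, corrected by the loops below so that
    # 2**(bits-1) <= scaled < 2**bits.
    e = x.numerator.bit_length() - x.denominator.bit_length() - bits
    scaled = x / Fraction(2) ** e
    while scaled >= 2 ** bits:
        e += 1
        scaled /= 2
    while scaled < 2 ** (bits - 1):
        e -= 1
        scaled *= 2
    mantissa = round(scaled)  # round half to even, like the FPU default
    return sign * mantissa * Fraction(2) ** e


def cancel(bits: int) -> Fraction:
    """Compute (1 + 2**-60) - 1 with every result rounded to `bits` bits."""
    tiny = Fraction(1, 2 ** 60)
    s = round_to_bits(Fraction(1) + tiny, bits)  # the add rounds first
    return round_to_bits(s - 1, bits)            # then the leading bits cancel


print("53-bit mantissa:", cancel(DOUBLE_BITS))
print("64-bit mantissa:", cancel(X87_BITS))
```

With 53 bits the small term is already lost when the sum is rounded, so the subtraction returns zero. With 64 bits it survives. Real code rarely fails this cleanly, but the mechanism is the same.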
That is why x87 is executed in HW on Bulldozer. One half of the register file is able to store full x87 mapped registers, and at least the lower halves of the two 128-bit FMAC units execute x87 mul/add type ops. Emulation on narrower FP hardware would cause performance to drop to below 25%, at least.
Have a look at Hiroshige Goto's diagram:
x87 == EP (extended precision)
As someone at XS pointed out, SuperPi might suffer from poor use of caches and memory, and maybe also of the decode hardware, since it was compiled for some old arch (Pentium?) that had very different code optimization needs than newer archs.
For other possible explanations of the performance behaviour, have a look at my latest blog entry: http://citavia.blog.de/2011/08/28/bu...mmit-11338315/