Originally Posted by Marios145
The BD/PD module can execute 2x128bit FP instructions and also has 2x integer cores (2 cores sharing the FPU).
Zen core can execute 2x128bit FP instructions and also has an integer core (there's no sharing here).
That gives BD/PD a total of 8x128bit FPUs.
It gives Zen a total of 16x128bit FPUs.
Cinebench uses mostly SSE, which is 128bit (correct me if I'm wrong).
Now all other things equal, the 8C/8T Zen will theoretically have twice the FP performance of 4M/8C/8T BD and derivatives.
While Zen can, theoretically, execute four floating-point instructions concurrently, not all of its pipelines are equal, or independent, enough to say it can execute 2x128 instructions... though it's not terribly far off, to be fair. Scaling should be less than you might think, because ILP extraction is greatly improved, and that benefit diminishes when two threads execute at once on Zen.
When SMT is active, we know that the execution units are not segmented; they are competitively shared. That includes the FPU. A single thread's SSE instructions can already span the entire FPU, so adding a second thread will not extract much more performance (well, 20% or so is a fair guess).
Without SMT, having twice the FPU width will only add maybe 50% more performance: more in some cases, much, much less in others (bordering on zero). The rest of the CPU will determine how that translates into program performance.
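To put rough numbers on the above, here is a minimal Python sketch. It uses the per-module/per-core counting from the quoted post (2x128bit each) and plugs in my guesses of ~50% from the extra width and ~20% from SMT. The function name, the constants, and the choice to stack the two guesses multiplicatively are my own assumptions for illustration, not measurements.

```python
# Back-of-envelope only: the percentages are guesses from this post, not benchmarks.

def fp_pipes(units: int, pipes_per_unit: int = 2) -> int:
    """Total 128-bit FP pipes, counting 2x128bit per module (BD/PD) or per core (Zen)."""
    return units * pipes_per_unit

bd_pipes = fp_pipes(4)    # 4 modules  -> 8 x 128-bit
zen_pipes = fp_pipes(8)   # 8 cores    -> 16 x 128-bit
paper_ratio = zen_pipes / bd_pipes  # the "twice the FP performance" claim

WIDTH_GAIN_GUESS = 0.50   # assumed: ~50% from doubled FPU width, SMT off
SMT_GAIN_GUESS = 0.20     # assumed: ~20% more from a second thread per core

no_smt = 1.0 + WIDTH_GAIN_GUESS
# Assumption: the SMT gain stacks multiplicatively on top of the width gain.
with_smt = no_smt * (1.0 + SMT_GAIN_GUESS)

print(f"on paper:      {paper_ratio:.2f}x")
print(f"guess, no SMT: {no_smt:.2f}x")
print(f"guess, SMT on: {with_smt:.2f}x")
```

The point is only that the on-paper 2x and the realistic estimate are different numbers; swap in your own guesses for the two constants.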
However, each Zen FPU is superior to each Excavator FPU, so technically Zen will have more than double the theoretical floating-point capability, but it is hamstrung by ILP limitations and supporting infrastructure issues. SSE tasks may execute, in some cases, 100%+ faster, but by the time those results are usable, we have eaten up half of that advantage in other areas (such as waiting on an AGU or ALU).
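One way to picture how a 100% FP gain shrinks to about half is Amdahl's law: only the part of the runtime that is actually FP-bound gets faster. This is a sketch, not a model of Zen; the function name and the example fractions are hypothetical, picked purely to illustrate the effect.

```python
def overall_speedup(fp_fraction: float, fp_speedup: float) -> float:
    """Amdahl's law: speed up only the FP-bound fraction of the runtime."""
    if not 0.0 <= fp_fraction <= 1.0:
        raise ValueError("fp_fraction must be between 0 and 1")
    if fp_speedup <= 0.0:
        raise ValueError("fp_speedup must be positive")
    # The non-FP part (AGU/ALU waits, memory, etc.) runs at the old speed.
    return 1.0 / ((1.0 - fp_fraction) + fp_fraction / fp_speedup)

# Hypothetical FP-bound fractions, all with the FP work itself running 2x faster.
for fraction in (1.0, 2.0 / 3.0, 0.5):
    gain = overall_speedup(fraction, 2.0)
    print(f"FP-bound {fraction:.0%} of the time -> {gain:.2f}x overall")
```

If two thirds of the runtime is FP-bound, a 2x FP speedup works out to about 1.5x overall, which is the "half of the advantage eaten" case.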