Originally Posted by Yeroon
You are wrong about how each core clocks. Each core within each module runs at the rated clock speed. It's only different when a single module turbos separately from the base frequency, in which case that module (or modules) goes higher than the base frequency.
Second, each core within each module can do a 128-bit FP operation simultaneously; it's only 256-bit FP operations that require the module to work as a single unit. So an 8-core FX (8350) can do eight 128-bit FP ops AT ONCE, AT 4 GHz.
EDIT: Your idea of how the FX works is actually damaging when applied to LN2 clock records, IIRC, and most clock record attempts are done with whole modules.
Originally Posted by azanimefan
I'm pretty sure your whole post is wrong. No... I take that back. You have an almost correct grasp of how the core modules work... you're wrong because each module is 2 integer cores and 1 floating-point unit, which they share. So yes, with floating-point math they will function almost like a hyperthreaded Intel core... only nowhere near as well scheduled as Hyper-Threading. The thing is, floating-point math is mostly something used by GPUs... for CPUs this isn't a critical function.
You're also incorrect about no 2 cores being able to do 4 GHz at the same time. In Bulldozer, it was common to turn off the paired core when overclocking because of heat and stability issues (likely caused by the northbridge)... the turbo for Bulldozer was also variable, and usually only affected certain cores. Piledriver is different, and it's Piledriver we're talking about. Piledriver has no such issues with the clock scheduling of its core modules. The turbo affects the whole CPU at the same time, not certain cores. You'll notice that you're not even completely right about Bulldozer either: in Bulldozer it was the turbo that was variable. The base clock was the base clock. If the CPU was clocked to 4 GHz, the whole unit ran at 4 GHz.
I'm not disagreeing that the Bulldozer architecture is much slower than Sandy Bridge. Technically it's about 40% slower. Piledriver is around 25% slower... a little slower than Thuban and Nehalem... close enough that some overclocking can cover it up. I never expected Piledriver's performance in BF4 to equal an i5's... generally, when you normalize for clock speed and core count in a heavily CPU-optimized gaming title, you can expect Piledriver to bench about 25%-30% slower in most gaming benches than a Sandy Bridge i5 (it's worse in some titles because games like Skyrim use code not supported by Piledriver). But what we see in BF4 is far more significant than 30%... what we're seeing in BF4 is closer to what you'd expect to see in Skyrim. Now, we know BF4 isn't using code that Piledriver can't handle... so the only conclusion we can draw is that Jaguar is different enough from Piledriver to harm Piledriver's performance in BF4.
Look at this. This is after the latest patch.
(I excluded turbo frequency for both architectures.)
Here the FX 6350 at 3.9 GHz is about 21.9% faster in terms of clock speed than a 3.2 GHz Phenom II 1090T. Right?
OK, we all know that even a Piledriver core has lower IPC than a Thuban core. Right?
So let's assume, as an extreme case, that the FX has about 21.9% lower IPC than the Phenom II. (I intentionally took an even lower IPC than people usually assume.) So we start the speculation keeping in mind that a 3.9 GHz FX is roughly equal to a 3.2 GHz Phenom II in six-threaded workloads, because AMD calls both CPUs hexa-core CPUs. OK?
Phenom II X6 1090T at 3.2 GHz = 49 fps minimum
FX 6350 at 3.9 GHz = 34 fps minimum
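A quick sanity check of the percentages in play, using only the clock speeds and minimum fps figures quoted above. The helper names are mine, and the "break-even IPC deficit" is just the IPC gap at which clock x IPC comes out equal for both chips, which is a big simplification of real performance.

```python
# Figures quoted in the post; names are illustrative, not from any tool.
FX_CLOCK_GHZ = 3.9
PHENOM_CLOCK_GHZ = 3.2
FX_MIN_FPS = 34
PHENOM_MIN_FPS = 49


def percent_higher(a, b):
    """How much higher a is than b, as a percentage of b."""
    return (a / b - 1.0) * 100.0


def percent_lower(a, b):
    """How much lower a is than b, as a percentage of b."""
    return (1.0 - a / b) * 100.0


# Clock advantage of the FX over the Phenom II.
clock_advantage = percent_higher(FX_CLOCK_GHZ, PHENOM_CLOCK_GHZ)

# IPC deficit at which clock * IPC is identical for both chips
# (assumption: performance scales as clock * IPC, nothing else).
break_even_ipc_deficit = percent_lower(PHENOM_CLOCK_GHZ, FX_CLOCK_GHZ)

# The same fps gap expressed from both sides.
fx_fps_deficit = percent_lower(FX_MIN_FPS, PHENOM_MIN_FPS)
phenom_fps_lead = percent_higher(PHENOM_MIN_FPS, FX_MIN_FPS)

print(f"FX clock advantage:       {clock_advantage:.1f}%")
print(f"Break-even IPC deficit:   {break_even_ipc_deficit:.1f}%")
print(f"FX min fps below Phenom:  {fx_fps_deficit:.1f}%")
print(f"Phenom min fps above FX:  {phenom_fps_lead:.1f}%")
```

Note that the gap reads differently depending on the baseline: the FX's minimum fps is about 30.6% below the Phenom's, while the Phenom's is about 44.1% above the FX's. Likewise, a clock advantage of about 21.9% is cancelled by an IPC deficit of only about 17.9%, so assuming a deficit near 21.9% is if anything slightly pessimistic for the FX.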
You are saying each core within an FX module can do 128-bit FP concurrently (I agree with both of you) and that each core runs at the rated frequency (I don't agree).
[Condition 1]
The game is using 128-bit instructions (up to SSE4.2).
The FX at 3.9 GHz should definitely be equal to the Phenom II at 3.2 GHz. (Remember, we assumed the lower IPC cancels the clock advantage.)
Q1. Why is the FX delivering about 31% lower minimum FPS than the Phenom (34 vs. 49, i.e. the Phenom is about 44% ahead)?
[Condition 2]
The game is using 256-bit instructions (FMA/AVX).
Note: Phenom II doesn't natively support 256-bit instructions.
So theoretically the Phenom II would need two cycles for the same work (two 128-bit cycles per core).
The FX does support 256-bit instructions, but unfortunately it also executes them in two cycles (one 128-bit cycle at a time per core in a module), so there is no FPU improvement over the Phenom II.
So again the FX and Thuban come out equal in 256-bit operations.
Q2. Why is the FX still giving about 31% lower minimum FPS than the Phenom?
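To make the two conditions concrete, here is a minimal sketch of the simplified model used in this post: every core issues one 128-bit FP operation per cycle, and a 256-bit operation costs two such passes on either chip. The `Cpu` class and `vector_ops_per_ns` function are hypothetical names of mine, and the model deliberately ignores the shared FPU scheduling, front end, caches, and memory, so treat it as an illustration of the argument rather than a real throughput estimate.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    cores: int
    clock_ghz: float
    fp_width_bits: int = 128  # assumed FP width one core handles per cycle


def vector_ops_per_ns(cpu: Cpu, op_width_bits: int) -> float:
    """Peak vector ops per nanosecond under the post's simplified model."""
    # An op wider than the per-core FP width is split into several passes.
    passes = max(1, -(-op_width_bits // cpu.fp_width_bits))  # ceiling division
    return cpu.cores * cpu.clock_ghz / passes


phenom = Cpu("Phenom II X6 1090T", cores=6, clock_ghz=3.2)
fx = Cpu("FX 6350", cores=6, clock_ghz=3.9)

# Relative standing of the FX for 128-bit (condition 1) and 256-bit (condition 2) ops.
ratio_128 = vector_ops_per_ns(fx, 128) / vector_ops_per_ns(phenom, 128)
ratio_256 = vector_ops_per_ns(fx, 256) / vector_ops_per_ns(phenom, 256)

print(f"FX / Phenom at 128-bit: {ratio_128:.3f}")
print(f"FX / Phenom at 256-bit: {ratio_256:.3f}")
```

Under these assumptions both chips halve their throughput when going from 128-bit to 256-bit operations, so the ratio between them does not change. In this model, instruction width alone cannot explain the fps gap, which is the point of Q1 and Q2.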
The FX has much higher memory bandwidth thanks to its improved memory controller, so memory bandwidth should not be the issue here.
I respect you all; this has never been about picking on anyone. That is not my style. So please don't be rude to me. It's now your turn to try to convince me that each core is still concurrently running at rated speed.
Remember, I believe each module runs at rated speed.
Edited by sumitlian - 10/11/13 at 10:39pm