Originally Posted by MrJava
Just a list of the things that change when you do SMT within the INT core:
- decode shared by two threads
- integer RF shared by two threads
- scheduler shared by two threads
- data cache shared by two threads
- thread retire shared by two threads
It's a nightmare.
The simplest solution to not having enough cores ... (drum roll) .... is to add more cores (by adding more modules).
To understand the magnitude of the problem of a shared decoder, consider microcoded instructions. Division (which is microcoded) can take up to 72 cycles. When the decoder is issuing micro-ops for a microcoded instruction to one thread, it cannot decode instructions for the other thread until the microcode program has finished.
This means if one thread is doing a particularly bad divide, the other core is starving for up to 72 cycles (a very long time). Dedicated decode per thread makes the problem go away. Oh and btw, if you don't think this issue affects Hyperthreading, think again.
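To put rough numbers on that, here is a toy sketch of the starvation effect. It is not a model of any real AMD or Intel front end. It assumes a decoder that handles one instruction at a time and alternates round-robin between threads, and it treats a microcoded instruction as occupying the decoder for its whole cost. The function names and the instruction streams are made up for illustration; the 72 is just the worst-case divide figure quoted above.

```python
def simulate_shared(streams):
    """One decoder shared by all threads (toy model, round-robin).

    streams: per-thread lists of decode costs in cycles
             (1 = simple instruction, 72 = assumed worst-case microcoded divide).
    Returns (finish_cycle_per_thread, worst_wait_per_thread).
    """
    n = len(streams)
    pos = [0] * n            # next instruction index for each thread
    last_served = [0] * n    # cycle at which each thread last finished a decode
    worst_wait = [0] * n     # longest gap each thread spent waiting for the decoder
    finish = [0] * n
    cycle = 0
    turn = 0
    while any(pos[t] < len(streams[t]) for t in range(n)):
        # Pick the next thread (in round-robin order) that still has work.
        for k in range(n):
            t = (turn + k) % n
            if pos[t] < len(streams[t]):
                break
        worst_wait[t] = max(worst_wait[t], cycle - last_served[t])
        cycle += streams[t][pos[t]]   # decoder is busy for the whole instruction
        pos[t] += 1
        last_served[t] = cycle
        finish[t] = cycle
        turn = (t + 1) % n
    return finish, worst_wait


def simulate_dedicated(streams):
    """One decoder per thread: each thread only pays for its own instructions."""
    return [sum(s) for s in streams]


if __name__ == "__main__":
    thread_a = [1, 72, 1]   # hypothetical stream containing one bad divide
    thread_b = [1, 1, 1]    # hypothetical stream of simple instructions
    print(simulate_shared([thread_a, thread_b]))
    print(simulate_dedicated([thread_a, thread_b]))
```

In this toy setup the second thread finishes at cycle 77 with a worst wait of 72 cycles on the shared decoder, versus cycle 3 with its own decoder. Real front ends are wider and have queues that hide part of this, so treat the numbers as an illustration of the mechanism only.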
I'm thinking Intel may introduce dedicated decode for each thread for Hyperthreading in future cores.
Adding more full cores is not an option: they take die space, making the chip power hungry and more expensive to fab and sell. As it stands right now, AMD has no answer to the mobile i7, which is possibly their biggest retail problem (far bigger than any FX shortcoming), as it locks them out of the upper half of the laptop market. Still, they don't seem to be really pulling an Intel-style SMT here; there seems to be far more "hardware" involved. With SR they supposedly remove the most apparent bottlenecks (and they add their own version of a μop cache, which should work wonders in that department), thus making their module design behave mostly like a standard pair of cores while taking less die space. Now that they have established this, it makes sense to go back and see what they can double up again in order to boost their multithreading performance, this time investing less in die space while making it easier for Windows to understand how to deal with the extra threads (I hope).
Everything points to that leaked die shot being real; here is another hint:
Full 256-bit, not unified 128-bit units.
I am not saying this is the right way of doing things, just that it makes sense. It is a big bet, though; if they deliver a decent design, they will have an APU that is a bit behind Intel at ST but on par (if not better) at MT, this time while using similar die space.
Originally Posted by NaroonGTX
Vishera already got refreshed twice... First with the FX-4350 and 6350, then the FX-9000 series. All they could do now is Vishera with RCM (descended from Warsaw), but that would probably just mean people hitting 5 GHz clocks more easily, which isn't a massive perf boost or a big incentive for people who already have Vishera chips.
Vishera hasn't seen any refresh at all, just more SKUs. All FX Piledrivers are the same stepping.