Originally Posted by Seronx
Zen is perfect for replacing the 14h/16h (Bobcat/Jaguar) core families. It is, however, not perfect for replacing the 15h (Bulldozer) core families.
Automation hasn't been an issue at all. Agena/10h was bad because 65nm PDSOI was using an LP transistor implant to reduce costs. Anything else is because the architecture didn't meld well with the node. Core as a whole has higher computational and memory throughput than K8-derived architectures, which gives Core the advantage of running at lower clocks. It also had the advantage of being a Low Power design on High Performance nodes, while AMD was a High Performance design on Low Power nodes.
Mispredicts will always be an issue. Bulldozer-derived cores aren't any worse off than Zen or Sandy Bridge through Skylake. If a branch doesn't hit the L0 instruction cache [Loop/Micro-op Buffer], then it will hit the L1 instruction cache, which costs 15 cycles (Sandy Bridge) / 16 cycles (Bulldozer) / 18 cycles (Zen). The front-end in Bulldozer/Piledriver provided enough instructions. Four macro-ops equal 4 computational ops [Reg-ALU-Reg] and 4 load+store ops [Reg-Mem/Mem-Reg].
Bulldozer:
15 cycle mispredict (branch in-pipe/pick stage)
16 cycle mispredict (branch in L1 instruction cache)
20 cycle mispredict (branch in L2 unified cache)
Zen:
9-13 cycle mispredict (branch in L0 instruction cache)
17 cycle mispredict (branch in-pipe/pick stage)
18 cycle mispredict (branch in L1 instruction cache)
22 cycle mispredict (branch in L2 unified cache)
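To get a feel for how these figures combine, here is a rough Python sketch that weights each penalty by how often a mispredicted branch is resolved at that level. It assumes the first list is Bulldozer and the second is Zen, which matches the 16-cycle and 18-cycle L1 figures quoted earlier. The function name, the dictionary keys, and the mix fractions are all made up for illustration, not measured data, and the 9-13 cycle range is taken at its midpoint.

```python
# Mispredict penalties in cycles, copied from the two lists above.
# ASSUMPTION: first list = Bulldozer, second list = Zen (inferred from the
# 16-cycle and 18-cycle L1 instruction cache figures quoted in the post).
PENALTY = {
    "bulldozer": {"pipe": 15, "l1i": 16, "l2": 20},
    # The L0 figure is given as a 9-13 cycle range; use the midpoint.
    "zen": {"l0": (9 + 13) / 2, "pipe": 17, "l1i": 18, "l2": 22},
}


def expected_penalty(penalties, mix):
    """Average mispredict cost in cycles.

    penalties: level name -> penalty in cycles
    mix: level name -> fraction of mispredicts resolved at that level
         (hypothetical input; the fractions must sum to 1)
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("mix fractions must sum to 1")
    return sum(penalties[level] * share for level, share in mix.items())


if __name__ == "__main__":
    # Hypothetical mixes, chosen only to show the calculation.
    bd_mix = {"pipe": 0.5, "l1i": 0.45, "l2": 0.05}
    zen_mix = {"l0": 0.6, "pipe": 0.2, "l1i": 0.18, "l2": 0.02}
    print("Bulldozer:", expected_penalty(PENALTY["bulldozer"], bd_mix))
    print("Zen:", expected_penalty(PENALTY["zen"], zen_mix))
```

With a mix like this the average is dominated by the in-pipe and L1 cases, so the 20/22 cycle L2 case barely moves the result.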
Branches are very unlikely to hit the L2 cache (~32KB), while jumps will almost always go to memory (~32MB).
Fully Depleted Silicon on Insulator literally has a dynamic/adaptive voltage frequency scale that shoots up. The 22 nanometer Fully Depleted [X being Base Platform / +Ultra Low Power components / +Ultra High Performance components / +Radio Frequency and Analog components / +Ultra Low Leakage components] 1.0 physical design kit is expected to allow for production releases from mid-2017 to late 2017. The 22nm FDSOI node allows for vastly improved EDA tools compared to FinFET nodes, which means faster and denser designs with the same macros.
For the shrink to 22nm FDSOI, that particular 15h architecture gets all the Bulldozer architectural improvements from the 20 nanometer Low Power Mobility / 14nm Extreme Mobility versions. I have found these to be: full 256-bit vectors/loads/stores, Complex Arithmetic Logic Units instead of Simple Arithmetic Logic Units in the Address Generating Logic Units, and 2x 32 Byte loads and 1x 32 Byte store per L1 data cache, which is per core.
If 22nm FDX is used, then six-transistor (6T) SRAM with forward body bias (FBB) can replace eight-transistor (8T) SRAM. This means more advanced improvements can be applied to the cores, the floating point unit, and lastly the front-end. This would be allowed by the shift from 8T SRAM to 6T SRAM+FBB in the L1 instruction cache and L1 data caches. The overall added area shrink might even allow the bandwidth of the L2 unified cache to be doubled.
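As a back-of-the-envelope check on the 8T to 6T point, the sketch below counts only the transistors in the bit-cell array of a cache. The 16 KiB capacity is a placeholder, not a figure from the post, and the function name is made up. Tag, ECC, and peripheral logic are ignored, and transistor count is not the same thing as die area, so treat the result as a rough upper bound on the saving.

```python
def cell_array_transistors(capacity_bytes, transistors_per_cell):
    """Transistors in the data bit-cell array only (no tags, ECC, or periphery)."""
    bits = capacity_bytes * 8
    return bits * transistors_per_cell


# ASSUMPTION: 16 KiB is a placeholder capacity chosen for illustration.
capacity = 16 * 1024

t8 = cell_array_transistors(capacity, 8)  # 8T SRAM cells
t6 = cell_array_transistors(capacity, 6)  # 6T SRAM cells (FBB assumed to keep them usable)
saving = 1 - t6 / t8  # fraction of cell-array transistors removed

if __name__ == "__main__":
    print(f"8T array: {t8} transistors")
    print(f"6T array: {t6} transistors")
    print(f"Saving:   {saving:.0%}")
```

The ratio is independent of capacity: going from 8 to 6 transistors per cell removes a quarter of the cell-array transistors whatever the cache size.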