Originally Posted by Robenger @looncraz
LOL! I'll do my best
First up, the patent describes a simple, but probably extremely effective, way to reduce power draw, and to stop decoding the same instructions over and over, in loops small enough to fit in a cache of already-decoded instructions. When a loop is detected (which is a fairly simple affair), the next step is to make sure that every instruction needed to execute that loop is held in that cache. If it is, you can turn off the decoders and run the loop [almost] entirely out of that cache and the execution units.
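As a rough illustration of the idea, here is a toy model in Python. Everything in it is an assumption for illustration, not AMD's design. Each instruction occupies one address, a taken backward branch is treated as the loop signal, and the hypothetical `LOOP_BUFFER_CAPACITY` of 32 entries stands in for whatever the real structure holds. Once a loop body that fits has been decoded once, later iterations are replayed from the buffer and the decoders stay idle.

```python
from dataclasses import dataclass

# Hypothetical capacity (in instructions) of the buffer holding decoded loop ops.
LOOP_BUFFER_CAPACITY = 32


@dataclass
class FrontEndStats:
    decoded: int = 0      # instructions that went through the decoders
    from_buffer: int = 0  # instructions replayed from the loop buffer


def run(trace, capacity=LOOP_BUFFER_CAPACITY):
    """Replay a fetch trace of instruction addresses (one address per instruction).

    A jump to an address <= the current one is treated as the end of a loop
    iteration. If the loop body fits in the buffer, later iterations are served
    from the buffer, i.e. the decoders could be gated off.
    """
    stats = FrontEndStats()
    buffered = None  # (start, end) address range of the captured loop, if any
    for i, pc in enumerate(trace):
        if buffered and buffered[0] <= pc <= buffered[1]:
            stats.from_buffer += 1
        else:
            buffered = None  # left the loop (or never in one): decode normally
            stats.decoded += 1
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        # Backward branch detected and nothing captured yet: try to capture the loop.
        if nxt is not None and nxt <= pc and buffered is None:
            if pc - nxt + 1 <= capacity:
                buffered = (nxt, pc)
    return stats
```

For example, `run(list(range(4)) * 10 + [4, 5])` decodes 6 instructions (the first iteration plus the two after the loop) and serves the other 36 from the buffer; with `capacity=2` the loop does not fit and all 42 go through the decoders.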
The awesome thing about that is that it frees up more of the power budget for execution (though probably *not* more than 10 watts at full clocks). In addition, the cache means you aren't stalling for instructions inside loops, whereas the construction-equipment cores (Bulldozer through Excavator), and even Intel cores, may stall and leave the execution units idle, which hurts performance. It's a really awesome, but simple, idea.
The L0 ITLB is probably there to hold the address translations for the code in those loops, so instruction fetch rarely has to consult the larger, slower L1 ITLB.
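A toy model of why a tiny L0 ITLB helps tight loops: a small, fully associative TLB with LRU replacement sits in front of a larger one, and a plain dictionary stands in for the page walk. The entry counts (8 and 64), the 4 KiB page size, and the LRU policy are assumptions for illustration, not Zen's actual parameters.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumed 4 KiB pages


class TLB:
    """Fully associative TLB with LRU replacement (toy model)."""

    def __init__(self, entries):
        self.entries = entries
        self.map = OrderedDict()  # virtual page number -> physical page number

    def lookup(self, vpn):
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            return self.map[vpn]
        return None

    def fill(self, vpn, ppn):
        self.map[vpn] = ppn
        self.map.move_to_end(vpn)
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry


class ITLBHierarchy:
    """Hypothetical L0 ITLB in front of an L1 ITLB, falling back to a page walk."""

    def __init__(self, page_table, l0_entries=8, l1_entries=64):
        self.page_table = page_table  # dict vpn -> ppn, stands in for the page walk
        self.l0 = TLB(l0_entries)
        self.l1 = TLB(l1_entries)
        self.counts = {"l0": 0, "l1": 0, "walk": 0}

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        ppn = self.l0.lookup(vpn)
        if ppn is not None:
            self.counts["l0"] += 1
        else:
            ppn = self.l1.lookup(vpn)
            if ppn is not None:
                self.counts["l1"] += 1
            else:
                self.counts["walk"] += 1
                ppn = self.page_table[vpn]
                self.l1.fill(vpn, ppn)
            self.l0.fill(vpn, ppn)  # keep the hot translation close to fetch
        return (ppn << PAGE_SHIFT) | offset
```

Fetching `[0x10, 0x20, 0x30] * 5` from a single page costs one walk; the other 14 translations hit in the L0 and never touch the L1.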
I think the checkpoint queue parity patent was described quite well; it's purely a performance feature. Intel actually does something similar to bypass pipeline stages. The best part is that this is the first hint we have that AMD may have kept a longer pipeline, so that it can sustain high clock speeds alongside high IPC (as Intel has managed).
What's most important about this patent, though, is that it gives us a good idea of which 'leaked' information and slides are accurate. It also tells us that AMD is aiming very high, and that the 40% IPC improvement AMD has claimed may be a safer claim than it seemed.
For me, the most interesting aspect of all of this is just how much effort AMD has put into minimizing memory accesses. Everywhere we look, AMD has done something: multiple concurrent page walks (looking up an address in the operating system's page tables), no fewer than four levels of cache, a dedicated path to the AGUs, and multiple translation lookaside buffers (TLBs). A lot of this is especially good for SMT. My estimate of 15% SMT scaling may have just been shattered, but there are still many things that can inhibit SMT performance. If AMD gets close to Intel's Hyper-Threading (30-40% scaling), then a more serious market shakeup is coming.
Edited by looncraz - 2/29/16 at 2:25pm
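To show why concurrent page walks matter: with x86-64 four-level paging and 4 KiB pages, a TLB miss can require up to four dependent memory reads, one per table level (before any paging-structure caching helps). The sketch below splits a 48-bit virtual address into those four 9-bit indices plus the 12-bit page offset, then follows them through a hypothetical dictionary-based set of tables; the table representation and the `page_walk` helper are made up for illustration.

```python
PAGE_OFFSET_BITS = 12  # 4 KiB pages
INDEX_BITS = 9         # 512 entries per table level
LEVELS = ("PML4", "PDPT", "PD", "PT")


def split_virtual_address(vaddr):
    """Split a 48-bit x86-64 virtual address into per-level table indices and the page offset."""
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = {}
    shift = PAGE_OFFSET_BITS + INDEX_BITS * (len(LEVELS) - 1)  # 39: PML4 uses bits 47:39
    for name in LEVELS:
        indices[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
        shift -= INDEX_BITS
    return indices, offset


def page_walk(vaddr, tables, root):
    """Follow the four levels; tables maps a table id to {index: next table id or frame number}."""
    indices, offset = split_virtual_address(vaddr)
    node = root
    for name in LEVELS:
        node = tables[node][indices[name]]  # one dependent memory read per level
    return (node << PAGE_OFFSET_BITS) | offset
```

Because each level's read depends on the previous one, a single walk cannot be sped up much; handling several walks at once is what keeps multiple misses (for example, from two SMT threads) from queuing behind each other.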