That is far from equivalent to Mantle; it's adding ARM-based hardware to the GPU to offload work from the CPU. It looks like the goal is to move any GPU work that needs to happen after the CPU is done closer to the GPU, reducing latency. In theory, this could work alongside an API like Mantle or DirectX.
However, programming for multi-threading is already a PITA... This would take it to a whole new level. Furthermore, they would need to compile the code that runs on those ARM cores separately from the code that runs on the x86 chip. Plus, how are they going to share an L3 cache when the way data is laid out in the cache is architecture-dependent? It sounds like the x86 CPU is still going to have to do a bunch of work to get the relevant information into a format the GPU's ARM chips can use.
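On the data-layout point, here's a minimal Python sketch of why the x86 side can't just hand over raw bytes: something as basic as byte order changes how the same value sits in memory, so the CPU would have to marshal data into an agreed format first. (The big-endian consumer here is purely hypothetical, just to illustrate the mismatch.)

```python
import struct

# The same 32-bit value, packed two ways. A little-endian x86 core and a
# hypothetical big-endian consumer would disagree on what these raw bytes mean,
# so raw cache contents can't simply be shared -- a wire format must be agreed on.
value = 0x12345678
little = struct.pack("<I", value)   # x86-style little-endian byte layout
big    = struct.pack(">I", value)   # big-endian byte layout

print(little.hex())  # 78563412
print(big.hex())     # 12345678
```

(In practice ARM cores usually run little-endian too, but the same marshalling problem shows up with struct padding, pointer sizes, and cache-line formats.)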
Basically, I'm not sure I fully believe this is actually real... And if it is, how much work is actually going to be offloaded when all is said and done? I can't imagine those little ARM cores are going to be all that powerful...
Also, is it really more efficient to pass instructions to the ARM chips over the PCIe bus than it is to pass the results? My guess is no... There would be a ton of latency if the ARM chips had to go back and grab data from the L3 cache over the PCIe bus, and what happens when there is a cache miss? How many clock cycles would that cost?
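To put rough numbers on that (these are ballpark assumptions, not measurements): an L3 hit is typically a few tens of cycles, while a PCIe round trip is on the order of a microsecond, which at an assumed 3 GHz clock works out to thousands of cycles. Quick sketch:

```python
# Back-of-envelope: cycles spent waiting on a PCIe round trip vs. a local L3 hit.
# All numbers below are rough assumptions for illustration, not measurements.
cpu_ghz = 3.0                      # assumed CPU clock
ns_per_cycle = 1.0 / cpu_ghz       # ~0.33 ns per cycle at 3 GHz

l3_hit_cycles = 40                 # typical-ish L3 hit latency
pcie_round_trip_ns = 1000          # ~1 us PCIe round trip (ballpark)

pcie_cycles = pcie_round_trip_ns / ns_per_cycle   # convert ns to cycles
print(f"L3 hit: {l3_hit_cycles} cycles")
print(f"PCIe round trip: {pcie_cycles:.0f} cycles")          # 3000
print(f"Penalty: ~{pcie_cycles / l3_hit_cycles:.0f}x an L3 hit")  # ~75x
```

Even with generous assumptions, every cross-bus cache fetch would cost the ARM cores the equivalent of dozens of local L3 hits, which is why I doubt this design makes sense unless they keep the working set local.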
...This is very relevant to a CS class I'm taking; I'm going to show this to my professor and ask her what she thinks. She does high-performance computing and works with Nvidia's Tesla products, so she might have some insight.

Edited by SectorNine50 - 11/14/13 at 10:44am