Originally Posted by ku4eto
Very interesting stuff. Where are those "patch notes" coming from? If it's only using 2x128, that wouldn't be too good :/
2x128 on this design will be just as effective as 2x256 on Intel's design for all non-AVX instructions, since those never operate on more than 128 bits at a time.
Zen can also do 4x64 instructions, and certain operations should be able to do 8x32 (32-bit being the near-universal floating point size).
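To make the arithmetic behind those figures explicit, here is a rough sketch. It assumes 2x128, 4x64 and 8x32 are all just different ways of slicing the same 256 bits of total vector width, which is my reading and not something confirmed by AMD. The `lanes` helper is a made-up name for illustration only.

```python
def lanes(total_bits: int, element_bits: int) -> int:
    """Return how many elements of element_bits fit into total_bits."""
    if element_bits <= 0 or total_bits % element_bits != 0:
        raise ValueError("element size must evenly divide the total width")
    return total_bits // element_bits


# Assumption: two 128-bit units give 256 bits of vector width in total.
TOTAL_BITS = 2 * 128

for element_bits in (128, 64, 32):
    # Prints the "NxM" notation used in the post for each element size.
    print(f"{lanes(TOTAL_BITS, element_bits)}x{element_bits}")
```

Under that assumption the loop prints 2x128, 4x64 and 8x32, i.e. the same total width split into progressively narrower elements.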
I wouldn't be surprised to learn that AMD didn't prioritize AVX performance, with their strong desire to lean on GPU compute (for obvious reasons).
I think they're moving towards being able to directly move certain floating point vector math to the GPU without any special coding tools or compilers needed. All-on-CPU. In theory, it wouldn't be that difficult, but it would absolutely introduce latency. To hide that, they would need to fetch a significant number of instructions ahead of time and have the GPU work on all of the likely branches at once. Then, when the logic units know which results are needed, they simply pull in the finished results. The end result would be an apparent near-zero latency for certain floating point operations.
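The "work on all of the likely branches, then pull in the one you need" idea can be shown with a toy example. This is purely a conceptual sketch in plain Python, not a description of how any AMD hardware or driver works. The function name `eager_select` and the two branch functions are hypothetical.

```python
def eager_select(xs, condition, if_true, if_false):
    """Evaluate both sides of a branch for every input, then pick per element."""
    # Stand-in for the "GPU" step: compute every likely branch up front,
    # before it is known which result will be needed.
    true_results = [if_true(x) for x in xs]
    false_results = [if_false(x) for x in xs]
    # Stand-in for the "logic units" step: once the condition is resolved,
    # keep the finished result that matches and discard the other.
    return [t if condition(x) else f
            for x, t, f in zip(xs, true_results, false_results)]


values = [1.0, -2.0, 3.0, -4.0]
result = eager_select(
    values,
    condition=lambda x: x >= 0,   # the branch being decided
    if_true=lambda x: x * 2.0,    # work done if the branch is taken
    if_false=lambda x: x * x,     # work done if it is not
)
print(result)  # [2.0, 4.0, 6.0, 16.0]
```

The trade-off is the obvious one: half of the computed results get thrown away, so it only pays off if the spare throughput is there and the wait for the branch decision is the real bottleneck.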