Originally Posted by sumitlian
I think I had figured that out wrong. Is it due to the 16B store per cycle? Because everything other than this "16B store" technically allows the AVX parts to execute in a single cycle. BD had two cores, each with a 128b FPU, and each core had to do one 128b operation per cycle, so combining them for one 256-bit operation obviously took 2 clocks. But Zen is not that old module architecture. Why did AMD not go for a full 256-bit-per-cycle configuration even this time? Even 5 years after Intel introduced practical AVX with SB? This doesn't make sense.
Okay, I don't know whether you've noticed this by now, or maybe I am wrong, but I seriously think Intel is somehow barring AMD from using the same 4 x 64b AVX or 8 x 32b float/INT AVX2 configuration for the FPU. Maybe I'm getting carried away here, but anyway, I seriously think AMD has completely lost control of the FP part of x86-64 in the market since Intel introduced AVX. What do you think about this?
Zen's need for two cycles for 256-bit instructions seems to simply be a cost/benefit decision on AMD's part. They have two 128-bit units in the FPU, but don't have a way to combine them yet. They probably decided that doing so would cost too many transistors or draw too much power for the performance it would provide. 256-bit AVX instructions are still pretty niche.
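To make the two-halves idea concrete, here is a rough toy model in Python. It is only a sketch of the concept, not how the hardware is actually wired: the function names, the 8 x 32-bit layout, and the "one micro-op per 128-bit half" counting are all my own assumptions for illustration.

```python
# Toy model (assumption, not real hardware behavior): a 256-bit vector add
# is handled as two independent 128-bit halves, one micro-op per half.

LANE_BITS = 128   # width of one FPU unit in this model
ELEM_BITS = 32    # single-precision float elements


def split_into_lanes(vec, lane_bits=LANE_BITS, elem_bits=ELEM_BITS):
    """Split a vector (list of elements) into chunks that fit one lane."""
    per_lane = lane_bits // elem_bits
    return [vec[i:i + per_lane] for i in range(0, len(vec), per_lane)]


def add_256_as_128_halves(a, b):
    """Add two 8-element vectors half by half.

    Returns the result and the number of 128-bit micro-ops the model used.
    """
    result = []
    micro_ops = 0
    for half_a, half_b in zip(split_into_lanes(a), split_into_lanes(b)):
        result.extend(x + y for x, y in zip(half_a, half_b))
        micro_ops += 1  # one 128-bit operation per half
    return result, micro_ops


if __name__ == "__main__":
    total, uops = add_256_as_128_halves(list(range(8)), [10] * 8)
    print(total, uops)
```

In this model an 8 x 32-bit add always costs two 128-bit operations, which is the trade-off I'm describing: the result is the same as a native 256-bit add, it just takes twice as many passes through a 128-bit unit.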
I highly doubt Intel is blocking anything, and I'm pretty sure AVX/AVX2 and its future iterations are covered by the cross-licensing agreement that AMD and Intel have had forever. Excavator and Zen both support AVX2; they just don't have as robust an implementation as Intel yet.