Originally Posted by ILoveHighDPI
EDIT: Commenting on INT32 cores is really above my head, however.
After a bit of reading it does seem that INT32 is generally useful, I assume it’s part of being a GPGPU architecture, but the quantity of INT32 cores may be exaggerated in Turing to help enable the Tensor Cores, or as a vestigial component from the Workstation design.
Was this meant in response to me?
If so, INT32 cores are regular 32-bit integer units. They are used in GPGPU, but also for graphics. Basically, simple calculations like 2 + 2 = 4 are integer operations, as are comparisons (which number is bigger), bit shifts etc. Some practical use cases would be pixel color manipulations, image processing filters and dataflow control (conditional execution). FP32 cores are floating point units and are used when you need data to be accurate to several decimal places or simply need to represent very small and very large numbers (game physics, camera rotations and lighting/shading fall in this category, i.e. most of the heavy lifting GPUs do).
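To make the pixel color case concrete, here is a minimal sketch in Python of the kind of work that only needs integer operations (shifts, masks, adds and comparisons). The function names and the packed 0xRRGGBB layout are my own assumptions for illustration, not anything specific to Turing.

```python
# Sketch only: the 0xRRGGBB layout and these helper names are assumptions
# chosen for illustration.

def unpack_rgb(pixel):
    """Split a packed 0xRRGGBB integer into its three 8-bit channels."""
    r = (pixel >> 16) & 0xFF  # bit shift + mask
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return r, g, b

def pack_rgb(r, g, b):
    """Combine three 8-bit channels back into one integer."""
    return (r << 16) | (g << 8) | b

def brighten(pixel, amount):
    """Add `amount` to each channel, clamping at 255 (integer add + compare)."""
    r, g, b = unpack_rgb(pixel)
    r = min(r + amount, 255)
    g = min(g + amount, 255)
    b = min(b + amount, 255)
    return pack_rgb(r, g, b)

print(unpack_rgb(0x336699))          # (51, 102, 153)
print(hex(brighten(0x336699, 120)))  # 0xabdeff
```

Nothing in there needs a floating point unit, which is the point: every step is a shift, a mask, an add or a compare.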
Integer arithmetic is much simpler, since you can easily and efficiently perform operations directly on the individual bits. Floating point arithmetic is more complicated because, while the number representation is efficient for covering a great range of numbers, it uses an encoding scheme (sign, exponent and mantissa fields) that needs to be accounted for in each calculation. That is why floating point units are much larger, slower and more power hungry than integer units (especially the double precision floating point units integrated in Volta and Radeon VII).
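If you want to see the encoding I mean, here is a small Python sketch that pulls apart the three fields of a 32-bit float (IEEE 754 single precision, which is what FP32 refers to). The helper name fp32_fields is made up for this example.

```python
import struct

def fp32_fields(x):
    """Return (sign, exponent, mantissa) of x stored as a 32-bit float.

    Layout: 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits.
    """
    # Reinterpret the 4 bytes of the float as an unsigned 32-bit integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa

print(fp32_fields(1.0))    # (0, 127, 0)
print(fp32_fields(-2.5))   # (1, 128, 2097152)
print(bin(2 + 2))          # 0b100, an integer is just its bits
```

An integer adder can work on the bits as they are. A floating point adder has to compare the exponents, shift one mantissa to line up with the other, add, then renormalize and round, and all of that takes extra hardware.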
Tensor cores are custom hardware that take in a batch of small (half precision) floating point operands (48 values arranged in three 4x4 grids) and perform a matrix multiply-add on them. It's slightly more complicated than that: technically they multiply two FP16 4x4 grids and add the product to an FP32 4x4 grid. Each one would be larger than an individual FP32 or INT32 unit, but there are not a lot of them integrated on chip. Most of the work is still done on the FP32 units.
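As a rough software model of that operation (D = A x B + C on 4x4 grids), here is a sketch in plain Python. It rounds A and B to half precision and keeps the running sum in single precision. The names are mine, and the order in which real hardware rounds intermediate results is an assumption here, so treat it as an illustration of the arithmetic rather than a description of the actual circuit.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest half precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_4x4(a, b, c):
    """Model of D = A x B + C with FP16 inputs A, B and an FP32 accumulator C.

    Assumption: the sum is rounded to FP32 after every add; real hardware
    may round differently.
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(4):
        row = []
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
c = [[0.5] * 4 for _ in range(4)]

d = mma_4x4(identity, b, c)
print(d[0])  # [1.5, 2.5, 3.5, 4.5]
```

That is 16 + 16 + 16 = 48 input values and 64 multiply-adds for one 4x4 result, which is why one of these units is so much bigger than a single FP32 lane.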
I do think Tensor cores and RT cores added significantly to the cost of the Turing architecture, but it's not possible to say how much of the die area went to which component (cache vs registers vs RT vs Tensor vs FP32 vs INT32 etc.). What you can see is that TU102 is massive, so there certainly is a significant increase in hardware, but it's difficult to give an exact breakdown without specific data. These features definitely added to R&D costs though.