Overclock.net - An Overclocking Community - View Single Post - [Techpowerup] NVIDIA DLSS and its Surprising Resolution Limitations

Post #77 - 02-20-2019, 04:23 AM - Thread Starter
New to Overclock.net
ILoveHighDPI
Join Date: Oct 2011
Posts: 3,284
Rep: 133 (Unique: 84)
Quote: Originally Posted by TheBlademaster01
Was this meant in response to me?

If so, INT32 units are regular 32-bit integer units. They are used in GPGPU, but also for graphics. Basically, simple calculations like 2 + 2 = 4 are integer operations, as are comparisons (which number is bigger), bit shifts, etc. Some practical use cases would be pixel color manipulations, image processing filters and dataflow control (conditional execution). FP32 units are floating point units and are used when you need data to be accurate to several decimal places or simply need to represent very small and very large numbers (game physics, camera rotations and lighting/shading fall in this category, i.e. most of the heavy lifting GPUs do).
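To make the integer side concrete, here is a minimal sketch in Python (purely for illustration, this is not GPU code). The function names and the RGBA byte order are assumptions made up for this example. It shows the kind of work described above: shifts, masks and a comparison on a 32-bit pixel value.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit integer (assumed RGBA order)."""
    return (r << 24) | (g << 16) | (b << 8) | a

def unpack_rgba(pixel):
    """Recover the four 8-bit channels using shifts and masks."""
    return ((pixel >> 24) & 0xFF,
            (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF,
            pixel & 0xFF)

def halve_brightness(pixel):
    """Darken a pixel by shifting each color channel right by one bit."""
    r, g, b, a = unpack_rgba(pixel)
    return pack_rgba(r >> 1, g >> 1, b >> 1, a)

def brighter_red(p, q):
    """Conditional selection: return whichever pixel has the larger red channel."""
    return p if unpack_rgba(p)[0] >= unpack_rgba(q)[0] else q
```

For example, `pack_rgba(200, 100, 50, 255)` gives `0xC86432FF`, and `halve_brightness` on that value leaves alpha alone while halving the three color channels. Nothing here needs fractional values, which is why it maps onto integer units.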

Integer arithmetic is much simpler, since you can easily/efficiently perform operations on each individual bit. Floating point arithmetic is more complicated because, while the number representation scheme is efficient for representing a great range of numbers, it uses a certain encoding scheme that needs to be accounted for in each calculation. That is why floating point units are much larger, slower and more power hungry than integer units (especially the double precision floating point units integrated in Volta and Radeon VII).
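The encoding in question is the IEEE 754 single precision layout: 1 sign bit, 8 exponent bits and 23 mantissa bits. As a rough sketch (Python, with helper names invented for this example), the following pulls an FP32 value apart and puts it back together. A hardware adder or multiplier has to do the equivalent unpacking, exponent alignment and renormalization on every operation, which an integer unit never needs.

```python
import struct

def fp32_fields(x):
    """Split a value, rounded to FP32, into (sign, exponent, mantissa) bit fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, with an implicit leading 1
    return sign, exponent, mantissa

def fp32_value(sign, exponent, mantissa):
    """Rebuild a normal FP32 value (not zero, subnormal, infinity or NaN)."""
    return (-1.0) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)
```

For instance, `fp32_fields(1.0)` returns `(0, 127, 0)`: a positive sign, an exponent of 0 after removing the bias, and an all-zero mantissa.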

Tensor cores are custom hardware that take in a batch of 48 small (half precision) floating point operands, arranged in three 4x4 grids, and perform matrix multiplications on them. It's slightly more complicated than that (technically they multiply two FP16 4x4 grids and add the result to an FP32 4x4 grid). They would be larger than both FP32 and INT32 units in size, but there are not a lot of them integrated on chip. Most of the work is still done on the FP32 units.
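In other words, the operation is D = A x B + C on 4x4 grids, with A and B in FP16 and C in FP32. Below is a small Python emulation of the numerics only, as a sketch. The function names are hypothetical, and the step-by-step FP32 rounding of the running sum is an assumption made for illustration, not a description of how the hardware actually accumulates.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest half precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest single precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_4x4(a, b, c):
    """Return D = A x B + C for 4x4 matrices given as lists of lists.

    A and B are rounded to FP16 first; C and the running sum are kept in FP32.
    """
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(4):
        row = []
        for j in range(4):
            acc = to_fp32(c[i][j])
            for k in range(4):
                # Assumed behavior: round to FP32 after every accumulation step.
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d
```

The three 4x4 inputs hold 3 x 16 = 48 operands, which is where the batch size of 48 comes from. Each call performs 64 multiplies and 64 adds, all of them in floating point.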


I do think Tensor cores and RT cores added significantly to the bill of the Turing architecture, but it's not possible to say how much of the die area went to which component (cache vs registers vs RT vs Tensor vs FP32 vs INT32 etc.). What you can see is that TU102 is massive, so there certainly is a significant increase in hardware, but it's difficult to give an exact breakdown without specific data. These features definitely added to R&D costs though.
Amazing post, thanks a bunch.

I think we’ll “mostly” have our answer about the die cost of RTX in a few days when the 1660 Ti launches.