Originally Posted by Hydroplane
A new architecture would be great to see. GCN was great in 2012 but really ran out of steam by like 2015. Hopefully a chiplet design will be coming for the GPUs. Should not be difficult considering GPUs are already massively parallel. This would increase yield (especially vs. the massive die Titan RTX / 2080 Ti) and allow for chip standardization across the lineup similar to Ryzen. Better economies of scale. Chiplets would also help spread out the heat from the small 7nm dies across a greater area.
I believe the main issue is bandwidth. Right now there are no real numbers for on-die bandwidth because nobody needs to publish them, but die-to-VRAM we already get 400+ gigabytes per second, and the GPU needs all of it. Meanwhile the absolute fastest, most exotic die-to-die interconnects, using every available lane, only reach about 200GB/s, and with a normal lane count per die they're typically closer to 50GB/s.
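Just to put rough numbers on that gap, here's a quick Python sketch using nothing but the ballpark figures above (400GB/s VRAM, 200GB/s best-case link, 50GB/s typical link):

# Ballpark comparison of what a die already consumes vs. what a die-to-die link can supply.
vram_bw = 400          # GB/s, die-to-VRAM bandwidth a big GPU already uses (estimate above)
best_link = 200        # GB/s, fastest current interconnect using every available lane
typical_link = 50      # GB/s, more typical per-die link bandwidth

print(f"Best-case link shortfall: {vram_bw / best_link:.0f}x")    # ~2x short
print(f"Typical link shortfall:   {vram_bw / typical_link:.0f}x") # ~8x short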
If they use chiplets to build a big GPU out of 2-4 "core dies" with 2-3k shader cores each, every core die would need an interconnect block eating up roughly half the die just to reach the necessary 400-500GB/s per die.

Then they would need a "front end die" that is basically just the scheduler: a scheduler, some small cache, and a Polaris-sized area of nothing but interconnects to the core dies. The energy cost on it would be massive, and so would the die cost. Really, 7nm is the first node where such a design is even conceivable, and realistically it's more like 3-5nm before it becomes truly feasible.

Then they have to work out whether each core die gets its own render back-end, and how to keep them in sync on the displayed image, or whether to add a "back end die", which means routing interconnects to it as well and doubling the interconnect count on every core die again. Depending on what they decide, you need memory controllers on the core dies of course, but you may also need additional memory chips and controllers on the back end die. It's quite complicated to do, and a huge chunk of the silicon would be wasted on nothing but the interconnects that make it possible. They might even have to go to dual-PCB cards, where the back board is just the power input and VRMs with a smallish heatsink, and the front board carries the display outputs, all the dies, and the memory chips, because a design like this simply needs more board space. The cross-die traffic adds up fast, as the quick tally below shows.
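Here's that tally in Python. The die counts and ~450GB/s per-die figure are the same rough numbers as above; the doubling for a back end die is just my reading of that scenario, not anything official:

# Total bandwidth a front-end/scheduler die would have to switch, per the rough figures above.
def front_end_bw(core_dies, per_die_gbps=450, separate_back_end=False):
    total = core_dies * per_die_gbps          # one full-rate link per core die
    if separate_back_end:
        total *= 2                            # plus a second set of links toward a back end die
    return total

for n in (2, 3, 4):
    print(f"{n} core dies: {front_end_bw(n)} GB/s front-end only, "
          f"{front_end_bw(n, separate_back_end=True)} GB/s with a back end die")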
As it stands we just don't have the necessary bandwidth in an interconnect. Nvidia is closest: NVLink2 has more bandwidth than PCI-E 5.0, but they would still need a full 16-lane NVLink2 block in each core die and a 32-64 lane setup on the front end. The hardware exists in the form of the NVSwitch they just introduced, but the cost of these parts is huge. If they actually integrated it into a gaming GPU, I would bet the price would be $2000-2500 and they still wouldn't make much profit on the sale.

And just for reference: the NVSwitch chip, which is nothing but a front end scheduler (to make multiple GPUs look like one to the OS) plus the interconnects, has a TDP of 100W BY ITSELF. It has enough bandwidth to feed two big dies' worth of internal chips at 450GB/s each, and for gaming use we could maybe get away with 300GB/s each to 3 dies. Shrink it to 7nm and we could get the TDP down to *maybe* 75W, but that's just the front end power draw. We may also need to double that for a back end chip, plus the link power on every core die. So think about that power draw just from interconnects. Ya...
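A rough power tally under those assumptions, in Python. The 75W figures come from the 7nm shrink estimate above; the per-core-die link power is purely my own guess, not from any published spec:

# Interconnect-only power estimate. Everything here is a ballpark, and the
# per-core-die link power is a pure guess, not from any datasheet.
front_end_switch_w = 75     # NVSwitch-class fabric shrunk to 7nm (optimistic estimate above)
back_end_switch_w = 75      # same again if a separate back end die needs its own fabric
per_die_link_w = 15         # assumed SerDes/PHY power per core die (my own guess)
core_dies = 3

total_w = front_end_switch_w + back_end_switch_w + core_dies * per_die_link_w
print(f"Interconnect power alone: ~{total_w}W")   # ~195W before a single shader does any work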