GM107 would likely be reused for the GTX 850 and GT 840 / GT 830, unless there's a GM207 or GM208 refresh.
GM206 (<150W, 6-pin PCI-e power) - possible SKU = GTX 850 Ti / GTX 860
^ $150-200 MSRP price points, 2-way SLI
2 GPCs with 192-bit memory bus w/ fast memory = GTX 660 Ti performance at a minimum; if power use scales nicely it'll be <120W, maybe even <110W (~26-29 GFLOPS/W)
* 3.3 TFLOPS FP32, ~104 GFLOPS FP64, 104 GTexels/s, 31.2 GPixels/s at ~1300 MHz
* 1280 CUDA cores, 80 TMUs, 24 ROPs
* Memory bandwidth: 192-bit @ 6 GHz eff = 144 GB/s; @ 7 GHz eff = 168 GB/s
* If 2GB VRAM is used there will be interleaving issues (see http://www.anandtech.com/show/6159/the-geforce-gtx-660-ti-review/2), but there are no such problems with 1.5GB or 3GB
* 2MB L2 cache
* Roughly just under R9 280 specs (933 MHz) when clocked at 1300 MHz, while consuming about half the power
* Would be roughly 2.8 TFLOPS, 88 GTexels/s, 26.4 GPixels/s at 1100 MHz (GTX 750 Ti & GTX 750 boost to 1085 MHz)
* 10 SMM = 80 TMUs; GTX 660 had 80 TMUs (the rationale for this)
Probable block diagram, GM206 192-bit
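The 2GB-on-192-bit interleaving issue can be modelled roughly as below. This is a sketch, assuming three 64-bit controllers at 6 GHz effective (48 GB/s each) and the uneven 512 MB / 512 MB / 1024 MB split that the AnandTech GTX 660 Ti review describes; `interleaved_regions` is a hypothetical helper, and real drivers may place memory differently.

```python
# One 64-bit controller at 6 GHz effective = 8 bytes x 6 GT/s = 48 GB/s
PER_CONTROLLER_GBS = 64 / 8 * 6.0


def interleaved_regions(per_controller_mb, per_controller_gbs):
    """Return (region_mb, region_gbs) pairs; each region is striped over
    every controller that still has free capacity at that point."""
    caps = sorted(per_controller_mb)
    regions = []
    used = 0  # MB already consumed on each controller still in play
    for i, cap in enumerate(caps):
        layer = cap - used        # extra MB available on the remaining controllers
        active = len(caps) - i    # controllers that still have free capacity
        if layer > 0:
            regions.append((layer * active, per_controller_gbs * active))
        used = cap
    return regions


for layout in ([512, 512, 512], [512, 512, 1024], [1024, 1024, 1024]):
    print(f"{sum(layout)} MB: {interleaved_regions(layout, PER_CONTROLLER_GBS)}")
```

Under these assumptions 1.5GB and 3GB get one uniform 144 GB/s region, while 2GB ends up with 1536 MB at 144 GB/s plus a last 512 MB that only one controller serves (48 GB/s).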
2 GPCs with 256-bit memory bus w/ low-voltage memory = GTX 670 performance at a minimum; if power use scales nicely it'll be <120W
* 3.3 TFLOPS FP32, ~104 GFLOPS FP64, 104 GTexels/s, 41.6 GPixels/s at ~1300 MHz
* 1280 CUDA cores, 80 TMUs, 32 ROPs
* Memory bandwidth: 256-bit @ 5 GHz eff = 160 GB/s; @ 5.4 GHz eff = 173 GB/s (see Quadro K5000 with 122W TDP)
* 2GB or 4GB VRAM
* 2MB L2 cache
* Would be roughly 2.8 TFLOPS, 88 GTexels/s, 35.2 GPixels/s at 1100 MHz (GTX 750 Ti & GTX 750 boost to 1085 MHz), but I expect the extra power from the PCI-e power connector would allow higher boost clocks
Probable block diagrams, GM206 256-bit
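The bandwidth figures quoted in these configs all come from one formula: bytes per transfer (bus width / 8) times the effective data rate. A small sketch that recomputes them, with `rops_for_bus` encoding the "ROPs x 8 = bus width" rule of thumb from the NOTE further down (8 ROPs per 64-bit controller, as on Kepler and GM107; an assumption for future chips). Both function names are hypothetical.

```python
def bandwidth_gbs(bus_width_bits: int, effective_rate_ghz: float) -> float:
    """Peak bandwidth in GB/s = (bus width / 8) bytes x effective GT/s."""
    return bus_width_bits / 8 * effective_rate_ghz


def rops_for_bus(bus_width_bits: int) -> int:
    """Rule of thumb: 8 ROPs per 64-bit controller, so ROPs = bus width / 8."""
    return bus_width_bits // 8


# (bus width, effective memory clock) pairs used in the speculation
configs = [(192, 6.0), (192, 7.0), (256, 5.0), (256, 5.4),
           (256, 7.0), (384, 6.0), (384, 7.0)]
for bus, rate in configs:
    print(f"{bus}-bit @ {rate} GHz eff: {bandwidth_gbs(bus, rate):.1f} GB/s, "
          f"{rops_for_bus(bus)} ROPs")
```

This also shows why a wide bus with slower, low-voltage memory can match a narrow bus clocked high, which is the trade-off between the paired configs for each chip.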
GM204 (<225W, 2x 6-pin PCI-e power) - possible SKU = GTX 860 Ti / GTX 870
^ $200-400 price points, 3-way or 4-way SLI
3 GPCs with 256-bit memory bus clocked high = roughly stock GTX 780 performance; <180W (~22-29 GFLOPS/W)
* 5 TFLOPS FP32, ~156 GFLOPS FP64, 156 GTexels/s, 41.6 GPixels/s at ~1300 MHz
* 1920 CUDA cores, 120 TMUs, 32 ROPs
* Memory bandwidth: 256-bit @ 7 GHz eff = 224 GB/s (see GTX 770)
* 4GB VRAM
* 2MB L2 cache
* GTX 770 had 128 TMUs; GTX 670 had 112 TMUs (the rationale for this)
* Would be roughly GTX 780 performance (GTX 780 @ 900 MHz boost = ~4.1 TFLOPS, ~173 GTexels/s, ~43 GPixels/s)
* A GTX 770 replacement that could be marketed as ~35% faster
3 GPCs with 384-bit memory bus w/ low-voltage memory = roughly stock GTX 780 performance; <180W
* 5 TFLOPS FP32, ~156 GFLOPS FP64, 156 GTexels/s, 62.4 GPixels/s at ~1300 MHz
* 1920 CUDA cores, 120 TMUs, 48 ROPs
* Memory bandwidth: 384-bit @ 6 GHz eff = 288 GB/s (see Quadro K6000 with 225W TDP)
* 3GB or 6GB VRAM
* 2MB L2 cache
* Roughly R9 290 specs at 1 GHz; a GTX 770 replacement
4 GPCs with 384-bit memory bus w/ low-voltage memory = GTX TITAN performance at a minimum; if power scales nicely it'll be <240W
* 6.7 TFLOPS FP32, ~208 GFLOPS FP64, 208 GTexels/s, 62.4 GPixels/s at ~1300 MHz
* 2560 CUDA cores, 160 TMUs, 48 ROPs
* Memory bandwidth: 384-bit @ 6 GHz eff = 288 GB/s (see Quadro K6000 with 225W TDP)
* 3GB or 6GB VRAM
* 2MB L2 cache
* R9 290 has 2560 GCN cores, 160 TMUs, 64 ROPs
GM200 / GM210 (<300W, 6-pin and 8-pin PCI-e power) - possible SKU = GTX 880 Ti / GTX 880 (~30-40 GFLOPS/W)
^ $400+ price points, 4-way SLI
Different GPC layout, with a focus on double precision and compute
A gaming-gimped compute chip could have 5 GPCs
5 GPCs, 384-bit memory bus clocked high, 48 ROPs
* 7 TFLOPS FP32, ~220 GFLOPS FP64, 220 GTexels/s, 52.8 GPixels/s at ~1100 MHz
* 3200 CUDA cores, 200 TMUs, 48 ROPs
* Memory bandwidth: 384-bit @ 7 GHz eff = 336 GB/s (see GTX 780 Ti, GTX TITAN Black)
* 6GB VRAM
* L2 cache? (>1.5MB)
* GTX TITAN has 224 TMUs, GTX 780 has 192 TMUs
A gaming-gimped compute chip could have 5 GPCs
5 GPCs, 512-bit memory bus with low-voltage memory, 64 ROPs
* 7 TFLOPS FP32, ~220 GFLOPS FP64, 220 GTexels/s, 70.4 GPixels/s at ~1100 MHz
* 3200 CUDA cores, 200 TMUs, 64 ROPs
* 4GB or 8GB VRAM
* L2 cache? (>1.5MB)
* GTX TITAN has 224 TMUs, GTX 780 has 192 TMUs
* Roughly GTX 680 SLI or HD 7970 CrossFire?
Tabulated based on core clock
Possible cut-down versions could have 40 ROPs and a 320-bit memory bus, or follow something akin to the GTX 580 --> GTX 570 / GTX 560 Ti approach. I don't believe they will repeat the GTX 670 product lineup mistake; if a cut-down version is only cut by 20% in shaders/TMUs, it would be worth it.
Speculated product lineup before EOL of the GTX 700 series, based on price:
High-end Enthusiast
GM200 / GM210 card (flagship pricing) ~ $600-1000
GTX TITAN Black (GK110 full die, full FP64) ~ $650-800 ... no less than $500 due to FP64 performance
GM200 / GM210 cut-down die (GTX 880?) ~ $500-600
GTX 780 Ti (GK110 full die) ~ $450-550
GTX 780 (GK110 cut-down die) ~ $350-400
(GTX 770 Ti GK110 with 1920 CUDA cores would go here)
Performance Mid-range
GM204 full die (GTX 870?) --> $200-400, maybe $350-450 at launch if 20nm
GM204 cut-down die (GTX 860 Ti?) --> $200-400, probably $250-350 at launch
GTX 770 (GK104 full die) ~ $250-300
(GTX 760 Ti, a GK104 GTX 670 rebrand possibly with higher-clocked VRAM, would go here)
Mid-range
GM206 full die (GTX 860?) --> $150-200, probably $220-250 at launch
GTX 760 (GK104 cut-down die) ~ $180-220
GM206 cut-down die (GTX 850 Ti?) --> $150-200, probably $180-220 at launch
Entry Level
GTX 850 / GTX 750 Ti (GM107 full die) ~ $110-130
GT 840 / GTX 750 (GM107 cut-down die) ~ $90-100
NOTE:
Pixel fill rate scales with ROPs, not cores and TMUs. Generally, ROPs x 8 = memory bus width in bits.
Texture Fill Rate = (# of TMUs) x (Core Clock)
Pixel Fill Rate = (# of ROPs) x (Core Clock)
FLOPS = cores x core clock x FLOPs per cycle (2 per core per cycle for an FP32 fused multiply-add)
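The formulas above can be turned into a small calculator that reproduces the spec bullets. This is a sketch that assumes GM107's per-SMM resources (128 CUDA cores, 8 TMUs) and its 1/32 FP64 rate carry over to the new chips; GM200 may well differ, as noted for Big Maxwell. `GpuConfig` and `maxwell_config` are hypothetical names.

```python
from dataclasses import dataclass

CORES_PER_SMM = 128  # GM107 value
TMUS_PER_SMM = 8     # GM107: 40 TMUs across 5 SMMs
FP64_RATIO = 1 / 32  # GM107 FP64 rate, assumed to carry over


@dataclass
class GpuConfig:
    cuda_cores: int
    tmus: int
    rops: int
    core_clock_mhz: float

    def fp32_tflops(self) -> float:
        # cores x clock x 2 FLOPs per cycle (one fused multiply-add)
        return self.cuda_cores * self.core_clock_mhz * 2 / 1e6

    def fp64_gflops(self) -> float:
        # Only meaningful for chips with the assumed 1/32 FP64 rate
        return self.fp32_tflops() * 1000 * FP64_RATIO

    def texel_rate_gtexels(self) -> float:
        return self.tmus * self.core_clock_mhz / 1000  # TMUs x core clock

    def pixel_rate_gpixels(self) -> float:
        return self.rops * self.core_clock_mhz / 1000  # ROPs x core clock


def maxwell_config(smm_count: int, rops: int, clock_mhz: float) -> GpuConfig:
    """Build a config from an SMM count using GM107 per-SMM resources."""
    return GpuConfig(smm_count * CORES_PER_SMM, smm_count * TMUS_PER_SMM, rops, clock_mhz)


for name, cfg in [
    ("GM206 2 GPC, 192-bit", maxwell_config(10, 24, 1300)),
    ("GM204 3 GPC, 256-bit", maxwell_config(15, 32, 1300)),
    ("GM204 4 GPC, 384-bit", maxwell_config(20, 48, 1300)),
    ("GM200 5 GPC, 384-bit", maxwell_config(25, 48, 1100)),
]:
    print(f"{name}: {cfg.fp32_tflops():.2f} TFLOPS FP32, {cfg.fp64_gflops():.0f} GFLOPS FP64, "
          f"{cfg.texel_rate_gtexels():.1f} GTexels/s, {cfg.pixel_rate_gpixels():.1f} GPixels/s")

# Kepler reference: GTX 780 (2304 cores, 192 TMUs, 48 ROPs, 900 MHz boost).
# Its FP64 rate is not 1/32, so only FP32 and fill rates are printed.
gtx780 = GpuConfig(2304, 192, 48, 900)
print(f"GTX 780: {gtx780.fp32_tflops():.2f} TFLOPS FP32, "
      f"{gtx780.texel_rate_gtexels():.1f} GTexels/s, {gtx780.pixel_rate_gpixels():.1f} GPixels/s")
```

Peak numbers like these only bound performance; the techreport link below compares how well such peaks track actual FPS.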
There are two strategies to optimize power use: a wider memory bus clocked low (i.e. low-voltage memory), or a narrower memory bus clocked high (with extra L2 cache), combined with decreasing the time to idle / downclocked power states.
See also http://techreport.com/review/26050/nvidia-geforce-gtx-750-ti-maxwell-graphics-processor/11, which compares peak rasterization, peak pixel fill and peak shader FLOPS against FPS.
The smallest unit of a Maxwell GPU, the SMM, has 128 CUDA cores. Currently GM107 has 5 SMMs in one GPC. By inference, the possible core counts are:
512 --> GTX 750 currently (cut-down GM107), 4 SMM
640 --> GTX 750 Ti currently (GM107), 5 SMM
768
896
1024 --> Possible cut-down 2 GPC card such as a GTX 850 Ti, 8 SMM (2 SMM disabled of 10)
1152 --> Possible cut-down 2 GPC card such as a GTX 850 Ti / GTX 855 / GTX 860 LE, 9 SMM (1 SMM disabled of 10)
1280 --> 2 GPC = 10 SMM if SMMs per GPC stay the same (GTX 860?)
1408
1536 --> Possible cut-down 3 GPC card such as a GTX 860 Ti, 12 SMM (3 SMM disabled of 15)
1664 --> Possible cut-down 3 GPC card such as a GTX 860 Ti / GTX 865, 13 SMM (2 SMM disabled of 15)
1792
1920 --> 3 GPC = 15 SMM if SMMs per GPC stay the same (GTX 870?)
2048
2176
2304
2432
2560 --> 4 GPC = 20 SMM if SMMs per GPC stay the same (GTX 880?)
2688
2816
2944
3072
3200 --> 5 GPC = 25 SMM if SMMs per GPC stay the same (GTX TITAN MAX? GTX 880 Ti gaming card?)
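The list above can be generated mechanically. A sketch, assuming the 5-SMM GPC layout of GM107 carries over and that each cut-down part uses the smallest die that can hold its SMMs (a real lineup could also fuse off more SMMs on a bigger die). `smm_configs` is a hypothetical helper.

```python
import math

CORES_PER_SMM = 128
SMM_PER_GPC = 5  # assumption: GM107's 5-SMM GPC carries over


def smm_configs(max_gpcs: int, smm_per_gpc: int = SMM_PER_GPC, min_smm: int = 4):
    """List (cuda_cores, smm_count, gpcs_on_die, smm_disabled) for each whole-SMM count."""
    configs = []
    for smm in range(min_smm, max_gpcs * smm_per_gpc + 1):
        gpcs = math.ceil(smm / smm_per_gpc)   # smallest die that can hold this many SMMs
        disabled = gpcs * smm_per_gpc - smm   # SMMs fused off on that die
        configs.append((smm * CORES_PER_SMM, smm, gpcs, disabled))
    return configs


for cores, smm, gpcs, disabled in smm_configs(5):
    label = "full die" if disabled == 0 else f"cut-down, {disabled} of {gpcs * SMM_PER_GPC} SMM disabled"
    print(f"{cores:>5} CUDA cores: {smm} SMM, {gpcs} GPC ({label})")
```

Calling `smm_configs(6)` extends the list to 3840 cores, the 6-GPC case mentioned further down.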
Nvidia claims Maxwell delivers 35% more performance per core than Kepler and twice the performance per watt (see whitepaper http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce-GTX-750-Ti-Whitepaper.pdf).
SMM structure (GM107; GM200 likely has double precision and such):
Since we know the architecture's smallest unit is an SMM of 128 CUDA cores, for the mid-range chips it's all about whether Nvidia keeps the 5-SMM-per-GPC layout, and how many GPCs and ROPs (and therefore what memory bus width) will be allowed. Big Maxwell GM200/GM210 is unlikely to keep the same number of SMMs per GPC.
WISHFUL THINKING
Another price/performance powerhouse like the 8800GT (a high-end part at mid-range prices) could be had with a GM204 chip if it isn't gimped; a $250-300 card with GTX 780/GTX 780 Ti performance and less than 180W power consumption would be astounding.
Even the GTX 560 Ti was rather good value at $250, compared to the GTX 660 Ti and GTX 650 Ti non-Boost, which were both quite poor in price/performance.
If we are extremely optimistic, we may see 3 GPCs in GM206, but it would likely be clocked low at ~1100 MHz or gimped with a 192-bit memory bus in order to hit 150W power consumption.
Likewise, a 4 GPC GM204 would be extremely nice, assuming they can fit 2560 Maxwell CUDA cores in 225W. Assuming 30 GFLOPS/W (double the efficiency of GK104), this is doable even if it's clocked at 1300 MHz and uses a 384-bit memory bus.
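A quick arithmetic check of that claim. This sketch assumes whole-board power scales with peak FP32 throughput (a simplification that ignores memory, VRM and fan power) and uses the GTX 680's 1006 MHz base clock and 195W TDP as the GK104 reference point; the helper names are hypothetical.

```python
def fp32_gflops(cuda_cores: int, clock_mhz: float) -> float:
    # cores x clock x 2 FLOPs per cycle (fused multiply-add)
    return cuda_cores * clock_mhz * 2 / 1000


def watts_at_efficiency(cuda_cores: int, clock_mhz: float, gflops_per_watt: float) -> float:
    """Board power implied if the whole card delivers `gflops_per_watt` of peak FP32."""
    return fp32_gflops(cuda_cores, clock_mhz) / gflops_per_watt


# Reference point: GTX 680 (GK104), 1536 cores, 1006 MHz base clock, 195 W TDP
gk104_eff = fp32_gflops(1536, 1006) / 195
print(f"GK104 (GTX 680): {gk104_eff:.1f} GFLOPS/W, doubled: {2 * gk104_eff:.1f}")

# Speculative 4 GPC GM204: 2560 cores at 1300 MHz, assuming 30 GFLOPS/W
needed = watts_at_efficiency(2560, 1300, 30)
print(f"2560 cores @ 1300 MHz: {needed:.0f} W needed, fits 225 W: {needed <= 225}")
```

Under these assumptions GK104 lands around 16 GFLOPS/W, so 30 GFLOPS/W is a little under double, and 6656 GFLOPS at 30 GFLOPS/W works out to roughly 222W, just inside a 225W budget.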
3200 CUDA cores is the minimum I expect from GM200/GM210, but I hope for 3840 CUDA cores (6 GPCs of the current GM107 structure).
Performance relative to current cards