Originally Posted by Yeroon
Why lock Hawaii at 1/16 when the second-tier card (R9 280X) has 1/4? It also doesn't seem like GCN does any better than 1/4, as both the 7970 and its FirePro counterpart have the same 1/4 DP:SP ratio.
It's in AMD's favor to offer higher DP: some distributed-computing projects use it, and it makes AMD's cards more appealing. Those buying a FirePro for a specific use would be wasting their time/money buying a consumer card instead. A cut-down 7870-based FirePro beats a 7970 GHz Edition in most work-related benches. It's the FirePro drivers you want for a workstation; the hardware specs are secondary.
GCN supports up to 1/2-rate double precision. Read the AnandTech article on the 7970:
Double precision and 32-bit integer instructions run at a reduced rate within a SIMD. The GCN Architecture is flexible and double precision performance varies from 1/2 to 1/16 of single precision performance, increasing the latency accordingly. The double precision and 32-bit integer performance can be configured for a specific GCN implementation, based on the target application.
GCN whitepaper, page 7: www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf
As GCN’s FP64 performance can be configured for 1/16, ¼, or ½ its FP32 performance, it’s not clear at this time whether the 7970’s ¼ rate was a hardware design decision for Tahiti or a software cap that’s specific to the 7970.
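To put those configurable rates in numbers, here is a rough Python sketch that applies each one to a Tahiti FP32 figure. It assumes reference HD 7970 specs (2048 stream processors at 925 MHz, with one FMA counted as 2 FLOPs per clock); the helper names are made up for illustration.

```python
from fractions import Fraction

# Assumed reference HD 7970 (Tahiti) specs, for illustration only.
STREAM_PROCESSORS = 2048
CLOCK_MHZ = 925
FLOPS_PER_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs

def peak_fp32_gflops(shaders: int, clock_mhz: float) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return shaders * FLOPS_PER_CLOCK * clock_mhz / 1000

def peak_fp64_gflops(fp32_gflops: float, dp_rate: Fraction) -> float:
    """Scale the FP32 peak by a DP:SP rate such as 1/4."""
    return fp32_gflops * float(dp_rate)

fp32 = peak_fp32_gflops(STREAM_PROCESSORS, CLOCK_MHZ)
# The three GCN rates mentioned above: 1/2, 1/4 and 1/16.
for rate in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 16)):
    print(f"DP rate {rate}: {peak_fp64_gflops(fp32, rate):.1f} GFLOPS FP64")
```

At 1/4 this works out to about 947 GFLOPS of FP64, and the 1/2 option would double that on the same FP32 budget.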
VGPRs – Every work-item has access to some number of VGPRs, up to a maximum of 256. VGPRs are 32-bits wide and are used by the vector ALU and vector memory systems. Double-precision operations use two adjacent VGPRs to form a 64-bit value.
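As a rough illustration of the register pairing described above, the Python sketch below splits a double into two 32-bit words, joins them back, and counts how many doubles a work-item could hold within the 256-VGPR maximum. Putting the low word in the lower-numbered register is an assumption here, and the function names are hypothetical; this models the bit layout only, not actual GCN hardware behavior.

```python
import struct

MAX_VGPRS = 256  # per-work-item maximum quoted from the GCN whitepaper
VGPR_BITS = 32   # each VGPR is 32 bits wide

def split_double(x: float) -> tuple[int, int]:
    """Return the (low, high) 32-bit words of an IEEE-754 double."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & 0xFFFFFFFF, bits >> VGPR_BITS

def join_double(low: int, high: int) -> float:
    """Rebuild a double from two 32-bit words (as if read from a register pair)."""
    bits = (high << VGPR_BITS) | low
    (value,) = struct.unpack("<d", struct.pack("<Q", bits))
    return value

def max_doubles_per_work_item(vgprs: int = MAX_VGPRS) -> int:
    """Each 64-bit value consumes two adjacent 32-bit VGPRs."""
    return vgprs // 2

low, high = split_double(1.0)
print(hex(low), hex(high))          # 0x0 0x3ff00000
print(join_double(low, high))       # 1.0
print(max_doubles_per_work_item())  # 128
```

Halving the register budget this way is one reason heavy FP64 kernels can run into register pressure sooner than FP32 ones.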
For the first time on any consumer-level NVIDIA card, double precision (FP64) performance is uncapped. That means 1/3 FP32 performance, or roughly 1.3 TFLOPS theoretical FP64 performance. NVIDIA has taken other liberties to keep this from being treated as a cheap Tesla K20, but for lighter workloads it should fit the bill.
As compared to the server and high-end workstation market that Tesla carves out, NVIDIA will be targeting the compute side of Titan towards researchers, engineers, developers, and others who need access to (relatively) cheap FP64 performance, and don’t need the scalability or reliability that Tesla brings. To that end Titan essentially stands alone in NVIDIA’s product stack; the next step up from an FP64-constrained consumer card is the much more expensive Tesla K20.
GK110 whitepaper: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
... At this point I'm more stoked about what a $400-450, 1920-CUDA-core GK110 GTX 770 Ti would bring to the table. Presuming the memory bus stays the same, Hyper-Q and Dynamic Parallelism (both GK110-only) would be advantages over a GTX 770 or any GK104. It would be Russian roulette with the GPCs, though: having 2 more SMX disabled than the GTX 780 could leave an entire GPC (and its raster engine) dead, or it could be great for overclocking if the disabled SMX are spread across different GPCs.
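The GPC gamble can be bounded with a little arithmetic. This Python sketch assumes the full GK110 layout of 5 GPCs with 3 SMX each and one raster engine per GPC, and reports the fewest and most GPCs that can stay active for a given number of enabled SMX; the function name is made up for illustration.

```python
import math

GPCS = 5         # assumed: full GK110 has 5 GPCs
SMX_PER_GPC = 3  # assumed: 3 SMX per GPC (15 total)

def active_gpc_range(enabled_smx: int) -> tuple[int, int]:
    """Return (fewest, most) GPCs, i.e. raster engines, left active."""
    total = GPCS * SMX_PER_GPC
    if not 1 <= enabled_smx <= total:
        raise ValueError(f"enabled_smx must be between 1 and {total}")
    fewest = math.ceil(enabled_smx / SMX_PER_GPC)  # SMX packed into as few GPCs as possible
    most = min(GPCS, enabled_smx)                  # SMX spread so every possible GPC keeps one
    return fewest, most

for name, smx in (("GTX Titan", 14), ("GTX 780", 12), ("hypothetical GTX 770 Ti", 10)):
    fewest, most = active_gpc_range(smx)
    print(f"{name}: {smx} SMX -> {fewest} to {most} active GPCs")
```

Under these assumptions a 10-SMX part always keeps at least 4 GPCs, so the worst case is one raster engine lost; which SMX end up fused off still varies from chip to chip.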
A full GK110 has 15 SMX units and six 64-bit memory controllers. Each SMX has 192 single-precision CUDA cores, 64 double-precision units, 32 special function units (SFU), and 32 load/store units (LD/ST). The GTX 780 implementation only runs double precision at 1/24 rate rather than the 1/3 possible.
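The 1/3 figure falls straight out of the per-SMX unit counts (64 DP units vs. 192 SP cores). The Python sketch below derives that ratio and compares theoretical FP64 throughput at 1/3 and 1/24; it assumes a 12-SMX GTX 780 at an 863 MHz base clock and 2 FLOPs per core per clock (FMA), and the helper names are hypothetical.

```python
from fractions import Fraction

SP_CORES_PER_SMX = 192  # from the GK110 whitepaper
DP_UNITS_PER_SMX = 64   # from the GK110 whitepaper

def hardware_dp_ratio() -> Fraction:
    """DP:SP ratio implied by the unit counts in one SMX."""
    return Fraction(DP_UNITS_PER_SMX, SP_CORES_PER_SMX)

def fp64_gflops(smx: int, clock_mhz: float, dp_rate: Fraction) -> float:
    """Theoretical FP64 peak: FP32 peak (2 FLOPs per core per clock) scaled by dp_rate."""
    fp32 = smx * SP_CORES_PER_SMX * 2 * clock_mhz / 1000
    return fp32 * float(dp_rate)

# Assumed GTX 780 configuration: 12 SMX at an 863 MHz base clock.
print(hardware_dp_ratio())                                 # 1/3
print(f"{fp64_gflops(12, 863, hardware_dp_ratio()):.1f}")  # 1325.6 if uncapped
print(f"{fp64_gflops(12, 863, Fraction(1, 24)):.1f}")      # 165.7 at the 1/24 cap
```

That is an 8x gap between what the silicon could do and what the GTX 780 is allowed to do; the clock figure only scales both numbers equally.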
Optimal GTX 770 Ti would look like
Edited by AlphaC - 10/13/13 at 9:04pm