Quote:
(Source)Nvidia's Kepler/GK104 chip has an interesting secret, a claimed Ageia PhysX hardware block that really isn't. If you were wondering why Nvidia has been beating the dead horse called PhysX for so long, now you know, but it only gets more interesting from there.
Sources tell SemiAccurate that the 'big secret' lurking in the Kepler chips are optimisations for physics calculations. Some are calling this PhysX block a dedicated chunk of hardware, but more sources have been saying that it is simply shaders, optimisations, and likely a dedicated few new ops. In short, marketing may say it is, but under the heat spreader, it is simply shaders and optimisations.
Since this article is kind of long, I'll highlight a few things:
Basically, Charlie as we know him is back (smack-talking and criticizing nVidia) and backs up his previous statements.
He releases some details about the GK104 architecture:
Quote:
We stated earlier that Kepler wins in most ways vs the current AMD video cards. How does Nvidia do it with a $299 card? Is it raw performance? Massive die size? Performance per metric? The PhysX 'hardware block'? Cheating? The easy answer is yes, but let's go into a lot more detail.
GK104 is the mid-range GPU in Nvidia's Kepler family, has a very small die, and the power consumption is far lower than the reported 225W. How low depends on what is released and what clock bins are supported by the final silicon. A1 stepping cards seen by SemiAccurate had much larger heatsinks than the A2 versions, and recent rumours suggest there may be an A3 to fix persistent PCIe3 headaches.
And reveals nVidia's tactics:
Quote:
The architecture itself is very different from Fermi. SemiAccurate's sources point to a near 3TF card with a 256-bit memory bus. Kepler is said to have a very different shader architecture from Fermi, going to much more AMD-like units, caches optimised for physics/computation, and clocks said to be close to the Cayman/Tahiti chips. The initial target floating among the informed is in the 900-1000MHz range. Rumours have it running anywhere from about 800MHz in early silicon to 1.1+GHz later on, with early steppings not far off later ones. Contrary to some floating rumours, yields are not a problem for either GK104 or TSMC's 28nm process in general.
Performance is likewise said to be a tiny bit under 3TF from a much larger shader count than previous architectures. Compare that to the 3.79TF and 2048 shaders of AMD's Tahiti; GK104 isn't far off either number. With the loss of the so-called "hot-clocked" shaders, this leaves two main paths to go down: two CUs plus a hardware PhysX unit, or three CUs. Since there is no dedicated hardware physics block, the math says each shader unit will probably do two SP FLOPs per clock or one DP FLOP.
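To see where those figures come from, here's a back-of-the-envelope sketch of the peak-throughput math. The GK104 shader count and clock below are assumptions pulled from the rumoured ranges in the article, not confirmed specs:

```python
# Peak theoretical throughput: shaders * clock * FLOPs issued per shader per clock.
# Two SP FLOPs per clock corresponds to one fused multiply-add per shader.

def theoretical_tflops(shaders: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Return peak single-precision TFLOPS for the given configuration."""
    return shaders * clock_mhz * 1e6 * flops_per_clock / 1e12

# AMD Tahiti (HD 7970): 2048 shaders at 925 MHz -> the 3.79TF the article cites.
tahiti = theoretical_tflops(2048, 925)

# Hypothetical GK104: 1536 shaders at 950 MHz (assumed, mid-range of the
# rumoured 900-1000MHz target) -> "a tiny bit under 3TF".
gk104 = theoretical_tflops(1536, 950)

print(f"Tahiti: {tahiti:.2f} TF, GK104 (assumed): {gk104:.2f} TF")
```

Note that these are marketing-style peak numbers; as the article goes on to argue, two cards with similar theoretical TFLOPS can land very differently in real workloads.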
EDIT:
There seems to be a general misconception about the performance boost of GK104, so here are the parts where it's all explained:
Quote:
In the same way that AMD's Fusion chips count GPU FLOPS the same way they do CPU FLOPS in some marketing materials, Kepler's 3TF won't measure up cleanly against AMD's 3TF parts. Benchmarks for GK104 shown to SemiAccurate have the card running about 10-20% slower than Tahiti. On games that both heavily use physics-related number crunching and have the code paths to do so on Kepler hardware, performance should be well above what is expected from a generic 3TF card. That brings up the fundamental question of whether the card is really performing at that level.
This is where the plot gets interesting. How applicable is the "PhysX block"/shader optimisations to the general case? If physics code is the bottleneck in your app (a goal Nvidia appears to actively code for), then uncorking that artificial impediment should make an app positively fly. On applications that are written correctly, without artificial performance limits, Kepler's performance should be much more marginal. Since Nvidia is pricing GK104 against AMD's mid-range Pitcairn ASIC, you can reasonably conclude that the performance will line up against that card, possibly a bit higher. If it could reasonably defeat everything on the market in a non-stacked-deck comparison, it would be priced accordingly, at least until the high-end part is released.
All of the benchmark numbers shown by Nvidia, and later to SemiAccurate, were overwhelmingly positive. How overwhelmingly positive? Far faster than an AMD HD7970/Tahiti for a chip with far less die area and power use, and it blew an overclocked 580GTX out of the water by unbelievable margins. That is why we wrote this article. Before you take that as a backpedal: we still think those numbers are real, and the card will achieve that level of performance in the real world on some programs.
The problem for Nvidia is that once you venture outside of that narrow list of tailored programs, performance is likely to fall off a cliff, with peaky performance the likes of which haven't been seen in a long time. On some games, GK104 will handily trounce a 7970; on others, it will probably lose to a Pitcairn. Does this mean it won't actually do what is promised? No, it will. Is this a problem? Depends on how far review sites dare to step outside of the 'recommended' list of games to benchmark in the reviewers guide.