Originally Posted by lordikon
It's basically a gimped APU where each SPE doesn't have access to system memory, so it doesn't really matter that it's faster in terms of number crunching. Games aren't really CPU-bound; it is beyond me why Sony made the decision to use a processor like the Cell.
Wouldn't it be more accurate to compare it to a dual-core RISC processor with 7 small, on-die graphics cards, each containing two shaders, where each shader can process one 4-wide vector of four 32-bit numbers? The SPEs are 4-wide vector SIMD (somewhat similar to ATI VLIW-4 without the ILP). There's actually quite a bit of similarity between Cell, Larrabee, GCN, and Fermi, though Cell is designed so that the individual SIMD units can process independently (a MIMD array of sorts). That makes the Cell SPE a bit more programmable than GCN or Fermi, but probably less so than the Larrabee prototypes.
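To make the "4-wide SIMD inside a MIMD array" idea concrete, here is a toy sketch in Python. It is only an illustration, not how the Cell is actually programmed: the names `simd_add`, `simd_mul`, and `run_spes` are made up for this post, and plain lists stand in for the SPE vector registers.

```python
# Toy model only: a Python list of 4 numbers stands in for one SPE vector
# register (four 32-bit values). All names here are hypothetical.
LANES = 4


def simd_add(a, b):
    """One 'instruction' applied to all four lanes at once (SIMD)."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]


def simd_mul(a, b):
    """A different 4-wide instruction, to show SPEs need not run in lockstep."""
    assert len(a) == len(b) == LANES
    return [x * y for x, y in zip(a, b)]


def run_spes(programs, inputs):
    """Each simulated SPE runs its own program on its own data (MIMD)."""
    return [program(a, b) for program, (a, b) in zip(programs, inputs)]


# Two simulated SPEs doing different work on different data.
programs = [simd_add, simd_mul]
inputs = [
    ([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]),
    ([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]),
]
results = run_spes(programs, inputs)
print(results)
```

Within one SPE every lane does the same operation, but each SPE can run a different program, which is the difference from a GPU where whole groups of SIMD units usually run the same shader.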
I'm curious: what is the problem with the Cell in the PS3? The only one I can think of is that people program it in C, when a language such as J or APL would probably achieve better results.
OT: has Toshiba considered shrinking its SpursEngine to 28nm?
Back in '08, Engadget reported that the 65nm, 1.5 GHz, 4-SPE chip (10-20 watts) was twice as fast at video transcoding as a 3 GHz Core 2 Quad. The chip would probably be competitive with a quad-core Sandy Bridge today (especially when power consumption is factored in). If it were shrunk to 28nm and clocked higher, I'd expect it to outperform any x86 chip at transcoding while using only a few watts of power and without the quality issues that seem to happen with GPU transcoders.
Originally Posted by Usario
X1950 is just an overclocked X1900 except the XTX which has GDDR4.
All the X1900 and X1950 cards have the same core config just different clock speeds (except as previously mentioned the X1950XTX which has GDDR4); the clocks on the 360's GPU are very similar if not exactly the same as the AIW version.
TBH I couldn't find any direct comparisons of any X1900 and the 6670, so I instead looked at a review of the 2900XT, the first ATI card to use VLIW, and compared its performance to the X1950 Pro. Then, ignoring the architectural differences, I adjusted the X1950 Pro numbers for the clock speed of the 360's GPU and scaled the 2900XT numbers up to the core config and clock speed of the 6670. Despite that graph showing otherwise, those numbers are from a synthetic benchmark (one that says the GTX 580 is 23% faster than the HD 6990)... whatever, if I'm actually wrong, I apologize.
EDIT: For the record, according to Anandtech, on average the 2900XT with 320 VLIW stream processors is more than twice as fast as the X1950 Pro. The 6670 has 480 SPs and some have GDDR5. Even if you account for the 6670 having 37% less memory bandwidth you still get a number significantly higher than 73%. On top of that the 6670 GDDR5 does 768 GFLOPS whereas the 2900XT does 475 (can't find numbers for the X1900 series).
No need to extrapolate the Xenos performance. Wikipedia states that it is 240 GFLOPS. This means that the 6670 is a little better than 3x faster and (due to architecture improvements) the available GFLOPS are more usable.
Edited by hajile - 2/24/12 at 9:03pm
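For anyone who wants to check the arithmetic, here is a quick sketch using only the GFLOPS and stream processor figures quoted in this thread. The dictionary layout and the `speedup` helper are just for illustration, and peak GFLOPS is a theoretical number, not measured game performance.

```python
# Peak GFLOPS figures as quoted in this thread (theoretical, not benchmarks).
gflops = {
    "Xenos": 240.0,          # Wikipedia figure cited above
    "HD 2900XT": 475.0,
    "HD 6670 GDDR5": 768.0,
}


def speedup(new, old):
    """Ratio of peak GFLOPS between two parts (hypothetical helper)."""
    return gflops[new] / gflops[old]


ratio_vs_xenos = speedup("HD 6670 GDDR5", "Xenos")
ratio_vs_2900xt = speedup("HD 6670 GDDR5", "HD 2900XT")
sp_ratio = 480 / 320  # stream processors: 6670 vs 2900XT

print(f"6670 vs Xenos:  {ratio_vs_xenos:.2f}x")
print(f"6670 vs 2900XT: {ratio_vs_2900xt:.2f}x")
print(f"SP count ratio: {sp_ratio:.2f}x")
```

On paper that is 3.2x the Xenos figure, which is where "a little better than 3x" comes from.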