^ pretty good answer.
The technical difference between CUDA and Stream is very hard to explain, since it's very complex and I myself have a tough time understanding it.
Stream processors are not exclusive to ATI. Nvidia units have them too, starting with Nvidia's 8 series. The two companies just have different technologies implementing the same (sort of) idea. Nvidia calls it CUDA, and ATI calls it by its vanilla name: stream processing.
To define stream processing: it's a computer/hardware programming technique that allows some applications to more easily exploit a limited form of parallel processing.
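To make that definition concrete, here's a minimal toy sketch in Python (the GPU concept, not any real GPU API): a small "kernel" function is applied uniformly and independently to every element of a data stream, which is exactly the limited-but-easy form of parallelism described above.

```python
# Toy illustration of stream processing: one "kernel" is applied
# independently to every element of an input stream. Since no element
# depends on any other, the work is trivially parallelizable.

def kernel(x):
    # A hypothetical per-element computation (stand-in for a shader).
    return x * x + 1

def run_stream(stream):
    # On a GPU this map would be spread across many stream processors
    # at once; here we just apply the kernel element by element.
    return [kernel(x) for x in stream]

print(run_stream([1, 2, 3, 4]))  # [2, 5, 10, 17]
```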
This may not answer your initial question, but I want to add to what the others above have already said.
Next time you look at the spec lists of pre-DX10 cards, such as the Nvidia Geforce 7950 GTX or the ATI X1600, you will see the attribute "fragment pipeline", "pixel shader", or "pixel pipeline" listed (these are three different terms for the same thing). When you look at the spec lists of DX10 cards, such as the Geforce 8800 GT or the ATI 3870, you will see "pixel / fragment pipeline" replaced with "pixel shader processor".
The reason for this is that the "pixel shader processor" is a more advanced version of the "pixel / fragment pipeline", one that can "hypothetically" take advantage of the more sophisticated Unified Shader Model, which involves more advanced iterations of OpenGL and DX10 technology.
Here's something I found that kind of explains the two...
AMD has announced a Stream Processor that comes from its recent acquisition of ATI. The processor is currently available on a PCI Express board and is provided with one gigabyte of dedicated memory. It also comes with the Close to Metal (CTM) interface for software developers. CTM is the target of stream programming platforms such as PeakStream and RapidMind, though its open nature allows it to be targeted by in-house developers.
The Stream Processor is different from the CUDA technology in the GeForce 8800 in that the latter has cooperating cores and can therefore run multithreaded applications without stream programming. That is, AMD’s approach is a vector processor—SIMD—whereas NVIDIA’s approach is a multithreaded processor—MIMD. (To be precise, a stream processor applies a “kernel” of related instructions stored in a cache, whereas a vector processor applies a single instruction stored in a register; for our discussion, the difference is minimal.) This SIMD vs. MIMD divide also appears when comparing ClearSpeed and the Cell BE.
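The SIMD/MIMD distinction in that quote can be sketched in plain Python (a hedged toy model, not real GPU code): SIMD applies one operation across every data lane in lockstep, while MIMD threads each run their own instruction stream and may do different work.

```python
import threading

data = [1, 2, 3, 4]

# SIMD flavour (AMD's vector approach): a single operation is applied
# to every lane of the data in lockstep.
simd_result = [x * 2 for x in data]  # one instruction, many data

# MIMD flavour (NVIDIA's multithreaded approach): each thread has its
# own instruction stream and may execute a different "program".
mimd_result = [None] * len(data)

def worker(i, x):
    # Hypothetical divergent per-thread work, chosen by index parity.
    mimd_result[i] = x * 2 if i % 2 == 0 else x + 100

threads = [threading.Thread(target=worker, args=(i, x))
           for i, x in enumerate(data)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(simd_result)  # [2, 4, 6, 8]
print(mimd_result)  # [2, 102, 6, 104]
```

The point of the contrast: the SIMD version cannot express the per-element branching that the MIMD version does without extra machinery, which is roughly why the article calls cooperating cores "multithreaded" rather than "stream" hardware.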
It is interesting to note that the pairing of vector processors and multithreaded processors matches Cray's adaptive supercomputing strategy. (Cray also offers FPGAs, which have been the focus of Celoxica and DRC.) And the CPU behind all of this is the x86; AMD's offerings are currently being favored over Intel's because of the direct connect architecture.
Cray might have the satisfaction of being right, but they still need to worry about market penetration before the smugness settles in. The other vendors have the benefit of commoditization, which is the exact force that removed Sun from being the leader in enterprise computing. Third-party OEMs have already announced the inclusion of the Stream Processor at Supercomputing this week. Can Cray keep up with that amount of volume?
One interesting side note I’d like to close with: while contemplating the SIMD and MIMD issues, I realized that the x86 vendors already have a watered-down version of both of these, namely SSE and multi-core architectures. It appears that Flynn’s taxonomy still rings true today; everyone is rushing to add these components to CPUs, either on-chip or along-side.
Here's another article:
So what’s the difference between Nvidia’s and ATI’s GPU architectures, and why do seemingly comparable ATI cards have more stream processors than Nvidia ones? The answer lies in their different implementations. Nvidia’s GPUs use fewer stream processors (CUDA technology), where each one is identical in look, feel, design, and function (FP and INT arithmetic) to its neighbor. To be more exact, for every 8 identical stream processors there is one special functional unit that keeps things in check. So if you look at a Geforce GTX 280 with 240 stream processors, it’s really using only about 88% of its advertised FP/INT arithmetic processing power (1 of every 9 SPs is there to police the other eight).
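The "about 88%" figure follows directly from the 8-ordinary-to-1-special ratio the article describes; a quick back-of-envelope check (taking the article's model at face value):

```python
# Under the article's model, of every 9 units, 8 do ordinary FP/INT
# arithmetic, so the usable fraction of advertised units is 8/9.
fraction = 8 / 9
print(round(fraction * 100, 1))  # 88.9

# Applied to the GTX 280's advertised count:
effective_sps = 240 * fraction
print(round(effective_sps))  # 213
```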
Nonetheless, Nvidia’s GPU architecture is easier for application and game developers to program for because of its simplicity (every stream processing unit performs the same function): as long as the units are fed numbers to crunch by the apps, you get fast raw results every clock cycle. Nvidia’s architecture has been deemed analogous to American motor engines: simple raw power, gas-guzzling.
ATI’s architecture is a bit different: not every stream processor (Brook+ technology) is identical to its neighbor. For every block of 6 stream processing units, 4 are identical, the 5th carries different FP/INT arithmetic functions, and the 6th keeps things in check. So essentially, each block of 5 ATI stream processors (ignoring the special unit) is comparable to 1 Nvidia stream processor. The math isn’t that simple, but it’s a good generalization that helps demystify why a high-end ATI Radeon HD 4870 with a rocking 800 stream processors is relatively weaker than an Nvidia GTX 280 with only 240. Because of ATI’s GPU architecture, app and game developers have a tougher time programming to take full advantage of every stream processor on board, since specific FP/INT arithmetic functions can only be “worked on” by one out of every five units (per block). To take full advantage of ATI’s architecture, an app or game must be optimally coded, something like baiting the hook to suit the fish.
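The article's rough five-to-one generalization can be worked through numerically (this is only the article's simplification; real performance also depends on clocks, scheduling, and how well code keeps each 5-wide block busy):

```python
# Back-of-envelope comparison under the article's rule of thumb:
# each block of 5 ATI stream processors ~ 1 Nvidia stream processor.
ati_sps = 800      # advertised SP count, Radeon HD 4870
nvidia_sps = 240   # advertised SP count, GeForce GTX 280

ati_equivalent = ati_sps / 5
print(ati_equivalent)  # 160.0

# So by this crude model the 4870 brings ~160 "Nvidia-equivalent"
# units against the GTX 280's 240, despite the bigger headline number.
print(ati_equivalent < nvidia_sps)  # True
```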
If a program is not optimized for the architecture, the work of keeping as many blocks of stream processing units busy every clock cycle falls to the GPU scheduler, the Ultra-Threaded Dispatch Processor. All in all, current Nvidia graphics cards lead ATI implementations in most (if not all) game benchmarks, but from a cost/performance standpoint, ATI is definitely the better bang for the buck. Choose wisely.