Originally Posted by Kip69
But the 7750... "Anything more and you'll hit a performance 'wall' in photoshop." I don't know what that means
There is a point at which Photoshop simply does not take advantage of any more GPU power. The OpenCL/OpenGL implementation is likely not written very well for massively parallel hardware, and winds up bottlenecking on an I/O or compute-related function that should have been handled in different block sizes or something like that. I really don't know the finer details of how that works, but when we see poor performance scaling in parallel computing, it is often a result of less-than-ideal programming methods. There are a number of long "best practices" documents out there for parallel compute work (OpenCL/CUDA) that cover some of the glaring issues that can cause these "walls" to be hit, and ways to work around them. I don't know whether Photoshop CC has made any progress on this. I have seen this sort of performance "wall" emerge in pretty much every CS6 benchmark I have come across. The trend points to an unfortunate reality: OpenCL-accelerated tasks, while running in parallel on the GPU, are bound by a poorly threaded operation on the CPU side.
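To make the "wall" idea concrete, here is a rough sketch of the usual Amdahl's-law style picture: a fixed chunk of CPU-side work plus a GPU-side chunk that shrinks as the GPU gets faster. The function names and all of the numbers are made up for illustration. They are not measurements of Photoshop, and I am only assuming the bottleneck behaves like a fixed serial cost.

```python
def task_time(cpu_serial_s, gpu_work, gpu_throughput):
    """Total time for one filter run in this toy model.

    cpu_serial_s   -- seconds of CPU-side work that does not speed up with the GPU
    gpu_work       -- amount of parallel work handed to the GPU (arbitrary units)
    gpu_throughput -- how many of those units the GPU finishes per second
    """
    return cpu_serial_s + gpu_work / gpu_throughput


# Hypothetical numbers: 2 s of CPU-bound work, 8 units of GPU work,
# and GPUs ranging from 1x to 32x the throughput of the slowest card.
throughputs = [1, 2, 4, 8, 16, 32]
times = [task_time(2.0, 8.0, g) for g in throughputs]

for g, t in zip(throughputs, times):
    print(f"GPU x{g:>2}: {t:.2f} s")
```

Each doubling of GPU power buys less than the one before, and no GPU in this model can ever get the total under the 2 seconds of CPU-side work. That floor is the "wall."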
In the above benchmarks, every card from about the HD7750 and GTX650 class and up has hit that "wall." There is nothing left to gain on the GPU side unless a bottleneck somewhere else is removed. As you can see, the HD4XXX is not far from that wall in most tests. The position of that GPU-accelerated wall almost always seems to coincide with single-core CPU performance. If this were an A10-6800K or i3-4330, the position of that wall would be different. In fact, the little APUs and i3s are well balanced for Photoshop right out of the box (no discrete GPU!) because of this "wall" effect. They have enough on-chip GPU that there is no reason to invest in more for Photoshop. They will run nearly every OpenCL/OpenGL-accelerated feature in Photoshop at or near their own "wall" (bottleneck) set by CPU performance. (Either would be a very elegant solution, albeit in a lower performance and PRICE bracket than the Xeon/i7 class stuff.)
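As a rough illustration of why the wall moves with the CPU, the sketch below treats the wall as the CPU-side time alone and asks how much GPU throughput is needed to get within 10% of it. Everything here is hypothetical: the function name, the 10% tolerance, and the two CPU timings are assumptions picked to show the shape of the effect, not benchmark data.

```python
def gpu_needed_to_reach_wall(cpu_serial_s, gpu_work, tolerance=0.10):
    """GPU throughput at which total time is within `tolerance` of the wall.

    Toy model: total = cpu_serial_s + gpu_work / throughput, and the wall is
    cpu_serial_s. Solving total <= cpu_serial_s * (1 + tolerance) for the
    throughput gives the expression below.
    """
    return gpu_work / (tolerance * cpu_serial_s)


GPU_WORK = 8.0  # arbitrary units of parallel work, same for both CPUs

# Hypothetical CPU-side times for the same filter on two different chips.
cpus = {"slower core": 4.0, "faster core": 2.0}

needed = {name: gpu_needed_to_reach_wall(s, GPU_WORK) for name, s in cpus.items()}

for name, s in cpus.items():
    print(f"{name}: wall at {s:.1f} s, needs about {needed[name]:.0f} units/s of GPU")
```

In this model the slower core hits its wall with half the GPU that the faster core needs, which is the sense in which a modest on-chip GPU can already be "enough" for a modest CPU.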
The above benchmark was performed with a GTX Titan at the helm (WAY WAY beyond the wall)... Notice that the scaling of OpenCL performance trends right down the line with the single/lightly threaded performance of the chips in the group, with the AMD chips bringing up the rear, the older Sandy Bridge-E filling the middle, and late-generation Haswell in the lead. And of course, in usual Tom's Hardware fashion, they are oblivious to the most interesting part of that benchmark in the commentary that follows. Not a WORD about the performance scaling of OpenCL and how it seems to be bottlenecked by something highly sensitive to per-core performance. I swear these review sites are sometimes sitting on a goldmine of information with the tests they run, and totally waste it on an interpretation of the results that is always a C- at best.
Eric
Edited by mdocod - 11/27/13 at 2:00am