Originally Posted by KyadCK
Then do it one core vs one core for single-threaded efficiency, and do it again max cores vs max cores for multi-threaded efficiency.
Single-threaded workload comparisons are unlikely to create a large enough differential in power consumption to make an accurate judgment from a "home test environment," where we are forced to operate within the confines of something like a "from the wall" measurement. We can only estimate the losses of the other components so accurately, so it is important that the CPU be the dominant part of the load in such a test, in order to have any chance of producing a result that represents compute efficiency rather than some other discrepancy. While I would likely try to include some lightly threaded workloads in such a test, I fear they would be very likely to create more questions than answers in the test environment I can operate within. I've already thought this all through very thoroughly. I know the questions to which I seek answers.
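For anyone wondering what I mean by the confines of a wall measurement, here's a rough sketch of the arithmetic involved; the PSU efficiency and platform draw below are placeholder assumptions, not measured values:

```python
# Rough estimate of CPU power from a "from the wall" reading.
# Every constant here is an illustrative assumption, not a measurement.

PSU_EFFICIENCY = 0.88     # assumed PSU efficiency at this load point
PLATFORM_W = 45.0         # assumed board/RAM/drives/fans draw (DC side)

def cpu_power_estimate(wall_watts):
    """Estimate CPU power: remove PSU losses, subtract the rest of the platform."""
    return wall_watts * PSU_EFFICIENCY - PLATFORM_W

# If the CPU is a small slice of the total (a single-threaded load), a small
# error in the assumed platform draw is a large error in the CPU figure:
light = cpu_power_estimate(110.0)   # ~51.8 W attributed to the CPU
heavy = cpu_power_estimate(220.0)   # ~148.6 W attributed to the CPU
print(f"light: {light:.1f} W  heavy: {heavy:.1f} W")
print(f"a 5 W platform error is {5/light:.0%} of light, {5/heavy:.0%} of heavy")
```

This is exactly why the CPU needs to dominate the load: the same 5 W of uncertainty is noise in the heavy test and a tenth of the entire result in the light one.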
Unless you plan to rattle off a dozen multithreaded programs that only go to 6 cores (good luck with that), in which case by all means.
ALL REAL-TIME workloads scale poorly to ever-increasing compute parallelism. This includes any real-time simulation and all games. Even as game engines are built to leverage parallel workloads, they still have a hard time saturating available compute resources. Almost every piece of software made in the last 10 years has at least SOME parallelism, but MOST of it stops scaling well beyond 3-6 threads. Even mainstream productivity applications like Photoshop are still plagued by filters and adjustments that are limited to 1-4 threads. The list of applications and games that would fall within the scope of "up to 6 threads" is VERY LONG.
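To put rough numbers on why scaling stalls like this, here's a quick Amdahl's-law sketch; the 80% parallel fraction is just an illustrative assumption:

```python
# Amdahl's law: speedup is capped by the serial fraction of the work.
def speedup(parallel_fraction, threads):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)

# Assume a game loop that is 80% parallelizable (illustrative, not measured):
for n in (1, 2, 4, 6, 8, 16):
    print(f"{n:2d} threads: {speedup(0.80, n):.2f}x")
# 1 -> 1.00x, 2 -> 1.67x, 4 -> 2.50x, 6 -> 3.00x, 8 -> 3.33x, 16 -> 4.00x
# Nearly tripling the thread count past 6 buys only ~33% more speed.
```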
Whether or not the software uses 1, 3, or 6 threads doesn't matter. If I answer the compute efficiency question in a 6-core vs 6-core workload, then that will very reasonably encompass ALL workloads UP TO 6 threads. I already know that adding more cores will improve compute efficiency in a workload with unlimited parallelism, I do not need to run tests to answer this question.
Otherwise limiting yourself to 6 (and only 6) is, again, stupid. And besides that, an 8350 will be better at 6-core applications than a 6300 anyway because it would remove some of the decoder bottleneck, making the 8350 a "5.2 core" and the 6300 a "4.8 core".
You're looking for answers to different questions than the one I have proposed with all this. The FX-8350 would indeed be able to achieve the same parallel performance in a 6-threaded workload as the FX-6300 at a lower clock, due to different CMT scaling characteristics. I already understand this, know this, and can do the math on this. This is a different concern from the one I have proposed, and it doesn't necessarily ensure better compute efficiency: remove the CMT scaling losses and you get fuller per-core saturation (higher dissipation per active core) in exchange. It's just a trade-off at that point. It would be interesting to explore the differences in compute efficiency for a given performance level between the FX-6300 and FX-8350 under mixed workload conditions and different levels of CMT scaling losses, but that is a different question for a different set of tests. If you don't like the question I am asking, or the method required to find answers to it, then I encourage you to ignore me.
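For illustration, here's a back-of-envelope sketch of that CMT math, using the same ~0.6 second-core factor implied by the "5.2 core"/"4.8 core" figures quoted above (a round assumed number, not a measurement):

```python
# Back-of-envelope CMT math for a 6-thread load. The 0.6 second-core
# scaling factor is the round number implied by the "5.2 core" /
# "4.8 core" figures quoted above -- an assumption, not a measurement.

CMT_FACTOR = 0.6

def effective_cores(modules, threads):
    """Throughput in 'solo-core units' for threads spread over modules."""
    paired = max(0, threads - modules)   # modules running two threads each
    solo = threads - 2 * paired          # threads that get a module alone
    return solo + paired * (1.0 + CMT_FACTOR)

fx6300 = effective_cores(modules=3, threads=6)   # 3 paired modules -> 4.8
fx8350 = effective_cores(modules=4, threads=6)   # 2 paired, 2 solo -> 5.2

# Same 6-thread throughput target: the 8350 can run ~8% lower clocks.
print(f"clock ratio needed: {fx6300 / fx8350:.3f}")
```

The clock advantage is real, but nothing in that arithmetic tells you which chip converts watts into work more efficiently at that operating point, which is the question I'm actually asking.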
Compute efficiency is about far more than per-core capability. If you're testing for single-threaded things, use one core. If you're testing for 3-4 core Xbox ports, use that many cores. If you're testing multi-threaded workloads, use all the cores you've got, and if you're testing advancements in technology, give it everything you've got.
This would demand that I simply "ask different questions." I'm not going to capitulate to that. If you don't like the premise of the questions I have regarding compute efficiency, or the methods that would be required to get answers to them, then there's nothing I can do to satisfy you. I already know that parallelism increases compute efficiency. I don't need to run tests to prove that.
You have simply added arbitrary limitations to ONE of the architectures you're comparing for no reason.
If I remove that arbitrary limitation, then I have a test that answers a DIFFERENT question. I already have an answer to that question. I already know that parallelism increases compute efficiency. Why would I test for this if it has no chance of answering the question that I have?
It's just as bad as people who want to compare Vishera and Thuban clock for clock and ignore the fact Vishera can go much faster.
If the question is "what is the difference in IPC between Vishera and Thuban?" then the only way to test it is at equal clocks (or to normalize the results to equal clocks with basic math). If you CHANGE the question to "which is faster, period?" then testing at different clocks is perfectly fine. The only PROBLEM here is the presumption that there is something "wrong" with asking the first question at all. Once you have the answer to the IPC question, you can easily scale that result to different clock speeds, which makes it a very useful tool for comparing and contrasting hardware. I doubt anyone here is actually ignoring the fact that PD clocks higher, but they are intelligent enough to know that with a baseline IPC differential to work from, they can make quick-n-dirty comparisons of theoretical performance at all sorts of clock speeds.
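To make the "basic math" concrete, here's a quick sketch with made-up scores (placeholders, not benchmark results):

```python
# The "basic math" of a clock-for-clock baseline. Scores and clocks are
# made-up placeholders, not benchmark results.

def per_clock(score, clock_ghz):
    """Crude per-clock (IPC-ish) figure: score per GHz."""
    return score / clock_ghz

thuban = per_clock(score=100.0, clock_ghz=4.0)    # baseline at equal clocks
vishera = per_clock(score=110.0, clock_ghz=4.0)

# Quick-n-dirty projections at whatever clocks each chip actually reaches:
print(f"Thuban  @ 4.2 GHz: {thuban * 4.2:.1f}")
print(f"Vishera @ 4.8 GHz: {vishera * 4.8:.1f}")
```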
It's a completely useless metric because it doesn't even begin to tell the whole story.
We already know "the whole story." The FX-8350 is a great CPU at a reasonable price.
The 6383 is a great example of how parallelism can be implemented to improve compute efficiency. I don't need to run tests to prove it to myself; I already know that the 6383 can compile faster than an FX-9590 while using less power. However, this does not mean that it has "better performance" on average for a typical desktop machine.
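For anyone curious why wide-and-slow parts pull that off, here's a crude sketch of the underlying voltage/frequency math; the cubic power model and the core counts are rough illustrative assumptions, not Opteron specifics:

```python
# Why wide-and-slow can win on perf/watt: dynamic power goes roughly as
# f * V^2, and voltage has to rise with frequency, so power grows much
# faster than linearly with clock. The cubic model is a rough rule of
# thumb, and the core counts are arbitrary illustrations.

def relative_power(freq_ratio):
    """Crude model: power ~ freq^3 (f * V^2 with V roughly tracking f)."""
    return freq_ratio ** 3

# 16 slow cores vs 8 cores at twice the clock: equal raw throughput in a
# perfectly parallel task, very different power.
wide = 16 * relative_power(1.0)   # 16 power units
fast = 8 * relative_power(2.0)    # 64 power units
print(f"wide: {wide:.0f} units, fast: {fast:.0f} units (same throughput)")
```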