Originally Posted by Keith Myers
No, you are wrong. I never once stated the hosts I was referencing were running any version of Windows. You read your own bias into my post. Our team almost exclusively runs Linux. The evidence cannot be challenged. CPU task completion times increase exponentially to double or triple the normal expected run time when the host is overloaded by running too many tasks. The test results can be repeated and verified by simply restricting the CPU task count, and they match up with the online magazine reviews that showed a performance hit on the 2990WX at around 36 cores engaged out of the 64. If you keep the concurrent CPU task count below about 36, then for a host running at approximately 3.9 GHz, the tasks take around 1 hour to finish. When you increase the concurrent task count to 48, some tasks take about 2 hours to finish. Go to 60 concurrent tasks and a large majority of tasks take around 3-4 hours to finish, and the overall average task completion time stretches out to around 2 hours. So by running too many concurrent CPU tasks at one time, your daily production actually decreases overall. There is a sweet spot where average completion time is minimal while producing the maximum daily output for the host.
[Edit] The person who sold off his TR 2990WX achieved his desired goal of a high-production CPU machine by building a dual Xeon host with 56 threads total, and he can keep all CPU tasks running at approximately the same compute times. The Xeons don't clock as fast as the TR, which is the most restrictive factor. But his daily output is better with the dual Xeon host, with fewer total threads, than with his failed TR experiment.
Then can you provide more evidence? To be fair, your critique sounded much like the scheduler issue found with Windows.
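Taking your figures at face value, the daily-output claim is easy to sanity-check. This is only a rough sketch: it assumes steady-state operation (every slot always busy) and uses the averages from your post, roughly 1 hour per task at 36 concurrent tasks and roughly 2 hours at 60. The 48-task case is left out because you gave no average for it, and the function name is just mine for illustration.

```python
HOURS_PER_DAY = 24

def tasks_per_day(concurrent_tasks, avg_hours_per_task):
    """Steady-state throughput: each slot completes 24 / avg_hours tasks per day."""
    return concurrent_tasks * HOURS_PER_DAY / avg_hours_per_task

# concurrent task count -> average completion time in hours (figures from the quoted post)
scenarios = {
    36: 1.0,  # "around 1 hour to finish"
    60: 2.0,  # "overall average ... around 2 hours"
}

daily = {n: tasks_per_day(n, hours) for n, hours in scenarios.items()}

for n, output in daily.items():
    print(f"{n} concurrent tasks: {output:.0f} tasks/day")
```

On those numbers the 36-task configuration finishes more work per day than the 60-task one, so the "sweet spot" claim is at least internally consistent. It says nothing about why the slowdown happens, which is what I am asking about.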
Now, there is a question of memory bandwidth per core on the 2990WX being lower, which would hurt larger datasets and heavier tasks on the chip. You are comparing a quad-channel memory setup to, at minimum, an eight-channel memory setup (four channels per Xeon), if not more. If you were bandwidth limited, that isn't a fair comparison.
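To put a rough number on that, here is a back-of-the-envelope sketch. It assumes the thread counts from your post (64 on the 2990WX, 56 on the dual Xeon host) and the channel counts I guessed above (4 versus 8). It ignores memory speed, NUMA layout, and latency entirely, so treat it as a ratio of channels per thread and nothing more; the function name is mine.

```python
def channels_per_thread(memory_channels, threads):
    """Memory channels available per hardware thread (ignores speed and topology)."""
    return memory_channels / threads

# Assumed configurations; the Xeon channel count is my guess, not a confirmed spec.
threadripper = channels_per_thread(memory_channels=4, threads=64)
dual_xeon = channels_per_thread(memory_channels=8, threads=56)

ratio = dual_xeon / threadripper

print(f"2990WX:    {threadripper:.4f} channels per thread")
print(f"Dual Xeon: {dual_xeon:.4f} channels per thread")
print(f"Dual Xeon has about {ratio:.2f}x the channels per thread")
```

If the workload is bandwidth bound, a gap of that size with every thread loaded would go a long way toward explaining the result without the scheduler being involved at all.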
I take it your friend was using two 14-core Xeons. Which generation? What was the dataset? That performance deficit didn't really show up much in Phoronix's Linux review. What was the online magazine? The main ones that would have shown Linux benchmarks are ServeTheHome and Phoronix, although I cannot remember whether Ian Cutress of AnandTech checked Linux performance in his review (I'd have to check).
Can you give more information? With what you have given me, there is a chance you are correct, but there are many factors at play. The general consensus is that although there is a memory penalty for those dies, the performance degradation should not have been that large outside of specific use cases. You may very well be telling the truth, but I don't have enough from your explanation to check its veracity or get the full picture.
Meanwhile, if you followed AMD's design, you would know TR will have a centralized UMA memory layout where all memory traffic goes to the I/O die and then out to the cores, meaning equalized latency per core for memory calls. If it was a bandwidth-per-core issue, then equalizing latency won't solve your bandwidth deficit. But it will resolve the stale data issue, where the cores crank out work but, by the time they finish, that work has already been done by a core with lower memory latency.
So please, explain in detail what the setup is and give your sources for the magazines showing that behavior on Linux, where the slowdown in specific workloads due to the memory solution should have been closer to 30% than 50%.
Also, if going with two Xeons, why wasn't the 7551P tested? It had all dies connected to memory, had eight channels of memory, and cost around $2300. Granted, the boards for it didn't come out until October and the 2990WX came out in August, so if the build happened in August or September it wouldn't have been possible to test the 7551P (can't wait forever for hardware to drop).
Once again, please give more details....