Originally Posted by cookiesowns
Honestly, in most cases I think the 6950X will slap these dual CPU nodes around. Two of them were supposed to go towards remote Adobe rendering, but I think they'll just get re-purposed into VM hosts; not having GPUs really sucks. That said, as VM hosts these are amazing. Only pulling roughly 350 watts under full load, these Supermicro FatTwin nodes are extremely efficient. Only 12V comes from the power backplane, and the 5V/3.3V conversion is handled on each node individually.
The downside of a dual CPU system is NUMA awareness. It's really important to place your lanes & devices on the right CPU depending on your workload. While the CPU -> CPU bandwidth with QPI & DMI 2.0 isn't that bad, the latency hit for certain applications is still noticeable.
For example, the x265 benchmark doesn't seem to be fully NUMA-aware. The 2nd CPU seems to get much less load than the first CPU in certain cases. I really need to dig into this and see if I can maybe squeeze a few more MHz out of it; the memory latency hit is there as well. I'm not on a Z10 or Supermicro Hyper-drive board.
Yeah, all true. NUMA problems, along with being (seemingly purposely) stuck with a gimped DMI 2.0 while Z170 runs wild with 3.0, are the downfall of dual socket setups (well, that and possibly being forced onto Server 2012 instead of Linux or Win7/8.1/10 on some of the higher core count setups).
I've actually seen some interesting things come out of dual socket setups though; for example, through careful management of what devices are on which CPU, like you said, you can actually trick the drivers and OS in some cases into letting restrictions down. Like how four way SLI with Pascal GPUs is "impossible" except for benchmarking, right? Wrong. With a dual socket setup you can run four GPUs in SLI with a four way bridge (it has to be one of the hard LED ones with shielded SLI connectors), but with the first two cards in slots controlled by CPU 0 and the 3rd and 4th GPU in slots controlled by CPU 1. This makes it especially easy to bypass the Nvidia driver warning in the NV control panel on Pascal cards that says "a higher performing bridge could give better performance" (which is why you use a hard LED bridge; in combination with these tweaks it tricks the Nvidia drivers into fully "unlocking" SLI functionality by making the system think your 3 or 4 way normal bridge is just TWO "high bandwidth" bridges connecting two SETS of cards). In other words, it thinks you have two way SLI of cards 1 and 2 on CPU0 and two way SLI of cards 3 and 4 on CPU1, and thus unlocks all limitations; but since CPU0 and CPU1 can still talk to each other you get true 4 way SLI on Pascal, despite Nvidia claiming 4 way SLI was impossible... it isn't, they just locked it away in the drivers for some odd reason.
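For anyone wanting to double-check which socket their cards actually hang off of before messing with bridges, here's a rough little Python sketch on Linux; it just walks sysfs (those paths and the 0x03 display class code are standard, but treat it as a starting point, not gospel; numa_node reads -1 on boxes where the kernel has no NUMA info).

Code:
# Rough sketch: list PCIe display devices and which NUMA node (i.e. which
# CPU's root complex) each one sits behind. Linux only -- walks sysfs.
import glob, os

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    with open(os.path.join(dev, "class")) as f:
        dev_class = f.read().strip()
    # 0x03xxxx = display controllers (VGA / 3D)
    if not dev_class.startswith("0x03"):
        continue
    with open(os.path.join(dev, "numa_node")) as f:
        node = f.read().strip()  # -1 means no NUMA info for this device
    print(os.path.basename(dev), "class", dev_class, "numa_node", node)

If cards 1 and 2 report node 0 and cards 3 and 4 report node 1, the slot layout matches the setup described above.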
The same can seemingly be done on a single socket, as Baasha managed to do it, but he's being a bit uptight and refusing to share how he did it lol. All I know is that it takes a lot of tinkering with custom SLI profiles from scratch, editing drivers, etc. Overall though, a dual socket setup is best, not only for the ease of getting it to work but because four way TITAN XP setups (or even 4 way 1080s to some degree) really REQUIRE x16 lanes per card to reach their full potential with all the power they put out. I've seen setups like that, with 4 way TITAN XP on a dual E5-2699 v4 system only running around ~3GHz, getting NEARLY THREE HUNDRED FPS at 4K in games like BF4; whereas on a single socket i7 there was a fairly significant drop in GPU usage on the 3rd and 4th card and in turn significantly lower fps (closer to 200). So again, those 80 PCI-e lanes can definitely come in handy at times!
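Along the same lines, if you want to confirm each card is actually negotiating x16 instead of silently dropping to x8, the link width and speed are also exposed in sysfs. Same caveat as before: a Linux-only sketch, and not every platform exposes all of these files.

Code:
# Rough sketch: print negotiated vs. maximum PCIe link width/speed for each
# display device, to confirm the cards really are getting their x16 lanes.
import glob, os

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "n/a"  # some devices/platforms don't expose these files

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    if not read(os.path.join(dev, "class")).startswith("0x03"):
        continue
    print(os.path.basename(dev),
          "width", read(os.path.join(dev, "current_link_width")),
          "/", read(os.path.join(dev, "max_link_width")),
          "speed", read(os.path.join(dev, "current_link_speed")))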
There are also interesting tweaks to specifically allocate (more like force, I suppose) more than the typical max of 4-6 cores to games, which is also easier to pull off on dual socket boards.
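The crude version of that "forcing" is just pinning the game's process to whatever cores you want once it's running, same idea as Task Manager's "Set affinity". A minimal Python sketch using the psutil package (which has to be installed, works on Windows and Linux); "game.exe" is obviously a placeholder, and whether the engine actually spreads work across the extra cores is a different story.

Code:
# Rough sketch: find a game process by name and hand it every logical CPU,
# instead of whatever subset it picked by default. Requires psutil.
import psutil

TARGET = "game.exe"  # placeholder -- put the actual executable name here

all_cpus = list(range(psutil.cpu_count()))
for proc in psutil.process_iter(["name"]):
    if proc.info["name"] and proc.info["name"].lower() == TARGET:
        print("pinning PID", proc.pid, "to", len(all_cpus), "logical CPUs")
        proc.cpu_affinity(all_cpus)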
Originally Posted by cekim
It really, really depends on your usage.
I have big CPU/memory apps I crunch through many times a day where, even limited to 10 threads, my 2x2690 setup topping out at 3.2GHz all-core matches or exceeds the performance of my 6950X at 4.4GHz. If I open up the whole can and give it all cores, then it slaps the 6950X around.
One big difference is ~120GB/s+ memory bandwidth of 2xXeon chips vs 60/70/80GB/s of a BWE chip with an awesome memory tune.
That said, if I want to plow through a database with one thread, then 4.4GHz on a BWE is the way to go.
As always the answer to "which one?" is "yes!"
Well yeah, if you have a legitimate need for massive memory bandwidth then a dual chip setup will STOMP anything a single socket can provide, with ease. Hell, I've seen a few specific cases where one of the recently popular budget dual E5-2670 v1 setups on an ASUS Z9PE-D8 (that's two 8 core / 16 thread Sandy Bridge-EP chips for only ~$70 a piece; throw in the C600 chipset board, ~128GB of ECC DDR3, and a compatible case and you're at only ~$500 TOTAL cost) actually ties or BEATS a 6950X. These are mostly specific transcoding, encoding, etc. loads that either A) take advantage of the ~90GB/s of bandwidth a dual socket setup with quad channel DDR3 provides, and/or B) have proper NUMA support so you can efficiently use pretty much all 16 cores, with the 2nd socket performing near or at the level of the first.
In many cases it was either ~10-15% ahead of a 6950X, roughly tied with it, or ~10-15% behind it (although I think the 6950X in this case was at maybe 3.8-4GHz at most, if that. Still impressive though).
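If anyone wants to sanity-check the bandwidth side of that on their own box, here's a very crude numpy sketch. It's nowhere near a proper STREAM run, a single process won't saturate a dual socket machine unless you launch one copy per node (e.g. under numactl), and the array size here is just a number I picked to blow past the caches, so treat the result as a ballpark only.

Code:
# Very crude memory-bandwidth ballpark: time a big array copy and count the
# bytes moved (read + write). Not a substitute for a real STREAM benchmark.
import time
import numpy as np

N = 256 * 1024 * 1024 // 8          # ~256 MB of float64, well past any cache
src = np.ones(N)
dst = np.empty_like(src)

best = 0.0
for _ in range(5):
    t0 = time.perf_counter()
    np.copyto(dst, src)
    dt = time.perf_counter() - t0
    best = max(best, (src.nbytes + dst.nbytes) / dt / 1e9)

print("best copy bandwidth: %.1f GB/s" % best)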
The workloads I'm working with are mostly things like training deep neural nets, general video editing and encoding, etc., plus moderate/heavy gaming and web browsing in my spare time. Which is why I've ultimately leaned towards just using the 6950X overclocked as high as I can get it, on the Rampage V Edition 10 with 64GB of high speed DDR4, instead of going for a dual socket C612 setup with two 10-12 core Xeons and 128GB of slower ~2133-2400MHz ECC Reg. memory or something.