Overclock.net › Forums › General Hardware › General Processor Discussions › Why does CPU with more cores perform better than CPU with less cores in multithreading even if the sum of the clock speed of all the cores is the same?

Why does CPU with more cores perform better than CPU with less cores in multithreading even if the sum of the clock speed of all the cores is the same?

post #1 of 5
Thread Starter 
I have a hypothetical question. Let's assume there are two CPUs with the same microarchitecture, the same number of transistors per core, the same size L1/L2/L3 caches, etc. The only difference is that the first CPU has cores that are twice as fast, while the second CPU has twice as many cores. For example, 4x 4GHz cores versus 8x 2GHz cores. Is it correct to assume that the CPU with more cores performs better than the CPU with fewer cores in a heavily multithreaded environment, even if the sum of the clock speeds of all the cores is the same? If yes, what are the reasons for this? I guess one of the reasons is context switching, i.e. the CPU with fewer cores needs to switch between tasks on its fast cores more often, and this scheduling itself takes time. Am I correct? In addition, modern multi-core CPUs tend to have an L2 cache per core, i.e. in total there is more fast SRAM available. Are there any other reasons?
post #2 of 5
Are you trying to solve a real world issue for a CPU you might make in the future?
Or is your brain in "Stephen Hawking" mode trying to figure out something that has no real world value?

Because what you are talking about does not really exist in the real world.
In the real world there will not be a 4-core CPU that is exactly identical, core for core, to an 8-core CPU design.
Any modern 8-core CPU would have revisions and design improvements over any older 4-core CPU.
And once you have an 8-core CPU, there is no practical reason (business-wise) to make a new 4-core CPU based on the exact same design.
It's possible to disable 4 cores of an 8-core CPU, but that still would not allow the resulting "4-core" to run twice as fast as the 8-core.
post #3 of 5
Quote:
Originally Posted by m4rtin View Post

Is it correct to assume that CPU with more cores performs better than CPU with less cores in heavily multithreaded environment even if the sum of the clock speed of all the cores is the same?

No, this is generally not a safe assumption.
Quote:
Originally Posted by m4rtin View Post

If yes, then what are the reasons for this? I guess one of the reasons is context switching, i.e. CPU with less cores needs to schedule tasks between its fast cores more often and this scheduling itself takes time. Am I correct?

If anything, it's easier to schedule tasks within the same core than it is to maintain coherency between different cores.
Quote:
Originally Posted by m4rtin View Post

In addition, modern multi-core CPUs tend to have L2 cache per core, i.e. in total there is more fast SRAM available. Are there any other reasons?

Now this can result in many cores doing better than fast cores in very parallelizable tasks.

When more cores do better than fewer cores of equal total cycles and architecture, it's usually because of competition for resources, or a bottleneck somewhere that prevents perfect scaling with clock speed.
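
To put rough numbers on that, here is a toy sketch in Python. It assumes every task needs a fixed number of compute cycles plus a fixed stall time (for example, waiting on memory) that does not shrink when the clock goes up. The function name, the 1000-cycle task size and the 250 ns stall are made-up illustration values, not measurements, and the model ignores contention between cores.

```python
def throughput(cores, ghz, cycles_per_task, stall_ns_per_task):
    """Tasks completed per nanosecond across all cores (toy model).

    ASSUMPTION: stall time is a fixed number of nanoseconds per task and
    is independent of both clock speed and core count.
    """
    compute_ns = cycles_per_task / ghz       # 1 GHz = 1 cycle per ns
    task_ns = compute_ns + stall_ns_per_task
    return cores / task_ns


# Pure compute, no stalls: the two configurations tie.
fast_no_stall = throughput(4, 4.0, 1000, 0)
many_no_stall = throughput(8, 2.0, 1000, 0)

# Add a clock-independent stall: the 8x 2GHz setup pulls ahead.
fast_stalled = throughput(4, 4.0, 1000, 250)
many_stalled = throughput(8, 2.0, 1000, 250)

print(fast_no_stall, many_no_stall)
print(fast_stalled, many_stalled)
```

With zero stall time the 4x 4GHz and 8x 2GHz setups come out identical, which is the "sum of clock speeds" intuition. Once part of each task does not speed up with the clock, the fast cores spend a bigger share of their time waiting, so the wider chip wins in this model. If the cores fight over a shared resource instead, the result can go the other way.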

In general, many cores are used instead of faster individual cores because it's inefficient or impossible to scale clock speeds past a certain point. For example, an 8-core and an 18-core Haswell-EP may require the same amount of power because the 8-core is clocked 30% higher. However, the 18-core can be much more than 30% faster in extremely well threaded tasks.
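
A rough way to see why clock scaling gets inefficient is the usual approximation that dynamic power goes as capacitance x voltage^2 x frequency. The sketch below additionally assumes that the required voltage rises in direct proportion to the clock, which is a crude simplification (real chips have a voltage floor and leakage power). The function name and the 0.25 V per GHz figure are hypothetical illustration values, not Haswell-EP data.

```python
def relative_dynamic_power(cores, ghz, volts_per_ghz=0.25):
    """Relative dynamic power, P ~ cores * V^2 * f (capacitance dropped).

    ASSUMPTION: voltage scales linearly with clock (volts_per_ghz is an
    invented constant); leakage and any minimum voltage are ignored.
    """
    volts = volts_per_ghz * ghz
    return cores * volts ** 2 * ghz


few_fast = relative_dynamic_power(4, 4.0)    # 4 cores at 4 GHz
many_slow = relative_dynamic_power(8, 2.0)   # 8 cores at 2 GHz

print(few_fast / many_slow)
```

Under those assumptions the 4x 4GHz chip burns several times the dynamic power of the 8x 2GHz chip for the same total clock, which is the direction of the trade-off described above, even if the real ratio is different.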
post #4 of 5
It's much like the queue at a bank teller.
Think of each person as one of an application's threads.

In a multi-threaded workload, there isn't just one or two applications.
If you include the OS's APIs, kernel and services, you end up with hundreds of threads trying to queue up on a core.

Now imagine which would have a smoother line: a bank with four slow tellers, or a bank with two fast tellers.
The hint is that not all customers (applications and threads) have the same workload; some are light, and some are severely heavy.
If one of the customers takes 3 times as long to finish, then that teller's entire queue stalls for the whole duration.
With only 2 tellers, there's only one teller left that can keep serving, while at the bank with 4 slow tellers there would still be 3 open queues.
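
The teller picture can be played out with a small simulation. This is only a sketch under stated assumptions: all customers are already in line at time 0, there is one shared line, a teller never interrupts a customer once started, and a "fast" teller works exactly twice as fast as a "slow" one. The job sizes and the `simulate` helper are invented for illustration.

```python
import heapq


def simulate(jobs, tellers, speed):
    """Return the finish time of each job, in arrival order.

    Non-preemptive, single shared FIFO line: the next customer goes to
    whichever teller becomes free first. `speed` is work units per time unit.
    """
    free_at = [0.0] * tellers       # min-heap of times each teller is free
    heapq.heapify(free_at)
    finish = []
    for work in jobs:
        start = heapq.heappop(free_at)
        end = start + work / speed
        finish.append(end)
        heapq.heappush(free_at, end)
    return finish


# One heavy customer at the front of the line, followed by six light ones.
jobs = [12, 1, 1, 1, 1, 1, 1]

two_fast = simulate(jobs, tellers=2, speed=2.0)
four_slow = simulate(jobs, tellers=4, speed=1.0)

# Average finish time of the light customers only (everything after the first).
light_avg_fast = sum(two_fast[1:]) / len(two_fast[1:])
light_avg_slow = sum(four_slow[1:]) / len(four_slow[1:])

print("two fast tellers :", two_fast, light_avg_fast)
print("four slow tellers:", four_slow, light_avg_slow)
```

In this particular setup the light customers get through a little sooner on average with four slow tellers, because three tellers stay open while one is tied up. The heavy customer itself takes twice as long there, so which bank is "better" depends on whether you care about the light jobs or the heavy one.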
Edited by epic1337 - 10/4/15 at 10:33pm
post #5 of 5
Quote:
Originally Posted by epic1337 View Post

It's much like the queue at a bank teller.
Think of each person as one of an application's threads.

In a multi-threaded workload, there isn't just one or two applications.
If you include the OS's APIs, kernel and services, you end up with hundreds of threads trying to queue up on a core.

Now imagine which would have a smoother line: a bank with four slow tellers, or a bank with two fast tellers.
The hint is that not all customers (applications and threads) have the same workload; some are light, and some are severely heavy.
If one of the customers takes 3 times as long to finish, then that teller's entire queue stalls for the whole duration.
With only 2 tellers, there's only one teller left that can keep serving, while at the bank with 4 slow tellers there would still be 3 open queues.
Nice example