
[TechReport]A quick look at Bulldozer thread scheduling

post #1 of 46
Thread Starter 
Quote:
As you may know if you read our original FX processor review, the shared nature of the "modules" in AMD's new Bulldozer architecture presents some conundrums for OS and applications developers who want to extract the best possible performance. Each module has two integer cores that are wrapped in some shared resources, including the front-end instruction fetch and decode units, the FPU, and the L2 cache and its associated data pre-fetchers. AMD claims this level of sharing is superior to what goes on in recent Intel processors, whose more resource-rich individual cores can track and execute two threads, because the performance of each thread in a Bulldozer module is more "robust" and predictable, less likely to stall due to resource contention.
A very good addition to the ExtremeTech news about Bulldozer. It's not a panacea, but it's interesting indeed. Most likely, games will benefit greatly from this. This fix does not disable any cores; it assigns a core affinity mask to the program, forcing its threads to run in either 0F mode (packed onto two modules) or 55 mode (one thread per module).
I know, it's baffling that a simple scheduler fix has raised the benchmark results like that.
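For anyone wondering what the 0F and 55 modes mean in practice, here is a minimal sketch that decodes an affinity mask into core and module numbers. It assumes logical cores 2k and 2k+1 belong to module k, which is how the masks are usually read for an 8-core FX chip but is not guaranteed by every OS or BIOS; the function names are made up for illustration.

```python
def cores_in_mask(mask: int) -> list[int]:
    """Return the logical core indices whose bits are set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def modules_in_mask(mask: int, cores_per_module: int = 2) -> list[int]:
    """Return the module indices touched by the mask.

    Assumption (not from the article): cores 2k and 2k+1 share module k.
    """
    return sorted({core // cores_per_module for core in cores_in_mask(mask)})


if __name__ == "__main__":
    for label, mask in (("0F", 0x0F), ("55", 0x55)):
        print(label, "cores:", cores_in_mask(mask), "modules:", modules_in_mask(mask))
```

Under that assumption, 0F puts four threads on cores 0-3 (two modules, so every thread shares), while 55 puts them on cores 0, 2, 4 and 6 (four modules, no sharing). To actually pin a program, Windows has `start /affinity 55 program.exe`, and on Linux Python exposes `os.sched_setaffinity(0, cores)`.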
source
Edited by lunan1t4 - 10/30/11 at 10:39am
post #2 of 46
I'll say it again: I'll wait till BD matures as a platform before I jump on it. The architecture is good, but the software and OS have to catch up to it first. Good read though. I wonder what the IPC would look like with this?
post #3 of 46
Bulldozer should have been a success, but they never collaborated with MS to fix Win7's scheduler to optimize for the chip. AMD sure is pretty stupid for that. They could have even tried to create a patch themselves... ugh.
post #4 of 46
Even perfect thread scheduling would not make it much better than K10.5. They need to do some more work on the core design itself.
 
 
post #5 of 46
Quote:
These results couldn't be much more definitive. In every case but one, distributing the threads one per module, and thus avoiding sharing, produces roughly 10-20% higher performance than packing the threads together on two modules. (And that one case, the FDom function in picCOLOR, shows little difference between the three affinity options.) At least for this handful of workloads, the benefits of avoiding resource sharing between two cores on a module are pretty tangible. Even though the packed config enables a higher Turbo Core frequency of 4.2GHz, the shared config is faster.
Best bit right there.
post #6 of 46
Even with this "fix", performance does not seem to be improved enough to justify buying a Bulldozer over a much less expensive Phenom II.
post #7 of 46
Isn't the way they did the scheduling essentially buying an 8-core chip and using it as a quad-core?
If so, you might as well get an X6 or X4 Phenom II.
post #8 of 46
Quote:
Originally Posted by fruitflavor View Post
Isn't the way they did the scheduling essentially buying an 8-core chip and using it as a quad-core?
If so, you might as well get an X6 or X4 Phenom II.
But you run into the same problem - Windows still won't schedule the processes the way they need to.

Think of the load on top of this, not just the benchmarking. The scheduling is very important for this architecture - like HT was for Intel.
    
    
post #9 of 46
This is all pretty obvious and was predictable. AMD themselves said that resource sharing would put a two-"core" module at about 80% of the performance of two full, non-sharing cores. So with this test, they have only confirmed AMD's 80% figure: they gained up to 20% by taking the sharing out of the equation, spot on. I don't see the big news in this myself. This information has been out for ages from AMD themselves.
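As a rough sanity check on the 80% figure, here is a toy model, not a measurement. It assumes each thread on a shared module runs at `shared_efficiency` of what it would get with a module to itself, and that the packed configuration may clock higher by a factor `packed_clock_ratio`. Both parameter names and the 1.05 value are hypothetical.

```python
def spread_speedup(shared_efficiency: float = 0.8, packed_clock_ratio: float = 1.0) -> float:
    """Per-thread speedup of one-thread-per-module over packed threads.

    Toy model: an unshared thread has throughput 1.0; a thread on a shared
    module has throughput shared_efficiency, scaled by any clock advantage
    the packed configuration gets (e.g. from Turbo Core).
    """
    packed = shared_efficiency * packed_clock_ratio
    return 1.0 / packed


if __name__ == "__main__":
    # Equal clocks: sharing alone.
    print(round(spread_speedup(), 3))
    # Hypothetical 5% clock advantage for the packed configuration.
    print(round(spread_speedup(packed_clock_ratio=1.05), 3))
```

At equal clocks the model gives 1 / 0.8, so spreading threads would be worth about 25% per thread. Any clock advantage for the packed configuration pulls that figure down, towards the 10-20% range quoted from the article.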
post #10 of 46
Quote:
Originally Posted by 996gt2 View Post
Even with this "fix", performance does not seem to be improved enough to justify buying a Bulldozer over a much less expensive Phenom II.
Says the man with a 2500K

For those of us who already have a compatible board, it's a bit of hope.