Overclock.net › Forums › AMD › AMD CPUs › Bulldozer Questions from an Intel User.

Bulldozer Questions from an Intel User. - Page 14

post #131 of 162
Quote:
Originally Posted by BallaTheFeared View Post
I think if you asked a x6 T AMD user to run SuperPI at factory settings you'd find their chip probably isn't going into turbo at all.
I'm not making any judgments about who's right or wrong; I'm just showing these two JPEGs to demonstrate what Balla is talking about.

Both show Turbo Core working; it's set at 4.2 GHz. However, the first image shows the "thread spread" where Windows has been left to schedule the app, with no single core reaching 100% usage. The second shows SuperPi set to run on one core, which gives it 100% usage; it again shows Turbo Core working.





Maybe it's all a software issue; maybe, if you know how many cores each app uses, you should set affinities for every app you have.
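For anyone who wants to try pinning a process from a script rather than by hand, here is a minimal sketch. It assumes a Linux system, where Python exposes `os.sched_setaffinity`; it is not how the screenshots in this post were produced, and the helper name `pin_to_cores` is made up for illustration.

```python
import os

def pin_to_cores(cores):
    """Restrict the calling process to the given CPU indices.

    Returns the resulting affinity set, or None on platforms that do
    not expose os.sched_setaffinity (it is a Linux-specific call).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cores)  # pid 0 means "this process"
    return os.sched_getaffinity(0)

# CPUs this process is currently allowed to run on (None if unsupported).
allowed = os.sched_getaffinity(0) if hasattr(os, "sched_getaffinity") else None
# Pick a single core to pin to, mirroring the one-core SuperPi run.
target = {min(allowed)} if allowed else None

if target:
    pinned = pin_to_cores(target)
    os.sched_setaffinity(0, allowed)  # restore the original affinity
else:
    pinned = None
```

With the process pinned, a single-threaded load can no longer be moved between cores by the scheduler, so its usage should show up on one core instead of being spread out.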

Purpleannex
(15 items)
 
  
CPUMotherboardGraphicsRAM
i5 2500k Z68 Extreme4gen3 Gainward GTX 470 @ 740Mhz 8GB Mushkin Blackline LV (2x4GB) DDR3 1600MHz 9... 
Hard DriveHard DriveHard DriveOptical Drive
128GB Crucial M225 SSD 2 x 500gb 500GbB samsung 500GbB samsung Samsung SH-B083L 8x BD Combo 
OSMonitorPowerCase
Windows 7 64-bit LG W2486L LED 24" 1080p Antech True power new 750W FT-02 B-W 
post #132 of 162
I think threads and apps are different things. Maybe this is where the confusion happens.
An app can have more than one thread, though, can't it?
   
AMD HTPC
(13 items)
 
CPUMotherboardGraphicsRAM
AMD FX 6300  Asus Sabertooth 990FX R 2.0 Sapphire 6950 unlocked Geil 
Hard DriveOptical DriveCoolingOS
WD Sony Corsair H60 Mepis Tetris 
MonitorKeyboardPowerCase
LG HD  Logitech G11 OCZ 750 F1 Nexus Clodius Black 
MouseMouse PadAudio
Logitech G5R Mionix Creative X-Fi Xteme Music 
CPUMotherboardGraphicsRAM
AMD Athlon 860K MSI A88X-G41 PC Mate Sapphire R7 265 Crucial Elite 2 X 4 
Hard DriveOptical DriveCoolingOS
WD /Seagate Optiarc/LG Cooler Master Hyper TX2 Mepis Tetris 
MonitorKeyboardPowerCase
LG HD  Logitech generic Corsair CS650 Nexus Clodius White 
MouseMouse PadAudio
A4 Crap Mionix Anus Xonar D1 
CPUMotherboardGraphicsRAM
AMD Athlon 5150 ASUS AM1M-A  IGD Crucial 2X2G 
Hard DriveOptical DriveCoolingOS
Seagate 640Gb junk SONY DVD-RW Stock OpenSuse/Ubuntu 
MonitorKeyboardPowerCase
LED TV Logitech wireless FSP Foxconn 
Mouse
Logitech wireless 
post #133 of 162
Think of it this way:

You need to cook a 6 course dinner. Is it easier to have one person do it sequentially or have 6 people all take a course? That is multithreading.

Now take the single thread. The guy making the salad has to chop a carrot into small pieces. Is it easier for one guy to take his knife and chop that carrot, or is it easier to have two chefs, each with a knife, passing the carrot back and forth as each takes a cut at it?

Applications can be broken down into threads and each thread can be run sequentially. But breaking a single thread down into multiple "sub threads" will not work because too much data will be dependent and the cores would spend too much time trying to synchronize with each other. All of that overhead would probably make it slower in the long run.

If you need to do a+b, c+d, e+f and g+h, you could do that as four separate executions on four threads in one cycle.

But if you need to do a+b+c+d+e+f+g+h you need to do it all on one core.

You may do a+b on one core, then shift to the next core and add c to that number, but even so, that is sequential, not parallel.

Simplistic example, but you should get the picture.
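The same contrast can be sketched in a few lines of Python. This only illustrates the dependency structure described above; the values of `a` through `h` are placeholders, and because CPython threads share one interpreter lock, it is not meant to demonstrate an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from operator import add

# Placeholder values, chosen only for illustration.
a, b, c, d, e, f, g, h = range(1, 9)

# Four independent additions: no pair needs another pair's result,
# so each one can be handed to a different worker thread.
pairs = [(a, b), (c, d), (e, f), (g, h)]
with ThreadPoolExecutor(max_workers=4) as pool:
    independent_sums = list(pool.map(lambda p: p[0] + p[1], pairs))

# One running sum, ((a+b)+c)+d ...: every step needs the previous
# result, so the steps execute one after another.
chained_sum = reduce(add, [a, b, c, d, e, f, g, h])

print(independent_sums)  # [3, 7, 11, 15]
print(chained_sum)       # 36
```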
post #134 of 162
Quote:
Originally Posted by purpleannex View Post
I'm not making any judgments about who's right or wrong; I'm just showing these two JPEGs to demonstrate what Balla is talking about.

Both show Turbo Core working; it's set at 4.2 GHz. However, the first image shows the "thread spread" where Windows has been left to schedule the app, with no single core reaching 100% usage. The second shows SuperPi set to run on one core, which gives it 100% usage; it again shows Turbo Core working.

Maybe it's all a software issue; maybe, if you know how many cores each app uses, you should set affinities for every app you have.
Or you create a turbo that can work with all cores active, and you don't have that problem.
post #135 of 162
Here's a more detailed analysis from JF-AMD.

Quote:
Originally Posted by JF-AMD
OK, daddy is going to do some math, everyone follow along please.

First: There is only ONE performance number that has been legally cleared, 16-core Interlagos will give 50% more throughput than 12-core Opteron 6100. This is a statement about throughput and about server workloads only. You CANNOT make any client performance assumptions about that statement.

Now, let's get started.

First, everything that I am about to say below is about THROUGHPUT and throughput is different than speed. If you do not understand that, then please stop reading here.

Second, ALL comparisons are against the same cores; these are not comparisons across different generations, nor are they comparisons against different architectures.

Assume that a processor core has 100% throughput.

Adding a second core to an architecture is typically going to give ~95% greater throughput. There is obviously some overhead because the threads will stall, the threads will wait for each other and the threads may share data. So, two completely independent cores would equal 195% (100% for the first core, 95% for the second core.)


Looking at SPEC int and SPEC FP, Hyperthreading gives you 14% greater throughput for integer and 22% greater throughput for FP. Let's just average the two together.

One core is 100%. Two threads on one core with HT are 118%. Everyone following so far? We have 195% for 2 threads on 2 cores and we have 118% for 2 threads on 1 core.

Now, one Bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.

Running 2 threads on the same module is ~180%.

You can see why the strategy is more appealing than HT when it comes to threaded workloads. And, yes, the world is becoming more threaded.

Now, where does the 90% come from? What is 180% /2? 90%.

People have argued that there is a 10% overhead for sharing because you are not getting 200%. But, as we saw before, 2 cores actually only equals 195%, so the net per core if you divide the workload is actually 97.5%, so it is roughly a 7-8% delta from just having cores.

Now, before anyone starts complaining about this overhead and saying that AMD is compromising single thread performance (because the fanboys will), keep in mind that a processor with HT equals ~118% for 2 threads, so per thread that equals 59%, so there is a ~36% hit for HT. This is specifically why I think that people need to stay away from talking about it. If you want to pick on AMD for the 7-8%, you have to acknowledge the ~36% hit from HT. But ultimately that is not how people judge these things. Having 5 people in a car consumes more gas than driving alone, but nobody talks about the increase in gas consumption because it is so much less than 5 individual cars driving to the same place.

So, now you know the approximate metrics about how the numbers work out. But what does that mean to a processor? Well, let's do some rough math to show where the architecture shines.

An Orochi die has 8 cores. Let's say, for sake of argument, that if we blew up the design and said not modules, only independent cores, we'd end up with about 6 cores.

Now let's compare the two with the assumption that all of the cores are independent on one and in modules on the other. For sake of argument we will assume that all cores scale identically and that all modules scale identically. The fact that incremental cores scale to something less than 100% is already comprehended in the 180% number, so don't fixate on that. In reality the 3rd core would not be at 95% but we are holding that constant for example.

Mythical 6-core bulldozer:
100% + 95% + 95% + 95% + 95% + 95% = 575%

Orochi die with 4 modules:
180% + 180% + 180% + 180% = 720%

What if we had just done a 4 core and added HT (keeping in the same die space):
100% + 95% +95% +95% + 18% + 18% + 18% + 18% = 457%

What about a 6 core with HT (has to assume more die space):
100% + 95% +95% +95% +95% +95% + 18% + 18% + 18% + 18% + 18% + 18% = 683%

(Spoiler alert - this is a comparison using the same cores, do NOT start saying that there is a 25% performance gain over a 6-core Thuban, which I am sure someone is already starting to type.)

The reality is that by making the architecture modular and by sharing some resources you are able to squeeze more throughput out of the design than if you tried to use independent cores or tried to use HT. In the last example I did not take into consideration that the HT circuitry would have delivered an extra 5% circuitry overhead....

Every design has some degree of tradeoff involved, there is no free lunch. The goal behind BD was to increase core count and get more throughput. Because cores scale better than HT, it's the most predictable way to get there.

When you do the math on die space vs. throughput, you find that adding more cores is the best way to get to higher throughput. Taking a small hit on overall performance but having the extra space for additional cores is a much better tradeoff in my mind.

Nothing I have provided above would allow anyone to make a performance estimate of BD vs. either our current architecture or our competition, so, everyone please use this as a learning experience and do not try to make a performance estimate, OK?
Source
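For anyone who wants to check the arithmetic in the quote, here is a rough sketch that reproduces the totals. The percentages are the ones JF-AMD gives; the function and constant names are my own, and this models only his simplified "same cores" throughput comparison, not real chip performance.

```python
# Scaling factors from the quoted post, as a percentage of one core.
FIRST_CORE = 100   # the first core
EXTRA_CORE = 95    # each additional independent core
MODULE = 180       # two threads sharing one Bulldozer module
HT_GAIN = 18       # average SPECint/SPECfp gain from HT, per core

def independent_cores(n, ht=False):
    """Total throughput of n independent cores, optionally with HT."""
    total = FIRST_CORE + EXTRA_CORE * (n - 1)
    if ht:
        total += HT_GAIN * n
    return total

def modules(n):
    """Total throughput of n two-core modules."""
    return MODULE * n

six_core = independent_cores(6)                # 575
orochi = modules(4)                            # 720
four_core_ht = independent_cores(4, ht=True)   # 457
six_core_ht = independent_cores(6, ht=True)    # 683

# Per-thread figures used in the overhead argument.
per_core_in_module = MODULE / 2                          # 90.0
per_core_independent = (FIRST_CORE + EXTRA_CORE) / 2     # 97.5
per_thread_ht = (FIRST_CORE + HT_GAIN) / 2               # 59.0
```

The 25% figure in the spoiler alert is simply `orochi / six_core`, which is about 1.25.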
2010rig
(14 items)
 
Galaxy S3
(8 items)
 
 
CPUMotherboardGraphicsRAM
X5660 @ 4.5  ASUS P6X58D-E 980TI? 12GB OCZ Platinum - 7-7-7-21 
Hard DriveCoolingOSMonitor
1 80GB SSD x25m - 3TB F3 + F4 NH-D14 Windows 7 Ultimate LG 47LH55 
KeyboardPowerCaseMouse
Natural Wireless Keyboard Corsair 750HX CM 690 II Advanced MX 518 
CPUGraphicsRAMHard Drive
Snapdragon S4 Dual core 1500mhz Adreno 225 Samsung 2GB 16GB Onboard Flash 
OSMonitorPowerCase
Android 4.4.2 - CM11 4.8" AMOLED 1280x720 2100 mAh battery Otterbox Defender 
post #136 of 162
Quote:
Originally Posted by JF-AMD View Post
Or you create a turbo that can work with all cores active and you don't have that problem
Yes, but Balla is talking about the issue of SuperPi not running at 100%, even though SuperPi is supposed to be a straightforward, balls-to-the-wall benchmark, like a single-core stress test, as shown by the 100% core usage in my second image, where I set affinity.

I don't believe Balla is trying to discredit BD or AMD; he just wants to see 100% on his cores when Windows is running an app.
Purpleannex
(15 items)
 
  
CPUMotherboardGraphicsRAM
i5 2500k Z68 Extreme4gen3 Gainward GTX 470 @ 740Mhz 8GB Mushkin Blackline LV (2x4GB) DDR3 1600MHz 9... 
Hard DriveHard DriveHard DriveOptical Drive
128GB Crucial M225 SSD 2 x 500gb 500GbB samsung 500GbB samsung Samsung SH-B083L 8x BD Combo 
OSMonitorPowerCase
Windows 7 64-bit LG W2486L LED 24" 1080p Antech True power new 750W FT-02 B-W 
post #137 of 162
And I want a pony. Sometimes things don't work out in life.

There is always going to be OS overhead. The original question had nothing to do with superpi or a core running at 100% resources. As a matter of fact, if one thread could be spread over 2 cores, it would be highly unlikely to ever see any core at 100%, ever.

If you want to see a core at 100%, high performance linpack or some of the HPC loads are as close as I have ever seen to peak efficiency.
post #138 of 162
Actually if I want to see 100% I set the affinity to the number of cores the program uses.

Thanks purpleannex, it's interesting to see turbo kicking in but a non-turbo'ed core taking a slight majority of the load.

It was directed towards this comment:

Quote:
Now, one Bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.
The original purpose of this thread was to ask whether Bulldozer has 8 physical cores. The answer is yes; however, in order to place more cores on the die, per-core performance was reduced. This was made a point of fact by your own post:

Quote:
Now, one Bulldozer core is 100%. Running 2 threads on 2 separate modules would lead to ~195%; it's consistent with running on two independent cores.

Running 2 threads on the same module is ~180%.
As we've demonstrated, a single thread will show activity on multiple cores. That means resources are in play on each core, which would place Bulldozer in the "180%" case more often than not. A dual-threaded program would likely never see 195% + 195% (4-core Bulldozer); instead it will probably always run at 180% + 180%. We can all want ponies, but I guess we can't all get them.

So is it 8 true cores, or is it 8 cores operating at lower capacity? My feeling is the latter. It is only an opinion, but it is my own.
    
CPUMotherboardGraphicsGraphics
Intel Core i5 2500K P8P67 PRO NVIDIA GeForce GTX 470 NVIDIA GeForce GTX 470 
GraphicsRAMRAMRAM
NVIDIA GeForce 9800 GT G-Skill A-Data G-Skill 
RAMHard DriveOptical DriveOS
A-Data Crucial M4 64GB + 1TB F3 Spinpoint $155 LS/DL DVD RW $?? Windows 8 64-bit "Epic Registry" Edition 
MonitorPowerCase
ASUS 21.5 1920x1080 2ms $135 CORSAIR HX850 $120 Mother Earth $free 
post #139 of 162
Just to close the loop, I have a blog in the works (maybe live in a week or so) where we had a customer run their app on a 12-core 4P Magny Cours, running 8, 16, 24, 32 and 48 threads.

If you look at the 8 thread result and the 48 thread result and divide by core count, you find that the 48 thread result has a per-core performance at ~94% of the 8 thread per-core number. And that is incredibly high scalability. You'll never see the 100% that people think. Watch for the blog in the near future.
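The "divide by core count" comparison works out like this. A minimal sketch: the scores below are made-up placeholders chosen only so the ratio lands at 94%, not the customer's data, and it assumes one thread per core as in the description above.

```python
def per_core_efficiency(base_threads, base_score, threads, score):
    """Per-core throughput of the larger run relative to the base run."""
    base_per_core = base_score / base_threads
    per_core = score / threads
    return per_core / base_per_core

# Hypothetical scores (arbitrary units), NOT measured results.
efficiency = per_core_efficiency(8, 800.0, 48, 4512.0)
print(f"{efficiency:.0%}")  # 94%
```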
post #140 of 162
Quote:
Originally Posted by JF-AMD View Post
Just to close the loop, I have a blog in the works (maybe live in a week or so) where we had a customer run their app on a 12-core 4P Magny Cours, running 8, 16, 24, 32 and 48 threads.

If you look at the 8 thread result and the 48 thread result and divide by core count, you find that the 48 thread result has a per-core performance at ~94% of the 8 thread per-core number. And that is incredibly high scalability. You'll never see the 100% that people think. Watch for the blog in the near future.
Can you do a blog post with an AMD FX 8130P chip instead?

As exciting as this sounds, we're mostly interested in the performance of the AMD FX CPUs; there may be 1 poster in this thread who is waiting for the server CPUs.
Edited by 2010rig - 5/27/11 at 12:18pm
2010rig
(14 items)
 
Galaxy S3
(8 items)
 
 
CPUMotherboardGraphicsRAM
X5660 @ 4.5  ASUS P6X58D-E 980TI? 12GB OCZ Platinum - 7-7-7-21 
Hard DriveCoolingOSMonitor
1 80GB SSD x25m - 3TB F3 + F4 NH-D14 Windows 7 Ultimate LG 47LH55 
KeyboardPowerCaseMouse
Natural Wireless Keyboard Corsair 750HX CM 690 II Advanced MX 518 
CPUGraphicsRAMHard Drive
Snapdragon S4 Dual core 1500mhz Adreno 225 Samsung 2GB 16GB Onboard Flash 
OSMonitorPowerCase
Android 4.4.2 - CM11 4.8" AMOLED 1280x720 2100 mAh battery Otterbox Defender 