Originally Posted by rluker5
Here's an example of that trick I told Hwgeek about: I ran my living room ITX at 4.2 GHz with the XTU internal benchmark, then the CPU-Z benchmark, but it wasn't long enough, so I stressed it for a while to get stable values. I then reduced the core clock to 3.3 GHz, adjusted the voltage to be the same, and ran the XTU bench, then the CPU-Z stress test. The XTU bench is all jumpy, so I'll use the CPU-Z stress test numbers as an example.
Core power draw during the 4.2 GHz stress was 85 W; during the 3.3 GHz run it was 68 W. (85/68) × 3.3 = 4.125, so apparently the chip is slightly more efficient at 4.2 GHz at the same voltage, but not by much. Power scaling with instructions per second is near linear on the same CPU and software.
How does higher IPC, with all else equal, resulting in higher IPS consume power differently than higher clocks, with all else equal, resulting in higher IPS? I've seen with my various Atom mini PCs (Z8300, Z8500, N4100) that their performance scales close to linearly with power. This 5775C test seems to agree. Maybe I'm missing something. It's easy to replicate with all sorts of software and hardware, though. It makes the problem of making CPUs go faster a little tougher if every instruction carries heat with it.
Intel has been getting more efficient over the years, and AMD really has with Zen. And once they get the bugs ironed out, 10nm's efficiency gains should be able to absorb the extra heat from the extra IPS.
is correct, more IPC does not inherently mean more power consumption. If anything, it is more likely to reduce total consumption over time, or at least keep perf/watt the same.
If a job takes 20 cycles, and 10 of those are waiting for data, 5 are for operation one, and 5 are for operation two, then there are a few ways you can optimize IPC.
If you reduce the waiting time for data from 10 cycles to 8, you reduce time-on-task by 10%, from 20 to 18 cycles. This may bump power usage slightly in long-term loads, similar to Hyper-Threading (but the job gets done faster), while for short-term loads it follows the typical race-to-sleep. During those 8 or 10 wait cycles the CPU cannot clock down much, but as soon as the whole job is done it can, and it finishes sooner. Finishing its work faster lets the CPU stay clocked down for longer stretches.
If you reduce the time it takes to complete operations one and two from 5 cycles to 4, you get more performance essentially for free. Fewer cycles used means fewer transistor flips for the same work, and that translates to more performance and/or less power consumption, again depending on whether it is a burst or a long-term load.
Inversely, if you find a way to get the same job done in fewer cycles with fewer transistors, you can do the same job both faster and for less power, which leads into:
If you create a new extension, such as 256-bit AVX, then perhaps your two 5-cycle 128-bit operations can be done in one 5-cycle operation. This will not affect all workloads, as it targets specific jobs. It uses more transistors, but only when required, and not necessarily double the transistors for double the work. In our hypothetical 20-cycle example, this reduces the workload to just 15 cycles. Once again, in the race to sleep this is preferred, while in long-term tasks the total power over time works out the same but the job finishes sooner; more performance for free.
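The three optimizations above can be sketched as a toy model. All figures are the post's made-up numbers (10 wait cycles, two 5-cycle operations), not real CPU data:

```python
# Toy model of the hypothetical 20-cycle job from the text above.
# All numbers are the post's illustrative figures, not real CPU data.

BASE_WAIT = 10  # cycles spent waiting for data
BASE_OP1 = 5    # cycles for operation one
BASE_OP2 = 5    # cycles for operation two

def cycles(wait=BASE_WAIT, op1=BASE_OP1, op2=BASE_OP2):
    """Total cycles for the job under a given set of costs."""
    return wait + op1 + op2

baseline = cycles()                # 20 cycles
better_prefetch = cycles(wait=8)   # 18 cycles: data arrives sooner
faster_ops = cycles(op1=4, op2=4)  # 18 cycles: cheaper operations
wider_simd = cycles(op1=5, op2=0)  # 15 cycles: both ops fused into one 5-cycle op

for name, c in [("baseline", baseline),
                ("better prefetch", better_prefetch),
                ("faster ops", faster_ops),
                ("wider SIMD", wider_simd)]:
    print(f"{name:16s} {c:2d} cycles ({baseline / c:.2f}x IPC at equal clocks)")
```

At equal clocks, the relative IPC gain is just the ratio of cycle counts: 20/18 ≈ 1.11x for the first two, 20/15 ≈ 1.33x for the fused operation.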
All of these have the potential to increase power consumption under load, but they generally do so for an equal trade in time reduction. In our imperfect world, however, we very rarely load an entire CPU to 100%, which is why we have SMT/HT. People who run their CPUs to the max typically only care about getting the job done, and more IPC does that. Those who use their CPUs in bursts, like mobile, care about the race to sleep to get total consumption over time down. Remember that while increasing clocks to do the job grows power linearly with frequency, you usually need to increase voltage too, and since dynamic power scales roughly with V² × f, that makes the total superlinear, roughly cubic.
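A quick sketch of why the voltage bump matters, using the common dynamic-power approximation P ≈ k × V² × f. The voltage/frequency pairs below are illustrative assumptions, not measured values:

```python
# Rough dynamic-power model: P ~ k * V^2 * f.
# The voltage/frequency pairs are illustrative assumptions, not measurements.

def dynamic_power(freq_ghz, volts, k=10.0):  # k folds in capacitance/activity
    return k * volts**2 * freq_ghz

p_low = dynamic_power(3.3, 1.00)   # lower clock at lower voltage
p_high = dynamic_power(4.2, 1.20)  # ~27% higher clock with a voltage bump

print(f"clock ratio: {4.2 / 3.3:.2f}x, power ratio: {p_high / p_low:.2f}x")
```

With these made-up numbers, a 1.27x clock increase costs about 1.83x the power, which is the superlinear penalty of riding the voltage/frequency curve.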
Final example for power over time and race to sleep;
CPU draws 100w under load.
CPU draws 5w idle.
CPU takes 1 hour to complete the job each day.
CPU draws 100 Wh + (23)(5 Wh) each day = 215 Wh/day.
Increase IPC by 50% with linear increase in consumption for equal work;
CPU draws 150w under load.
CPU draws 5w idle.
CPU takes 40m to complete the job each day.
CPU draws (2/3)(150 Wh) + (23 1/3)(5 Wh) each day ≈ 216.7 Wh/day.
+1.7 Wh/day to get the job done in two-thirds the time? Seems like a good deal to me, and fantastic for anything that runs on a battery. They can choose to run a mix of efficiency-improving IPC and performance-improving IPC to get the desired result.
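The daily-energy arithmetic above is easy to check in a couple of lines (same numbers as the example, nothing new assumed):

```python
# Reproducing the Wh-per-day arithmetic for the race-to-sleep example.

def wh_per_day(load_w, idle_w, load_hours):
    """Daily energy: load power while working, idle power the rest of the day."""
    return load_w * load_hours + idle_w * (24 - load_hours)

base = wh_per_day(100, 5, 1.0)    # 100 + 23*5 = 215.0 Wh/day
faster = wh_per_day(150, 5, 2/3)  # 100 + (23 1/3)*5 ~= 216.7 Wh/day

print(f"baseline: {base:.1f} Wh/day, +50% IPC: {faster:.1f} Wh/day, "
      f"delta: {faster - base:+.1f} Wh/day")
```

The delta comes out to about +1.7 Wh/day for finishing the work in two-thirds the time.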
This is why AMD, Intel, and even Qualcomm are willing to give such high boost clocks, even if only temporarily, on parts with such low base clocks. The idle power is so much less than the load power that it makes more sense to get the job done in a fraction of the time, even at several times the cost during that time, than to stay in a "medium" power state for any length of time.
And these numbers are not that far from reality. Remember that while we may give the 9900K flak for being as hot as it is, it still only draws 8 W idle;
And some quick-search reading material on short load power consumption efficiencies;
And of course, finally, if you can increase IPC enough, then you can reduce clocks to hit a performance target, which means you can reduce voltage, which means Perf/Watt goes up massively.
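As a rough illustration of that last point, suppose IPC rises 30% and you lower the clock to hold performance constant. The scaling factors here are illustrative assumptions (in particular, voltage tracking frequency linearly is a crude simplification):

```python
# Illustrative only: if IPC rises 30%, the same performance target can be hit
# at ~77% of the clock. Assuming voltage scales roughly linearly with
# frequency, dynamic power (~ V^2 * f) then drops with the cube of the clock.

ipc_gain = 1.30
freq_scale = 1 / ipc_gain        # same throughput at a lower clock
volt_scale = freq_scale          # crude assumption: V tracks f
power_scale = volt_scale**2 * freq_scale

print(f"clock: {freq_scale:.2f}x, power: {power_scale:.2f}x, "
      f"perf/watt: {1 / power_scale:.2f}x")
```

Under these assumptions the same performance costs less than half the power, which is where the "massive" perf/watt gains come from.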
TL;DR: making CPUs is hard, and this is only a fraction of what really defines perf/watt and perf/time. Looking at IPC alone, in a static vacuum, is a very limited viewpoint that doesn't apply to reality. Node updates, node shrinks, process improvements, materials improvements, external efficiency improvements such as RAM or storage... it all plays a part.
Originally Posted by rluker5
My comparison has the same node, same voltage, same hardware, same software, and same external environment. Yours spans modified process nodes and different voltages. With all else equal except instructions per second, my comparison still stands, and you can verify it yourself on any CPU you have. More instructions per second means more power consumed. More IPC means more instructions per second, just like higher clocks or higher load.
Edit: After another quick test, it appears that some methods of increasing IPC may be more efficient. I did a similar test with HT on vs. off, to increase IPC by one particular method, and got a 26% performance improvement for 15% more power. That is better than the near-zero performance-per-watt improvement I see comparing my Z8500 Atom to my N4100 Atom. They would still need a more efficient node if you were to increase the heat output of a 9900KS by 10%, though.
Pics of my quick test below.
Yea, you got it.