Guys, it seems you're overrating HT a little in real-life applications and assuming that once CPU usage hits, say, 100%, the benefit from HT is guaranteed. Not necessarily. First of all, what is Hyper-Threading? It's a processor's ability to improve the parallelization of computation, and here is what that means:
Let's say we have a single-core CPU with Hyper-Threading. As we know, a CPU performs all its operations in various registers (address registers, condition registers, vector registers...). Generally speaking, if a program is doing its work in the vector registers and suddenly requests data from "slow" RAM (RAM is slow compared with the CPU's cache, so the CPU sits idle while the data is being transferred), and there is, say, some work waiting for the floating-point unit, then HT, given a scheduler that has a second thread ready to run, can put the FPU to work in the meantime and thus improve performance. In this case the HT benefit is at its biggest (around 35%), but these situations are mainly seen in benchmarks (3DMark Vantage CPU Test, Cinebench 11.5, SANDRA...). In other words, these benchmarks include code that deliberately leaves part of the CPU idle so that HT has a big impact on the results. I've also seen such gains in real-world applications, but they're quite rare. Mostly HT yields about 10 - 15% of additional performance over non-HT CPUs BECAUSE:
1) When a CPU works, it keeps the most frequently used data in its cache. Let's say thread 1 is waiting for data to be transferred from RAM and thread 2 is performing another task. In this situation thread 2 is likely to evict the data that thread 1 previously stored in the L2 cache. Later, thread 1 will have to fetch the same data from RAM once again. This costs time and reduces HT efficiency.
2) Let's say thread 1 works all the time and leaves no CPU time for thread 2. Under these circumstances, HT efficiency will be 10 - 15%. If the programmer uses the BGH algorithm, HT won't help at all.
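Point 1) can be pictured with a toy cache model. This is only a sketch under loud assumptions: a tiny shared cache with LRU replacement, made-up addresses and sizes, and strict one-for-one interleaving of the two threads. Real L2 caches are set-associative and far bigger, so treat the numbers as illustration only. The names `count_misses`, `loop_trace` and `interleave` are mine, not from any tool.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses in a fully associative LRU cache of `capacity` lines."""
    cache = OrderedDict()  # keys are addresses, oldest entry first
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            misses += 1  # miss: data has to come from "slow" RAM
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def loop_trace(base, lines, repeats):
    """A thread that walks the same `lines` addresses `repeats` times."""
    return [base + i for i in range(lines)] * repeats


def interleave(a, b):
    """Alternate accesses of two threads, as if they shared the core."""
    return [addr for pair in zip(a, b) for addr in pair]


# Hypothetical sizes: each working set (6 lines) fits in the cache (8 lines),
# but both together (12 lines) do not.
thread1 = loop_trace(0, 6, 10)
thread2 = loop_trace(100, 6, 10)
CAPACITY = 8

one_after_another = count_misses(thread1 + thread2, CAPACITY)
sharing_the_cache = count_misses(interleave(thread1, thread2), CAPACITY)
print("misses, one thread after another:", one_after_another)
print("misses, both threads interleaved:", sharing_the_cache)
```

With these assumed sizes, running the threads one after another only pays for the first pass over each working set, while the interleaved run keeps throwing out lines the other thread is about to need. That is the effect described in 1), just exaggerated.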
As far as I know, benchmarks like the 3DMark06 CPU test and Cinebench R10 were coded for multicore CPUs without HT in mind, and the performance gains prove to be ~10 - 15%.
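Why the gain swings so much between workloads comes down to how often a thread leaves the core idle. Here's a rough toy model of that, and it is not how a real CPU schedules anything: each thread is a string of cycles, `C` for a cycle that needs the single execution slot and `M` for a cycle spent waiting on RAM. The function names (`run_one_at_a_time`, `run_with_ht`) and the cycle patterns are made up for illustration, and the model ignores the cache effects from point 1).

```python
def run_one_at_a_time(threads):
    """No HT: threads run back to back, and stall cycles are simply wasted."""
    return sum(len(t) for t in threads)


def run_with_ht(threads):
    """Toy HT: one execution slot per cycle, shared by all threads.

    A thread on an 'M' cycle waits on memory without using the slot,
    so another thread on a 'C' cycle can use the slot in the same cycle.
    """
    pos = [0] * len(threads)  # next cycle to execute in each thread
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        slot_used = False
        for i, t in enumerate(threads):
            if pos[i] >= len(t):
                continue  # this thread has finished
            if t[pos[i]] == "M":
                pos[i] += 1  # memory wait overlaps with other work
            elif not slot_used:
                slot_used = True  # this thread gets the execution slot
                pos[i] += 1
            # otherwise the thread wanted the slot but lost it this cycle
        cycles += 1
    return cycles


# Hypothetical workloads, 16 cycles per thread.
stall_heavy = ["CCMM" * 4, "CCMM" * 4]  # half of the time waiting on RAM
compute_bound = ["C" * 16, "C" * 16]    # never waits, always wants the slot

for name, work in (("stall-heavy", stall_heavy), ("compute-bound", compute_bound)):
    print(name, run_one_at_a_time(work), "cycles without HT,",
          run_with_ht(work), "cycles with HT")
```

In this model the stall-heavy pair finishes in far fewer cycles with HT, while the compute-bound pair gains nothing because the first thread never frees the slot. That second case is point 2) taken to the extreme. Real code sits somewhere in between, and the exact percentages here mean nothing.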
To sum up:
To take full advantage of HT, programs must be additionally optimized, but that usually takes more time and is not always possible.
P.S. Sorry if you find the explanation a little messy.
Edited by Hey_Hi_Hello - 2/24/10 at 12:52pm