Sorry about living in the UK. I have to admit it does get annoying when I want to talk to the US members of this site, as I'm usually in bed by the time they come online.
First off, no, I do not work for Intel. I don't think they would let me join the company at the age of 17, somehow.
However, I do extensive work with their processors at the architectural level.
My main focus is on Processor Cache Systems, SIMD and Branch Prediction.
My current engineering project concerns the VAX (Virtual Address eXtension) architecture. The older members of this forum should know what this is, if they remember it. In my opinion, VAX was the best architecture ever made. The VAX-11/780 also became the reference machine for the MIPS performance rating ("VAX MIPS"), being counted as a 1 MIPS machine.
Xeons have always puzzled me: why design a processor that is so similar to a desktop processor, with small differences that will often go unnoticed?
So I have been looking at what the Xeon actually offers that would justify a separate design.
The first thing is that workstation and server applications seem to need much more internal storage (cache/DRAM). Xeon processors have therefore been designed with a more sophisticated caching system than desktop parts, which reduces the number of capacity misses (the cache is too small to hold the data being reused) and conflict misses (too many lines competing for the same cache slots).
Of course, enlarging the cache creates problems of its own. A bigger cache takes longer to search, so each access costs more clock cycles (a higher hit latency), which eats into the performance gained from the extra capacity.
One option is simply to lower the clock speed. The on-die cache runs at the core clock (e.g. a 3 GHz processor has a 3 GHz cache), so a slower clock gives the cache more time per cycle and each access takes fewer cycles. However, the rest of the processor runs slower too.
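A quick way to see this trade-off: a cache access takes a roughly fixed amount of real time, and the number of cycles it costs is that time multiplied by the clock frequency. The Python sketch below assumes a hypothetical 5 ns access time and a made-up function name; real figures vary by design.

```python
def latency_cycles(access_time_ps: int, clock_mhz: int) -> int:
    """Whole clock cycles needed to cover a fixed cache access time.

    One cycle lasts 1_000_000 / clock_mhz picoseconds, so the cycle count is
    access_time_ps * clock_mhz / 1_000_000, rounded up (integer ceiling).
    """
    return -(-access_time_ps * clock_mhz // 1_000_000)

# Hypothetical cache whose arrays need 5 ns (5000 ps) per access.
for mhz in (3000, 2000):
    cycles = latency_cycles(5000, mhz)
    print(f"{mhz} MHz: {cycles} cycles")  # 15 cycles, then 10 cycles
```

Either way the access still takes 5 ns of real time; the lower clock only makes it cheaper in cycles, while everything else slows down too.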
Unfortunately, I am not a good graphics designer, or I would give you a nice diagram of capacity misses. Since I'm not, I will have to explain in words.
A cache is made up of two separate arrays: the data array and the tag array.
The data array holds the cached data itself. Its size is what is quoted as the cache capacity (e.g. 512 KB, 4096 KB).
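The tag array holds, for every line in the data array, the upper bits of the memory address that line came from (its tag). On each access the processor compares the requested address against these tags to see whether the data is present, so you can think of the tag array as a directory of the data array.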
The data array is divided into fixed-size blocks called cache lines, each with its own tag. If a miss occurs during a fetch, the entire cache line has to be brought in from the next level and replaces an existing line.
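To make the tag/data split concrete, here is a toy direct-mapped cache in Python. It is a minimal sketch, not how any Intel cache is actually organised: the 64-byte line size, the 8-line capacity and the class name are assumptions for illustration. The address is split into an offset within the line, an index choosing the slot, and a tag that is stored in the tag array; on a miss the whole line is copied in.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_LINES = 8    # lines in this toy cache (assumed; real caches have thousands)

class DirectMappedCache:
    """Toy direct-mapped cache with separate tag and data arrays."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.tags = [None] * NUM_LINES               # tag array
        self.data = [bytes(LINE_SIZE)] * NUM_LINES   # data array, one line per slot
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        offset = addr % LINE_SIZE        # byte within the line
        line_no = addr // LINE_SIZE
        index = line_no % NUM_LINES      # which slot the line maps to
        tag = line_no // NUM_LINES       # remaining upper address bits
        if self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: the whole line is fetched from memory and replaces the old one.
            self.misses += 1
            base = line_no * LINE_SIZE
            self.data[index] = bytes(self.memory[base:base + LINE_SIZE])
            self.tags[index] = tag
        return self.data[index][offset]

memory = bytearray(range(256)) * 16   # 4 KB of fake memory
cache = DirectMappedCache(memory)
for addr in (0, 1, 512, 0):
    cache.read(addr)
print(cache.hits, cache.misses)  # 1 3
```

Reading address 0 then 1 gives one miss and one hit (same line), while address 512 maps to the same slot as address 0 with a different tag, so going back to address 0 misses again.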
I won't bore you any further. The simple fact is that you need more cache to reduce this problem: if the cache is too small to hold the data the program keeps reusing, lines are evicted and have to be fetched again in full. This takes time and wastes memory bandwidth.
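The difference between capacity and conflict misses can be shown with a small set-associative cache simulator using LRU replacement. This is again a hypothetical Python sketch: the function name, the 64-byte lines and the access patterns are made up for illustration.

```python
from collections import OrderedDict

def count_misses(addresses, num_sets, ways, line_size=64):
    """Count misses for a set-associative cache with LRU replacement."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        s = sets[line % num_sets]        # set chosen by the index bits
        if line in s:
            s.move_to_end(line)          # hit: mark as most recently used
        else:
            misses += 1
            if len(s) == ways:
                s.popitem(last=False)    # evict the least recently used line
            s[line] = True
    return misses

# Capacity misses: looping over 16 lines does not fit in 8 lines of cache.
loop = [i * 64 for i in range(16)] * 10
print(count_misses(loop, num_sets=1, ways=8))    # 160: every access misses
print(count_misses(loop, num_sets=1, ways=16))   # 16: only the first touches

# Conflict misses: two lines that map to the same set of an 8-line cache.
pingpong = [0, 512] * 10
print(count_misses(pingpong, num_sets=8, ways=1))  # 20: direct-mapped thrashes
print(count_misses(pingpong, num_sets=4, ways=2))  # 2: same size, 2-way holds both
```

Doubling the size cures the capacity misses, while keeping the size but adding associativity cures the conflict misses.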
Some Xeon processors also add a Level 3 cache, which catches many of the accesses that miss in L2 and would otherwise have to go all the way out to main memory.
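Whether an extra cache level pays off can be estimated with the usual average memory access time (AMAT) formula: AMAT = hit time + miss rate * (access time of the next level down). In the Python sketch below, the latencies and miss rates are invented round numbers, not measurements of any Xeon.

```python
def amat(levels, memory_latency):
    """Average memory access time, in cycles, for a cache hierarchy.

    levels: list of (hit_latency_cycles, local_miss_rate), L1 first.
    Works from main memory upwards: each level's time is its hit latency
    plus its miss rate times the time of the level below it.
    """
    time = memory_latency
    for hit_latency, miss_rate in reversed(levels):
        time = hit_latency + miss_rate * time
    return time

# Hypothetical figures: L1 3 cycles / 5% misses, L2 14 cycles / 20% misses,
# optional L3 40 cycles / 50% misses, main memory 200 cycles.
without_l3 = amat([(3, 0.05), (14, 0.20)], 200)
with_l3 = amat([(3, 0.05), (14, 0.20), (40, 0.50)], 200)
print(f"{without_l3:.2f} vs {with_l3:.2f}")  # 5.70 vs 5.10
```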
Cache is involved here again. Inside both of these processors is a special type of cache used for branch prediction, called the branch prediction cache.
Branch prediction is what the name suggests: when the processor reaches a branch, it guesses which instruction should be fetched next before the branch has actually been resolved.
A large fraction of instructions are branches, so a pipelined processor runs into trouble with them constantly. This is why branch prediction is implemented.
Without it, the pipeline would break at every branch, because the branch must be fully resolved before the next instruction can be fetched, adding a large amount of latency.
The predictor can of course be wrong, in which case the wrong instructions are fetched and have to be thrown away. Because most predictions are correct, however, the overall cost is still much lower than if no prediction were made at all.
The cycles lost to a wrong guess are called the branch misprediction penalty.
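The effect of the penalty on average speed can be worked out with a simple cycles-per-instruction (CPI) estimate. This Python sketch uses hypothetical numbers (base CPI of 1, 20% branches, a 15-cycle penalty) and simplifies "no prediction" to paying the full penalty on every branch.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    # Every mispredicted branch adds penalty_cycles on top of the base CPI.
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Hypothetical workload: base CPI of 1, 20% branches, 15-cycle penalty.
with_prediction = effective_cpi(1.0, 0.20, 0.10, 15)    # 90% accurate predictor
without_prediction = effective_cpi(1.0, 0.20, 1.0, 15)  # stall on every branch
print(f"{with_prediction:.2f} vs {without_prediction:.2f}")  # 1.30 vs 4.00
```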
Note: Two-bit prediction is the most common basic scheme. The cache stores a two-bit saturating counter for each recently accessed branch, and this structure is called the Branch History Table. You don't really need to know much about this, unless you want me to explain.
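For anyone who does want the detail, here is a minimal Python sketch of a branch history table of 2-bit saturating counters. The 4096-entry default matches the test configuration used below; indexing by the low bits of the branch address, the initial counter value and the class name are assumptions for illustration.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries: int = 4096):
        self.entries = entries
        self.counters = [1] * entries    # start every entry at "weakly not taken"

    def _slot(self, pc: int) -> int:
        return pc % self.entries         # index by the low bits of the branch address

    def predict(self, pc: int) -> bool:
        return self.counters[self._slot(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._slot(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

# A loop branch: taken 9 times, then not taken once on exit, run 3 times.
predictor = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 3
correct = 0
for taken in outcomes:
    correct += predictor.predict(0x400) == taken
    predictor.update(0x400, taken)
print(f"{correct}/{len(outcomes)} correct")  # 26/30 correct
```

Because the counter needs two wrong outcomes in a row to flip its prediction, each loop exit costs only one misprediction instead of two, which is the main advantage of 2-bit over 1-bit prediction.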
Rather than giving you a full rundown of branch prediction, let's get back to the Xeon issue.
Have you heard of the SPEC (Standard Performance Evaluation Corporation) benchmarking system?
Its benchmarks can be used to determine branch prediction accuracy.
I have tested this myself on my Core 2 Duo E6600 and an Intel Xeon 3070.
Application Used: Modified SPEC95 (x86, IA-32[e])
Test Used: 4096-entry, 2-bit Branch Prediction Cache
Core 2 Duo E6600 Score: ~87% Accuracy
Xeon 3070 Score: ~96% Accuracy
This test was repeated several times and the results averaged, so it is logical to conclude that the Xeon's branch prediction is more accurate, and therefore that it delivers greater performance.
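To put the two scores in perspective, here they are expressed as mispredictions per 1000 branches. This is plain arithmetic on the averages above; the function name is just for illustration.

```python
def mispredicts_per_1000(accuracy_percent: int) -> int:
    # Each percentage point of inaccuracy is 10 mispredictions per 1000 branches.
    return 10 * (100 - accuracy_percent)

e6600 = mispredicts_per_1000(87)  # ~87% accuracy reported for the E6600
xeon = mispredicts_per_1000(96)   # ~96% accuracy reported for the Xeon 3070
print(e6600, xeon, e6600 / xeon)  # 130 40 3.25
```

So the Xeon result corresponds to roughly a third as many mispredictions, and each one avoided saves a full misprediction penalty.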
There isn't enough room in this post to add information on SIMD.