Well, I wrote a multi-threaded console benchmarker (it multiplies big natural numbers) which stresses the CPU mostly with the instructions listed below (meaning no FPU or main RAM loads).
It is compiled with the latest Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1, using maximum optimizations.

Code:

```
movzx
jae
jne
jbe
jb
lea
xor
sub
add
inc
cmp
dec
mov
```

Thus a new CPU clock pseudo-measure has emerged: **MokujINs**. **MokujINs** stands for the number of iterations of the main loop of the MUL function made per second. At each iteration/cycle one digit-by-digit multiplication is made.
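To make the unit concrete, here is a minimal sketch of schoolbook string multiplication that counts its inner-loop iterations. It is not the MokujIN source: `mul_digits` and `mokujins` are hypothetical names, and the real MUL function may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the MokujIN source): multiplies two decimal
   strings with the schoolbook method, returns the product as a newly
   allocated string, and reports the number of digit-by-digit
   multiplications (inner-loop iterations) through *iterations. */
char *mul_digits(const char *a, const char *b, unsigned long long *iterations)
{
    size_t la = strlen(a), lb = strlen(b);
    assert(la > 0 && lb > 0);

    /* acc[k] holds the digit at position k, least significant first. */
    unsigned *acc = calloc(la + lb, sizeof *acc);
    char *out = malloc(la + lb + 1);
    if (!acc || !out) { free(acc); free(out); return NULL; }

    unsigned long long count = 0;
    for (size_t i = 0; i < la; i++) {
        unsigned da = (unsigned)(a[la - 1 - i] - '0');
        unsigned carry = 0;
        for (size_t j = 0; j < lb; j++) {
            unsigned db = (unsigned)(b[lb - 1 - j] - '0');
            unsigned t = acc[i + j] + da * db + carry;
            acc[i + j] = t % 10;
            carry = t / 10;
            count++;                 /* one digit-by-digit multiplication */
        }
        acc[i + lb] += carry;
    }

    /* Drop leading zeros, then emit most significant digit first. */
    size_t top = la + lb;
    while (top > 1 && acc[top - 1] == 0) top--;
    for (size_t k = 0; k < top; k++) out[k] = (char)('0' + acc[top - 1 - k]);
    out[top] = '\0';

    free(acc);
    if (iterations) *iterations = count;
    return out;
}

/* The unit as defined in the post: main-loop iterations per second. */
double mokujins(unsigned long long iterations, double seconds)
{
    return seconds > 0.0 ? (double)iterations / seconds : 0.0;
}
```

For two N-digit operands the inner loop runs N×N times, which is consistent with the log further down, where for example 39-digit operands are reported as 1,521 (= 39²).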

I already gave the C source of **MokujIN** in the 'High-precision program that calculates 2^n' thread, but here comes the multi-threaded revision. The logo of revision 4 (4 threads enforced) and the C source are given at:

http://www.sanmayce.com/Downloads/MokujIN_88-A4-pages.pdf

The package (Open Source) is freely downloadable at:

http://www.sanmayce.com/Downloads/MokujIN.zip

In the ZIP archive three folders are given:

- r. 3+, which is the single-threaded revision;

- r. 4, which is the 4-threaded revision;

- r. 5, which is the 16-threaded revision.

My ‘Bonboniera’ Core 2 T7500 2200MHz laptop gives 73/140 MegaMokujINs (1 thread/2 threads).

It would be interesting to see how the 16-threaded revision behaves on CPUs having only 12 threads; I guess the extra threads will choke the scheduler.

I ran the 16-threaded revision on my humble machine (2/2 cores/threads); it was significantly slower than the 4-threaded revision.
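A rough way to see how far a run is oversubscribed is to compare the enforced thread count with what the OpenMP runtime reports. The sketch below is an illustration only; `logical_processors`, `threads_per_processor` and `report_oversubscription` are hypothetical helpers, not part of MokujIN, and the code falls back to one processor when compiled without OpenMP.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Logical processors visible to the OpenMP runtime; returns 1 when the
   file is compiled without OpenMP support. */
int logical_processors(void)
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

/* Hypothetical helper: software threads per logical processor. */
double threads_per_processor(int enforced_threads, int processors)
{
    assert(enforced_threads > 0 && processors > 0);
    return (double)enforced_threads / (double)processors;
}

/* Prints the ratio for the machine the program runs on. */
void report_oversubscription(int enforced_threads)
{
    int procs = logical_processors();
    printf("%d threads on %d logical processors: %.2f threads each\n",
           enforced_threads, procs,
           threads_per_processor(enforced_threads, procs));
}
```

On the 2-thread T7500 the 16-threaded revision gives a ratio of 8, while on a 12-thread CPU it would be about 1.33; how much that costs in practice depends on the scheduler and on how the work is split, which this sketch does not model.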

Being an AMD fan since my last 'Barton' chip, I wonder how AMD's 16-thread-capable processors would run my bench:

Code:

`MokujIN_r5_16-Threads.exe 2 1048576 /stats`
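The operand lengths that this command reports in the log below (1, 1, 2, 3, 5, 10, 20, 39, ...) match the digit counts of 2^(2^k). That suggests, though the post does not state it, that 2^1048576 is built by repeated squaring. A small sketch to reproduce those lengths, with `digits_of_power` and `print_operand_lengths` as hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

/* log10(2) as a constant, so no math library is needed. */
#define LOG10_2 0.30102999566398120

/* Hypothetical helper: number of decimal digits of 2^(2^k), computed as
   floor(2^k * log10(2)) + 1. Intended for small k only. */
unsigned long digits_of_power(unsigned k)
{
    assert(k < 32);
    double exponent = (double)(1UL << k);
    return (unsigned long)(exponent * LOG10_2) + 1UL;
}

/* Prints the operand length before each of the first `squarings` squarings. */
void print_operand_lengths(unsigned squarings)
{
    for (unsigned k = 0; k < squarings; k++)
        printf("2^(2^%u) has %lu digits\n", k, digits_of_power(k));
}
```

Calling `print_operand_lengths(20)` covers the twenty operand sizes in the log, the last being 157827 digits for 2^(2^19).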

To run it, just go to the 'MokujIN_r5' folder and start 'RUNME.bat'; the output looks like this:

Code:

```
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
D:\WorkTemp>cd D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>runme
Revision 3 Single-Thread results:
Computing 2^1048576 took 0,454 seconds with '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 1,856 seconds without '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 0,426 seconds with '/TURBO' with Microsoft v16 on T7500 2200MHz.
Computing 2^1048576 took 1,678 seconds without '/TURBO' with Microsoft v16 on T7500 2200MHz.
SHA1 should be:
adebb3aac8ded6438719f8170a455f38dfebaae3
Computing 2^1048576 ...
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>time0<enter 1>TotalTime.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>timer "MokujIN_r5_16-Threads.exe" 2 1048576 /stats
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
MokujIN, Multiplication of INtegers, an OpenMP (multi-threaded) string multiplier, 16 threads enforced, written by Kaze, 2012-Nov-11, revision 5.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 2 digits long: 4 MokujINs i.e. digits per second.
Multiplying performance for operands 3 digits long: 9 MokujINs i.e. digits per second.
Multiplying performance for operands 5 digits long: 25 MokujINs i.e. digits per second.
Multiplying performance for operands 10 digits long: 100 MokujINs i.e. digits per second.
Multiplying performance for operands 20 digits long: 400 MokujINs i.e. digits per second.
Multiplying performance for operands 39 digits long: 1,521 MokujINs i.e. digits per second.
Multiplying performance for operands 78 digits long: 6,084 MokujINs i.e. digits per second.
Multiplying performance for operands 155 digits long: 24,025 MokujINs i.e. digits per second.
Multiplying performance for operands 309 digits long: 95,481 MokujINs i.e. digits per second.
Multiplying performance for operands 617 digits long: 380,689 MokujINs i.e. digits per second.
Multiplying performance for operands 1234 digits long: 1,522,756 MokujINs i.e. digits per second.
Multiplying performance for operands 2467 digits long: 6,086,089 MokujINs i.e. digits per second.
Multiplying performance for operands 4933 digits long: 24,334,489 MokujINs i.e. digits per second.
Multiplying performance for operands 9865 digits long: 97,318,225 MokujINs i.e. digits per second.
Multiplying performance for operands 19729 digits long: 129,744,480 MokujINs i.e. digits per second.
Multiplying performance for operands 39457 digits long: 129,737,904 MokujINs i.e. digits per second.
Multiplying performance for operands 78914 digits long: 127,090,191 MokujINs i.e. digits per second.
Multiplying performance for operands 157827 digits long: 127,088,581 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK
Total Time: 261 second(s).
Kernel Time = 0.156 = 0%
User Time = 495.000 = 189%
Process Time = 495.156 = 189%
Global Time = 261.523 = 100%
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>time0<enter 1>>TotalTime.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3 MokujIN.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>type TotalTime.txt
The current time is: 17:05:57.20
Enter the new time:
The current time is: 17:10:18.78
Enter the new time:
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>
```
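The percentages printed by Timer in the log above can be checked by hand: process time (kernel plus user) divided by wall-clock time. A minimal sketch of that arithmetic, with hypothetical function names:

```c
#include <assert.h>

/* CPU utilization in percent: (kernel + user) time over wall-clock time.
   Values above 100 mean more than one logical processor was busy. */
double cpu_utilization_percent(double kernel_s, double user_s, double wall_s)
{
    assert(wall_s > 0.0);
    return 100.0 * (kernel_s + user_s) / wall_s;
}

/* Average utilization per logical processor. */
double per_processor_percent(double total_percent, int processors)
{
    assert(processors > 0);
    return total_percent / (double)processors;
}
```

With the figures from the log (0.156 s kernel, 495.000 s user, 261.523 s global) this gives roughly 189%, that is, a little under 95% on each of the two cores.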

Having read the article 'AMD Bulldozer 16-core server CPUs "trounce" Intel Xeon', I am eager to see its power in numbers.

"Trounce", ha-ha, I like that pun.

SOED says:

1. Afflict, distress; discomfit. M16–M17.

2. Beat, thrash, esp. as a punishment. M16.

3. Censure; rebuke or scold severely. E17.

4. Punish severely; (now dial.) punish by legal action or process; indict, sue. Also, get the better of, defeat heavily. M17.

...

2. verb trans. Cause to move rapidly; cause to go. rare. E19.

*If that's not progress, I don't know what is. ... Interlagos promises to bring unbeatable price-performance to heavily multithreaded workloads. ... It costs considerably less than its closest Intel counterparts.*

In my view the **MokujIN** benchmarker can say something on the **Opteron** vs **Xeon** topic. Share your results with us, please.