Well, I wrote a multi-threaded console benchmarker (it multiplies big natural numbers) which stresses the CPU mostly with the instructions listed below under 'Code:', meaning no FPU instructions and no main RAM loads.
It is compiled with the latest Intel(R) C++ Compiler XE for applications running on IA-32, Version 12.1, using maximum optimizations.
Code:
movzx
jae
jne
jbe
jb
lea
xor
sub
add
inc
cmp
dec
mov
Thus a new CPU clock pseudo-measure has emerged: MokujINs.
MokujINs stand for the number of iterations of the main loop of the MUL function completed per second.
Each iteration performs one digit-by-digit multiplication.
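To make the definition concrete, here is a minimal sketch of a schoolbook string multiplier that counts its inner-loop iterations. It is not the MokujIN source: `mul_digits` and `mokujins` are hypothetical names chosen for illustration, and the real MUL function may be organized differently.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not MokujIN's MUL): multiplies the non-empty decimal
   strings a and b the schoolbook way. 'out' must hold
   strlen(a) + strlen(b) + 1 bytes. Returns the number of digit-by-digit
   multiplications performed, or 0 on empty input or allocation failure. */
unsigned long long mul_digits(const char *a, const char *b, char *out)
{
    size_t la = strlen(a), lb = strlen(b), n = la + lb;
    unsigned long long iterations = 0;
    int *acc;

    out[0] = '\0';
    if (la == 0 || lb == 0)
        return 0;
    acc = calloc(n, sizeof *acc);            /* little-endian result digits */
    if (acc == NULL)
        return 0;

    for (size_t i = 0; i < la; i++) {
        int da = a[la - 1 - i] - '0';
        int carry = 0;
        for (size_t j = 0; j < lb; j++) {    /* the "main loop" */
            int t = acc[i + j] + da * (b[lb - 1 - j] - '0') + carry;
            acc[i + j] = t % 10;
            carry = t / 10;
            iterations++;                    /* one digit x digit product */
        }
        acc[i + lb] += carry;
    }

    size_t top = n - 1;                      /* skip leading zeros */
    while (top > 0 && acc[top] == 0)
        top--;
    for (size_t k = 0; k <= top; k++)
        out[k] = (char)('0' + acc[top - k]);
    out[top + 1] = '\0';

    free(acc);
    return iterations;
}

/* MokujINs as defined above: main-loop iterations per second. */
double mokujins(unsigned long long iterations, double seconds)
{
    return seconds > 0.0 ? (double)iterations / seconds : 0.0;
}

#ifdef MOKUJIN_DEMO   /* compile with -DMOKUJIN_DEMO to run this demo */
int main(void)
{
    char out[16];
    unsigned long long it = mul_digits("65536", "65536", out);
    assert(strcmp(out, "4294967296") == 0);
    printf("%s, %llu iterations\n", out, it);
    return 0;
}
#endif
```

Two n-digit operands cost n*n iterations in this sketch, so the figure grows quadratically with operand length; dividing the count by the elapsed seconds gives the MokujINs figure.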
I already gave the C source of MokujIN in the 'High-precision program that calculates 2^n' thread, but here comes the multi-threaded revision.
The logo of revision 4 (4 threads enforced) and the C source are given at:
http://www.sanmayce.com/Downloads/MokujIN_88-A4-pages.pdf
The package (Open Source) is freely downloadable at:
http://www.sanmayce.com/Downloads/MokujIN.zip
The ZIP archive contains three folders:
- r. 3+, which is the single-threaded revision;
- r. 4, which is the 4-threaded revision;
- r. 5, which is the 16-threaded revision.
My ‘Bonboniera’ Core 2 T7500 2200MHz laptop gives 73/140 MegaMokujINs (1 thread/2 threads).
It would be interesting to see how the 16-threaded revision behaves on CPUs having only 12 threads; I guess the extra threads will choke the scheduler.
I ran the 16-threaded revision on my humble machine (2/2 cores/threads) and it was significantly slower than the 4-threaded revision.
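For readers unfamiliar with how a fixed thread count can be enforced regardless of the core count, here is a small sketch using standard OpenMP calls. It is only an illustration under assumptions: `sum_with_threads` is a hypothetical function, the workload is a trivial sum rather than a multiplication, and the actual MokujIN source may enforce its thread count in another way. Built without OpenMP support, the code falls back to a single thread.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical example: sums 1..n using a team of 'requested' threads,
   even if that exceeds the number of logical processors. The team size
   actually used is stored in *used (1 when built without OpenMP). */
unsigned long long sum_with_threads(long long n, int requested, int *used)
{
    unsigned long long total = 0;
    int team = 1;

    (void)requested;                     /* unused in a non-OpenMP build */
#ifdef _OPENMP
    omp_set_dynamic(0);                  /* do not let the runtime shrink the team */
    omp_set_num_threads(requested);      /* e.g. 16 on a 2-core machine */
#endif

    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }
        #pragma omp for reduction(+:total)
        for (long long i = 1; i <= n; i++)
            total += (unsigned long long)i;
    }

    *used = team;
    return total;
}

#ifdef MOKUJIN_DEMO   /* compile with -DMOKUJIN_DEMO (and OpenMP enabled) */
int main(void)
{
    int used = 0;
    unsigned long long s = sum_with_threads(1000, 16, &used);
    assert(s == 500500ULL);
    printf("sum = %llu using %d thread(s)\n", s, used);
    return 0;
}
#endif
```

When the requested team is larger than the number of logical processors, the threads have to share cores through the OS scheduler, which is the oversubscription effect described above.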
Being an AMD fan since my last 'Barton' chip, I wonder how AMD's 16-thread-capable processors would run my bench:
Code:
MokujIN_r5_16-Threads.exe 2 1048576 /stats
To run it, just go to the 'MokujIN_r5' folder and start 'RUNME.bat'; the output looks like this:
Code:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
D:\WorkTemp>cd D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>runme
Revision 3 Single-Thread results:
Computing 2^1048576 took 0,454 seconds with '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 1,856 seconds without '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 0,426 seconds with '/TURBO' with Microsoft v16 on T7500 2200MHz.
Computing 2^1048576 took 1,678 seconds without '/TURBO' with Microsoft v16 on T7500 2200MHz.
SHA1 should be:
adebb3aac8ded6438719f8170a455f38dfebaae3
Computing 2^1048576 ...
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>time0<enter 1>TotalTime.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>timer "MokujIN_r5_16-Threads.exe" 2 1048576 /stats
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
MokujIN, Multiplication of INtegers, an OpenMP (multi-threaded) string multiplier, 16 threads enforced, written by Kaze, 2012-Nov-11, revision 5.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 2 digits long: 4 MokujINs i.e. digits per second.
Multiplying performance for operands 3 digits long: 9 MokujINs i.e. digits per second.
Multiplying performance for operands 5 digits long: 25 MokujINs i.e. digits per second.
Multiplying performance for operands 10 digits long: 100 MokujINs i.e. digits per second.
Multiplying performance for operands 20 digits long: 400 MokujINs i.e. digits per second.
Multiplying performance for operands 39 digits long: 1,521 MokujINs i.e. digits per second.
Multiplying performance for operands 78 digits long: 6,084 MokujINs i.e. digits per second.
Multiplying performance for operands 155 digits long: 24,025 MokujINs i.e. digits per second.
Multiplying performance for operands 309 digits long: 95,481 MokujINs i.e. digits per second.
Multiplying performance for operands 617 digits long: 380,689 MokujINs i.e. digits per second.
Multiplying performance for operands 1234 digits long: 1,522,756 MokujINs i.e. digits per second.
Multiplying performance for operands 2467 digits long: 6,086,089 MokujINs i.e. digits per second.
Multiplying performance for operands 4933 digits long: 24,334,489 MokujINs i.e. digits per second.
Multiplying performance for operands 9865 digits long: 97,318,225 MokujINs i.e. digits per second.
Multiplying performance for operands 19729 digits long: 129,744,480 MokujINs i.e. digits per second.
Multiplying performance for operands 39457 digits long: 129,737,904 MokujINs i.e. digits per second.
Multiplying performance for operands 78914 digits long: 127,090,191 MokujINs i.e. digits per second.
Multiplying performance for operands 157827 digits long: 127,088,581 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK
Total Time: 261 second(s).
Kernel Time = 0.156 = 0%
User Time = 495.000 = 189%
Process Time = 495.156 = 189%
Global Time = 261.523 = 100%
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>time0<enter 1>>TotalTime.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3 MokujIN.txt
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>type TotalTime.txt
The current time is: 17:05:57.20
Enter the new time:
The current time is: 17:10:18.78
Enter the new time:
D:\Downloads\_2012-Nov-12\_PATCH-Nov-11\D\MokujIN\MokujIN_r5>
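A note on reading the log above: the operand lengths (1, 1, 2, 3, 5, 10, ... 157827) coincide with the decimal digit counts of 2^1, 2^2, 2^4, ... 2^524288, which suggests that 2^1048576 is reached by 20 successive squarings. That is an inference from the output, not something confirmed from the source. The sketch below reproduces those lengths; `digits_of_pow2` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: number of decimal digits of 2^k, computed as
   floor(k * log10(2)) + 1. The truncating cast acts as a floor because the
   product is non-negative. Double precision is ample for the exponents
   used here; it is not meant for astronomically large k. */
long digits_of_pow2(unsigned long k)
{
    const double LOG10_2 = 0.30102999566398120;
    return (long)((double)k * LOG10_2) + 1;
}

#ifdef MOKUJIN_DEMO   /* compile with -DMOKUJIN_DEMO to print the table */
int main(void)
{
    /* Exponents 1, 2, 4, ..., 524288: the assumed operands of each squaring. */
    for (unsigned long k = 1; k <= 524288UL; k *= 2)
        printf("2^%lu has %ld digits\n", k, digits_of_pow2(k));
    assert(digits_of_pow2(524288UL) == 157827);
    return 0;
}
#endif
```

By the same formula, the final result 2^1048576 has 315653 decimal digits.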
Reading the article 'AMD Bulldozer 16-core server CPUs "trounce" Intel Xeon' makes me eager to see its power in numbers.
"Trounce", ha-ha, I like that pun.
SOED says:
1. Afflict, distress; discomfit. M16–M17.
2. Beat, thrash, esp. as a punishment. M16.
3. Censure; rebuke or scold severely. E17.
4. Punish severely; (now dial.) punish by legal action or process; indict, sue. Also, get the better of, defeat heavily. M17.
...
2. verb trans. Cause to move rapidly; cause to go. rare. E19.
If that's not progress, I don't know what is. ... Interlagos promises to bring unbeatable price-performance to heavily multithreaded workloads. ... It costs considerably less than its closest Intel counterparts.
In my view the MokujIN benchmarker can say something on the Opteron vs Xeon topic.
Share your results with us, please.











