Overclock.net › Forums › Benchmarks › Benchmarking Software and Discussion › Benchmarking the fastest hash function

Benchmarking the fastest hash function

post #1 of 24
Thread Starter 
Edit (2012-Oct-31): My latest (and most versatile) test package is a 34MB .ZIP file at:
http://www.sanmayce.com/Fastest_Hash/index.html#NightLightSky

For a long time, looking at published tests, I couldn't find answers to some very basic (but important) questions about CPU/RAM performance.
Here I am talking about hashing performed by a console tool (written in C).
Hashing and searching are FUNDAMENTAL in the computer craft; not knowing what a machine can do in these two areas is a shame.

For example, I have no opportunity to run my tests on a real powerhouse. This limits my quest to write the fastest hash function (in C), because i5/i7 behave very differently (compared to Core 2) when it comes to fetching 1/2/4 bytes at a time and to exploiting (through automatic parallel execution) several hash lines.
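
To make "several hash lines" concrete, here is a minimal C sketch of an FNV-1a-style loop with two independent lanes. It is not the code of Yoshimitsu or Yorikke: the function name fnv1a_two_lanes, the lane layout and the final folding step are assumptions made for illustration, and only the 32-bit FNV offset basis and prime are the standard published constants. Each pass reads 8 DWORDs (32 bytes), four per lane, so an out-of-order CPU is free to overlap the two multiply chains.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FNV_OFFSET 2166136261u  /* standard 32-bit FNV offset basis */
#define FNV_PRIME  16777619u    /* standard 32-bit FNV prime */

/* Load 4 bytes without assuming alignment. */
static uint32_t load32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Hypothetical two-lane variant (illustration only): every iteration
   consumes 32 bytes, 16 per lane, and the lanes never read each other's
   state inside the loop. */
uint32_t fnv1a_two_lanes(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint32_t a = FNV_OFFSET, b = FNV_OFFSET;

    for (; len >= 32; len -= 32, p += 32) {
        a = (a ^ load32(p +  0)) * FNV_PRIME;
        b = (b ^ load32(p + 16)) * FNV_PRIME;
        a = (a ^ load32(p +  4)) * FNV_PRIME;
        b = (b ^ load32(p + 20)) * FNV_PRIME;
        a = (a ^ load32(p +  8)) * FNV_PRIME;
        b = (b ^ load32(p + 24)) * FNV_PRIME;
        a = (a ^ load32(p + 12)) * FNV_PRIME;
        b = (b ^ load32(p + 28)) * FNV_PRIME;
    }

    /* Tail: classic byte-at-a-time FNV-1a on lane a. */
    for (; len > 0; len--, p++)
        a = (a ^ *p) * FNV_PRIME;

    /* Fold lane b into lane a, then mix the high bits downwards. */
    a = (a ^ b) * FNV_PRIME;
    return a ^ (a >> 16);
}
```

Calling fnv1a_two_lanes(buf, len) on any buffer returns a 32-bit value. How much the second lane actually helps depends on the CPU, which is the kind of difference a benchmark like the one in this thread is meant to expose.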

You are all welcome to use my latest benchmark at:
http://www.sanmayce.com/Fastest_Hash/index.html#Yoshimitsu
Of course, to obtain decent results, stop all other running processes before starting the test.
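
For readers who want to see the shape of such a linear-speed measurement, below is a minimal sketch of a timing loop around clock(). It is an assumption about how a test like this can be structured, not the source of the benchmark; time_hash and sink are hypothetical names, and the hash is the textbook byte-wise 32-bit FNV-1a used purely as a stand-in.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Stand-in hash: textbook byte-wise 32-bit FNV-1a (not one of the
   functions benchmarked in this thread). */
static uint32_t fnv1a_32(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;
    while (len--)
        h = (h ^ *p++) * 16777619u;
    return h;
}

/* Hash one block `repeats` times and return the elapsed clock() ticks.
   The XOR of all results is stored in *sink so the compiler cannot
   discard the hashing work. Returns (clock_t)-1 if allocation fails. */
clock_t time_hash(size_t block_bytes, unsigned repeats, uint32_t *sink)
{
    unsigned char *block = malloc(block_bytes ? block_bytes : 1);
    uint32_t acc = 0;
    clock_t start, stop;
    unsigned i;

    if (block == NULL)
        return (clock_t)-1;
    memset(block, 0xA5, block_bytes);

    start = clock();
    for (i = 0; i < repeats; i++) {
        block[0] = (unsigned char)i;  /* vary the input on every pass */
        acc ^= fnv1a_32(block, block_bytes);
    }
    stop = clock();

    free(block);
    *sink = acc;
    return stop - start;
}
```

For example, time_hash(64u << 20, 1024, &sink) hashes a 64MB block 1024 times; dividing the megabytes processed by the returned tick count gives an "MB per clock" style figure. Note that clock() reports elapsed time on the Microsoft C runtime but processor time on POSIX systems, so other running processes affect the two differently.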

I am the humble owner of a mainstream laptop with a Core 2 T7500 (2200MHz) and DDR2; the results on my machine:
Code:
//OS: Windows XP 32bit
//Motherboard Name: Toshiba Satellite L305
//CPU Type: Mobile DualCore Intel Core 2 Duo T7500
//CPU Alias: Merom
//CPU Clock: 2194.7 MHz (original: 2200 MHz)
//CPU Multiplier: 11x
//CPU FSB: 199.5 MHz (original: 200 MHz)
//Memory Bus: 332.5 MHz
//L1 Code Cache: 32 KB per core
//L1 Data Cache: 32 KB per core
//L2 Cache: 4 MB (On-Die, ECC, ASC, Full-Speed)
//Memory Timings: 5-5-5-13 (CL-RCD-RP-RAS)
//Instruction Set: x86, x86-64, MMX, SSE, SSE2, SSE3, SSSE3
//Transistors: 291 million
//Process Technology: 8M, 65 nm, CMOS, Cu, Low-K Inter-Layer, 2nd Gen Strained Si
//Front Side Bus Properties:
//   Bus Type: Intel AGTL+
//   Bus Width: 64-bit
//   Real Clock: 200 MHz (QDR)
//   Effective Clock: 800 MHz
//   Bandwidth: 6400 MB/s
//Memory Bus Properties:
//   Bus Type: Dual DDR2 SDRAM
//   Bus Width: 128-bit
//   DRAM:FSB Ratio: 10:6
//   Real Clock: 333 MHz (DDR)
//   Effective Clock: 667 MHz
//   Bandwidth: 10667 MB/s
//
//And the results:
//
//Hashing a 64MB block 1024 times i.e. 64GB ...
//FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 16875 clocks or 3.884MB per clock
//FNV1A_Yorikke: (64MB block); 65536MB hashed in 16782 clocks or 3.905MB per clock
//CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 66390 clocks or 0.987MB per clock
//
//Hashing a 10MB block 8*1024 times ...
//FNV1A_Yoshimitsu: (10MB block); 81920MB hashed in 20610 clocks or 3.975MB per clock
//FNV1A_Yorikke: (10MB block); 81920MB hashed in 20546 clocks or 3.987MB per clock
//CRC_SlicingBy8K2: (10MB block); 81920MB hashed in 82938 clocks or 0.988MB per clock
//
//Hashing a 5MB block 8*1024 times ...
//FNV1A_Yoshimitsu: (5MB block); 40960MB hashed in 9562 clocks or 4.284MB per clock
//FNV1A_Yorikke: (5MB block); 40960MB hashed in 9531 clocks or 4.298MB per clock
//CRC_SlicingBy8K2: (5MB block); 40960MB hashed in 41110 clocks or 0.996MB per clock
//
//Hashing a 2MB block 8*1024 times ...
//FNV1A_Yoshimitsu: (2MB block); 16384MB hashed in 2578 clocks or 6.355MB per clock
//FNV1A_Yorikke: (2MB block); 16384MB hashed in 2657 clocks or 6.166MB per clock
//CRC_SlicingBy8K2: (2MB block); 16384MB hashed in 16156 clocks or 1.014MB per clock
//
//Hashing a 16KB block 1024*1024 times ...
//FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 2437 clocks or 6.723MB per clock
//FNV1A_Yorikke: (16KB block); 16384MB hashed in 2547 clocks or 6.433MB per clock
//CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 16078 clocks or 1.019MB per clock
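
The "MB per clock" column is easier to compare with bandwidth figures once it is converted to MB per second. The sketch below assumes that a "clock" is one clock() tick and that there are 1000 ticks per second (the CLOCKS_PER_SEC value of the Microsoft C runtime). That reading is an assumption, though it agrees with the roughly 10GB/s quoted later in the thread for a run reporting about 10.35MB per clock. The function names are illustrative.

```c
/* Megabytes processed per timer tick, as printed by the benchmark. */
double mb_per_tick(double megabytes, double ticks)
{
    return megabytes / ticks;
}

/* The same figure scaled to one second. ticks_per_second is an input
   because it is an assumption here (1000 on the Microsoft C runtime). */
double mb_per_second(double megabytes, double ticks, double ticks_per_second)
{
    return mb_per_tick(megabytes, ticks) * ticks_per_second;
}
```

With the first result line above, mb_per_second(65536, 16875, 1000) gives about 3884 MB/s for FNV1A_Yoshimitsu on the T7500.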



I would also be glad to get some feedback and results from your machines (especially from monsters with 40+GB/s of bandwidth).
Edited by Sanmayce - 10/31/12 at 8:22am
post #2 of 24
Thread Starter 
I just received results from an OLD but FAST CPU, the Core 2 Q9550S; I still need i5/i7 results:


Code:
Hashing a 64MB block 1024 times i.e. 64GB ...
FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 11015 clocks or 5.950MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 11125 clocks or 5.891MB per clock
CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 51469 clocks or 1.273MB per clock

Hashing a 10MB block 8*1024 times ...
FNV1A_Yoshimitsu: (10MB block); 81920MB hashed in 13813 clocks or 5.931MB per clock
FNV1A_Yorikke: (10MB block); 81920MB hashed in 14047 clocks or 5.832MB per clock
CRC_SlicingBy8K2: (10MB block); 81920MB hashed in 64421 clocks or 1.272MB per clock

Hashing a 5MB block 8*1024 times ...
FNV1A_Yoshimitsu: (5MB block); 40960MB hashed in 5204 clocks or 7.871MB per clock
FNV1A_Yorikke: (5MB block); 40960MB hashed in 5234 clocks or 7.826MB per clock
CRC_SlicingBy8K2: (5MB block); 40960MB hashed in 31328 clocks or 1.307MB per clock

Hashing a 2MB block 8*1024 times ...
FNV1A_Yoshimitsu: (2MB block); 16384MB hashed in 2047 clocks or 8.004MB per clock
FNV1A_Yorikke: (2MB block); 16384MB hashed in 2062 clocks or 7.946MB per clock
CRC_SlicingBy8K2: (2MB block); 16384MB hashed in 12485 clocks or 1.312MB per clock

Hashing a 16KB block 1024*1024 times ...
FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 1875 clocks or 8.738MB per clock
FNV1A_Yorikke: (16KB block); 16384MB hashed in 1953 clocks or 8.389MB per clock
CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 12359 clocks or 1.326MB per clock
post #3 of 24
OMG, I am downloading _kaze_hash whatever. Dear sir, your website is a mess. :D
Normally you don't paste a bazillion lines of code on it...
/* Redemption*/ (14 items)
CPU: I7 3930K
Motherboard: Asus Sabertooth
Graphics: Asus GTX 680
RAM: 8x4GB G.Skill@1337MHz
Hard Drive: 2xM4 64GB/ / F3 - 1TB / 2x2TB Baracudas
Optical Drive: some LG
Cooling: Modified EK 360 HFX
OS: 2x(Win7 x64)
Monitor: SyncMaster P2770HD and SyncMaster 940NW
Keyboard: Roccat Isku
Power: Corsair Gold AX750
Case: NZXT 810 Switch
Mouse: Rocat Kone[+]
Mouse Pad: Razer exactmat X
post #4 of 24
Sorry for the double post - my aim was not as good as my reaction time. :D
Hm, I ran the hash_benchmark batch file.
Attached: test_results.txt (61k .txt file)

Just a side note - San Disk did some very nice research. The results would be much better if this program used the GPU.

For all OCN fellas: this is not a virus and it does nothing bad. It just calculates some hash values.

@Sanmayce, thanks for including the batch files. Since I am in love with batch files, I guess I will find something funky in there that I did not know. Cheers buddy!
post #5 of 24
Thread Starter 
>... your website is a mess.
Agreed, he-he.

What I wanted you to test here is the small file, 57KB long:
http://www.sanmayce.com/Fastest_Hash/HASH_linearspeed_Yoshimitsu_vs_CRC32_FURY.zip
which you get by clicking on the first/only picture.

Step 1:
Open a command prompt, either via your Windows shortcut or via the 'Yorikke prompt' supplied in the archive.
Step 2:
Run the .EXE.
Step 3:
Copy the resulting text from the console and share it with us.

What you are downloading is the MAIN benchmark (100MB), which gives a clear picture of how many top-gun hash functions behave.
I would be glad to see the resulting .TXT file as well, but that run takes more than an hour, so it is up to you.

Thanks for your readiness to benchmark it.
Sadly I have to go now.
post #6 of 24
Thread Starter 
Thanks man,
just want to be sure:
was the computer used for the test your Redemption, and was it running at 3.2GHz?
post #7 of 24
Quote:
Originally Posted by Sanmayce

Thanks man,
just want to be sure:
was the computer used for the test your Redemption, and was it running at 3.2GHz?
Sir, yes sir. And you are welcome. I will run the other test later.
Got to watch a soccer game ;)
post #8 of 24
OK, here is the run.
Something is bogus there: I don't run XP, I don't have DDR2, and so on.
But anyway, here you go: results.txt (4k .txt file)
post #9 of 24
Thread Starter 
Thanks Mr.Eiht,
nothing is buggy: the text above the results is static, i.e. it is the dump from my laptop, included so that current results can be juxtaposed with mine.

>Just a side note - San Disk did some very nice research. The results would be much better if this program used the GPU.
Yes, GPUs are awesome; sadly I have no knowledge of their APIs. One note from me, though: their nasty disadvantage is their "errorful" memory. They are bound to produce errors/artefacts, since speed there has far greater priority than a few miscalculated pixels. For more info you can check my favorite compressor that uses GPU acceleration:
http://encode.ru/threads/586-bsc-new-block-sorting-compressor/page9

Here it is interesting to compare the four outputs obtained so far:
SLOWEST - Core 2 T7500 2200MHz, CPU bus 200MHz, RAM bus 2x333MHz (DDR2, Dual Channel)
SLOWER - Core 2 Q9550S, 2833MHz, CPU bus 333MHz, RAM bus 2x667MHz (DDR3, Dual Channel)
SLOW - i7-3930K, 3200MHz, CPU bus 100MHz, RAM bus 1333MHz (DDR3, Quad Channel)
Not SLOW - i7-3930K, 4500MHz, CPU bus 125MHz, RAM bus 2400MHz (DDR3, Quad Channel)

Core 2 T7500 2200MHz:
FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 16875 clocks or 3.884MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 16782 clocks or 3.905MB per clock
CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 66390 clocks or 0.987MB per clock

FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 2437 clocks or 6.723MB per clock
FNV1A_Yorikke: (16KB block); 16384MB hashed in 2547 clocks or 6.433MB per clock
CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 16078 clocks or 1.019MB per clock

Core 2 Q9550S, 2833MHz:
FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 11015 clocks or 5.950MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 11125 clocks or 5.891MB per clock
CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 51469 clocks or 1.273MB per clock

FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 1875 clocks or 8.738MB per clock
FNV1A_Yorikke: (16KB block); 16384MB hashed in 1953 clocks or 8.389MB per clock
CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 12359 clocks or 1.326MB per clock

i7-3930K, 3200MHz:
FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 7941 clocks or 8.253MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 7586 clocks or 8.639MB per clock
CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 40354 clocks or 1.624MB per clock

FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 1560 clocks or 10.503MB per clock
FNV1A_Yorikke: (16KB block); 16384MB hashed in 1425 clocks or 11.498MB per clock
CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 9778 clocks or 1.676MB per clock

i7-3930K, 4500MHz:
FNV1A_Yoshimitsu: (64MB block); 65536MB hashed in 6330 clocks or 10.353MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 6004 clocks or 10.915MB per clock
CRC_SlicingBy8K2: (64MB block); 65536MB hashed in 33915 clocks or 1.932MB per clock

FNV1A_Yoshimitsu: (16KB block); 16384MB hashed in 1323 clocks or 12.384MB per clock
FNV1A_Yorikke: (16KB block); 16384MB hashed in 1203 clocks or 13.619MB per clock
CRC_SlicingBy8K2: (16KB block); 16384MB hashed in 8258 clocks or 1.984MB per clock

My understanding is that my test mostly stresses the CPU bus, i.e. CPU bandwidth, rather than the CPU clock or the RAM clock.
Obviously the bottleneck here is the slow fetching, i.e. feeding those 79 bytes of code (the main loop of Yoshimitsu), which makes 8 DWORD accesses, i.e. reads 32 bytes of data.
In my view the i7-3930K executes the code most effectively, but when it comes to bandwidth those 49GB/s (as Sandra reports), out of a theoretical MAX of 52GB/s, are reduced to 10GB/s; I still do not clearly see the cause.
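
One rough way to read these numbers is to compare, per machine, the 64MB-block speed with the 16KB-block speed: a 16KB block fits in the L1 data cache (32KB per core on the T7500 listed in post #1), while a 64MB block has to stream from RAM. The sketch below does that arithmetic for the FNV1A_Yoshimitsu lines quoted above. The struct and function names are made up for illustration, and the ratio is only an indicator, not a diagnosis of the bottleneck.

```c
#include <stddef.h>

/* FNV1A_Yoshimitsu figures quoted above, in MB per clock. */
struct run {
    const char *machine;
    double mb_64mb_block;  /* 64MB block: data streams from RAM   */
    double mb_16kb_block;  /* 16KB block: data stays in L1 cache  */
};

static const struct run runs[] = {
    { "Core 2 T7500, 2200MHz",   3.884,  6.723 },
    { "Core 2 Q9550S, 2833MHz",  5.950,  8.738 },
    { "i7-3930K, 3200MHz",       8.253, 10.503 },
    { "i7-3930K, 4500MHz",      10.353, 12.384 },
};

/* Share of the in-cache speed that survives when the data comes from RAM. */
double ram_to_cache_ratio(const struct run *r)
{
    return r->mb_64mb_block / r->mb_16kb_block;
}
```

The ratios come out at about 0.58 (T7500), 0.68 (Q9550S), 0.79 (i7-3930K at 3200MHz) and 0.84 (i7-3930K at 4500MHz), so on the newer machines the 64MB runs sit closer to the in-cache speed.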

Is the CPU FSB the major factor for hash linear speed?!
Edited by Sanmayce - 10/29/12 at 7:24am
post #10 of 24
If you want, I can run another test at 125MHz or anything higher (if my CPU survives that :D) and lower the multiplier so that I still end up at 3.2GHz.
This way you could directly compare 3.2GHz@100MHz vs. 3.2GHz@125MHz.