Overclock.net › Forums › Benchmarks › Benchmarking Software and Discussion › Benchmarking the fastest hash function

Benchmarking the fastest hash function - Page 3

post #21 of 24
Thread Starter 
I am very grateful to Fantasy for his help. On his monstrous machine FNV1A_Yoshimura slashes roughly 13% to 17% faster than FNV1A_Yorikke (see the dump below) while featuring better dispersion (fewer collisions):



HASH_linearspeed_FURY_Intel_IA-32_12:
Code:
Fetching/Hashing a 64MB block 1024 times i.e. 64GB ...
BURST_Read_4DWORDS: (64MB block); 65536MB fetched in 4584 clocks or 14.297MB per clock
BURST_Read_3DWORDS: (64MB block); 65536MB fetched in 4645 clocks or 14.109MB per clock
FNV1A_YoshimitsuTRIAD: (64MB block); 65536MB hashed in 5623 clocks or 11.655MB per clock
FNV1A_Yorikke: (64MB block); 65536MB hashed in 6212 clocks or 10.550MB per clock
FNV1A_Yoshimura: (64MB block); 65536MB hashed in 5329 clocks or 12.298MB per clock
CRC32_SlicingBy8K2: (64MB block); 65536MB hashed in 37555 clocks or 1.745MB per clock

Fetching/Hashing a 10MB block 8*1024 times ...
BURST_Read_4DWORDS: (10MB block); 81920MB fetched in 4726 clocks or 17.334MB per clock
BURST_Read_3DWORDS: (10MB block); 81920MB fetched in 4850 clocks or 16.891MB per clock
FNV1A_YoshimitsuTRIAD: (10MB block); 81920MB hashed in 6363 clocks or 12.874MB per clock
FNV1A_Yorikke: (10MB block); 81920MB hashed in 7173 clocks or 11.421MB per clock
FNV1A_Yoshimura: (10MB block); 81920MB hashed in 6121 clocks or 13.383MB per clock
CRC32_SlicingBy8K2: (10MB block); 81920MB hashed in 46394 clocks or 1.766MB per clock

Fetching/Hashing a 5MB block 8*1024 times ...
BURST_Read_4DWORDS: (5MB block); 40960MB fetched in 2046 clocks or 20.020MB per clock
BURST_Read_3DWORDS: (5MB block); 40960MB fetched in 2091 clocks or 19.589MB per clock
FNV1A_YoshimitsuTRIAD: (5MB block); 40960MB hashed in 2877 clocks or 14.237MB per clock
FNV1A_Yorikke: (5MB block); 40960MB hashed in 3333 clocks or 12.289MB per clock
FNV1A_Yoshimura: (5MB block); 40960MB hashed in 2929 clocks or 13.984MB per clock
CRC32_SlicingBy8K2: (5MB block); 40960MB hashed in 22909 clocks or 1.788MB per clock

Fetching/Hashing a 2MB block 32*1024 times ...
BURST_Read_4DWORDS: (2MB block); 65536MB fetched in 3207 clocks or 20.435MB per clock
BURST_Read_3DWORDS: (2MB block); 65536MB fetched in 3296 clocks or 19.883MB per clock
FNV1A_YoshimitsuTRIAD: (2MB block); 65536MB hashed in 4554 clocks or 14.391MB per clock
FNV1A_Yorikke: (2MB block); 65536MB hashed in 5285 clocks or 12.400MB per clock
FNV1A_Yoshimura: (2MB block); 65536MB hashed in 4630 clocks or 14.155MB per clock
CRC32_SlicingBy8K2: (2MB block); 65536MB hashed in 36538 clocks or 1.794MB per clock

Fetching/Hashing a 128KB block 512*1024 times ...
BURST_Read_4DWORDS: (128KB block); 65536MB fetched in 2433 clocks or 26.936MB per clock
BURST_Read_3DWORDS: (128KB block); 65536MB fetched in 2627 clocks or 24.947MB per clock
FNV1A_YoshimitsuTRIAD: (128KB block); 65536MB hashed in 4388 clocks or 14.935MB per clock
FNV1A_Yorikke: (128KB block); 65536MB hashed in 5163 clocks or 12.693MB per clock
FNV1A_Yoshimura: (128KB block); 65536MB hashed in 4553 clocks or 14.394MB per clock
CRC32_SlicingBy8K2: (128KB block); 65536MB hashed in 36238 clocks or 1.808MB per clock

Fetching/Hashing a 16KB block 4*1024*1024 times ...
BURST_Read_4DWORDS: (16KB block); 65536MB fetched in 1968 clocks or 33.301MB per clock
BURST_Read_3DWORDS: (16KB block); 65536MB fetched in 2600 clocks or 25.206MB per clock
FNV1A_YoshimitsuTRIAD: (16KB block); 65536MB hashed in 4393 clocks or 14.918MB per clock
FNV1A_Yorikke: (16KB block); 65536MB hashed in 5126 clocks or 12.785MB per clock
FNV1A_Yoshimura: (16KB block); 65536MB hashed in 4551 clocks or 14.400MB per clock
CRC32_SlicingBy8K2: (16KB block); 65536MB hashed in 36227 clocks or 1.809MB per clock
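
The "MB per clock" column is simply the total data volume divided by the elapsed clock ticks, so a "clock" here is a timer tick, not a CPU cycle. A minimal C sketch of that arithmetic follows. The function names are hypothetical, and the conversion to MB/s assumes about 1,000 ticks per second, which this particular dump does not print:

```c
/* Throughput as printed in the dump: total megabytes / elapsed clock ticks. */
double mb_per_clock(double total_mb, double clocks)
{
    return total_mb / clocks;
}

/* Rough MB/s, given the number of ticks in one second.
   ticks_per_second is an ASSUMPTION (about 1000), not a value from this dump. */
double mb_per_second(double total_mb, double clocks, double ticks_per_second)
{
    return mb_per_clock(total_mb, clocks) * ticks_per_second;
}
```

For example, `mb_per_clock(65536, 5329)` reproduces the 12.298 figure for FNV1A_Yoshimura on the 64MB block, which would be roughly 12,300 MB/s if one second really has about 1,000 ticks.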

I thank Przemyslaw Skibinski and Maciej Adamczyk (m^2) for their 64bit testbench, which I included in the benchmark along with the 32bit one by Peter Kankowski:
http://www.sanmayce.com/Fastest_Hash/DOUBLOON_hash_micro-package_r3.zip

I wrote revision 3 of FNV1A_Tesla as the 64bit counterpart of FNV1A_Yoshimura and included both in the 64bit linear speed test by Przemyslaw and Maciej. The results (I threw the 200MB file at the hashers):

As console text dumps:
Code:
E:\DOUBLOON_hash_micro-package_r3>RUNME_64bit.BAT

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_O2.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 108 ms, 209715202 bytes = 1851 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
CityHash64                              1.0.3
  209715210 (x 1.000)      3333 MB/s 3389 MB/s      273e15 277e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      4081 MB/s 4081 MB/s      334e15 334e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4347 MB/s 4347 MB/s      356e15 356e15
xxhash-fast                             r3
  209715206 (x 1.000)      4000 MB/s 4000 MB/s      327e15 327e15
xxhash-strong                           r3
  209715206 (x 1.000)      2816 MB/s 2816 MB/s      230e15 230e15
xxhash256                               r3
  209715234 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_O3.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 109 ms, 209715202 bytes = 1834 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      3278 MB/s 3333 MB/s      268e15 273e15
CityHash64                              1.0.3
  209715210 (x 1.000)      3333 MB/s 3278 MB/s      273e15 268e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      3278 MB/s 3278 MB/s      268e15 268e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3333 MB/s      273e15 273e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      3921 MB/s 3921 MB/s      321e15 321e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
xxhash-fast                             r3
  209715206 (x 1.000)      3636 MB/s 3636 MB/s      297e15 297e15
xxhash-strong                           r3
  209715206 (x 1.000)      2777 MB/s 2777 MB/s      227e15 227e15
xxhash256                               r3
  209715234 (x 1.000)      3773 MB/s 3773 MB/s      309e15 309e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Intel_12.1_fast.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 110 ms, 209715202 bytes = 1818 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      2380 MB/s 2380 MB/s      195e15 195e15
CityHash64                              1.0.3
  209715210 (x 1.000)      2105 MB/s 2105 MB/s      172e15 172e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      3508 MB/s 3508 MB/s      287e15 287e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3389 MB/s 3389 MB/s      277e15 277e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4000 MB/s 4000 MB/s      327e15 327e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
xxhash-fast                             r3
  209715206 (x 1.000)      3773 MB/s 3773 MB/s      309e15 309e15
xxhash-strong                           r3
  209715206 (x 1.000)      2777 MB/s 2777 MB/s      227e15 227e15
xxhash256                               r3
  209715234 (x 1.000)      3921 MB/s 3921 MB/s      321e15 321e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>benchmark_Microsoft_VS2010_Ox.exe CityHash128 CityHash64 SpookyHash fnv1a-jesteress fnv1a-yoshimura fnv1a-tesla3 xxhash-fast xxhash-strong xxhash256 -i77 200MB_as_one_line.TXT
memcpy: 111 ms, 209715202 bytes = 1801 MB/s
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
CityHash128                             1.0.3
  209715218 (x 1.000)      4444 MB/s 4444 MB/s      364e15 364e15
CityHash64                              1.0.3
  209715210 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
SpookyHash                              2012-03-30
  209715218 (x 1.000)      4081 MB/s 4081 MB/s      334e15 334e15
fnv1a-jesteress                         v2
  209715206 (x 1.000)      3333 MB/s 3278 MB/s      273e15 268e15
fnv1a-yoshimura                         v2
  209715206 (x 1.000)      4166 MB/s 4166 MB/s      341e15 341e15
fnv1a-tesla3                            v2
  209715210 (x 1.000)      4347 MB/s 4347 MB/s      356e15 356e15
xxhash-fast                             r3
  209715206 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
xxhash-strong                           r3
  209715206 (x 1.000)      2857 MB/s 2857 MB/s      234e15 234e15
xxhash256                               r3
  209715234 (x 1.000)      4255 MB/s 4255 MB/s      348e15 348e15
Codec                                   version      args
C.Size      (C.Ratio)        C.Speed   D.Speed      C.Eff. D.Eff.
done... (77x1 iteration(s)).

E:\DOUBLOON_hash_micro-package_r3>
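
The MB/s figures in these dumps can be reproduced from a byte count and a time in milliseconds. The sketch below assumes that 1 MB means 2^20 bytes and that the result is truncated. Both conventions are inferred from the printed numbers, not taken from the benchmark's source, and the function name is hypothetical:

```c
#include <stdint.h>

/* MB/s as the dump seems to print it: MB = 2^20 bytes, result truncated.
   Both conventions are inferred from the numbers, not from the source. */
uint64_t mb_per_s(uint64_t bytes, uint64_t milliseconds)
{
    uint64_t bytes_per_s = bytes * 1000u / milliseconds;
    return bytes_per_s / (1024u * 1024u);
}
```

`mb_per_s(209715202, 108)` gives 1851, matching the first memcpy line. The same formula gives 4166 for 48 ms and 4347 for 46 ms. If each reported speed comes from one timed pass over the file, this suggests whole-millisecond timings, so the gap between fnv1a-yoshimura and fnv1a-tesla3 in the first run would be about 2 ms per pass.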

FNV1A_Yoshimura is simply DIAMANTINE.

Fantasy, I salute you with one of the movies/songs closest to my heart:

Mimino - Chito Grito simgera


The movie is full of heart. The story is about a rural civilian chopper pilot who dreams of "GREAT AVIATION". After many tricks of destiny Mimino finally becomes a co-pilot of a Tu-144 (the original Concorde), only to find that his real place is in his beloved home, where everything is TRUTHFUL, without any hypocrisy or deceit, and where the little bird on the outside is a hawk within.

The lyrics are about a very small bird, so INSIGNIFICANT, as Mimino says.
post #22 of 24
Thread Starter 
It's time for XMM funk, that is, to see how SSE2/SSE4.1 boosts the current fastest 32bit hash function.
Hold on to your wig: on a 16KB block FNV1A_YoshimitsuTRIADiiXMM is 894% faster than CRC32_SlicingBy8 (9.399 vs 0.945 MB per clock) !!!

The test package used for the next two dumps (C source, plus the bandwidth speed and collision torture tests) is at:
http://www.sanmayce.com/Fastest_Hash/index.html#XMM

I wrote FNV1A_YoshimitsuTRIADiiXMM (32bit C code using XMM registers, i.e. the SSE2 counterpart of FNV1A_YoshimitsuTRIADii), which gave the following on my 'Bonboniera' Core 2 T7500 2200MHz:
Code:
Info1: One second seems to have 998 clocks.
Info2: This CPU seems to be working at 2,191 MHz.

Fetching/Hashing a 64MB block 1024 times i.e. 64GB ...
BURST_Read_4DWORDS:         (64MB block); 65536MB fetched in 15132 clocks or 4.331MB per clock
BURST_Read_8DWORDSi:        (64MB block); 65536MB fetched in 13946 clocks or 4.699MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (64MB block); 65536MB hashed in 13572 clocks or  4.829MB per clock !!! FLASHY-SLASHY: OUTSPEEDS THE INTERLEAVED 8x4 READ !!!
FNV1A_YoshimitsuTRIADii:    (64MB block); 65536MB hashed in 14399 clocks or  4.551MB per clock
FNV1A_YoshimitsuTRIAD:      (64MB block); 65536MB hashed in 15912 clocks or  4.119MB per clock
FNV1A_Yorikke:              (64MB block); 65536MB hashed in 16427 clocks or  3.990MB per clock
FNV1A_Yoshimura:            (64MB block); 65536MB hashed in 14555 clocks or  4.503MB per clock
CRC32_SlicingBy8K2:         (64MB block); 65536MB hashed in 71588 clocks or  0.915MB per clock

Fetching/Hashing a 2MB block 32*1024 times ...
BURST_Read_4DWORDS:         (2MB block); 65536MB fetched in 9532 clocks or 6.875MB per clock
BURST_Read_8DWORDSi:        (2MB block); 65536MB fetched in 9844 clocks or 6.657MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (2MB block); 65536MB hashed in 7332 clocks or  8.938MB per clock !!! COMMENTLESS !!!
FNV1A_YoshimitsuTRIADii:    (2MB block); 65536MB hashed in 10155 clocks or 6.454MB per clock
FNV1A_YoshimitsuTRIAD:      (2MB block); 65536MB hashed in 9766 clocks or  6.711MB per clock
FNV1A_Yorikke:              (2MB block); 65536MB hashed in 10171 clocks or 6.443MB per clock
FNV1A_Yoshimura:            (2MB block); 65536MB hashed in 10717 clocks or 6.115MB per clock
CRC32_SlicingBy8K2:         (2MB block); 65536MB hashed in 69764 clocks or 0.939MB per clock

Fetching/Hashing a 16KB block 4*1024*1024 times ...
BURST_Read_4DWORDS:         (16KB block); 65536MB fetched in 7863 clocks or 8.335MB per clock
BURST_Read_8DWORDSi:        (16KB block); 65536MB fetched in 7894 clocks or 8.302MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (16KB block); 65536MB hashed in 6973 clocks or  9.399MB per clock !!! WIGGING-OUT: 894% faster than CRC32_SlicingBy8 !!!
FNV1A_YoshimitsuTRIADii:    (16KB block); 65536MB hashed in 8892 clocks or  7.370MB per clock
FNV1A_YoshimitsuTRIAD:      (16KB block); 65536MB hashed in 9110 clocks or  7.194MB per clock
FNV1A_Yorikke:              (16KB block); 65536MB hashed in 9657 clocks or  6.786MB per clock
FNV1A_Yoshimura:            (16KB block); 65536MB hashed in 9734 clocks or  6.733MB per clock
CRC32_SlicingBy8K2:         (16KB block); 65536MB hashed in 69342 clocks or 0.945MB per clock
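
The "894% faster" figure follows from the 16KB rows above (9.399 vs 0.945 MB per clock). A one-function C sketch of the calculation, with a hypothetical name:

```c
/* How many percent faster `fast` is than `slow` (both in MB per clock). */
double percent_faster(double fast, double slow)
{
    return (fast / slow - 1.0) * 100.0;
}
```

`percent_faster(9.399, 0.945)` comes out at about 894.6, i.e. the XMM variant runs at roughly 9.9 times the speed of CRC32_SlicingBy8K2 on that block size.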

And the same test run on an Intel Q9550S:
Code:
Info1: One second seems to have 1,000 clocks.
Info2: This CPU seems to be working at 2,833 MHz.

Fetching/Hashing a 64MB block 1024 times i.e. 64GB ...
BURST_Read_4DWORDS:         (64MB block); 65536MB fetched in 10954 clocks or 5.983MB per clock
BURST_Read_8DWORDSi:        (64MB block); 65536MB fetched in 11015 clocks or 5.950MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (64MB block); 65536MB hashed in 9938 clocks or   6.594MB per clock  !!!  NOT BAD FOR Core 2  !!!
FNV1A_YoshimitsuTRIADii:    (64MB block); 65536MB hashed in 11406 clocks or  5.746MB per clock  ???  Much slower than non-interleaved, who knows why  ???
FNV1A_YoshimitsuTRIAD:      (64MB block); 65536MB hashed in 11047 clocks or  5.932MB per clock
FNV1A_Yorikke:              (64MB block); 65536MB hashed in 11390 clocks or  5.754MB per clock
FNV1A_Yoshimura:            (64MB block); 65536MB hashed in 11782 clocks or  5.562MB per clock
CRC32_SlicingBy8K2:         (64MB block); 65536MB hashed in 55125 clocks or  1.189MB per clock

Fetching/Hashing a 2MB block 32*1024 times ...
BURST_Read_4DWORDS:         (2MB block); 65536MB fetched in 7593 clocks or   8.631MB per clock
BURST_Read_8DWORDSi:        (2MB block); 65536MB fetched in 8125 clocks or   8.066MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (2MB block); 65536MB hashed in 5797 clocks or   11.305MB per clock
FNV1A_YoshimitsuTRIADii:    (2MB block); 65536MB hashed in 8391 clocks or    7.810MB per clock  ???  Much slower than non-interleaved, who knows why  ???
FNV1A_YoshimitsuTRIAD:      (2MB block); 65536MB hashed in 7766 clocks or    8.439MB per clock
FNV1A_Yorikke:              (2MB block); 65536MB hashed in 8250 clocks or    7.944MB per clock
FNV1A_Yoshimura:            (2MB block); 65536MB hashed in 8843 clocks or    7.411MB per clock
CRC32_SlicingBy8K2:         (2MB block); 65536MB hashed in 53625 clocks or   1.222MB per clock

Fetching/Hashing a 16KB block 4*1024*1024 times ...
BURST_Read_4DWORDS:         (16KB block); 65536MB fetched in 6110 clocks or 10.726MB per clock
BURST_Read_8DWORDSi:        (16KB block); 65536MB fetched in 6109 clocks or 10.728MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (16KB block); 65536MB hashed in 5359 clocks or  12.229MB per clock
FNV1A_YoshimitsuTRIADii:    (16KB block); 65536MB hashed in 6891 clocks or   9.510MB per clock  !!!  FEELS LIKE IT SHOULD  !!!
FNV1A_YoshimitsuTRIAD:      (16KB block); 65536MB hashed in 7078 clocks or   9.259MB per clock
FNV1A_Yorikke:              (16KB block); 65536MB hashed in 7844 clocks or   8.355MB per clock
FNV1A_Yoshimura:            (16KB block); 65536MB hashed in 7515 clocks or   8.721MB per clock
CRC32_SlicingBy8K2:         (16KB block); 65536MB hashed in 53438 clocks or  1.226MB per clock

Well, the next boost can come from YMM registers, i.e. AVX (256-bit integer operations need AVX2).
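
For readers wondering what "using XMM registers" buys: one 128-bit register holds four 32-bit lanes, so a single xor plus a single multiply sequence advances four FNV-1a-style states at once. The sketch below illustrates only that idea. It is NOT the source of FNV1A_YoshimitsuTRIADiiXMM (that is on the linked page). The function names, the fold step, and the use of the standard FNV constants are assumptions made for illustration, and only whole 16-byte blocks are hashed:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define SKETCH_HAVE_SSE2 1
#endif

#define FNV_OFFSET 2166136261u   /* standard 32-bit FNV offset basis */
#define FNV_PRIME  16777619u     /* standard 32-bit FNV prime */

/* Fold four lane states into one 32-bit value (an illustrative choice). */
static uint32_t fold4(const uint32_t s[4])
{
    uint32_t h = FNV_OFFSET;
    for (int i = 0; i < 4; i++) h = (h ^ s[i]) * FNV_PRIME;
    return h;
}

/* Scalar reference: four 32-bit lanes, each doing hash = (hash ^ data) * PRIME.
   Only whole 16-byte blocks are hashed; a tail would need separate handling. */
uint32_t hash4_scalar(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    uint32_t s[4] = { FNV_OFFSET, FNV_OFFSET, FNV_OFFSET, FNV_OFFSET };
    for (; len >= 16; len -= 16, p += 16) {
        uint32_t w[4];
        memcpy(w, p, 16);                        /* alignment-safe load */
        for (int i = 0; i < 4; i++) s[i] = (s[i] ^ w[i]) * FNV_PRIME;
    }
    return fold4(s);
}

#ifdef SKETCH_HAVE_SSE2
/* Lane-wise 32-bit multiply built from SSE2's 32x32->64 multiply
   (SSE4.1 offers this directly as _mm_mullo_epi32). */
static __m128i mullo32_sse2(__m128i a, __m128i b)
{
    __m128i even = _mm_mul_epu32(a, b);                      /* lanes 0 and 2 */
    __m128i odd  = _mm_mul_epu32(_mm_srli_si128(a, 4),
                                 _mm_srli_si128(b, 4));      /* lanes 1 and 3 */
    return _mm_unpacklo_epi32(_mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0)),
                              _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0)));
}
#endif

/* Same computation, with all four lanes held in one XMM register. */
uint32_t hash4_xmm(const void *buf, size_t len)
{
#ifdef SKETCH_HAVE_SSE2
    const unsigned char *p = (const unsigned char *)buf;
    const __m128i prime = _mm_set1_epi32((int)FNV_PRIME);
    __m128i state = _mm_set1_epi32((int)FNV_OFFSET);
    uint32_t s[4];
    for (; len >= 16; len -= 16, p += 16) {
        __m128i data = _mm_loadu_si128((const __m128i *)p);  /* unaligned load */
        state = mullo32_sse2(_mm_xor_si128(state, data), prime);
    }
    _mm_storeu_si128((__m128i *)s, state);
    return fold4(s);
#else
    return hash4_scalar(buf, len);   /* no SSE2 available: fall back */
#endif
}
```

Both functions return the same value for the same input; the XMM version simply does the four lane updates with one xor and one multiply sequence per 16 bytes. Wider registers would extend the same pattern to eight lanes.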
post #23 of 24
Thread Starter 
False modesty aside, my latest FASTEST hash console benchmark has emerged; it is a must-know.

http://www.sanmayce.com/Fastest_Hash/HASH_linearspeed_FURY_Intel_32bit_64bit_PENUMBRA.zip

For those who want to see the C source and the Intel ASM code:
http://www.sanmayce.com/Fastest_Hash/index.html#PENUMBRA

Results of the above benchmark on my 'Bonboniera' laptop:
Code:
The 32bit results, HASH_linearspeed_FURY_Intel_IA-32_12.exe:

Memory pool starting address: 00AC0040 ... 64 byte aligned, OK

Info1: One second seems to have 998 clocks.
Info2: This CPU seems to be working at 2,191 MHz.

Fetching/Hashing a 64MB block 1024 times i.e. 64GB ...
XXH_256:                    (64MB block); 65536MB hashed in 35787 clocks or 1.831MB/1.831MB per clock
FNV1A_penumbra:             (64MB block); 65536MB hashed in 14445 clocks or 4.537MB/4.581MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (64MB block); 65536MB hashed in 14056 clocks or 4.662MB/4.752MB per clock
FNV1A_YoshimitsuTRIADii:    (64MB block); 65536MB hashed in 14773 clocks or 4.436MB/4.488MB per clock
FNV1A_YoshimitsuTRIAD:      (64MB block); 65536MB hashed in 16115 clocks or 4.067MB/4.087MB per clock
FNV1A_Yoshimura:            (64MB block); 65536MB hashed in 14914 clocks or 4.394MB/4.436MB per clock
CRC32_SlicingBy8K2:         (64MB block); 65536MB hashed in 71573 clocks or 0.916MB/0.916MB per clock

Fetching/Hashing a 2MB block 32*1024 times ...
XXH_256:                    (2MB block); 65536MB hashed in 33212 clocks or  1.973MB/ 1.972MB per clock
FNV1A_penumbra:             (2MB block); 65536MB hashed in 6568 clocks or   9.978MB/10.025MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (2MB block); 65536MB hashed in 7316 clocks or   8.958MB/ 8.976MB per clock
FNV1A_YoshimitsuTRIADii:    (2MB block); 65536MB hashed in 9750 clocks or   6.722MB/ 6.854MB per clock
FNV1A_YoshimitsuTRIAD:      (2MB block); 65536MB hashed in 9750 clocks or   6.722MB/ 6.722MB per clock
FNV1A_Yoshimura:            (2MB block); 65536MB hashed in 10311 clocks or  6.356MB/ 6.483MB per clock
CRC32_SlicingBy8K2:         (2MB block); 65536MB hashed in 69763 clocks or  0.939MB/ 0.940MB per clock

Fetching/Hashing a 16KB block 4*1024*1024 times ...
XXH_256:                    (16KB block); 65536MB hashed in 33415 clocks or 1.961MB/ 1.957MB per clock
FNV1A_penumbra:             (16KB block); 65536MB hashed in 5819 clocks or 11.262MB/11.293MB per clock  !!! Giga Shadow !!!
FNV1A_YoshimitsuTRIADiiXMM: (16KB block); 65536MB hashed in 6973 clocks or  9.399MB/ 9.399MB per clock
FNV1A_YoshimitsuTRIADii:    (16KB block); 65536MB hashed in 8908 clocks or  7.357MB/ 7.370MB per clock
FNV1A_YoshimitsuTRIAD:      (16KB block); 65536MB hashed in 8986 clocks or  7.293MB/ 7.294MB per clock
FNV1A_Yoshimura:            (16KB block); 65536MB hashed in 9688 clocks or  6.765MB/ 6.754MB per clock
CRC32_SlicingBy8K2:         (16KB block); 65536MB hashed in 69467 clocks or 0.943MB/ 0.949MB per clock

Thanks to xxhash256 (written by Cyan and m^2) an intriguing face-off is on the cards: XXH_256 vs FNV1A_penumbra.
Both functions are hitting the ceiling. They give an idea of, and most importantly a measurement of, the linear speed of hashing in L1/L2 cache.

The whole test is 32/64 bit and comes with its source.
I am very interested in seeing how the newest CPUs like HASWELL do hashing; please share your 'RESULTS.TXT' here.
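
The shape of such a linear speed test can be sketched in a few lines of C: hash one block over and over so that the total volume stays at 64GB while the block size decides whether the data lives in L1, L2 or RAM. This is NOT the code from the package (that is in the linked source). The names are hypothetical, plain byte-wise FNV-1a stands in for the hashers, and the use of the C library's `clock()` is an assumption based on the dumps reporting about 1,000 clocks per second:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Total megabytes touched when a block of block_bytes is processed reps times. */
uint64_t total_mb(uint64_t block_bytes, uint64_t reps)
{
    return block_bytes * reps / (1024u * 1024u);
}

/* Placeholder workload: byte-wise FNV-1a with a seed, standing in for any hasher. */
static uint32_t fnv1a_32_seeded(const unsigned char *p, size_t len, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (len--) h = (h ^ *p++) * 16777619u;
    return h;
}

/* Hash the same block reps times and return the elapsed clock() ticks.
   Each pass is seeded with the previous result, so the compiler cannot
   hoist the call out of the loop; the final value is stored in *sink. */
clock_t time_block(const unsigned char *block, size_t block_bytes,
                   uint64_t reps, uint32_t *sink)
{
    uint32_t acc = 0;
    clock_t start = clock();
    for (uint64_t r = 0; r < reps; r++)
        acc = fnv1a_32_seeded(block, block_bytes, acc);
    *sink = acc;
    return clock() - start;
}
```

`total_mb(16 * 1024, 4 * 1024 * 1024)` is 65536, the same volume the dump reports for the 16KB rows; dividing it by the ticks returned from `time_block` gives a "MB per clock" figure in the style of the tables above.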
post #24 of 24
Thread Starter 
Having read some informative articles, like this one, clearly showing the advantages of HASWELL over Sandy Bridge, I wrote an XMMless 32bit function in C, FNV1A_farolito, utilizing those 4 ALUs:



Lamely, the Core i3 2310M (Sandy Bridge) possesses only 3 ALUs, so let's see how lame the speed hit is gonna be:
Code:
Info1: One second seems to have 1,000 clocks.
Info2: This CPU seems to be working at 2,095 MHz.

Fetching/Hashing a 256bytes block 256*1024*1024 times ...
BURST_Read_8DWORDSi:        (256bytes block); 65536MB fetched in 6735 clocks or 9.731MB per clock
BURST_Read_4DWORDS:         (256bytes block); 65536MB fetched in 6937 clocks or 9.447MB per clock
FNV1A_YoshimitsuTRIADiiXMM: (256bytes block); 65536MB hashed in 8719 clocks or  7.516MB per clock
FNV1A_penumbra:             (256bytes block); 65536MB hashed in 8875 clocks or  7.384MB per clock
FNV1A_YoshimitsuTRIAD:      (256bytes block); 65536MB hashed in 12187 clocks or 5.378MB per clock
FNV1A_Yoshimura:            (256bytes block); 65536MB hashed in 12469 clocks or 5.256MB per clock
FNV1A_Yorikke:              (256bytes block); 65536MB hashed in 13015 clocks or 5.035MB per clock
FNV1A_YoshimitsuTRIADii:    (256bytes block); 65536MB hashed in 13094 clocks or 5.005MB per clock
FNV1A_farolito:             (256bytes block); 65536MB hashed in 14063 clocks or 4.660MB per clock !!! Sandy Bridge i3's 3 ALUs disappoint !!!
CRC32_SlicingBy8K2:         (256bytes block); 65536MB hashed in 78406 clocks or 0.836MB per clock

This test on HASWELL will show whether that 4th ALU has really been added, ha-ha.

I expect FNV1A_farolito to shine on HASWELL.
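
Why the number of ALUs matters for an XMMless hasher can be shown with a small C sketch. In plain FNV-1a every step waits for the previous multiply, so there is one long dependency chain. Splitting the input across four independent states gives the core four chains it can overlap, one per ALU if it has that many. This is NOT the source of FNV1A_farolito. The function names, the merge step and the use of the standard FNV constants are assumptions made for illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 2166136261u   /* standard 32-bit FNV offset basis */
#define FNV_PRIME  16777619u     /* standard 32-bit FNV prime */

/* Baseline: standard byte-wise FNV-1a, one long dependency chain. */
uint32_t fnv1a_32(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    uint32_t h = FNV_OFFSET;
    while (len--) h = (h ^ *p++) * FNV_PRIME;
    return h;
}

/* Four independent chains: no step in one chain waits for another,
   so a core with several integer ALUs can overlap them. */
uint32_t fnv1a_4chains(const void *buf, size_t len)
{
    const unsigned char *p = (const unsigned char *)buf;
    uint32_t a = FNV_OFFSET, b = FNV_OFFSET, c = FNV_OFFSET, d = FNV_OFFSET;
    uint32_t w[4], h;

    for (; len >= 16; len -= 16, p += 16) {
        memcpy(w, p, 16);               /* alignment-safe 4-dword load */
        a = (a ^ w[0]) * FNV_PRIME;
        b = (b ^ w[1]) * FNV_PRIME;
        c = (c ^ w[2]) * FNV_PRIME;
        d = (d ^ w[3]) * FNV_PRIME;
    }

    /* Merge the four chains, then finish any tail bytes one at a time. */
    h = (FNV_OFFSET ^ a) * FNV_PRIME;
    h = (h ^ b) * FNV_PRIME;
    h = (h ^ c) * FNV_PRIME;
    h = (h ^ d) * FNV_PRIME;
    while (len--) h = (h ^ *p++) * FNV_PRIME;
    return h;
}
```

The two functions are different hashes and return different values; the point is only the loop structure. Whether four chains actually beat three on a given CPU is exactly what a run on HASWELL would have to show.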
Edited by Sanmayce - 7/21/13 at 10:54am