
FAST 'on the fly' fuzzy string matching console tool written in C - Page 2

post #11 of 32
Thread Starter 
Guys,
speed is a religion for people who love to see fuzzy, blurry views at least once in a while: that very moment when life appears to pass before your eyes.

Speed is beauty.
Every time I encounter unappreciation of speed, it turns out the unappreciators were extra dumb or zombified ex-persons, got it? Zombie speed lovers, heeeeeeeeeeee-he-he.

It takes no brains to spot the obvious trend: the more a person is involved in mundane, boring stuff in his miserable life, the less joy he draws from life.
Speedy experiences bring back the joy of life and help one feel alive again.

As a kid I was in France (mostly in Le Havre, sadly for a short time), staying with a lovely family: Ivve (or Ivon) the father, Josefine (or Jasmine) the mother, and their kids Patrick, Frank, Mikkel (not Michelle) and a sister whose name (dummy me) I cannot recall. Hi to all from George.
Good people, very hospitable. Josefine gave me her velour pocket-book (accordion-like) as a memento; she just threw all the cards, notes and documents out of it. A lifelong memento indeed, a joyous time it was. One of my remembrances is how we traveled in their old Peugeot combi (to a picnic and a regional bicycle race; the oldest brother, being a good cyclist, took part in the event) for half an hour at a steady 140 km/h. That was my ground top speed at the time. A week later we took a proto-TGV class train to Paris, and for the first time I saw from the window how the cars driving along the track seemed to be crawling (even stopped). I was told the speed was 230 km/h.
That view imprinted in my mind how the speedy car ride we had made a week earlier was no match for the TGV ride, along with vivid mixed feelings about how my top speed (now an illusion) had been outsped so smoothly: no shaking and no wheels hitting gaps (our railway leaves free space between rail sections to compensate for temperature effects).

Twenty-five years later those immovable cars are still before my eyes.

Bulgarian rail:
Code:
------] [---------------------------------] [-------
      ] [                                 ] [ 
      ] [                                 ] [
------] [---------------------------------] [-------

Somewhere I saw that the French railway uses a different solution (is that so? I regret that I didn't look at the station back then):
French rail?:
Code:
------\ \---------------------------------\ \-------
       \ \                                 \ \
        \ \                                 \ \
---------\ \---------------------------------\ \----

AFAIK, the Japanese railway uses welded rails with no gaps; how do they compensate?!

The limbo between loading (still not threaded in r.2), parsing and searching pretty much resembles those gaps. To achieve higher speeds the gaps must go, i.e. the need for speed becomes a need for a seamless track.

I intend to write a mix of GRAFFITH & GALADRIEL, thus allowing superfast wildcard (7+1) and Levenshtein searches. I will call it TSUBAME|HIKARI|NOZOMI; still not decided, it's hard to choose.

Tsubame (つばめ) swallow (bird)
Hikari (ひかり) (a ray of) light
Nozomi (のぞみ) wish, hope


To see Japan behind Nozomi "300-series" windows - this old shinkansen is my favorite - is an old dream of mine.


I have a dream add-on: to stick around Tsubame, to touch its nose (and to ride shotgun from predawn to postdusk, traversing all the way from Kagoshima to Morioka) - to me this is not a train - it is a living thing, however outside Japan such hardcore zen sounds stupid to not FINE persons like me, hai.
Wow! I am really foxy-'n'-fuzzy, the previous sentence says that I might be both 'FINE' and 'not FINE', my wording is mutsi.
False humbleness aside, I am FINE, and it can't be otherwise - my mother's name is FINA.

To be lifeful, that is the point.

And a sight for sore eyes:
TGV speaks, see at 1:08 the blurry vista:


Eleven beauties:


I changed my mind, my next tool will be named Kazahana:


風花 Kazahana: a natural phenomenon, snow falling on a clear day.
post #12 of 32
Thread Starter 
Fastest 'structureless' fuzzy string matcher?!

I'm no longer frightened
No more sleepless nights
No more bruises on my soul
Troubles out of sight

I have no more sorrow
No more search for light
Got my train of inner thoughts
Back on the tracks of life

Livin' my life again
Livin' my life again
Swept away by the wind of change
Taking time to rearrange


/SYLVER - LIVIN' MY LIFE lyrics/



Brutally tortured wildcard etude ...
Code:
E:\_Kaze_Kazahana>dir Kazahana*.exe
 Volume in drive E is SSD_Sanmayce
 Volume Serial Number is 9CF6-FEA3

 Directory of E:\_Kaze_Kazahana

02/04/2013  08:43 AM           413,184 Kazahana_r1-_HEXADECAD-Threads_IntelV12.exe
02/04/2013  08:42 AM           495,616 Kazahana_r1-_HEXADECAD-Threads_IntelV12_64bit.exe
02/04/2013  08:43 AM           114,688 Kazahana_r1-_MONAD-Thread_IntelV12.exe
02/04/2013  08:42 AM           132,608 Kazahana_r1-_MONAD-Thread_IntelV12_64bit.exe
               4 File(s)      1,156,096 bytes
               0 Dir(s)  10,398,699,520 bytes free

E:\_Kaze_Kazahana>Kazahana_r1-_HEXADECAD-Threads_IntelV12.exe
Kazahana, an x-gram suggester using wildcards & Levenshtein Distance (Wagner-Fischer), revision 1-, copyleft Sanmayce 2013-Feb-03.
Usage: Kazahana [AtMostLevenshteinDistance] xgram xgramfile
Note1: Incoming xgram could be up to 1008/126 chars for wildcards/Levenshtein respectively.
Note2: Incoming xgramfile could be bigger than 4GB.
Note3: Each line should end with [CR]LF, that is Windows or/and UNIX style.
Note4: Seven wildcards are available:
       wildcard '*' any character(s) or empty,
       wildcard '@'/'#' any character {or empty}/{and not empty},
       wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
       wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
       TO-DO: wildcard '+'/'`' any WORD {or empty}/{and not empty}.
Example1: E:\>Kazahana 0 ramjet MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example2: E:\>Kazahana 3 psychedlicize MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example3: E:\>Kazahana "psyched^^^^^^ize^" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd

E:\_Kaze_Kazahana>TESTbigrams.bat
Note: The output is NOT redirected to RESULTS.TXT, the test takes 6- minutes.


E:\_Kaze_Kazahana>timer Kazahana_r1-_MONAD-Thread_IntelV12.exe 3 00,000,00?_optimized_already 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, an x-gram suggester using wildcards & Levenshtein Distance (Wagner-Fischer), revision 1-, copyleft Sanmayce 2013-Feb-03.
Enforcing MONAD i.e. single-thread ...
Allocating memory 8MB ... OK
/; 00,000,125,805 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 35,116,064/11,432,185/1
Kazahana: Performance: 121 KB/clock
Kazahana: Performance: 4,924 xgrams/clock

Kernel Time  =     0.733 =    9%
User Time    =     6.754 =   90%
Process Time =     7.488 =   99%
Global Time  =     7.492 =  100%

E:\_Kaze_Kazahana>timer Kazahana_r1-_HEXADECAD-Threads_IntelV12.exe 3 00,000,00?_optimized_already 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, an x-gram suggester using wildcards & Levenshtein Distance (Wagner-Fischer), revision 1-, copyleft Sanmayce 2013-Feb-03.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating memory 8MB ... OK
/; 00,000,211,863 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 35,116,064/11,432,185/1
Kazahana: Performance: 205 KB/clock
Kazahana: Performance: 8,303 xgrams/clock

Kernel Time  =     0.826 =   16%
User Time    =     8.392 =  165%
Process Time =     9.219 =  181%
Global Time  =     5.074 =  100%

E:\_Kaze_Kazahana>timer Kazahana_r1-_MONAD-Thread_IntelV12.exe "~,~~~,~~~@^^^^^^^^optimiz^^^^_^^$^^$^^$^^" 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, an x-gram suggester using wildcards & Levenshtein Distance (Wagner-Fischer), revision 1-, copyleft Sanmayce 2013-Feb-03.
Enforcing MONAD i.e. single-thread ...
Allocating memory 8MB ... OK
/; 00,000,003,955 bytes/clock
Kazahana: Total/Dumped xgrams: 35,116,064/1,920
Kazahana: Performance: 3 KB/clock
Kazahana: Performance: 154 xgrams/clock

Kernel Time  =     0.748 =    0%
User Time    =   226.825 =   99%
Process Time =   227.574 =   99%
Global Time  =   227.619 =  100%

E:\_Kaze_Kazahana>timer Kazahana_r1-_HEXADECAD-Threads_IntelV12.exe "~,~~~,~~~@^^^^^^^^optimiz^^^^_^^$^^$^^$^^" 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, an x-gram suggester using wildcards & Levenshtein Distance (Wagner-Fischer), revision 1-, copyleft Sanmayce 2013-Feb-03.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating memory 8MB ... OK
/; 00,000,007,596 bytes/clock
Kazahana: Total/Dumped xgrams: 35,116,064/1,920
Kazahana: Performance: 7 KB/clock
Kazahana: Performance: 297 xgrams/clock

Kernel Time  =     1.279 =    1%
User Time    =   233.923 =  197%
Process Time =   235.202 =  198%
Global Time  =   118.227 =  100%

Comparing files 1 and 2
FC: no differences encountered

Comparing files 3 and 4
FC: no differences encountered

E:\_Kaze_Kazahana>
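
The `omp_get_num_procs( )` and `omp_get_max_threads( )` lines in the output above suggest that the threaded builds rely on OpenMP. Kazahana's source is not shown in this thread, so what follows is only a minimal sketch of how a per-line search could be spread over threads. The helper name `count_hits` and the use of `strstr` are assumptions made for illustration, not the tool's actual code. When compiled without OpenMP support the pragma is ignored and the loop simply runs single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the lines that contain `needle`.
   Every iteration is independent of the others, so the loop can be split
   across threads; the reduction clause sums the per-thread counters. */
long count_hits(const char *const *lines, long n, const char *needle)
{
    long hits = 0;
    assert(lines != NULL && needle != NULL);
    #pragma omp parallel for reduction(+:hits) schedule(static)
    for (long i = 0; i < n; i++) {
        if (strstr(lines[i], needle) != NULL)
            hits++;
    }
    return hits;
}
```

Because the lines are matched independently, the result does not depend on the number of threads, which is in line with the `FC: no differences encountered` checks above.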

The performance on my 'Bonboniera' laptop for the heavily wildcarded example above is 297,000 xgrams (lines) per second, which is 7 MB/s.
The wildcard code loads the CPU very well; my estimate is that a CPU capable of 16 threads would achieve eight times as much, i.e. 8 x 297,000 xgrams/lines per second.
Currently I am using recursive functions. I hate them; iterative, inlined code (with my own simulated stack) should replace them.
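
To make the "simulated stack" idea concrete, here is a minimal sketch of an iterative matcher for the seven wildcards listed in the usage text. It is not Kazahana's code: the names `wild_match`, `struct frame`, `push` and `in_class` are invented for illustration, the match is assumed to be anchored to the whole line, and it is case-sensitive. Each stack entry is a postponed alternative (pattern index, text index) that a recursive version would have explored through a nested call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* One postponed alternative: resume at pattern index pi, text index ti. */
struct frame { size_t pi, ti; };

static void push(struct frame *stack, size_t *top, size_t cap, size_t pi, size_t ti)
{
    assert(*top < cap);   /* depth never exceeds strlen(pat) + strlen(txt) + 1 */
    stack[*top].pi = pi;
    stack[*top].ti = ti;
    (*top)++;
}

/* Does character c belong to the class of single-character wildcard w? */
static int in_class(char w, char c)
{
    int alpha = isalpha((unsigned char)c) != 0;
    switch (w) {
    case '@': case '#': return 1;        /* any character */
    case '^': case '$': return alpha;    /* ALPHA */
    default:            return !alpha;   /* '|' and '~': NON-ALPHA */
    }
}

/* Returns 1 on match, 0 on mismatch, -1 if the stack cannot be allocated. */
int wild_match(const char *pat, const char *txt)
{
    size_t cap = strlen(pat) + strlen(txt) + 2, top = 0;
    struct frame *stack = malloc(cap * sizeof *stack);
    int found = 0;
    if (stack == NULL) return -1;

    push(stack, &top, cap, 0, 0);
    while (top > 0 && !found) {
        size_t pi, ti;
        top--;
        pi = stack[top].pi;
        ti = stack[top].ti;
        for (;;) {
            char w = pat[pi], c = txt[ti];
            if (w == '\0') { found = (c == '\0'); break; }
            if (w == '*') {
                /* Try "empty" now; postpone "swallow one char, stay on '*'". */
                if (c != '\0') push(stack, &top, cap, pi, ti + 1);
                pi++;
            } else if (w == '@' || w == '^' || w == '|') {
                /* Optional class: try "empty" now; postpone "consume one". */
                if (c != '\0' && in_class(w, c)) push(stack, &top, cap, pi + 1, ti + 1);
                pi++;
            } else if (w == '#' || w == '$' || w == '~') {
                if (c == '\0' || !in_class(w, c)) break;   /* dead end */
                pi++; ti++;
            } else {
                if (c != w) break;                         /* literal mismatch */
                pi++; ti++;
            }
        }
    }
    free(stack);
    return found;
}
```

For instance, `wild_match("out^^^^^^^^^^^^^ize*", "outsized")` returns 1 and `wild_match("a#c", "ac")` returns 0. The sketch backtracks naively, so patterns with many optional wildcards can still take exponential time; it only removes the recursion, not the search cost.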

Many thanks go to Igor Pavlov, the 7zip developer.
Edited by Sanmayce - 2/4/13 at 8:37am
post #13 of 32
This has got to be the most random thread about a program ever created... :P
post #14 of 32
Quote:
Originally Posted by SectorNine50

This has got to be the most random thread about a program ever created... :P

Well, it doesn't help that it's basically just an advertising thread by someone who doesn't care about any discussion outside of "wow, you've revolutionised my life". Never mind the fact that command line tools like this are ten a penny - so it's not even as exciting as reinventing the wheel.

I probably should report this thread as I'm fairly sure self promotion like this is against the T&Cs, but Sanmayce, while odd, isn't actually doing any harm.
post #15 of 32
Thread Starter 
@Plan9
Man, you are wrong!
Take a minute and look up who you are talking to.

>... just an advertising thread by someone who doesn't care about any discussion ...
And this is said by a man whom I wanted to help; what ungratefulness.
How can you say such a horrible lie?
I am not advertising, I am sharing a free tool and offering my experience in text processing freely to all.
You could hardly find a person more open to discussion than me, that is the truth!

>I probably should report ...
Report to the NSA if you like, your perception is so distorted.

>... but Sanmayce, while odd, ...
Whether I am odd or even doesn't matter; what matters is giving (for FREE) all text search users a high-performance tool.

Anyway, so far nobody asks me anything or uses Kazahana in heavy tasks; I am still waiting to explore its capabilities in practice.

Last night I added a third feature to Kazahana, namely exact searching, so three types of command-line searches are now available:
- exact matching - superfast and most commonly used;
- wildcard matching - slow but very useful in narrowing the hits;
- fuzzy (Levenshtein) matching - irreplaceable for words/phrases matching within words/phrases.
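
For readers unfamiliar with the fuzzy mode, here is a minimal sketch of the Wagner-Fischer computation that the tool's banner names. It is not Kazahana's implementation: `levenshtein` and `within_distance` are hypothetical names, the comparison is assumed to be query against whole line, and the length pre-check is only a guess at the kind of cheap filter that could explain why the "Checked" counter in the listings can be lower than "Total".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Wagner-Fischer edit distance using two rolling rows instead of the
   full (la+1) x (lb+1) table. Returns (size_t)-1 on allocation failure. */
size_t levenshtein(const char *a, const char *b)
{
    size_t la, lb, i, j, d;
    size_t *prev, *curr, *tmp;
    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    prev = malloc((lb + 1) * sizeof *prev);
    curr = malloc((lb + 1) * sizeof *curr);
    if (prev == NULL || curr == NULL) { free(prev); free(curr); return (size_t)-1; }

    for (j = 0; j <= lb; j++) prev[j] = j;          /* "" -> b[0..j) costs j inserts */
    for (i = 1; i <= la; i++) {
        curr[0] = i;                                /* a[0..i) -> "" costs i deletes */
        for (j = 1; j <= lb; j++) {
            size_t del = prev[j] + 1;
            size_t ins = curr[j - 1] + 1;
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        tmp = prev; prev = curr; curr = tmp;        /* reuse the rows */
    }
    d = prev[lb];
    free(prev);
    free(curr);
    return d;
}

/* Is `line` within max_dist edits of `query`? */
int within_distance(const char *query, const char *line, size_t max_dist)
{
    size_t lq = strlen(query), ll = strlen(line);
    size_t gap = lq > ll ? lq - ll : ll - lq;
    if (gap > max_dist) return 0;   /* the length gap alone already exceeds the budget */
    return levenshtein(query, line) <= max_dist;
}
```

With the misspelled query from Example2, `levenshtein("psychedlicize", "psychedelicize")` is 1 (one missing 'e'), so that word would be reported for any AtMostLevenshteinDistance of 1 or more.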

And I couldn't resist comparing Kazahana revision 1-+ against grep 2.5.4:
// Test on my 'Bonboniera' laptop T7500 2200MHz, 2/2 cores/threads, 2x2GB dual channel DDR2 667MHz, Windows 7 64bit:
Code:
E:\_Kaze_Kazahana>grep\grep.exe -V
GNU grep 2.5.4

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

E:\_Kaze_Kazahana>Kazahana_r1-+_HEXADECAD-Threads_IntelV12
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-+, copyleft Kaze 2013-Feb-06.
Usage: Kazahana [AtMostLevenshteinDistance] string textualfile
Note1: There are three regimes: exact, wildcards and fuzzy searches. First two kick in when 2 parameters are given, fuzzy when 3.
Note2: What decides whether exact or wildcards? Of course presence of at least one wildcard. To see exact search see Example #4.
Note3: Exact search hits with 'Railgun_Quadruplet_7'.
Note4: Incoming string is automatically lowercased for exact and wildcards searches i.e. they both are case insensitive.
Note5: Incoming string could be up to 21168/126 chars for exact&wildcards/Levenshtein respectively.
Note6: Incoming textualfile could be bigger than 4GB.
Note7: Each line should end with [CR]LF, that is Windows or/and UNIX style.
Note8: The dump goes to Kazahana.txt file.
Note9: Seven wildcards are available:
       wildcard '*' any character(s) or empty,
       wildcard '@'/'#' any character {or empty}/{and not empty},
       wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
       wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
Example1: E:\>Kazahana 0 ramjet MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example2: E:\>Kazahana 3 psychedlicize MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example3: E:\>Kazahana "psyched^^^^^^ize^" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example4: E:\>Kazahana "metal fatigue" enwiki-20121201-pages-articles.xml
Example5: E:\>Kazahana "out^^^^^^^^^^^^^ize*" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
          E:\>type Kazahana.txt
          [out^^^^^^^^^^^^^ize*] outhyperbolize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsized /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outstrategize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outtyrannize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/

E:\_Kaze_Kazahana>dir 4andabove_Gamera.tar.2.sorted
 Volume in drive E is SSD_Sanmayce
 Volume Serial Number is 9CF6-FEA3

 Directory of E:\_Kaze_Kazahana

02/07/2013  12:14 AM       889,537,624 4andabove_Gamera.tar.2.sorted
               2 File(s) 43,043,184,331 bytes
               0 Dir(s)  14,405,668,864 bytes free

E:\_Kaze_Kazahana>timer "Kazahana_r1-+_HEXADECAD-Threads_IntelV12.exe" ramjet 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-+, copyleft Kaze 2013-Feb-06.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK
|; 00,000,195,583 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 35,116,064/35,116,064/49
Kazahana: Performance: 189 KB/clock
Kazahana: Performance: 7,653 xgrams/clock
Kazahana: Done.

Kernel Time  =     0.967 =   19%
User Time    =     8.049 =  166%
Process Time =     9.016 =  186%
Global Time  =     4.844 =  100%

E:\_Kaze_Kazahana>timer grep\grep ramjet 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
0,000,083       bussard_ramjet
0,000,051       the_ramjet
0,000,048       the_ramjets
0,000,046       a_ramjet
0,000,031       a_scramjet
0,000,027       the_scramjet
0,000,026       bussard_ramjets
0,000,018       interstellar_ramjet
0,000,014       ramjet_engine
0,000,012       scramjet_powered
0,000,012       ramjet_is
0,000,011       scramjet_engines
0,000,011       scramjet_engine
0,000,011       ramjet_engines
0,000,010       ramjets_were
0,000,010       combustion_ramjet
0,000,009       ramjet_and
0,000,008       ramjet_controls
0,000,007       combustion_ramjets
0,000,006       water_ramjet
0,000,006       scramjet_technology
0,000,006       ramjets_on
0,000,006       ramjet_will
0,000,006       ramjet_speeds
0,000,006       ramjet_ship
0,000,006       ramjet_rocket
0,000,006       ramjet_in
0,000,006       mode_scramjet
0,000,005       scramjets_can
0,000,005       ramjet_to
0,000,005       ramjet_scramjet
0,000,005       ramjet_operation
0,000,005       of_scramjets
0,000,005       of_scramjet
0,000,005       of_ramjets
0,000,005       by_ramjets
0,000,005       and_ramjets
0,000,005       and_ramjet
0,000,004       scramjet_to
0,000,004       scramjet_s
0,000,004       scramjet_is
0,000,004       scramjet_intake
0,000,004       ramjet_was
0,000,004       ramjet_a
0,000,004       raking_ramjets
0,000,004       or_scramjet
0,000,004       expander_ramjets
0,000,004       ejector_ramjet
0,000,004       a_turboramjet

Kernel Time  =     0.483 =    9%
User Time    =     4.368 =   86%
Process Time =     4.851 =   95%
Global Time  =     5.062 =  100%

E:\_Kaze_Kazahana>
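
To put the two runs above on one scale, the timer's "Global Time" can be turned into throughput. This is just arithmetic on numbers taken from the listing (file size 889,537,624 bytes; 4.844 s for Kazahana, 5.062 s for grep); the helper name `mb_per_second` is made up for this sketch, and 1 MB is taken as 1,000,000 bytes.

```c
#include <assert.h>

/* Throughput in MB/s (1 MB = 1,000,000 bytes) from a byte count and the
   wall-clock "Global Time" reported by the timer utility. */
double mb_per_second(unsigned long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}
```

`mb_per_second(889537624ULL, 4.844)` gives roughly 184 MB/s for Kazahana and `mb_per_second(889537624ULL, 5.062)` roughly 176 MB/s for grep, so on this file the two are within about 5% of each other in wall-clock terms.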

And the heaviest test known to me, English Wikipedia:
Code:
E:\_Kaze_Kazahana>timer grep\grep -i -c "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
465

Kernel Time  =    25.084 =   14%
User Time    =    80.527 =   46%
Process Time =   105.612 =   60%
Global Time  =   173.592 =  100%

E:\_Kaze_Kazahana>timer "Kazahana_r1-+_HEXADECAD-Threads_IntelV12.exe" "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-+, copyleft Kaze 2013-Feb-06.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK
-; 00,000,155,903 bytes/clock
Kazahana: Total/Checked/Dumped xgrams: 690,578,792/632,181,373/465
Kazahana: Performance: 152 KB/clock
Kazahana: Performance: 2,553 xgrams/clock
Kazahana: Done.

Kernel Time  =    40.170 =   14%
User Time    =   426.085 =  157%
Process Time =   466.255 =  172%
Global Time  =   270.668 =  100%

E:\_Kaze_Kazahana>timer grep\grep -i -c "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
465

Kernel Time  =    23.930 =   13%
User Time    =    80.356 =   46%
Process Time =   104.286 =   60%
Global Time  =   173.175 =  100%

E:\_Kaze_Kazahana>dir ..\enwiki-20121201-pages-articles.xml
 Volume in drive E is SSD_Sanmayce
 Volume Serial Number is 9CF6-FEA3

 Directory of E:\

01/13/2013  03:53 AM    42,153,646,707 enwiki-20121201-pages-articles.xml
               1 File(s) 42,153,646,707 bytes
               0 Dir(s)  15,219,331,072 bytes free

E:\_Kaze_Kazahana>type Kazahana.txt|more
[metal fatigue] The AK-47's accuracy has always been considered to be &quot;good enough.&quot;&lt;ref&gt;Kalashnikov AK47 By Gideon Burrows&lt;/ref&gt;&lt;ref name=&quot;defenseindustrydaily1&quot;&gt;{{cite web|url=http://www.defenseindustrydaily.com/the-usas-m4-carbine-controversy-03289/ |title=The USA's M4 Carbine Controversy |publisher=Defenseindustrydaily.com |date=21 November 2011 |accessdate=10 January 2012}}&lt;/ref&gt;&lt;ref name=&quot;alpharubicon2&quot;&gt;[http://www.alpharubicon.co
m/leo/akseries.htm Avtomat Kalashnikov]. Alpharubicon.com. Retrieved on 3 April 2012.&lt;/ref&gt; The milled AK-47s are capable of shooting 3–5 inch groups at 100 yards, whereas the stamped AKM's are capable of shooting 4–6 inch groups at 100 yards.&lt;ref name=&quot;alpharubicon2&quot;/&gt; &quot;There are advantages and disadvantages in both forged/milled receivers and stamped receivers. Milled/Forged Receivers are much more rigid, flexing less as the rifle is fired thus not hindering accu
racy as much as stamped receivers. Stamped receivers on the other hand are a bit more rugged since it has some give in it and have less chances of having metal fatigue under heavy usage.&quot;&lt;ref name=&quot;alpharubicon2&quot;/&gt; As a result, the newer stamped steel receiver AKM models are actually less accurate than their predecessors.&lt;ref name=&quot;alpharubicon2&quot;/&gt; /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] American Airlines was under pressure to enter the jet age so they orderd British Built De-haviland Comets. The orders were cancelled when the Comets were discovered to suffer serious metal fatigue. American Airlines introduced transcontinental jet service with [[Boeing 707]]s on January 25, 1959. With its 707s American shifted to nonstop coast-to-coast flights, although it maintained feeder connections to cities along its old route using smaller [[Convair 990]]s and [[Lockheed L-
188 Electra|Lockheed Electras]]. American invested $440&amp;nbsp;million in jet aircraft up to 1962, launched the first electronic booking system ([[Sabre (computer system)|Sabre]]) with [[IBM]] (the basis of today's [[Travelocity]]) and built an upgraded terminal at Idlewild (now [[John F. Kennedy International Airport|JFK]]) Airport in New York City which became the airline's largest base.&lt;ref name=&quot;timejets&quot;&gt;{{cite journal|url=http://www.time.com/time/magazine/article/0,9171,8
10685,00.html|title=Jets Across the U.S.|work=TIME|date=November 17, 1958}}&lt;/ref&gt;  Vignelli Associates designed the AA eagle logo in 1967. Vignelli attributes the introduction of his firm to American Airlines to Henry Dreyfuss, the legendary AA design consultant. The logo is still in use today. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Typically bronze only oxidizes superficially; once a copper oxide (eventually becoming copper carbonate) layer is formed, the underlying metal is protected from further corrosion. However, if copper chlorides are formed, a corrosion-mode called &quot;bronze disease&quot; will eventually completely destroy it.&lt;ref&gt;[http://proteus.brown.edu/greekpast/4867 Bronze Disease, Archaeologies of the Greek Past]. Proteus.brown.edu. Retrieved on 2012-06-09.&lt;/ref&gt; Copper-based [[a
lloy]]s have lower [[melting point]]s than steel or iron, and are more readily produced from their constituent metals. They are generally about 10 percent heavier than steel, although alloys using [[aluminium]] or [[silicon]] may be slightly less dense. Bronzes are softer and weaker than steelΓÇöbronze [[spring (device)|springs]], for example, are less stiff (and so store less energy) for the same bulk. Bronze resists [[corrosion]] (especially seawater corrosion) and [[metal fatigue]] more than
steel and is a better conductor of heat and electricity than most steels. The cost of copper-base alloys is generally higher than that of steels but lower than that of [[nickel]]-base alloys. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Accidents due to design errors included [[TWA Flight 800]], where a 747-100 that exploded in mid-air on July 17, 1996 due to sparking electricity wires inside the fuel tank, led the FAA to propose a rule requiring installation of an [[inerting system]] in the center fuel tank of most large aircraft that was adopted in July 2008, after years of research into solutions. It is expected that the new safety system will cost US$100,000 to $450,000 per aircraft and weigh approximately {
{convert|200|lb|kg}}.&lt;ref&gt;&quot;Airlines Ordered to Cut Fuel-Tank Explosion Risk.&quot; ''Wall Street Journal'', July 17, 2008, p. B5. Note: Cargo aircraft and smaller regional jets and commuter aircraft are not subject to this rule.&lt;/ref&gt; [[El Al Flight 1862]] crashed after the fuse pins of engine 3 broke off shortly after take-off due to metal fatigue. Instead of dropping away from the wing, engine 3 knocked off engine 4 as well as damaging the wing.&lt;ref name=ASN_Bijlmer&gt;[htt
p://aviation-safety.net/database/record.php?id=19921004-2]&lt;/ref&gt; /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Examples are the use of numerical approximations to the [[Navier-Stokes equations]] to describe aerodynamic flow over an aircraft, or the use of [[metal fatigue|Miner's rule]] to calculate fatigue damage. Second, engineering research employs many semi-[[empirical methods]] that are foreign to pure scientific research, one example being the method of parameter variation{{citation needed|date=November 2011}}. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Various traces of other metals change its properties significantly: the addition of small amounts of [[antimony]] or [[copper]] increases hardness and improves the corrosion reflection from [[sulfuric acid]] for lead.{{sfn|Polyanskiy|1986|p=18}} A few other metals also improve only hardness and fight [[metal fatigue]], such as [[cadmium]], [[tin]], or [[tellurium]]; metals like [[sodium]] or [[calcium]] also have this ability, but they weaken the chemical stability.{{sfn|Polyansk
iy|1986|p=18}} Finally, [[zinc]] and [[bismuth]] simply impair the corrosion resistance (0.1% bismuth content  is the industrial usage threshold).{{sfn|Polyanskiy|1986|p=18}} In return, lead impurities mostly worsen the quality of industrial materials, although there are exceptions: for example, small amounts of lead improve the ductility of steel.{{sfn|Polyanskiy|1986|p=18}} /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] In [[industrial engineering|production engineering]], metallurgy is concerned with the production of metallic components for use in consumer or [[engineering]] products. This involves the production of alloys, the shaping, the heat treatment and the surface treatment of the product. The task of the metallurgist is to achieve balance between material properties such as cost, [[weight]], [[tensile strength|strength]], [[toughness]], [[Hardness (materials science)|hardness]], [[corr
osion]], [[fatigue (material)|fatigue]] resistance, and performance in [[temperature]] extremes. To achieve this goal, the operating environment must be carefully considered. In a saltwater environment, ferrous metals and some aluminium alloys corrode quickly. Metals exposed to cold or [[cryogenic]] conditions may endure a ductile to brittle transition and lose their toughness, becoming more brittle and prone to cracking. Metals under continual cyclic loading can suffer from metal fatigue. Metal
s under constant [[stress (physics)|stress]] at elevated temperatures can [[creep (deformation)|creep]]. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Other materials are often added to the iron/carbon mixture to produce steel with desired properties. [[Nickel]] and [[manganese]] in steel add to its tensile strength and make [[austenite]] form of the iron-carbon solution more chemically stable, [[chromium]] increases hardness and melting temperature, and [[vanadium]] also increases hardness while reducing the effects of [[metal fatigue]].&lt;ref name=materialsengineer&gt;{{cite web|title=Alloying of Steels|publisher=Metallurgic
al Consultants|date=2006-06-28|url=http://materialsengineer.com/E-Alloying-Steels.htm|accessdate=2007-02-28}}&lt;/ref&gt; /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] For large scale complex systems, hundreds if not thousands of maintenance actions can result from the failure analysis. These maintenance actions are based on conditions (e.g., gauge reading or leaky valve), hard conditions (e.g., a component is known to fail after 100 hrs of operation with 95% certainty), or require inspection to determine the maintenance action (e.g., metal fatigue). The RCM concept then analyzes each individual maintenance item for its risk contribution to saf
ety, mission, operational readiness, or cost to repair if a failure does occur. Then the sum total of all the maintenance actions are bundled into maintenance intervals so that maintenance is not occurring around the clock, but rather, at regular intervals. This bundling process introduces further complexity, as it might stretch some maintenance cycles, thereby increasing risk, but reduce others, thereby potentially reducing risk, with the end result being a comprehensive maintenance schedule, p
urpose built to reduce operational risk and ensure acceptable levels of operational readiness and availability. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Stranded wire is more flexible than solid wire of the same total cross-sectional area. Solid wire is cheaper to manufacture than stranded wire and is used where there is little need for flexibility in the wire. Solid wire also provides mechanical ruggedness; and, because it has relatively less surface area which is exposed to attack by corrosives, protection against the environment. Stranded wire is used when higher resistance to [[metal fatigue]] is required. Such situations inc
lude /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] * [[January 10]] -  [[British Overseas Airways Corporation]] (BOAC) [[BOAC Flight 781|Flight 781]], a [[de Havilland Comet]] jet plane, disintegrates in mid-air due to [[metal fatigue]] and crashes in the [[Mediterranean Sea]] near [[Elba]].  All 35 people on board are killed. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] The first [[jet airliner]]s came in the immediate post war era. [[Turbojet]] engines were trialled on [[Reciprocating engine|piston engine]] airframes, such as the [[Avro Lancastrian]] and the [[Vickers VC.1 Viking]], the latter becoming the first jet engine passenger aircraft in April 1948. The first purpose built jet airliners were the [[de Havilland Comet]] (UK) and the [[Avro Jetliner]] (Canada). The former entered production and service while the latter did not. The Comet wa
s unfortunate in that metal fatigue caused by the square shape of the windows in early versions could cause crashes. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] What may end an airliner's working life is a lack of spare parts, as the original manufacturer and third manufacturers may no longer provide or support them. [[Corrosion]] and [[metal fatigue]] are other issues that become more expensive to deal with as time goes on. Eventually, these factors and advances in aircraft technology lead to older airliners becoming too expensive or inefficient to operate. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] On the road, drilled or slotted discs still have a positive effect in wet conditions because the holes or slots prevent a film of water building up between the disc and the pads. Cross-drilled discs may eventually crack at the holes due to metal fatigue. Cross-drilled brakes that are manufactured poorly or subjected to high stresses will crack much sooner and more severely. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] *:The planet lights must be powered by wires, which have to bend about as the planets rotate, and repeatedly bending copper wire tends to cause wire breakage through [[metal fatigue]]. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] A major problem, afflicting early production Typhoons in particular, was a series of structural failures leading to loss of the entire tail sections of some aircraft, mainly during high-speed dives. Eventually a combination of factors was identified, including harmonic vibration, which could quickly lead to metal fatigue, and a weak transport joint just forward of the horizontal tail unit. The loss of the tailplane of R7692 (having only 11 hours of flight recorded) on 11 August 1
942, in the hands of an experienced test pilot (Seth-Smith), caused a major reassessment which concluded that the failure of the bracket holding the elevator mass balance [[Bellcrank|bell crank]] linkage had allowed unrestrained flutter which led to structural failure of the fuselage at the transport joint.   /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] A few games have experimented with diversifying map design, which continues to be largely two-dimensional even in 3D engines. ''[[Earth 2150]]'' allowed units to tunnel underground, effectively creating a dual-layer map; three-layer (orbit-surface-underground) maps were introduced in ''[[Metal Fatigue (video game)|Metal Fatigue]]''. In addition, units could even be transported to entirely separate maps, with each map having its own window in the user interface. ''[[Three Kingdoms
: Fate of the Dragon]]'' (2001) offered a simpler model: the main map contains locations that expand into their own maps. In these examples, however, gameplay was essentially identical regardless of the map layer in question. ''[[Dungeons &amp; Dragons: Dragonshard|Dragonshard]]'' (2005) emphasized its dual-layer maps by placing one of the game's two main resources in each map, making exploration and control of both maps fundamentally valuable. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] * ''[[No Highway]]'' (1948): An eccentric &quot;boffin&quot; at [[Royal Aircraft Establishment|RAE Farnborough]] predicts [[metal fatigue]] in a new airliner. Interestingly, the [[de Havilland Comet|Comet]] failed for just this reason several years later, in 1954. Set in Britain and Canada. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] The Lorentz forces increase with ''B&lt;sup&gt;2&lt;/sup&gt;''.   In large electromagnets the windings must be firmly clamped in place, to prevent motion on power-up and power-down from causing [[metal fatigue]] in the windings.  In the [[Bitter electromagnet|Bitter]] design, below, used in very high field research magnets, the windings are constructed as flat disks to resist the radial forces, and clamped in an axial direction to resist the axial ones. /..\enwiki-20121201-pages-
articles.xml/
[metal fatigue] On Monday 30 May 2011 a train on the line suffered a [[derailment]] at Brampton, during which wheels from one of the coaches were reported to have come up through the floor of the vehicle.&lt;ref&gt;[http://www.edp24.co.uk/news/train_derails_on_bure_valley_railway_1_907256 Train derails on Bure Valley Railway]&lt;/ref&gt;  The [[Rail Accident Investigation Branch]] were called in to conduct a preliminary examination into the incident,&lt;ref&gt;[http://www.edp24.co.uk/news/update
_bure_valley_railway_crash_photo_shows_narrow_escape_1_908565 UPDATE: Bure Valley Railway crash photo shows narrow escape]&lt;/ref&gt;&lt;ref&gt;[http://www.northnorfolknews.co.uk/news/update_investigators_due_at_scene_of_north_norfolk_rail_crash_1_907490 UPDATE: Investigators due at scene of north Norfolk rail crash]&lt;/ref&gt; and found it to have been caused by the failure due to metal fatigue of an axle journal that had been welded several years previously (when the railway was under differ
ent management). Following this accident all wheels of this design were identified by the railway and scrapped, being replaced by new wheelsets. &lt;ref&gt;[http://www.raib.gov.uk/publications/bulletins/bulletins_2011/bulletin_04_2011.cfm Rail Accident Investigation Branch bulletin 04/2011]&lt;/ref&gt; /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] The [[ICE 1]] trains were equipped with single-cast wheels, known as [[monobloc]] wheels. Once in service it soon became apparent that this design could, as a result of [[metal fatigue]] and out-of-round conditions, result in [[resonance]] and vibration at cruising speed. Passengers noticed this particularly in the restaurant car, where there were reports of loud vibrations in the dinnerware and of glasses &quot;creeping&quot; across tables. /..\enwiki-20121201-pages-articles.xml
/
[metal fatigue] About the time of the disaster, the engineers at Deutsche Bahn's maintenance facility in Munich used only standard flashlights for visual inspection of the tires, instead of metal fatigue detection equipment.&lt;ref&gt;090120 NGC Seconds from the catastrophe&lt;/ref&gt; Previously, advanced testing machines had been used; however, as the equipment generated many [[false positive]] error messages, it was considered unreliable and its use was discontinued. During the week prior to
the Eschede disaster, three separate automated checks indicated that a wheel was defective. Investigators discovered, from a maintenance report generated by the train's on-board computer, that two months prior to the Eschede disaster, conductors and other train staff filed eight separate complaints about the noises and vibrations generated from the [[bogie]] with the defective wheel; the company did not replace the wheel. Deutsche Bahn said that its inspections were proper at the time and that t
he engineers could not have predicted the wheel fracture.&lt;ref name=&quot;Seconds&quot;/&gt; /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] * [[Fatigue (material)|Metal fatigue]] /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] After World War II, [[commercial aviation]] grew rapidly, using mostly ex-military aircraft to transport people and cargo. This growth was accelerated by the glut of heavy and super-heavy bomber airframes like the B-29 and [[Avro Lancaster|Lancaster]] that could be converted into commercial aircraft.  The [[DC-3]] also made for easier and longer commercial flights. The first commercial jet airliner to fly was the British [[de Havilland Comet]]. By 1952, the British state airline
[[British Overseas Airways Corporation|BOAC]] had introduced the Comet into scheduled service. While a technical achievement, the plane suffered a series of highly public failures, as the shape of the windows led to cracks due to metal fatigue. The fatigue was caused by cycles of pressurization and depressurization of the cabin, and eventually led to catastrophic failure of the plane's fuselage. By the time the problems were overcome, other jet airliner designs had already taken to the skies. /.
.\enwiki-20121201-pages-articles.xml/
[metal fatigue] A year after entering commercial service, Comet [[airframe]]s began suffering catastrophic [[fatigue (material)|metal fatigue]], with three of them tearing apart during mid-flight in well-publicised accidents. The Comet was withdrawn from service and extensively tested to discover the cause; the first incident had been incorrectly blamed on adverse weather. Design flaws including window shape and installation methodology were ultimately identified; consequently the Comet was exte
nsively redesigned with oval windows, structural reinforcement and other changes. Rival manufacturers meanwhile heeded the lessons learned from the Comet while developing their own aircraft. /..\enwiki-20121201-pages-articles.xml/
[metal fatigue] Because the Comet represented a new category of passenger aircraft, more rigorous testing was a development priority.&lt;ref name=d17/&gt; From 1947 to 1948, de Havilland conducted an extensive research and development phase, including the use of several stress test rigs at Hatfield for small components and large assemblies alike. Sections of pressurised fuselage were subjected to high-altitude flight conditions via a large [[Pressure vessel|decompression chamber]] on-site,{{#tag
:ref|The fuselage sections and nose simulated a flight up to 70,000 ft at a temperature of -70˚C, with 2,000 lb pressure applications at 9 lb pressure/square in.&lt;ref name=&quot;Birtles p. 125&quot;&gt;Birtles 1970, p. 125.&lt;/ref&gt;|group=N}} and tested to failure.&lt;ref name=d18/&gt; However, tracing fuselage failure points proved difficult with this method,&lt;ref name=d18/&gt; and de Havilland ultimately switched to conducting structural tests with a water tank that could be safely con
figured to increase pressures gradually.&lt;ref name=&quot;Birtles p. 125&quot;/&gt;&lt;ref name=d18/&gt;&lt;ref&gt;[http://www.flightglobal.com/pdfarchive/view/1955/1955%20-%201835.html &quot;Tank Test Mk 2.&quot;] ''Flight,'' 1955, pp. 958ΓÇô959. Retrieved 26 April 2012.&lt;/ref&gt; The entire forward fuselage section was tested for metal fatigue by repeatedly pressurising to {{convert|2.75|psi|kPa}} overpressure and depressurising through more than 16,000 cycles, equivalent to about 40,000&am
p;nbsp;hours of airline service.&lt;ref name=&quot;Davies and Birtles&quot;&gt;Davies and Birtles 1999, p. 30.&lt;/ref&gt;  The windows were also tested under a pressure of {{convert|12|psi|kPa|abbr=on}}, {{convert|4.75|psi|kPa|abbr=on}} above expected pressures at the normal service ceiling of {{convert|36000|ft|m|abbr=on}}.&lt;ref name=&quot;Davies and Birtles&quot;/&gt; One window frame survived {{convert|100|psi|kPa|abbr=on}}, about 1,250 percent over the maximum pressure it was expected to
encounter in service.&lt;ref name=&quot;Davies and Birtles&quot;/&gt; /..\enwiki-20121201-pages-articles.xml/
...

The above example is reproducible with the current revision, here.
post #16 of 32
Thread Starter 
While searching the Japanese net for nuances of Kazahana, one very cuore doll named 風花 Kazahana popped up:


The Japanese doll masters amaze me endlessly; this beauty is made by the 'Rose Moon' figurine workshop.
Thus far I have found that "Kazahana" literally means "snowflake", while ISO8 (the train enthusiast from Japan) gives another meaning: the phenomenon of snow falling on a sunny day. As I see it, it is used to denote purity, or as SOED defines it:
The state of being morally or spiritually pure; freedom from moral or ritual pollution; chastity; an instance of this. ME.
post #17 of 32
Thread Starter 
Brace yourself for the hit of 16 railguns in a second.

Since exact matching was not supposed to be one of the fuzzy guns, it was invoked just like the other two hitters (wildcards & fuzzy), once for EACH LINE, which is very slow and not recommended when high speed matters. I did it that way to see how it would behave, just out of curiosity.
But my laptop's 2 threads can do better, so revision 1-++ showcases the full might of my top-gun text hitter, Railgun_Quadruplet_7Gulliver: the fastest text search function known to me, and multi-threaded on top of that.
I wrote it as a mix of Boyer-Moore-Horspool-Sunday order 2 and some other nifty micro etudes.
This resulted in the first text searcher on the Internet to utilize the I/O read bandwidth to its fullest!
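For readers who have not met the Boyer-Moore-Horspool family, here is a minimal sketch of the plain (order 1) Horspool search. It is NOT the Railgun_Quadruplet_7Gulliver source, which is not shown in this thread; the name horspool_find and its signature are made up for illustration only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper, not the author's Railgun code: textbook
   Boyer-Moore-Horspool. Returns a pointer to the first occurrence of
   needle in haystack, or NULL if there is none. */
const unsigned char *horspool_find(const unsigned char *haystack, size_t hlen,
                                   const unsigned char *needle, size_t nlen)
{
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(haystack != NULL && needle != NULL);
    if (nlen == 0) return haystack;   /* an empty needle matches at once */
    if (nlen > hlen) return NULL;

    /* Default shift: the whole needle length. */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = nlen;
    /* Bytes that occur in the needle (last byte excluded) shift less. */
    for (i = 0; i + 1 < nlen; i++) shift[needle[i]] = nlen - 1 - i;

    pos = 0;
    while (pos <= hlen - nlen) {
        /* Compare right to left inside the current window. */
        size_t j = nlen;
        while (j > 0 && haystack[pos + j - 1] == needle[j - 1]) j--;
        if (j == 0) return haystack + pos;
        /* Skip ahead by the shift of the window's last byte. */
        pos += shift[haystack[pos + nlen - 1]];
    }
    return NULL;
}
```

The skip table is what lets the search jump over most of the text instead of touching every byte; the "order 2" variants named above index the table with two bytes instead of one.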

To be more precise, the upper theoretical speed limit is 16 threads * 3 GB/s = 48 GB/s; those 3 GB/s are nominal for 'Railgun_Quadruplet_7Gulliver' on my laptop. Of course, in reality, my estimate is 8-20 GB/s.
Because of this monstrous bandwidth, Kazahana proves to be a typhoon-class tool.

For those who don't believe it, I suggest trying it on your superfast I/O systems. Nowadays mainstream drives offer a "miserable" 512MB/s, so it would be very interesting if those with 1++GB/s shared their results.
Still a disbeliever? Enter the kitchen and see how I cooked it (with tons of examples & tests): Fastest strstr-like Function in C!?



For example, my humble laptop is equipped with a Samsung 470 64GB, which gives an average linear read of 241MB/s at 1MB blocks, as measured with 'Everest'.

From the Wikipedia torture below you can see how full my fullest is: Kazahana reports 238 KB/clock, i.e. about 232MB/s, so (241-232)/232*100% = 3.9% deviation:
Code:
E:\_Kaze_Kazahana>timer "Kazahana_r1-++_HEXADECAD-Threads_IntelV12.exe" "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++, copyleft Kaze 2013-Feb-09.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK
-; 00,000,244,543 bytes/clock
Kazahana: Dumped xgrams: 329
Kazahana: Performance: 238 KB/clock
Kazahana: Done.

Kernel Time  =    58.391 =   33%
User Time    =   148.902 =   86%
Process Time =   207.294 =  119%
Global Time  =   172.780 =  100%

E:\_Kaze_Kazahana>timer grep\grep -c "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
329

Kernel Time  =    24.148 =   13%
User Time    =    78.718 =   44%
Process Time =   102.867 =   58%
Global Time  =   175.565 =  100%

E:\_Kaze_Kazahana>timer "Kazahana_r1-++_HEXADECAD-Threads_IntelV12.exe" "metal fatigue" ..\enwiki-20121201-pages-articles.xml
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++, copyleft Kaze 2013-Feb-09.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK
-; 00,000,244,102 bytes/clock
Kazahana: Dumped xgrams: 329
Kazahana: Performance: 238 KB/clock
Kazahana: Done.

Kernel Time  =    59.108 =   34%
User Time    =   143.224 =   82%
Process Time =   202.333 =  116%
Global Time  =   173.508 =  100%

E:\_Kaze_Kazahana>"Kazahana_r1-++_HEXADECAD-Threads_IntelV12.exe"
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++, copyleft Kaze 2013-Feb-09.
Usage: Kazahana [AtMostLevenshteinDistance] string textualfile
Note1: There are three regimes: exact, wildcards and fuzzy searches. First two kick in when 2 parameters are given, fuzzy when 3.
Note2: What decides whether exact or wildcards? Of course presence of at least one wildcard. To see exact search see Example #4.
Note3: Exact search hits with 'Railgun_Quadruplet_7Gulliver'.
Note4: Incoming string is automatically lowercased for wildcards searches i.e. they are case insensitive.
Note5: Incoming string could be up to 21168/126 chars for exact&wildcards/Levenshtein respectively.
Note6: Incoming textualfile could be bigger than 4GB.
Note7: Each line should end with [CR]LF, that is Windows or/and UNIX style.
Note8: The dump goes to Kazahana.txt file.
Note9: Seven wildcards are available:
       wildcard '*' any character(s) or empty,
       wildcard '@'/'#' any character {or empty}/{and not empty},
       wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
       wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
Example1: E:\>Kazahana 0 ramjet MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example2: E:\>Kazahana 3 psychedlicize MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example3: E:\>Kazahana "psyched^^^^^^ize^" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example4: E:\>Kazahana "metal fatigue" enwiki-20121201-pages-articles.xml
Example5: E:\>Kazahana "out^^^^^^^^^^^^^ize*" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
          E:\>type Kazahana.txt
          [out^^^^^^^^^^^^^ize*] outhyperbolize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsized /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outstrategize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outtyrannize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/

E:\_Kaze_Kazahana>
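
The banner names Levenshtein Distance (Wagner-Fischer) as the fuzzy engine. Below is a minimal two-row sketch of that textbook algorithm, assuming whole strings are compared and reusing the 126-char limit from Note5; the names levenshtein and within_distance are hypothetical, and how Kazahana itself applies the distance to each line is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 126  /* Note5 gives 126 chars as the limit for fuzzy searches */

/* Hypothetical helper: Wagner-Fischer edit distance kept in two rows.
   Both strings must be at most MAX_LEN bytes long. */
size_t levenshtein(const char *a, const char *b)
{
    size_t prev[MAX_LEN + 1], curr[MAX_LEN + 1];
    size_t alen = strlen(a), blen = strlen(b);
    size_t i, j;

    assert(alen <= MAX_LEN && blen <= MAX_LEN);

    /* Row 0: turning an empty prefix of a into b[0..j) takes j insertions. */
    for (j = 0; j <= blen; j++) prev[j] = j;

    for (i = 1; i <= alen; i++) {
        curr[0] = i;                       /* i deletions reach an empty b */
        for (j = 1; j <= blen; j++) {
            size_t del = prev[j] + 1;
            size_t ins = curr[j - 1] + 1;
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t best = del < ins ? del : ins;
            curr[j] = best < sub ? best : sub;
        }
        memcpy(prev, curr, (blen + 1) * sizeof prev[0]);
    }
    return prev[blen];
}

/* Mirrors the AtMostLevenshteinDistance parameter from the usage text:
   non-zero when word is within at_most edits of pattern. */
int within_distance(const char *pattern, const char *word, size_t at_most)
{
    return levenshtein(pattern, word) <= at_most;
}
```

With Example2 in mind, within_distance("psychedlicize", "psychedelicize", 3) is true, since one inserted 'e' separates the two strings.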

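The seven wildcards from Note9 can be read as a small recursive matcher. The sketch below is only my reading of the usage text, assuming the pattern must cover the whole line; wild_match is a hypothetical name, the lowercasing from Note4 is left out, and this is not Kazahana's own code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static int is_alpha(char c) { return isalpha((unsigned char)c) != 0; }

/* Hypothetical helper: returns 1 when pattern p matches ALL of string s.
   Plain backtracking, so the worst case grows quickly with the number
   of optional wildcards; acceptable for short words. */
int wild_match(const char *p, const char *s)
{
    assert(p != NULL && s != NULL);
    switch (*p) {
    case '\0':                      /* pattern used up: s must be too */
        return *s == '\0';
    case '*':                       /* any character(s) or empty */
        return wild_match(p + 1, s) ||
               (*s != '\0' && wild_match(p, s + 1));
    case '@':                       /* any character or empty */
        return wild_match(p + 1, s) ||
               (*s != '\0' && wild_match(p + 1, s + 1));
    case '#':                       /* any character, not empty */
        return *s != '\0' && wild_match(p + 1, s + 1);
    case '^':                       /* any ALPHA character or empty */
        return wild_match(p + 1, s) ||
               (*s != '\0' && is_alpha(*s) && wild_match(p + 1, s + 1));
    case '$':                       /* any ALPHA character, not empty */
        return *s != '\0' && is_alpha(*s) && wild_match(p + 1, s + 1);
    case '|':                       /* any NON-ALPHA character or empty */
        return wild_match(p + 1, s) ||
               (*s != '\0' && !is_alpha(*s) && wild_match(p + 1, s + 1));
    case '~':                       /* any NON-ALPHA character, not empty */
        return *s != '\0' && !is_alpha(*s) && wild_match(p + 1, s + 1);
    default:                        /* literal character */
        return *p == *s && wild_match(p + 1, s + 1);
    }
}
```

Under this reading, the pattern from Example5 accepts "outsized" because '*' swallows the trailing 'd', while each '^' may or may not consume a letter.
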
And to behold the real hit of only 2 railguns:
Code:
E:\_Kaze_Kazahana>timer Kazahana_r1-+_MONAD-Thread_IntelV12 ramjet 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++, copyleft Kaze 2013-Feb-09.
Enforcing MONAD i.e. single-thread ...
Allocating Master-Buffer 7MB ... OK
|; 00,000,639,411 bytes/clock
Kazahana: Dumped xgrams: 49
Kazahana: Performance: 625 KB/clock
Kazahana: Done.

Kernel Time  =     0.795 =   33%
User Time    =     1.513 =   63%
Process Time =     2.308 =   96%
Global Time  =     2.381 =  100%

E:\_Kaze_Kazahana>timer Kazahana_r1-+_HEXADECAD-Threads_IntelV12 ramjet 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++, copyleft Kaze 2013-Feb-09.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK
|; 00,000,729,181 bytes/clock
Kazahana: Dumped xgrams: 49
Kazahana: Performance: 703 KB/clock
Kazahana: Done.

Kernel Time  =     0.904 =   51%
User Time    =     1.778 =  100%
Process Time =     2.683 =  151%
Global Time  =     1.771 =  100%

E:\_Kaze_Kazahana>timer grep\grep ramjet 4andabove_Gamera.tar.2.sorted
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31
0,000,083       bussard_ramjet
0,000,051       the_ramjet
0,000,048       the_ramjets
0,000,046       a_ramjet
0,000,031       a_scramjet
0,000,027       the_scramjet
0,000,026       bussard_ramjets
0,000,018       interstellar_ramjet
0,000,014       ramjet_engine
0,000,012       scramjet_powered
0,000,012       ramjet_is
0,000,011       scramjet_engines
0,000,011       scramjet_engine
0,000,011       ramjet_engines
0,000,010       ramjets_were
0,000,010       combustion_ramjet
0,000,009       ramjet_and
0,000,008       ramjet_controls
0,000,007       combustion_ramjets
0,000,006       water_ramjet
0,000,006       scramjet_technology
0,000,006       ramjets_on
0,000,006       ramjet_will
0,000,006       ramjet_speeds
0,000,006       ramjet_ship
0,000,006       ramjet_rocket
0,000,006       ramjet_in
0,000,006       mode_scramjet
0,000,005       scramjets_can
0,000,005       ramjet_to
0,000,005       ramjet_scramjet
0,000,005       ramjet_operation
0,000,005       of_scramjets
0,000,005       of_scramjet
0,000,005       of_ramjets
0,000,005       by_ramjets
0,000,005       and_ramjets
0,000,005       and_ramjet
0,000,004       scramjet_to
0,000,004       scramjet_s
0,000,004       scramjet_is
0,000,004       scramjet_intake
0,000,004       ramjet_was
0,000,004       ramjet_a
0,000,004       raking_ramjets
0,000,004       or_scramjet
0,000,004       expander_ramjets
0,000,004       ejector_ramjet
0,000,004       a_turboramjet

Kernel Time  =     0.546 =   10%
User Time    =     4.258 =   82%
Process Time =     4.804 =   93%
Global Time  =     5.138 =  100%

E:\_Kaze_Kazahana>

The first executable is single-threaded and delivers 610MB/s, while the second delivers 686MB/s with Global Time = 1.771 seconds; 'grep', with its 5.138 seconds, takes about 2.9 times as long, i.e. it is roughly 190% slower!
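
The MB/s figures follow from the 'KB/clock' lines in the dumps. Here is a small sketch of that arithmetic, assuming one clock is one millisecond and 1 MB = 1024 KB (an assumption that reproduces the 610 and 686 figures); both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: converts a "KB/clock" figure to MB/s,
   assuming 1 clock = 1 millisecond and 1 MB = 1024 KB. */
double kb_per_clock_to_mb_per_s(double kb_per_clock)
{
    assert(kb_per_clock >= 0.0);
    return kb_per_clock * 1000.0 / 1024.0;
}

/* Hypothetical helper: how many times longer the slow run took. */
double slowdown_factor(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

For instance, 625 KB/clock gives 625 * 1000 / 1024, about 610 MB/s, and slowdown_factor(5.138, 1.771) is about 2.9.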

Update, 2013-Feb-11: I am sorry, I overlooked the condition of exceeding the 21168-char limit when dumping the results in the exact matching part; it is now fixed and ready to hit without this bug:
Free as always: Kazahana_r1-++.zip

And one fantastic song (two cuore variants) that inspired me immensely sung by GHELICKHANI and Rewound by Voidd.

Lyrics
Thievery Corporation - 'Une simple histoire' / 'A Simple Story'
Songwriters: ROB GARZA, ERIC HILTON, LOU LOU GHELICKHANI
http://www.youtube.com/watch?v=0d8_OHoVrQ0

Couleur de vie, couleur de joie / Colour of life, colour of joy
Où sont tes larmes, tous tes mots / Where are your tears, all your words
Et puisque t'en as marre de voir / And since you're so tired of seeing
Tous les gens souffrir / All the people suffer
Cette notion trop simple et naïve / This simple, naive idea
Parvenir à vaincre une loi / Will manage to overcome a law
Loi invisible, si injuste / An invisible law, so unjust
Si injuste, si sombre / So unjust, so dark

Vis ta vie, elle est si belle / Live your life, it is so beautiful
Vis ta vie, c'est la tienne / Live your life, it is yours
Vis ta vie, sans mensonge / Live your life without lies
Vis ta vie, comme tu veux / Live your life however you want to

Couleur de vie, couleur de joie / Colour of life, colour of joy
Où est ce feu philosophique / Where is that philosophical spark
Ton esprit condensé, pulverisé / Your spirit dragged down, pulverized,
Subtilé par toutes tes larmes / Subjugated by your tears

Vis ta vie, elle est si belle / Live your life, it is so beautiful
Vis ta vie, c'est la tienne / Live your life, it is yours
Vis ta vie, sans mensonge / Live your life without lies
Vis ta vie, comme tu veux / Live your life however you want to


Enjoy!
Edited by Sanmayce - 2/11/13 at 8:51am
post #18 of 32
Very interesting, that's exactly what I'm looking for. Thanks!
Good name, pictures and ideas! Don't take the idiot opinions seriously.

I'm curious how it performs in Linux and will be happy to speed up my tasks related to the searching.
Do you have some info about that or a compiled version?

I'm a Linux sysadmin, and for me any tool that offers better performance and speed deserves special attention.
post #19 of 32
Quote:
Originally Posted by duhai

Very interesting, that's exactly what I'm looking for. Thanks!
Good name, pictures and ideas! Don't take the idiot opinions seriously.

I'm curious how it performs in Linux and will be happy to speed up my tasks related to the searching.
Do you have some info about that or a compiled version?

I'm a Linux sysadmin, and for me any tool that offers better performance and speed deserves special attention.

I'd watch who you're calling an idiot there.

I'm a Linux and UNIX sys admin too. I've never hit a bottleneck with grep. Not even when searching through entire file systems or hundreds of thousands of lines of text.

So maybe you're just doing it wrong.
Edited by Plan9 - 2/11/13 at 7:19am
post #20 of 32
Thread Starter 
How embarrassing: I overlooked the condition of exceeding the 21168-char limit when dumping the results in the exact matching part. It is now fixed and ready to hit without this bug; the .ZIP has been updated and everything is OK.

>I'm curious how it performs in Linux and will be happy to speed up my tasks related to the searching.
Eventually the source will be put in the public domain; my intention (as always) is for other people to improve/adjust it. Two things stop me for the moment:
- I want to beautify some fragments, especially the recursion, which should be exterminated;
- For some reason (of a mystical nature, he-he) I am too emotional and get easily offended when I share a source etude of mine and the reactions are as if I were a criminal with no right to touch the subject: only pretensions & disrespect, not a simple 'thanks/gracias/merci/mashallah'. By the way, the latter is a very good one; it literally means 'Whatever Allah (God) wills' (ماشاءالله). It is often used on occasions where there is surprise at someone's good deeds or achievements. For example, people say 'mashallah' when someone does very well in their exams.

>Do you have some info about that or a compiled version?
I don't have any info yet. PTHREADS (I am too lazy to read how they work) are native to *nix, while I use OpenMP (easy to use, though there are a few basic things about it I do not understand). I have heard (sadly I am not on Linux; many basic things there are also beyond my grasp) that OpenMP support is already available in gcc. As soon as the ELF is ready I will post it here.
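
For anyone who wants to try the OpenMP side on Linux in the meantime, here is a minimal sketch of splitting one in-memory buffer among threads and summing the hit counts. It assumes a naive memcmp scan in place of the Railgun function, and the chunking scheme and both function names are hypothetical; whether Kazahana divides its Master-Buffer this way is not stated in this thread. With gcc the pragma is enabled by -fopenmp; without that flag the loop simply runs on one thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts occurrences of needle that START inside [begin, end).
   The comparison may read past 'end', but never past 'len', so a hit
   straddling a chunk border is counted exactly once. */
static long count_in_range(const char *buf, size_t len,
                           const char *needle, size_t nlen,
                           size_t begin, size_t end)
{
    long hits = 0;
    size_t pos;
    for (pos = begin; pos < end && pos + nlen <= len; pos++)
        if (memcmp(buf + pos, needle, nlen) == 0) hits++;
    return hits;
}

/* Hypothetical helper: splits buf into 'chunks' ranges and searches them
   in parallel. Overlapping occurrences are all counted. The products
   len * c are not guarded against overflow; fine for a sketch. */
long parallel_count(const char *buf, size_t len,
                    const char *needle, size_t nlen, int chunks)
{
    long total = 0;
    int c;

    assert(chunks > 0 && nlen > 0);
#ifdef _OPENMP
    #pragma omp parallel for reduction(+:total)
#endif
    for (c = 0; c < chunks; c++) {
        size_t begin = len * (size_t)c / (size_t)chunks;
        size_t end   = len * ((size_t)c + 1) / (size_t)chunks;
        total += count_in_range(buf, len, needle, nlen, begin, end);
    }
    return total;
}
```

The reduction clause gives every thread its own running total and adds them up at the end, so no locking is needed around the counter.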

As for 'grep', there are crafty programmers who juggle multi-threaded I/O; they should make it pass into the next century.
And one more thing: don't take me too seriously, because I know what programming is and for that very reason I am afraid to call myself a programmer. That is, take my tools and enjoy them as a gift.