
·
Registered
Joined
·
26 Posts
@Plan9
Your post #19 is completely off topic, do you realize that?


Quote:
I'd watch who you're calling an idiot there.
If you think that you are an idiot, it's your right to think whatever you want.

Quote:
I'm a Linux and UNIX sys admin too.
I'm completely sure that you are a SysAdmin who is dedicated to his work, with analytical thinking and a positive attitude towards the users. That's why I'm delighted when such an experienced Linux & Unix SysAdmin, with more than 3000 posts in a year, is talking to me.

Quote:
I've never hit a bottleneck with grep. Not even when searching through entire file systems or hundreds of thousands of lines of text.
Really? I never knew that! You are my star, with your wisdom & knowledge!

Quote:
So maybe you're just doing it wrong
Of course I'm wrong. Thanks for your time! It was a pleasure for me to be your user.
 

·
Premium Member
Joined
·
8,041 Posts
Quote:
Originally Posted by duhai View Post

@Plan9
Your post #19 is completely off topic, do you realize that?
Then so was your comment that I was replying to. But then 99% of this thread has been off topic anyway - even without my contributions.
Quote:
Originally Posted by duhai View Post

If you think that you are an idiot, it's your right to think whatever you want.
troll

Quote:
Originally Posted by duhai View Post

I'm completely sure that you are a SysAdmin who is dedicated to his work, with analytical thinking and a positive attitude towards the users. That's why I'm delighted when such an experienced Linux & Unix SysAdmin, with more than 3000 posts in a year, is talking to me.
I don't really see what a post count has to do with anything.

Even so, I hadn't realised I spent nearly that much time on here. I really need to spend less time on forums.
 

·
Registered
Joined
·
26 Posts
@Sanmayce
Thanks for the reply.

Quote:
I want to beautify some of fragments especially the recursion - it should be exterminated;
I agree completely. Recursion should be left to the books.

Quote:
For a reason (of mystical nature, he-he) I am too emotional and I get easily offended when a source etude of mine is shared and the reactions are as if I am a criminal who has no right to touch the subject, only pretensions&disrespect,..
I saw that you are very sensitive & emotional, and it would be a pity if you changed your intention to share because of one non-creative or toxic person. I hope that is not the case.
It's awful to see an empty, poor soul that doesn't care about any opinion except its own limited vision, without any sense of shame or any idea for something better. I'm surprised how persistent stupidity can be.

Quote:
I don't have any info yet, PTHREADS (I am too lazy to read how they work) are native for *nix while I use OPEN MP (even though easy to use, a few basic things I do not understand), I heard (sadly I am not in Linux, many basic things there are also beyond my grasp) OPEN MP support is available already in gcc. As soon as the elf is ready I will post here.
Most of the best servers, and the Linux kernel too, are written in C and are open source. I'm telling you all of this because I've spent some time with Kazahana and there are things to be fixed. I fixed two of them, and the compilation with gcc 4.7.2 & OpenMP was successful. I prefer Linux pthreads over Intel's OpenMP. I'm afraid that I'll change a part of the source code to make Kazahana portable and adjusted to my needs. BTW, it works perfectly.

Quote:
As for 'grep', there are crafty programmers who juggle with multi-threaded I/O they should make it pass in the next century.
You are right, grep is not Kazahana, and for those who don't know Kazahana it's a mistake to draw a direct comparison between them.

Quote:
And one more thing: don't take me too seriously, because, I know what programming is and for that very reason I am afraid to call myself a programmer, that is, take my tools and enjoy them as a gift.
It was a nice joke, ha-ha-ha, and thanks for the gift!
 

·
Premium Member
Joined
·
8,041 Posts
If I've come across as toxic, it's because the OP has spent more time posting poems, manga and train pictures than differentiating this from existing tools.

I'm all for code sharing and free tools. But I also see reinventing existing tools as a bad thing unless they offer significant advantages, as the new tools aren't going to be widely available (which is a complete bind if you're an administrator).

Thus far all I've had is condescending non-answers and extracts from Japanese culture.

So if you want to kick off about the toxicity of people in this thread then perhaps you should be looking at how badly this thread failed to answer my original questions.

And for the record, grep can be run in parallel via xargs (its -P option starts several grep processes at once). And I have written pthreaded applications myself, so all the comments I've been making are from a peer trying to grasp the point of yet another reinvention of the grep tool (and there are hundreds out there).
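
A rough illustration of the idea rather than of xargs itself: the Python sketch below hands each file to a pool of workers, which is roughly what xargs does when it starts several grep processes side by side. The names `grep_file` and `parallel_grep` are made up for this example, plain substring matching stands in for grep's patterns, and threads stand in for separate processes to keep it short.

```python
from concurrent.futures import ThreadPoolExecutor


def grep_file(path, needle):
    """Return (path, line_number, line) for every line of one file that contains needle."""
    hits = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((str(path), number, line.rstrip("\n")))
    return hits


def parallel_grep(paths, needle, workers=4):
    """Search several files at once; hits come back in the order of `paths`."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = pool.map(lambda path: grep_file(path, needle), paths)
        return [hit for hits in per_file for hit in hits]
```

Whether this beats a single pass depends on how many files there are and whether they are already cached, so treat it as a sketch and measure before relying on it.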

[edit]

I should point out that I've been here myself. I'd written a replacement for a number of Windows CLI tools. It did grepping / fuzzy matching (far less sophisticated than this though, but I much prefer precise matching personally), could pipe output between STDOUT / STDERR and parameters, and a boat load of other POSIX-like stuff that I missed on Windows (most of which I've long since forgotten). I released the app and source and very few people were interested in it. But that didn't bother me because it was a personal project. However I then started to administrate other servers and found this utility useless without copying it onto every box I wanted to work with. And it was the same case with desktops too. Then when I switched jobs, I just never bothered to install it because I basically had to learn to use the existing CLI tools better so my tools became redundant.

And it's exactly the same case with all the Linux / UNIX configs and aliases. It was such a bind copying them onto each box that I just stopped bothering.

So if I come across as negative, it's because I've been down this road many many times myself. So if this is just a personal project, then that's great. But if you're trying to push your project out there then the OP needs to be clearer about what this tool actually does that users cannot already do - fuzzy matching and multi-threading alone aren't enough to make the effort worthwhile.

I know it can be harsh criticism to swallow, but like I said, if this is purely a personal project then who cares? But this does feel a bit like a sales/promotional thread. So if you're wanting to write applications that other people are really going to use, and thus applications that are going to get your name known, then there are better concepts to focus your energy on.


So I resent being called "toxic" and "idiot" because I happen to offer some concerns. Sometimes you have to take the bad feedback with the good in order to grow as a developer and as a person.
 

·
Registered
Joined
·
26 Posts
@Sanmayce
Hi Sanmayce,

Kazahana is no longer just the fastest train; it is the fastest space ship, equipped with 16/22/30/60/400 threads. With 60 threads it was almost 57 times faster than the application currently used at my work. It's amazing! On Saturday I will run the test with 400 threads, and I hope by then to post a video clip of the beautiful space ship flying across the HPC space.

Sanmayce, thank you!
Banzai !!!
 

·
Premium Member
Joined
·
8,041 Posts
Quote:
Originally Posted by duhai View Post

With 60 threads it was almost 57 times faster than the application currently used at my work. It's amazing! On Saturday I will run the test with 400 threads
400 threads?
What sort of hardware are you running that on?
 

·
Integer Benchmarker
Joined
·
437 Posts
Discussion Starter · #27 ·
@Plan9
Man, we are too different to get along.

>... if this is purely a personal project then who cares?
Who cares who cares!
Persons who need a free search tool will have the chance to try this one, hopefully without your 'censorship'.

>But this does feel a bit like a sales/promotional thread.
Again you knocked me down, please stop throwing slanders at me, my only fault is that 6-7 years ago, when I was choosing my domain, I foolishly chose .com, which I have regretted ever since. Only afterwards did I learn that it stands for commercial, but in my defence .org and .net appeared to me too pompous and not good for a personal site. Anyway, I don't sell anything, literally or figuratively.
You have many things to unlearn, that is only my personal opinion which happens to be overlapping with the truth in huge number of cases.
I see where the problem is, you have no faith in people, your presumption is 'guilty', to be open is a noble feature, you see ghosts emanating from your lack of confidence, maybe life has not been good to you, but as long as we are alive we have to learn how to play 'life'.

And for more manga (this word you don't know), being thankful and what not you can read one of my posts at thefreedictionary, I hope you will unlearn something.

@Duhai
Hi my friend, thank you for all your jokes, lively spirit and gratefulness, that's what I cherish most.
Eternal damnation for me if I don't share Kazahana with you, just check your Overclock.net's Personal Messages Box, cheers!

At 2:21 of 'Flower':
"... D-I-I-I-STANT CHILD MY FLOWER ... you amaze me." - one of Kylie's immortal "childs".

Quickly, without second-guessing myself, I am making these verses the motto of Kazahana.
I couldn't miss that one of my oldest callnames shares a common glyph (WIND) with Kazahana; in addition, it turns out that an anime character exists with the same name, translated as 'Windbloom'. Immediately I saw the connection:
Kazahana - wind/aerial flower

While searching the WEB for a snowflower I found ... the blog of ... Snow White:
http://divinetheatre.blogspot.com/2011/12/blessings-of-season.html

Below, an actual snowflake (taken from Snow White's blog) under a microscope! Breathtaking and unique!


I still haven't simulated the wildcard matching and thus gotten rid of the nasty recursions; however, I added two more wildcards:
- wildcard '.' any ALPHA character(s) or empty
- wildcard '`' any NON-ALPHA character(s) or empty

How to use them is shown below:

Code:
E:\Kazahana_r1-++fix+>"Kazahana_r1-++fix+_HEXADECAD-Threads_IntelV12.exe"
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++fix+, copyleft Kaze 2013-Feb-13.
Usage: Kazahana [AtMostLevenshteinDistance] string textualfile
Note1: There are three regimes: exact, wildcards and fuzzy searches. First two kick in when 2 parameters are given, fuzzy when 3.
Note2: What decides whether exact or wildcards? Of course presence of at least one wildcard. To see exact search see Example #4.
Note3: Exact search hits with 'Railgun_Quadruplet_7Gulliver'.
Note4: Incoming string is automatically lowercased for wildcards searches i.e. they are case insensitive.
Note5: Incoming string could be up to 21168/126 chars for exact&wildcards/Levenshtein respectively.
Note6: Incoming textualfile could be bigger than 4GB.
Note7: Each line should end with [CR]LF, that is Windows or/and UNIX style.
Note8: The dump goes to Kazahana.txt file.
Note9: Seven+two wildcards are available:
       wildcard '*' any character(s) or empty,
       wildcard '.' any ALPHA character(s) or empty,
       wildcard '`' any NON-ALPHA character(s) or empty,
       wildcard '@'/'#' any character {or empty}/{and not empty},
       wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
       wildcard '|'/'~' any NON-ALPHA character {or empty}/{and not empty}.
Example1: E:\>Kazahana 0 ramjet MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example2: E:\>Kazahana 3 psychedlicize MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example3: E:\>Kazahana "psyched^^^^^^ize^" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Example4: E:\>Kazahana "metal fatigue" enwiki-20121201-pages-articles.xml
Example5: E:\>Kazahana "out^^^^^^^^^^^^^ize*" MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
          E:\>type Kazahana.txt
          [out^^^^^^^^^^^^^ize*] outhyperbolize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outsized /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outstrategize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
          [out^^^^^^^^^^^^^ize*] outtyrannize /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/

E:\Kazahana_r1-++fix+>copy con Severina.txt
Mitko Schtereff 4 president
777 trumps 666
Windbloom
^Z
        1 file(s) copied.

E:\Kazahana_r1-++fix+>"Kazahana_r1-++fix+_HEXADECAD-Threads_IntelV12.exe" ". .`." Severina.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++fix+, copyleft Kaze 2013-Feb-13.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK

Kazahana: Total/Checked/Dumped xgrams: 3/3/1
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 1 xgrams/clock
Kazahana: Done.

E:\Kazahana_r1-++fix+>type Kazahana.txt
[. .`.] Mitko Schtereff 4 president /Severina.txt/

E:\Kazahana_r1-++fix+>"Kazahana_r1-++fix+_HEXADECAD-Threads_IntelV12.exe" `.` Severina.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++fix+, copyleft Kaze 2013-Feb-13.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK

Kazahana: Total/Checked/Dumped xgrams: 3/3/2
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 3 xgrams/clock
Kazahana: Done.

E:\Kazahana_r1-++fix+>type Kazahana.txt
[`.`] 777 trumps 666 /Severina.txt/
[`.`] Windbloom /Severina.txt/

E:\Kazahana_r1-++fix+>"Kazahana_r1-++fix+_HEXADECAD-Threads_IntelV12.exe" . Severina.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++fix+, copyleft Kaze 2013-Feb-13.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK

Kazahana: Total/Checked/Dumped xgrams: 3/3/1
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 3 xgrams/clock
Kazahana: Done.

E:\Kazahana_r1-++fix+>type Kazahana.txt
[.] Windbloom /Severina.txt/

E:\Kazahana_r1-++fix+>"Kazahana_r1-++fix+_HEXADECAD-Threads_IntelV12.exe" * Severina.txt
Kazahana, a superfast exact & wildcards & Levenshtein Distance (Wagner-Fischer) searcher, revision 1-++fix+, copyleft Kaze 2013-Feb-13.
omp_get_num_procs( ) = 2
omp_get_max_threads( ) = 2
Enforcing HEXADECAD i.e. hexadecuple-threads ...
Allocating Master-Buffer 7MB ... OK

Kazahana: Total/Checked/Dumped xgrams: 3/3/3
Kazahana: Performance: 0 KB/clock
Kazahana: Performance: 3 xgrams/clock
Kazahana: Done.

E:\Kazahana_r1-++fix+>type Kazahana.txt
[*] Mitko Schtereff 4 president /Severina.txt/
[*] 777 trumps 666 /Severina.txt/
[*] Windbloom /Severina.txt/

E:\Kazahana_r1-++fix+>
Latest revision: Kazahana_r1-++fix+.zip
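
For readers more at home with regular expressions, the nine wildcards from Note9 can be approximated in a few lines of Python. This is only a sketch of the matching rules as the usage text describes them, not Kazahana's own code: the names `WILDCARDS`, `to_regex` and `search` are invented here, ALPHA is assumed to mean the ASCII letters a-z, and a pattern is assumed to have to cover the whole line, which is what the transcript above suggests.

```python
import re

# Wildcard -> regex fragment, following Note9 of the usage text.
# Assumption: ALPHA means ASCII letters; both sides are lowercased first.
WILDCARDS = {
    "*": ".*",       # any character(s) or empty
    ".": "[a-z]*",   # any ALPHA character(s) or empty
    "`": "[^a-z]*",  # any NON-ALPHA character(s) or empty
    "@": ".?",       # any character or empty
    "#": ".",        # any character, not empty
    "^": "[a-z]?",   # any ALPHA character or empty
    "$": "[a-z]",    # any ALPHA character, not empty
    "|": "[^a-z]?",  # any NON-ALPHA character or empty
    "~": "[^a-z]",   # any NON-ALPHA character, not empty
}


def to_regex(pattern):
    """Translate a Kazahana-style wildcard pattern into a compiled regex."""
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in pattern.lower()]
    return re.compile("".join(parts), re.DOTALL)


def search(pattern, lines):
    """Return the lines that the pattern matches from first to last character."""
    regex = to_regex(pattern)
    return [line for line in lines if regex.fullmatch(line.lower())]


severina = ["Mitko Schtereff 4 president", "777 trumps 666", "Windbloom"]
for pattern in (". .`.", "`.`", ".", "*"):
    print(pattern, "->", search(pattern, severina))
```

Run against the three lines of Severina.txt, the four patterns select the same lines as the dumps shown in the transcript. A regex engine backtracks where Kazahana recurses, so this says nothing about speed.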

Tuning is something very interesting and rewarding (it accumulates valuable chunks of experience), one (in fact one more) nasty bottleneck remains to be widened.

It is still hard for me to see what causes the brutal damage to speed scalability shown further below.

Having beautified the Gulliver's arrays, by making them global and eliminating the unnecessary reinitializations, the result is a small but needed speed improvement:

Because the tested file is cached the I/O traffic doesn't disturb us.

1-threaded Exact search for 'ramjet' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 632 KB/clock !632!
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 625 KB/clock
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 625 KB/clock
Kazahana: Performance: 625 KB/clock
Kazahana: Performance: 624 KB/clock

r1-++fix:
Kazahana: Performance: 625 KB/clock
Kazahana: Performance: 632 KB/clock !632!
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 632 KB/clock
Kazahana: Performance: 632 KB/clock

1-threaded Exact search for 'metal_fatigue' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 713 KB/clock
Kazahana: Performance: 722 KB/clock !722!
Kazahana: Performance: 722 KB/clock
Kazahana: Performance: 722 KB/clock
Kazahana: Performance: 722 KB/clock
Kazahana: Performance: 722 KB/clock
Kazahana: Performance: 722 KB/clock

r1-++fix:
Kazahana: Performance: 703 KB/clock
Kazahana: Performance: 704 KB/clock !704!
Kazahana: Performance: 704 KB/clock
Kazahana: Performance: 704 KB/clock
Kazahana: Performance: 703 KB/clock
Kazahana: Performance: 704 KB/clock
Kazahana: Performance: 704 KB/clock

1-threaded Exact search for 'incomprehensible_misunderstanding' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 806 KB/clock !806!
Kazahana: Performance: 805 KB/clock
Kazahana: Performance: 806 KB/clock
Kazahana: Performance: 806 KB/clock
Kazahana: Performance: 806 KB/clock
Kazahana: Performance: 806 KB/clock
Kazahana: Performance: 805 KB/clock

r1-++fix:
Kazahana: Performance: 794 KB/clock !794!
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 794 KB/clock

Or roughly ((632+722+806)-(632+704+794))/(632+704+794)*100% = 1.4% speed up for the new revision.

16-threaded Exact search for 'ramjet' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 695 KB/clock !695!
Kazahana: Performance: 695 KB/clock
Kazahana: Performance: 687 KB/clock
Kazahana: Performance: 687 KB/clock
Kazahana: Performance: 687 KB/clock
Kazahana: Performance: 695 KB/clock
Kazahana: Performance: 687 KB/clock

r1-++fix:
Kazahana: Performance: 678 KB/clock
Kazahana: Performance: 678 KB/clock
Kazahana: Performance: 678 KB/clock
Kazahana: Performance: 670 KB/clock
Kazahana: Performance: 670 KB/clock
Kazahana: Performance: 686 KB/clock !686!
Kazahana: Performance: 670 KB/clock

16-threaded Exact search for 'metal_fatigue' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 783 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 751 KB/clock
Kazahana: Performance: 772 KB/clock
Kazahana: Performance: 794 KB/clock !794!
Kazahana: Performance: 752 KB/clock
Kazahana: Performance: 762 KB/clock

r1-++fix:
Kazahana: Performance: 752 KB/clock
Kazahana: Performance: 741 KB/clock
Kazahana: Performance: 751 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 741 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 783 KB/clock !783!

16-threaded Exact search for 'incomprehensible_misunderstanding' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 783 KB/clock
Kazahana: Performance: 817 KB/clock !817!
Kazahana: Performance: 794 KB/clock
Kazahana: Performance: 784 KB/clock
Kazahana: Performance: 772 KB/clock
Kazahana: Performance: 817 KB/clock

r1-++fix:
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 784 KB/clock !784!
Kazahana: Performance: 784 KB/clock
Kazahana: Performance: 741 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 783 KB/clock
Kazahana: Performance: 783 KB/clock

Or roughly ((695+794+817)-(686+783+784))/(686+783+784)*100% = 2.4% speed up for the new revision.

What confuses me badly is the miserable speed-up of the 16-threaded executable over the 1-threaded one: ((695+794+817)-(632+722+806))/(632+722+806)*100% = about 6.8%. Obviously I was wrong to expect at least 50%. What causes this ugliness, who can explain!?

I took desperate measures and reduced the master buffer from 7MB (the minimum needed to search long lines such as Wikipedia's) down to 1MB.
The master buffer is devoured by those 16 threads, that is, each thread has its own haystack of approximately 1MB/16 = 65536 bytes.
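
How the buffer is actually carved up is not shown in this post, so the following Python sketch is only one plausible way to do it: it assumes each haystack is cut just after a line feed, so that no line straddles two threads. The function name `split_at_line_ends` is hypothetical.

```python
MASTER_BUFFER = 1024 * 1024  # 1MB, as in the runs below
THREADS = 16


def split_at_line_ends(buffer, parts):
    """Cut `buffer` into at most `parts` slices, each ending right after a line feed."""
    target = max(1, len(buffer) // parts)
    slices, start = [], 0
    while start < len(buffer) and len(slices) < parts - 1:
        cut = buffer.find(b"\n", start + target - 1)
        if cut == -1:  # no further line end: the rest becomes the last slice
            break
        slices.append(buffer[start:cut + 1])
        start = cut + 1
    if start < len(buffer):
        slices.append(buffer[start:])
    return slices


print(MASTER_BUFFER // THREADS)  # 65536 bytes per thread
print(split_at_line_ends(b"aa\nbbb\nc\ndddd\n", 4))
```

Because the cuts wait for the next line end, slices are only approximately equal and a very long line makes its slice larger, which fits the "approximately" above.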

1-threaded Exact search for 'ramjet' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 731 KB/clock
Kazahana: Performance: 762 KB/clock !762!
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 762 KB/clock
Kazahana: Performance: 762 KB/clock

1-threaded Exact search for 'metal_fatigue' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 868 KB/clock
Kazahana: Performance: 869 KB/clock !869!
Kazahana: Performance: 869 KB/clock
Kazahana: Performance: 882 KB/clock
Kazahana: Performance: 869 KB/clock
Kazahana: Performance: 868 KB/clock
Kazahana: Performance: 869 KB/clock

1-threaded Exact search for 'incomprehensible_misunderstanding' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 1,091 KB/clock !1,091!
Kazahana: Performance: 1,089 KB/clock
Kazahana: Performance: 1,091 KB/clock
Kazahana: Performance: 1,089 KB/clock
Kazahana: Performance: 1,089 KB/clock
Kazahana: Performance: 1,089 KB/clock
Kazahana: Performance: 1,089 KB/clock

16-threaded Exact search for 'ramjet' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 830 KB/clock
Kazahana: Performance: 868 KB/clock
Kazahana: Performance: 869 KB/clock !869!
Kazahana: Performance: 868 KB/clock
Kazahana: Performance: 855 KB/clock
Kazahana: Performance: 869 KB/clock
Kazahana: Performance: 855 KB/clock

16-threaded Exact search for 'metal_fatigue' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 1,029 KB/clock !1,029!
Kazahana: Performance: 993 KB/clock
Kazahana: Performance: 976 KB/clock
Kazahana: Performance: 974 KB/clock
Kazahana: Performance: 958 KB/clock
Kazahana: Performance: 992 KB/clock
Kazahana: Performance: 943 KB/clock

16-threaded Exact search for 'incomprehensible_misunderstanding' into 889,537,624 bytes long file '4andabove_Gamera.tar.2.sorted':

r1-++fix+:
Kazahana: Performance: 1,181 KB/clock
Kazahana: Performance: 1,208 KB/clock
Kazahana: Performance: 1,264 KB/clock !1,264!
Kazahana: Performance: 1,183 KB/clock
Kazahana: Performance: 1,208 KB/clock
Kazahana: Performance: 1,209 KB/clock
Kazahana: Performance: 1,183 KB/clock

Again confusion: ((869+1029+1264)-(762+869+1091))/(762+869+1091)*100% = about 16.2%, still far from my illusory 50%. The drop is too deep, who can explain!?
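
To make the four comparisons easy to re-check, here is the same arithmetic as a small Python sketch. It uses only the values marked with exclamation marks in the listings above; the helper name `speedup_percent` and the variable names are mine.

```python
def speedup_percent(new_rates, old_rates):
    """Relative gain of the summed new rates over the summed old rates, in percent."""
    new_total, old_total = sum(new_rates), sum(old_rates)
    return (new_total - old_total) / old_total * 100.0


# KB/clock for 'ramjet', 'metal_fatigue', 'incomprehensible_misunderstanding'.
one_thread_fix_plus = [632, 722, 806]   # r1-++fix+, 7MB buffer
one_thread_fix = [632, 704, 794]        # r1-++fix,  7MB buffer
sixteen_fix_plus = [695, 794, 817]      # r1-++fix+, 7MB buffer
sixteen_fix = [686, 783, 784]           # r1-++fix,  7MB buffer
one_thread_1mb = [762, 869, 1091]       # r1-++fix+, 1MB buffer
sixteen_1mb = [869, 1029, 1264]         # r1-++fix+, 1MB buffer

print(f"{speedup_percent(one_thread_fix_plus, one_thread_fix):.2f}")    # 1.41
print(f"{speedup_percent(sixteen_fix_plus, sixteen_fix):.2f}")          # 2.35
print(f"{speedup_percent(sixteen_fix_plus, one_thread_fix_plus):.2f}")  # 6.76
print(f"{speedup_percent(sixteen_1mb, one_thread_1mb):.2f}")            # 16.16
```

Summing the three best rates weights the long pattern more than the short one; averaging the three per-pattern ratios instead would give slightly different percentages.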

The benchmark/torture that interests me the most is this:

- Physical RAM: 64GB, preferably quad-channeled - to tune fastest on non-fastest is not exciting;
- XEON class CPU 8cores/16threads, or AMD 8cores/8threads;
- Preferably both the RAM&CPU overclocked at MAX - I want a glimpse of the future.

It will show whether disappointing results on my T7500 'Bonboniera' are proportional to results on thread-wise CPUs like XEON/Bulldozer.
The above environment is excellent for tuning Kazahana because my master torture-test English Wikipedia (39.2GB) fits in the OS system cache thus eliminating the I/O traffic.

My primary interest lies in English language phrase suggesting. Galadriel, and after that Kazahana, were meant to be word/phrase suggesteresses in my free phrase-checker Masakari; however, it doesn't hurt for both of them to be used as standalone tools.

Besides text torturing, what I need is at least one *nix programmer willing to help me port Kazahana. My skills in *nix are tragic, and I would be very glad to see (or rather hear) her operational in a *nix environment - it would be the most meaningful 'mashallah' for me.
Just send me an email at [email protected], I will give you (within 48 hours - my latency) a link to C source of latest 1-++fix+ revision.

One song from my childhood is still very dear to me: 'Diana Express - Severina', the world is really small: 'Diana Express' was (she is no more, she is in Trains' Heaven, Bulgaria is falling apart) the name of one of Bulgarian shinkansens (in Japanese: new rail line, in Bulgaria we simply say 'влакът стрела' i.e. 'the arrow train' not bullet train nor high velocity train, because Diana is the goddess of hunting, she is depicted as semi-naked beauty launching arrows gracefully and presumably with god-like accuracy), where 'SEVERINA' is the name of a girl (made of snow, the lyrics don't tell literally or metaphorically) coming every winter from the North, the closest equivalent of 'Severina' being 'Northina'.

Lyrics
Mitko Shterev - keyboards
Illya Angelov - lead vocal & guitar
Диана Експрес - Северина / Diana Express - Severina

Северина, момиче от сняг / Severina a girl made of snow
всяка зима е северен знак / every winter she is a northern sign
аз го имам в песен от юг / I have it in song from south
Северина - радост за друг / Severina - a joy not for me

И като сняг тихо вали / And like a snow she silently comes
вик от мойта любов / a scream from my love
и се топи и навява тъга / and she melts and brings sadness
песента ми за теб / my song for you

Северина, момиче от сняг / Severina a snow girl
на приказна фея / she is fabulous fairy's
е северен знак / northern sign
целува ме бързо / she kisses me quickly
и по снега тръгва зима / and winter start marching on
бяла тъга / white sadness


I've heard that Eskimo people have more than 200 words for snow, this trumps even the sensitivity of Japanese people who are best known for their reverence to Nature, just a few snow related ones:

hatsuyuki : first snow (of season)
hyouden : field of eternal snow
koyuki : light snow
ooyuki : heavy snow
setsuzou : snow sculpture
shinshin : sound of heavy snow-fall
shinshin : mind body
yukionna : snow woman, fairy

Mutsi-mutsi, 'shinshin' is all-zen, in my view, any language lacking a word for sound of snowfalling quickly must incorporate it and fill the GAP.

How much I would like someone versed in their languageS to teach me all variants of 'snowflake'.

In my language 'snowflake' is described with 'снежинка', whereas 'Snowwhite' with 'Снежанка', no bias here: these two Bulgarian words are fantabulous and so ringy, they are feminine (English in that respect fails to connote the most beautiful facet: tenderness), another lovely variant is the Russian 'Снегурочка', Russian is a brother language yet the 'snowflake' counterpart eludes me.

Please tell me what words are in use for 'snowflake' in your language.
 

·
Premium Member
Joined
·
8,041 Posts
Quote:
Originally Posted by Sanmayce View Post

@Plan9
Man, we are too different to get along.
You're the one getting shirty though. Every time I ask what differentiates this from existing tools that come pre-installed, you get funny and say I've offended you.
Quote:
Originally Posted by Sanmayce View Post

Persons who need a free search tool will have the chance to try this one, hopefully without your 'censorship'.
They already exist - pre-installed with every OS. And I'm not censoring you, I'm just asking what this tool does that's so awe-inspiring. You're the one censoring yourself by not using that opportunity to sing the praises of your application.
Quote:
Originally Posted by Sanmayce View Post

Again you knocked me down, please stop throwing slanders at me
That isn't slander. I'm actually getting quite annoyed at you now because it's impossible to hold a mature discussion without you throwing your toys out of the pram.
Quote:
Originally Posted by Sanmayce View Post

, my only fault is that 6-7 years ago, when I was choosing my domain, I foolishly chose .com, which I have regretted ever since. Only afterwards did I learn that it stands for commercial, but in my defence .org and .net appeared to me too pompous and not good for a personal site. Anyway, I don't sell anything, literally or figuratively.
That's not the reason why this felt like a promotional thread. The reason is that you started a thread centred around an app of yours - a thread that you've spammed across multiple forums too, I might add - then you keep trying to change the subject whenever competing products are mentioned and making false claims that your app is the only free utility of its kind available.

So while I would normally give kudos to those who do write freeware, and particularly those who release the source, this thread comes across as mighty suspicious - and that's entirely down to the way you've conducted yourself. It's impossible to get straight answers from you. Then out of the blue a user who hasn't posted before, yet shares the same posting style as you, comes on board and does the tried and tested routine of a third party endorsing a product. Equally suspicious.
Quote:
Originally Posted by Sanmayce View Post

You have many things to unlearn, that is only my personal opinion which happens to be overlapping with the truth in huge number of cases.
And you call me rude.

Quote:
Originally Posted by Sanmayce View Post

And for more manga (this word you don't know), being thankful and what not you can read one of my posts at thefreedictionary, I hope you will unlearn something.
1) I know what manga means given that I was the one who posted the term.
2) I'm not being ungrateful, I'm just asking what this sodding application does that makes it so bloody special. However instead of answering what should be a straightforward question, you kick off as if I've insulted your mother.

I'm going to give up on this thread now though because it's become quite clear that you're too immature to chat about the software itself (which should have been the crux of this thread). And looking back at this thread, it's quite obvious that everyone else (bar duhai, who I'd put money on being your alias) is just as confused about the point of this application as I am. So you're doing your utility a real injustice by having this loopy attitude of yours.
 

·
Integer Benchmarker
Joined
·
437 Posts
Discussion Starter · #29 ·
No problema, so be it.
 

·
Integer Benchmarker
Joined
·
437 Posts
Discussion Starter · #30 ·
Just found one very cool 2-year-old video:

Cheapest super computer in the world made by Prof. Hamada:
Description:
"This super computer was built by a Japanese university professor , its built using ordinary computer parts that you can find in any computer shop every where , and what is -very- special about this computer is that its very cheap compared to other super computers around the world ,computers owned by governments and big corporations that coasts around 1.2 bil$ each .this one coast only 420000$ ,and it broke the world record of performance ( calculations per second ) compared to the other extremely expensive super computer."


At 4:10 a very nice message.

At 5:04
... the biggest applications haven't even been imagined yet.

Nice, this guy is so natural, should have his own TV show:
What should I say, not a bad machine for my ENWIKI torture at all.

EDIT, 23 Feb:
The article going along with the above video.

I was badly surprised when I looked at the AIDA64 memory results on a "powerful" dual Xeon E5-2687W @3.4GHz:
Memory Copy: 10000MB/s

Only now do I get why the system is called 'INSANITY': a quad-channel monster machine with such miserable memory bandwidth!?


Two months ago I asked a forum fellow (cavallino) for stats on his dual-channel XEON, and AIDA reported 21530MB/s. How is that possible!?
 

·
Integer Benchmarker
Joined
·
437 Posts
Discussion Starter · #31 ·
Fine-tuning continues with the next, more verbose revision, 1-++fix+nowait. In addition to the stats, the Master-Buffer size can now be set from the command line (last parameter), so you can hit the fastest size for your CPU; for mine it is between 1024KB and 1800KB.
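For anyone who wants to play with the buffer-size idea without the package, here is a minimal Python sketch. It is not Kazahana's code: the function name `chunked_search` and its parameters are made up for illustration, and it is single-threaded. It reads a stream in buffer-sized pieces, counts exact matches (carrying a small tail so matches spanning two buffers are not lost), and reports read time separately from total time, similar in spirit to the Total/fread() split.

```python
import io
import time


def chunked_search(stream, pattern: bytes, buffer_size: int):
    """Count exact occurrences of `pattern`, reading `buffer_size` bytes at a time.

    Returns (hits, read_seconds, total_seconds). Hypothetical helper,
    not part of Kazahana.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1  # bytes carried over to catch boundary matches
    tail = b""
    hits = 0
    read_seconds = 0.0
    start = time.perf_counter()
    while True:
        t0 = time.perf_counter()
        chunk = stream.read(buffer_size)
        read_seconds += time.perf_counter() - t0
        if not chunk:
            break
        window = tail + chunk
        pos = window.find(pattern)
        while pos != -1:
            hits += 1
            pos = window.find(pattern, pos + 1)
        # The tail is shorter than the pattern, so no match is counted twice.
        tail = window[-overlap:] if overlap else b""
    total_seconds = time.perf_counter() - start
    return hits, read_seconds, total_seconds


if __name__ == "__main__":
    data = b"xxramjetyy" * 3 + b"ramjet"
    for size in (4, 1024):
        hits, read_s, total_s = chunked_search(io.BytesIO(data), b"ramjet", size)
        print(size, hits, f"{read_s:.6f}", f"{total_s:.6f}")
```

Swapping `io.BytesIO(data)` for a file opened in binary mode and trying several `buffer_size` values is the rough equivalent of changing the last command-line parameter.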

First, all results below are reproducible with this test package: Kazahana_r1-++fix+nowait_VS_grep.7z
On my T7500 2200MHz 4MB L2 cache processor, Windows 7 64bit, the results for the next three patterns are:

16-threaded Exact search for 'ramjet' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Allocating Master-Buffer 1536KB ... OK
Kazahana: Dumped xgrams: 49
Kazahana: Performance: 1,011 KB/clock
Kazahana: Performance: Total/fread() clocks: 859/437
Kazahana: Performance: I/O time, i.e. fread() time, is 50 percents
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.546 =   57%
User Time    =     1.138 =  119%
Process Time =     1.684 =  176%
Global Time  =     0.953 =  100%

16-threaded Exact search for 'metal_fatigue' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Allocating Master-Buffer 1536KB ... OK
Kazahana: Dumped xgrams: 1
Kazahana: Performance: 1,235 KB/clock
Kazahana: Performance: Total/fread() clocks: 703/484
Kazahana: Performance: I/O time, i.e. fread() time, is 68 percents
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.546 =   59%
User Time    =     0.951 =  103%
Process Time =     1.497 =  163%
Global Time  =     0.918 =  100%

16-threaded Exact search for 'incomprehensible_misunderstanding' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Allocating Master-Buffer 1536KB ... OK
Kazahana: Dumped xgrams: 1
Kazahana: Performance: 1,462 KB/clock
Kazahana: Performance: Total/fread() clocks: 594/470
Kazahana: Performance: I/O time, i.e. fread() time, is 79 percents
Kazahana: Done.
Timer 9.01 : Igor Pavlov : Public domain : 2009-05-31

Kernel Time  =     0.546 =   58%
User Time    =     0.889 =   95%
Process Time =     1.435 =  154%
Global Time  =     0.930 =  100%

And to "verify" our watch against grep's:

'grep' 2.5.4 Exact search for 'ramjet' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Kernel Time  =     0.639 =   13%
User Time    =     4.196 =   86%
Process Time =     4.836 =   99%
Global Time  =     4.847 =  100%

'grep' 2.5.4 Exact search for 'metal_fatigue' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Kernel Time  =     0.546 =   11%
User Time    =     4.040 =   88%
Process Time =     4.586 =   99%
Global Time  =     4.587 =  100%

'grep' 2.5.4 Exact search for 'incomprehensible_misunderstanding' in the 889,537,624-byte file '4andabove_Gamera.tar.2.sorted':

Code:
Kernel Time  =     0.577 =   13%
User Time    =     3.806 =   86%
Process Time =     4.383 =   99%
Global Time  =     4.399 =  100%

It surprises me that 'grep' takes so little advantage of longer needles/patterns (Global Time 4.847s, 4.587s and 4.399s). Anyway, going by Global Time, Kazahana is roughly 4.7 to 5.1 times faster than 'grep'.
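
For those who want to check the ratio themselves, a tiny Python sketch follows. It only divides the Global Time figures copied from the timer outputs above; the dictionary and function names are made up for illustration.

```python
# Global Time in seconds, copied from the timer outputs above.
KAZAHANA = {
    "ramjet": 0.953,
    "metal_fatigue": 0.918,
    "incomprehensible_misunderstanding": 0.930,
}
GREP = {
    "ramjet": 4.847,
    "metal_fatigue": 4.587,
    "incomprehensible_misunderstanding": 4.399,
}


def speedups(fast, slow):
    """Return slow/fast wall-clock ratio per pattern, rounded to 2 decimals."""
    return {pattern: round(slow[pattern] / fast[pattern], 2) for pattern in fast}


if __name__ == "__main__":
    for pattern, ratio in speedups(KAZAHANA, GREP).items():
        print(f"{pattern}: {ratio}:1")
```

Note this compares wall-clock time only; Kazahana's Process Time is higher than its Global Time because it burns several threads to get there.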

And one quick dummy stat:
For all three patterns above, the total fread() time is in the range 437..484 clocks (well, milliseconds):
'ramjet' is found at 1,011 KB/clock for the entire task, but only 100% - 50% = 50% of the time is spent in actual parsing and searching, which works out to 1,011 KB/clock / (50/100) = 2,022 KB/clock L2 Multi-Threaded Gulliver performance.
'metal_fatigue' is found at 1,235 KB/clock for the entire task, but only 100% - 68% = 32% of the time is spent in actual parsing and searching, which works out to 1,235 KB/clock / (32/100) = 3,859 KB/clock L2 Multi-Threaded Gulliver performance.
'incomprehensible_misunderstanding' is found at 1,462 KB/clock for the entire task, but only 100% - 79% = 21% of the time is spent in actual parsing and searching, which works out to 1,462 KB/clock / (21/100) = 6,961 KB/clock L2 Multi-Threaded Gulliver performance.
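
The same arithmetic as a small Python sketch, starting from the raw Total/fread() clocks and the file size. The function name `throughput` is made up, and the use of truncating integer division is my assumption; it is not taken from Kazahana's source, it just happens to reproduce the printed figures.

```python
FILE_BYTES = 889_537_624  # size of '4andabove_Gamera.tar.2.sorted'


def throughput(total_clocks, fread_clocks, file_bytes=FILE_BYTES):
    """Return (KB/clock overall, I/O percent, KB/clock for search only).

    Integer (truncating) division is an assumption chosen to match
    the figures printed in the logs above.
    """
    kb_per_clock = file_bytes // total_clocks // 1024
    io_percent = fread_clocks * 100 // total_clocks
    search_kb_per_clock = kb_per_clock * 100 // (100 - io_percent)
    return kb_per_clock, io_percent, search_kb_per_clock


if __name__ == "__main__":
    runs = {
        "ramjet": (859, 437),
        "metal_fatigue": (703, 484),
        "incomprehensible_misunderstanding": (594, 470),
    }
    for pattern, (total, fread) in runs.items():
        print(pattern, throughput(total, fread))
```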

The bottom line: Kazahana is bound by main-memory copying.
My hopes for total performance were too high: the time that fread() consumes is a nasty brake, meaning Kazahana is limited by I/O even when the data is cached by the OS. She is so fast that today's PCs, with their slow memcpy() transfers ('Everest' reports 5446MB/s Read and 4015MB/s Copy for main memory on my laptop), become the bottleneck, especially with MT-Gulliver in use.
Seeing how XEONs do main-memory copy at 20000MB/s still gives me hope of getting a 20000:4000 (about 5x) boost.

In my view, Kazahana is a forerunner (a snowflower on a sunny day, remember) of cooler times, when RAM operations will be much faster and she will be able to bloom TRULY.
 

·
Integer Benchmarker
Joined
·
437 Posts
Discussion Starter · #32 ·
Happy to share the latest, most optimized and fully functional Kazahana.

Inspired by the words of Phil Schneider, "We can do better!", here comes the 100% FREE (no licenses of any kind) package for English language explorers.

The source and executables are here. The console tool now has its own GUI shell called Gallowwalker, which gives you control over several must-have text processing tasks.
In short, Kazahana has three main modes: exact (CASE-SENSITIVE), wildcard (CASE-SENSITIVE/CASE-INSENSITIVE, also RECURSIVE/ITERATIVE), and fuzzy (SIMPLE/EXHAUSTIVE).

The resulting file of the above search:


Code:
"D:\_GW\Kazahana.exe" "*~envisaged~*~moving~*" "Google_Books_corpus_version_20130501_English_All_Arcs.txt" 458754

D:\>type _GW\Kazahana.txt
[*~envisaged~*~moving~*] envisaged      envisaged/VBD/ROOT/0 as/IN/prep/1 moving/VBG/pcomp/2    19      1959,1  1964,1  1977,3  1979,2  1985,1  1993,1  1995,3  1996,1  1999,1  2001,3  2002,1  2004,1 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBD/ROOT/0 moving/VBG/dep/1   15      1924,1  1946,2  1962,2  1971,1  1981,1  1991,1  2003,3  2004,3  2006,1 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBD/ROOT/0 moving/VBG/dobj/1  10      1981,1  1996,1  1998,1  2003,7 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBD/ROOT/0 moving/VBG/xcomp/1 65      1960,1  1967,2  1968,3  1970,2  1975,1  1976,1  1977,2  1978,2  1982,5  1983,5  1985,1  1987,2  1988,1  1989,2  1990,2  1991,2  1993,3  1994,1  1997,3  1998,2  1999,3  2002,3  2003,2  2005,2  2006,3  2007,6  2008,3 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBD/rcmod/0 moving/VBG/xcomp/1        13      1981,1  1982,1  1991,1  1996,2  1997,3  1998,2  2001,1  2006,1  2008,1 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBN/ROOT/0 as/IN/prep/1 moving/VBG/pcomp/2    64      1941,2  1949,1  1961,1  1967,3  1968,1  1970,1  1971,1  1972,2  1973,1  1974,1  1975,3  1978,1  1980,1  1981,5  1982,2  1983,1  1985,2  1986,1  1987,1  1988,1  1989,2  1990,1  1991,3  1992,3  1993,1  1995,3  1996,2  1997,6  1999,2  2002,1  2003,2  2004,1  2005,2  2006,1  2007,1  2008,1 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
[*~envisaged~*~moving~*] envisaged      envisaged/VBN/ROOT/0 moving/VBG/xcomp/1 24      1948,5  1949,1  1952,1  1973,3  1974,1  1988,1  1995,1  1998,3  2000,1  2002,3  2005,1  2007,3 /Google_Books_corpus_version_20130501_English_All_Arcs.txt/
The Google ngram viewer is said to exploit those data:


A few days ago I found something very useful (for English n-gram explorers): the n-gram dumps (from 2013) of Google Books, now featuring more than 3 million books.
They shifted somewhat from n-grams toward arcs. In the previous release (from 2012) I counted around 5 million distinct words; now there are more than 7 million.
In their notation a node stands for a word, but they say 47 million nodes, whatever.

Basically, a phrase of order n (usually n=1..9) is a sequence of n words; some call it an n-gram.
Google uses a different kind of n-gram, called arcs, denoting syntactic n-grams; "arcs" because zero or a few words can sit under each arc.
My approach is not as fuzzy as theirs; I stick to the WYSIWYG principle.
I came up with a simple structure of GREAT importance - PAGODA.
In contrast to Google's arcs, my x-grams (a kind of n-gram) are used as building blocks for PAGODAs.
Looking into the arcs dumps I see only left-to-right suggestions, while the nifty way of dealing with n-grams is two-way, i.e. right-to-left as well.

A quick legend:
1-gram / node = word
2-gram / arc = word ^ word
3-gram / biarc = word ^ word ^ word
4-gram / triarc = word ^ word ^ word ^ word
5-gram / quadarc = word ^ word ^ word ^ word ^ word
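
To make the plain (WYSIWYG, contiguous) reading of the legend concrete, here is a minimal Python sketch that cuts a word sequence into n-grams of a given order. It is only an illustration of contiguous n-grams; it does not build Google's syntactic arcs or my PAGODAs, and the function name `ngrams` is made up.

```python
def ngrams(words, n):
    """Return all contiguous n-grams (as tuples) from a list of words."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


if __name__ == "__main__":
    words = "we can do better".split()
    for order in range(1, 5):
        print(order, ngrams(words, order))
```

A sequence of k words yields k - n + 1 n-grams of order n, and none at all when n is larger than k.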

This is what the node corpus looks like:

Code:
D:\_KAZE\Google_Books_corpus_version_20130501_English_All_Nodes\Google_Books_corpus_version_20130501_English_All_Nodes>dir ..\*.txt

01/10/2015  12:41 PM    10,624,363,237 Google_Books_corpus_version_20130501_English_All_Nodes.txt

D:\_KAZE\Google_Books_corpus_version_20130501_English_All_Nodes\Google_Books_corpus_version_20130501_English_All_Nodes>type nodes.49-of-99|more
indolebutyric   indolebutyric/NNP/nn/0  67      1933,2  1939,1  1940,1  1941,2  1943,1  1945,2  1946,1  1947,1  1948,1  1949,1  1951,5  1952,1  1953,2  1954,2  1956,2  1957,2  1958,2  1959,1  1961,1  1962,3  1963,1  1966,1  1968,3  1969,1  1970,1  1971,2  1976,1  1982,1  1983,1  1984,1  1985,1  1986,4  1987,1  1989,2  1993,2  1994,1  1995,2  1996,1  1998,3  2004,1  2005,1  2006,1
indolecarboxylate       indolecarboxylate/JJ/dep/0      14      1948,8  1981,1  1996,4  1997,1
indolecarboxylic        indolecarboxylic/JJ/amod/0      55      1936,3  1938,1  1946,1  1948,5  1958,1  1960,3  1962,1  1963,3  1964,1  1966,1  1967,3  1968,2  1970,3  1971,1  1975,1  1978,1  1980,1  1983,1  1984,2  1988,2  1990,5  1994,1  2000,2  2004,1  2006,1  2007,5  2008,3
indolecarboxylic        indolecarboxylic/JJ/dep/0       29      1938,1  1947,1  1948,1  1955,4  1962,2  1963,1  1964,1  1965,2  1967,3  1969,2  1970,1  1980,1  1981,1  1990,2  1995,1  1996,1  1998,1  2000,1  2002,2
indoleethylamines       indoleethylamines/NNS/pobj/0    17      1962,3  1963,3  1973,1  1974,1  1975,2  1977,2  1979,2  1989,1  1997,2
indoleglycerol  indoleglycerol/JJ/dep/0 84      1964,2  1966,3  1967,5  1968,3  1969,2  1970,1  1971,1  1972,1  1973,2  1975,22 1977,2  1978,2  1979,2  1980,20 1981,1  1983,2  1984,3  1985,2  1986,1  1990,1  1991,1  1995,1  2004,2  2006,1  2007,1
indoleglycerol  indoleglycerol/NN/dobj/0        13      1963,3  1967,1  1973,3  1975,2  1979,1  1984,1  1985,2
indoleglycerol  indoleglycerol/NN/nn/0  266     1955,1  1959,2  1960,16 1961,4  1962,15 1963,11 1964,12 1965,6  1966,3  1967,11 1968,22 1969,4  1970,1  1971,11 1972,3  1973,8  1974,9  1975,9  1976,2  1977,7  1978,10 1979,10 1980,5  1982,12 1983,14 1984,3  1985,6  1986,1  1989,2  1990,1  1991,1  1992,1  1993,6  1995,2  1997,6  1998,4  1999,3  2000,2  2001,6  2002,1  2003,1  2004,1  2005,3  2006,4  2007,1  2008,3
indoleglycerol  indoleglycerol/NNP/dep/0        12      1962,1  1965,2  1966,5  1980,2  1991,1  2006,1
indoleglycerol  indoleglycerol/NNP/nn/0 42      1960,3  1961,2  1963,2  1964,2  1965,1  1967,2  1968,3  1969,1  1970,5  1973,2  1974,4  1975,2  1978,3  1980,1  1983,1  1995,1  1998,2  2000,1  2001,1  2002,2  2003,1
indoleglycerolphosphate indoleglycerolphosphate/NN/pobj/0       10      1955,1  1964,1  1965,3  1971,1  1974,2  1999,2
indolelactic    indolelactic/JJ/amod/0  164     1936,2  1938,2  1947,2  1949,1  1951,2  1954,2  1955,3  1957,3  1958,10 1959,2  1960,13 1961,10 1962,5  1963,10 1964,16 1965,4  1966,15 1967,1  1968,3  1969,2  1970,7  1971,10 1972,6  1973,6  1974,1  1975,4  1977,1  1978,1  1981,1  1983,1  1986,1  1988,3  1989,1  1990,2  1991,1  1992,1  1999,2  2003,6  2004,1
indolelactic    indolelactic/JJ/conj/0  24      1955,2  1957,1  1960,1  1961,2  1962,2  1963,3  1964,4  1965,2  1966,4  1968,1  1970,1  2003,1
indolelactic    indolelactic/JJ/dep/0   32      1938,3  1945,1  1949,1  1954,1  1955,5  1957,1  1961,2  1966,4  1967,4  1972,3  1986,1  1990,1  1993,1  1994,1  1999,1  2006,1  2007,1
...

The arcs corpus looks like this:

Code:
K:\arcs>type arcs.31-of-99
envisaged       envisaged/VBD/ROOT/0 movements/NNS/dobj/1       27      1965,1  1966,2  1970,2  1981,1  1982,2  1983,4  1988,2  1990,2  1992,1  1993,3  1996,1  1997,1  1999,1  2001,1  2003,1  2004,2
envisaged       envisaged/VBD/ROOT/0 moves/NNS/dobj/1   18      1953,3  1969,3  1976,2  1989,1  1991,1  1993,2  1994,2  1995,1  1998,1  2002,1  2006,1
envisaged       envisaged/VBD/ROOT/0 moving/VBG/dep/1   15      1924,1  1946,2  1962,2  1971,1  1981,1  1991,1  2003,3  2004,3  2006,1
envisaged       envisaged/VBD/ROOT/0 moving/VBG/dobj/1  10      1981,1  1996,1  1998,1  2003,7
envisaged       envisaged/VBD/ROOT/0 moving/VBG/xcomp/1 65      1960,1  1967,2  1968,3  1970,2  1975,1  1976,1  1977,2  1978,2  1982,5  1983,5  1985,1  1987,2  1988,1  1989,2  1990,2  1991,2  1993,3  1994,1  1997,3  1998,2  1999,3  2002,3  2003,2  2005,2  2006,3  2007,6  2008,3
envisaged       envisaged/VBD/ROOT/0 much/JJ/dobj/1     32      1914,1  1932,1  1941,2  1966,2  1967,3  1979,1  1989,2  1993,1  1994,1  1996,1  1997,1  1999,1  2000,1  2001,1  2004,5  2005,1  2007,4  2008,3
envisaged       envisaged/VBD/ROOT/0 much/RB/advmod/1   39      1966,4  1967,2  1970,2  1972,3  1973,1  1976,2  1986,8  1989,3  1994,4  1998,2  2001,5  2002,1  2004,1  2006,1
envisaged       envisaged/VBD/ROOT/0 much/RB/dobj/1     14      1933,1  1948,1  1964,3  1965,2  1984,1  1985,1  1991,2  2000,1  2001,1  2008,1
envisaged       envisaged/VBD/ROOT/0 multitude/NN/dobj/1        27      1946,3  1949,1  1950,3  1951,1  1953,2  1961,1  1979,1  1983,1  1985,3  1989,3  1999,1  2000,4  2001,1  2003,2
envisaged       envisaged/VBD/ROOT/0 municipalities/NNS/dobj/1  10      1958,3  1962,2  1975,1  1989,2  1993,2
envisaged       envisaged/VBD/ROOT/0 museum/NN/dobj/1   27      1982,3  1992,1  1994,2  1995,4  1996,1  1997,1  1998,1  1999,4  2000,5  2001,1  2004,2  2006,1  2007,1
envisaged       envisaged/VBD/ROOT/0 music/NN/dobj/1    26      1960,2  1983,1  1988,3  1989,3  1991,1  1996,4  1998,1  2001,1  2005,2  2006,2  2007,6
envisaged       envisaged/VBD/ROOT/0 muslim/NN/dobj/1   10      1945,1  1988,1  1989,1  1993,1  1998,1  1999,1  2000,2  2001,1  2002,1
envisaged       envisaged/VBD/ROOT/0 mustering/NN/dobj/1        10      1971,2  1977,1  1982,1  1987,1  1991,1  1996,1  2001,3
envisaged       envisaged/VBD/ROOT/0 myself/PRP/dobj/1  32      1938,2  1957,1  1958,1  1975,1  1978,1  1980,2  1984,2  1986,2  1988,2  1990,1  1995,2  1996,2  1998,1  2000,4  2001,1  2004,2  2006,3  2007,2
envisaged       envisaged/VBD/ROOT/0 name/NN/dobj/1     10      1983,1  1986,1  1992,1  1999,2  2007,3  2008,2
envisaged       envisaged/VBD/ROOT/0 namely/RB/advmod/1 72      1944,1  1946,1  1949,4  1959,1  1962,2  1964,2  1967,1  1968,3  1970,1  1974,1  1975,4  1981,3  1982,5  1983,2  1984,1  1986,3  1988,2  1989,1  1990,1  1992,3  1995,2  1996,3  1998,4  2000,5  2002,1  2003,2  2004,7  2005,1  2006,3  2007,1  2008,1
envisaged       envisaged/VBD/ROOT/0 naseem/NNP/dobj/1  12      1981,2  1991,5  1995,2  1997,2  2006,1
envisaged       envisaged/VBD/ROOT/0 nation/NN/dobj/1   155     1928,2  1937,1  1942,2  1944,1  1946,5  1951,5  1954,5  1955,2  1957,2  1958,1  1959,2  1960,1  1961,6  1962,2  1964,2  1966,2  1967,2  1969,5  1970,1  1973,2  1974,4  1975,2  1976,7  1977,3  1978,2  1979,4  1980,2  1981,4  1982,3  1983,3  1984,1  1987,3  1988,1  1989,2  1990,2  1991,4  1993,2  1994,4  1995,2  1996,2  1997,6  1998,5  1999,3  2000,8  2002,7  2003,3  2004,3  2005,3  2006,2  2007,3  2008,4
envisaged       envisaged/VBD/ROOT/0 nationalisation/NN/dobj/1  29      1966,1  1967,1  1969,3  1971,1  1974,1  1976,1  1980,4  1981,1  1985,2  1987,2  1988,2  1989,2  1991,2  1994,4  1998,1  2000,1
envisaged       envisaged/VBD/ROOT/0 nationalization/NN/dobj/1  43      1943,1  1948,1  1951,1  1963,1  1965,3  1966,1  1967,2  1970,2  1972,4  1974,1  1975,1  1977,2  1978,2  1979,3  1982,2  1983,3  1984,1  1985,4  1986,1  1987,1  1992,4  2001,1  2002,1
envisaged       envisaged/VBD/ROOT/0 nationhood/NN/dobj/1       14      1944,4  1946,1  1961,2  1967,1  2002,1  2005,4  2007,1
envisaged       envisaged/VBD/ROOT/0 nations/NNPS/dobj/1        13      1924,1  1955,3  1967,3  1987,2  1990,2  1999,1  2007,1
envisaged       envisaged/VBD/ROOT/0 nations/NNS/dobj/1 37      1945,1  1946,2  1947,2  1951,2  1961,2  1968,1  1970,2  1973,2  1975,1  1979,2  1981,1  1984,1  1985,1  1987,1  1988,4  1990,2  1992,1  1994,1  1997,2  2003,1  2004,1  2008,4
envisaged       envisaged/VBD/ROOT/0 nature/NN/dobj/1   72      1923,1  1932,1  1943,1  1946,2  1947,1  1956,3  1957,2  1959,2  1960,2  1961,1  1964,3  1966,1  1967,1  1968,2  1970,2  1971,1  1972,3  1973,2  1976,1  1978,1  1980,1  1981,3  1983,1  1987,2  1988,3  1989,1  1992,2  1993,3  1995,1  1996,1  1997,3  2001,2  2002,1  2004,4  2005,4  2006,5  2007,1  2008,1
envisaged       envisaged/VBD/ROOT/0 navy/NN/dobj/1     10      1940,1  1945,1  1951,1  1957,1  1962,1  1965,2  1989,1  1991,1  2005,1
envisaged       envisaged/VBD/ROOT/0 necessary/JJ/acomp/1       11      1973,1  1974,2  1979,1  1980,1  1983,2  1985,1  1990,1  1999,2
envisaged       envisaged/VBD/ROOT/0 necessity/NN/dep/1 12      1926,3  1929,1  1937,2  1941,1  1952,1  1956,1  1959,1  1960,1  1987,1
envisaged       envisaged/VBD/ROOT/0 necessity/NN/dobj/1        131     1923,1  1924,1  1926,4  1929,1  1930,2  1935,1  1936,1  1937,2  1940,4  1941,1  1943,1  1944,2  1946,2  1947,2  1948,1  1951,1  1952,5  1953,1  1954,2  1955,1  1956,2  1959,1  1960,3  1961,13 1962,1  1963,5  1964,2  1965,1  1967,1  1968,3  1971,4  1972,1  1974,1  1977,3  1978,1  1980,3  1981,5  1983,3  1985,2  1987,2  1988,3  1991,1  1992,3  1993,1  1994,2  1995,4  1996,3  1998,4  1999,2  2000,1  2001,1  2002,2  2003,2  2004,1  2005,1  2006,4  2007,2  2008,1
envisaged       envisaged/VBD/ROOT/0 need/NN/dobj/1     340     1929,3  1931,1  1932,2  1936,1  1940,2  1946,1  1947,2  1948,2  1949,1  1950,3  1952,1  1953,1  1954,5  1955,4  1957,8  1959,7  1960,4  1961,4  1963,4  1964,1  1965,9  1966,3  1967,6  1968,4  1969,3  1970,2  1971,1  1972,4  1973,6  1974,3  1975,7  1976,5  1978,1  1979,6  1980,3  1981,5  1983,3  1984,2  1985,3  1986,7  1987,7  1988,6  1989,13 1990,11 1991,5  1992,7  1993,11 1994,5  1995,10 1996,5  1997,8  1998,6  1999,5  2000,6  2001,11 2002,6  2003,7  2004,19 2005,13 2006,12 2007,21 2008,6
envisaged       envisaged/VBD/ROOT/0 need/VB/ccomp/1    77      1953,1  1963,3  1966,1  1970,2  1971,1  1976,1  1978,1  1979,1  1980,1  1982,2  1983,2  1986,3  1988,4  1989,3  1990,1  1991,1  1993,1  1994,9  1995,3  1996,1  1997,3  1998,3  1999,1  2000,3  2001,4  2003,2  2004,2  2005,6  2006,1  2007,2  2008,8
envisaged       envisaged/VBD/ROOT/0 need/VBP/ccomp/1   37      1944,1  1945,2  1947,3  1957,2  1965,2  1966,3  1969,1  1970,4  1971,1  1975,3  1977,2  1978,1  1984,1  1985,1  1986,2  1992,1  1996,1  1998,2  2003,1  2006,1  2007,1  2008,1
envisaged       envisaged/VBD/ROOT/0 needed/VBN/advcl/1 10      1955,3  1958,1  1964,1  1989,2  1997,1  1998,1  2006,1
envisaged       envisaged/VBD/ROOT/0 needed/VBN/ccomp/1 36      1936,1  1960,2  1973,3  1977,3  1980,1  1982,2  1987,3  1988,2  1990,2  1993,1  1997,2  2000,1  2001,2  2002,3  2003,5  2004,2  2005,1
envisaged       envisaged/VBD/ROOT/0 needs/NNS/dobj/1   46      1915,3  1916,6  1917,1  1923,1  1936,1  1942,1  1946,2  1951,1  1952,1  1961,1  1967,3  1970,2  1975,2  1983,1  1985,1  1986,2  1992,3  1993,1  1997,1  1998,3  1999,2  2000,1  2001,2  2002,2  2003,1  2004,1
envisaged       envisaged/VBD/ROOT/0 negotiation/NN/dobj/1      54      1940,1  1946,3  1947,2  1951,2  1954,1  1956,1  1958,2  1964,1  1965,1  1968,6  1971,1  1972,1  1975,1  1982,2  1984,1  1988,1  1989,2  1991,1  1992,1  1996,4  1997,4  1998,2  2000,4  2001,3  2002,1  2003,2  2005,2  2006,1
envisaged       envisaged/VBD/ROOT/0 negotiations/NNS/dobj/1    63      1954,3  1962,1  1969,5  1971,3  1972,1  1973,5  1977,1  1978,1  1980,2  1982,2  1983,3  1986,1  1988,2  1989,3  1990,4  1991,2  1993,4  1994,1  1997,6  1999,1  2001,1  2002,1  2003,2  2004,4  2005,3  2008,1
...
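
Judging from the dumps above, each line is: head word, the n-gram as space-separated word/POS/dependency/head-index items, a total count, then year,count pairs. A minimal Python sketch of a line parser follows. It assumes the fields are TAB-separated in the real files (the forum shows them as runs of spaces), and the names `Token`, `Record` and `parse_line` are made up for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    dep: str
    head: int


class Record(NamedTuple):
    head_word: str
    tokens: list
    total: int
    by_year: dict


def parse_line(line: str) -> Record:
    """Parse one corpus line; fields are assumed to be TAB-separated."""
    fields = line.rstrip("\n").split("\t")
    head_word, ngram, total = fields[0], fields[1], int(fields[2])
    tokens = []
    for item in ngram.split(" "):
        # rsplit keeps any '/' inside the word itself intact.
        word, pos, dep, head = item.rsplit("/", 3)
        tokens.append(Token(word, pos, dep, int(head)))
    by_year = {}
    for pair in fields[3:]:
        year, count = pair.split(",")
        by_year[int(year)] = int(count)
    return Record(head_word, tokens, total, by_year)


if __name__ == "__main__":
    sample = ("envisaged\tenvisaged/VBD/ROOT/0 moving/VBG/dobj/1\t10"
              "\t1981,1\t1996,1\t1998,1\t2003,7")
    record = parse_line(sample)
    print(record.tokens[1], record.total, sum(record.by_year.values()))
```

In the sample line the per-year counts add up to the total (1 + 1 + 1 + 7 = 10), which makes a handy sanity check when streaming the big files.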
The arcs corpus (Google_Books_corpus_version_20130501_English_All_Arcs.txt) stats:
Size: 179,736,720,202 bytes
Encountered lines: 918,860,187
Encountered words: 7,419,031,777
Longest line: 4,244
Longest word: 217

I am already out of external storage; holding all five corpora might require a dedicated 1TB drive, preferably M.2, and then 16 threads would run at 1GB/s in wildcard mode.

In short, I did my best to offer Uncompromised Speed; however, the machine running it has to be stuffed with cores and RAM.
This scamp looks dear to me:
 