Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › Searching a Hah Table
New Posts  All Forums:Forum Nav:

Searching a Hah Table

post #1 of 15
Thread Starter 
Hey there i need a bit of help with my find method. Im making an english to tagalog dictionary for a co worker who is new to canada i plan to use JOptionPane for UI and im using a hash table for the efficieny benefits but i can not seem to get my find method to work correctly, if you guys can help i will be very happy!

i will include my code and the text file used as the dictionary so far i believe that i am hashing correctly but i can not get he find to work correctly. It almost works...
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
post #2 of 15
are these the two methods?

any collision handling?

Collisions occur when you are creating your hash table. It happens when two different words have the same key...look into linear probing or chaining to handle collisions. If you want to get rid of collisions the easy way...just make your hash table larger...but of course you lose more efficiency this way...but since you only have a few words inputted into your table it is negligible.

Also...when you are searching a hash table you dont use a loop...which would defeat the purpose of a hash table...efficiency. You simply hash the word you are searching for...get the key...and the word should come up within 1 search...depending upon collisions.

Once you have found your word in the hash table...then it is time to use a method to convert the word into the other language.

Not sure if I helped...but I tried...if I had my other comp running (has java on it) I would write this up quickly for you.

Quote:
Originally Posted by Sin Cities Sin View Post
public String findTagalog(String word){
int hashVal = hash(word);

for(int i=0; i< arraySize; i++){
if(word.equals(array[hashVal].getEnglish())){
return array[hashVal].getTagalog();
}
hashVal %= arraySize;
}
return null;
}

public String findEnglish(String word){
int hashVal = hash(word);

for(int i=0; i< arraySize; i++){
if(word.equals(array[hashVal].getTagalog())){
return array[hashVal].getEnglish();
}
hashVal %= arraySize;
}
return null;
}



Edited by surfbumb - 8/10/11 at 9:54pm
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
post #3 of 15
Thread Starter 
this definitely does help i will implement a double hash in hopes to avoid the collisions but will that completely eliminate all collisions no right? also i plan on having my array 2x the size of the data set with double hashing. As for searching i would just need to have say

find(String word){
hashVal = hash(word);
return array[hashVal];
}
correct? and for a double hash i would add in
find(String word){
hashVal = hash(word);
hashVal += doubleHash(word); //assuming the double hash method return an int
or would i use the % here im am slightly confused her
return array[hashVal];
}

this is my first real endeavor with hash tables and i appreciate your help
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
post #4 of 15
collisions are a part of creating a hash table...there's minimal perfect and perfect hash tables which have no collisions and have the same amount of storage space as input storage...but are extremely hard to implement.

Here is an example of what happens when you encounter a collision:

Code:
array[0] = "bread"
array[1] = "corn"
array[2] = 
array[3] = "steak"
array[4] = 
array[5] = 
array[6] =

Lets say you have this already in your hash table. You want to hash a word, lets say "turkey", and its key is 0. You do a check if the storage at array[0] is used...it is...this is a collision. Do a check at the next slot down...array[1]...it is also used...another collision. The next slot is free...array[2]...so you put turkey in that position in the array. This is called linear probing. Also, the load factor plays a role in the amount of collisions...which is just the proportion of entries to storage space. Double hashing is what it sounds like...when you encounter a collision...you basically re-hash the word...using a certain method...and come up with a new key.



When you search a word in this hash table...you do the same thing with the word turkey...keep going down the array until you find it...or until you do an entire loop...which would mean it is not in the hash table. When your load factor approaches 1...linear probing becomes highly inefficient...it basically becomes linear searching with a for loop. So like I said just make the table about 2x the size of the amount of words being inputted.

Hope this helps a bit more...I'm assuming you are very close to finishing.

Also what is your method to hashing the words?

I sum all of the ascII values of each letter and use the % symbol in conjunction with the table size to find the key for the word.

So something like this:

Code:
sum(word)%TABLESIZE;
Also when I made my hash tables back in school...my professor did not allow us to use the java library for creating hash tables...so I am not familiar with the library's methods which might just have a collision resolution...such as double hashing which you are apparently using. For chaining...a more efficient approach than probing once the load factor approaches 1...we had to use linked-list inside our arrays...so basically a data structure inside another data structure...it was cool.
Edited by surfbumb - 8/11/11 at 12:14pm
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
post #5 of 15
edit...double post.
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
post #6 of 15
Thread Starter 
for the initial hash i use
Quote:
private int hash(String key){
final int hashConstant = 13;
int i;
int hash = (int)(key.charAt(0));

for(i =1; i<key.length(); i++){
hash *= hashConstant;
hash+= (int)(key.charAt(i));
}
return Math.abs(hash)%arraySize;
}
and for the double hash i am not sure what to do yet but im hoping that that will sovle collisions if not i plan on doing quadratic probing just incase i do happen to have a large load factor but im hoping that will not happen if i make the array size 2x the size of input data
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
post #7 of 15
Quote:
Originally Posted by Sin Cities Sin View Post
for the initial hash i use


and for the double hash i am not sure what to do yet but im hoping that that will sovle collisions if not i plan on doing quadratic probing just incase i do happen to have a large load factor but im hoping that will not happen if i make the array size 2x the size of input data
ok good...you have to play with the array size and do tests to see if there are collisions once you finish your program.

For the double hash...aka re-hashing...there are many ways to go about this.

Ex.

Code:
h(key) = key % TABLESIZE;
check for a collision...if there is...use a re-hash function

Code:
rh(key) = key * key % TABLESIZE;  //multiplies keys first followed by the modulus operator
Put the re-hash function in a while loop...boom done...pretty simple. For searching...use the exact same formulas. Should take 1 to 2 look ups max...as opposed to linear searching which can take up to n max searches. This is really good for searching large databases.
Edited by surfbumb - 8/11/11 at 3:33pm
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
post #8 of 15
Thread Starter 
that makes sense, when testing for a collision
if(array[hashVal] == null){
array[hashVal] = word;
else{
probe(array, word);
} // being a method for linear or quadratic probing.
or in the else would you have hashVal *= (hashVal%tableSize); then array[hashVal] = word
that would achieve the double hash if i am thinking correctly

my only confusion now is the find method in the book i have they use a while loop the hash and second hash then check the array at those 2 points until it finds it. I understand that hashing the word turkey should always be at the same spot in the array unless you probe to find an open spot, so wouldnt you then need the loop to check all the positions or would it be similar to teh insertion in which you check at the hash point if it is not there check at the rehash point?
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
post #9 of 15
Quote:
Originally Posted by Sin Cities Sin View Post
that makes sense, when testing for a collision
if(array[hashVal] == null){
array[hashVal] = word;
else{
probe(array, word);
} // being a method for linear or quadratic probing.
or in the else would you have hashVal *= (hashVal%tableSize); then array[hashVal] = word
that would achieve the double hash if i am thinking correctly

my only confusion now is the find method in the book i have they use a while loop the hash and second hash then check the array at those 2 points until it finds it. I understand that hashing the word turkey should always be at the same spot in the array unless you probe to find an open spot, so wouldnt you then need the loop to check all the positions or would it be similar to teh insertion in which you check at the hash point if it is not there check at the rehash point?
Your initial hashing and checking are fine...and your probe method I assume would do the linear or quadratic probing. Which method are you going to use though for collision resolution? linear/quadratic probing or re-hashing?

Anyway, for your re-hashing get rid of the parenthesis...because of order of operations.

Code:
hashVal *= (hashVal%tableSize);
to this

Code:
hashVal = hashVal * hashVal % tableSize;
put this in a while loop to check for a collision

Code:
while(array[hashVal] != null){
      
        hashVal = hashVal * hashVal % tableSize;

}
array[hashVal] = word;
now your re-hash method will look something like this

Code:
if(array[hashVal] == null){
       array[hashVal] = word;
}
else{
        while(array[hashVal] != null){
      
               hashVal = hashVal * hashVal % tableSize;

}
array[hashVal] = word;
}
I believe string and object array elements are initialized to null when created if I'm not mistaken...and the primitive types...int, float, double....are initialized to 0. Can you double check me on that....also char too.

Yes, for the find method for re-hashing...you first check if your word matches the word at the position of the first hash...if not...use the re-hash method...check again...repeat.

You are getting closer to finishing it seems...I wont give up until you do...haha cheers!
Edited by surfbumb - 8/12/11 at 11:57am
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
Black Silence
(15 items)
 
  
CPUMotherboardRAMHard Drive
i5 3570k @ 4.5 GHz Asus P8Z77-M Pro Kingston HyperX Genesis 8 GB - 1600 MHz Seagate Barracuda 250 GB 
Optical DriveCoolingOSMonitor
Samsung WriteMaster Noctua NH-D14 Windows 10 ASUS VS24AH-P 
KeyboardPowerCaseMouse
Logitech Navigator Enermax Infiniti 650W Fractal R3 Black Pearl Razer Death Adder 
Mouse PadAudio
SteelSeries QcK Mass Altec Lansing FX4021 
  hide details  
Reply
post #10 of 15
Thread Starter 
so far to make it work i just made the size of the array 47521 and it worked but that loop for the rehash in the add makes perfect sense! lol who would have ever though that an array based data structure could be complicated seriously its and array
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
My Rig
(13 items)
 
  
CPUMotherboardGraphicsRAM
i7 860 Asus P7P55D PRO GTX 480 sli wc Corsair Vengeance  
Hard DriveOSPowerCase
WD 1TB BLACK x2 Windows 7 64 bit Silverstone Strider 1000+ CM 690 2 Advanced 
  hide details  
Reply
New Posts  All Forums:Forum Nav:
  Return Home
  Back to Forum: Coding and Programming
Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › Searching a Hah Table