The idea behind is to represent sub-string or search pattern(say it "eet") as a single number(hash code) and traversing through the given String(Say it "This is Sujeet Kumar"), generating single number(hash code) for consecutive characters of the same length as of given pattern string and comparing both number for equality.
In the above scenario, first hash code "Thi" will be compared with hash code of "eet". It will unequal. then "his" hash code will be compared with hash code of "eet" and so on. When both hash code are not matching then one character is being release from left side and one character is being from the right side to make a sub string of same length as of given pattern.The efficiency of this algorithm depends on hash coding algorithm that how efficient unique hash code it generates for different sets of character. Because there is extra overhead of equality of characters in case hash collision. Also rolling hash code is generated ie hash code of new sub-string derived from previous hash code and new added character. Lets have look at a sample example as below.
public class RabinKarp_findSubString {
public static void main(String[] args) {
String t = "This is sujeet Kumar";
String s = "eet";
int index = getSubStringIndex(t, s);
System.out.println("index:" + index);
}
private static int getSubStringIndex(String t, String s) {
int power = 1;
int tHsc = 0;
int sHsc = 0;
final int BASE = 26;
for (int i = 0; i < s.length(); i++) {
power = i > 0 ? power * BASE : 1;
tHsc = tHsc * BASE + t.charAt(i); // 1
sHsc = sHsc * BASE + s.charAt(i);
}
for (int j = s.length(), k = 0; j < t.length(); j++, k++) {
if (tHsc == sHsc && t.substring(k, j).equals(s)) {
return k;
}
tHsc = tHsc - t.charAt(k) * power; // rolling hash calculation
tHsc = tHsc * BASE + t.charAt(j);
}
if (tHsc == sHsc && t.substring(t.length() - s.length()).equals(s)) {
return (t.length() - s.length());
}
return -1;
}
}
In the above scenario, first hash code "Thi" will be compared with hash code of "eet". It will unequal. then "his" hash code will be compared with hash code of "eet" and so on. When both hash code are not matching then one character is being release from left side and one character is being from the right side to make a sub string of same length as of given pattern.The efficiency of this algorithm depends on hash coding algorithm that how efficient unique hash code it generates for different sets of character. Because there is extra overhead of equality of characters in case hash collision. Also rolling hash code is generated ie hash code of new sub-string derived from previous hash code and new added character. Lets have look at a sample example as below.
public class RabinKarp_findSubString {
public static void main(String[] args) {
String t = "This is sujeet Kumar";
String s = "eet";
int index = getSubStringIndex(t, s);
System.out.println("index:" + index);
}
private static int getSubStringIndex(String t, String s) {
int power = 1;
int tHsc = 0;
int sHsc = 0;
final int BASE = 26;
for (int i = 0; i < s.length(); i++) {
power = i > 0 ? power * BASE : 1;
tHsc = tHsc * BASE + t.charAt(i); // 1
sHsc = sHsc * BASE + s.charAt(i);
}
for (int j = s.length(), k = 0; j < t.length(); j++, k++) {
if (tHsc == sHsc && t.substring(k, j).equals(s)) {
return k;
}
tHsc = tHsc - t.charAt(k) * power; // rolling hash calculation
tHsc = tHsc * BASE + t.charAt(j);
}
if (tHsc == sHsc && t.substring(t.length() - s.length()).equals(s)) {
return (t.length() - s.length());
}
return -1;
}
}
No comments:
Post a Comment