Tech-letter #16 | March 30, 2023
Suffix Tree
-
A
suffix tree
is a data structureused for efficiently storing and searching a string or set of strings
. It is particularly useful for searching a large text corpus for occurrences of a given pattern or substring. -
One important property of a suffix tree is that it allows for very fast substring searches. This is because searching for a substring in a suffix tree involves traversing the tree from the root node to a leaf node, following edges labeled with the characters of the substring being searched for. If the traversal reaches a leaf node, the substring is present in the original string; otherwise, it is not.
-
Suffix trees can be constructed in linear time relative to the size of the input string, making them very efficient for large strings. They have many applications in computational biology, natural language processing, and other areas of computer science.
-
Now let us implement the suffix tree
class suffixTreeNode {
constructor() {
this.childre = {};
this.indices = [];
}
}
The SuffixTreeNode class represents a node in the suffix tree. Each node has two properties:
children
: This is an object that maps characters to child nodes. For example, if a node has a child node for the character ‘a’, then children[‘a’] will be the child node object.
indices
: This is an array of indices where the substring represented by this node appears in the input string. For example, if the input string is “banana” and this node represents the substring “na”, then indices would be [2, 4] because “na” appears at indices 2 and 4 in the input string.
The children and indices properties are both initialized in the constructor of the SuffixTreeNode class.
class SuffixTree {
constructor(input) {
this.root = new suffixTreeNode();
this.build(input)
}
build(input) {
for (let i = 0; i < input.length; i++) {
this.insertSuffix(input.substring(i), i)
}
}
// all other methods goes in here
//..
}
The build method simply calls another method that is insertSuffix with a character. So, the build method is simple as it uses a for loop and sends characters to another method. Here, what does the substring()
method of javascript do?
substring()
:
The substring method is a string method in JavaScript that returns a new string containing a portion of the original string. The portion of the string to be returned is specified using two arguments: the starting index
and the ending index
(exclusive).
The startIndex argument is the index at which the substring should begin. If startIndex is negative, it is treated as 0. If startIndex is greater than or equal to the length of the string, an empty string is returned.
The endIndex argument is the index at which the substring should end (but not include). If endIndex is not specified, the substring will include all the characters from the startIndex to the end of the string. If endIndex is negative, it is treated as 0. If endIndex is greater than or equal to the length of the string, the entire rest of the string is included in the substring.
let str = "Hello, world!";
let substr1 = str.substring(0, 5);
let substr2 = str.substring(7);
console.log(substr1); // "Hello"
console.log(substr2); // "world!"
Does the subtring() method include spaces?
The substring method in JavaScript includes spaces and other characters in the substring
, just like any other character in the string. It does not ignore or remove any characters based on their type or value.
let str = "Hello, world!";
let substr = str.substring(0, 7);
console.log(substr); // "Hello, "
Hope this clears you with what a substring() method does!
for (let i = 0; i < input.length; i++) {
this.insertSuffix(input.substring(i), i)
}
In the above code, if the input string is “banana”, the build method would call:
insertSuffix("banana", 0) -> i = 0
insertSuffix("anana", 1) -> i = 1
insertSuffix("nana", 2) -> i = 2
insertSuffix("ana", 3) -> i = 3
insertSuffix("na", 4) -> i = 4
insertSuffix("a", 5) -> i = 5
This would result in a suffix tree that represents all the suffixes of “banana”: “banana”, “anana”, “nana”, “ana”, “na”, and “a
Understanding insertSuffix
method
The insertSuffix method will be as shown below
insertSuffix(suffix, index) {
let node = this.root
for (let i = 0; i < suffix.length; i++) {
let char = suffix[i];
if (!node.children[char]) {
node.children[char] = new suffixTreeNode();
}
node = node.children[char];
node.indices.push(index);
}
}
-
The method starts by initializing the
current node to the root node
of the suffix tree. -
It then iterates over the characters of the suffix using a for loop. For each character, the method checks if the current node has a child node that represents the character. If it does not, the method creates a new child node for the character and adds it to the children object of the current node. The method then updates the current node to be the newly created child node.
Finally, when the loop is finished, the method adds the starting index of the suffix to the indices array of the current node, which represents the leaf node that represents the suffix.
lets dissect little more
-
For the first character “b”, the method checks if the root node has a child node for “b”. Since it does not, the method creates a new child node for “b” and updates the current node to be the new child node.
-
For the second character “a”, the method checks if the current node (the child node for “b”) has a child node for “a”. Since it does not, the method creates a new child node for “a” and updates the current node to be the new child node.
Similarly, it proceeds for other characters too.
Finally, at the end the method adds the starting index 0 to the indices array of the current node, which represents the leaf node
that represents the suffix “banana”.
So, Now you have finally created a suffix tree. But, we need more. That is, it’s not just enough to create a tree, you need to perform search operation on it.
understanding search
method
search(pattern) {
let node = this.root;
for (let i = 0; i < pattern.length; i++) {
let char = pattern[i];
if (!node.children[char]) {
return null;
}
node = node.children[char];
}
return node.indices;
}
The search method is fairly simple and direct.
The search method takes a pattern string as input and returns an array of indices where the pattern occurs in the input string. The method searches for the pattern in the suffix tree by traversing the tree from the root node to a leaf node that represents a suffix that starts with the pattern.
This is the search method of the SuffixTree class in a simple implementation of a suffix tree in JavaScript.
The search method takes a pattern string as input and returns an array of indices where the pattern occurs in the input string. The method searches for the pattern in the suffix tree by traversing the tree from the root node to a leaf node that represents a suffix that starts with the pattern.
The method starts by initializing the current node to the root node of the suffix tree. It then iterates over the characters of the pattern using a for loop. For each character, the method checks if the current node has a child node that represents the character. If it does not, the method returns null, indicating that the pattern does not occur in the suffix tree.
If the current node has a child node that represents the character, the method updates the current node to be the child node and continues iterating over the characters of the pattern. When the loop is finished, the method returns the indices array of the current node, which represents the leaf node that represents a suffix that starts with the pattern. If the pattern does not occur in the suffix tree, the method returns null.