The Rope
A data structure for long strings
Introduction
Programming languages usually provide a method to hold strings. In most cases, strings are stored in contiguous memory as fixed-length character array primitives. While this may work for most cases, in the occasion of longer strings and more complicated string editing, a different data structure to hold these characters may well be more efficient.
About
This is where the rope data structure comes into play. Used by most text editors to handle large strings efficiently, a rope is a type of binary tree in which every leaf node holds a String or Substring and its length (this is considered to be the weight of the node). Each parent node holds the total weight of all the nodes in its immediate left subtree (which represents the weight of that parent node). Here’s a graphical example below:
Why use a Rope?
Ropes have several advantages over using normal strings. In comparison, using a rope is much faster at concatenation, insertion, and deletion. As well as this, ropes do not need O(n) extra memory when performing operations and does not require long pieces of contiguous memory.
Operations
Creation
In order to create a rope object from a character array, you can use a recursive function that passes in a node, its parents, a character array, and a left and right integer.
Here’s an example of a recursive rope creation function:
// Function that creates a Rope structure.
// node --> Reference to pointer of current root node
// l --> Left index of current substring (initially 0)
// r --> Right index of current substring (initially n-1)
// par --> Parent of current node (Initially NULL)
//LEAF_LEN is the maximum length of each leafvoid createRopeStructure(Rope *&node, Rope *par,
char a[], int l, int r)
{
Rope *tmp = new Rope();
tmp->left = tmp->right = NULL;
// We put half nodes in left subtree
tmp->parent = par;
// If string length is more
if ((r-l) > LEAF_LEN)
{
tmp->str = NULL;
tmp->lCount = (r-l)/2;
node = tmp;
int m = (l + r)/2;
createRopeStructure(node->left, node, a, l, m);
createRopeStructure(node->right, node, a, m+1, r);
}
else
{
node = tmp;
tmp->lCount = (r-l);
int j = 0;
tmp->str = new char[LEAF_LEN];
for (int i=l; i<=r; i++)
tmp->str[j++] = a[i];
}
}
//sourced from geeksforgeeks
//link: https://www.geeksforgeeks.org/ropes-data-structure-fast-string-concatenation/
In the example, notice that each recursive call passes half of the character array to make a smaller rope, which finally gets consolidated into one huge rope.
Concatenate
One of the main benefits of ropes is that they do not take up extra space when concatenating two character arrays together. While a normal string requires the copying and recreation of character arrays, rope concatenation simply requires creating a new root node to connect both ropes together.
A diagram is shown below:
Note that the tree may still need rebalancing after concatenation. Here’s some example code for the concatenation function:
// Function that efficiently concatenates two strings
// with roots root1 and root2 respectively
// n1 is size of string represented by root1.
// root3 is going to store root of concatenated Rope.void concatenate(Rope *&root3, Rope *root1, Rope *root2, int n1)
{
// Create a new Rope node
// Make root1 and root2 as children of tmp.
Rope *tmp = new Rope();
tmp->parent = NULL;
tmp->left = root1;
tmp->right = root2;
root1->parent = root2->parent = tmp;
tmp->lCount = n1; // Make string of tmp empty and update reference r
tmp->str = NULL;
root3 = tmp;
}
//source: GeeksForGeeks
//link: https://www.geeksforgeeks.org/ropes-data-structure-fast-string-concatenation/
As you can see, because this simply requires adding a new node, the time complexity of concatenating two strings is O(1). However, in order to set the weight of the new root node, the time complexity becomes O(logN).
Index
In order to return the character at a specific index, we must recursively search through the tree. Since we know the length of a specific node’s left subtree, we can find what leaf node (and then what character from said leaf) to return.
Here is an example pseudocode of the recursive index function:
function index(RopeNode node, integer i)
if node.weight < i and exists(node.right) then
return index(node.right, i - node.weight)
end
if exists(node.left) then
return index(node.left, i)
end
return node.[i]string
end//source: OpenGenius IQ
//link: https://iq.opengenus.org/rope-data-structure/
The average case time complexity of this function is O(logN).
Split
When splitting a string, there are two possible cases:
- Splitting the leaf node in the middle
- Splitting the last character of the leaf node
In the first case, we can reduce and make a simpler version of the second case. Take, for example this string: “Split this string” — and split it at the 11th index (into “Split this ” and “string”). It would graphically look like this:
The graph on the top left shows the graph of the full, unbroken string. On the right side, you can notice the split between the leaf node between the 10th and 11th index to make two separate leaves.
Now, we separate the two leaves (shown in the top left), then rebalance the original rope tree while creating a brand new rope with the new string (shown in top right).
The time complexity for this function is O(log N).
Insertion
In order to insert a character s in index i we split the string at the index and then concatenate the new string and the half of the other string.
Here is some code that represents this logic:
void insert(int i, string s)
{
newRope = split(i);
concatenate(s, newRope);
concatenate(originalRope, s);
}
The time complexity of this operation is O(log N) because it is the sum of the three operations.
Deletion
To delete a part of a string (from index i to index j), similar logic is applied. We simply split the rope at the two indexes and concatenate the separated ropes.
Here is some code that represents deletion logic:
void delete(int i, int j)
{
splitRope = split(j);
removedRope = split(i);
concatenate(originalRope, splitRope);
}
The time complexity — much like insertion — is the sum of the operations which is O(log N).
Testing the Rope
In order to test the size at which using a rope to concatenate becomes slower than concatenating two strings, I created 5 differently sized ropes and strings(size 10, 100, 1000, 10000, 100000) and appended two ropes/strings of the same size together. Using a high resolution clock, I timed them and compared their times.
Below is a chart with all the raw data with the times displayed in seconds.
We can easily convert this into a line graph to observe the concatenation times of strings and ropes side by side.
As you can see by the chart, concatenating two strings of size 10, 100, and 1000 are relatively similar. However, when the concatenating strings of greater size, the rope has a much lower time than the string primitive. In the case of two 100,000 sized strings, the rope is about 15 times faster than the string primitive!
To see the source code and testing functions for this data, please click here or use the link below:
https://github.com/leongkkevin/RopeTesting
Disadvantages of Using Ropes
While ropes have a variety of advantages over using strings, there are also an array of disadvantages.
Overall, ropes take more storage in comparison to a simple string because of the parent nodes that are needed to be stored.
When it comes to smaller strings, ropes are significantly slower because of the storage management.
As well as this, the increased complexity of ropes also increases room for bugs.
Conclusion
In conclusion, ropes are a good data structure if you are handling larger strings. The main disadvantages for ropes only appear when working with small strings. The extra memory needed for parent nodes and the extra time complexity would not be as efficient as simply using the string primitive. However, when it comes to very large strings (like ones being handled by text editors and other software), ropes are both faster and more memory efficient than normal string primitive operations.
References and Further Reading
For more information, please visit any of the references I used throughout this article (shown below):
- “Rope: the Data Structure used by text editors to handle large strings” by OpenGenus
- “Ropes are Better Than Strings” by Hans-J. Boehm, Russ Atkinson and Michael Plass
- “Rope Data Structure (Fast String Concatenate)” by GeeksForGeeks