How to Remove a Node from a BST? by Danny Reina
To wrap up constructing a basic binary search tree (BST) let's first do a quick no-nonsense review. Take note that you can find my how-tos on building the
insert methods for a BST here and here.
What is a BST?
A BST is a tree data structure that carries the following foundational principles.
- nodes can have up to only two children nodes
- the value of any node is greater than all values occurring in its left subtree and less than all values occurring in its right subtree
- the left and right children nodes of the node you are positioned on, have to satisfy the BST property
Let's start with some boilerplate code.
Here we have a simple BST class that initiates a value and a left and right property both set to null. We also have our starting code for the
remove method which for now takes in the value we want to remove from the BST.
The very first step in the
remove method is finding the node. We have already built this code in our other BST methods so this shouldn’t be unfamiliar but here is a quick recap, just in case.
- compare target value to value at current node, return value if equal
- if target value is less than current node’s value then move to the left child of current node and repeat comparison logic
- if target value is greater than current node’s value then move to the right child of current node and repeat comparison logic
- continue until target value equals current node’s value or return false if value not found
Keep in mind that when removing a node we will want to keep track of the parent node of the current node since we will need to reassign the child node of the current node, which is being removed, to the current node’s parent node. In other words, say we wanted to remove the number 3 instead of 1 in our example BST. In order to do so, we would have to reassign 3 with 1, which effectively removes 3 from the BST and makes 1 the new left child node of the parent node 5. In order to keep track of the parent node, we can add the following code.
There will be three major edge cases for removing a node in a BST, there is no getting around this it is what it is. Some of the edge cases have sub-edge cases. Here is how to resolve each one step by step.
Edge Case #1
We will start off by addressing the case when the node we want to remove has two child nodes. So we would have to check if our
currentNode has two child nodes and we do so like this.
Next we would have to figure out a way to remove the target node in a way that continues BST validation.
So, in order to preserve the integrity of our BST, how do we replace a node with two child nodes or in other words which existing node should we use to replace the deleted node? The answer is to grab the smallest value in the right sub tree of the node that will be removed. What is the smallest value in the right sub tree of the target node, which is 70, that will be removed? To find it, all we need to do is look at the right sub tree of 70 and find the left most node or leaf node. Basically, the last left node. In our case it is 71.
So we would replace 70 with 71 AND remove 71 from its original place.
As we can see, BST princples are preserved. Take it upon yourself to see if you can figure out what would be the new value of the root node if it were removed. Feel free to comment the answer in the comment section below. Let’s now see how we can execute this logic in code.
Note that we have to build out our helper method
getMinVal. I will show you how after resolving all three edge cases.
Edge Case #2
Hang in there, there is a lot to take in but we will get there I promise! The next edge case we will tackle is removing the root node but when it has either only a right child node or only a left child node.
This edge case is, in my opinion, the trickiest/hardest part to solve, in part because it has two sub-edge case. I will show you how to break it down and identify a small pattern that will help you master this edge case. We will begin by identifying the root node/target node to be removed. Since the root node never has a parent node then we can target it whenever
parentNode is null. We can do so with this:
Next, we will want to tackle those two sub-edge cases. To do so, we will check if the root node has a left and/or right child node.
Let’s tackle the right child node first. This right child node accounts for one sub-edge case of edge case #1. Here we will want to manually change the values of the root node with the root node’s right child node value.
We need to account for if the child node of the root node has itself any child nodes, as in our example above. The way we account for it is by manually changing those nodes as well. But how do we do this in our algorithm?
Take note that here we are not manually changing node values but rather the nodes themselves. We are only changing node values when we are attempting to replace an old node value with a new one. Let’s break it down line by line.
currentNode.left = currentNode.right.left what does this look like?
Here is the result.
currentNode.left = currentNode.right.right what does this look like?
Here is the result which takes care of the first sub-edge case of edge case #1.
Then we do the same in the case of if the root node has only a left child node which would resolve sub-edge case #2 in edge case #1.
What is the pattern to observe? Whichever node you are on, such as
currentNode.right, you must first manually change (after changing the value for
currentNode) the opposite child node of the child node you checked for in your condition. For example, if you checked for the left child node then first manually change the right child node then proceed to manually changing the left child. node. If you checked for the right child node then first manually change the left child node and then continue to the right child node. There is a practical reason for this, take for instance our right child node, you don’t want to change the right child node first because you need the original value to properly change the opposite child node. Don’t forget, later on, we will build out our helper method
Edge Case #3
Our final edge case focuses on removing a node that does not have two children nodes but has either one or no child nodes at all. The child node can be either a left or right child node. This edge case also has two sub-edge cases.
First, as always, we locate the target node we want to remove. Thankfully, this is done by our earlier code. Next, we must respositoin our node of 77 so that it isn’t left out in the abyss.
Thankfully, we can save node 77 because we kept track of our parent node, hopefully that clears any lingering doubts you may have had as to why we initialized a parent node in the function’s argument (it did for me). We can use our parent node and targeted node to be removed like this:
Next, we have to take into account if the node to be removed aka
currentNode has either a left or right child node. We can meet this condition with this code.
If the node to be removed has no child nodes it simply gets removed. Here is the final result for the first sub-edge case of edge case #3.
This resolves the first sub-edge case, now onto the second sub-edge case which checks for the left branch of the parent node.
We are almost there! What’s left? Well if our code manages to even find the target node in the BST then it will start working through the three edge cases to remove the target node. But we need to add a
break statement at the end to tell the program to exit the while loop. Also, we will need to ultimately tell the program to stop running with
We are almost done! Remember we have to build out our helper method
getMinVal. The helper function
getMinVal should return the smallest value from the right sub tree. If we think back to our BST, the smallest value will always be a leaf node or in other words the last node on the left side of the branch/tree. We can tell
getMinVal to simply traverse all left child nodes from the origin node until there are no more left child nodes and then we return its value.
That is the complete code on how to remove a node from a BST.
Time Space Complexity
For the time space complexity of the three BST methods that we walked through (
remove) we will be dealing with an average and worst case scenarios. In the average case, we will be dealing with a time complexity of
O(log(n)) logarithmic time. Why? Because traversing starts at the root node of every BST after which we start our comparing logic and move down either the right or left branch of that root node. Whichever branch we move down on, we are essentially cutting the tree in half or rather completely disregarding the other subtree that is on the opposite branch that we did not travel to. So if our comparison logic leads us to the right branch of the root node then we can completely disregard the left sub tree of the root node because one we know that we are searching for a value that is higher than all the values in that left sub tree but moreso in general that is how searching in binary trees works. This method of “cutting in half” while searching through a BST equates to logarithmic time.
In the worst case for time complexity, we will have
O(n). Why? Well lets take this BST as an example.
Even though we only have one branch, this still qualifies as a BST. However, say we are searching for 40. We start at the root node but there is no left branch to disregard and subsequently there are no left sub trees on any of the remaining nodes. Therefore, the algorithm will have to traverse the entire branch until it either finds the node before reaching the end of the branch or at the very end of the branch. This is why the worst case is
O(n) linear time. Because the three methods
remove use the same searching logic, all three have the same average and worst case time complexities.
Space complexity will be run in constant time
O(1). Why? Because we are never initializing a value that increases over time. This big O notation holds true for all three methods so this is how our final time space complexity for a basic BST will look like.
Iteration vs Recursion
Why did I teach you how to build a basic BST with iteration? Because recursion can be difficult to understand from just reading. I believe that video materials are better suited towards guiding a programmer to understanding how and what recursion is doing. Here is a great video resource from freecodecamp constructing a basic BST with recurssion.
O(n) because our space on the stack increases to however many functions are called through recursion. In the absolute worst case scenario, when using recursion you might encounter a system crash known as stackoverflow.
Iterative functions do not need to be stored in memory and jump from one function to another to pass in values. That is where the bad performance of recursion comes into play. There are many cons and pros in the recurssion vs iteration debate but this article is not concerned with addressing that. I simply want to explain that building a basic BST through iteration will give you the best space complexity.
I want to thank you for making it all the way to the end of this article and I hope it was helpful. I intend on writing more posts like this going over data structures and algorithms, and your feedback is encouraged and appreciated. Happy coding and stay safe!
P.S. Please go easy on my whiteboard art, as I was working on it at 2am at JFK airport while waiting on a rental car :P