In our previous post, we mentioned that we’re working on a project whose main entity is organized as a tree. Because of that, we analyzed the three major implementations for tree structures: acts_as_tree, ancestry, and closure_tree, to decide which one was best to use. They’re great solutions but each one of them has different trade-offs: they are either fast for reading or fast for writing.
In our particular case, we had 2 constraints:
- We need to insert the nodes in any order, meaning we might not know the parent of a node when we insert it
- Each tree could have a large number of nodes, so we need a solution that’s fast for reading
The first constraint practically discards ancestry as it needs the parent node at the moment of insertion to build the materialized path. It also discards closure_tree for the same reason, except that it builds the transitive closure tree instead of the materialized path.
acts_as_tree is perfect for writing as it doesn’t need the parent node at insertion time. But when it comes to traversing the tree, its performance is bad.
Since we need unordered inserts and fast reads, we needed another solution, that doesn’t exist yet. After researching and experimenting, we came up with a solution using Common Table Expressions (CTEs), which is both fast for reading and writing: with_recursive_tree.
What’s a Common Table Expression?
A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement.
For example:
WITH posts_with_tags_or_comments AS (
SELECT * FROM posts WHERE tags_count > 0
UNION ALL
SELECT * FROM posts WHERE comments_count > 0
)
SELECT * FROM posts_with_tags_or_comments
The WITH
clause defines a UNION
between 2 SELECT
statements: the first one selects all posts with 1 or more tags, and the second one selects all posts with 1 or more comments. The result set is then selected. The result of this query is all posts that have either tags or comments.
A recursive common table expression is one having a subquery that refers to its name. If you remember my last post, I mentioned that acts_as_tree uses recursion to traverse the tree structure. One example is the #descendants
method, which returns the sum of each child’s #descendants
, making a recursive N+1 query.
By using a recursive CTE, we can traverse the tree structure in a single query. For example, given a nodes
table, and starting from a node with ID = 1 we could do:
WITH RECURSIVE tree AS (
SELECT * FROM nodes WHERE id = 1
UNION ALL
SELECT * FROM nodes JOIN tree ON nodes.parent_id = tree.id
)
SELECT * FROM tree
Let’s break down the query. The WITH RECURSIVE
clause defines a union between 2 members: the anchor member and the recursive member. The anchor member (SELECT * FROM nodes WHERE id = 1
) selects the root node from which to start the recursion. The recursive member (SELECT * FROM nodes JOIN tree ON nodes.parent_id = tree.id
) joins the nodes table with the temporary table tree (which is essentially the nodes table) on the condition that the parent_id
in nodes matches the id
in tree
. This means that the recursive member selects all the children of the current node, recursively.
After the recursive process completes, the final SELECT
statement retrieves all the rows from the temporary table.
Rails and CTE
Rails 7.1 added the #with
query method, which can be used to create a CTE. But it wasn’t until Rails 7.2 that Rails added the #with_recursive
query method, which can be used to create a recursive CTE.
Database compatibility
Common table expressions are available in MySQL 8.0 and above, PostgreSQL 13.0 and above, and SQLite 3.34.0 and above.
with_recursive_tree
with_recursive_tree is heavily inspired by acts_as_tree, and has the same API. But instead of recursively fetching the children and ancestors, it uses a recursive CTE to do so in a single query. The main advantage of this approach is the performance improvement when dealing with deep threes, and the removing N+1 queries.
Benchmark
Benchmarks are normally very basic cases that don’t consider real-world data, so you shouldn’t take them as if they were the word of god. However, they’re useful to set a baseline and compare different approaches.
You can run the benchmarks by cloning the repository and running:
$ ruby benchmarks/benchmark.rb <LEVELS> <SIBLINGS>
You can specify the number of levels and siblings with which to generate a tree. For example, ruby benchmarks/benchmark.rb 5 3
will generate a tree with 5 levels and 3 siblings per node.
For a tree with 10 levels and 10 siblings per node (1.111.111.111 nodes), the numbers are as follows:
Method | acts_as_tree | ancestry | closure_tree | with_recursive_tree |
---|---|---|---|---|
::roots | 0.000111 | 0.000063 | 0.000171 | 0.000075 |
#ancestors | 0.001493 | 0.000675 | 0.000998 | 0.000417 |
#descendants | 3.253830 | 0.000544 | 0.008368 | 0.000482 |
#level | 0.002322 | 0.000481 | 0.004359 | 0.000613 |
#root | 0.001475 | 0.000612 | 0.005891 | 0.000538 |
#self_and_ancestors | 0.001548 | 0.000459 | 0.000967 | 0.000452 |
#self_and_descendants | 5.248950 | 0.000565 | 0.000977 | 0.000559 |
#self_and_siblings | 0.003389 | 0.000749 | 0.001057 | 0.000660 |
walk tree, dfs | 5.055768 | 0.398219 | 73.838887 | 0.160001 |
walk tree, bfs | 4.420154 | - | - | 0.157244 |
reparenting | 0.003547 | 0.112872 | 15.401308 | 0.001399 |
Compatibility
with_recursive_tree is compatible with Ruby on Rails 6.0 and above, Ruby 3.0 and above, and the databases mentioned above.
Conclusion
with_recursive_tree began as an experiment after looking at tree structure solutions for Rails. After seeing its performance, we decided to open-source it so everyone else could benefit from it.
We hope you find this gem useful and welcome any feedback or contributions!