with_recursive_tree

In the last post we explored tree structure implementations for Ruby on Rails. In this post, we present a new implementation using Common Table Expressions.

Jan 14th, 2025
By Patricio Mac Adden

In our previous post, we mentioned that we’re working on a project whose main entity is organized as a tree. Because of that, we analyzed the three major implementations for tree structures: acts_as_tree, ancestry, and closure_tree, to decide which one was best to use. They’re great solutions but each one of them has different trade-offs: they are either fast for reading or fast for writing.

In our particular case, we had 2 constraints:

  1. We need to insert the nodes in any order, meaning we might not know the parent of a node when we insert it
  2. Each tree could have a large number of nodes, so we need a solution that’s fast for reading

The first constraint practically discards ancestry as it needs the parent node at the moment of insertion to build the materialized path. It also discards closure_tree for the same reason, except that it builds the transitive closure tree instead of the materialized path.

acts_as_tree is perfect for writing as it doesn’t need the parent node at insertion time. But when it comes to traversing the tree, its performance is bad.

Since we need unordered inserts and fast reads, we needed another solution, that doesn’t exist yet. After researching and experimenting, we came up with a solution using Common Table Expressions (CTEs), which is both fast for reading and writing: with_recursive_tree.

What’s a Common Table Expression?

A common table expression (CTE) is a named temporary result set that exists within the scope of a single statement and that can be referred to later within that statement.

For example:

WITH posts_with_tags_or_comments AS (
 SELECT * FROM posts WHERE tags_count > 0
 UNION ALL
 SELECT * FROM posts WHERE comments_count > 0
)
SELECT * FROM posts_with_tags_or_comments

The WITH clause defines a UNION between 2 SELECT statements: the first one selects all posts with 1 or more tags, and the second one selects all posts with 1 or more comments. The result set is then selected. The result of this query is all posts that have either tags or comments.

A recursive common table expression is one having a subquery that refers to its name. If you remember my last post, I mentioned that acts_as_tree uses recursion to traverse the tree structure. One example is the #descendants method, which returns the sum of each child’s #descendants, making a recursive N+1 query.

By using a recursive CTE, we can traverse the tree structure in a single query. For example, given a nodes table, and starting from a node with ID = 1 we could do:

WITH RECURSIVE tree AS (
  SELECT * FROM nodes WHERE id = 1
  UNION ALL
  SELECT * FROM nodes JOIN tree ON nodes.parent_id = tree.id
)
SELECT * FROM tree

Let’s break down the query. The WITH RECURSIVE clause defines a union between 2 members: the anchor member and the recursive member. The anchor member (SELECT * FROM nodes WHERE id = 1) selects the root node from which to start the recursion. The recursive member (SELECT * FROM nodes JOIN tree ON nodes.parent_id = tree.id) joins the nodes table with the temporary table tree (which is essentially the nodes table) on the condition that the parent_id in nodes matches the id in tree. This means that the recursive member selects all the children of the current node, recursively.

After the recursive process completes, the final SELECT statement retrieves all the rows from the temporary table.

Rails and CTE

Rails 7.1 added the #with query method, which can be used to create a CTE. But it wasn’t until Rails 7.2 that Rails added the #with_recursive query method, which can be used to create a recursive CTE.

Database compatibility

Common table expressions are available in MySQL 8.0 and above, PostgreSQL 13.0 and above, and SQLite 3.34.0 and above.

with_recursive_tree

with_recursive_tree is heavily inspired by acts_as_tree, and has the same API. But instead of recursively fetching the children and ancestors, it uses a recursive CTE to do so in a single query. The main advantage of this approach is the performance improvement when dealing with deep threes, and the removing N+1 queries.

Benchmark

Benchmarks are normally very basic cases that don’t consider real-world data, so you shouldn’t take them as if they were the word of god. However, they’re useful to set a baseline and compare different approaches.

You can run the benchmarks by cloning the repository and running:

$ ruby benchmarks/benchmark.rb <LEVELS> <SIBLINGS>

You can specify the number of levels and siblings with which to generate a tree. For example, ruby benchmarks/benchmark.rb 5 3 will generate a tree with 5 levels and 3 siblings per node.

For a tree with 10 levels and 10 siblings per node (1.111.111.111 nodes), the numbers are as follows:

Method acts_as_tree ancestry closure_tree with_recursive_tree
::roots 0.000111 0.000063 0.000171 0.000075
#ancestors 0.001493 0.000675 0.000998 0.000417
#descendants 3.253830 0.000544 0.008368 0.000482
#level 0.002322 0.000481 0.004359 0.000613
#root 0.001475 0.000612 0.005891 0.000538
#self_and_ancestors 0.001548 0.000459 0.000967 0.000452
#self_and_descendants 5.248950 0.000565 0.000977 0.000559
#self_and_siblings 0.003389 0.000749 0.001057 0.000660
walk tree, dfs 5.055768 0.398219 73.838887 0.160001
walk tree, bfs 4.420154 - - 0.157244
reparenting 0.003547 0.112872 15.401308 0.001399

Compatibility

with_recursive_tree is compatible with Ruby on Rails 6.0 and above, Ruby 3.0 and above, and the databases mentioned above.

Conclusion

with_recursive_tree began as an experiment after looking at tree structure solutions for Rails. After seeing its performance, we decided to open-source it so everyone else could benefit from it.

We hope you find this gem useful and welcome any feedback or contributions!