Tree structures in Rails

Tree structures are a common way of organizing hierarchical data. In this post, we’ll explore existing solutions for Ruby on Rails, their benefits, drawbacks, and when to use them.

Development Ruby on Rails

Dec 12th, 2024

By Patricio Mac Adden

Dec 12th, 2024

By Patricio Mac Adden

Development Ruby on Rails

Tree structures are a common way of organizing hierarchical data. Each node in the structure has a parent node (except the root) and zero or more child nodes. Last month, as we started working on a new project in which the main entity is organized as a tree, we needed to analyze the existing solutions for tree structures in Rails’ ecosystem. In this post, we will explore those solutions, their benefits, drawbacks, and when to use them. Particularly, we’ll take a look at:

In all cases, once you set these plugins up, your model will be extended to access a node’s parent and children, ancestors, descendants, etc. and you’ll have a way of arranging the trees to traverse them.

How they work

acts_as_tree

acts_as_tree needs a parent_id column to be present on your model’s table. It’s the simplest solution as there are no auxiliary columns or structures to store information about the tree. It’s simple to insert or update a node in any section of the tree, you just need to set the parent_id adequately and that’s all. But simplicity comes at a cost: using the parent_id as the foreign key for the parent and the children nodes, the only way to traverse the tree is recursively, causing N+1 queries. For example, calling #descendants on a node will return its children #descendants, resulting in a recursive N+1 query.

Ancestry

Ancestry takes a completely different approach. It doesn’t use a parent_id column to build the parent and children relationships. Instead, it stores the materialized path in each node (eg. /1/2/3/, the path from the root to the current node). This approach allows for efficient querying of #ancestors and #descendants without N+1 queries.

However, it is less performant when it comes to moving or deleting nodes, as it needs to update the paths of all affected nodes and their descendants.

closure_tree

Apart from a parent_id column, closure_tree needs an auxiliary table to store the transitive closure tree table for your model. Consequently, reads are performant (1 query to fetch a node’s #ancestors, #descendants, etc.) but writes are less performant (2 inserts to create a node, 3 inserts/updates on node reparenting).

Which one is better?

As is frequent in software development, there’s no single correct answer: you must always weigh the trade-offs of the different implementations considering your specific use case and choose the one that fits best for your application.

By having performed a thorough analysis of the existing solutions, we can say that if your structure isn’t likely to change and you need fast reads, then either ancestry or closure_tree are good options, with ancestry being a bit more performant, especially when traversing the tree and re-parenting nodes. If your structure is likely to change, your trees aren’t going to be very deep or your UX can cope with the performance hit of N+1s, acts_as_tree might be enough.

Another factor to consider is configurability: do you need to customize which foreign key to use? acts_as_tree allows you to change the primary and foreign keys for the parent and children relationships, closure_tree allows you to change the foreign key, and ancestry will always use the id to build the materialized path. If you need to use other non-integer columns to build the tree, then acts_as_tree is the most versatile.

Our pick

The previous analysis leaves you with a sour taste, doesn’t it? Well, in our case it did:

In our specific use case, we receive nodes unordered from another component of the app. This means we don’t necessarily have the root node at the beginning of the process, and thus we cannot create new nodes with their immediate parent. With ancestry or closure_tree, it’s impossible to create the node without their parent as they use it to build the materialized path and transitive closure table (the only way is potentially rewriting the whole tree on each new node, which is not viable).

With acts_as_tree, however, we could create the node regardless of the parent. This was a big advantage in our use case. But our trees are big and deep, and our app needs to be fast. So… what to do if one needs all of the benefits and none of the drawbacks? Fast reads and writes, simplicity, no auxiliary metadata, and customizability?

If this is your case too, then, stay tuned. We might have a surprise for you.