Classic performance optimization strategies in a Ruby on Rails application involve moving slow or expensive logic to background jobs, looking at slow queries and adding missing indexes, or tracking and fixing N+1 query issues. The view layer is often overlooked, but it should also be a target for performance improvements. In this post, we will do a quick recap of the different rendering strategies in Rails, benchmark them to establish a baseline, and analyze them to decide when to use them (or when not to).
Rendering strategies in Rails
In Rails, we can render a template in many ways. To illustrate the different rendering strategies, we’re going to use a simple Rails 8 app, like the one in the classic 15-minute blog: an Article model that has many Comments.
Inline rendering
Inline rendering means writing all the markup directly in the view, including chunks of HTML that could be modularized by extracting them into a partial. In a Rails view, one would do:
<h1><%= @article.title %></h1>
<p><%= @article.body %></p>
<h2>Comments (<%= @article.comments.count %>)</h2>
<% @article.comments.each do |comment| %>
<div class="comment">
<p><strong><%= comment.author %>:</strong> <%= comment.body %></p>
<small>Posted on <%= comment.created_at.strftime("%b %d, %Y at %H:%M") %></small>
</div>
<% end %>
It’s good for small views, but it’s hard to use on large views. And, probably the biggest drawback, it doesn’t allow reusing any piece of the HTML.
Partial rendering
Here we extract the HTML inside the loop into its own partial. This lets us reuse that piece of HTML in other places and keeps our files smaller, more focused, and easier to work with.
<h1><%= @article.title %></h1>
<p><%= @article.body %></p>
<h2>Comments (<%= @article.comments.count %>)</h2>
<% @article.comments.each do |comment| %>
<%= render "comments/comment", comment: comment %>
<% end %>
Collection rendering
Same as partial rendering, but we delegate the loop to the render method using the collection parameter.
<h1><%= @article.title %></h1>
<p><%= @article.body %></p>
<h2>Comments (<%= @article.comments.count %>)</h2>
<%= render partial: "comments/comment", collection: @comments, as: :comment %>
Implicit rendering
This is the most succinct version of all. Its pros and cons are similar to those of the previous strategy, but here we also delegate to the render method the decision of which partial to use.
<h1><%= @article.title %></h1>
<p><%= @article.body %></p>
<h2>Comments (<%= @article.comments.count %>)</h2>
<%= render @comments %>
The benchmark
The benchmark renders each view 1000 times using Benchmark::bmbm.
And these were the results:
Rehearsal -----------------------------------------------------------
Inline ERB view: 1.597948 0.012652 1.610600 ( 1.611081)
Partial loop view: 6.774650 0.024155 6.798805 ( 6.799789)
Collection render view: 3.257858 0.019441 3.277299 ( 3.279077)
Implicit render view: 3.641655 0.018333 3.659988 ( 3.660372)
------------------------------------------------- total: 15.346692sec
user system total real
Inline ERB view: 1.705810 0.008909 1.714719 ( 1.715067)
Partial loop view: 6.914086 0.026164 6.940250 ( 6.944075)
Collection render view: 3.269090 0.018296 3.287386 ( 3.287694)
Implicit render view: 3.678030 0.019551 3.697581 ( 3.697888)
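The post does not show the benchmark script itself, so here is a minimal plain-Ruby sketch of what a Benchmark::bmbm harness of this shape could look like. It uses the standard library's ERB as a stand-in for Rails views, and every name in it (BenchComment, render_inline, render_partial_loop, the iteration count) is hypothetical. Plain ERB re-evaluates a template on every call, whereas Rails compiles templates into methods, so only the structure of the harness carries over, not the numbers.

```ruby
require "benchmark"
require "erb"

# Hypothetical stand-in for the Comment model used in the post.
BenchComment = Struct.new(:author, :body)
comments = Array.new(50) { |i| BenchComment.new("author #{i}", "body #{i}") }

# "Inline" strategy: a single template that contains the whole loop.
# trim_mode "<>" drops the newline after lines that are pure <% ... %> tags.
inline_template = ERB.new(<<~HTML, trim_mode: "<>")
  <% comments.each do |comment| %>
  <p><strong><%= comment.author %>:</strong> <%= comment.body %></p>
  <% end %>
HTML

# "Partial loop" strategy: a separate template evaluated once per comment.
comment_template = ERB.new("<p><strong><%= comment.author %>:</strong> <%= comment.body %></p>\n")

render_inline = -> { inline_template.result_with_hash(comments: comments) }
render_partial_loop = lambda do
  comments.map { |comment| comment_template.result_with_hash(comment: comment) }.join
end

inline_html = render_inline.call
partial_html = render_partial_loop.call

iterations = 100 # the post uses 1000; kept small so the sketch runs quickly

# bmbm runs every report twice: a rehearsal first, then the measured pass.
Benchmark.bmbm(20) do |x|
  x.report("Inline view:")       { iterations.times { render_inline.call } }
  x.report("Partial loop view:") { iterations.times { render_partial_loop.call } }
end
```

The rehearsal pass is the reason each results table appears twice: it warms things up so that the second, measured pass is less affected by one-off setup costs.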
Analysis
Let’s analyze from the slowest to the fastest one, so we can understand how the optimizations of each strategy work.
Partial rendering
Why is the partial rendering in a loop so slow?
The render method is a perfect example of the conceptual compression philosophy core to Rails design.
Behind the scenes, just rendering a template to an HTML string is much more complex than it sounds:
- Finds the compiled cached template (fast, but not free)
- Creates an ActionView::Renderer
- Sets up the rendering context and binds the locals to it (comment: comment)
- Finally, executes the cached template method that generates the HTML
This work is repeated for each of the 1000 comments. This has a lot of repeated work that we should be able to avoid: enter collection rendering.
Collection and implicit rendering improvements
Collection rendering and implicit rendering are sister strategies. In this case, implicit rendering is just collection rendering with a tiny bit of object-oriented magic on top: the object knows how to render itself by implementing the to_partial_path method (which is implemented by default).
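To make the to_partial_path convention concrete, here is a plain-Ruby imitation of it. In a Rails app the default implementation comes from the framework, so you normally don't write this method yourself; the string handling below is a simplified assumption (naive pluralization, no namespaces), and the Review class with its custom path is hypothetical.

```ruby
# Simplified imitation of the default convention: "comments/comment".
class Comment
  def to_partial_path
    element = self.class.name.split("::").last
                  .gsub(/([a-z\d])([A-Z])/, '\1_\2')
                  .downcase
    "#{element}s/#{element}" # naive pluralization, enough for this sketch
  end
end

# Hypothetical model that points to a partial in a non-default location.
class Review
  def to_partial_path
    "shared/review"
  end
end

puts Comment.new.to_partial_path
puts Review.new.to_partial_path
```

With a method like this in place, the render call can ask each object which partial it wants instead of being told explicitly.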
How do they run about twice as fast? Well, with collection rendering, steps 1 and 2 are done once for the entire loop, so for the 1000 partials, we save 999 template searches and 999 ActionView::Renderer instantiations. That’s quite a lot of work, and the savings grow with the size of the collection (although much larger collections are not very usual).
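The difference can be illustrated with a toy model. This is not how Rails implements rendering; ToyRenderer and its counters are hypothetical and only mirror the steps described above, so that we can count how often the lookup and the renderer setup run under each strategy.

```ruby
# Toy model of the rendering steps; NOT Rails' actual implementation.
class ToyRenderer
  attr_reader :lookups, :renderers_built

  def initialize(templates)
    @templates = templates # partial name => lambda that takes a locals Hash
    @lookups = 0
    @renderers_built = 0
  end

  # One render call per item: every step runs every time.
  def render(name, locals)
    template = find_template(name) # step 1: template lookup
    build_renderer                 # step 2: renderer setup
    template.call(locals)          # steps 3 and 4: bind locals, run template
  end

  # Collection rendering: steps 1 and 2 run once, steps 3 and 4 run per item.
  def render_collection(name, collection, as:)
    template = find_template(name)
    build_renderer
    collection.map { |item| template.call({ as => item }) }.join
  end

  private

  def find_template(name)
    @lookups += 1
    @templates.fetch(name)
  end

  def build_renderer
    @renderers_built += 1
  end
end

templates = { "comments/comment" => ->(locals) { "<p>#{locals[:comment]}</p>" } }
comments = Array.new(1000) { |i| "comment #{i}" }

loop_renderer = ToyRenderer.new(templates)
loop_html = comments.map { |c| loop_renderer.render("comments/comment", { comment: c }) }.join

collection_renderer = ToyRenderer.new(templates)
collection_html = collection_renderer.render_collection("comments/comment", comments, as: :comment)

puts "partial loop: #{loop_renderer.lookups} lookups"
puts "collection:   #{collection_renderer.lookups} lookups"
```

Both strategies produce the same HTML in this model; the only difference is how many times the shared setup work is repeated.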
But we still need to bind the locals and call the rendering method 1000 times. Could we do any better?
Inline rendering
Now that we know what render is doing under the hood, we can easily figure out why inline rendering is the fastest: there’s no render call at all. So, there’s no template lookup, no rendering context instantiation or binding setup, and no separate method invocation to assemble the HTML. All of that is already taken care of by the article template, where it’s done once. Hard to beat.
Having second thoughts
If you are like me, you would be thinking: if the performance hit comes from calling the render method and the main pain point of inlined views is maintainability and reusability… what would happen if instead of a partial, we put the template in a helper using content_tag? That should give us the best of both worlds, right? We could modularize using Ruby methods, and we wouldn’t be calling render, so it should be fast, right?
Well, let’s see! Let’s add this method to the application helper:
def render_comment(comment)
  content_tag("div", class: "comment") do
    content_tag("p") do
      content_tag("strong", comment.author) +
        content_tag("small", "posted on #{comment.created_at.strftime("%b %d, %Y at %H:%M")}")
    end
  end
end
And our view becomes:
<% @comments.each do |comment| %>
<%= render_comment comment %>
<% end %>
It almost looks like a component! Let’s see the benchmarks now:
Rehearsal -----------------------------------------------------------
Inline ERB view: 1.576235 0.016086 1.592321 ( 1.595398)
Partial loop view: 6.798589 0.027718 6.826307 ( 6.828153)
Collection render view: 3.215288 0.017600 3.232888 ( 3.234518)
Implicit render view: 3.623890 0.020319 3.644209 ( 3.645871)
helper loop view: 6.856758 0.020698 6.877456 ( 6.878699)
------------------------------------------------- total: 22.173181sec
user system total real
Inline ERB view: 1.558490 0.010675 1.569165 ( 1.569459)
Partial loop view: 6.928491 0.026780 6.955271 ( 6.955799)
Collection render view: 3.258910 0.018507 3.277417 ( 3.277837)
Implicit render view: 3.659728 0.019208 3.678936 ( 3.679344)
helper loop view: 6.939471 0.024494 6.963965 ( 6.964710)
HA! Have you ever seen a hypothesis go down that spectacularly? Looks like there are worse things than render out there!
What happened back there?
To understand why the helper strategy is about as slow as the partial loop, and far slower than inline rendering, we need to see what our application was actually doing. A good way to see where our code spends most of its time is with a profiler.
This is what ruby-prof gives us back (the first few most interesting lines):
Measure Mode: wall_time
Thread ID: 1616
Fiber ID: 9368
Total: 0.044312
Sort by: self_time
%self total self wait child calls name location
11.18 0.005 0.005 0.000 0.000 6027 String#initialize
6.20 0.022 0.003 0.000 0.019 4002 ActionView::Helpers::TagHelper::TagBuilder#content_tag_string /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/helpers/tag_helper.rb:239
5.40 0.040 0.002 0.000 0.038 4000 *ActionView::Helpers::TagHelper#content_tag /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/helpers/tag_helper.rb:516
5.00 0.004 0.002 0.000 0.002 5034 ActiveSupport::CoreExt::ERBUtil#unwrapped_html_escape /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/activesupport-8.0.2.1/lib/active_support/core_ext/erb/util.rb:10
4.29 0.025 0.002 0.000 0.023 2000 *ActionView::OutputBuffer#capture /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/buffers.rb:72
4.24 0.007 0.002 0.000 0.005 6023 ActiveSupport::SafeBuffer#initialize /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/activesupport-8.0.2.1/lib/active_support/core_ext/string/output_safety.rb:70
3.94 0.003 0.002 0.000 0.001 7025 String#blank? /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/activesupport-8.0.2.1/lib/active_support/core_ext/object/blank.rb:153
3.39 0.004 0.002 0.000 0.003 6004 String#present? /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/activesupport-8.0.2.1/lib/active_support/core_ext/object/blank.rb:165
3.22 0.026 0.001 0.000 0.025 2000 *ActionView::Helpers::CaptureHelper#capture /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/helpers/capture_helper.rb:47
2.68 0.001 0.001 0.000 0.000 9034 Regexp#match?
2.36 0.005 0.001 0.000 0.004 1010 Hash#each_pair
2.20 0.008 0.001 0.000 0.007 6027 <Class::String>#new
2.15 0.009 0.001 0.000 0.008 6020 String#html_safe /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/activesupport-8.0.2.1/lib/active_support/core_ext/string/output_safety.rb:225
1.86 0.001 0.001 0.000 0.000 7252 Hash#[]
1.82 0.003 0.001 0.000 0.002 1020 ActionView::Helpers::TagHelper::TagBuilder#tag_option /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/helpers/tag_helper.rb:294
1.62 0.001 0.001 0.000 0.001 4001 ActionView::Helpers::TagHelper#ensure_valid_html5_tag_name /Users/f-3r/.rbenv/versions/3.4.5/lib/ruby/gems/3.4.0/gems/actionview-8.0.2.1/lib/action_view/helpers/tag_helper.rb:575
1.48 0.001 0.001 0.000 0.000 11060 String#empty?
Well… it looks rather messy. What are we looking for here? We want lines that have a high %self (the percentage of the total measured time spent in the method itself), a low child time, and a self time that is higher than or equal to the child time. This means the time was used by the method itself and not by other methods it called. For example:
- String#initialize: called 6027 times, taking 11% of the execution time
- ERBUtil#unwrapped_html_escape: called 5034 times, 5%
- ActiveSupport::SafeBuffer#initialize: instantiated 6023 times, 4.24%
- String#blank?: 7025 times, 3.94%
- Regexp#match?: 9034 times, 2.68%
- TagHelper#ensure_valid_html5_tag_name: 4001 times, 1.62%
- String#empty?: 11060 times, 1.48%
- and so on…
We can see that most of the time goes into String and SafeBuffer allocations and into string validations/checks.
Internally, every content_tag:
- Validates the tag name
- Processes the attributes
- Escapes attributes via ERB::Util.html_escape
- Allocates and returns an ActiveSupport::SafeBuffer
And we have 4 of these, 1000 times. The operations are fast, but there are so many of them that the work piles up.
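To get a feel for how that work piles up, here is a plain-Ruby sketch that counts object allocations. It does not use Rails: toy_content_tag and SafeString are hypothetical, simplified stand-ins for content_tag and ActiveSupport::SafeBuffer that only imitate the steps listed above. It assumes CRuby, where GC.stat exposes a total_allocated_objects counter. The absolute numbers will differ from a real Rails app; the point is only the gap between nested helper calls and a single interpolated string.

```ruby
require "erb"

# Hypothetical stand-in for ActiveSupport::SafeBuffer: a String subclass
# that marks its content as already escaped.
class SafeString < String
  def +(other)
    escaped = other.is_a?(SafeString) ? other : ERB::Util.html_escape(other.to_s)
    SafeString.new(super(escaped))
  end
end

VALID_TAG_NAME = /\A[a-zA-Z][a-zA-Z0-9-]*\z/

# Simplified imitation of the steps above: validate, escape, allocate.
def toy_content_tag(name, content = nil, **attributes)
  raise ArgumentError, "invalid tag name: #{name}" unless VALID_TAG_NAME.match?(name)

  content = yield if block_given?
  body = content.is_a?(SafeString) ? content : ERB::Util.html_escape(content.to_s)
  attribute_string = attributes.map { |key, value| %( #{key}="#{ERB::Util.html_escape(value.to_s)}") }.join
  SafeString.new("<#{name}#{attribute_string}>#{body}</#{name}>")
end

# Four nested helper calls per comment, like render_comment in the post.
def helper_comment(author, posted_at)
  toy_content_tag("div", class: "comment") do
    toy_content_tag("p") do
      toy_content_tag("strong", author) + toy_content_tag("small", "posted on #{posted_at}")
    end
  end
end

# The same markup built with a single interpolated string.
def inline_comment(author, posted_at)
  "<div class=\"comment\"><p><strong>#{ERB::Util.html_escape(author)}</strong>" \
    "<small>posted on #{ERB::Util.html_escape(posted_at)}</small></p></div>"
end

# Counts the objects allocated while the block runs (CRuby only).
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

helper_allocations = allocated_objects { 1000.times { helper_comment("Ada", "Jan 01, 2025 at 10:00") } }
inline_allocations = allocated_objects { 1000.times { inline_comment("Ada", "Jan 01, 2025 at 10:00") } }

puts "helper: #{helper_allocations} objects, inline: #{inline_allocations} objects"
```

Each nested call in the helper version builds its own intermediate strings before the final one is assembled, which is the same pattern the profile shows for the real content_tag.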
So, inline rendering is the definitive winner in terms of performance.
When to and when not to
We have focused solely on performance, but you can imagine that rendering all your views in a huge single ERB file, because it is the fastest, might not be the smartest choice. There are trade-offs, always:
- You can trade in a little performance for maintainability/readability,
- Or exchange a little performance for re-usability,
- Or the other way around, sacrifice readability/maintainability for a performance boost when things get critical
Here we enter the domain of design choices. As usually happens with the interesting sides of programming, there’s no silver bullet, no single correct answer. This has to do more with team alignment and project culture than purely technical decisions.
Some recommendations
We usually use the following guidelines to decide when to use one or the other:
Try to make your view tree as shallow as possible. Modularize where it makes sense, not just for the sake of it, as indirection is not free (in terms of performance and maintainability).
The priorities we use to decide:
- Focus on maintainability and readability first. Code is written once and read thousands of times. Be kind to your future self.
- Always use collection rendering (where applicable). We like it more than implicit rendering because it’s more explicit and flexible regarding partial locations, and doesn’t need to switch context from the view to the model to know what partial will be rendered.
- Always profile your code (e.g., use rack-mini-profiler, or any observability/APM solution). Rendering is not always the main reason a view is slow, but if you detect a partial that’s making things slow, you can try inlining it.
But perhaps the most important thing to take into account is that the gains from tuning rendering performance might be negligible in a broader context. For example, if a page is loading 300 third-party JS dependencies, squeezing out 80ms from your view rendering won’t help much with your app performance or what your users perceive. There are also alternatives outside of pure view performance, like caching or making smart product decisions. Every performance issue is different, and one should always investigate what’s possible for each individual problem.