Debugging a memory-leaking action

This week we had to refactor a feature in one of the Ruby on Rails apps we’re maintaining. While at it, we discovered it was leaking memory, making the memory usage ramp up exponentially. With our expertise and the help of some tools, we were able to identify the root cause and fix the issue. Read on to learn more about how we approached this problem.

Sep 26th, 2024
By Patricio Mac Adden

This week we had to fix a bug in one of the Ruby on Rails apps we’re maintaining. We inherited the application from another team and our objectives included a Ruby on Rails upgrade, an infrastructure migration (from a VPS to Heroku), and new features. Now that these main objectives have been accomplished, we switched focus to taking care of performance issues.

The issue that we’re going to look at in this post is fairly common.

This application has several lists. You can filter those lists and export the results as an Excel file. The implementation had several issues:

  • The exports were built directly in the controller, which results in high response time and affects the throughput of the app.
  • As usually happens with 10-year-old applications, the volume of data grew huge over the years, resulting in a timeout in most of the cases (except when the filtered list was very small).
  • The memory usage ramped out pretty quickly.

This is a classic, when building new applications is frequent for inexperienced devs to forget to think about how the current implementation will work with ten times or a hundred times more data. This simple implementation must have probably worked just fine for a couple of years. But, if they had a well-configured application monitoring in place, and was taken into account in the development process, this could have been avoided long ago.

The refactor

Initially, once you filtered a list and hit the export button, the main thread would block until the xls file was ready and sent to the client. Fixing this is trivial: instead of processing the export inline, a background job would be enqueued and the user would be notified that the export would be sent to their email soon. The background job would process the data, build the xls file and send it via email to the user that generated the report.

Before the refactor, the export action looked something like this:

def export
  authorize Product, :export?

  @products = ProductFilter.new(filter_params, current_user).call

  respond_to do |format|
    format.xlsx { render xlsx: "export", filename: "products.xlsx" }
  end
end

And the view (app/views/products/export.xlsx.axlsx) looked something like this:

wb = xlsx_package.workbook
wb.add_worksheet(name: "Products") do |sheet|
  sheet.add_row ["Date:  #{Date.today.strftime("%d/%m/%Y")}"]

  sheet.add_row ["Code", "Provider", "Mode", "Percentage", "Kind", "Color", "Brand", "Created at", "Sold at", "Status", "Gender", "Price"]

  @products.each do |p|
    sheet.add_row [
      p.code,
      p.provider.name,
      t(p.mode, scope: :modes),
      p.percentage_category.to_s,
      p.kind_name,
      p.color_name,
      p.brand.name,
      p.created_at.strftime("%d/%m/%Y"),
      p.sold_at.try(:strftime, "%d/%m/%Y"),
      t(p.state),
      p.gender,
      p.price
    ]
  end
end

Note these snippets are redacted for simplicity.

As stated above, the refactor was trivial:

class ExportProductsJob < ApplicationJob
  def perform(params, user)
    @products = ProductFilter.new(params, user).call

    build_excel_file

    # the mailer grabs the generated file from the /tmp folder
    UserMailer.send_products_export(user).deliver_now
  end

  private

  def build_excel_file
    package = Axlsx::Package.new
    wb = package.workbook
    wb.add_worksheet(name: "Products") do |sheet|
      sheet.add_row ["Date:  #{Date.today.strftime("%d/%m/%Y")}"]

      sheet.add_row ["Code", "Provider", "Mode", "Percentage", "Kind", "Color", "Brand", "Created at", "Sold at", "Status", "Gender", "Price"]

      @products.in_batches.each do |batch|
        batch.each do |p|
          sheet.add_row [
            p.code,
            p.provider.name,
            t(p.mode, scope: :modes),
            p.percentage_category.to_s,
            p.kind_name,
            p.color_name,
            p.brand.name,
            p.created_at.strftime("%d/%m/%Y"),
            p.sold_at.try(:strftime, "%d/%m/%Y"),
            t(p.state),
            p.gender,
            p.price
          ]
        end
      end
    end

    # this is not ideal as if many users request the same export, files could be corrupt when the first job finishes
    # but it works to illustrate the issue
    package.serialize Rails.root.join("tmp/products.xlsx")
  end
end

If you pay close attention, you’ll notice an initial performance improvement: the ProductFilter class would return all the products in the database, so instead of loading them all in memory and then iterating over them, we used #in_batches to split the bulk of products into smaller chunks of 1k products.

The memory leak

After refactoring the features into their separate jobs that built the Excel files, we could see that the memory usage ramped up (but not because of the refactoring) from 160-something MB to a few gigabytes after running the reports, especially the unfiltered products export, which allows you to export 700k+ products. This export is not useful because no one will ever need it as it would export all products, even the ones that were sold years ago. But it’s a good example to measure and improve, as all of the other exports work the same way. After all, if this export is improved, we can replicate the fix on the other exports.

Upgrading to a more powerful Heroku dyno was definitively not an option, as it should’ve been at least 4Gb RAM and just for running these reports. This doesn’t justify the upgrade.

Finding the leak

It was obvious that somewhere in the code the export is leaking memory, but where? The export’s code is quite direct:

  1. it fetches all the required records with the ProductFilter class
  2. it iterates over all the products within the Excel builder

We thought that processing the results in batches of 1k products would suffice, but that wasn’t the case, so we needed to measure:

  1. How much memory does the ProductFilter use?
  2. How much memory does the xls builder (axlsx gem) use?

We added the benchmark-memory gem to the project to measure different parts of the job to try and find the sections that were leaking memory, but given that the memory increase is huge, and that the memory profiler also uses more memory to collect data, the process blew out before we could see something useful.

Puts debugging

Using benchmark-memory was not an option, but we’re puts debuggerers and we always have a card up our sleeve. After all, our objective is to run the job and maintain the memory consumption under 512MB.

To find out where our code leaks memory, one of the simplest methods is to see how much memory is using the process where the job is running in certain parts of the code. The code that we’re using to see how much memory the process is using looks like this:

class ExportProductsJob < ApplicationJob
  include ActionView::Helpers::NumberHelper

  # ...

  private

  def memory
    number_to_human_size `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last.kilobytes
  end
end

What this method does is:

  1. ps ax -o pid,rss lists all processes running on the system, showing the PID and RSS (Resident Set Size, the real memory the size of the process)
  2. grep -E "^[[:space:]]*#{$$} filters the output of the command above, matching lines that start with zero or more whitespace chars, followed by the PID of the current process ($$)
  3. strip removes any leading and trailing whitespace chars from the output of the previous command
  4. split splits the string based on a whitespace char, leaving a 2-item array with the PID of the current process in the first position, and the RSS of the current process in the last position
  5. map(&:to_i) casts the strings into integers
  6. last gets the last item of the array, the RSS
  7. kilobytes returns the number of bytes equivalent to the kilobytes provided
  8. number_to_human_size returns a string that we can read, something like 164MB

Now we can use the memory method in key parts of the code to see how the memory grows. For example, we want to measure how much memory the process is using at the beginning of the run, then when the products are fetched, then on every iteration, etc.

class ExportProductsJob < ApplicationJob
  include ActionView::Helpers::NumberHelper

  def perform(params, user)
    puts "start: #{memory}"
    @products = ProductFilter.new(params, user).call

    attachment = build_excel_file

    UserMailer.send_products_export(user, attachment).deliver_now
  end

  private

  def build_excel_file
    package = Axlsx::Package.new
    wb = package.workbook
    wb.add_worksheet(name: "Products") do |sheet|
      sheet.add_row ["Date:  #{Date.today.strftime("%d/%m/%Y")}"]

      sheet.add_row ["Code", "Provider", "Mode", "Percentage", "Kind", "Color", "Brand", "Created at", "Sold at", "Status", "Gender", "Price"]

      @products.in_batches.each_with_index do |batch, i|
        batch.each do |p|
          sheet.add_row [
            p.code,
            p.provider.name,
            t(p.mode, scope: :modes),
            p.percentage_category.to_s,
            p.kind_name,
            p.color_name,
            p.brand.name,
            p.created_at.strftime("%d/%m/%Y"),
            p.sold_at.try(:strftime, "%d/%m/%Y"),
            t(p.state),
            p.gender,
            p.price
          ]
        end

        puts "#{(i * 1000) + batch.size} products processed: #{memory}"
      end
    end
  end

  def memory
    number_to_human_size `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last.kilobytes
  end
end

By running the code above, we can see that the memory use ramps up quickly:

start: 167 MB
1000 products processed: 181 MB
2000 products processed: 182 MB
3000 products processed: 182 MB
4000 products processed: 184 MB
5000 products processed: 184 MB
...
98000 products processed: 405 MB
99000 products processed: 405 MB
100000 products processed: 405 MB
101000 products processed: 405 MB
102000 products processed: 406 MB
...
320000 products processed: 838 MB
321000 products processed: 858 MB
322000 products processed: 879 MB
323000 products processed: 913 MB
324000 products processed: 923 MB
325000 products processed: 929 MB
...
474000 products processed: 1.22 GB
475000 products processed: 1.23 GB
476000 products processed: 1.23 GB
477000 products processed: 1.24 GB
478000 products processed: 1.23 GB
...

There’s something wrong with #build_excel_file. Let’s see what it does:

  1. it iterates over batches of 1k products
  2. for every batch, it iterates over each one of the products, building an array and adding it to the excel

Also, each product is loaded into memory along with some associated records, and an array is built. We can try and see if this process of bringing 1k products and associated records, and building 1k arrays leaks memory by commenting out (for now) the code that builds the Excel file:

def build_excel_file
  @products.in_batches.each_with_index do |batch, i|
    batch.each do |p|
      [
        p.code,
        p.provider.name,
        t(p.mode, scope: :modes),
        p.percentage_category.to_s,
        p.kind_name,
        p.color_name,
        p.brand.name,
        p.created_at.strftime("%d/%m/%Y"),
        p.sold_at.try(:strftime, "%d/%m/%Y"),
        t(p.state),
        p.gender,
        p.price
      ]
    end

    puts "#{(i * 1000) + batch.size} products processed: #{memory}"
  end
end

This time, #in_batches, works as expected. After every iteration, the memory remains practically the same:

start: 165 MB
1000 products processed: 169 MB
2000 products processed: 180 MB
3000 products processed: 184 MB
4000 products processed: 185 MB
5000 products processed: 186 MB
...
148000 products processed: 191 MB
149000 products processed: 191 MB
150000 products processed: 191 MB
...
589000 products processed: 191 MB
590000 products processed: 191 MB
591000 products processed: 191 MB
...
719000 products processed: 191 MB
720000 products processed: 191 MB
720008 products processed: 191 MB

But in this case, we’re not doing anything else than returning an array for each item in the batch. After some time, the 1k products that we loaded plus the 1k arrays that we built will be erased from memory by the garbage collector. The problem starts when we need to collect them to build the Excel file. Even if we collected those small arrays outside of the iteration that adds the rows to the excel file, the problem would remain. It’s a lot of data and it will normally use a lot of memory.

The solution

The solution is simple: instead of writing the Excel file to disk after collecting all the rows, we need to write them to disk after each batch, for example. This way, once the batch finishes, the memory used to allocate the batch and the arrays will be collected by the garbage collector. But unfortunately, the axlsx gem doesn’t support opening existing Excel files.

After reviewing the final xls file and speaking with the client, we decided to stop exporting Excel files and export CSV files instead. The client wasn’t really taking advantage of the features Excel files provide anyway, so exporting CSV files instead ended up being an improvement.

CSV files have many benefits:

  • they’re plain text files that can be imported into Excel if they want to
  • they can be opened, written, closed, and reopened as many times as we want
  • since they’re plain text files we can open them in a+ mode (reading/appending)

After this refactoring, the code looks much cleaner:

class ExportProductsJob < ApplicationJob
  include ActionView::Helpers::NumberHelper

  def perform(params, user)
    puts "start: #{memory}"
    @products = ProductFilter.new(params, user).call

    UserMailer.send_products_export(user).deliver_now
  end

  private

  def build_excel_file
    file = File.open Rails.root.join("tmp/products.csv"), "a+"

    @products.in_batches.each_with_index do |batch, i|
      batch.each do |p|
        row = [
          p.code,
          p.provider.name,
          t(p.mode, scope: :modes),
          p.percentage_category.to_s,
          p.kind_name,
          p.color_name,
          p.brand.name,
          p.created_at.strftime("%d/%m/%Y"),
          p.sold_at.try(:strftime, "%d/%m/%Y"),
          t(p.state),
          p.gender,
          p.price
        ].join(",")

        file.write "#{row}\n"
      end

      puts "#{(i * 1000) + batch.size} products processed: #{memory}"
    end

    file.close
  end

  def memory
    number_to_human_size `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i).last.kilobytes
  end
end

And it doesn’t leak memory anymore:

start: 164 MB
1000 products processed: 171 MB
2000 products processed: 183 MB
3000 products processed: 184 MB
4000 products processed: 184 MB
5000 products processed: 184 MB
...
300000 products processed: 189 MB
301000 products processed: 189 MB
302000 products processed: 189 MB
...
718000 products processed: 190 MB
719000 products processed: 190 MB
720000 products processed: 190 MB
720008 products processed: 190 MB

Conclusion

Debugging memory leaks is not always easy. Not always you can use tools like benchmark-memory or APMs such as Scout, NewRelic, or similar. In this case, none of these tools could be used, so we had to go back and forth testing different parts of the code (the code above is just a simplified, beautified, fragment of the real code, which piles up a good amount of technical debt apart from this issue), trying to find the memory leak.

It requires some experience working on Ruby and Ruby on Rails to understand exactly how Ruby manages memory and how you can leverage ActiveRecord to be more performant and less resource-hungry. Sometimes, the solution results in a beautiful code that is easy to understand (such as in this case), but sometimes the solution is less beautiful. And also sometimes, you need to sacrifice speed (in a background job speed is not critical) in favor of decreasing memory consumption. Each case needs to be carefully evaluated.

If your Ruby on Rails application needs to be evaluated by a professional, don’t hesitate to contact us. We’re now offering an application assessment where we analyze your application and build a report with this kind of improvement and fixes for you to act upon.