<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://sinaptia.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sinaptia.dev/" rel="alternate" type="text/html" /><updated>2026-04-09T17:17:11+00:00</updated><id>https://sinaptia.dev/feed.xml</id><title type="html">SINAPTIA</title><author><name>SINAPTIA</name></author><entry><title type="html">Ruby Argentina March meetup</title><link href="https://sinaptia.dev/posts/ruby-argentina-march-2026-meetup" rel="alternate" type="text/html" title="Ruby Argentina March meetup" /><published>2026-03-31T00:00:00+00:00</published><updated>2026-03-31T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/ruby-argentina-march-2026-meetup</id><content type="html" xml:base="https://sinaptia.dev/posts/ruby-argentina-march-2026-meetup"><![CDATA[<p><a href="https://ruby.com.ar/">Ruby Argentina</a> opened its 2026 meetup season at Eryx’s office in Buenos Aires, and the night set the tone quickly: pay attention to Ruby’s warnings, then make room for a few lightning talks.</p>

<p>The event kicked off with Ariel Juodziukynas’ talk <strong>“Warnings en Ruby: NO LOS IGNORES!!!”</strong> (“Warnings in Ruby: DON’T IGNORE THEM!!!”). Ariel focused on a part of everyday Ruby work that’s easy to ignore: warnings. His talk was a reminder that the yellow text in our terminal often points to real problems, and he showed practical examples of common warnings and how to fix them.</p>

<p align="center" width="100%">
  <img class="w-[70%]" alt="Ariel's talk" src="/assets/images/posts/ruby-argentina-march-2026-meetup/1.webp" />
</p>

<p>After Ariel’s talk, we took a break to catch up, meet new people, and keep the conversations going over drinks and empanadas.</p>

<p>The lightning talks took the night in a looser direction:</p>

<ul>
  <li><strong>Simon</strong> shared how he fixed a tricky performance issue with a fulltext search, walking us through his debugging process and the solution he found.</li>
  <li><strong>Viktor</strong> presented <strong>aj</strong>, a tool he built to improve his AI workflow, showing how Ruby developers can build utilities to streamline their daily work.</li>
  <li><strong>Santiago</strong> taught us “how to do juggling”. Yes, actual juggling with balls, not code.</li>
  <li><strong>Ariel</strong> came back for a second round with “how to tie your shoes”. It was even better watching everyone try Ariel’s method on the spot.</li>
</ul>

<p align="center" width="100%">
  <img class="w-[70%]" alt="Simon's talk" src="/assets/images/posts/ruby-argentina-march-2026-meetup/2.webp" />
</p>

<p>Thanks to the organizers, the sponsors (<a href="https://sinaptia.dev/">SINAPTIA</a>, Rootstrap, Ombulabs, and Eryx), and everyone who showed up. We’re already looking forward to April’s online meetup.</p>]]></content><author><name>SINAPTIA</name></author><category term="Ruby" /><category term="Community" /><summary type="html"><![CDATA[Ruby Argentina opened its 2026 meetup season with a night that went from Ruby warnings to juggling lessons.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">NO AI CODE IN PRODUCTION DIRECTIVE</title><link href="https://sinaptia.dev/posts/no-ai-code-in-production-directive" rel="alternate" type="text/html" title="NO AI CODE IN PRODUCTION DIRECTIVE" /><published>2026-03-17T00:00:00+00:00</published><updated>2026-03-17T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/no-ai-code-in-production-directive</id><content type="html" xml:base="https://sinaptia.dev/posts/no-ai-code-in-production-directive"><![CDATA[<p>At SINAPTIA, we started enforcing a no “AI code in production” directive.</p>

<p>LOL, no. We are not the next tier of Luddites! We started using AI-assisted code generation a little over a year ago, and from the looks of it, we are going to use it even more in the near future. AI is here to stay, and programming will never be the same, what a time to be alive and so on, and so on…</p>

<p>But also, in a couple of years, we’ll all lose our jobs. <em>What a time to be alive, indeed.</em></p>

<p>We continually ask ourselves: what is AI capable of? Can it really replace us? Can it change the way we work? Will it make our problems simpler? Or will it create new kinds of problems, even more problems? Are we witnessing the death of software engineering?</p>

<p>There’s only one way to find out.</p>

<p>Initially, we were somewhat skeptical about coding agents. We tried them, but the autocomplete feature was usually annoying, and the chat mode wasn’t smart enough to understand the context of the code. But then, at some point, Claude Code and Cursor became smart enough to be of massive help while building features: they could explain things easily in the context they were in, and they could edit text nicely.</p>

<p>Since then, LLMs and coding agents have become a key part of our daily routine. We started building more and more <a href="https://sinaptia.dev/blog/tags/ai">features using AI</a> and growing tools for the RubyLLM ecosystem (such as <a href="https://github.com/sinaptia/ruby_llm-instrumentation">RubyLLM::Instrumentation</a>, <a href="https://github.com/sinaptia/ruby_llm-monitoring">RubyLLM::Monitoring</a>, and <a href="https://github.com/sinaptia/ruby_llm-evals">RubyLLM::Evals</a>) that we use in our projects and have open-sourced for the larger community, in the hope of making Ruby one of the top languages for building with AI.</p>

<h2 id="the-problem-with-ai-generated-code">The problem with AI-generated code</h2>

<p>I bet at some point you stumbled upon a piece of code that achieved something, but was hard to understand, poorly thought out, or just ugly (e.g., for us Rubyists, Ruby code that doesn’t feel like Ruby). That happened a lot in the StackOverflow era.</p>

<p>With AI-generated code, it’s the same as with any code you didn’t write yourself. You might find AI-generated code hard to understand, poorly thought out, or even solving problems that no one asked to solve.</p>

<p>We’ve seen coding agents working without oversight and proper feedback go down paths that ended in code no one would be able to manage: neither the coding agents nor the humans. Discarding and regenerating is a possibility, yes, but tokens are not actually free, and budgeting and financials are not something coding agents can fix either.</p>

<p>Models and agents will become better over time, and the barrier might be farther away each time, but the problem will always be there.</p>

<h2 id="the-hard-parts">The hard parts</h2>

<p>After more than a year of using coding agents for our daily work, developing AI-powered features, and running several experiments, we feel that AI is truly a multiplier force, but it isn’t changing the most fundamental bits of programming and software production: the hard parts of software production are still hard.</p>

<p>The power of solving a complex problem with a simple and elegant solution/architecture is what makes a good engineer a great one. AI can help one do so, but it is not able to come up with such architectural decisions on its own (at least not yet).</p>

<p>But in the areas where AI excels, we should try to leverage it. I honestly could not care less if you wrote the code we just deployed to production by punching holes in a card, tapping keys on a keyboard letter by letter, copy-pasting snippets from StackOverflow, or using a high-rate probabilistic word predictor that can produce hundreds of words per second. Tools are not on trial here. But, regardless of how that code came to life, I do care that you don’t fall prey to the laziness of not curating the code you produce, of not validating and understanding it. (There are many ways to do this, and they are changing, but isn’t this what engineering is about?)</p>

<p>Striving for quality and good architectural decisions is still central. Simplicity is still the only way software remains workable in the long run. For humans, yes, but for AI agents too.</p>

<p>Simple was hard in the pre-AI era and is still hard today (perhaps even harder), but what’s simpler for humans is also simpler for agents. And it is still worth all the effort.</p>

<p>So we say: There is no such thing as AI code. We just have <em>code</em>: good, bad, simple, or complex. And we have tools and processes to deal with it. Like we always did.</p>

<p>So, long live power tools. Long live software engineering.</p>

<hr />

<p><em>At SINAPTIA, <a href="/posts/building-intelligent-applications-with-rails">we specialize in helping businesses implement AI solutions</a> that deliver real value. If you’re facing challenges with prompt engineering or AI integration, we’d love to help.</em></p>]]></content><author><name>Fernando Martinez</name></author><category term="AI" /><summary type="html"><![CDATA[AI is here to stay. Programming will never be the same, or... will it?]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Storing multi-valued enum fields in ActiveRecord</title><link href="https://sinaptia.dev/posts/storing-multi-valued-enum-fields-in-activerecord" rel="alternate" type="text/html" title="Storing multi-valued enum fields in ActiveRecord" /><published>2026-03-03T00:00:00+00:00</published><updated>2026-03-03T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/storing-multi-valued-enum-fields-in-activerecord</id><content type="html" xml:base="https://sinaptia.dev/posts/storing-multi-valued-enum-fields-in-activerecord"><![CDATA[<p>A few weeks ago, we ran into an interesting problem in one of our projects. We had a <code class="language-plaintext highlighter-rouge">reports</code> table with a <code class="language-plaintext highlighter-rouge">reason</code> column that used the classic Rails enum approach:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/XXXXXXXXXXXXXX_create_reports.rb</span>
<span class="k">class</span> <span class="nc">CreateReports</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.1</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">create_table</span> <span class="ss">:reports</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:title</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">integer</span> <span class="ss">:reason</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/models/report.rb</span>
<span class="k">class</span> <span class="nc">Report</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="n">enum</span> <span class="ss">:reason</span><span class="p">,</span> <span class="p">{</span>
    <span class="ss">spam: </span><span class="mi">0</span><span class="p">,</span>
    <span class="ss">harassment: </span><span class="mi">1</span><span class="p">,</span>
    <span class="ss">inappropriate_content: </span><span class="mi">2</span><span class="p">,</span>
    <span class="ss">copyright: </span><span class="mi">3</span><span class="p">,</span>
    <span class="ss">misinformation: </span><span class="mi">4</span>
  <span class="p">},</span> <span class="ss">prefix: </span><span class="kp">true</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Everything worked fine until the client requested the ability to select multiple reasons for a report. A user could report content for being spam and containing misinformation. The set of possible reasons is fixed and defined by developers.</p>

<p>We tried four approaches.</p>

<h2 id="four-ways-to-solve-it">Four ways to solve it</h2>

<ul>
  <li><strong>Bitwise Operations</strong>: store multiple values in a single integer using bit-level flags.</li>
  <li><strong>PostgreSQL Array</strong>: use native PostgreSQL array columns with enum-like syntax.</li>
  <li><strong>JSONB</strong>: store reasons as a JSON array inside a JSONB column.</li>
  <li><strong>HABTM</strong>: the classic many-to-many approach with a join table.</li>
</ul>

<h3 id="1-bitwise-operations">1. Bitwise Operations</h3>

<p>This approach uses bit-level operations to store multiple values in a single integer. Each reason occupies a specific bit. The <a href="https://github.com/kenn/active_flag">active_flag</a> gem provides a clean DSL for this:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb</span>
<span class="k">class</span> <span class="nc">AddReasonsToReports</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.1</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">add_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">:bigint</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>

    <span class="c1"># migrate data to new format</span>

    <span class="n">remove_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reason</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/models/report.rb</span>
<span class="k">class</span> <span class="nc">Report</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="n">flag</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="p">[</span><span class="ss">:spam</span><span class="p">,</span> <span class="ss">:harassment</span><span class="p">,</span> <span class="ss">:inappropriate_content</span><span class="p">,</span> <span class="ss">:copyright</span><span class="p">,</span> <span class="ss">:misinformation</span><span class="p">]</span>
<span class="k">end</span>

<span class="c1"># Usage</span>
<span class="n">report</span> <span class="o">=</span> <span class="no">Report</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">reasons: </span><span class="p">[</span><span class="ss">:spam</span><span class="p">,</span> <span class="ss">:misinformation</span><span class="p">])</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span> <span class="c1"># =&gt; [:spam, :misinformation]</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span><span class="p">.</span><span class="nf">spam?</span>   <span class="c1"># =&gt; true</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span> <span class="o">=</span> <span class="p">[</span><span class="ss">:spam</span><span class="p">]</span>
<span class="n">report</span><span class="p">.</span><span class="nf">save!</span>

<span class="c1"># Read: check if spam is included</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span><span class="p">.</span><span class="nf">spam?</span> <span class="c1"># =&gt; true</span>

<span class="c1"># Validation: invalid values raise ArgumentError</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span> <span class="o">=</span> <span class="p">[</span><span class="ss">:invalid</span><span class="p">]</span> <span class="c1"># =&gt; raises ArgumentError</span>

<span class="c1"># Query: find all reports with spam AND misinformation</span>
<span class="no">Report</span><span class="p">.</span><span class="nf">where_reasons</span><span class="p">(</span><span class="ss">:spam</span><span class="p">,</span> <span class="ss">:misinformation</span><span class="p">)</span>
</code></pre></div></div>

<p>The benefits here are clear: extremely compact storage, very fast boolean checks, and no GIN index required, just a standard integer index. But the trade-offs become apparent quickly. Database values are inexpressive (what does value 4 mean?). You’re limited to 64 values with <code class="language-plaintext highlighter-rouge">bigint</code>. And query operations are less intuitive than standard ActiveRecord.</p>
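<p>To make “inexpressive database values” concrete, here is a minimal plain-Ruby sketch of how bit flags map reasons to a single integer. The <code>pack</code>/<code>unpack</code> helpers are our own illustration, not the gem’s internals:</p>

```ruby
# Illustrative bit-flag packing: each reason owns one bit position.
REASONS = [:spam, :harassment, :inappropriate_content, :copyright, :misinformation]

# Combine reasons into a single integer by setting their bits
def pack(reasons)
  reasons.sum { |r| 1 << REASONS.index(r) }
end

# Recover the reason list by testing each bit
def unpack(int)
  REASONS.select.with_index { |_, i| int[i] == 1 }
end

pack([:inappropriate_content]) # => 4 (bit 2 set: the "mystery value 4")
pack([:spam, :misinformation]) # => 17 (bits 0 and 4)
unpack(17)                     # => [:spam, :misinformation]

# "Has ALL of spam and misinformation" is a single mask check,
# which is why boolean queries on this representation are fast:
mask = pack([:spam, :misinformation])
(17 & mask) == mask # => true
```

<p>Reading a raw <code>17</code> in a database console tells you nothing unless you have the bit-position table at hand, which is exactly the inexpressiveness trade-off.</p>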

<p>Note: Most Ruby gems (including active_flag) are based on integers with a 64-bit limit. While PostgreSQL supports the bit string data type, which can store many more flags without this limitation, the Ruby ecosystem doesn’t have widely adopted gems for this approach.</p>

<h3 id="2-postgresql-array-field-multivalued-column">2. PostgreSQL Array Field (Multivalued Column)</h3>

<p>PostgreSQL has native array support. We can store reason strings directly in a string array column without any gem:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb</span>
<span class="k">class</span> <span class="nc">AddReasonsToReports</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.1</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">add_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">:string</span><span class="p">,</span> <span class="ss">array: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">default: </span><span class="p">[]</span>
    <span class="n">add_index</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">using: </span><span class="s1">'gin'</span>

    <span class="c1"># migrate data to new format</span>

    <span class="n">remove_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reason</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/models/report.rb</span>
<span class="k">class</span> <span class="nc">Report</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="no">VALID_REASONS</span> <span class="o">=</span> <span class="sx">%w[spam harassment inappropriate_content copyright misinformation]</span><span class="p">.</span><span class="nf">freeze</span>

  <span class="n">validate</span> <span class="ss">:reasons_must_be_valid</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">reasons_must_be_valid</span>
    <span class="k">return</span> <span class="k">if</span> <span class="n">reasons</span><span class="p">.</span><span class="nf">blank?</span> <span class="o">||</span> <span class="n">reasons</span><span class="p">.</span><span class="nf">all?</span> <span class="p">{</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span> <span class="no">VALID_REASONS</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">}</span>

    <span class="n">errors</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="ss">:reasons</span><span class="p">,</span> <span class="s2">"contain invalid values"</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># Usage</span>
<span class="n">report</span> <span class="o">=</span> <span class="no">Report</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">title: </span><span class="s2">"Report 1"</span><span class="p">,</span> <span class="ss">reasons: </span><span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"misinformation"</span><span class="p">])</span>

<span class="c1"># Read: check if spam is included</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s2">"spam"</span><span class="p">)</span> <span class="c1"># =&gt; true</span>

<span class="c1"># Query: find all reports with spam AND misinformation</span>
<span class="no">Report</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"reasons @&gt; ARRAY[?]::varchar[]"</span><span class="p">,</span> <span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"misinformation"</span><span class="p">])</span>
</code></pre></div></div>

<p>The upside: no gem dependency, human-readable values in the database, and easy to expand. The downside: it’s PostgreSQL-specific, validations require manual implementation, and it’s less familiar to developers who don’t use PostgreSQL regularly.</p>
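<p>A note on semantics: the <code>@&gt;</code> (contains) query above matches reports that have <em>all</em> of the listed reasons, while PostgreSQL’s <code>&amp;&amp;</code> (overlap) operator matches reports that have <em>any</em> of them. In plain Ruby terms, the two predicates behave like this (an illustrative sketch; the real filtering happens in SQL):</p>

```ruby
# Plain-Ruby analogues of the two PostgreSQL array predicates
contains = ->(row, wanted) { (wanted - row).empty? } # @> : row has ALL wanted values
overlap  = ->(row, wanted) { (row & wanted).any? }   # && : row has AT LEAST ONE

contains.call(["spam", "misinformation"], ["spam"])      # => true
contains.call(["spam"], ["spam", "misinformation"])      # => false
overlap.call(["spam"], ["spam", "misinformation"])       # => true
overlap.call(["harassment"], ["spam", "misinformation"]) # => false
```

<p>If you need the “any of these reasons” variant, the overlap query would mirror the contains one: <code>Report.where("reasons &amp;&amp; ARRAY[?]::varchar[]", ["spam", "misinformation"])</code>.</p>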

<h3 id="3-jsonb-multivalued-column">3. JSONB (Multivalued Column)</h3>

<p>We store reasons as a JSON array inside a JSONB column.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb</span>
<span class="k">class</span> <span class="nc">AddReasonsToReports</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.1</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">add_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">:jsonb</span><span class="p">,</span> <span class="ss">default: </span><span class="p">[],</span> <span class="ss">null: </span><span class="kp">false</span>
    <span class="n">add_index</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">using: </span><span class="s1">'gin'</span>

    <span class="c1"># migrate data to new format</span>

    <span class="n">remove_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reason</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/models/report.rb</span>
<span class="k">class</span> <span class="nc">Report</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="no">VALID_REASONS</span> <span class="o">=</span> <span class="sx">%w[spam harassment inappropriate_content copyright misinformation]</span><span class="p">.</span><span class="nf">freeze</span>

  <span class="n">validate</span> <span class="ss">:reasons_must_be_valid</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">reasons_must_be_valid</span>
    <span class="k">return</span> <span class="k">if</span> <span class="n">reasons</span><span class="p">.</span><span class="nf">blank?</span> <span class="o">||</span> <span class="n">reasons</span><span class="p">.</span><span class="nf">all?</span> <span class="p">{</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span> <span class="no">VALID_REASONS</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">}</span>

    <span class="n">errors</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="ss">:reasons</span><span class="p">,</span> <span class="s2">"contain invalid values"</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># Usage</span>
<span class="n">report</span> <span class="o">=</span> <span class="no">Report</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">title: </span><span class="s2">"Report 1"</span><span class="p">,</span> <span class="ss">reasons: </span><span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"misinformation"</span><span class="p">])</span>

<span class="c1"># Read: check if spam is included</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span><span class="p">.</span><span class="nf">include?</span><span class="p">(</span><span class="s2">"spam"</span><span class="p">)</span> <span class="c1"># =&gt; true</span>

<span class="c1"># Query: find all reports with spam AND misinformation</span>
<span class="no">Report</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"reasons @&gt; ?"</span><span class="p">,</span> <span class="s1">'["spam", "misinformation"]'</span><span class="p">)</span>
</code></pre></div></div>

<p>The benefit is GIN indexing for fast searches. The cost: the flexible structure requires stronger validations to enforce the expected format, it’s slightly more verbose than alternatives, and there’s JSON parsing overhead in some cases.</p>

<h3 id="4-habtm-table">4. HABTM Table</h3>

<p>The classic many-to-many approach with a join table.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/XXXXXXXXXXXXXX_create_reasons_and_join_table.rb</span>
<span class="k">class</span> <span class="nc">CreateReasonsAndJoinTable</span> <span class="o">&lt;</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">7.1</span><span class="p">]</span>
  <span class="k">def</span> <span class="nf">change</span>
    <span class="n">create_table</span> <span class="ss">:reasons</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">timestamps</span>
    <span class="k">end</span>
    <span class="n">add_index</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">unique: </span><span class="kp">true</span>

    <span class="n">create_join_table</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reasons</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">index</span> <span class="p">[</span><span class="ss">:report_id</span><span class="p">,</span> <span class="ss">:reason_id</span><span class="p">]</span>
      <span class="n">t</span><span class="p">.</span><span class="nf">index</span> <span class="p">[</span><span class="ss">:reason_id</span><span class="p">,</span> <span class="ss">:report_id</span><span class="p">]</span>
    <span class="k">end</span>

    <span class="c1"># migrate data to new format</span>

    <span class="n">remove_column</span> <span class="ss">:reports</span><span class="p">,</span> <span class="ss">:reason</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/models/report.rb</span>
<span class="k">class</span> <span class="nc">Report</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="n">has_and_belongs_to_many</span> <span class="ss">:reasons</span>
<span class="k">end</span>

<span class="c1"># app/models/reason.rb</span>
<span class="k">class</span> <span class="nc">Reason</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>
  <span class="n">has_and_belongs_to_many</span> <span class="ss">:reports</span>

  <span class="n">validates</span> <span class="ss">:name</span><span class="p">,</span> <span class="ss">inclusion: </span><span class="p">{</span> <span class="ss">in: </span><span class="sx">%w[spam harassment inappropriate_content copyright misinformation]</span> <span class="p">}</span>
<span class="k">end</span>

<span class="c1"># Usage</span>
<span class="n">report</span> <span class="o">=</span> <span class="no">Report</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">title: </span><span class="s2">"Report 1"</span><span class="p">,</span> <span class="ss">reasons: </span><span class="no">Reason</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">name: </span><span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"misinformation"</span><span class="p">]))</span>

<span class="c1"># Read: check if spam is included</span>
<span class="n">report</span><span class="p">.</span><span class="nf">reasons</span><span class="p">.</span><span class="nf">exists?</span><span class="p">(</span><span class="ss">name: </span><span class="s2">"spam"</span><span class="p">)</span> <span class="c1"># =&gt; true</span>

<span class="c1"># Query: find all reports with spam AND misinformation</span>
<span class="no">Report</span><span class="p">.</span><span class="nf">joins</span><span class="p">(</span><span class="ss">:reasons</span><span class="p">)</span>
      <span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">reasons: </span><span class="p">{</span> <span class="ss">name: </span><span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"misinformation"</span><span class="p">]</span> <span class="p">})</span>
      <span class="p">.</span><span class="nf">group</span><span class="p">(</span><span class="ss">:id</span><span class="p">)</span>
      <span class="p">.</span><span class="nf">having</span><span class="p">(</span><span class="s2">"COUNT(DISTINCT reasons.id) = 2"</span><span class="p">)</span>
</code></pre></div></div>

<p>Benefits include total flexibility to add additional metadata, extreme familiarity for Rails developers, and unlimited scalability. The drawback: slower writes (create/update) and the need for more queries or joins to read data.</p>
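<p>The <code>GROUP BY</code>/<code>HAVING</code> pair in the query above is what enforces “all of these reasons”: after joining on the wanted reasons, only reports that reach the distinct count survive. Here is the same logic replayed over an in-memory join table in plain Ruby (hypothetical data, illustrative only):</p>

```ruby
# Rows as they'd look after the INNER JOIN: [report_id, reason_name]
joined_rows = [
  [1, "spam"], [1, "misinformation"], # report 1 has both wanted reasons
  [2, "spam"],                        # report 2 has only one
  [3, "copyright"]                    # report 3 has neither
]
wanted = ["spam", "misinformation"]

matching_ids = joined_rows
  .select { |_, name| wanted.include?(name) } # WHERE reasons.name IN (...)
  .group_by { |id, _| id }                    # GROUP BY reports.id
  .select { |_, rows| rows.map(&:last).uniq.size == wanted.size } # HAVING COUNT(DISTINCT ...) = 2
  .keys

matching_ids # => [1]
```

<p>Report 2 survives the <code>WHERE</code> but falls short of the distinct count, which is why the <code>HAVING</code> clause is essential for AND semantics.</p>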

<h2 id="comparison">Comparison</h2>

<p>Here’s a summary of how each approach compares across the attributes that matter most:</p>

<table>
  <thead>
    <tr>
      <th>Attribute</th>
      <th>Bitwise</th>
      <th>PostgreSQL Array</th>
      <th>JSONB</th>
      <th>HABTM</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><strong>Write Performance</strong></td>
      <td>Excellent</td>
      <td>Very Good</td>
      <td>Very Good</td>
      <td>Fair</td>
    </tr>
    <tr>
      <td><strong>Read Performance</strong></td>
      <td>Very Good</td>
      <td>Good</td>
      <td>Good</td>
      <td>Good</td>
    </tr>
    <tr>
      <td><strong>Query Simplicity</strong></td>
      <td>Excellent</td>
      <td>Excellent</td>
      <td>Excellent</td>
      <td>Fair</td>
    </tr>
    <tr>
      <td><strong>Default Values</strong></td>
      <td>Very Good</td>
      <td>Excellent</td>
      <td>Excellent</td>
      <td>Poor</td>
    </tr>
    <tr>
      <td><strong>Database-Level Validation</strong></td>
      <td>Very Good</td>
      <td>Good</td>
      <td>Good</td>
      <td>Excellent</td>
    </tr>
    <tr>
      <td><strong>Extensibility</strong></td>
      <td>Poor</td>
      <td>Good</td>
      <td>Very Good</td>
      <td>Excellent</td>
    </tr>
    <tr>
      <td><strong>Familiarity</strong></td>
      <td>Poor</td>
      <td>Good</td>
      <td>Good</td>
      <td>Excellent</td>
    </tr>
    <tr>
      <td><strong>Ecosystem Support</strong></td>
      <td>Poor</td>
      <td>Fair</td>
      <td>Fair</td>
      <td>Excellent</td>
    </tr>
    <tr>
      <td><strong>DB Compatibility</strong></td>
      <td>Excellent</td>
      <td>Poor</td>
      <td>Excellent</td>
      <td>Excellent</td>
    </tr>
    <tr>
      <td><strong>Scalability</strong></td>
      <td>64 values</td>
      <td>Limited</td>
      <td>Limited</td>
      <td>Unlimited</td>
    </tr>
    <tr>
      <td><strong>Property vs Entity</strong></td>
      <td>property</td>
      <td>property</td>
      <td>property</td>
      <td>entity</td>
    </tr>
  </tbody>
</table>

<h3 id="property-vs-entity">Property vs Entity</h3>

<p>In data modeling terms, reasons are a property of a report, not an entity with its own identity and lifecycle. Nobody queries “show me all attributes of the spam reason”. That makes the multivalued column approaches (array, JSONB, bitwise) more true to the conceptual model, even though they break First Normal Form. HABTM is the most purely relational approach, but it treats a property as if it were an entity.</p>

<h3 id="performance">Performance</h3>

<p>We ran benchmarks with 1000 operations for create, find, and update scenarios. You can find the full benchmark code and results in the <a href="https://github.com/sinaptia/multivalued_attributes">multivalued_attributes</a> repository.</p>

<p>The HABTM approach is ~4x slower on creates and ~3-5x slower on updates compared to the other methods. However, it’s worth noting that most performance issues related to HABTM and JOINs aren’t actually caused by the JOIN itself. Databases like PostgreSQL are highly optimized for these operations. The real culprits are usually N+1 queries, missing indexes, or poorly designed queries. With proper indexing and eager loading, the read performance gap narrows significantly. The first three approaches (bitwise, array, and JSONB) have very similar performance. On reads, the difference is marginal (only 2-3 extra seconds over 1000 queries).</p>

<h3 id="query-simplicity">Query Simplicity</h3>

<p>One often overlooked aspect is query complexity at the call site. If your codebase filters by these values extensively (in scopes, serializers, admin interfaces), column-based approaches keep queries simple and confined to a single table. With array_enum or bitwise, you can do:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Report</span><span class="p">.</span><span class="nf">with_reason</span><span class="p">(</span><span class="ss">:spam</span><span class="p">)</span>
<span class="no">Report</span><span class="p">.</span><span class="nf">where</span><span class="p">.</span><span class="nf">not</span><span class="p">(</span><span class="ss">reasons: </span><span class="p">[])</span>
</code></pre></div></div>

<p>With a HABTM, every call site would need a <code class="language-plaintext highlighter-rouge">.joins(:reasons)</code>, adding complexity and increasing the risk of N+1 queries if eager loading is forgotten. While you can mitigate this with default scopes or associations, it adds friction that column-based approaches simply don’t have.</p>

<h3 id="default-values">Default Values</h3>

<p>With array or JSONB columns, you can set default values directly in the migration:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">t</span><span class="p">.</span><span class="nf">integer</span> <span class="ss">:reasons</span><span class="p">,</span> <span class="ss">array: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">default: </span><span class="p">[</span><span class="s2">"spam"</span><span class="p">,</span> <span class="s2">"harassment"</span><span class="p">]</span>
</code></pre></div></div>

<p>New records automatically get these values. No callbacks required. With a HABTM, you’d need a callback to seed join table rows, adding an extra step that can be forgotten or misconfigured.</p>

<h3 id="database-level-validation">Database-Level Validation</h3>

<p>With a HABTM, foreign keys enforce valid references automatically:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">reasons_reports</span> <span class="p">(</span><span class="n">report_id</span><span class="p">,</span> <span class="n">reason_id</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span><span class="s1">'some-uuid'</span><span class="p">,</span> <span class="mi">999</span><span class="p">);</span>
<span class="c1">-- ERROR: Key (reason_id)=(999) is not present in table "reasons"</span>
</code></pre></div></div>

<p>You also get cascading behavior. <code class="language-plaintext highlighter-rouge">ON DELETE CASCADE</code> automatically cleans up join table rows when a reason is removed.</p>

<p>With array or JSONB, you can use PostgreSQL CHECK constraints to enforce valid values at the database level:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">reports</span>
<span class="k">ADD</span> <span class="k">CONSTRAINT</span> <span class="n">valid_reasons</span>
<span class="k">CHECK</span> <span class="p">(</span><span class="n">reasons</span> <span class="o">&lt;@</span> <span class="n">ARRAY</span><span class="p">[</span><span class="s1">'spam'</span><span class="p">,</span> <span class="s1">'harassment'</span><span class="p">,</span> <span class="s1">'inappropriate_content'</span><span class="p">,</span> <span class="s1">'copyright'</span><span class="p">,</span> <span class="s1">'misinformation'</span><span class="p">]);</span>

<span class="k">UPDATE</span> <span class="n">reports</span>
<span class="k">SET</span> <span class="n">reasons</span> <span class="o">=</span> <span class="n">ARRAY</span><span class="p">[</span><span class="s1">'spam'</span><span class="p">,</span> <span class="s1">'harassment'</span><span class="p">,</span> <span class="s1">'oops_typo'</span><span class="p">];</span>
<span class="c1">-- ERROR: new row for relation "reports" violates check constraint "valid_reasons"</span>
</code></pre></div></div>

<p>Both approaches can enforce valid values at the database level. The difference is ergonomics: foreign keys handle this naturally, while CHECK constraints require explicit maintenance when adding new values.</p>
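<p>For intuition, here is what the <code class="language-plaintext highlighter-rouge">&lt;@</code> containment operator checks, expressed in plain Ruby. This is only an illustration; the CHECK constraint above is what actually enforces it at the database level:</p>

```ruby
# The allowed values from the CHECK constraint above.
VALID_REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze

# Ruby equivalent of PostgreSQL's `reasons <@ ARRAY[...]`:
# true only when every submitted reason is in the allowed list.
def valid_reasons?(reasons)
  (reasons - VALID_REASONS).empty?
end

valid_reasons?(%w[spam harassment])           # => true
valid_reasons?(%w[spam harassment oops_typo]) # => false
```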

<h3 id="extensibility">Extensibility</h3>

<ul>
  <li><strong>Bitwise</strong>: Limited to 32 values with an <code class="language-plaintext highlighter-rouge">integer</code> column, 64 with <code class="language-plaintext highlighter-rouge">bigint</code>. Beyond that, you need bit strings.</li>
  <li><strong>Array</strong>: Easy to expand, but storing many values (dozens+) becomes unwieldy. Row size grows, and <a href="https://www.postgresql.org/docs/current/gin.html">GIN</a> indexes start having performance issues.</li>
  <li><strong>JSONB</strong>: Flexible, but the same problem as arrays when storing many values.</li>
  <li><strong>HABTM</strong>: The most extensible. You can add additional columns (e.g., <code class="language-plaintext highlighter-rouge">reason_details</code>, <code class="language-plaintext highlighter-rouge">reported_at</code> in the join table). Also ideal for dynamic or user-defined values.</li>
</ul>
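<p>To make the bitwise limit concrete, here is a minimal sketch of the encoding it relies on (illustrative plain Ruby, not tied to any gem): each reason gets a fixed bit position, so the column can never hold more values than the integer has bits.</p>

```ruby
# Each reason occupies one fixed bit position.
REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze

# Fold the selected reasons into a single integer bitmask.
def encode_reasons(reasons)
  reasons.sum { |r| 1 << REASONS.index(r) }
end

# Recover the reason names whose bits are set in the mask.
def decode_reasons(mask)
  REASONS.select.with_index { |_, i| mask[i] == 1 }
end

mask = encode_reasons(%w[spam misinformation])
mask                 # => 17 (binary 10001)
decode_reasons(mask) # => ["spam", "misinformation"]
```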

<h3 id="familiarity">Familiarity</h3>

<ul>
  <li><strong>Bitwise</strong>: Strange for most. Requires explaining bit-level operations.</li>
  <li><strong>Array</strong>: Intuitive if you know Ruby/PostgreSQL.</li>
  <li><strong>JSONB</strong>: Familiar to those who use modern REST APIs.</li>
  <li><strong>HABTM</strong>: The most idiomatic. Any Rails developer immediately understands <code class="language-plaintext highlighter-rouge">has_and_belongs_to_many</code>.</li>
</ul>

<h3 id="ecosystem">Ecosystem</h3>

<ul>
  <li><strong>Bitwise/Array/JSONB</strong>: Require custom form inputs, serializers, and manual validations. Some support from gems.</li>
  <li><strong>HABTM</strong>: Works out-of-the-box with ActiveAdmin, RailsAdmin, nested forms, and bulk operations.</li>
</ul>

<h3 id="database-compatibility">Database Compatibility</h3>

<p>PostgreSQL arrays are PostgreSQL-specific. Bitwise and HABTM work across any database. JSON works across MySQL, SQLite, and PostgreSQL, though with different syntax and index support: PostgreSQL uses GIN indexes, MySQL indexes JSON through generated columns, and SQLite ships the JSON1 extension.</p>

<h2 id="conclusion">Conclusion</h2>

<p>If your value set is static and small, a <strong>PostgreSQL array</strong> is the better choice over bitwise: arrays offer nearly identical performance without introducing unfamiliar bit-level operations that confuse most Rails developers.</p>

<p>But don’t dismiss HABTM. For most applications, the write overhead is irrelevant compared to the cost of maintaining “clever” code that the next developer doesn’t understand.</p>

<p><strong>In our specific case</strong>, a reporting system where users can select multiple reasons, we chose <strong>HABTM</strong>. Not because it was the fastest (it wasn’t), but because it didn’t need justification. Any new developer on the team immediately understands it, and that clarity is worth more than the performance gains we’d rarely notice in practice.</p>

<p>The best technical choice is the one that doesn’t need a justification.</p>]]></content><author><name>Nazareno Moresco</name></author><category term="Ruby on Rails" /><category term="Performance" /><summary type="html"><![CDATA[Rails' enum DSL is great for single values, but what about multiple? We compared 4 approaches across performance, extensibility, and maintainability to find the best fit.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Evaluating LLM prompts in Rails</title><link href="https://sinaptia.dev/posts/evaluating-llm-prompts-in-rails" rel="alternate" type="text/html" title="Evaluating LLM prompts in Rails" /><published>2026-02-17T00:00:00+00:00</published><updated>2026-02-17T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/evaluating-llm-prompts-in-rails</id><content type="html" xml:base="https://sinaptia.dev/posts/evaluating-llm-prompts-in-rails"><![CDATA[<p>We’ve built several AI features in Rails by now: <a href="/posts/scaling-image-classification-with-ai">image classification</a>, <a href="/posts/upscaling-images-with-ai">image upscaling</a>, <a href="/posts/improving-a-similarity-search-with-ai">similarity search</a>, etc. And every time, the same question came up: which model and prompt should we actually use? The image classification project made this especially painful: a pricing change blew up our budget, smaller images proved to work better than larger ones, and every model switch required re-running the entire evaluation from scratch.</p>

<p>Every change to a prompt opens up a tree of choices. Which provider should we use? Which model? How detailed should the instructions be? Would more samples in the prompt work better? How much context per message? Should we use a reasoning model? Or augment the data available to the model with multi-modal input? There’s also the cost vs. accuracy tradeoff: is 10x the price worth a 5% improvement for this specific feature?</p>

<p>The combinatorial explosion gets overwhelming fast, and the result of the process has this feeling of uncertainty… is there a branch I missed that works better? Or that costs less?</p>

<h2 id="the-pragmatic-choice-spreadsheets">The pragmatic choice: spreadsheets</h2>

<p>We needed a methodology to track changes across iterations so the team could follow along. Naturally, we took a pragmatic stance: we started using a spreadsheet per feature, tracking results across prompt/provider/model configurations, all run against the same data. It worked quite well, and over several features a workflow started to emerge, but…</p>

<h2 id="spreadsheets-dont-scale">Spreadsheets don’t scale</h2>

<p>We knew the limits going in, but they became harder to ignore over time:</p>

<ul>
  <li><strong>They fragment.</strong> People make copies. When you’re sharing with non-technical collaborators, you end up with multiple sources of truth.</li>
  <li><strong>No enforced structure.</strong> Each feature ended up with its own format. You have to re-learn how to read each one, and not all of them track the same metrics the same way.</li>
  <li><strong>Hard to compare.</strong> Eyeballing results across configurations isn’t intuitive, and people get confused.</li>
  <li><strong>No regression baseline.</strong> Once you settle on a configuration, how do you catch regressions later?</li>
  <li><strong>Prompts drift.</strong> Someone edits the spreadsheet and forgets to update the code. Nobody notices until something breaks.</li>
  <li><strong>Disconnected from code.</strong> Prompts and evaluations should live where the application lives.</li>
</ul>

<p>In one project with many AI features, this all came apart. Links got lost, copies multiplied across different people’s drives with small divergences. Building eval datasets meant downloading images and re-uploading them to sheets. Running prompts required manual dev work because the data lived in Google Drive, but prompts had to go through the LLM provider. We built some internal tooling to help, but since every sheet and feature had a different format, nothing was reusable.</p>

<p>But they were useful for uncovering what we needed: a place that couples a prompt configuration with a curated dataset extracted from real data and helps you find the right balance between accuracy and cost for the feature at hand. Ideally, without leaving the Rails app.</p>

<p>So we built <a href="https://github.com/sinaptia/ruby_llm-evals">RubyLLM::Evals</a>, a Rails engine for testing, comparing, and improving LLM prompts directly inside your application.</p>

<h2 id="rubyllmevals">RubyLLM::Evals</h2>

<p>Since we’re using <a href="https://github.com/crmne/ruby_llm">RubyLLM</a>, it made sense to build on top of it.</p>

<p>The core abstractions are <strong>prompts</strong> and <strong>samples</strong>. A prompt captures a full configuration: provider, model, system instructions, message template (with Liquid variables), tools, and output schemas. If you already have tools or schemas in your app, you can reuse them. Samples are your test cases: each one defines an evaluation type (exact match, contains, regex, LLM judge, or human judge) and an expected output.</p>

<p>The interesting design choice was making the LLM-as-judge a first-class eval type. For features like summarization or classification with fuzzy boundaries, exact matching doesn’t cut it. You need another model to assess whether the response is <em>good enough</em>. It’s not perfect (the judge has its own biases and failure modes), but for iterative prompt development it’s a pragmatic tradeoff: fast feedback now, human review on the edge cases.</p>
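<p>Conceptually, an eval type is just a predicate over an expected and an actual output. A toy sketch, with a lambda standing in for the judge model call (the names here are illustrative, not the engine’s API):</p>

```ruby
# `judge` stands in for a second LLM call; a real judge would be
# prompted with the response plus a grading rubric.
judge = ->(expected, actual) { actual.downcase.include?(expected.downcase) }

# Dispatch on the sample's eval type.
def evaluate(sample, response, judge:)
  case sample[:eval_type]
  when :exact     then sample[:expected] == response
  when :llm_judge then judge.call(sample[:expected], response)
  end
end

sample = { eval_type: :llm_judge, expected: "deck" }
evaluate(sample, "This looks like a wooden deck.", judge: judge) # => true
```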

<p>Each run saves a snapshot of the prompt settings and records accuracy, cost, and duration. A comparison tool lays all runs of a prompt side by side, so you can spot what changed and why.</p>
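<p>Under the hood, a comparison like that is little more than an aggregation over stored runs. A toy version over in-memory hashes (field names invented for illustration, not the engine’s schema):</p>

```ruby
# Two hypothetical runs of the same prompt against the same samples.
runs = [
  { model: "model-a", passed: 27, total: 30, cost: 0.04 },
  { model: "model-b", passed: 28, total: 30, cost: 0.02 }
]

# Compute accuracy per run so configurations can be ranked side by side.
summary = runs.map do |run|
  run.merge(accuracy: (run[:passed] * 100.0 / run[:total]).round(1))
end

best = summary.max_by { |row| row[:accuracy] }
best[:model]    # => "model-b"
best[:accuracy] # => 93.3
```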

<h3 id="real-application-data">Real application data</h3>

<p>One thing we really wanted was the ability to populate samples from the application’s data. For example, in our image categorization feature, we can:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prompt</span> <span class="o">=</span> <span class="no">RubyLLM</span><span class="o">::</span><span class="no">Evals</span><span class="o">::</span><span class="no">Prompt</span><span class="p">.</span><span class="nf">find_by</span><span class="p">(</span><span class="ss">slug: </span><span class="s2">"image-categorization"</span><span class="p">)</span>

<span class="no">Image</span><span class="p">.</span><span class="nf">uncategorized</span><span class="p">.</span><span class="nf">limit</span><span class="p">(</span><span class="mi">50</span><span class="p">).</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">image</span><span class="o">|</span>
  <span class="n">sample</span> <span class="o">=</span> <span class="n">prompt</span><span class="p">.</span><span class="nf">samples</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">eval_type: :human</span><span class="p">)</span>
  <span class="n">sample</span><span class="p">.</span><span class="nf">files</span><span class="p">.</span><span class="nf">attach</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="nf">attachment</span><span class="p">.</span><span class="nf">blob</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Now you’re iterating on your prompt with actual production data, not synthetic examples.</p>

<p>The temptation is to throw hundreds of samples at a prompt and see what sticks. In practice, a smaller curated set that covers your edge cases tells you more than a large random one. We typically start with 20-30 samples: a mix of straightforward cases, known hard cases from production, and a few adversarial examples. If accuracy looks promising, we expand. If not, the small set is faster to iterate on.</p>

<h3 id="in-production">In production</h3>

<p>Once you’re happy with a prompt, you can use it directly in your application:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">response</span> <span class="o">=</span> <span class="no">RubyLLM</span><span class="o">::</span><span class="no">Evals</span><span class="o">::</span><span class="no">Prompt</span><span class="p">.</span><span class="nf">execute</span><span class="p">(</span>
  <span class="s2">"image-categorization"</span><span class="p">,</span>
  <span class="ss">files: </span><span class="p">[</span><span class="n">image</span><span class="p">.</span><span class="nf">attachment</span><span class="p">.</span><span class="nf">blob</span><span class="p">]</span>
<span class="p">)</span>
<span class="n">response</span><span class="p">.</span><span class="nf">content</span>  <span class="c1"># =&gt; "deck"</span>
</code></pre></div></div>

<p>The configuration lives in the database, versioned through your evaluation runs, always in sync with what you tested. Rolling back to a previous version or A/B testing a new iteration becomes straightforward.</p>

<h2 id="where-this-leaves-us">Where this leaves us</h2>

<p>Production data has a way of surprising you: new usage patterns, edge cases you never curated a sample for, a provider silently updating a model or its pricing… your prompt’s accuracy can degrade, or your cost can skyrocket, without a single line of code changing. There’s no single solution to this, but monitoring a prompt’s performance in production is key. Each feature will require something different and use different metrics, but you need feedback: when your metrics surface drift, lower-quality results, more errors, or higher costs, you can pull new samples into RubyLLM::Evals and adjust the prompt to the new reality.</p>

<p>The pattern we keep seeing across projects is that prompts are never done. Models get updated, data distributions shift, and what worked last month might silently degrade and fail over time. Continuous testing and monitoring are critical.</p>

<p><a href="https://github.com/sinaptia/ruby_llm-evals">RubyLLM::Evals</a> and <a href="https://github.com/sinaptia/ruby_llm-monitoring">RubyLLM::Monitoring</a> are how we go from concept to production. Both are open source and built for Rails.</p>

<hr />

<p><em>At SINAPTIA, <a href="/posts/building-intelligent-applications-with-rails">we specialize in helping businesses implement AI solutions</a> that deliver real value. If you’re facing challenges with prompt engineering or AI integration, we’d love to help.</em></p>]]></content><author><name>Patricio Mac Adden</name></author><category term="Ruby on Rails" /><category term="AI" /><summary type="html"><![CDATA[Finding the right model and prompt for your AI feature is harder than it looks. Spreadsheets help, until they don't. So we did something about it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">AI agents in Ruby: Why is it so easy?</title><link href="https://sinaptia.dev/posts/ai-agents-in-ruby-why-is-it-so-easy" rel="alternate" type="text/html" title="AI agents in Ruby: Why is it so easy?" /><published>2026-02-09T00:00:00+00:00</published><updated>2026-02-09T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/ai-agents-in-ruby-why-is-it-so-easy</id><content type="html" xml:base="https://sinaptia.dev/posts/ai-agents-in-ruby-why-is-it-so-easy"><![CDATA[<p>Scott Werner (founder of Sublayer and organizer of <a href="https://www.artificialruby.ai/">Artificial Ruby</a>) told me something that stuck with me:</p>

<blockquote>
  <p><em>“The first version of the sublayer gem was actually a coding agent, but it was coming together so quickly… I was like, wait… if this is so easy for me, it’s going to be easy for everybody, and everybody is going to be making these…”</em></p>
</blockquote>

<p>Last week, we open-sourced a minimal but feature-packed coding agent.
We were after the simplest, most straightforward, stupidly effective agent possible, so we named it <a href="https://github.com/sinaptia/detritus">Detritus</a>, after Lance Constable Detritus of the Ankh-Morpork City Watch from <a href="https://en.wikipedia.org/wiki/Discworld">Discworld</a>
(thanks for so much and so many, Sir Terry).</p>

<p>Detritus is built in just <strong>250 lines of code</strong>, yet it packs a CLI with history, custom slash commands and skills (sort of), save/resume chats, subagents, and a two-level configuration (project and global). A full-featured coding agent.</p>

<p>While building this basic agent, we confirmed, firsthand, what Scott had said. And I kept wondering:</p>

<p><strong><em>Why?</em></strong>  <strong>What makes it <em>so</em> easy?</strong></p>

<p>Is it the LLMs? Is it Ruby? Is it that it’s fun, so you don’t really feel the pain? Or is it something else?</p>

<p>After giving it some thought and talking about this with teammates, we converged on two key factors:</p>

<h2 id="the-first-key-general-availability-of-llms">The first key: general availability of LLMs</h2>

<p>General availability of LLMs changed the nature of the problem of building something like Detritus. Before, building a coding AI was <em>unthinkable</em>, but current LLMs made impossible things almost trivial:</p>

<p>Code some utility functions for the LLM to call (one for editing files, one for bash commands), hook up an LLM via API, put it all in a loop, and that’s it. You have a coding agent.</p>
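<p>Stripped of every real dependency, that loop fits in a few lines. Below, <code class="language-plaintext highlighter-rouge">llm</code> is a stub lambda standing in for the provider API, and the tool set is trimmed to one function; this is a sketch of the shape, not Detritus’ actual code:</p>

```ruby
# One toy tool; a real agent's set includes file editing and bash.
TOOLS = { "upcase" => ->(arg) { arg.upcase } }

# The agentic loop: send the transcript, run any requested tool,
# append the result, repeat until the model returns plain content.
def agent_loop(llm, prompt)
  messages = [{ role: "user", content: prompt }]
  loop do
    reply = llm.call(messages)
    return reply[:content] unless reply[:tool]

    result = TOOLS.fetch(reply[:tool]).call(reply[:args])
    messages << { role: "tool", content: result }
  end
end
```

<p>A scripted stub is enough to watch it run: have <code class="language-plaintext highlighter-rouge">llm</code> request one tool call, then answer.</p>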

<p>What used to be a research problem is now an integration problem. The problem migrated from the lab to the workshop.</p>

<h2 id="the-second-key-rubys-power">The second key: Ruby’s power</h2>

<p>Ruby is well known for its historical focus on developer happiness: “A programmer’s best friend”. I think this is a fundamental characteristic of the language, but sometimes I feel it’s a little superficial, and it doesn’t tell you <em>why</em>.</p>

<p>I think Ruby brings something else that is a much more fundamental property that emerges out of its design and philosophy: <strong>Power</strong>.</p>

<p>Originally, the idea of “powerful programming languages” came to me via Amir Rajan, creator of DragonRuby, when he shared this article from Paul Graham, <a href="https://paulgraham.com/avg.html">“Beating the averages”</a>. We talked about how and why Lisp was the most powerful language, with Ruby being a close second. Graham’s key insight — what he calls the “Blub paradox” — is that power in programming languages sits on a continuum, and you can only recognize a more powerful language from above, never from below.</p>

<p>Any general-purpose programming language is nowadays more or less equivalent, equally capable. You can build Detritus with the exact same features in Python, Go, JavaScript, or even in C. And yet, the experience of building this in Ruby feels fluid and frictionless.
It’s like cutting a wooden block with a hand saw versus a circular saw. Both will cut the wood just fine (they are equally capable), and you can probably enjoy both (personal taste is not the point here), but one will make you feel more <em>powerfully invested</em> than the other.</p>

<p>I think power in programming languages is not just capability, but <strong>the relation between using the capability and the effort the developer has to invest in wielding it</strong>.</p>

<p>In this sense, Ruby has the ability to maximize the capability/effort ratio. The amount of power condensed in a few lines of code feels extraordinary.</p>

<p>If you take a look at Detritus’ source code, this is how you set up the agent:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">create_chat</span><span class="p">(</span><span class="ss">instructions: </span><span class="n">state</span><span class="p">.</span><span class="nf">instructions</span><span class="p">,</span> <span class="ss">tools: </span><span class="p">[</span><span class="no">EditFile</span><span class="p">,</span> <span class="no">Bash</span><span class="p">,</span> <span class="no">WebSearch</span><span class="p">,</span> <span class="no">SubAgent</span><span class="p">],</span> <span class="ss">persist: </span><span class="kp">true</span><span class="p">)</span>
  <span class="n">chat</span> <span class="o">=</span> <span class="no">RubyLLM</span><span class="o">::</span><span class="no">Chat</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">model: </span><span class="n">state</span><span class="p">.</span><span class="nf">model</span><span class="p">,</span> <span class="ss">provider: </span><span class="vg">$state</span><span class="p">.</span><span class="nf">provider</span><span class="p">)</span>
  <span class="n">chat</span><span class="p">.</span><span class="nf">with_instructions</span><span class="p">(</span><span class="n">instructions</span><span class="p">)</span> <span class="k">if</span> <span class="n">instructions</span>
  <span class="n">chat</span><span class="p">.</span><span class="nf">on_end_message</span> <span class="p">{</span> <span class="o">|</span><span class="n">msg</span><span class="o">|</span> <span class="n">save_chat</span> <span class="p">}</span> <span class="k">if</span> <span class="n">persist</span>
  <span class="n">chat</span><span class="p">.</span><span class="nf">with_tools</span><span class="p">(</span><span class="o">*</span><span class="n">tools</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Five lines of RubyLLM set the model, system prompt, and tools. That’s all you need to set the agentic loop ready to go.</p>

<p>And the rest of the code is the same: chat persistence is <code class="language-plaintext highlighter-rouge">Marshal.dump</code>. The CLI router is a case statement. The subagent is a tool that calls <code class="language-plaintext highlighter-rouge">create_chat</code>. None of this code is clever or magical; it’s just plain Ruby. That’s exactly the point. When the language is powerful enough, building an AI agent doesn’t require anything special, just the mundane. And Ruby makes the mundane exquisitely short.</p>
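<p>That persistence claim is nearly literal: Marshal round-trips a whole object graph in two calls. A sketch with a plain hash standing in for the chat (the real chat object carries more state):</p>

```ruby
# Chat state as a plain structure; in a save/resume workflow the
# serialized blob would be written to a file and read back later.
chat_state = {
  model: "some-model",
  messages: [{ role: "user", content: "hi" }]
}

blob = Marshal.dump(chat_state) # save
restored = Marshal.load(blob)   # resume
restored == chat_state          # => true
```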

<p>Detritus’ history started when Thorsten Ball published <a href="https://ampcode.com/notes/how-to-build-an-agent">The Emperor Has No Clothes</a>, a guide to building a super basic coding agent in Go. My immediate thought after the head explosion was: if we did this in Ruby, it would take a fraction of the code and give us twice the features. So, as Thorsten suggested, “I went and tried how far I could get”. I got <em>this</em> far.</p>

<h2 id="raised-to-the-power">Raised to the power</h2>

<p>LLMs’ general availability turned AI from a “research problem” into an “integration problem”. The nature of the work changed to match Ruby’s strengths: orchestration, expressiveness, and fast iteration.</p>

<p>When you combine Ruby with LLMs, you get compounding power. Power * Power. Power squared.</p>

<p>The key to building an agent is defining what to delegate to the LLM and what to handle in code. For example, Detritus’ skills feature: the code just provides a list of instructions and scripts. The actual skill, knowing <em>when</em> to use each one, <em>how</em> to combine them, that’s all the LLM.</p>

<p>This is where both keys meet. LLMs do the hard part; our job is orchestration. And Ruby makes the orchestration so clean you can see just how little code is actually needed. Compounding power.</p>

<h2 id="the-opportunity">The Opportunity</h2>
<p>The Ruby AI ecosystem is young, but it’s growing fast. <a href="https://github.com/crmne/ruby_llm">RubyLLM</a>, the gem that powers Detritus, is already spawning its own ecosystem: MCP support, <a href="https://github.com/sinaptia/ruby_llm-monitoring">monitoring</a>, agent frameworks, etc. Andrew Kane has quietly built an entire ML infrastructure layer for Ruby: transformers, torch, embeddings, vector search, and ONNX runtime. Officially supported SDKs from OpenAI, Anthropic, and MCP. The foundations are being laid right now, the Ruby way: simple, expressive, and delightful to use.</p>

<p>In the coming years, most of us, Ruby developers, won’t be training models. We will be orchestrating API calls, building agents, capabilities, features, and designing systems. Building products on top of a dynamic, ever-changing landscape. We’ll be doing what Ruby does best: making powerful capabilities accessible through elegant, expressive interfaces. And because of Ruby’s power, we can do those things naturally, frictionlessly, easily.</p>

<p>The same things that made Ruby great for web development 15 years ago are perfectly aligned again, but now with a more mature, faster, and modern Ruby. The potential is huge.</p>

<p>The Ruby community has decades of experience building products and delightful tools. The AI landscape is wide open, the tools are here, and the problem fits like a glove. So… what are we, Rubyists, going to do?</p>]]></content><author><name>Fernando Martinez</name></author><category term="Ruby" /><category term="AI" /><summary type="html"><![CDATA[We found two keys to answer this question while building a full-featured coding agent in just 250 lines of Ruby code.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">RubyLLM::Instrumentation: The foundation for RubyLLM monitoring</title><link href="https://sinaptia.dev/posts/ruby-llm-instrumentation-the-foundation-for-rubyllm-monitoring" rel="alternate" type="text/html" title="RubyLLM::Instrumentation: The foundation for RubyLLM monitoring" /><published>2026-01-20T00:00:00+00:00</published><updated>2026-01-20T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/ruby-llm-instrumentation-the-foundation-for-rubyllm-monitoring</id><content type="html" xml:base="https://sinaptia.dev/posts/ruby-llm-instrumentation-the-foundation-for-rubyllm-monitoring"><![CDATA[<p>In our <a href="/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring">last post</a>, we introduced <a href="https://github.com/sinaptia/ruby_llm-monitoring">RubyLLM::Monitoring</a>, a Rails engine that captures every LLM request your application makes and provides a dashboard where you can see cost, throughput, response time, and error aggregations, and lets you set up alerts so that when something interesting to you happens, you receive an email or a Slack notification.</p>

<p>But how did we do it? What mechanism does RubyLLM provide that we can use to capture all LLM requests? Or did we use something else?</p>

<h2 id="rubyllm-event-handlers">RubyLLM event handlers</h2>

<p>RubyLLM provides event handlers out of the box. You can use them to capture an event when a message is sent to the LLM and, for example, calculate its cost. This is how you’d use them:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Provided that you have gemini configured in config/initializers/ruby_llm.rb</span>
<span class="n">chat</span> <span class="o">=</span> <span class="no">RubyLLM</span><span class="p">.</span><span class="nf">chat</span> <span class="ss">provider: </span><span class="s2">"gemini"</span><span class="p">,</span> <span class="ss">model: </span><span class="s2">"gemini-2.5-flash"</span>

<span class="n">chat</span><span class="p">.</span><span class="nf">on_end_message</span> <span class="k">do</span> <span class="o">|</span><span class="n">message</span><span class="o">|</span>
  <span class="no">Event</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span>
    <span class="ss">provider: </span><span class="n">chat</span><span class="p">.</span><span class="nf">model</span><span class="p">.</span><span class="nf">provider</span><span class="p">,</span>
    <span class="ss">model: </span><span class="n">chat</span><span class="p">.</span><span class="nf">model</span><span class="p">.</span><span class="nf">id</span><span class="p">,</span>
    <span class="ss">input_tokens: </span><span class="n">message</span><span class="o">&amp;</span><span class="p">.</span><span class="nf">input_tokens</span> <span class="o">||</span> <span class="mi">0</span><span class="p">,</span>
    <span class="ss">output_tokens: </span><span class="n">message</span><span class="o">&amp;</span><span class="p">.</span><span class="nf">output_tokens</span> <span class="o">||</span> <span class="mi">0</span>
  <span class="p">)</span>
<span class="k">end</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">chat</span><span class="p">.</span><span class="nf">ask</span><span class="p">(</span><span class="s2">"Write a short poem about Ruby"</span><span class="p">)</span>
</code></pre></div></div>

<p>In the code above, an event record is created when a message is completed, and the cost is calculated in an ActiveRecord callback on the <code class="language-plaintext highlighter-rouge">Event</code> model. This approach is simple and works, but it doesn’t scale well:</p>

<ul>
  <li>You need to add this manual tracking everywhere. Every chat instance needs the callback wired up; otherwise you lose that data. You can extract the setup into a helper, but you still have to remember to call it for every chat.</li>
  <li>Your instrumentation code and your business logic are tightly coupled, which makes both harder to maintain.</li>
  <li>This only works for <code class="language-plaintext highlighter-rouge">RubyLLM::Chat</code> instances. What about embeddings, image generation, and other operations? You’d need different mechanisms for each.</li>
  <li>Tracking full request metrics like latency requires even more complex and intrusive code.</li>
</ul>

<p>We needed something more comprehensive and automatic that doesn’t rely on us remembering to hook the instrumentation code in everywhere. Luckily, Rails has something neat baked in.</p>

<h2 id="activesupportnotifications">ActiveSupport::Notifications</h2>

<p>ActiveSupport::Notifications is Rails’ instrumentation API. It’s what Rails uses internally to track things like database queries, view rendering, controller executions, and more.</p>

<p>Using it is simple: you make your code emit events by calling <code class="language-plaintext highlighter-rouge">ActiveSupport::Notifications.instrument(...)</code>, and subscribers can consume those events to do logging, monitoring, or whatever else you need. An interesting example is <a href="https://github.com/charkost/prosopite">Prosopite</a>, which hooks into <code class="language-plaintext highlighter-rouge">sql.active_record</code> events to detect N+1 queries.</p>

<p>This mechanism is especially valuable for libraries, because it decouples the library’s logic from the business logic of the application that uses it. In the case of RubyLLM::Monitoring, the monitoring logic lives separately and subscribes only to what it cares about. There’s no coupling between RubyLLM and RubyLLM::Monitoring.</p>

<p>So, this is what we did in <a href="https://github.com/sinaptia/ruby_llm-instrumentation">RubyLLM::Instrumentation</a> to make RubyLLM emit events after each LLM call. RubyLLM::Monitoring, on the other hand, provides an event subscriber that captures the events and feeds them into its dashboard.</p>

<h2 id="rubyllminstrumentation">RubyLLM::Instrumentation</h2>

<p>Instrumentation should be automatic and invisible. RubyLLM::Instrumentation achieves that: just add it to your <code class="language-plaintext highlighter-rouge">Gemfile</code>, run <code class="language-plaintext highlighter-rouge">bundle install</code>, and you’re done. RubyLLM will start emitting events for you to subscribe to.</p>

<p>Now, following the example above, the code becomes:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in config/initializers/ruby_llm.rb</span>
<span class="no">ActiveSupport</span><span class="o">::</span><span class="no">Notifications</span><span class="p">.</span><span class="nf">subscribe</span><span class="p">(</span><span class="sr">/ruby_llm/</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="c1"># Do whatever you want with the event; RubyLLM::Monitoring stores the event data in the database for later use</span>
  <span class="no">Event</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span>
    <span class="ss">provider: </span><span class="n">event</span><span class="p">.</span><span class="nf">payload</span><span class="p">[</span><span class="ss">:provider</span><span class="p">],</span>
    <span class="ss">model: </span><span class="n">event</span><span class="p">.</span><span class="nf">payload</span><span class="p">[</span><span class="ss">:model</span><span class="p">],</span>
    <span class="ss">input_tokens: </span><span class="n">event</span><span class="p">.</span><span class="nf">payload</span><span class="p">[</span><span class="ss">:input_tokens</span><span class="p">]</span> <span class="o">||</span> <span class="mi">0</span><span class="p">,</span>
    <span class="ss">output_tokens: </span><span class="n">event</span><span class="p">.</span><span class="nf">payload</span><span class="p">[</span><span class="ss">:output_tokens</span><span class="p">]</span> <span class="o">||</span> <span class="mi">0</span>
  <span class="p">)</span>
<span class="k">end</span>

<span class="c1"># Provided that you have gemini configured in config/initializers/ruby_llm.rb</span>
<span class="n">chat</span> <span class="o">=</span> <span class="no">RubyLLM</span><span class="p">.</span><span class="nf">chat</span> <span class="ss">provider: </span><span class="s2">"gemini"</span><span class="p">,</span> <span class="ss">model: </span><span class="s2">"gemini-2.5-flash"</span>

<span class="c1"># RubyLLM will emit the event, and it'll be captured by the subscriber above</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">chat</span><span class="p">.</span><span class="nf">ask</span><span class="p">(</span><span class="s2">"Write a short poem about Ruby"</span><span class="p">)</span>
</code></pre></div></div>

<p>The code remains practically the same as in the original example, but the instrumentation becomes much simpler and decoupled, and there’s no need to repeat the same hook in multiple places.</p>

<p>In the example above, all <code class="language-plaintext highlighter-rouge">ruby_llm</code> events are captured, but you can subscribe to specific events. You can read more about the instrumented events and their payload in the <a href="https://github.com/sinaptia/ruby_llm-instrumentation">project’s repository</a>.</p>

<h2 id="wrapping-up">Wrapping up</h2>

<p>RubyLLM::Instrumentation takes the burden of manually instrumenting code off users’ shoulders. It was originally written as part of RubyLLM::Monitoring, but we extracted it into its own gem because it’s a fundamental tool: just as we needed it, other people might need it too, whether to build a different monitoring tool, an analytics tool, or a different logging setup.</p>

<p>Give it a try, send us feedback, and contribute if you want to!</p>

<hr />

<p>If you’re building AI-powered applications with Rails and need help with architecture, optimization, or observability, <a href="/contact-us/">get in touch</a>.</p>]]></content><author><name>Patricio Mac Adden</name></author><category term="Ruby on Rails" /><category term="AI" /><summary type="html"><![CDATA[While working on RubyLLM::Monitoring, we needed a way to instrument all RubyLLM operations. But we wanted to do it without changing RubyLLM. Read along to know how we did it.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Monitoring LLM usage in Rails with RubyLLM::Monitoring</title><link href="https://sinaptia.dev/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring" rel="alternate" type="text/html" title="Monitoring LLM usage in Rails with RubyLLM::Monitoring" /><published>2026-01-14T00:00:00+00:00</published><updated>2026-01-14T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring</id><content type="html" xml:base="https://sinaptia.dev/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring"><![CDATA[<p>You’ve built an AI-powered feature into your Rails application using LLMs. You’ve built an evaluation set to test different prompts and model combinations, compared them, and improved them<sup id="fnref:1"><a href="#fn:1" class="footnote" rel="footnote" role="doc-noteref">1</a></sup> so you could get the best bang for the buck out of your LLM usage. You aimed for the highest accuracy at the lowest possible cost. You deployed it to production. And now?</p>

<p>Unlike most APIs, LLM API calls have variable costs: pricing is usage-based, so what you pay depends on the input and output tokens consumed. So how are your users actually using the feature? How much will it cost you monthly? Does that match your estimate? Are the usage limits you designed adequate, or even needed at all?</p>
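<p>A quick back-of-the-envelope cost model shows why this matters. The per-million-token prices below are invented for illustration; real prices vary by provider and model:</p>

```ruby
# Hypothetical usage-based pricing (USD per 1M tokens) -- not real prices.
INPUT_PRICE_PER_MTOK = 0.30
OUTPUT_PRICE_PER_MTOK = 2.50

def request_cost(input_tokens, output_tokens)
  input_tokens / 1_000_000.0 * INPUT_PRICE_PER_MTOK +
    output_tokens / 1_000_000.0 * OUTPUT_PRICE_PER_MTOK
end

# Output tokens usually cost several times more than input tokens,
# so a short prompt with a long answer costs more than the reverse:
request_cost(2_000, 500) # ~$0.00185
request_cost(500, 2_000) # ~$0.00515
```

<p>Two requests with the same total token count can differ several-fold in cost, which is exactly why per-request tracking beats monthly invoice surprises.</p>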

<h2 id="why-monitoring-llm-interactions-matters">Why monitoring LLM interactions matters</h2>

<p>Beyond basic visibility, monitoring unlocks practical improvements:</p>

<ul>
  <li><strong>Cost management</strong>: Track which models and features are costing you money, then focus optimization efforts where they matter. When 80% of your costs come from one feature, you can try a cheaper model, add caching, optimize prompts, or, if the provider and feature allow it, use <a href="https://sinaptia.dev/posts/the-untold-challenges-of-openai-s-batch-processing-api">batch processing</a>.</li>
  <li><strong>Performance tracking and anomaly detection</strong>: Monitor response times to identify slow prompts and set realistic expectations. A sudden spike in latency or requests usually means something changed—a bug causing retries, or model performance issues—and monitoring helps you correlate changes with their impact.</li>
  <li><strong>Capacity planning</strong>: Understanding your throughput patterns (requests per minute, hour, day) helps you forecast costs and identify features that might benefit from caching or batching.</li>
  <li><strong>Provider comparison</strong>: With multiple LLM providers offering similar capabilities at different price points, monitoring helps you make informed decisions about which model delivers the best results for your use case.</li>
  <li><strong>Reporting</strong>: Product managers and stakeholders want to know what AI is costing. With monitoring data in your database, generating reports is a SQL query away.</li>
  <li><strong>Model migration planning</strong>: When a provider releases a new model or changes pricing, you can estimate the impact on your costs before making the switch.</li>
</ul>

<h2 id="introducing-rubyllmmonitoring">Introducing RubyLLM::Monitoring</h2>

<p>As you might guess, after deploying our AI-powered features, we had several usage spikes that threatened their viability. We needed to monitor our LLM usage in production. At first, we did it manually, using whatever each inference platform provided. But as we started using different providers and models across several features, manually tracking cost and token usage became complicated and error-prone. So we built <a href="https://github.com/sinaptia/ruby_llm-monitoring">RubyLLM::Monitoring</a>: a Rails engine that tracks every LLM request your application makes and provides a dashboard where you can see cost, throughput, response time, and error aggregations. On top of that, you can set up alerts so that you receive an email or a Slack notification when something you care about happens.</p>

<p>As the name suggests, it’s built on top of <a href="https://github.com/crmne/ruby_llm">RubyLLM</a> and integrates seamlessly with your existing setup. No separate infrastructure, no external services, just another engine mounted in your Rails app.</p>

<h3 id="how-it-works">How it works</h3>

<p>The engine instruments every LLM request your app makes (stay tuned for a related post) and saves it to your database. Cost is calculated automatically using RubyLLM’s built-in pricing data. Since everything lives in your database, you can run custom queries when the dashboard isn’t enough.</p>
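<p>For example, a per-model cost breakdown is one query away. The sketch below shows the shape of that aggregation over plain hashes standing in for rows; in your app you’d run the equivalent <code class="language-plaintext highlighter-rouge">group</code>/<code class="language-plaintext highlighter-rouge">sum</code> against the engine’s events table (model names and costs here are illustrative):</p>

```ruby
# Each hash stands in for a persisted LLM request event.
events = [
  {model: "gemini-2.5-flash", cost: 0.0012},
  {model: "gemini-2.5-flash", cost: 0.0030},
  {model: "gpt-4o-mini", cost: 0.0008}
]

# Equivalent in spirit to: SELECT model, SUM(cost) FROM events GROUP BY model
cost_by_model = events
  .group_by { |e| e[:model] }
  .transform_values { |rows| rows.sum { |r| r[:cost] } }
```

<p>Because the rows live in your own database, any breakdown the dashboard doesn’t offer (per feature, per tenant, per day) is the same pattern with a different grouping key.</p>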

<h3 id="the-dashboard">The dashboard</h3>

<p>Once installed, you get a dashboard at <code class="language-plaintext highlighter-rouge">/monitoring</code> (or wherever you mount it) with:</p>

<ul>
  <li>Summary cards showing total requests, total cost, average response time, and error rate.</li>
  <li>A breakdown table grouping metrics by provider and model, so you can see at a glance which models are being used and what they’re costing you.</li>
  <li>Metrics:
    <ul>
      <li><strong>Throughput</strong>: Request count over time</li>
      <li><strong>Cost</strong>: Accumulated costs per time window</li>
      <li><strong>Response time</strong>: Average latency trends</li>
      <li><strong>Error rate</strong>: Percentage of failed requests</li>
    </ul>
  </li>
</ul>

<p><img src="/assets/images/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring/metrics.webp" alt="RubyLLM::Monitoring metrics" /></p>
<p class="text-sm italic mb-5">Demo with slow responses and a high error rate.</p>

<p><img src="/assets/images/posts/monitoring-llm-usage-in-rails-with-rubyllm-monitoring/alerts.webp" alt="RubyLLM::Monitoring alerts" /></p>

<h3 id="alerts">Alerts</h3>

<p>Beyond the dashboard, you can configure custom alert rules to notify you when specific conditions are met. This is essential for catching cost overruns, error spikes, or unusual patterns before they become problems.</p>

<p>Alert rules are flexible and can trigger based on any condition you can express as a query. Here are some practical examples:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/initializers/ruby_llm_monitoring.rb</span>
<span class="no">RubyLLM</span><span class="o">::</span><span class="no">Monitoring</span><span class="p">.</span><span class="nf">channels</span> <span class="o">=</span> <span class="p">{</span>
  <span class="ss">email: </span><span class="p">{</span> <span class="ss">to: </span><span class="s2">"team@example.com"</span> <span class="p">},</span>
  <span class="ss">slack: </span><span class="p">{</span> <span class="ss">webhook_url: </span><span class="no">ENV</span><span class="p">[</span><span class="s2">"SLACK_WEBHOOK_URL"</span><span class="p">]</span> <span class="p">}</span>
<span class="p">}</span>

<span class="no">RubyLLM</span><span class="o">::</span><span class="no">Monitoring</span><span class="p">.</span><span class="nf">alert_rules</span> <span class="o">+=</span> <span class="p">[{</span>
  <span class="ss">time_range: </span><span class="o">-&gt;</span> <span class="p">{</span> <span class="no">Time</span><span class="p">.</span><span class="nf">current</span><span class="p">.</span><span class="nf">at_beginning_of_month</span><span class="o">..</span> <span class="p">},</span>
  <span class="ss">rule: </span><span class="o">-&gt;</span><span class="p">(</span><span class="n">events</span><span class="p">)</span> <span class="p">{</span> <span class="n">events</span><span class="p">.</span><span class="nf">sum</span><span class="p">(</span><span class="ss">:cost</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">500</span> <span class="p">},</span>
  <span class="ss">channels: </span><span class="p">[</span><span class="ss">:email</span><span class="p">,</span> <span class="ss">:slack</span><span class="p">],</span>
  <span class="ss">message: </span><span class="p">{</span> <span class="ss">text: </span><span class="s2">"More than $500 spent this month"</span> <span class="p">}</span>
<span class="p">},</span> <span class="p">{</span>
  <span class="ss">time_range: </span><span class="o">-&gt;</span> <span class="p">{</span> <span class="mi">1</span><span class="p">.</span><span class="nf">day</span><span class="p">.</span><span class="nf">ago</span><span class="o">..</span> <span class="p">},</span>
  <span class="ss">rule: </span><span class="o">-&gt;</span><span class="p">(</span><span class="n">events</span><span class="p">)</span> <span class="p">{</span> <span class="n">events</span><span class="p">.</span><span class="nf">average</span><span class="p">(</span><span class="ss">:response_time</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">5000</span> <span class="p">},</span>
  <span class="ss">channels: </span><span class="p">[</span><span class="ss">:slack</span><span class="p">],</span>
  <span class="ss">message: </span><span class="p">{</span> <span class="ss">text: </span><span class="s2">"Average response time exceeded 5 seconds"</span> <span class="p">}</span>
<span class="p">}]</span>
</code></pre></div></div>

<p>Alert rules have built-in cooldown periods to prevent notification spam, and you can customize channels for each rule. You can even build custom notification channels beyond the built-in email and Slack options.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Building AI-powered features doesn’t end at deployment. The models you depend on are expensive, their performance varies, and usage patterns shift over time; in a rapidly evolving AI landscape, models and providers themselves are a moving target. Without proper visibility, all you have are guesses. So we built <a href="https://github.com/sinaptia/ruby_llm-monitoring">RubyLLM::Monitoring</a>.</p>

<p>Give it a try, send us feedback, and contribute if you want to!</p>

<hr />

<p><em>At SINAPTIA, <a href="/posts/building-intelligent-applications-with-rails">we specialize in helping businesses implement AI solutions</a> that deliver real value. If you’re facing challenges with LLM monitoring or AI integration, we’d love to help.</em></p>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1">
      <p>If you don’t know how to do this, we’ll have a surprise for you soon. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Patricio Mac Adden</name></author><category term="Ruby on Rails" /><category term="AI" /><summary type="html"><![CDATA[When you're using multiple LLM providers, tracking costs manually becomes impossible fast. We needed visibility into our AI spending and LLM performance. Here's the monitoring engine we built for Rails.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">la_plata.rb November meetup</title><link href="https://sinaptia.dev/posts/la-plata-rb-november-meetup" rel="alternate" type="text/html" title="la_plata.rb November meetup" /><published>2025-12-02T00:00:00+00:00</published><updated>2025-12-02T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/la-plata-rb-november-meetup</id><content type="html" xml:base="https://sinaptia.dev/posts/la-plata-rb-november-meetup"><![CDATA[<p>On November 27th, 2025, the <a href="http://laplatarb.github.io/">la_plata.rb</a> community came together at Calle Uno for what became the first and last meetup of the year in the city. More than 30 Ruby developers gathered together to share knowledge, experiences, and drinks.</p>

<p>The meetup was made possible thanks to the support of RubyCentral, GitHub, SINAPTIA, and Unagi. Many thanks to our sponsors; this wouldn’t have been possible without them.</p>

<h2 id="observability-en-la-era-de-ai">Observability en la era de AI</h2>

<p>The first talk was presented by Patricio Mac Adden from SINAPTIA, who shared his team’s journey combining observability tools with LLMs to tackle real-world Rails application problems. With an upfront disclaimer that this wasn’t “the definitive solution” but rather hard-earned experience from the trenches, Patricio dove into a fascinating story.</p>

<p>He started by providing context on how SINAPTIA uses LLMs daily—from AI agents that help with programming, code reviews, and debugging, to production features like <a href="https://sinaptia.dev/posts/scaling-image-classification-with-ai">image classification</a>, <a href="https://sinaptia.dev/posts/upscaling-images-with-ai">image upscaling</a>, <a href="https://sinaptia.dev/posts/improving-a-similarity-search-with-ai">similarity search</a>, and <a href="https://sinaptia.dev/posts/mcp-on-rails">MCP integration</a>. The goal? Optimize time and maximize value.</p>

<p>The talk then explored the observability challenges they faced: applications with performance problems, memory leaks, slow actions, and scarce hardware and software resources (think free-tier APMs or no APM at all), all while trying to develop new features. Their DIY APM journey took them from ActiveSupport::Notifications and ActiveSupport::ErrorReporter to OpenTelemetry, eventually leading to <a href="https://github.com/sinaptia/solid_telemetry">SolidTelemetry</a>.</p>

<p>But when they tried combining their APM data with LLMs for automated problem-solving, they hit roadblocks: the process was too manual (exporting traces, exceptions, and performance items was tedious), too repetitive (you had to tell the LLM what to do every time), used too much context (OpenTelemetry exports many spans per trace), and ultimately wasn’t LLM-friendly.</p>

<p>The solution? <a href="https://github.com/sinaptia/mini_telemetry">MiniTelemetry</a>. A simpler, more lightweight approach that replaces traces/spans with events, eliminates metrics, and is built on top of Rails’ native ActiveSupport::Notifications and ActiveSupport::ErrorReporter. Most importantly, it’s LLM-friendly by design. The talk concluded with a live demo showing how this approach works in practice.</p>

<p align="center" width="100%">
  <img class="w-[70%]" alt="Patricio's talk" src="/assets/images/posts/la-plata-rb-november-meetup/1.webp" />
</p>

<h2 id="elijo-tu-propia-aventura">Elijo tu propia aventura</h2>

<p>The second presentation, delivered by Renzo Quaggia from Unagi, took a creative storytelling approach inspired by the classic “Choose Your Own Adventure” books. Rather than an interactive format, Renzo shared his real-world experience working on a checkout page, a critical part of any e-commerce system where every decision can significantly impact conversion rates.</p>

<p>Renzo walked the audience through the decision points he faced during the project, much like the branching paths in those beloved adventure books. The talk explored how these choices ultimately led him to implement A/B testing as a solution, allowing data rather than assumptions to guide which path to take. It was a practical reminder that in software development, we often face multiple valid approaches, and sometimes the best answer is to test them all.</p>

<p align="center" width="100%">
  <img class="w-[70%]" alt="Renzo's talk" src="/assets/images/posts/la-plata-rb-november-meetup/2.webp" />
</p>

<p>As with any good Ruby meetup, the event concluded with time for networking, sharing experiences, and connecting with other developers over food and drinks.</p>

<p>Thanks again to RubyCentral, GitHub, SINAPTIA, and Unagi for making this event possible, and to everyone who attended. We hope next year we can see more events like this!</p>]]></content><author><name>SINAPTIA</name></author><category term="Ruby" /><category term="Community" /><summary type="html"><![CDATA[On November 27th, the la_plata.rb community gathered for its first and last meetup of 2025. Observability, AI, and A/B testing.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Ruby Argentina November meetup</title><link href="https://sinaptia.dev/posts/ruby-argentina-november-meetup" rel="alternate" type="text/html" title="Ruby Argentina November meetup" /><published>2025-11-21T00:00:00+00:00</published><updated>2025-11-21T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/ruby-argentina-november-meetup</id><content type="html" xml:base="https://sinaptia.dev/posts/ruby-argentina-november-meetup"><![CDATA[<p>On November 13th, the <a href="https://ruby.com.ar/">Ruby Argentina</a> community concluded the current year’s agenda of meetups. 2025 was an amazing year for the group, marked by the flow of speakers and sponsors, as well as the organization’s mechanics for each event. The 100% online events, which opened the field to speakers from around the world, such as Jason Swett and Rosa Gutierrez from Basecamp, are all remarkable achievements for the community. Props to the organization team and their fantastic work.</p>

<p>The main talk was given by Fernando E. Silva Jacquier, who shared his perspective on “Expressive coding”: the pros and cons of writing code in a more human way, and the importance of understanding what happens under the hood when expressions sound the same but behave slightly differently.</p>

<p align="center" width="100%">
  <img class="w-[70%]" alt="The first talk" src="/assets/images/posts/ruby-argentina-november-meetup/1.webp" />
</p>

<p>After a short break to eat empanadas and drink beers, there were some ⚡ lightning talks ⚡, a space to relax and share whatever you want in 5 minutes or less. Among the highlights were our own Nazareno Moresco sharing his journey understanding the Mayan calendar, which led to a fun side project gem (<a href="https://github.com/nazamoresco/mayan">Mayan</a>), and Gemma Falconi talking about turtles, those lovely and tiny dinosaurs.</p>

<p align="center" width="100%" class="flex-row md:flex space-y-4 md:gap-x-4">
  <img class="w-[70%] md:w-[50%]" alt="Nazareno's lightning talk" src="/assets/images/posts/ruby-argentina-november-meetup/2.webp" />
  <img class="w-[70%] md:w-[50%]" alt="Gemma's lightning talk" src="/assets/images/posts/ruby-argentina-november-meetup/3.webp" />
</p>

<p>Huge thanks to sponsors and organizers (<a href="https://sinaptia.dev/">SINAPTIA</a>, Rootstrap, Ombulabs, Eagerworks, Moony, Roxom, Crunchloop, Svitla, and LeWagon) and all the people who made each meeting special. We hope to see you again in 2026.</p>]]></content><author><name>SINAPTIA</name></author><category term="Ruby" /><category term="Community" /><summary type="html"><![CDATA[Last week we attended to the last meetup of the year from the Ruby Argentina community in Buenos Aires. Expressive coding and a lot of fun lightning talks.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">What’s actually slow? A practical guide to Rails performance</title><link href="https://sinaptia.dev/posts/whats-actually-slow" rel="alternate" type="text/html" title="What’s actually slow? A practical guide to Rails performance" /><published>2025-11-06T00:00:00+00:00</published><updated>2025-11-06T00:00:00+00:00</updated><id>https://sinaptia.dev/posts/whats-actually-slow</id><content type="html" xml:base="https://sinaptia.dev/posts/whats-actually-slow"><![CDATA[<p>For the last couple of months, we’ve been building an observability tool that we intend to use internally in our AI-powered solutions. One of the features we wanted to work on was slow action detection, but… What makes an action slow? It’s one of those questions that sounds simple but gets interesting fast. Let’s break it down.</p>

<h2 id="what-users-actually-experience">What users actually experience</h2>

<p>When a request hits your Rails app and a response goes back, that total time is just a portion of what users experience. Server response time is crucial, but it’s only one piece of perceived performance:</p>

<ul>
  <li>Network round-trip matters. Your app might respond in 100ms, but if the user is on a slow connection or geographically far from your server, they might wait 500ms for the round-trip. A fast server doesn’t fix slow networks.</li>
  <li>Download and rendering matter. Once the HTML arrives, the browser needs to download CSS, JavaScript, and images. Then it needs to parse, render, and potentially hydrate a JavaScript framework. A 100ms server response followed by 2 seconds of asset downloads and rendering feels slow to users.</li>
</ul>

<p>Performance has to be looked at holistically: server time, network latency, asset delivery, and browser rendering all add up to what users experience. In this post, we’ll focus exclusively on server response time.</p>

<h2 id="percentiles-the-right-way-to-measure">Percentiles: the right way to measure</h2>

<p>You’ve got a group of similar actions. Some are fast, some are slow. What metric do you use to decide whether an action is “slow”?</p>

<p>You shouldn’t use the average. The average lies. Imagine 99 requests at 50ms and 1 request at 5 seconds. Your average is 99.5ms, which looks great! But 1% of your users just waited 5 seconds. That’s not acceptable. Depending on the size of your user base, that 1% can be considered an outlier, but if your user base is large, it means a lot of people are having a bad experience.</p>
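<p>The arithmetic from that example, spelled out:</p>

```ruby
# 99 requests at 50ms plus a single request at 5 seconds
times_ms = [50] * 99 + [5_000]

average = times_ms.sum / times_ms.length.to_f
# => 99.5 -- the 5-second request all but disappears from the metric
```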

<p>Percentiles show you what real users experience:</p>

<ul>
  <li>P50 (median): The middle. Half your requests are faster, half are slower.</li>
  <li>P95: 95% of requests are faster than this number.</li>
  <li>P99: 99% of requests are faster than this number.</li>
</ul>

<p>Here’s what it looks like in practice:</p>

<p>Action: posts#index</p>
<ul>
  <li>P50: 120ms    ← typical case</li>
  <li>P95: 450ms    ← 5% of users wait this long or more</li>
  <li>P99: 2.1s     ← 1% of users are suffering</li>
</ul>

<p>That P99 of 2.1 seconds is telling you something. If you have 1000 requests a day, that’s 10 users waiting over 2 seconds every single day.</p>
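<p>The idea can be reproduced in a few lines of plain Ruby. This is a hedged sketch: the sample durations are invented for illustration, and the nearest-rank method used here is just one of several accepted ways to compute percentiles:</p>

```ruby
# Nearest-rank percentile: the value below which pct% of samples fall.
# pct is an integer percentage (50, 95, 99); integer math avoids float
# rounding surprises in the ceiling computation.
def percentile(durations, pct)
  sorted = durations.sort
  rank = (pct * sorted.length + 99) / 100 # integer ceil of pct% of n
  sorted[rank - 1]
end

# Invented sample: mostly fast requests, some slow, a handful terrible.
durations = [120] * 940 + [450] * 49 + [2100] * 11

average = durations.sum / durations.length.to_f # 157.95 -- looks fine
p50 = percentile(durations, 50)                 # 120  -- typical case
p95 = percentile(durations, 95)                 # 450  -- 1 in 20 waits this long
p99 = percentile(durations, 99)                 # 2100 -- the suffering 1%
```

<p>Notice how the average hides the tail entirely: it sits close to the median while 1% of requests take over 2 seconds.</p>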

<h3 id="which-percentile-should-you-use">Which Percentile Should You Use?</h3>

<h4 id="p50-median-too-optimistic">P50 (median): Too optimistic</h4>

<p>P50 only tells you about the typical case. It completely ignores tail latency, i.e., the slow requests that frustrate users.</p>

<p>If P50 is 120ms but P95 is 2 seconds, you have a serious problem that P50 won’t show you. Half your users get a fast experience, but a significant chunk are having a terrible time.</p>

<p>Don’t use P50 to decide what’s slow. It hides too much.</p>

<h4 id="p95-the-sweet-spot">P95: The sweet spot</h4>

<p>P95 catches problems that affect enough users to matter. If P95 is 2 seconds, that means 5% of your users (1 in 20) are waiting that long. That’s significant.</p>

<p>It’s not so sensitive that every minor blip flags the system. You’re looking at the experience of a meaningful percentage of users, not just the absolute worst cases.</p>

<p>When to use P95:</p>
<ul>
  <li>Setting performance thresholds for alerts</li>
  <li>Deciding if an action needs optimization</li>
  <li>Comparing performance across different endpoints</li>
</ul>

<h4 id="p99-more-aggressive-catches-edge-cases">P99: More aggressive, catches edge cases</h4>

<p>P99 is more aggressive than P95 as it looks at the worst 1% of requests. This catches the outliers, the edge cases, the weird scenarios.</p>

<p>Use P99 when:</p>
<ul>
  <li>You want to understand your absolute worst-case performance</li>
  <li>You’re debugging specific slow requests</li>
  <li>You have extremely high traffic, and 1% still represents many users</li>
  <li>You’re operating at a scale where tail latency really matters (think Amazon, Google)</li>
</ul>

<p>But for flagging what’s “slow” in most applications, P99 can be too noisy. That worst 1% might include legitimate edge cases—a user with a massive dataset, a bot, a weird network condition. Flagging everything where P99 exceeds your threshold might give you too many false positives.</p>

<h4 id="the-decision-rule">The decision rule</h4>

<p>Use P95 as your threshold for marking something as slow. Monitor P99 too; it tells you about edge cases worth investigating. But make decisions based on P95. Why? Because P95 catches problems that affect enough users to matter without drowning you in noise from edge cases.</p>

<h2 id="what-actually-matters-server-response-time">What actually matters: server response time</h2>

<p>Rails tells you this for free:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Completed 200 OK in 250ms (Views: 180ms | ActiveRecord: 45ms)
</code></pre></div></div>

<p>That 250ms is what the server spent processing the request. In practice, here’s how those numbers translate to user experience:</p>

<p>Fast enough that nobody complains:</p>
<ul>
  <li>Under 100ms: Feels instant. Users are happy.</li>
  <li>100-200ms: Still responsive. Most users won’t notice.</li>
</ul>

<p>Getting into trouble territory:</p>
<ul>
  <li>200-500ms: Noticeable. Not great, not terrible.</li>
  <li>500ms-1s: Users are tapping their fingers.</li>
  <li>1-3 seconds: You’re losing people.</li>
  <li>Over 3 seconds: They’ve already opened another tab.</li>
</ul>

<p>Of course, context matters. A simple action with basic queries should be under 200ms, while a complex dashboard with aggregations spending 500ms to a second might be acceptable. But anything consistently over 500ms deserves investigation.</p>
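<p>If you ever need to aggregate these per-request timings yourself, the completion line can be parsed with a regex. This is a rough sketch that assumes the default log format shown above; a real app should prefer <code class="language-plaintext highlighter-rouge">ActiveSupport::Notifications</code> or an APM over scraping logs:</p>

```ruby
# Matches the default Rails completion log line and captures the three
# timings (total, view rendering, ActiveRecord) in milliseconds.
COMPLETED = /Completed \d{3} .+ in (?<total>[\d.]+)ms \(Views: (?<views>[\d.]+)ms \| ActiveRecord: (?<db>[\d.]+)ms\)/

line  = "Completed 200 OK in 250ms (Views: 180ms | ActiveRecord: 45ms)"
match = line.match(COMPLETED)

total_ms = match[:total].to_f # 250.0
views_ms = match[:views].to_f # 180.0
db_ms    = match[:db].to_f    # 45.0
```

<p>Feed enough of these into the percentile calculation above and you have per-action P50/P95/P99 without any extra tooling.</p>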

<h2 id="breaking-down-the-bottlenecks">Breaking down the bottlenecks</h2>

<p>Your action response time is the sum of its parts. This is what we use as a baseline when we analyze each component of a request. Bear in mind that these values are just guidelines; they vary from project to project and can be influenced by business requirements (e.g., SEO penalties) or context (e.g., for an admin interface that’s used sparingly for very specific tasks, it’s fine to relax them a little).</p>

<h3 id="database-queries">Database Queries</h3>

<p>Your actions are only as fast as your slowest queries.</p>

<p>Fast:</p>
<ul>
  <li>Under 10ms: Perfect. Nothing to do here, this is probably a properly designed query using the correct indexes.</li>
  <li>10-50ms: Good for queries with optimized joins.</li>
</ul>

<p>Acceptable:</p>
<ul>
  <li>50-100ms: Fine for moderately complex queries.</li>
  <li>100-200ms: Okay for heavy aggregations.</li>
</ul>

<p>Slow:</p>
<ul>
  <li>200-500ms: Here we start seeing things that are worth investigating.</li>
  <li>500ms-1s: Definitely needs work.</li>
  <li>Over 1 second: Critical. These must be fixed if they sit on a critical path.</li>
</ul>

<p>Simple queries (single table, indexed columns) should be under 10ms. If <code class="language-plaintext highlighter-rouge">User.find(123)</code> is taking 50ms, something’s wrong. Complex queries with joins and aggregations? They should be under 200ms.</p>

<p>The common root causes of slow queries we see in performance optimization work are missing indexes on foreign keys or WHERE/ORDER BY columns, N+1 queries, full table scans on large tables, and LIKE queries with wildcards on both sides, which can’t use a standard index.</p>

<p>The power tool to uncover these: <code class="language-plaintext highlighter-rouge">EXPLAIN ANALYZE</code>. It will let you see execution plans and identify missing indexes or sequential scans.</p>

<h3 id="view-rendering">View Rendering</h3>

<p>View rendering time is usually high because of:</p>

<ul>
  <li>Rendering too many partials (<a href="https://sinaptia.dev/posts/rails-views-performance-matters">partials are slow!</a>)</li>
  <li>N+1 queries hidden in view code</li>
  <li>Not <a href="https://sinaptia.dev/posts/think-before-you-cache">using fragment caching</a> where you could</li>
</ul>

<p>Our suggestion for flagging views as slow is: if they are consistently over 100ms, investigate.</p>

<h3 id="external-api-calls">External API Calls</h3>

<p>An action is only as fast as its slowest code statement, and hitting an external service in an action <em>will kill</em> your response time. It’s not always avoidable, but work hard to avoid calling 3rd-party services over the network while processing a request. Move those calls to background jobs and design the business process to accommodate the asynchronicity.</p>

<p>In cases where the above is not possible, we try to target under 200ms for API calls. Anything over 500ms should be moved to background jobs or cached aggressively.</p>

<p>If you must make synchronous API calls, remember to set timeouts and have fallback behavior or use circuit breakers.</p>
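<p>As an illustration of the timeout-plus-fallback idea, here’s a minimal Ruby sketch. The rates endpoint and fallback value are hypothetical, and a production app would likely reach for a circuit-breaker gem instead of a hand-rolled rescue:</p>

```ruby
require "net/http"
require "timeout"

# Run a third-party call under a hard deadline; return a fallback value
# instead of blowing up the whole request if the provider is slow or down.
def with_fallback(deadline_seconds, fallback)
  Timeout.timeout(deadline_seconds) { yield }
rescue Timeout::Error, Net::OpenTimeout, Net::ReadTimeout,
       SocketError, Errno::ECONNREFUSED
  fallback
end

# Net::HTTP's own per-phase timeouts are preferable when you can use them:
def fetch_rates(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: 0.2, read_timeout: 0.2) do |http|
    http.get(uri.request_uri).body
  end
end

# Usage: serve a cached/default value when the provider hangs.
rates = with_fallback(0.5, { "USD" => 1.0 }) do
  # fetch_rates(URI("https://rates.example.com/latest")) # hypothetical endpoint
  sleep 2 # simulate a hanging provider
end
# rates is { "USD" => 1.0 } because the call exceeded the deadline
```

<p>The key design point is that the fallback path is decided up front, so a slow provider degrades one feature instead of the whole response.</p>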

<h2 id="tldr-thresholds">TL;DR: Thresholds</h2>

<p>Here’s what to flag as slow using P95:</p>

<ul>
  <li>Actions: P95 &gt; 500ms</li>
  <li>Database queries: P95 &gt; 100ms</li>
  <li>API calls: P95 &gt; 200ms</li>
</ul>
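<p>Applied to per-action samples, these thresholds boil down to a simple check. The action names and timings below are invented for illustration:</p>

```ruby
THRESHOLDS_MS = { action: 500, db_query: 100, api_call: 200 }

# Nearest-rank P95 over a list of durations in milliseconds.
def p95(durations)
  sorted = durations.sort
  sorted[(95 * sorted.length + 99) / 100 - 1]
end

samples = {
  "posts#index" => [120, 140, 180, 600, 650],
  "posts#show"  => [40, 45, 50, 55, 60]
}

slow_actions = samples.select { |_, d| p95(d) > THRESHOLDS_MS[:action] }
slow_actions.keys # => ["posts#index"]
```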

<p>And remember, these thresholds can vary from project to project, influenced by business requirements (e.g., SEO penalties) or context (e.g., for an admin interface that’s used sparingly for very specific tasks, it’s fine to relax them a little), but they work as solid starting points.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Performance is a whole-picture concern: users experience the sum of server time, network latency, asset delivery, and browser rendering.</p>

<p>Of all these components, server time is where you have the most control. Every millisecond you shave off server response time is a millisecond removed from the total the user waits.</p>

<p>Look at P95 for your actions. Find the bottlenecks (database queries, view rendering, API calls) and fix what’s making users wait.</p>

<p>Always take the whole picture into account when prioritizing performance-related work, and put your effort where it will give your users the biggest benefit.</p>]]></content><author><name>Patricio Mac Adden</name></author><category term="Ruby on Rails" /><category term="Performance" /><summary type="html"><![CDATA[Learn how to measure and identify slow Rails actions and their components: database queries, view rendering, and API calls.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://sinaptia.dev/assets/images/logo-black.png" /><media:content medium="image" url="https://sinaptia.dev/assets/images/logo-black.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>