A few weeks ago, we ran into an interesting problem in one of our projects. We had a reports table with a reason column that used the classic Rails enum approach:
# db/migrate/XXXXXXXXXXXXXX_create_reports.rb
class CreateReports < ActiveRecord::Migration[7.1]
create_table :reports do |t|
t.string :title
t.integer :reason, default: 0, null: false
end
end
# app/models/report.rb
class Report < ApplicationRecord
enum :reason, {
spam: 0,
harassment: 1,
inappropriate_content: 2,
copyright: 3,
misinformation: 4
}, prefix: true
end
Everything worked fine until the client requested the ability to select multiple reasons for a report. A user could report content for being spam and containing misinformation. The set of possible reasons is fixed and defined by developers.
We tried four approaches.
Four ways to solve it
- Bitwise Operations: store multiple values in a single integer using bit-level flags.
- PostgreSQL Array: use native PostgreSQL array columns with enum-like syntax.
- JSONB: store reasons as a JSON array inside a JSONB column.
- HABTM: the classic many-to-many approach with a join table.
1. Bitwise Operations
They use bit-level operations to store multiple values in a single integer. Each reason occupies a specific bit. The active_flag gem provides a clean DSL for this:
# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
def change
add_column :reports, :reasons, :bigint, default: 0, null: false
# migrate data to new format
remove_column :reports, :reason
end
end
# app/models/report.rb
class Report < ApplicationRecord
flag :reasons, [:spam, :harassment, :inappropriate_content, :copyright, :misinformation]
end
# Usage
report = Report.new(reasons: [:spam, :misinformation])
report.reasons # => [:spam, :misinformation]
report.reasons.spam? # => true
report.reasons = [:spam]
report.save!
# Read: check if spam is included
report.reasons.spam? # => true
# Validation: invalid values raise ArgumentError
report.reasons = [:invalid] # => raises ArgumentError
# Query: find all reports with spam AND misinformation
Report.where_reasons(:spam, :misinformation)
The benefits here are clear: extremely compact storage, very fast boolean checks, and no GIN index required, just a standard integer index. But the trade-offs become apparent quickly. Database values are inexpressive (what does value 4 mean?). You’re limited to 64 values with bigint. And query operations are less intuitive than standard ActiveRecord.
Note: Most Ruby gems (including active_flag) are based on integers with a 64-bit limit. While PostgreSQL supports the bit string data type, which can store many more flags without this limitation, the Ruby ecosystem doesn’t have widely adopted gems for this approach.
2. PostgreSQL Array Field (Multivalued Column)
PostgreSQL has native array support. We can store reason strings directly in a string array column without any gem:
# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
def change
add_column :reports, :reasons, :string, array: true, default: []
add_index :reports, :reasons, using: 'gin'
# migrate data to new format
remove_column :reports, :reason
end
end
# app/models/report.rb
class Report < ApplicationRecord
VALID_REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze
validate :reasons_must_be_valid
private
def reasons_must_be_valid
return if reasons.blank? || reasons.all? { |r| VALID_REASONS.include?(r) }
errors.add(:reasons, "contain invalid values")
end
end
# Usage
report = Report.create(title: "Report 1", reasons: ["spam", "misinformation"])
# Read: check if spam is included
report.reasons.include?("spam") # => true
# Query: find all reports with spam AND misinformation
Report.where("reasons @> ARRAY[?]::varchar[]", ["spam", "misinformation"])
The upside: no gem dependency, human-readable values in the database, and easy to expand. The downside: it’s PostgreSQL-specific, validations require manual implementation, and it’s less familiar to developers who don’t use PostgreSQL regularly.
3. JSONB (Multivalued Column)
We store reasons as a JSON array inside a JSONB column.
# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
def change
add_column :reports, :reasons, :jsonb, default: [], null: false
add_index :reports, :reasons, using: 'gin'
# migrate data to new format
remove_column :reports, :reason
end
end
# app/models/report.rb
class Report < ApplicationRecord
VALID_REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze
validate :reasons_must_be_valid
private
def reasons_must_be_valid
return if reasons.blank? || reasons.all? { |r| VALID_REASONS.include?(r) }
errors.add(:reasons, "contain invalid values")
end
end
# Usage
report = Report.create(title: "Report 1", reasons: ["spam", "misinformation"])
# Read: check if spam is included
report.reasons.include?("spam") # => true
# Query: find all reports with spam AND misinformation
Report.where("reasons @> ?", '["spam", "misinformation"]')
The benefit is GIN indexing for fast searches. The cost: the flexible structure requires stronger validations to enforce the expected format, it’s slightly more verbose than alternatives, and there’s JSON parsing overhead in some cases.
4. HABTM Table
The classic many-to-many approach with a join table.
# db/migrate/XXXXXXXXXXXXXX_create_reasons_and_join_table.rb
class CreateReasonsAndJoinTable < ActiveRecord::Migration[7.1]
def change
create_table :reasons do |t|
t.string :name, null: false
t.timestamps
end
add_index :reasons, :name, unique: true
create_join_table :reports, :reasons do |t|
t.index [:report_id, :reason_id]
t.index [:reason_id, :report_id]
end
# migrate data to new format
remove_column :reports, :reason
end
end
# app/models/report.rb
class Report < ApplicationRecord
has_and_belongs_to_many :reasons
end
# app/models/reason.rb
class Reason < ApplicationRecord
has_and_belongs_to_many :reports
validates :name, inclusion: { in: %w[spam harassment inappropriate_content copyright misinformation] }
end
# Usage
report = Report.create(title: "Report 1", reasons: Reason.where(name: ["spam", "misinformation"]))
# Read: check if spam is included
report.reasons.exists?(name: "spam") # => true
# Query: find all reports with spam AND misinformation
Report.joins(:reasons)
.where(reasons: { name: ["spam", "misinformation"] })
.group(:id)
.having("COUNT(DISTINCT reasons.id) = 2")
Benefits include total flexibility to add additional metadata, extreme familiarity for Rails developers, and unlimited scalability. The drawback: slower writes (create/update) and the need for more queries or joins to read data.
Comparison
Here’s a summary of how each approach compares across the attributes that matter most:
| Attribute | Bitwise | PostgreSQL Array | JSONB | HABTM |
|---|---|---|---|---|
| Write Performance | Excellent | Very Good | Very Good | Fair |
| Read Performance | Very Good | Good | Good | Good |
| Query Simplicity | Excellent | Excellent | Excellent | Fair |
| Default Values | Very Good | Excellent | Excellent | Poor |
| Database-Level Validation | Very Good | Good | Good | Excellent |
| Extensibility | Poor | Good | Very Good | Excellent |
| Familiarity | Poor | Good | Good | Excellent |
| Ecosystem Support | Poor | Fair | Fair | Excellent |
| DB Compatibility | Excellent | Poor | Excellent | Excellent |
| Scalability | 64 values | Limited | Limited | Unlimited |
| Property vs Entity | property | property | property | entity |
Property vs Entity
In data modeling terms, reasons are a property of a report, not an entity with its own identity and lifecycle. Nobody queries “show me all attributes of the spam reason”. That makes the multivalued column approaches (array, JSONB, bitwise) more true to the conceptual model, even though they break First Normal Form. The HABTM is the more purely relational approach, but it treats a property more as if it were an entity.
Performance
We ran benchmarks with 1000 operations for create, find, and update scenarios. You can find the full benchmark code and results in the multivalued_attributes repository.
The HABTM is ~4x slower on creates and ~3-5x slower on updates compared to the other methods. However, it’s worth noting that most performance issues related to HABTM and JOINs aren’t actually caused by the JOIN itself. Databases like PostgreSQL are highly optimized for these operations. The real culprits are usually N+1 queries, missing indexes, or poorly designed queries. With proper indexing and eager loading, the read performance gap narrows significantly. The first three approaches (bitwise, array, and jsonb) have very similar performance. On reads, the difference is marginal (only 2-3 extra seconds over 1000 queries).
Query Simplicity
One often overlooked aspect is query complexity at the call site. If your codebase filters by these values extensively, in scopes, serializers, admin interfaces, and column-based approaches, keep queries simple and on a single table. With array_enum or bitwise, you can do:
Report.with_reason(:spam)
Report.where.not(reasons: [])
With a HABTM, every call site would need a .joins(:reasons), adding complexity and increasing the risk of N+1 queries if eager loading is forgotten. While you can mitigate this with default scopes or associations, it adds friction that column-based approaches simply don’t have.
Default Values
With array or JSONB columns, you can set default values directly in the migration:
t.integer :reasons, array: true, default: ["spam", "harassment"]
New records automatically get these values. No callbacks required. With a HABTM, you’d need a callback to seed join table rows, adding an extra step that can be forgotten or misconfigured.
Database-Level Validation
With a HABTM, foreign keys enforce valid references automatically:
INSERT INTO reasons_reports (report_id, reason_id)
VALUES ('some-uuid', 999);
-- ERROR: Key (reason_id)=(999) is not present in table "reasons"
You also get cascading behavior. ON DELETE CASCADE automatically cleans up join table rows when a reason is removed.
With array or JSONB, you can use PostgreSQL CHECK constraints to enforce valid values at the database level:
ALTER TABLE reports
ADD CONSTRAINT valid_reasons
CHECK (reasons <@ ARRAY['spam', 'harassment', 'inappropriate_content', 'copyright', 'misinformation']);
UPDATE reports
SET reasons = ARRAY['spam', 'harassment', 'oops_typo'];
-- ERROR: new row for relation "reports" violates check constraint "valid_reasons"
Both approaches can enforce valid values at the database level. The difference is ergonomics: foreign keys handle this naturally, while CHECK constraints require explicit maintenance when adding new values.
Extensibility
- Bitwise: Limited to 64 values. To add more, you need to change to
bigintor use bit strings. - Array: Easy to expand, but storing many values (dozens+) becomes unwieldy. Row size grows, and GIN indexes start having performance issues.
- JSONB: Flexible, but the same problem as arrays when storing many values.
- HABTM: The most extensible. You can add additional columns (e.g.,
reason_details,reported_atin the join table). Also ideal for dynamic or user-defined values.
Familiarity
- Bitwise: Strange for most. Requires explaining bit-level operations.
- Array: Intuitive if you know Ruby/PostgreSQL.
- JSONB: Familiar to those who use modern REST APIs.
- HABTM: The most idiomatic. Any Rails developer immediately understands
has_and_belongs_to_many.
Ecosystem
- Bitwise/Array/JSONB: Require custom form inputs, serializers, and manual validations. Some support from gems.
- HABTM: Works out-of-the-box with ActiveAdmin, RailsAdmin, nested forms, and bulk operations.
Database Compatibility
PostgreSQL arrays are PostgreSQL-specific. Bitwise and HABTM work across any database. JSON works across MySQL, SQLite, and PostgreSQL (though with different syntax and index types. MySQL uses GIN indexes similarly to PostgreSQL, while SQLite has the JSON1 extension).
Conclusion
If your value set is static and small, PostgreSQL array is the better choice over bitwise. They offer nearly identical performance without introducing unfamiliar bit-level operations that confuse most Rails developers.
But don’t dismiss HABTM. For most applications, the write overhead is irrelevant compared to the cost of maintaining “clever” code that the next developer doesn’t understand.
In our specific case, a reporting system where users can select multiple reasons, we chose HABTM. Not because it was the fastest, it wasn’t. But because it didn’t need justification. Any new developer on the team immediately understands it, and that clarity is worth more than the performance gains we’d rarely notice in practice.
The best technical choice is the one that doesn’t need a justification.