Storing multi-valued enum fields in ActiveRecord

Rails’ enum DSL is great for single values, but what about multiple? We compared 4 approaches across performance, extensibility, and maintainability to find the best fit.

Ruby on Rails Performance

Mar 3rd, 2026

By Nazareno Moresco

Mar 3rd, 2026

By Nazareno Moresco

Ruby on Rails Performance

A few weeks ago, we ran into an interesting problem in one of our projects. We had a reports table with a reason column that used the classic Rails enum approach:

# db/migrate/XXXXXXXXXXXXXX_create_reports.rb
class CreateReports < ActiveRecord::Migration[7.1]
  create_table :reports do |t|
    t.string :title
    t.integer :reason, default: 0, null: false
  end
end

# app/models/report.rb
class Report < ApplicationRecord
  enum :reason, {
    spam: 0,
    harassment: 1,
    inappropriate_content: 2,
    copyright: 3,
    misinformation: 4
  }, prefix: true
end

Everything worked fine until the client requested the ability to select multiple reasons for a report. A user could report content for being spam and containing misinformation. The set of possible reasons is fixed and defined by developers.

We tried four approaches.

Four ways to solve it

Bitwise Operations: store multiple values in a single integer using bit-level flags.
PostgreSQL Array: use native PostgreSQL array columns with enum-like syntax.
JSONB: store reasons as a JSON array inside a JSONB column.
HABTM: the classic many-to-many approach with a join table.

1. Bitwise Operations

They use bit-level operations to store multiple values in a single integer. Each reason occupies a specific bit. The active_flag gem provides a clean DSL for this:

# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
  def change
    add_column :reports, :reasons, :bigint, default: 0, null: false

    # migrate data to new format

    remove_column :reports, :reason
  end
end

# app/models/report.rb
class Report < ApplicationRecord
  flag :reasons, [:spam, :harassment, :inappropriate_content, :copyright, :misinformation]
end

# Usage
report = Report.new(reasons: [:spam, :misinformation])
report.reasons # => [:spam, :misinformation]
report.reasons.spam?   # => true
report.reasons = [:spam]
report.save!

# Read: check if spam is included
report.reasons.spam? # => true

# Validation: invalid values raise ArgumentError
report.reasons = [:invalid] # => raises ArgumentError

# Query: find all reports with spam AND misinformation
Report.where_reasons(:spam, :misinformation)

The benefits here are clear: extremely compact storage, very fast boolean checks, and no GIN index required, just a standard integer index. But the trade-offs become apparent quickly. Database values are inexpressive (what does value 4 mean?). You’re limited to 64 values with bigint. And query operations are less intuitive than standard ActiveRecord.

Note: Most Ruby gems (including active_flag) are based on integers with a 64-bit limit. While PostgreSQL supports the bit string data type, which can store many more flags without this limitation, the Ruby ecosystem doesn’t have widely adopted gems for this approach.

2. PostgreSQL Array Field (Multivalued Column)

PostgreSQL has native array support. We can store reason strings directly in a string array column without any gem:

# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
  def change
    add_column :reports, :reasons, :string, array: true, default: []
    add_index :reports, :reasons, using: 'gin'

    # migrate data to new format

    remove_column :reports, :reason
  end
end

# app/models/report.rb
class Report < ApplicationRecord
  VALID_REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze

  validate :reasons_must_be_valid

  private

  def reasons_must_be_valid
    return if reasons.blank? || reasons.all? { |r| VALID_REASONS.include?(r) }

    errors.add(:reasons, "contain invalid values")
  end
end

# Usage
report = Report.create(title: "Report 1", reasons: ["spam", "misinformation"])

# Read: check if spam is included
report.reasons.include?("spam") # => true

# Query: find all reports with spam AND misinformation
Report.where("reasons @> ARRAY[?]::varchar[]", ["spam", "misinformation"])

The upside: no gem dependency, human-readable values in the database, and easy to expand. The downside: it’s PostgreSQL-specific, validations require manual implementation, and it’s less familiar to developers who don’t use PostgreSQL regularly.

3. JSONB (Multivalued Column)

We store reasons as a JSON array inside a JSONB column.

# db/migrate/XXXXXXXXXXXXXX_add_reasons_to_reports.rb
class AddReasonsToReports < ActiveRecord::Migration[7.1]
  def change
    add_column :reports, :reasons, :jsonb, default: [], null: false
    add_index :reports, :reasons, using: 'gin'

    # migrate data to new format

    remove_column :reports, :reason
  end
end

# app/models/report.rb
class Report < ApplicationRecord
  VALID_REASONS = %w[spam harassment inappropriate_content copyright misinformation].freeze

  validate :reasons_must_be_valid

  private

  def reasons_must_be_valid
    return if reasons.blank? || reasons.all? { |r| VALID_REASONS.include?(r) }

    errors.add(:reasons, "contain invalid values")
  end
end

# Usage
report = Report.create(title: "Report 1", reasons: ["spam", "misinformation"])

# Read: check if spam is included
report.reasons.include?("spam") # => true

# Query: find all reports with spam AND misinformation
Report.where("reasons @> ?", '["spam", "misinformation"]')

The benefit is GIN indexing for fast searches. The cost: the flexible structure requires stronger validations to enforce the expected format, it’s slightly more verbose than alternatives, and there’s JSON parsing overhead in some cases.

4. HABTM Table

The classic many-to-many approach with a join table.

# db/migrate/XXXXXXXXXXXXXX_create_reasons_and_join_table.rb
class CreateReasonsAndJoinTable < ActiveRecord::Migration[7.1]
  def change
    create_table :reasons do |t|
      t.string :name, null: false
      t.timestamps
    end
    add_index :reasons, :name, unique: true

    create_join_table :reports, :reasons do |t|
      t.index [:report_id, :reason_id]
      t.index [:reason_id, :report_id]
    end

    # migrate data to new format

    remove_column :reports, :reason
  end
end

# app/models/report.rb
class Report < ApplicationRecord
  has_and_belongs_to_many :reasons
end

# app/models/reason.rb
class Reason < ApplicationRecord
  has_and_belongs_to_many :reports

  validates :name, inclusion: { in: %w[spam harassment inappropriate_content copyright  misinformation] }
end

# Usage
report = Report.create(title: "Report 1", reasons: Reason.where(name: ["spam", "misinformation"]))

# Read: check if spam is included
report.reasons.exists?(name: "spam") # => true

# Query: find all reports with spam AND misinformation
Report.joins(:reasons)
      .where(reasons: { name: ["spam", "misinformation"] })
      .group(:id)
      .having("COUNT(DISTINCT reasons.id) = 2")

Benefits include total flexibility to add additional metadata, extreme familiarity for Rails developers, and unlimited scalability. The drawback: slower writes (create/update) and the need for more queries or joins to read data.

Comparison

Here’s a summary of how each approach compares across the attributes that matter most:

Attribute	Bitwise	PostgreSQL Array	JSONB	HABTM
Write Performance	Excellent	Very Good	Very Good	Fair
Read Performance	Very Good	Good	Good	Good
Query Simplicity	Excellent	Excellent	Excellent	Fair
Default Values	Very Good	Excellent	Excellent	Poor
Database-Level Validation	Very Good	Good	Good	Excellent
Extensibility	Poor	Good	Very Good	Excellent
Familiarity	Poor	Good	Good	Excellent
Ecosystem Support	Poor	Fair	Fair	Excellent
DB Compatibility	Excellent	Poor	Excellent	Excellent
Scalability	64 values	Limited	Limited	Unlimited
Property vs Entity	property	property	property	entity

Property vs Entity

In data modeling terms, reasons are a property of a report, not an entity with its own identity and lifecycle. Nobody queries “show me all attributes of the spam reason”. That makes the multivalued column approaches (array, JSONB, bitwise) more true to the conceptual model, even though they break First Normal Form. The HABTM is the more purely relational approach, but it treats a property more as if it were an entity.

Performance

We ran benchmarks with 1000 operations for create, find, and update scenarios. You can find the full benchmark code and results in the multivalued_attributes repository.

The HABTM is ~4x slower on creates and ~3-5x slower on updates compared to the other methods. However, it’s worth noting that most performance issues related to HABTM and JOINs aren’t actually caused by the JOIN itself. Databases like PostgreSQL are highly optimized for these operations. The real culprits are usually N+1 queries, missing indexes, or poorly designed queries. With proper indexing and eager loading, the read performance gap narrows significantly. The first three approaches (bitwise, array, and jsonb) have very similar performance. On reads, the difference is marginal (only 2-3 extra seconds over 1000 queries).

Query Simplicity

One often overlooked aspect is query complexity at the call site. If your codebase filters by these values extensively, in scopes, serializers, admin interfaces, and column-based approaches, keep queries simple and on a single table. With array_enum or bitwise, you can do:

Report.with_reason(:spam)
Report.where.not(reasons: [])

With a HABTM, every call site would need a .joins(:reasons), adding complexity and increasing the risk of N+1 queries if eager loading is forgotten. While you can mitigate this with default scopes or associations, it adds friction that column-based approaches simply don’t have.

Default Values

With array or JSONB columns, you can set default values directly in the migration:

t.integer :reasons, array: true, default: ["spam", "harassment"]

New records automatically get these values. No callbacks required. With a HABTM, you’d need a callback to seed join table rows, adding an extra step that can be forgotten or misconfigured.

Database-Level Validation

With a HABTM, foreign keys enforce valid references automatically:

INSERT INTO reasons_reports (report_id, reason_id)
VALUES ('some-uuid', 999);
-- ERROR: Key (reason_id)=(999) is not present in table "reasons"

You also get cascading behavior. ON DELETE CASCADE automatically cleans up join table rows when a reason is removed.

With array or JSONB, you can use PostgreSQL CHECK constraints to enforce valid values at the database level:

ALTER TABLE reports
ADD CONSTRAINT valid_reasons
CHECK (reasons <@ ARRAY['spam', 'harassment', 'inappropriate_content', 'copyright', 'misinformation']);

UPDATE reports
SET reasons = ARRAY['spam', 'harassment', 'oops_typo'];
-- ERROR: new row for relation "reports" violates check constraint "valid_reasons"

Both approaches can enforce valid values at the database level. The difference is ergonomics: foreign keys handle this naturally, while CHECK constraints require explicit maintenance when adding new values.

Extensibility

Bitwise: Limited to 64 values. To add more, you need to change to bigint or use bit strings.
Array: Easy to expand, but storing many values (dozens+) becomes unwieldy. Row size grows, and GIN indexes start having performance issues.
JSONB: Flexible, but the same problem as arrays when storing many values.
HABTM: The most extensible. You can add additional columns (e.g., reason_details, reported_at in the join table). Also ideal for dynamic or user-defined values.

Familiarity

Bitwise: Strange for most. Requires explaining bit-level operations.
Array: Intuitive if you know Ruby/PostgreSQL.
JSONB: Familiar to those who use modern REST APIs.
HABTM: The most idiomatic. Any Rails developer immediately understands has_and_belongs_to_many.

Ecosystem

Bitwise/Array/JSONB: Require custom form inputs, serializers, and manual validations. Some support from gems.
HABTM: Works out-of-the-box with ActiveAdmin, RailsAdmin, nested forms, and bulk operations.

Database Compatibility

PostgreSQL arrays are PostgreSQL-specific. Bitwise and HABTM work across any database. JSON works across MySQL, SQLite, and PostgreSQL (though with different syntax and index types. MySQL uses GIN indexes similarly to PostgreSQL, while SQLite has the JSON1 extension).

Conclusion

If your value set is static and small, PostgreSQL array is the better choice over bitwise. They offer nearly identical performance without introducing unfamiliar bit-level operations that confuse most Rails developers.

But don’t dismiss HABTM. For most applications, the write overhead is irrelevant compared to the cost of maintaining “clever” code that the next developer doesn’t understand.

In our specific case, a reporting system where users can select multiple reasons, we chose HABTM. Not because it was the fastest, it wasn’t. But because it didn’t need justification. Any new developer on the team immediately understands it, and that clarity is worth more than the performance gains we’d rarely notice in practice.

The best technical choice is the one that doesn’t need a justification.