Improving a similarity search with AI

We replaced a broken similarity search with AI that understands context and intent, not just specs. Results went from chaotic to relevant overnight.

Ruby on Rails AI

Aug 22nd, 2025

By Victoria Escudero and Fernando Martínez

One of our clients operates a large boat marketplace with thousands of listings. One of the most common features in marketplaces is showing similar items: when users find a boat they like, they want to explore similar options. But our client’s similarity search was not providing useful listings.

The problem

The existing solution was based on range queries in Elasticsearch. Boats’ specs were indexed, and a query compared boats across multiple dimensions, for example:

Year (±5 years)
Length (±2 meters)
Categories and specifications
Price ranges

The logic made perfect sense: if you’re looking at a 12-meter sailboat from 2018, the similarity search would show you other 10-14 meter sailboats from 2013-2023.
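To make the old approach concrete, here is a minimal sketch of what such a range query could look like when built as a Ruby hash. The field names (`id`, `category`, `year`, `length_m`, `price`) and the 25% price window are assumptions for illustration; the client's real index mapping and price ranges are not shown in this post.

```ruby
require "json"

YEAR_TOLERANCE = 5      # ±5 years, as described above
LENGTH_TOLERANCE_M = 2  # ±2 meters, as described above

# Builds an Elasticsearch bool query that keeps listings whose specs fall
# inside a window around the reference boat.
# All field names and the price tolerance are hypothetical.
def similar_boats_query(boat, price_tolerance: 0.25)
  {
    query: {
      bool: {
        filter: [
          { term: { category: boat[:category] } },
          { range: { year: { gte: boat[:year] - YEAR_TOLERANCE,
                             lte: boat[:year] + YEAR_TOLERANCE } } },
          { range: { length_m: { gte: boat[:length_m] - LENGTH_TOLERANCE_M,
                                 lte: boat[:length_m] + LENGTH_TOLERANCE_M } } },
          { range: { price: { gte: (boat[:price] * (1 - price_tolerance)).round,
                              lte: (boat[:price] * (1 + price_tolerance)).round } } }
        ],
        # Do not return the boat the user is already looking at.
        must_not: [{ term: { id: boat[:id] } }]
      }
    }
  }
end

boat = { id: 42, category: "sailboat", year: 2018, length_m: 12, price: 100_000 }
puts JSON.pretty_generate(similar_boats_query(boat))
```

For the 2018, 12-meter sailboat this produces the 2013-2023 and 10-14 meter windows mentioned above. Every filter depends on the indexed value being correct, which is why bad data hurts this approach so much.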

But there were two issues. The first was data consistency: since most boats are imported from different sources, the database contained duplicate or meaningless categories, wrong lengths, and similar errors, which made matching unreliable. The second was that specification-based similarity completely missed what users actually cared about. Similarity between boats is a complex, multi-faceted, and even subjective criterion that a parametric search cannot always capture, so comparing boats purely on technical specifications would miss user intent even with clean data.

Experimenting with LLMs

Fixing the data inconsistencies is another great example of how AI can improve imported data, but it was out of scope at the time. Fixing them manually would have been a waste of time, and the result could still miss user intent, as explained above. So, what if we stopped doing traditional specification-based similarity searches and instead leveraged AI that could think more like a human?

We started experimenting with different AI models and approaches. Initially, we tried feeding the AI everything we had about each boat: detailed descriptions, technical specifications, images, and category information. We tested several prompts, with several iterations of each: some asked for explanations of why boats were similar, others requested rankings with confidence scores.
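As an illustration of the "rankings with confidence scores" variant, the sketch below builds a prompt from a reference boat and a list of candidates, then filters a model reply by confidence. The prompt wording, the JSON reply shape, and the 0.6 threshold are all assumptions made for this example, not the prompts we actually used; the reply is hand-written and no model is called.

```ruby
require "json"

# Builds a prompt carrying the reference boat and its candidates.
# The wording and the requested reply format are hypothetical.
def ranking_prompt(reference, candidates)
  <<~PROMPT
    You are helping users of a boat marketplace find similar listings.
    Reference boat:
    #{JSON.generate(reference)}
    Candidates:
    #{candidates.map { |c| JSON.generate(c) }.join("\n")}
    Rank the candidates from most to least similar to the reference boat.
    Reply with a JSON array of objects with the keys "id", "confidence" (0.0 to 1.0) and "reason".
  PROMPT
end

# Parses the reply and returns the ids of confident matches, best first.
def confident_matches(reply_json, min_confidence: 0.6)
  JSON.parse(reply_json)
      .select { |row| row["confidence"].to_f >= min_confidence }
      .sort_by { |row| -row["confidence"].to_f }
      .map { |row| row["id"] }
end

# A hand-written reply standing in for a real model response.
reply = <<~JSON
  [
    {"id": 7, "confidence": 0.9,  "reason": "same class of yacht"},
    {"id": 3, "confidence": 0.4,  "reason": "different use case"},
    {"id": 9, "confidence": 0.95, "reason": "near-identical model"}
  ]
JSON

p confident_matches(reply)
```

The more fields the prompt carries, the more of the inconsistent imported data reaches the model.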

The results were not very satisfactory. They were better than the previous similarity search, but they left us with the bitter taste of failure. The approach clearly had the potential to be far better, not just barely better.

There’s always a sweet spot to find in the combination of prompt, model, and application logic. So we kept going and, as often happens, discovered that the best results came from the simplest approach: our main problem was the inconsistent data, so putting more of it into the prompt was not helping at all. The final decision was similar to what we did with image classification: we minimized the prompt to its essence and focused on building a similarity graph of boats.
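One way to picture a similarity graph is as a map from each boat to the boats judged similar to it, filled in ahead of time and read back with a plain lookup. The sketch below is an assumption about the general shape, not our production code: `SimilarityGraph`, `build_graph`, and the `judge` callable (a stand-in for the model call) are hypothetical names, and the fake judge only compares titles so the example runs without any AI service.

```ruby
require "set"

# Undirected similarity graph: boat id => ids of boats judged similar to it.
class SimilarityGraph
  def initialize
    @edges = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Records that two boats are similar to each other.
  def link(a_id, b_id)
    return if a_id == b_id

    @edges[a_id] << b_id
    @edges[b_id] << a_id
  end

  # Returns up to `limit` similar boat ids, in the order they were linked.
  def similar_to(boat_id, limit: 10)
    @edges.fetch(boat_id, Set.new).first(limit)
  end
end

# `judge` is any callable that answers "are these two boats similar?".
# In a real system this would wrap a model call; here it is injected so the
# sketch stays runnable. Comparing every pair is only viable for small sets.
def build_graph(boats, judge)
  graph = SimilarityGraph.new
  boats.combination(2) do |a, b|
    graph.link(a[:id], b[:id]) if judge.call(a, b)
  end
  graph
end

boats = [
  { id: 1, title: "Luxury sailing yacht" },
  { id: 2, title: "Luxury sailing yacht, recently refitted" },
  { id: 3, title: "Aluminium fishing skiff" }
]

# Fake judge for demonstration only.
fake_judge = ->(a, b) { a[:title].include?("sailing") && b[:title].include?("sailing") }

graph = build_graph(boats, fake_judge)
p graph.similar_to(1)
p graph.similar_to(3)
```

Once the graph is stored, serving similar listings is a lookup by id instead of a multi-field range query.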

The difference was night and day. Users searching for luxury sailing yachts now see other luxury sailing yachts with similar characteristics. The contextual understanding that we had been trying to achieve came naturally to the AI.

But here’s the best part of all:

it works better than the original approach
it requires less maintenance because the query is now simpler
its overall cost is negligible in the long run

The AI approach captures something that traditional similarity algorithms struggle with: market context and buyer intent.

Two boats might have completely different specifications, but if they’re both positioned as “weekend cruising boats for families”, they’re genuinely similar from a user’s perspective. This kind of contextual similarity is extremely difficult to achieve with a traditional specification-based similarity search, but comes naturally to LLMs.

Conclusion

This experience changed how we think about recommendation and matching problems. When human judgment and context matter more than mathematical precision, AI-powered approaches can deliver results that traditional algorithms simply can’t match.

The key insight isn’t that AI is always better; it’s knowing when the problem requires understanding intent and context rather than just crunching numbers.

At SINAPTIA, we specialize in helping businesses implement AI solutions that deliver real value. If you’re facing similar challenges with large-scale data processing, content enhancement, or other AI applications, we’d love to help you explore what’s possible.