LLMs and Product Matching: A New State-Of-The-Art


In e-commerce, product matching is about determining whether two product descriptions refer to the same real-world item. Product codes usually help us achieve this, but when they are unavailable or when their format differs across sources, we are left with the task of comparing the available text and image data.
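To make the task concrete, here is a hypothetical pair of listings (the titles and values below are illustrative, not from a real catalog) that a matching system should label as the same product despite the surface differences:

```python
# Two hypothetical listings that refer to the same physical product.
# Titles, attribute order, and abbreviations differ, so naive string
# equality fails even though a human would call this a match.
listing_a = {
    "title": "Apple iPhone 13 128GB - Midnight (Unlocked)",
    "brand": "Apple",
    "price": 599.00,
}
listing_b = {
    "title": "iPhone 13 (128 GB, midnight) SIM-free smartphone",
    "brand": "Apple",
    "price": 579.99,
}
# Product matching asks: do listing_a and listing_b describe the same item?
```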

Traditionally, product matching has been accomplished using rule-based systems or early pre-trained language models like BERT and RoBERTa, often fine-tuned on domain-specific datasets. However, large language models (LLMs) such as GPT-4o, Claude, and Gemini have transformed the landscape. Research by Steiner et al. (2024) and Peeters et al. (2024) demonstrates that LLMs, when fine-tuned, now represent the state-of-the-art for product entity matching, even for difficult datasets filled with edge cases.

LLMs for entity matching

Before LLMs, there were manual, rule-based systems. These early systems used string similarity and handcrafted logic to compare product titles, brands, and categories. Then, with the arrival of the transformer, we started fine-tuning pre-trained language models on category-specific product datasets. Matching models like Ditto, built on popular pre-trained models such as RoBERTa, performed much better, but they required large labeled datasets, and their performance dropped significantly when presented with unseen product categories.
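To illustrate the rule-based approach, here is a minimal sketch (not any specific production system; the threshold and field names are assumptions) of the kind of string-similarity logic these systems relied on:

```python
from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    """Character-level similarity between two lowercased titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def rule_based_match(prod_a: dict, prod_b: dict) -> bool:
    """Handcrafted logic: brands must agree exactly, titles must be
    'similar enough'. Brittle against abbreviations, reordered words,
    and cross-source formatting differences."""
    if prod_a["brand"].lower() != prod_b["brand"].lower():
        return False
    return title_similarity(prod_a["title"], prod_b["title"]) > 0.75
```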

Around 2023, language models became powerful enough to reliably match products based on text alone. Trained on far larger corpora, and leveraging their learned understanding of semantic meaning, these models began to perform much better at product matching with far fewer training examples.

Now, research has shown that generative LLMs outperform pre-trained language models and rule-based systems in both zero-shot and fine-tuned setups. According to Peeters et al. (2024), GPT-4 achieved better performance than fine-tuned PLMs on most datasets even without task-specific training. LLMs also generalized far better to unseen entities, which matters when matching products across billions of listings.
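A zero-shot setup can be as simple as asking the model a yes/no question about a pair of offers. The sketch below assumes the OpenAI Python client and the gpt-4o model; the prompt wording is illustrative, not the exact prompt used in the cited papers:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_match(offer_a: str, offer_b: str) -> bool:
    """Zero-shot product matching: ask the model whether two offers
    refer to the same real-world product."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Do the following two product offers refer to the same "
                "real-world product? Answer 'Yes' or 'No'.\n\n"
                f"Offer 1: {offer_a}\nOffer 2: {offer_b}"
            ),
        }],
        temperature=0,
    )
    answer = response.choices[0].message.content.strip().lower()
    return answer.startswith("yes")
```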

Leaping forward to 95% accuracy

At The Product LLM, we developed an even stronger text-based matching model by improving data quality and increasing the amount of training data relative to the published research. We also added a certainty classification that indicates how confident the model is in each match. For high-certainty predictions, we provide over 95% accuracy on a variety of datasets, including our own difficult, edge-case-heavy dataset.
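One way to put such a certainty signal to work is a simple routing policy. This is a minimal sketch; the field names and threshold policy here are hypothetical, not our API's actual response format:

```python
def route_prediction(prediction: dict) -> str:
    """Route matches by certainty: act on high-certainty predictions
    automatically, send everything else to human review."""
    if prediction["certainty"] == "high":
        # >95% accuracy in this bucket, so it is safe to act automatically
        return "auto-merge" if prediction["is_match"] else "keep-separate"
    return "human-review"

# Hypothetical model output for one candidate pair
print(route_prediction({"is_match": True, "certainty": "high"}))  # auto-merge
print(route_prediction({"is_match": True, "certainty": "low"}))   # human-review
```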

Try It Yourself

You can test our model directly in our API Playground. It’s simple to integrate into your codebase, and it can serve high-volume use cases at affordable rates.