Sparse Retrieval
Sparse retrieval finds relevant content by matching actual words and their importance, using classic methods like BM25, and still complements meaning-based retrieval in modern systems.
Sparse retrieval matches on the words themselves. It represents each text as a set of term weights, recording which terms appear and how important each one is (most famously with BM25), and scores passages by the weighted overlap between their terms and the query's. It's called "sparse" because each representation has one slot per vocabulary term and almost all of those slots are zero: only the words actually present carry weight.
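To make the "mostly zeros" point concrete, here is a minimal sketch. It assumes a toy three-document collection, whitespace tokenization, and raw term counts as weights (a real system would use BM25-style weights and a proper analyzer); the names `sparse_vector` and `to_full_vector` are hypothetical.

```python
from collections import Counter

# Toy collection, invented for illustration only.
docs = [
    "reset the router to factory settings",
    "update the router firmware",
    "contact support for billing questions",
]

# Vocabulary: every distinct term in the collection, one dimension each.
vocabulary = sorted({term for doc in docs for term in doc.lower().split()})

def sparse_vector(text):
    """Map a text to {term: count}, storing only the nonzero dimensions."""
    counts = Counter(text.lower().split())
    return {term: counts[term] for term in vocabulary if term in counts}

def to_full_vector(sparse):
    """Expand to one slot per vocabulary term to make the zeros visible."""
    return [sparse.get(term, 0) for term in vocabulary]

vec = sparse_vector(docs[1])       # only the terms present in the document
full = to_full_vector(vec)         # same information, zeros included
zero_share = full.count(0) / len(full)
print(vec)
print(f"{full.count(0)} of {len(full)} dimensions are zero")
```

Even in this tiny collection most dimensions are zero for any one document; with a realistic vocabulary of many thousands of terms the share of zeros is far higher, which is why sparse systems store only the nonzero entries.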
Far from obsolete, sparse retrieval remains valuable for exact matches that meaning-based methods can fumble: specific product names, error codes, model numbers, rare jargon. That's why modern systems often combine it with dense retrieval in a hybrid setup. For AEO it's a reminder to use the precise terms your audience uses — the exact product names and phrasings — alongside clear, extractable explanations, so you're matchable both ways.
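One common way to build such a hybrid is to run both retrievers separately and merge their ranked lists. The sketch below uses reciprocal rank fusion, which is a choice made for this illustration rather than anything prescribed above; the document ids and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical outputs of a sparse and a dense retriever for the same query.
sparse_ranking = ["doc_error_code", "doc_install_tips", "doc_drivers"]
dense_ranking = ["doc_install_tips", "doc_faq", "doc_error_code"]

hybrid = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
print(hybrid)
```

Because the fusion uses ranks rather than raw scores, it does not need the sparse and dense scores to be on a comparable scale, which is one reason it is a popular starting point.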
Example. A query for "error 0x80070057" is best served by sparse retrieval finding that exact string — a case where literal word matching beats semantic similarity, and where having the precise term on your page matters.
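The error-code case can be sketched with a small BM25-style scorer. This is an illustration under stated assumptions, not a reference implementation: it uses whitespace tokenization, one common BM25 variant with typical defaults (`k1=1.5`, `b=0.75`), and three invented page snippets.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase and split on whitespace; a real system would use a proper analyzer.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Invented page snippets; only the first contains the exact error code.
docs = [
    "fix for error 0x80070057 when installing updates",
    "general tips for troubleshooting installation problems",
    "how to update your device drivers",
]
scores = bm25_scores("error 0x80070057", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Only the page that literally contains the query terms receives a nonzero score here; the other two, however topically related, score zero because they never mention the code.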
Related terms
- BM25: BM25 is a classic ranking function that scores how well a document matches a query based on term frequency and rarity, and it remains a strong, widely used retrieval baseline.
- Keyword Search: Keyword search finds content by matching the literal words in a query against the words in documents, the traditional approach now complemented by meaning-based vector search.
- Dense Retrieval: Dense retrieval finds relevant passages by comparing the meaning-based vector embeddings of a query and your content, matching on semantics rather than exact words.