BM25
BM25 is a classic ranking function that scores how well a document matches a query based on term frequency and rarity, and it remains a strong, widely used retrieval baseline.
BM25 is the workhorse keyword-ranking algorithm. It scores a document for a query by rewarding passages that contain the query's terms, weighting rarer terms more heavily and dampening the effect of sheer repetition and document length. It's a refinement of TF-IDF and a backbone of sparse retrieval.
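To make the scoring concrete, here is a minimal sketch of a BM25 scorer in Python. It is an illustration under stated assumptions, not a reference implementation: the function name `bm25_scores`, the whitespace tokenization, the toy documents, and the parameter defaults `k1=1.5` and `b=0.75` are all choices made for this example, and the IDF uses the smoothed, always non-negative variant found in some search libraries.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (BM25 sketch)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are damped.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rarer terms get a larger weight (smoothed, non-negative IDF).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates: each repeat adds less than the last.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

# Toy corpus (hypothetical), tokenized by simple whitespace splitting.
docs = [
    "bm25 is a ranking function for keyword search".split(),
    "vector search uses embeddings to capture meaning".split(),
    "search engines rank documents".split(),
]
scores = bm25_scores("bm25 ranking".split(), docs)
```

Only the first document contains the query terms, so it is the only one with a non-zero score. Note that matching is literal: "rank" in the third document does not match "ranking".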
Decades old and still competitive, BM25 often runs alongside meaning-based retrieval in hybrid systems. Its key lesson for AEO is built into its math: each additional repetition of a term adds less to the score than the one before, so keyword stuffing earns little and hurts readability. What BM25 rewards is using your topic's real terms naturally, within clear, extractable prose: relevance, not density.
Example. For "BM25 ranking explained," a focused page that uses the term meaningfully a few times earns nearly as much term-frequency credit as one that repeats "BM25 ranking" twenty times, because repetition saturates. The padding also lengthens the stuffed page, which BM25's length normalization counts against it.
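The diminishing return on repetition can be checked with a few lines of arithmetic. The sketch below isolates BM25's term-frequency component for a single term; the helper name `tf_component`, the value `k1=1.2`, and the assumption that the page is exactly average length are illustrative choices, not values from any particular search engine.

```python
def tf_component(tf, k1=1.2, b=0.75, doc_len=100, avg_len=100):
    """BM25 term-frequency factor for one term (IDF left out)."""
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm)

focused = tf_component(3)    # term used 3 times: about 1.57
stuffed = tf_component(20)   # term used 20 times: about 2.08
ceiling = 1.2 + 1            # the factor can never exceed k1 + 1 = 2.2

print(f"focused={focused:.2f} stuffed={stuffed:.2f} ceiling={ceiling:.2f}")
```

Under these assumptions, nearly seven times as many mentions buys only about a third more term-frequency credit, and no amount of repetition can push the factor past `k1 + 1`.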
Related terms
- TF-IDF: TF-IDF is a classic method for scoring how important a word is to a document, balancing how often it appears against how common it is across all documents.
- Sparse Retrieval: Sparse retrieval finds relevant content by matching actual words and their importance, using classic methods like BM25, and still complements meaning-based retrieval in modern systems.
- Keyword Search: Keyword search finds content by matching the literal words in a query against the words in documents, the traditional approach now complemented by meaning-based vector search.