TF-IDF
TF-IDF is a classic method for scoring how important a word is to a document, balancing how often it appears against how common it is across all documents.
Also known as: term frequency-inverse document frequency
TF-IDF weighs a word by how telling it is. It multiplies term frequency (how often a word appears in a document) by inverse document frequency (how rare the word is across all documents), so distinctive words score high and ubiquitous ones like "the" score near zero. It's the conceptual ancestor of BM25 and much of sparse retrieval.
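One common way to write that product down is sketched below. The symbols are notation assumed here, not taken from a specific library: t is a term, d a document, N the number of documents in the collection, and df(t) the number of documents that contain t. Real systems often use variants, such as smoothed or log-scaled counts.

```latex
\[
\operatorname{tfidf}(t, d) = \operatorname{tf}(t, d) \times \log \frac{N}{\operatorname{df}(t)}
\]
```

When a word appears in every document, df(t) equals N, the logarithm is zero, and the score vanishes no matter how often the word is repeated.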
For AEO, TF-IDF explains why distinctive, specific language helps you get found: the rare, meaningful terms of your topic are exactly what these methods weight most. It also shows why stuffing common words is pointless, because they carry almost no weight. The takeaway is to write naturally with the precise vocabulary of your subject, inside clear extractable passages.
Example. In an article about espresso, "portafilter" carries high TF-IDF weight because it's specific and rare, while "machine" carries little. Using the precise term where it belongs makes the page matchable for the queries that truly fit it.
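To make the espresso example concrete, here is a minimal Python sketch. The four-sentence corpus is invented for illustration, and the function name tf_idf_scores is hypothetical. The weighting is a plain tf × log(N / df), which is one common variant and not necessarily what any given search system uses.

```python
import math
from collections import Counter

# Hypothetical four-document corpus; only the first is about espresso.
corpus = [
    "lock the portafilter into the machine then start the machine",
    "the washing machine finished the cycle",
    "the sewing machine needs new thread",
    "the garden needs water",
]
tokenized = [text.split() for text in corpus]


def tf_idf_scores(doc_tokens, all_docs):
    """Return {term: tf * log(N / df)} for every term in one document.

    Assumes doc_tokens is itself one of all_docs, so df is never zero.
    """
    counts = Counter(doc_tokens)
    n_docs = len(all_docs)
    scores = {}
    for term, count in counts.items():
        # Term frequency: share of this document's tokens that are `term`.
        tf = count / len(doc_tokens)
        # Document frequency: how many documents contain `term` at all.
        df = sum(1 for tokens in all_docs if term in tokens)
        scores[term] = tf * math.log(n_docs / df)
    return scores


scores = tf_idf_scores(tokenized[0], tokenized)
for term in ("portafilter", "machine", "the"):
    print(f"{term}: {scores[term]:.4f}")
```

In this toy corpus "machine" appears twice in the espresso text and "portafilter" only once, yet "portafilter" scores higher because no other document uses it. "the" appears in every document and scores exactly zero.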
Related terms
- BM25: a classic ranking function that scores how well a document matches a query based on term frequency and rarity, and it remains a strong, widely-used retrieval baseline.
- Keyword Search: finds content by matching the literal words in a query against the words in documents, the traditional approach now complemented by meaning-based vector search.
- Sparse Retrieval: finds relevant content by matching actual words and their importance, using classic methods like BM25, and still complements meaning-based retrieval in modern systems.