AEO Canon · the reference for answer-engine optimization

TF-IDF

TF-IDF is a classic method for scoring how important a word is to a document, balancing how often it appears against how common it is across all documents.

Also known as: term frequency-inverse document frequency

Burke Atkerson

TF-IDF weighs a word by how telling it is. It multiplies term frequency (how often a word appears in a document) by inverse document frequency (how rare the word is across all documents), so distinctive words score high and ubiquitous ones like "the" score near zero. It's the conceptual ancestor of BM25 and much of sparse retrieval.
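
One common way to write the weighting described above is shown here. The symbols are notation assumed for illustration, not taken from this entry: tf(t, d) is the count of term t in document d, N is the number of documents in the collection, and df(t) is the number of documents that contain t. Implementations differ in how they normalize and smooth these quantities, so treat this as one variant rather than the definition.

$$
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
$$

A word that appears in every document has df(t) = N, so its inverse document frequency is log 1 = 0 and its score vanishes no matter how often it is repeated.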

For AEO, TF-IDF explains why distinctive, specific language helps you get found: the rare, meaningful terms of your topic are exactly what these methods weight most. It also shows why stuffing a page with common words is pointless: they carry almost no weight. The takeaway is to write naturally, using the precise vocabulary of your subject, in clear, extractable passages.

Example. In an article about espresso, "portafilter" carries high TF-IDF weight because it's specific and rare, while "machine" carries little. Using the precise term where it belongs makes the page matchable for the queries that truly fit it.
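
The espresso example can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search or answer engine scores text: the four-sentence corpus is made up, the `tf_idf` helper is hypothetical, and it uses length-normalized counts with an unsmoothed natural-log IDF, which is only one of several common variants.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Score one term in one document (hypothetical helper, simple variant)."""
    # Term frequency: share of the document's tokens that are this term.
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of documents that contain the term.
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    # Inverse document frequency: log of total documents over df.
    return tf * math.log(len(corpus) / df)


# Made-up toy corpus; each document is a list of lowercase tokens.
corpus = [
    "lock the portafilter into the espresso machine".split(),
    "descale the coffee machine every month".split(),
    "the washing machine is in the basement".split(),
    "grind the beans just before brewing".split(),
]
article = corpus[0]

scores = {t: tf_idf(t, article, corpus) for t in ("portafilter", "machine", "the")}
for term, score in scores.items():
    print(term, round(score, 3))
# portafilter 0.198
# machine 0.041
# the 0.0
```

In this toy corpus "portafilter" appears in one document out of four, "machine" in three, and "the" in all four, so the scores fall in that order even though "the" is the most frequent word in the article itself.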
