Transformer
The transformer is the neural-network architecture behind modern language models. It uses an attention mechanism to weigh how words relate to one another, enabling fluent understanding and generation of text.
The transformer is the design that made modern AI language models possible. Introduced in 2017, it processes text using an attention mechanism that lets the model weigh how every word relates to every other word in the passage, capturing context far better than earlier approaches. Nearly every major large language model is built on it.
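For readers who want to see the idea in code, here is a minimal sketch of scaled dot-product attention, the weighting step at the heart of the architecture. It is an illustration under simplifying assumptions, not a real model: the three tokens and their two-dimensional vectors are hand-picked and hypothetical, and the same vectors stand in for queries, keys, and values, whereas an actual transformer learns separate projections for each.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token, then normalize the scores.
        weights = softmax([dot(q, k) / scale for k in keys])
        # Blend the value vectors according to those weights.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy vectors (assumption): "it" is placed close to "company".
tokens = ["company", "prices", "it"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

outputs, weights = attention(vectors, vectors, vectors)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

In this toy setup the row for "it" gives more weight to "company" than to "prices", simply because the vectors were chosen that way. A trained model arrives at such weightings from data, across many layers and attention heads.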
You don't need the math to do AEO, but the architecture explains why structure matters. Because transformers read relationships across a passage, content that's coherent and well-organized is interpreted more accurately than disjointed text — and clean, extractable writing is easier for the model to represent and reuse faithfully. The same architecture underlies the embeddings used in retrieval.
Example. A transformer can tell that "it" in "the company raised prices because it faced higher costs" refers to the company — the contextual understanding that lets an engine accurately summarize and quote your page rather than mangle it.
Related terms
- Large Language Model (LLM): A large language model is an AI system trained on vast amounts of text to predict and generate language, and is the engine that writes the answers in AI search.
- Attention: Attention is the mechanism that lets a language model weigh which words in the input matter most for understanding each part, and it tends to concentrate on prominent, early, and clearly-related text.
- Embeddings: Embeddings are numerical representations of text that capture its meaning, letting AI systems find passages that are semantically related to a query even when they share no exact keywords.
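
The embeddings entry can be made concrete with a small sketch of similarity-based retrieval. Everything here is assumed for illustration: the three-dimensional vectors are hand-picked rather than produced by a real embedding model, and the query and passages are hypothetical. Cosine similarity is one common way to compare embeddings, not the only one.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding for the query "why did prices go up" (assumption).
query = [0.9, 0.1, 0.2]

# Hypothetical passage embeddings; a real system would compute these
# with an embedding model.
passages = {
    "The company raised rates after its costs increased.": [0.8, 0.2, 0.3],
    "Our office is closed on public holidays.": [0.1, 0.9, 0.1],
}

# Rank passages from most to least similar to the query.
ranked = sorted(
    passages,
    key=lambda text: cosine_similarity(query, passages[text]),
    reverse=True,
)
print(ranked[0])
```

The top-ranked passage shares almost no keywords with the query ("rates" instead of "prices", "increased" instead of "go up"), yet its vector points in nearly the same direction, which is the behavior the definition describes.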