robots.txt
robots.txt is a plain text file at the root of your domain that tells each crawler, identified by its user-agent, which parts of your site it may fetch. It is how you allow or block AI crawlers.
Also known as: robots file
robots.txt is the doorman for crawlers. It's a simple text file at
example.com/robots.txt that lists rules per user-agent — which bot can or can't
fetch which paths. It's how you control AI crawlers like GPTBot,
ClaudeBot, and PerplexityBot,
allowing the ones that cite you and blocking any you don't want.
It's central to the access pillar, and it's where the single most
common AEO mistake happens: a blanket Disallow: / or an overly broad default that
silently blocks the very crawlers you want, making you invisible to AI answers
without any error or warning. Two caveats matter. First, robots.txt is a public,
voluntary standard: reputable crawlers honor it, but it provides no real security.
Second, it must live at the root of the host it governs to be read at all, so a file
at example.com/robots.txt does not cover blog.example.com.
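To make the silent-block mistake concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The two rule sets, the bot list, and the helper name allowed_bots are illustrative assumptions, not taken from any real site. Note also that the standard-library parser is only an approximation of how an individual crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of AI crawler user-agents to check.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

# A blanket rule: every crawler without its own group is blocked everywhere.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /
"""

# Named groups for the AI crawlers, plus a narrower default for everyone else.
SELECTIVE = """\
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""


def allowed_bots(robots_txt, url="https://example.com/"):
    """Return {bot: True/False} for whether each bot may fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


blanket_result = allowed_bots(BLANKET_BLOCK)
selective_result = allowed_bots(SELECTIVE)
```

Under the blanket rule every bot in the list comes back as disallowed, and nothing in the file itself signals a problem. That is why the mistake tends to go unnoticed.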
Example. To welcome AI citation crawlers, you'd include a two-line block for each one:
a User-agent: PerplexityBot line followed by an Allow: / line. To audit it, fetch your own
/robots.txt and confirm none of the bots you care about are disallowed.
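The audit step could be scripted roughly as follows. This is a sketch, not a definitive tool: load_robots and blocked_pairs are hypothetical helper names, and the site, bot list, and paths are placeholders you would replace with your own. It relies on urllib.robotparser from the Python standard library, which applies rules in file order, so its verdict may differ from that of a crawler that uses longest-match precedence.

```python
from urllib.robotparser import RobotFileParser


def load_robots(site):
    """Fetch and parse <site>/robots.txt. This performs a network request."""
    parser = RobotFileParser()
    parser.set_url(site.rstrip("/") + "/robots.txt")
    parser.read()
    return parser


def blocked_pairs(parser, site, bots, paths):
    """Return the (bot, path) pairs that the parsed rules disallow."""
    base = site.rstrip("/")
    return [
        (bot, path)
        for bot in bots
        for path in paths
        if not parser.can_fetch(bot, base + path)
    ]


# Example usage (placeholder values; uncomment to run against a live site):
# site = "https://example.com"
# parser = load_robots(site)
# for bot, path in blocked_pairs(parser, site,
#                                ["GPTBot", "ClaudeBot", "PerplexityBot"],
#                                ["/", "/blog/"]):
#     print(f"{bot} is disallowed from {path}")
```

An empty result means none of the listed bots is blocked from the listed paths, at least as this parser reads the file. Checking a few representative paths rather than only the homepage helps catch rules that block a single section.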
Relevant pillar: Access
Related terms
- GPTBot: GPTBot is OpenAI's web crawler that gathers content to train its models, identified by the GPTBot user-agent and controllable through your robots.txt file.
- ClaudeBot: ClaudeBot is Anthropic's web crawler that collects content used to train its Claude models, identified by the ClaudeBot user-agent and controllable via robots.txt.
- llms.txt: llms.txt is a proposed standard file at your domain root that gives AI systems a curated, markdown map of your most important content, helping them find and understand your best pages.