Bytespider
Bytespider is ByteDance's web crawler, associated with gathering training data for its AI models, and is often noted for aggressive crawling that some sites rate-limit or block.
Bytespider is ByteDance's crawler, used to collect AI training data. It fetches public pages to support ByteDance's models and is frequently singled out for high-volume crawling, which leads some operators to block or rate-limit it for server-load reasons as much as content-rights ones.
It checks robots.txt under the Bytespider user-agent, so the access choice is straightforward: allow it or disallow it by name. As with other training crawlers, blocking it limits how your content informs those models but doesn't affect search-grounded citations. Crawler load showing up in your server logs is a sound reason to manage it, independent of any AEO consideration.
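To make the log check concrete, here is a minimal Python sketch that counts requests per crawler in an access log. It assumes the common "combined" log format, where the user-agent is the last quoted field. The sample lines, the IP addresses, and the helper name count_crawler_hits are made up for illustration, and the user-agent strings are simplified rather than copied from real traffic.

```python
from collections import Counter

# Illustrative log lines in combined log format (not real traffic).
SAMPLE_LOG_LINES = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.5 - - [10/Oct/2024:13:55:37 +0000] "GET /b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '192.0.2.44 - - [10/Oct/2024:13:56:09 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

# Crawler tokens to look for inside the user-agent field.
CRAWLER_TOKENS = ("Bytespider", "GPTBot")


def count_crawler_hits(lines, tokens=CRAWLER_TOKENS):
    """Count requests per crawler token, matching case-insensitively."""
    counts = Counter()
    for line in lines:
        # In combined log format the user-agent is the last quoted field,
        # so splitting twice from the right isolates it.
        parts = line.rsplit('"', 2)
        if len(parts) < 3:
            continue  # line has no quoted user-agent field; skip it
        user_agent = parts[1].lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
    return counts


counts = count_crawler_hits(SAMPLE_LOG_LINES)
for token in CRAWLER_TOKENS:
    print(token, counts[token])
```

On a real server you would pass in the lines of your own access log instead of the sample list. A request count is only a rough proxy for load, so bytes served or response times may be worth comparing as well.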
Example. A site seeing heavy Bytespider traffic in its logs might add a two-line robots.txt rule, "User-agent: Bytespider" followed by "Disallow: /", to reduce load. That is a performance and rights decision, not a visibility one.
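One way to sanity-check a rule like this before deploying it is Python's standard urllib.robotparser module. The sketch below is illustrative: the robots.txt content, the example.com URL, the helper name is_allowed, and the SomeOtherBot user-agent are all assumptions. It only shows how a compliant parser reads the rule, not how any particular crawler behaves in practice.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Bytespider, leave other crawlers allowed.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


bytespider_allowed = is_allowed(ROBOTS_TXT, "Bytespider", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")

print("Bytespider allowed:", bytespider_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Because robots.txt is advisory, checking the access logs again after the change is the only way to confirm that the traffic actually dropped.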
Related terms
- robots.txt: robots.txt is a plain text file at the root of your domain that tells crawlers which user-agents may access which parts of your site, and is how you allow or block AI crawlers.
- GPTBot: GPTBot is OpenAI's web crawler that gathers content to train its models, identified by the GPTBot user-agent and controllable through your robots.txt file.
- Training Data: Training data is the body of text and other content an AI model learns from during training, shaping what it knows by default before any live retrieval is involved.