How to Check if AI Crawlers Can Read Your Site

Most AI crawlers don't run JavaScript, so checking whether they can read your site means looking at the raw HTML the server returns, not the rendered DOM in your dev tools. Two quick tests approximate what crawlers such as GPTBot and PerplexityBot see: disable JavaScript in the browser, and fetch the page with curl.

The short answer

Test what's in the raw HTML, not the rendered DOM. If your main content survives with JavaScript disabled and appears in a curl fetch, AI crawlers can read it. If it vanishes, it's client-rendered and most crawlers can't see it.

Why doesn't 'Inspect Element' tell the truth?

Inspect Element doesn't tell the truth about crawlability because it shows the rendered DOM — the page after your JavaScript has executed and injected content. A non-rendering crawler never runs that JavaScript, so it sees only the raw HTML the server sent. The two can differ wildly: a client-rendered app often ships an almost-empty <div id="root"> that Inspect shows full of content, while View Source shows it empty. Always test the raw HTML — that's the access reality.
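
One way to spot the empty-shell pattern from the command line is to look for a mount element with nothing inside it. The sketch below is a rough heuristic, not an HTML parser. The function name has_empty_mount is hypothetical, id="root" comes from the example above, and id="app" is an assumed second common mount id.

```shell
# has_empty_mount: hypothetical helper, reads raw HTML on stdin.
# Exits 0 when it finds an empty <div id="root"> or <div id="app">,
# the typical shell of a client-rendered app. The "app" id is an assumption.
has_empty_mount() {
  # Join lines first so a mount div split across lines still matches.
  tr '\n' ' ' |
    grep -Eq '<div[^>]*id="(root|app)"[^>]*>[[:space:]]*</div>'
}
```

A possible use is curl -s https://yourdomain.com/ | has_empty_mount && echo "empty mount element found". A match only suggests client rendering; a page can still ship its main content outside the mount element.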

How do you run the JS-disabled browser test?

Disable JavaScript in your browser and reload the page — what remains is approximately what a non-rendering crawler sees.

1. Open the DevTools command menu. On the page, open DevTools (F12), then press Ctrl+Shift+P (Cmd+Shift+P on macOS). These shortcuts apply to Chrome and Edge.
2. Disable JavaScript. Type 'Disable JavaScript', select it, and keep DevTools open (the setting only applies while DevTools is open).
3. Reload the page. Hard-reload (Ctrl+Shift+R, or Cmd+Shift+R on macOS). The page now renders without running any JavaScript.
4. Look for your answer. Are your headline, main content, and key answer text still visible? If the page is blank or shows a spinner, non-rendering crawlers see the same nothing.
5. Re-enable JavaScript. Remove the 'Disable JavaScript' override when you're done.
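
A rough command-line counterpart to step 4 is to pull the first headline out of the raw HTML and see whether anything comes back. This is only a sketch: first_h1 is a hypothetical helper, and it assumes the h1 holds plain text with no nested tags.

```shell
# first_h1: hypothetical helper, reads raw HTML on stdin and prints the
# text of the first <h1>. Assumes the heading contains no nested tags.
first_h1() {
  tr '\n' ' ' |                        # join lines so a wrapped heading matches
    grep -Eio '<h1[^>]*>[^<]*</h1>' |  # keep only <h1>...</h1> elements
    head -n 1 |                        # first heading only
    sed -E 's/<[^>]+>//g'              # strip the surrounding tags
}
```

For example, curl -s https://yourdomain.com/ | first_h1 prints the headline when it is in the server HTML. Empty output means either no h1 in the raw HTML or a heading this simple pattern cannot match.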

How do you check with curl?

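Fetch the page with an AI crawler's user-agent and read the raw HTML it returns, with no rendering involved. This matches what the crawler receives as long as the server keys its response on the user-agent string alone; a server that verifies crawler IP addresses may answer your machine differently.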

# See the raw HTML GPTBot receives (first 60 lines)
curl -s -A "GPTBot" https://yourdomain.com/ | head -n 60
 
# Check whether a specific answer is present in the raw HTML
curl -s -A "GPTBot" https://yourdomain.com/ | grep -i "your answer phrase"
 
# Compare a few crawlers at once — counts of a key phrase in each response
for ua in "GPTBot" "PerplexityBot" "Bingbot"; do
  count=$(curl -s -A "$ua" https://yourdomain.com/ | grep -ic "your answer phrase")
  echo "$ua: $count match(es)"
done

If grep finds your answer text, the crawler can read it. If it returns nothing while the phrase is clearly on the rendered page, your content is being injected by JavaScript after load — invisible to crawlers that don't render.
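
The comparison described above can be sketched as a small function that checks a phrase against two saved copies of the page. Everything named here is an assumption for illustration: classify_phrase is a hypothetical helper, and the file names raw.html and rendered.html are placeholders. The raw copy can come from curl -s -A "GPTBot" https://yourdomain.com/ > raw.html; the rendered copy has to come from something that runs JavaScript, for instance headless Chrome's --dump-dom output.

```shell
# classify_phrase RAW_FILE RENDERED_FILE PHRASE
# Hypothetical helper: reports where a phrase first appears.
#   server-rendered  phrase is in the raw HTML (crawlers can read it)
#   client-rendered  phrase appears only after JavaScript has run
#   missing          phrase is in neither file
classify_phrase() {
  # -F treats the phrase as a fixed string, -i ignores case, -q stays quiet.
  if grep -qiF -- "$3" "$1"; then
    echo "server-rendered"
  elif grep -qiF -- "$3" "$2"; then
    echo "client-rendered"
  else
    echo "missing"
  fi
}
```

Usage would look like classify_phrase raw.html rendered.html "your answer phrase". Because grep works line by line, a phrase that is split across lines or interrupted by inline tags will not match, so pick a short, unbroken run of text.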

Also check the status and headers

Add -I to see response headers (curl then sends a HEAD request), or -o /dev/null -w "%{http_code}\n" to print just the status code. Confirm the crawler gets a 200, not a 403 (blocked) or a 302 to a login wall. A soft 404 still returns 200, so the status code alone won't reveal it; read the body as well. Some sites serve bots a different response than browsers, and this check catches that.

# Confirm the crawler gets a 200, not a block or redirect
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://yourdomain.com/
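
To make the status code easier to act on, it can be mapped to a short label. This is a minimal sketch: classify_status and its labels are hypothetical, and the grouping is one reasonable reading of standard HTTP status codes rather than any crawler's documented behavior.

```shell
# classify_status CODE
# Hypothetical helper: maps an HTTP status code to a rough crawl outcome.
classify_status() {
  case "$1" in
    200)                 echo "ok" ;;            # still check the body for a soft 404
    301|302|303|307|308) echo "redirect" ;;      # follow it; may lead to a login wall
    401|403)             echo "blocked" ;;
    404|410)             echo "not-found" ;;
    429)                 echo "rate-limited" ;;
    5[0-9][0-9])         echo "server-error" ;;
    *)                   echo "other" ;;         # includes curl's 000 when the request fails
  esac
}
```

It can be fed directly from the command above: classify_status "$(curl -s -o /dev/null -w "%{http_code}" -A "GPTBot" https://yourdomain.com/)".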

What should you do if the test fails?

If your content fails the test — present in the DOM but missing from View Source and curl — it's client-rendered, and the fix is to put it in the initial HTML via server-side rendering, static generation, or prerendering. That's covered in why JavaScript breaks your AI citation eligibility and how to make a single-page app citable by AI.

Where this fits in the Canon

This test is how you verify the access pillar in practice — proving an engine can actually read your content before you worry about anything else. It pairs with the permission checks in the robots.txt guide (see Google's robots.txt documentation) and leads into the rendering fixes in why JavaScript breaks AI citation eligibility.

Related questions

Does HTTPS affect AI trust?

It's a baseline, not a boost.

Does infinite scroll hurt AI citation?

It can, when content is loaded only by scroll-triggered JavaScript.

Should I use dynamic rendering for AI crawlers?

It works as a stopgap, not a first choice.

Do paywalls block AI from citing content?

Generally yes.

Frequently asked questions

How do I check if AI crawlers can read my site?

Look at the raw HTML the server returns, not the rendered DOM. Two quick tests that should agree: (1) in your browser, disable JavaScript and reload; if your main content disappears, non-rendering crawlers can't see it either; (2) run curl with an AI user-agent and read the HTML it returns. If your answer text is present in both, AI crawlers can read it.

Why not just use 'Inspect Element' to check?

Because Inspect shows the rendered DOM after JavaScript has run, which is not what a non-rendering crawler sees. Use 'View Source' (Ctrl+U / Cmd+Option+U) or curl to see the raw HTML the server actually sent. If content is in the DOM but not in View Source, it was added by JavaScript and most AI crawlers will miss it.

What command shows me what GPTBot sees?

Run: curl -s -A "GPTBot" https://yourdomain.com/ and read the output. That fetches the page with the user-agent string "GPTBot" and prints the raw HTML the server returned for it, with no rendering applied. Pipe it to grep to check for a specific answer: curl -s -A "GPTBot" https://yourdomain.com/ | grep -i "your answer phrase".

My content shows in the browser but not in curl — why?

Because it's rendered client-side. The browser ran JavaScript that fetched and injected the content into the DOM; curl (like most AI crawlers) doesn't run JavaScript, so it only sees the initial server HTML, which lacks that content. The fix is server-side rendering or prerendering so the content is in the initial HTML.
