AEO Canon · the reference for answer-engine optimization

How to Check if AI Crawlers Can Read Your Site

Burke Atkerson · 3 min read

Most AI crawlers don't run JavaScript, so checking whether they can read your site means looking at the raw HTML the server returns — not the rendered DOM in your dev tools. Two quick tests show you exactly what GPTBot and PerplexityBot see: disable JavaScript in the browser, and fetch the page with curl.

The short answer

Test what's in the raw HTML, not the rendered DOM. If your main content survives with JavaScript disabled and appears in a curl fetch, AI crawlers can read it. If it vanishes, it's client-rendered and most crawlers can't see it.

Why doesn't 'Inspect Element' tell the truth?

Inspect Element doesn't tell the truth about crawlability because it shows the rendered DOM: the page after your JavaScript has executed and injected content. A non-rendering crawler never runs that JavaScript, so it sees only the raw HTML the server sent. The two can differ wildly. A client-rendered app often ships an almost-empty <div id="root"> that Inspect shows full of content, while View Source shows it empty. Always test the raw HTML, because that is what the crawler actually receives.
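
To make the gap concrete, here is a minimal sketch that writes two made-up raw HTML responses to temporary files and counts how often an answer phrase appears in each. The file contents, the phrase, and the variable names are all hypothetical; the point is only that a client-rendered shell can contain none of the text a visitor ends up seeing.

```shell
#!/bin/sh
# Hypothetical raw HTML responses, written to temp files for illustration only.
raw_client=$(mktemp)
raw_server=$(mktemp)

# Client-rendered: the server ships an empty root element plus a script tag.
cat > "$raw_client" <<'EOF'
<!doctype html>
<html><head><title>Pricing</title></head>
<body><div id="root"></div><script src="/app.js"></script></body></html>
EOF

# Server-rendered: the same page with the content already in the HTML.
cat > "$raw_server" <<'EOF'
<!doctype html>
<html><head><title>Pricing</title></head>
<body><div id="root"><h1>Pricing</h1><p>Plans start at $10 per month.</p></div>
<script src="/app.js"></script></body></html>
EOF

# Count lines containing the phrase (case-insensitive, fixed string).
# "|| true" keeps the script going when grep finds nothing and exits 1.
client_hits=$(grep -ciF "plans start at" "$raw_client" || true)
server_hits=$(grep -ciF "plans start at" "$raw_server" || true)

echo "client-rendered: $client_hits matching line(s)"
echo "server-rendered: $server_hits matching line(s)"
```

In a browser both pages could look identical once JavaScript has run; only the second carries the phrase in the bytes the server sent.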

How do you run the JS-disabled browser test?

Disable JavaScript in your browser and reload the page — what remains is approximately what a non-rendering crawler sees.

  1. Open DevTools command menu

     On the page, open DevTools (F12), then press Ctrl+Shift+P (Cmd+Shift+P on macOS).

  2. Disable JavaScript

     Type 'Disable JavaScript', select it, and keep DevTools open (the setting only applies while it is).

  3. Reload the page

     Hard-reload (Ctrl+Shift+R, or Cmd+Shift+R on macOS). The page now renders without running any JavaScript.

  4. Look for your answer

     Are your headline, main content, and key answer text still visible? If the page is blank or shows a spinner, crawlers see the same nothing.

  5. Re-enable JavaScript

     Remove the 'Disable JavaScript' override when you're done.
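
The same idea can be roughly approximated outside the browser. The sketch below defines a hypothetical helper, visible_text, that takes raw HTML on stdin, drops script and style blocks, strips the remaining tags, and prints whatever text is left. It is a crude text filter rather than an HTML parser, and the sample page is invented for the demonstration.

```shell
#!/bin/sh
# visible_text: read raw HTML on stdin and print the text that survives
# without JavaScript. Illustrative only; not a real HTML parser.
visible_text() {
  tr '\n' ' ' | awk '
    {
      s = $0
      # Remove <script>...</script> and <style>...</style> blocks.
      n = split("script style", tags, " ")
      for (i = 1; i <= n; i++) {
        open = "<" tags[i]
        close_tag = "</" tags[i] ">"
        while ((start = index(tolower(s), open)) > 0) {
          rest = substr(s, start)
          end = index(tolower(rest), close_tag)
          if (end == 0) { s = substr(s, 1, start - 1); break }
          s = substr(s, 1, start - 1) " " substr(rest, end + length(close_tag))
        }
      }
      gsub(/<[^>]*>/, " ", s)   # strip the remaining tags
      gsub(/[ \t]+/, " ", s)    # squeeze runs of whitespace
      sub(/^ /, "", s); sub(/ $/, "", s)
      print s
    }'
}

# Hypothetical page whose main sentence is injected by an inline script.
page=$(mktemp)
cat > "$page" <<'EOF'
<html><head><title>Pricing</title>
<style>h1 { color: red; }</style></head>
<body><h1>Pricing</h1><div id="root"></div>
<script>
  document.getElementById("root").textContent = "Plans start at $10 per month.";
</script>
<noscript>Enable JavaScript to see plans.</noscript></body></html>
EOF

visible_text < "$page"
```

For a live page, the raw HTML could be piped in from curl, for example curl -s https://yourdomain.com/ | visible_text. If the output is only a title and a noscript notice, the main content is being injected by JavaScript.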

How do you check with curl?

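Fetch the page with an AI crawler's user-agent token and read the raw HTML it returns. With no rendering involved, this closely approximates what the crawler receives, though a server that verifies crawler IP addresses or matches the full user-agent string may treat the real bot differently.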

# See the raw HTML GPTBot receives (first 60 lines)
curl -s -A "GPTBot" https://yourdomain.com/ | head -n 60
 
# Check whether a specific answer is present in the raw HTML
curl -s -A "GPTBot" https://yourdomain.com/ | grep -i "your answer phrase"
 
# Compare a few crawlers at once — counts of a key phrase in each response
for ua in "GPTBot" "PerplexityBot" "Bingbot"; do
  count=$(curl -s -A "$ua" https://yourdomain.com/ | grep -ic "your answer phrase")
  echo "$ua: $count match(es)"
done

If grep finds your answer text, the crawler can read it. If it returns nothing while the phrase is clearly on the rendered page, your content is being injected by JavaScript after load — invisible to crawlers that don't render.
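
When a page has several answers worth checking, the grep step can be wrapped in a small function. This is a sketch under assumptions: check_phrases is a hypothetical helper name, and the sample HTML and phrases are invented. It reads raw HTML on stdin, reports each phrase as found or missing, and returns a non-zero status if any phrase is absent, which makes it usable in a deploy check.

```shell
#!/bin/sh
# check_phrases PHRASE...: read raw HTML on stdin, report each phrase,
# and return non-zero if at least one phrase is missing.
check_phrases() {
  html=$(cat)   # capture stdin once so every phrase is tested against the same response
  missing=0
  for phrase in "$@"; do
    if printf '%s\n' "$html" | grep -qiF -- "$phrase"; then
      echo "found:   $phrase"
    else
      echo "MISSING: $phrase"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Offline demonstration with a made-up response body.
sample='<html><body><h1>How long does setup take?</h1><p>About three hours.</p></body></html>'
printf '%s\n' "$sample" | check_phrases "about three hours" "book online" \
  || echo "at least one phrase is missing from the raw HTML"
```

Against a live page, the input would come from curl instead of the sample string: curl -s -A "GPTBot" https://yourdomain.com/ | check_phrases "your answer phrase" "another phrase".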

Also check the status and headers

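Add -I to see response headers (this sends a HEAD request), or -o /dev/null -w "%{http_code}\n" to print just the status code. Confirm the crawler gets a 200, not a 403 (blocked) or a 302 to a login wall. A 200 alone does not rule out a soft 404, which is an error page served with a 200 status, so read the body as well. Some sites serve bots a different response than browsers, and this check catches that.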

# Confirm the crawler gets a 200, not a block or redirect
curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot" https://yourdomain.com/
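
The status check can be extended to say what each code means and where a redirect leads. The following is a sketch, not a definitive tool: classify_status and check_status are hypothetical helper names, and the grouping of codes is a rough simplification. It relies on curl's %{redirect_url} write-out variable, which is empty when the response is not a redirect.

```shell
#!/bin/sh
# classify_status CODE: print a rough interpretation of an HTTP status code.
classify_status() {
  case "$1" in
    200)                 echo "ok: page served (still read the body to rule out a soft 404)" ;;
    301|302|303|307|308) echo "redirect: check where it leads" ;;
    401|403)             echo "blocked: the crawler is denied access" ;;
    404|410)             echo "missing: the URL does not resolve to a page" ;;
    429)                 echo "rate limited" ;;
    5??)                 echo "server error" ;;
    *)                   echo "unexpected: inspect the response manually" ;;
  esac
}

# check_status UA URL: fetch URL with the given user-agent and print the
# status code, its interpretation, and the redirect target if there is one.
check_status() {
  result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' -A "$1" "$2")
  code=${result%% *}     # text before the first space
  target=${result#* }    # text after the first space (empty if no redirect)
  echo "$1: $code $(classify_status "$code")${target:+ -> $target}"
}

# Offline demonstration of the classification alone.
for code in 200 302 403; do
  echo "$code: $(classify_status "$code")"
done
```

A live check would be called as check_status "GPTBot" https://yourdomain.com/ and repeated for each crawler token you care about.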

What should you do if the test fails?

If your content fails the test — present in the DOM but missing from View Source and curl — it's client-rendered, and the fix is to put it in the initial HTML via server-side rendering, static generation, or prerendering. That's covered in why JavaScript breaks your AI citation eligibility and how to make a single-page app citable by AI.

Where this fits in the Canon

This test is how you verify the access pillar in practice — proving an engine can actually read your content before you worry about anything else. It pairs with the permission checks in the robots.txt guide (see Google's robots.txt documentation) and leads into the rendering fixes in why JavaScript breaks AI citation eligibility.

Frequently asked questions

How do I check if AI crawlers can read my site?
Look at the raw HTML the server returns, not the rendered DOM. Two quick tests that should agree: (1) in your browser, disable JavaScript and reload; if your main content disappears, crawlers can't see it either; (2) run curl with an AI user-agent and read the HTML it returns. If your answer text is present in both, AI crawlers can read it.
Why not just use 'Inspect Element' to check?
Because Inspect shows the rendered DOM after JavaScript has run, which is not what a non-rendering crawler sees. Use 'View Source' (Ctrl+U / Cmd+Option+U) or curl to see the raw HTML the server actually sent. If content is in the DOM but not in View Source, it was added by JavaScript and most AI crawlers will miss it.
What command shows me what GPTBot sees?
Run: curl -s -A "GPTBot" https://yourdomain.com/ and read the output. That fetches the page with "GPTBot" as the user-agent string and prints the raw HTML the server returned, which closely approximates what the crawler receives. Pipe it to grep to check for a specific answer: curl -s -A "GPTBot" https://yourdomain.com/ | grep -i "your answer phrase".
My content shows in the browser but not in curl — why?
Because it's rendered client-side. The browser ran JavaScript that fetched and injected the content into the DOM; curl (like most AI crawlers) doesn't run JavaScript, so it only sees the initial server HTML, which lacks that content. The fix is server-side rendering or prerendering so the content is in the initial HTML.
