AI Crawler Access Checker
Your robots.txt quietly decides whether ChatGPT, Claude, Perplexity, and Google's AI can see your site. Enter a domain and find out which AI crawlers are allowed, which are blocked, and what each one costs or protects.
Site blocking the check? Paste its robots.txt instead
Large sites with bot protection often refuse automated requests like this one. Open https://yourdomain.com/robots.txt in your own browser, copy everything, and paste it below to analyze it the same way.
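For a rough idea of what analyzing pasted rules involves, here is a minimal sketch using Python's standard-library urllib.robotparser, which parses robots.txt text without any network request. The sample rules and the check_pasted helper are hypothetical, and robotparser's matching is simpler than RFC 9309, so treat its answers as approximate. The sample shows a common accident: Googlebot is let in, and an old wildcard rule shuts out every bot that is not named.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical pasted robots.txt, for illustration only.
PASTED = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

def check_pasted(robots_text, agents, path="/"):
    """Map each agent name to True (allowed) or False (blocked) for path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())  # parse text directly, no fetch
    return {agent: parser.can_fetch(agent, path) for agent in agents}

result = check_pasted(PASTED, ["Googlebot", "GPTBot", "OAI-SearchBot"])
print(result)
# {'Googlebot': True, 'GPTBot': False, 'OAI-SearchBot': False}
```

Neither AI crawler is named in the file, yet both are blocked, because they fall through to the wildcard group.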
AI search indexes
Decide whether your pages can appear and be cited in AI search answers.
Builds the index behind ChatGPT search.
Blocked here means: Your pages can't appear or be cited in ChatGPT search results.
Indexes pages for Claude's web search answers.
Blocked here means: Claude's search answers won't surface or cite your pages.
Crawls and indexes pages for Perplexity answers.
Blocked here means: You lose visibility and citations in Perplexity.
Classic Google search, which also feeds AI Overviews and AI Mode.
Blocked here means: You drop out of Google entirely, AI Overviews included. Almost never what you want.
There is no robots.txt rule that keeps you in classic results but out of AI Overviews. That is controlled with nosnippet rules instead.
The Bing index, which also powers Copilot answers.
Blocked here means: Out of Bing and Copilot both.
Live AI fetches
Fetch a page in real time when a user or agent asks about it.
Fetches a page on demand when a ChatGPT user or agent asks about it.
Blocked here means: ChatGPT can't open your pages when users paste your links or ask about you.
Fetches pages on demand for Claude users and agents.
Blocked here means: Claude can't read your pages when users ask about them.
Real-time fetches for Perplexity user queries.
Blocked here means: Perplexity can't pull your pages into live answers.
AI training
Collect pages that teach future models what your brand is.
Collects pages for training future OpenAI models.
Blocked here means: Future GPT models learn less about you. A legitimate choice, but know the brand-presence tradeoff.
Collects pages for training future Anthropic models.
Blocked here means: Future Claude models learn less about you.
A control token, not a crawler. Tells Google not to use your content for Gemini training.
Blocked here means: Your content stays out of Gemini training. Does not affect Google search or AI Overviews.
Control token for Apple Intelligence training (Applebot does the crawling).
Blocked here means: Your content stays out of Apple foundation model training.
Nonprofit web crawl that many AI labs train from.
Blocked here means: Future crawls leave you out of a dataset behind many current and future models.
Collects pages for training Meta AI models.
Blocked here means: Future Meta models learn less about you.
Collects pages for training ByteDance models (Doubao).
Blocked here means: Signals ByteDance to stay out.
Widely reported to ignore robots.txt. Blocking it here is a signal, not a guarantee; hard enforcement needs WAF or bot rules.
Why this matters now
robots.txt used to have one audience: search engine crawlers. Now it is the gatekeeper for three different kinds of AI access, and they have very different stakes. Search index bots decide whether AI answers can cite you. Live fetchers decide whether an AI can open your page when a user asks about it. Training bots decide whether the next generation of models knows your brand exists.
Plenty of sites blocked these bots in one sweep back when blocking AI felt like the safe default, and plenty of others are blocking them by accident with an old wildcard rule. Either way the result is the same: invisible in the fastest-growing discovery surfaces on the web. This tool reads a site's actual rules and tells you, bot by bot, what they mean.
The check runs against the live robots.txt, fetched the same way a crawler would fetch it. Nothing is stored, and the parsing follows the same group-matching rules (RFC 9309) that compliant crawlers use.
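The group matching mentioned above can be sketched in a few lines. This is a simplified illustration, not the tool's actual parser: the function names are made up, and it leaves out the * and $ path wildcards that RFC 9309 also defines. It shows the two core ideas: a crawler uses the groups that name it and falls back to the * group only when none do, and within the chosen rules the longest matching path wins, with Allow winning a tie.

```python
def parse_groups(text):
    """Split robots.txt text into (agents, rules) groups."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a rule line closed the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups

def rules_for(groups, agent):
    """Rules from groups naming the agent, else from the '*' groups."""
    token = agent.lower()
    specific = [rules for agents, rules in groups if token in agents]
    chosen = specific or [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]

def is_allowed(text, agent, path="/"):
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for kind, pattern in rules_for(parse_groups(text), agent):
        if not pattern or not path.startswith(pattern):
            continue  # an empty pattern matches nothing
        is_allow = kind == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed

# Hypothetical rules, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /

User-agent: GPTBot
User-agent: ClaudeBot
Allow: /blog/
Disallow: /
"""

print(is_allowed(SAMPLE, "GPTBot", "/blog/post"))     # True
print(is_allowed(SAMPLE, "GPTBot", "/pricing"))       # False
print(is_allowed(SAMPLE, "OAI-SearchBot", "/blog/"))  # False
```

GPTBot reaches /blog/post because the six-character Allow: /blog/ outranks the one-character Disallow: /. OAI-SearchBot is not named in any group, so it inherits the wildcard block.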
Common questions
Does robots.txt actually stop AI crawlers?
It stops the compliant ones, which includes every major lab crawler listed here except Bytespider, which is widely reported to ignore it. robots.txt is a convention, not enforcement. If you need a hard block, use your CDN or firewall bot rules; Cloudflare can block AI crawlers at the network level regardless of what robots.txt says.
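To make the difference between a signal and enforcement concrete, here is a minimal sketch of an application-level block written as Python WSGI middleware. It is an illustration only: the block_bots wrapper and the token list are hypothetical, a CDN or WAF rule is the more usual place for this, and user-agent strings can be spoofed, so this is weaker than network-level bot detection.

```python
# Hypothetical block list; lowercase substrings matched against User-Agent.
BLOCKED_TOKENS = ("bytespider",)

def block_bots(app, blocked=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 to any blocked user agent."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

def demo_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

protected_app = block_bots(demo_app)
```

Unlike a robots.txt rule, this refuses the request whether or not the crawler chooses to comply.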
Should I block AI training bots?
It is a tradeoff. Blocking GPTBot, ClaudeBot, and similar keeps your content out of future model training, which matters if you license content. The cost is that future models know less about your brand, and answers about your category get shaped by sources that stayed open. For most brands competing on visibility, open wins.
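If you do decide to opt out of training while staying open to everything else, the rules are short. The sketch below generates them and then checks the result with Python's urllib.robotparser. The build_robots helper is hypothetical, and the token list covers only training-related names mentioned on this page; extend it to match your own policy.

```python
from urllib.robotparser import RobotFileParser

# Training-related tokens named on this page (an assumption, not a full list).
TRAINING_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended"]

def build_robots(blocked_agents):
    """Return robots.txt text that blocks the given agents, allows the rest."""
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    if lines:
        lines += ["Disallow: /", ""]  # one shared group for all blocked agents
    lines += ["User-agent: *", "Allow: /"]
    return "\n".join(lines) + "\n"

robots_txt = build_robots(TRAINING_TOKENS)
print(robots_txt)

# Sanity check: training tokens blocked, search crawlers still allowed.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for agent in ["GPTBot", "Google-Extended", "OAI-SearchBot", "Googlebot"]:
    print(agent, parser.can_fetch(agent, "/"))
```

Stacking several User-agent lines above one Disallow keeps the file short and applies the same rule to every token in the group.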
Can I stay in Google search but out of AI Overviews?
Not through robots.txt. AI Overviews are built from the regular Google index, so blocking Googlebot removes you from everything. Google-Extended only controls Gemini model training. The closest lever is the nosnippet or max-snippet robots meta rules, which limit what Google can quote from your pages anywhere, AI Overviews included.
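These snippet rules live in the page or its HTTP response, not in robots.txt. As a small sketch, the hypothetical helper below builds both common forms: a robots meta tag for the HTML head and the equivalent X-Robots-Tag response header. How you attach either one depends on your CMS or server, which is outside this example.

```python
def snippet_directives(max_snippet=None):
    """Build a robots meta tag and the matching X-Robots-Tag header.

    max_snippet=None yields nosnippet; a number yields max-snippet:<number>.
    """
    value = "nosnippet" if max_snippet is None else f"max-snippet:{max_snippet}"
    return {
        "meta": f'<meta name="robots" content="{value}">',
        "header": ("X-Robots-Tag", value),
    }

print(snippet_directives()["meta"])
# <meta name="robots" content="nosnippet">
print(snippet_directives(max_snippet=50)["header"])
# ('X-Robots-Tag', 'max-snippet:50')
```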
What is the difference between GPTBot, OAI-SearchBot, and ChatGPT-User?
Three different jobs from one company. GPTBot collects training data for future models. OAI-SearchBot builds the search index that ChatGPT cites when it searches the web. ChatGPT-User fetches a page live when a user or agent asks about it. You can allow or block each one independently.
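Because each of the three has its own user-agent token, each can get its own group. The sketch below uses made-up rules that opt out of training while keeping search indexing and live fetches open, then checks each token with Python's urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: no training, but search and live fetches stay open.
OPENAI_RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

openai_parser = RobotFileParser()
openai_parser.parse(OPENAI_RULES.splitlines())

access = {
    agent: openai_parser.can_fetch(agent, "/")
    for agent in ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]
}
print(access)
# {'GPTBot': False, 'OAI-SearchBot': True, 'ChatGPT-User': True}
```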
I blocked a bot by accident. How fast does access recover?
Fix the robots.txt and the crawlers pick it up on their next check, typically within about a day since most refetch robots.txt every 24 hours or so. Lost index presence can take longer to rebuild than the access itself.