
AI Crawler Access Checker

Your robots.txt quietly decides whether ChatGPT, Claude, Perplexity, and Google's AI can see your site. Enter a domain and find out which AI crawlers are allowed, which are blocked, and what each one costs or protects.

Site blocking the check? Paste its robots.txt instead

Large sites with bot protection often refuse automated requests like this one. Open https://yourdomain.com/robots.txt in your own browser, copy everything, and paste it below to analyze it the same way.

AI search indexes

Decide whether your pages can appear and be cited in AI search answers.

OAI-SearchBot OpenAI

Builds the index behind ChatGPT search.

Claude-SearchBot Anthropic

Indexes pages for Claude's web search answers.

PerplexityBot Perplexity

Crawls and indexes pages for Perplexity answers.

Googlebot Google

Classic Google search, which also feeds AI Overviews and AI Mode.

There is no robots.txt rule that keeps you in classic results but out of AI Overviews. That is controlled with nosnippet rules instead.

Bingbot Microsoft

The Bing index, which also powers Copilot answers.

Live AI fetches

Fetch a page in real time when a user or agent asks about it.

ChatGPT-User OpenAI

Fetches a page on demand when a ChatGPT user or agent asks about it.

Claude-User Anthropic

Fetches pages on demand for Claude users and agents.

Perplexity-User Perplexity

Real-time fetches for Perplexity user queries.

AI training

Collect pages that teach future models what your brand is.

GPTBot OpenAI

Collects pages for training future OpenAI models.

ClaudeBot Anthropic

Collects pages for training future Anthropic models.

Google-Extended Google

A control token, not a crawler. Tells Google not to use your content for Gemini training.

Applebot-Extended Apple

Control token for Apple Intelligence training (Applebot does the crawling).

CCBot Common Crawl

Nonprofit web crawl that many AI labs train from.

Meta-ExternalAgent Meta

Collects pages for training Meta AI models.

Bytespider ByteDance

Collects pages for training ByteDance models (Doubao).

Widely reported to ignore robots.txt. Blocking it here is a signal, not a guarantee; hard enforcement needs WAF or bot rules.


Why this matters now

robots.txt used to have one audience: search engine crawlers. Now it is the gatekeeper for three different kinds of AI access, and they have very different stakes. Search index bots decide whether AI answers can cite you. Live fetchers decide whether an AI can open your page when a user asks about it. Training bots decide whether the next generation of models knows your brand exists.

Plenty of sites blocked these bots in one sweep back when blocking AI felt like the safe default, and plenty of others are blocking them by accident with an old wildcard rule. Either way the result is the same: those sites are invisible on the fastest-growing discovery surfaces on the web. This tool reads a site's actual rules and tells you, bot by bot, what they mean.
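To make the accidental case concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content and the example.com URL are made up for illustration, and the standard-library parser only approximates how each real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical legacy robots.txt: written to let Google in and keep
# everything else out, long before AI crawlers existed.
LEGACY_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    LEGACY_ROBOTS_TXT, ["Googlebot", "OAI-SearchBot", "ClaudeBot"]
)
print(result)
```

Googlebot matches its own group and gets in. The AI bots have no group of their own, so they fall through to the wildcard and are blocked, even though nobody ever wrote a rule about them.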

The check runs against the live robots.txt, fetched the same way a crawler would fetch it. Nothing is stored, and the parsing follows the same group-matching rules (RFC 9309) that compliant crawlers use.
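For readers curious what group matching looks like in practice, here is a rough Python sketch of the idea in RFC 9309: pick the group that names the bot, fall back to the * group if none does, then let the longest matching path win, with Allow winning ties. This is an illustration under simplifying assumptions, not this tool's actual code. The function names are hypothetical, and the sketch skips * and $ wildcards inside paths as well as percent-encoding.

```python
def parse_groups(robots_txt):
    """Split robots.txt text into (agents, rules) groups."""
    groups = []
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:  # rules already seen, so this line opens a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key == "allow", value))
    if agents:
        groups.append((agents, rules))
    return groups


def is_allowed(robots_txt, agent, path="/"):
    """Apply group selection, then longest-match path rules."""
    groups = parse_groups(robots_txt)
    token = agent.lower()
    matched = [r for a, r in groups if token in a]
    if not matched:  # no group names this bot, so use the wildcard group
        matched = [r for a, r in groups if "*" in a]
    best_len, allowed = -1, True  # no matching rule means allowed
    for group_rules in matched:
        for allow, rule_path in group_rules:
            if rule_path and path.startswith(rule_path):
                longer = len(rule_path) > best_len
                tie_for_allow = len(rule_path) == best_len and allow
                if longer or tie_for_allow:
                    best_len, allowed = len(rule_path), allow
    return allowed


# Made-up example file for illustration.
EXAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
Disallow: /private/
"""

print(is_allowed(EXAMPLE, "GPTBot", "/"))
print(is_allowed(EXAMPLE, "ClaudeBot", "/blog"))
print(is_allowed(EXAMPLE, "ClaudeBot", "/private/x"))
```

The detail that trips people up is that a bot with its own group ignores the * group entirely. GPTBot above never sees the wildcard Allow.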

Common questions

Does robots.txt actually stop AI crawlers?

It stops the compliant ones, a group that includes every major lab crawler listed here except Bytespider. robots.txt is a convention, not an enforcement mechanism. If you need a hard block, use your CDN or firewall bot rules; Cloudflare, for example, can block AI crawlers at the network level regardless of what robots.txt says.
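If you have no CDN or firewall in front of the site, the same idea can be approximated in the application itself. The sketch below is a small Python WSGI middleware that returns 403 to any request whose User-Agent contains a blocked token. It is an application-level stand-in, not a WAF. The names block_agents and BLOCKED_AGENT_TOKENS and the sample User-Agent strings are hypothetical, and because a User-Agent header can be spoofed, this only stops bots that identify themselves honestly.

```python
# Lowercase substrings to refuse; extend as needed (assumption: matching
# on the User-Agent header is good enough for your threat model).
BLOCKED_AGENT_TOKENS = ("bytespider",)


def block_agents(app, tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and return 403 for blocked user agents."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware


def demo_app(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def call(app, user_agent):
    """Invoke a WSGI app with a fake request and return (status, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"], body


protected = block_agents(demo_app)
print(call(protected, "Mozilla/5.0 (compatible; Bytespider)"))
print(call(protected, "Mozilla/5.0 (compatible; ExampleBrowser)"))
```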

Should I block AI training bots?

It is a tradeoff. Blocking GPTBot, ClaudeBot, and similar keeps your content out of future model training, which matters if you license content. The cost is that future models know less about your brand, and answers about your category get shaped by sources that stayed open. For most brands competing on visibility, open wins.

Can I stay in Google search but out of AI Overviews?

Not through robots.txt. AI Overviews are built from the regular Google index, so blocking Googlebot removes you from everything. Google-Extended only controls Gemini model training. The closest lever is the nosnippet or max-snippet robots meta rules, which limit what Google can quote from your pages anywhere, AI Overviews included.
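As a rough illustration of those snippet rules, the Python helper below builds the robots meta tag you would place in a page's head. The helper name robots_meta_tag is hypothetical. The nosnippet and max-snippet:[number] directive names are the ones Google documents, but treat the exact values as choices to verify against your own pages.

```python
def robots_meta_tag(nosnippet=False, max_snippet=None):
    """Build a robots meta tag limiting what can be quoted from a page.

    nosnippet=True asks for no text snippet at all; otherwise
    max_snippet caps the snippet length in characters.
    """
    directives = []
    if nosnippet:
        directives.append("nosnippet")
    elif max_snippet is not None:
        directives.append(f"max-snippet:{max_snippet}")
    if not directives:
        return ""  # nothing to restrict, so emit no tag
    content = ", ".join(directives)
    return f'<meta name="robots" content="{content}">'


print(robots_meta_tag(nosnippet=True))
print(robots_meta_tag(max_snippet=50))
```

Keep in mind that these rules apply to ordinary search snippets too, so a page that opts out of being quoted in AI Overviews also loses its regular snippet.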

What is the difference between GPTBot, OAI-SearchBot, and ChatGPT-User?

Three different jobs from one company. GPTBot collects training data for future models. OAI-SearchBot builds the search index that ChatGPT cites when it searches the web. ChatGPT-User fetches a page live when a user or agent asks about it. You can allow or block each one independently.
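Here is a short sketch of that independence, again using Python's standard urllib.robotparser. The robots.txt below is a hypothetical policy that opts out of training while staying visible in ChatGPT search and live fetches. The variable names and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: no training, but keep search and live fetches.
OPENAI_POLICY = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(OPENAI_POLICY.splitlines())

access = {
    bot: parser.can_fetch(bot, "https://example.com/")
    for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User")
}
print(access)
```

Each bot is matched against its own group, so blocking GPTBot leaves the other two untouched.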

I blocked a bot by accident. How fast does access recover?

Fix the robots.txt and the crawlers pick it up on their next check, typically within about a day since most refetch robots.txt every 24 hours or so. Lost index presence can take longer to rebuild than the access itself.
