AI visibility plays a crucial role for SEOs, and this starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.
On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.
User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.
Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.
The Complete Verified AI Crawler List (December 2025)
| Name | Purpose | Crawl Rate of SEJ (pages/hour) | Verified IP List | Robots.txt disallow | Complete User Agent |
|---|---|---|---|---|---|
| GPTBot | AI training data collection for GPT models (ChatGPT, GPT-4o) | 100 | Official IP List | User-agent: GPTBot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot) |
| ChatGPT-User | AI agent for real-time web browsing when users interact with ChatGPT | 2400 | Official IP List | User-agent: ChatGPT-User Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot |
| OAI-SearchBot | AI search indexing for ChatGPT search features (not for training) | 150 | Official IP List | User-agent: OAI-SearchBot Allow: / Disallow: /private-folder | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot |
| ClaudeBot | AI training data collection for Claude models | 500 | Official IP List | User-agent: ClaudeBot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected]) |
| Claude-User | AI agent for real-time web access when Claude users browse | <10 | Not available | User-agent: Claude-User Disallow: /sample-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; [email protected]) |
| Claude-SearchBot | AI search indexing for Claude search capabilities | <10 | Not available | User-agent: Claude-SearchBot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com) |
| Google-CloudVertexBot | AI agent for Vertex AI Agent Builder (site owners’ request only) | <10 | Official IP List | User-agent: Google-CloudVertexBot Allow: / Disallow: /private-folder | Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search) |
| Google-Extended | Token controlling AI training usage of Googlebot-crawled content. | | | User-agent: Google-Extended Allow: / Disallow: /private-folder | |
| Gemini-Deep-Research | AI research agent for Google Gemini’s Deep Research feature | <10 | Official IP List | User-agent: Gemini-Deep-Research Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36 |
| | Gemini’s chat when a user asks to open a webpage | <10 | | | |
| Bingbot | Powers Bing Search and Bing Chat (Copilot) AI answers | 1300 | Official IP List | User-agent: BingBot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36 |
| Applebot-Extended | Doesn’t crawl but controls how Apple uses Applebot data. | <10 | Official IP List | User-agent: Applebot-Extended Allow: / Disallow: /private-folder | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot) |
| PerplexityBot | AI search indexing for Perplexity’s answer engine | 150 | Official IP List | User-agent: PerplexityBot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot) |
| Perplexity-User | AI agent for real-time browsing when Perplexity users request information | <10 | Official IP List | User-agent: Perplexity-User Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user) |
| Meta-ExternalAgent | AI training data collection for Meta’s LLMs (Llama, etc.) | 1100 | Not available | User-agent: meta-externalagent Allow: / Disallow: /private-folder | meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) |
| Meta-WebIndexer | Used to improve Meta AI search. | <10 | Not available | User-agent: Meta-WebIndexer Allow: / Disallow: /private-folder | meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler) |
| Bytespider | AI training data for ByteDance’s LLMs for products like TikTok | <10 | Not available | User-agent: Bytespider Allow: / Disallow: /private-folder | Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/) |
| Amazonbot | AI training for Alexa and other Amazon AI services | 1050 | Not available | User-agent: Amazonbot Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36 |
| DuckAssistBot | AI search indexing for DuckDuckGo search engine | 20 | Official IP List | User-agent: DuckAssistBot Allow: / Disallow: /private-folder | DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html) |
| MistralAI-User | Mistral’s real-time citation fetcher for “Le Chat” assistant | <10 | Not available | User-agent: MistralAI-User Allow: / Disallow: /private-folder | Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots) |
| Webz.io | Data extraction and web scraping used by other AI training companies. Formerly known as Omgili. | <10 | Not available | User-agent: webzio Allow: / Disallow: /private-folder | webzio (+https://webz.io/bot.html) |
| Diffbot | Data extraction and web scraping used by companies all over the world. | <10 | Not available | User-agent: Diffbot Allow: / Disallow: /private-folder | Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com) |
| ICC-Crawler | AI and machine learning data collection | <10 | Not available | User-agent: ICC-Crawler Allow: / Disallow: /private-folder | ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html) |
| CCBot | Open-source web archive used as training data by multiple AI companies | <10 | Official IP List | User-agent: CCBot Allow: / Disallow: /private-folder | CCBot/2.0 (https://commoncrawl.org/faq/) |
The user-agent strings above have all been verified against Search Engine Journal server logs.
Popular AI Agent Crawlers With Unidentifiable User Agent
We’ve found that the following didn’t identify themselves:
- you.com.
- ChatGPT’s agent Operator.
- Bing’s Copilot chat.
- Grok.
- DeepSeek.
There is no way to track these crawlers' visits to your webpages other than by identifying the explicit IP address.
To find it, we set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, which allowed us to locate the corresponding visit record and IP address in our server logs.
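The trap-page lookup can be sketched in a few lines of Python. This is a minimal illustration, not the exact process used for this article: it assumes the server writes the common "combined" access log format, and the sample log lines and IP addresses are hypothetical placeholders from reserved documentation ranges.

```python
import re

# Assumed layout (combined log format):
# ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def find_trap_visits(lines, trap_path):
    """Return (ip, time, user_agent) for each request whose path starts with trap_path."""
    visits = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(trap_path):
            visits.append((match.group("ip"), match.group("time"), match.group("ua")))
    return visits


# Hypothetical sample entries; real ones come from your own access log.
sample = [
    '203.0.113.7 - - [10/Dec/2025:10:00:00 +0000] "GET /specific-page-for-you-com/ HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '198.51.100.4 - - [10/Dec/2025:10:00:05 +0000] "GET /about/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

for ip, time, ua in find_trap_visits(sample, "/specific-page-for-you-com/"):
    print(ip, time, ua)
```

Because nobody else knows the trap URL, any request that matches it can reasonably be attributed to the tool you prompted.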
What About Agentic AI Browsers?
Unfortunately, AI browsers such as Comet or ChatGPT’s Atlas don’t differentiate themselves in the user agent string, so you can’t identify them in server logs; their visits blend in with those of normal users.
This is disappointing for SEOs because tracking agentic browser visits to a website is important from a reporting point of view.
How To Check What’s Crawling Your Server
Depending on your hosting service, you may have a user interface (UI) that makes it easy to access and review server logs.
If your hosting doesn’t offer this, you can download the server log files via FTP (on Linux-based servers running Apache, they are usually located at /var/log/apache2/access.log) or ask your hosting support to send them to you.
Once you have the log file, you can view and analyze it in Google Sheets (if the file is in CSV format) or in Screaming Frog’s log analyzer. If the file is smaller than 100 MB, you can also try analyzing it with Gemini AI.
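If you prefer a script, the following Python sketch counts hits per AI crawler. It is a rough starting point under two assumptions: the log uses the combined format (where the user agent is the last quoted field), and the token list is only a subset of the crawler table above. The sample lines are hypothetical.

```python
from collections import Counter

# A subset of user-agent tokens from the crawler table; extend as needed.
AI_CRAWLER_TOKENS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "Claude-User",
    "Claude-SearchBot", "PerplexityBot", "Perplexity-User",
    "meta-externalagent", "Bytespider", "Amazonbot", "bingbot", "CCBot",
]


def user_agent_of(line):
    """Return the last quoted field of a combined-format log line, or ''."""
    parts = line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def count_ai_crawler_hits(lines):
    """Count requests per crawler token using case-insensitive substring matching."""
    hits = Counter()
    for line in lines:
        ua = user_agent_of(line).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in ua:
                hits[token] += 1
                break
    return hits


# Hypothetical sample; in practice pass an open file, e.g. open("access.log").
sample_log = [
    '192.0.2.10 - - [10/Dec/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)"',
    '192.0.2.11 - - [10/Dec/2025:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)"',
    '192.0.2.12 - - [10/Dec/2025:10:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "CCBot/2.0 (https://commoncrawl.org/faq/)"',
    '192.0.2.13 - - [10/Dec/2025:10:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = count_ai_crawler_hits(sample_log)
for token, count in hits.most_common():
    print(token, count)
```

Keep in mind that these counts rely on the user agent string alone, so they include any spoofed requests.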
How To Verify Legitimate Vs. Fake Bots
Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot from a laptop by sending a request from the terminal, and your server log will show it as if ClaudeBot were crawling the site:
curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])' https://example.com
Verification helps save server bandwidth and prevents content from being harvested illegally. The most reliable verification method is checking the request IP.
Check each requesting IP against the officially declared IP lists linked in the table above. If it matches, allow the request; otherwise, block it.
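As a rough sketch of that check, the Python snippet below compares a request's IP with the ranges declared for the bot its user agent claims to be. The CIDR ranges here are hypothetical placeholders taken from reserved documentation blocks, not real vendor ranges; in practice you would load them from each vendor's official IP list.

```python
import ipaddress

# Hypothetical ranges (reserved documentation blocks), NOT real vendor IPs.
# Replace with the ranges published in each vendor's official IP list.
OFFICIAL_RANGES = {
    "gptbot": ["192.0.2.0/24"],
    "claudebot": ["198.51.100.0/24", "2001:db8::/32"],
}


def claimed_bot(user_agent):
    """Return the known bot token found in the user agent, or None."""
    ua = user_agent.lower()
    for token in OFFICIAL_RANGES:
        if token in ua:
            return token
    return None


def verdict(ip, user_agent):
    """Return 'allow', 'block', or 'not-a-listed-bot' for a single request."""
    token = claimed_bot(user_agent)
    if token is None:
        return "not-a-listed-bot"
    addr = ipaddress.ip_address(ip)
    for cidr in OFFICIAL_RANGES[token]:
        # Membership is False when the IP versions differ (IPv4 vs IPv6).
        if addr in ipaddress.ip_network(cidr):
            return "allow"
    return "block"


print(verdict("192.0.2.55", "Mozilla/5.0 (compatible; GPTBot/1.3; +https://openai.com/gptbot)"))
print(verdict("203.0.113.9", "Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
print(verdict("203.0.113.9", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Requests that claim a bot identity but come from outside its declared ranges get "block", which is the impersonation case described above.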
Various types of firewalls can help with this by allowlisting verified IPs, which lets legitimate bot requests pass through while blocking all other requests that impersonate AI crawlers in their user agent strings.
For example, in WordPress, you can use the free Wordfence plugin to allowlist legitimate IPs from the official lists (as above) and add custom blocking rules for the crawler user agents.
The allowlist rule takes precedence: it lets legitimate crawlers pass through, while the blocking rules stop any impersonation request that comes from a different IP.
However, please note that IP addresses can also be spoofed. When both the bot user agent and the IP are spoofed, you won’t be able to block the request.
Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility
AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.
Check your server logs regularly to see what’s actually hitting your site, and make sure you don’t inadvertently block AI crawlers if visibility in AI search engines is important for your business. If you don’t want AI crawlers to access your content, block them via robots.txt using the user-agent name.
We’ll keep this list updated as new crawlers emerge and existing ones change, so we recommend bookmarking this URL or revisiting this article regularly to keep your AI crawler list up to date.
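Before deploying robots.txt rules, you can test them locally. The sketch below uses Python's standard `urllib.robotparser` with a hypothetical robots.txt that keeps GPTBot out of one folder and blocks Bytespider entirely. Python's parser applies rules in file order (first match wins), so the example omits the redundant `Allow: /` line; other parsers may resolve overlapping rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private-folder

User-agent: Bytespider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the given user-agent token may fetch the path."""
    return parser.can_fetch(agent, path)


for agent, path in [
    ("GPTBot", "/"),
    ("GPTBot", "/private-folder/page.html"),
    ("Bytespider", "/"),
    ("ClaudeBot", "/private-folder/page.html"),  # no rule group applies to it
]:
    print(agent, path, allowed(agent, path))
```

Note that robots.txt is only a request: well-behaved crawlers honor it, but it does not stop a bot that chooses to ignore it.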