By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
ProbizbeaconProbizbeacon
  • Business
  • Investing
  • Money Management
  • Entrepreneur
  • Side Hustles
  • Banking
  • Mining
  • Retirement
Reading: Complete Crawler List For AI User-Agents [Dec 2025]
Share
Notification
ProbizbeaconProbizbeacon
Search
  • Business
  • Investing
  • Money Management
  • Entrepreneur
  • Side Hustles
  • Banking
  • Mining
  • Retirement
© 2025 All Rights reserved | Powered by Probizbeacon
Probizbeacon > Money Management > Complete Crawler List For AI User-Agents [Dec 2025]
Money Management

Complete Crawler List For AI User-Agents [Dec 2025]

December 5, 2025 10 Min Read
Share
10 Min Read

AI visibility plays a crucial role for SEOs, and this starts with controlling AI crawlers. If AI crawlers can’t access your pages, you’re invisible to AI discovery engines.

On the flip side, unmonitored AI crawlers can overwhelm servers with excessive requests, causing crashes and unexpected hosting bills.

User-agent strings are essential for controlling which AI crawlers can access your website, but official documentation is often outdated, incomplete, or missing entirely. So, we curated a verified list of AI crawlers from our actual server logs as a useful reference.

Every user-agent is validated against official IP lists when available, ensuring accuracy. We will maintain and update this list to catch new crawlers and changes to existing ones.

The Complete Verified AI Crawler List (December 2025)

Name Purpose Crawl Rate of SEJ (pages/hour) Verified IP List Robots.txt disallow Complete User Agent
GPTBot AI training data collection for GPT models (ChatGPT, GPT-4o) 100 Official IP List User-agent: GPTBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.3; +https://openai.com/gptbot)
ChatGPT-User AI agent for real-time web browsing when users interact with ChatGPT 2400 Official IP List User-agent: ChatGPT-User
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
OAI-SearchBot AI search indexing for ChatGPT search features (not for training) 150 Official IP List User-agent: OAI-SearchBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.3; +https://openai.com/searchbot
ClaudeBot AI training data collection for Claude models 500 Official IP List User-agent: ClaudeBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])
Claude-User AI agent for real-time web access when Claude users browse <10 Not available User-agent: Claude-User
Disallow: /sample-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-User/1.0; [email protected])
Claude-SearchBot AI search indexing for Claude search capabilities <10 Not available User-agent: Claude-SearchBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Claude-SearchBot/1.0; +https://www.anthropic.com)
Google-CloudVertexBot AI agent for Vertex AI Agent Builder (site owners’ request only) <10 Official IP List User-agent: Google-CloudVertexBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.7390.122 Mobile Safari/537.36 (compatible; Google-CloudVertexBot; +https://cloud.google.com/enterprise-search)
Google-Extended Token controlling AI training usage of Googlebot-crawled content. User-agent: Google-Extended
Allow: /
Disallow: /private-folder
Gemini-Deep-Research AI research agent for Google Gemini’s Deep Research feature <10 Official IP List User-agent: Gemini-Deep-Research
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Gemini-Deep-Research; +https://gemini.google/overview/deep-research/) Chrome/135.0.0.0 Safari/537.36
Google  Gemini’s chat when a user asks to open a webpage <10 Google
Bingbot Powers Bing Search and Bing Chat (Copilot) AI answers 1300 Official IP List User-agent: BingBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
Applebot-Extended Doesn’t crawl but controls how Apple uses Applebot data. <10 Official IP List User-agent: Applebot-Extended
Allow: /
Disallow: /private-folder
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15 (Applebot/0.1; +http://www.apple.com/go/applebot)
PerplexityBot AI search indexing for Perplexity’s answer engine 150 Official IP List User-agent: PerplexityBot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)
Perplexity-User AI agent for real-time browsing when Perplexity users request information <10 Official IP List User-agent: Perplexity-User
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)
Meta-ExternalAgent AI training data collection for Meta’s LLMs (Llama, etc.) 1100 Not available User-agent: meta-externalagent
Allow: /
Disallow: /private-folder
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
Meta-WebIndexer Used to improve Meta AI search. <10 Not available User-agent: Meta-WebIndexer
Allow: /
Disallow: /private-folder
meta-webindexer/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
Bytespider AI training data for ByteDance’s LLMs for products like TikTok <10 Not available User-agent: Bytespider
Allow: /
Disallow: /private-folder
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)
Amazonbot AI training for Alexa and other Amazon AI services 1050 Not available User-agent: Amazonbot
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214 Safari/537.36
DuckAssistBot AI search indexing for DuckDuckGo search engine 20 Official IP List User-agent: DuckAssistBot
Allow: /
Disallow: /private-folder
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)
MistralAI-User Mistral’s real-time citation fetcher for “Le Chat” assistant <10 Not available User-agent: MistralAI-User
Allow: /
Disallow: /private-folder
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; MistralAI-User/1.0; +https://docs.mistral.ai/robots)
Webz.io Data extraction and web scraping used by other AI training companies. Formerly known as Omgili. <10 Not available User-agent: webzio
Allow: /
Disallow: /private-folder
webzio (+https://webz.io/bot.html)
Diffbot Data extraction and web scraping used by companies all over the world. <10 Not available User-agent: Diffbot
Allow: /
Disallow: /private-folder
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729; Diffbot/0.1; +http://www.diffbot.com)
ICC-Crawler AI and machine learning data collection <10 Not available User-agent: ICC-Crawler
Allow: /
Disallow: /private-folder
ICC-Crawler/3.0 (Mozilla-compatible; ; https://ucri.nict.go.jp/en/icccrawler.html)
CCBot Open-source web archive used as training data by multiple AI companies <10 Official IP List User-agent: CCBot
Allow: /
Disallow: /private-folder
CCBot/2.0 (https://commoncrawl.org/faq/)
See also  New AI Models Make More Mistakes, Creating Risk for Marketers

The user-agent strings above have all been verified against Search Engine Journal server logs.

Popular AI Agent Crawlers With Unidentifiable User Agent

We’ve found that the following didn’t identify themselves:

  • you.com.
  • ChatGPT’s agent Operator.
  • Bing’s Copilot chat.
  • Grok.
  • DeepSeek.

There is no way to track this crawler from accessing webpages other than by identifying the explicit IP.

We set up a trap page (e.g., /specific-page-for-you-com/) and used the on-page chat to prompt you.com to visit it, allowing us to locate the corresponding visit record and IP address in our server logs. Below is the screenshot:

What About Agentic AI Browsers?

Unfortunately, AI browsers such as Comet or ChatGPT’s Atlas don’t differentiate themselves in the user agent string, and you can’t identify them in server logs and blend with normal users’ visits.

This is disappointing for SEOs because tracking agentic browser visits to a website is important for reporting POV.

How To Check What’s Crawling Your Server

Some hosting companies offer a user interface (UI) that makes it easy to access and look at server logs, depending on what hosting service you are using.

If your hosting doesn’t offer this, you can get server log files (usually located  /var/log/apache2/access.log in Linux-based servers) via FTP or request it from your server support to send it to you.

Once you have the log file, you can view and analyze it in either Google Sheets (if the file is in CSV format), Screaming Frog’s log analyzer, or, if your log file is less than 100 MB, you can try analyzing it with Gemini AI.

See also  10 Biggest & Best PPC Features Of The Year

How To Verify Legitimate Vs. Fake Bots

Fake crawlers can spoof legitimate user agents to bypass restrictions and scrape content aggressively. For example, anyone can impersonate ClaudeBot from their laptop and initiate crawl request from the terminal. In your server log, you will see it as Claudebot is crawling it:

curl -A 'Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; [email protected])' https://example.com

Verification can help to save server bandwidth and prevent harvesting content illegally. The most reliable verification method you can apply is checking the request IP.

Check all IPs and scan to match if it’s one of the officially declared IPs listed above. If so, you can allow the request; otherwise, block.

Various types of firewalls can help you with this via allowlist verified IPs (which allows legitimate bot requests to pass through), and all other requests impersonating AI crawlers in their user agent strings are blocked.

For example, in WordPress, you can use Wordfence free plugin to allowlist legitimate IPs from the official lists (as above) and add blocking custom rules as below:

The allowlist rule is superior, and it will let legitimate crawlers pass through and block any impersonation request which comes from different IPs.

However, please note that it is possible to spoof an IP address, and in that case, when bot user agent and IPs are spoofed, you won’t be able to block it.

Conclusion: Stay In Control Of AI Crawlers For Reliable AI Visibility

AI crawlers are now part of our web ecosystem, and the bots listed here represent the major AI platforms currently indexing the web, although this list is likely to grow.

See also  Brave Reveals Systemic Security Issues In AI Browsers

Check your server logs regularly to see what’s actually hitting your site and make sure you inadvertently don’t block AI crawlers if visibility in AI search engines is important for your business. If you don’t want AI crawlers to access your content, block them via robots.txt using the user-agent name.

We’ll keep this list updated as new crawlers emerge and update existing ones, so we recommend you bookmark this URL, or revisit this article on a regular basis to keep your AI crawler list up to date.

More Resources:


Featured Image: BestForBest/Shutterstock

You Might Also Like

8 GEO Strategies For Boosting AI Visibility in 2025

How To Use Paid Search & Social Ads For Promoting Events

What OpenAI’s Research Reveals About The Future Of AI Search

Instagram Testing 10-Minute Reels For Long-Form Video Content

A Strategic Approach To Boost SEO

Previous Article YouTube Title A/B Testing Rolls Out Globally To Creators YouTube Title A/B Testing Rolls Out Globally To Creators
Leave a comment Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

probizbeacon probizbeacon
probizbeacon probizbeacon

We are dedicated to providing accurate, timely, and in-depth coverage of financial trends, empowering professionals, entrepreneurs, and investors to make informed decisions..

Editor's Picks

YouTube Upgrades Shorts Editor With Timeline View
YouTube & Facebook On Top, TikTok Growing
8 Best Places To Cash A Check Without A Bank Account
Is Your Small Business Actually Ready to Adopt AI?

Follow Us on Socials

We use social media to react to breaking news, update supporters and share information

Facebook Twitter Telegram
  • About Us
  • Contact Us
  • Disclaimer
  • Privacy Policy
  • Terms of Service
Reading: Complete Crawler List For AI User-Agents [Dec 2025]
Share
© 2025 All Rights reserved | Powered by Probizbeacon
Welcome Back!

Sign in to your account

Lost your password?