Robots.txt Checker
Enter a URL to check its robots.txt file. Find syntax errors, review crawl rules by user-agent, and verify your sitemaps are declared.
What Is Robots.txt?
The robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they can or cannot access.
It follows the Robots Exclusion Protocol, a convention honored by all major search engines including Google, Bing, and Yahoo, and formally standardized as RFC 9309. While robots.txt is not a security mechanism (it is advisory, not enforceable), well-behaved crawlers will respect its directives.
A properly configured robots.txt file helps you manage your crawl budget, prevent indexing of duplicate or private content, and point crawlers to your XML sitemaps. Misconfigurations can accidentally block important pages from search engines, causing significant SEO damage.
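The kind of check this tool performs can be sketched with Python's standard urllib.robotparser module. The rules below are a made-up example file; a real checker would fetch the site's live robots.txt instead of parsing a literal.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only. Allow is listed before
# Disallow so that the standard-library parser (which applies the
# first matching rule) agrees with Google's longest-match resolution.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("*", "https://example.com/admin/secret"))      # False
print(parser.can_fetch("*", "https://example.com/admin/public/faq"))  # True
print(parser.can_fetch("*", "https://example.com/blog/post"))         # True
```

Note that paths not matched by any rule default to allowed, which is why the last check passes.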
Essential Robots.txt Directives
User-agent
Specifies which crawler the following rules apply to. Use * (asterisk) to target all crawlers, or specify individual bots like Googlebot, Bingbot, or GPTBot.
User-agent: *
Disallow
Tells the crawler not to access a specific URL path or directory. An empty Disallow value means nothing is blocked for that user-agent.
Disallow: /admin/
Allow
Explicitly permits access to a path within a disallowed directory. Useful for making exceptions. Supported by Google and Bing.
Allow: /admin/public/
Sitemap
Points crawlers to your XML sitemap. Unlike other directives, Sitemap is not tied to any user-agent and applies globally. You can list multiple sitemaps.
Sitemap: https://example.com/sitemap.xml
Crawl-delay
Requests that the crawler wait a specified number of seconds between requests. Supported by Bing and Yandex but ignored by Google (use Search Console instead).
Crawl-delay: 10
Common Robots.txt Mistakes
Blocking CSS and JS Files
Blocking CSS or JavaScript files prevents search engines from rendering your page, which hurts rankings. Google needs access to these resources to evaluate your content properly.
Blocking the Entire Site
A single Disallow: / under User-agent: * blocks all crawlers from your entire site. This is sometimes left in place accidentally after a staging or pre-launch phase.
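The offending pair of lines looks harmless but shuts out every compliant crawler:

```text
User-agent: *
Disallow: /
```

By contrast, a Disallow line with an empty value blocks nothing, so the difference between the two is a single character.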
Missing Sitemap Declaration
Not including a Sitemap directive means crawlers have to discover your sitemap through other means. Always declare your sitemaps in robots.txt for faster discovery.
Using Robots.txt for Privacy
Robots.txt is publicly readable and only advisory. Never rely on it to hide sensitive pages. Use authentication, meta noindex tags, or HTTP headers instead.
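For pages that must stay out of search results, the standard alternatives look like this (the directive names are standard; where you apply them is up to your site). In the page's HTML head:

```html
<meta name="robots" content="noindex">
```

Or as an HTTP response header, which also works for non-HTML files such as PDFs:

```text
X-Robots-Tag: noindex
```

Keep in mind that crawlers can only see a noindex directive on pages they are allowed to fetch, so do not combine it with a robots.txt Disallow for the same URL.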
Incorrect Path Syntax
Paths in robots.txt are case-sensitive and must start with a forward slash. A typo or missing slash can silently invalidate your rule.
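The case-sensitivity trap is easy to demonstrate with Python's standard urllib.robotparser; the /Admin/ path below is a hypothetical typo for a real /admin/ directory.

```python
from urllib.robotparser import RobotFileParser

# A rule written with the wrong case silently fails to match.
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /Admin/",  # typo: the real directory is /admin/
])

print(parser.can_fetch("*", "https://example.com/admin/settings"))  # True: still crawlable
print(parser.can_fetch("*", "https://example.com/Admin/settings"))  # False
```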
Wrong File Location
The robots.txt file must be at the root of your domain. Placing it in a subdirectory (e.g., /blog/robots.txt) will be ignored by crawlers.
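Putting the directives above together, a complete robots.txt served from the domain root might look like this (the paths and sitemap URL are illustrative):

```text
User-agent: *
Allow: /admin/public/
Disallow: /admin/

User-agent: Bingbot
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```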