Robots.txt Checker

Enter a URL to check its robots.txt file. Find syntax errors, review crawl rules by user-agent, and verify your sitemaps are declared.

What Is Robots.txt?

The robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they can or cannot access.

It follows the Robots Exclusion Protocol, a long-standing convention standardized in 2022 as RFC 9309 and honored by all major search engines including Google and Bing. While robots.txt is not a security mechanism (it is advisory, not enforceable), well-behaved crawlers will respect its directives.

A properly configured robots.txt file helps you manage your crawl budget, prevent indexing of duplicate or private content, and point crawlers to your XML sitemaps. Misconfigurations can accidentally block important pages from search engines, causing significant SEO damage.
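The crawl-permission check that well-behaved crawlers perform can be sketched with Python's standard-library urllib.robotparser. The rules below are hypothetical, stand-ins for a real site's robots.txt:

```python
from urllib import robotparser

# Parse hypothetical robots.txt rules and test whether a
# given URL may be crawled by a given user-agent.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("Googlebot", "https://example.com/page"))      # True
print(rp.can_fetch("Googlebot", "https://example.com/private/"))  # False
```

In production you would call rp.set_url("https://example.com/robots.txt") followed by rp.read() to fetch the live file instead of parsing a hard-coded list.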

Essential Robots.txt Directives

User-agent

Specifies which crawler the following rules apply to. Use * (asterisk) to target all crawlers, or specify individual bots like Googlebot, Bingbot, or GPTBot.

User-agent: *

Disallow

Tells the crawler not to access a specific URL path or directory. An empty Disallow value means nothing is blocked for that user-agent.

Disallow: /admin/

Allow

Explicitly permits access to a path within a disallowed directory, which is useful for carving out exceptions. Supported by Google and Bing; when Allow and Disallow rules conflict, Google applies the most specific (longest) matching path.

Allow: /admin/public/
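The Allow/Disallow interaction can be verified with Python's standard-library robotparser. One caveat: CPython's parser applies rules in file order (first match wins) rather than by longest path, so in this sketch the Allow line must precede the broader Disallow for the exception to take effect:

```python
from urllib import robotparser

# Hypothetical rules; Allow is listed first so CPython's
# first-match evaluation honors the exception.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /admin/public/",
    "Disallow: /admin/",
])

print(rp.can_fetch("*", "https://example.com/admin/secret"))    # False
print(rp.can_fetch("*", "https://example.com/admin/public/x"))  # True
```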

Sitemap

Points crawlers to your XML sitemap. Unlike other directives, Sitemap is not tied to any user-agent and applies globally. You can list multiple sitemaps.

Sitemap: https://example.com/sitemap.xml

Crawl-delay

Requests that the crawler wait a specified number of seconds between requests. Supported by Bing and Yandex but ignored by Google (use Search Console instead).

Crawl-delay: 10
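Both of these directives are also exposed by Python's standard-library robotparser; a minimal sketch with hypothetical rules (site_maps() requires Python 3.8+):

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Crawl-delay: 10",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
    "Sitemap: https://example.com/news-sitemap.xml",
])

print(rp.site_maps())       # list of both declared sitemap URLs
print(rp.crawl_delay("*"))  # 10
```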

Common Robots.txt Mistakes

Blocking CSS and JS Files

Blocking CSS or JavaScript files prevents search engines from rendering your page, which hurts rankings. Google needs access to these resources to evaluate your content properly.

Blocking the Entire Site

A single Disallow: / under User-agent: * blocks all crawlers from your entire site. This is sometimes left in place accidentally after a staging or pre-launch phase.

Missing Sitemap Declaration

Not including a Sitemap directive means crawlers have to discover your sitemap through other means. Always declare your sitemaps in robots.txt for faster discovery.

Using Robots.txt for Privacy

Robots.txt is publicly readable and only advisory. Never rely on it to hide sensitive pages. Use authentication, meta noindex tags, or HTTP headers instead.

Incorrect Path Syntax

Paths in robots.txt are case-sensitive and must start with a forward slash. A typo or missing slash can silently invalidate your rule.
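Case sensitivity is easy to demonstrate with Python's standard-library robotparser (the paths here are hypothetical):

```python
from urllib import robotparser

# A rule blocking /Private/ does NOT block /private/:
# path matching in robots.txt is case-sensitive.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /Private/",
])

print(rp.can_fetch("*", "https://example.com/private/doc"))  # True (case differs)
print(rp.can_fetch("*", "https://example.com/Private/doc"))  # False
```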

Wrong File Location

The robots.txt file must be at the root of your domain. Placing it in a subdirectory (e.g., /blog/robots.txt) will be ignored by crawlers.

Frequently Asked Questions

Does every website need a robots.txt file?
Not strictly, but it is strongly recommended. Without a robots.txt file, crawlers will attempt to access all pages on your site. Having one gives you control over crawl behavior, helps manage crawl budget, and lets you declare your sitemaps.

Can robots.txt prevent a page from appearing in Google?
No. Robots.txt blocks crawling, not indexing. If other sites link to a blocked page, Google may still index the URL (without content). To prevent indexing, use a meta noindex tag or X-Robots-Tag HTTP header instead.

What happens if robots.txt has syntax errors?
Most crawlers are lenient with minor formatting issues, but significant errors can cause rules to be ignored entirely. For example, missing the User-agent line before a Disallow will make that rule ineffective. Our checker identifies these syntax problems.

How often do search engines check robots.txt?
Google typically caches your robots.txt for up to 24 hours. Changes may take a day or more to take effect. If the file becomes unreachable (HTTP 5xx), Google will use the last cached version for up to 30 days.

Can I use robots.txt to block AI crawlers?
Yes. Many AI companies have published their crawler user-agent names (e.g., GPTBot, ChatGPT-User, Google-Extended, CCBot). You can add Disallow rules for these user-agents. However, compliance is voluntary and not all AI crawlers respect robots.txt.

What is the difference between Disallow and noindex?
Disallow in robots.txt prevents crawlers from fetching a page. The meta noindex tag (or X-Robots-Tag header) tells search engines not to include a page in their index. They serve different purposes and can be used together, but note that if you block crawling, the crawler cannot see the noindex tag.
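For comparison, keeping a page crawlable but out of the index is done on the page itself rather than in robots.txt. An illustrative meta tag placed in the page's HTML head:

```
<meta name="robots" content="noindex">
```

Or the equivalent HTTP response header, useful for non-HTML resources such as PDFs:

```
X-Robots-Tag: noindex
```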

Related Tools