Robots.txt Syntax Guide
Complete reference for the robots.txt file. Every directive explained with practical examples.
What Is robots.txt?
The robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they are allowed or not allowed to crawl.
It follows the Robots Exclusion Protocol, a standard respected by all major search engines including Google, Bing, and Yandex, and formalized as RFC 9309 in 2022. Note that robots.txt controls crawling, not indexing: a blocked page can still be indexed if other sites link to it. To keep a page out of search results, you need a noindex meta tag instead.
Basic Syntax
A robots.txt file consists of one or more groups. Each group starts with a User-agent line and contains one or more directives.
# This is a comment
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://example.com/sitemap.xml

Directives Reference
User-agent
Specifies which crawler the rules apply to. Use * to target all crawlers, or a specific bot name.
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /no-google/
User-agent: Bingbot
Disallow: /no-bing/

Disallow
Tells the crawler not to access the specified path. An empty Disallow: means nothing is disallowed.
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /api/

Allow
Overrides a Disallow for a more specific sub-path. When both an Allow and a Disallow rule match a URL, Google and Bing apply the most specific (longest) matching rule. Useful for allowing a single page inside a blocked directory.
User-agent: *
Disallow: /private/
Allow: /private/public-page.html

Sitemap
Points crawlers to your XML sitemap. This is independent of any User-agent group and can appear anywhere in the file. You can list multiple sitemaps.
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml

Crawl-delay
Specifies the number of seconds a crawler should wait between requests. Supported by Bing and Yandex. Google ignores this directive and manages Googlebot's crawl rate automatically (the Search Console crawl-rate setting has been retired).
User-agent: Bingbot
Crawl-delay: 10

Common Patterns
Block All Crawlers
User-agent: *
Disallow: /

Use this for staging or development environments that should never be crawled. Keep in mind that this alone does not guarantee the site stays out of search results; for staging sites, HTTP authentication or a noindex header is a more reliable safeguard.
Allow All Crawlers
User-agent: *
Disallow:
Sitemap: https://example.com/sitemap.xml

An empty Disallow means everything is accessible. Always include your sitemap.
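You can verify the empty-Disallow behavior locally with Python's standard urllib.robotparser module (a quick sketch with made-up URLs; the module implements plain path-prefix matching):

```python
from urllib import robotparser

# An "allow everything" robots.txt: an empty Disallow disallows nothing.
rules = """\
User-agent: *
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/"))          # True
print(rp.can_fetch("*", "https://example.com/any/page"))  # True
```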
Block Specific Paths
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Sitemap: https://example.com/sitemap.xml

Block admin pages, API endpoints, and URL parameters that create duplicate content. The * wildcard matches any sequence of characters and is supported by all major search engines.
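Prefix rules like these can be sanity-checked with Python's standard urllib.robotparser (a sketch with hypothetical paths; note the module only implements plain prefix matching, not the * wildcard extension, so the ?sort=/?filter= patterns are omitted here):

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```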
Block Specific Bots
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml

Block AI training crawlers while allowing search engines to access your content.
Common Mistakes
Blocking CSS and JavaScript
If you Disallow /css/ or /js/, Googlebot can't render your page properly. This hurts your rankings. Never block static assets.
# DON'T do this:
User-agent: *
Disallow: /css/
Disallow: /js/

Forgetting the Sitemap directive
The Sitemap line helps crawlers discover all your pages. Always include it, even if you've already submitted it in Search Console.
Using robots.txt to hide pages from Google
Disallow prevents crawling, not indexing. If other sites link to a blocked page, Google can still index the URL (without content). Use a noindex meta tag instead.
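For reference, a noindex directive can be set in the page's HTML head (an illustrative snippet; adapt to your templates):

```html
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent is an X-Robots-Tag: noindex HTTP response header. Importantly, the page must remain crawlable (not disallowed in robots.txt) for Google to see the noindex at all.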
Case sensitivity issues
Paths in robots.txt are case-sensitive. /Admin/ and /admin/ are treated as different paths. Double-check your casing.
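Python's standard urllib.robotparser demonstrates the pitfall (a sketch with hypothetical paths):

```python
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Only the lowercase path is blocked; /Admin/ is a different path.
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False
print(rp.can_fetch("*", "https://example.com/Admin/settings"))  # True
```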
Wrong file location
robots.txt must be at the root of your domain: https://example.com/robots.txt. It won't work in a subdirectory.
Complete Example Template
Here's a production-ready robots.txt template you can adapt for your website:
# Robots.txt for example.com
# Allow all search engines
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?utm_
Disallow: /*?ref=
# Block AI training crawlers
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /
# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml

Check Your Robots.txt
Validate your robots.txt file for syntax errors and misconfigurations.