
Robots.txt Syntax Guide

Complete reference for the robots.txt file. Every directive explained with practical examples.

What Is robots.txt?

The robots.txt file is a plain text file placed at the root of your website (e.g. https://example.com/robots.txt) that tells search engine crawlers which pages or sections of your site they are allowed or not allowed to crawl.

It follows the Robots Exclusion Protocol, a standard respected by all major search engines, including Google, Bing, and Yandex. Note that robots.txt controls crawling, not indexing: a blocked page can still end up in search results if other sites link to it. To keep a page out of the index, use a noindex meta tag instead.

Basic Syntax

A robots.txt file consists of one or more groups. Each group starts with a User-agent line and contains one or more directives.

robots.txt
# This is a comment
User-agent: *
Disallow: /private/
Allow: /public/

Sitemap: https://example.com/sitemap.xml

Directives Reference

User-agent

Specifies which crawler the rules apply to. Use * to target all crawlers, or a specific bot name.

Target all crawlers
User-agent: *
Disallow: /admin/
Target a specific crawler
User-agent: Googlebot
Disallow: /no-google/

User-agent: Bingbot
Disallow: /no-bing/
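Keep in mind that a crawler obeys only the single most specific group that matches its name, not a merger of every group. A bot with its own named group ignores the * group entirely:

```txt
User-agent: Googlebot
Disallow: /no-google/

User-agent: *
Disallow: /private/

# Googlebot follows only its own group above, so it can still
# crawl /private/ unless you repeat that rule in its group.
```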

Disallow

Tells the crawler not to access any URL whose path starts with the specified value (matching is by prefix). An empty Disallow: means nothing is disallowed.

Block a directory
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /api/

Allow

Overrides a Disallow for a specific sub-path. Useful for allowing a page inside a blocked directory.

Allow within a blocked directory
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
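When Allow and Disallow both match a URL, Google resolves the conflict with the most specific (longest) matching rule, which is why the Allow above wins for that one page. File order does not matter for Google, though some simpler parsers do apply rules in order:

```txt
User-agent: *
Disallow: /private/                # 9-character match
Allow: /private/public-page.html   # longer match wins, so this URL is crawlable
```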

Sitemap

Points crawlers to your XML sitemap. This is independent of any User-agent group and can appear anywhere in the file. You can list multiple sitemaps.

Sitemap directive
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml

Crawl-delay

Specifies the number of seconds a crawler should wait between requests. Bing and Yandex support it; Google ignores this directive, so use Google Search Console to control Googlebot's crawl rate instead.

Crawl-delay
User-agent: Bingbot
Crawl-delay: 10

Common Patterns

Block All Crawlers

robots.txt
User-agent: *
Disallow: /

Use this for staging or development environments that should never be crawled. Remember that robots.txt is advisory only; for real protection, put staging sites behind HTTP authentication as well.

Allow All Crawlers

robots.txt
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

An empty Disallow means everything is accessible. Always include your sitemap.

Block Specific Paths

robots.txt
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=

Sitemap: https://example.com/sitemap.xml

Block admin pages, API endpoints, and URL parameters that create duplicate content.
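The parameter patterns above rely on wildcard support: * matches any sequence of characters, and $ anchors a pattern to the end of the URL. Google and Bing support both, but they are extensions to the original protocol, so some minor crawlers may ignore them:

```txt
User-agent: *
Disallow: /*?sort=     # any URL containing ?sort=
Disallow: /*.pdf$      # any URL ending in .pdf
```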

Block Specific Bots

robots.txt
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml

Block AI training crawlers while allowing search engines to access your content.

Common Mistakes

Blocking CSS and JavaScript

If you Disallow /css/ or /js/, Googlebot can't render your page properly. This hurts your rankings. Never block static assets.

# DON'T do this:
User-agent: *
Disallow: /css/
Disallow: /js/

Forgetting the Sitemap directive

The Sitemap line helps crawlers discover all your pages. Always include it, even if you've already submitted it in Search Console.

Using robots.txt to hide pages from Google

Disallow prevents crawling, not indexing. If other sites link to a blocked page, Google can still index the URL (without content). Use a noindex meta tag instead.
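To remove a page from the index, let crawlers fetch it and serve a noindex directive in the page itself:

```html
<!-- In the page's <head>: -->
<meta name="robots" content="noindex">
```

For non-HTML files such as PDFs, the equivalent is an X-Robots-Tag: noindex HTTP response header. Either way, the page must remain crawlable, or the crawler will never see the directive.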

Case sensitivity issues

Paths in robots.txt are case-sensitive. /Admin/ and /admin/ are treated as different paths. Double-check your casing.

Wrong file location

robots.txt must be at the root of your domain: https://example.com/robots.txt. It won't work in a subdirectory, and each subdomain needs its own file (https://blog.example.com/robots.txt is separate from https://example.com/robots.txt).

Complete Example Template

Here's a production-ready robots.txt template you can adapt for your website:

robots.txt (template)
# Robots.txt for example.com

# Allow all search engines
User-agent: *
Disallow: /admin/
Disallow: /api/
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?utm_
Disallow: /*?ref=

# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Sitemaps
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/blog-sitemap.xml

Check Your Robots.txt

Validate your robots.txt file for syntax errors and misconfigurations.
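You can also sanity-check rules locally with Python's standard library. A minimal sketch (note that urllib.robotparser applies rules in file order rather than Google's longest-match rule, so results can differ for overlapping Allow/Disallow pairs):

```python
from urllib.robotparser import RobotFileParser

# Parse rules from a string instead of fetching them over HTTP.
rules = """\
User-agent: *
Disallow: /private/
Disallow: /api/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(useragent, url) answers: may this agent crawl this URL?
print(parser.can_fetch("*", "https://example.com/private/report.html"))  # False
print(parser.can_fetch("*", "https://example.com/blog/post.html"))       # True
```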
