The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It's a text file placed in your site's root directory that provides crawling instructions.
A crawler follows a simple sequence:

1. Bot visits your site
2. Fetches `/robots.txt` first
3. Crawls allowed pages only
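This fetch-then-check flow can be sketched with Python's standard-library `urllib.robotparser`. A live crawler would point `RobotFileParser` at the site's `/robots.txt` URL and call `read()`; here the file is inlined so the sketch runs offline, and the rules and the `MyCrawler` name are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt; a live crawler would instead do:
#   rp = RobotFileParser("https://example.com/robots.txt"); rp.read()
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())  # step 2: fetch and parse /robots.txt

# Step 3: crawl allowed pages only.
for url in ("https://example.com/index.html",
            "https://example.com/private/notes.html"):
    if rp.can_fetch("MyCrawler", url):
        print("crawl", url)
    else:
        print("skip", url)
```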
A basic robots.txt looks like this:

```
# Comment - ignored by crawlers
User-agent: *
Disallow: /private/
Allow: /private/public-page.html

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml
```
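The example above can be checked programmatically with Python's standard-library `urllib.robotparser`. One caveat: Python's parser evaluates rules in the order they appear, while Google applies the most specific (longest) matching rule, so the `Allow` override on `/private/public-page.html` may be judged differently; the checks below avoid that edge case:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /private/public-page.html

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Each crawler obeys only the group that matches its user agent.
print(rp.can_fetch("*", "https://example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "https://example.com/about.html"))           # True
print(rp.can_fetch("Googlebot", "https://example.com/no-google/x"))  # False
print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

`site_maps()` requires Python 3.8 or later.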
| Directive | Purpose | Example |
|---|---|---|
| `User-agent` | Specifies which crawler the rules apply to | `User-agent: Googlebot` |
| `Disallow` | Blocks access to the specified path | `Disallow: /admin/` |
| `Allow` | Permits access (overrides `Disallow`) | `Allow: /admin/public/` |
| `Sitemap` | Location of the XML sitemap | `Sitemap: https://...` |
| `Crawl-delay` | Seconds between requests (ignored by Google) | `Crawl-delay: 10` |
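How these directives combine can be verified with a small sketch, again using `urllib.robotparser` (the rules here are illustrative, and `crawl_delay()` is available in Python 3.6+):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules exercising Disallow and Crawl-delay together.
rules = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
print(rp.crawl_delay("*"))                                   # 10
```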
Common user agents:

- `*` - All crawlers
- `Googlebot` - Google's main crawler
- `Googlebot-Image` - Google Images
- `Googlebot-News` - Google News
- `Bingbot` - Microsoft Bing
- `Slurp` - Yahoo
- `DuckDuckBot` - DuckDuckGo
- `Baiduspider` - Baidu

| Pattern | Matches | Example |
|---|---|---|
| `*` | Any sequence of characters | `Disallow: /*.php` |
| `$` | End of URL | `Disallow: /*.php$` |
| `/` | Root or path separator | `Disallow: /folder/` |
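Google-style pattern matching can be approximated by translating a rule into a regular expression: `*` becomes `.*`, a trailing `$` anchors the end of the URL, and everything else is a literal prefix match. `rule_to_regex` and `rule_matches` are hypothetical helper names for this sketch:

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    # Escape regex metacharacters, then restore the robots.txt wildcards:
    # '*' matches any character sequence; a trailing '$' anchors the end.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.compile(pattern)

def rule_matches(rule: str, path: str) -> bool:
    # Rules are prefix matches unless anchored with '$'.
    return rule_to_regex(rule).match(path) is not None

print(rule_matches("/*.php", "/index.php"))       # True
print(rule_matches("/*.php$", "/index.php?x=1"))  # False
print(rule_matches("/folder/", "/folder/page"))   # True
```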
Block all crawlers from the entire site:

```
User-agent: *
Disallow: /
```
Allow all crawlers full access (an empty `Disallow` blocks nothing):

```
User-agent: *
Disallow:
```
Block specific directories:

```
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
```
Block URLs containing query parameters:

```
User-agent: *
Disallow: /*?*
Disallow: /*&*
```