Technical Deep Dives
Robots.txt Configuration

The robots.txt file tells search engine crawlers which pages or sections of your site they can or cannot access. It's a text file placed in your site's root directory that provides crawling instructions.

How Robots.txt Works

1. Crawler arrives - a bot visits your site.
2. Checks robots.txt - it fetches /robots.txt before anything else.
3. Follows the rules - it crawls only the pages the rules allow.
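In code, steps 2 and 3 amount to parsing the fetched robots.txt and consulting it before each request. A minimal sketch using Python's stdlib urllib.robotparser (the function name and rules are illustrative, and step 1's network fetch is assumed to have already happened):

```python
from urllib.robotparser import RobotFileParser

def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Steps 2-3: parse the already-fetched robots.txt, then ask
    whether this user agent may crawl the given URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Hypothetical rules: block /private/ for every crawler
rules = "User-agent: *\nDisallow: /private/\n"
print(may_crawl(rules, "MyBot", "https://example.com/private/report"))  # False
print(may_crawl(rules, "MyBot", "https://example.com/index.html"))      # True
```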

Basic Syntax

# Comment - ignored by crawlers

User-agent: *
Disallow: /private/
Allow: /private/public-page.html

User-agent: Googlebot
Disallow: /no-google/

Sitemap: https://example.com/sitemap.xml
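These rules can be checked with Python's stdlib urllib.robotparser. One caveat in this sketch: the stdlib parser applies the first matching rule in file order, so the Allow line is listed before the Disallow it overrides; Google instead picks the most specific (longest) matching rule regardless of order.

```python
from urllib.robotparser import RobotFileParser

# The basic-syntax rules above, with Allow listed first for the
# stdlib parser's first-match semantics
rules = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/private/secret.html"))          # False
print(rp.can_fetch("*", "/private/public-page.html"))     # True
print(rp.can_fetch("Googlebot", "/no-google/page.html"))  # False
```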

Robots.txt Directives

| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks access to the specified path | Disallow: /admin/ |
| Allow | Permits access (overrides Disallow) | Allow: /admin/public/ |
| Sitemap | Location of the XML sitemap | Sitemap: https://... |
| Crawl-delay | Seconds between requests (not supported by Google) | Crawl-delay: 10 |

Common User Agents

  • * - All crawlers
  • Googlebot - Google's main crawler
  • Googlebot-Image - Google Images
  • Googlebot-News - Google News
  • Bingbot - Microsoft Bing
  • Slurp - Yahoo
  • DuckDuckBot - DuckDuckGo
  • Baiduspider - Baidu

Pattern Matching

| Pattern | Matches | Example |
|---|---|---|
| * | Any sequence of characters | Disallow: /*.php |
| $ | End of the URL | Disallow: /*.php$ |
| / | Root or path separator | Disallow: /folder/ |
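This wildcard matching can be approximated by translating a rule's path pattern into a regular expression. The helper below is an illustrative sketch, not a complete robots.txt matcher; the function name rule_matches is invented here:

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Translate a robots.txt path pattern into a regex:
    '*' matches any character sequence, and a trailing '$'
    anchors the pattern to the end of the URL."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

print(rule_matches("/*.php$", "/index.php"))          # True
print(rule_matches("/*.php$", "/index.php?x=1"))      # False
print(rule_matches("/folder/", "/folder/page.html"))  # True
```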

Common Robots.txt Examples

Block Everything

User-agent: *
Disallow: /

Allow Everything

User-agent: *
Disallow:

Block Specific Folder

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/

Block URL Parameters

User-agent: *
Disallow: /*?*
Disallow: /*&*

Robots.txt vs Noindex

robots.txt

  • Blocks crawling
  • Page may still be indexed (via external links)
  • Saves crawl budget
  • Crawlers cannot see a noindex tag on a blocked page

noindex

  • Allows crawling
  • Prevents indexing
  • Page won't appear in search results
  • Best for removing pages from search

Important: Don't use robots.txt to hide pages from search. If external sites link to a blocked page, it can still appear in search results. Use noindex instead.
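For HTML pages, the noindex directive is a meta tag in the page's head:

```html
<!-- In the <head> of a page that should stay out of search results -->
<meta name="robots" content="noindex">
```

For non-HTML resources (PDFs, images), the same directive can be sent as an HTTP response header: X-Robots-Tag: noindex. Either way, the page must remain crawlable (not blocked in robots.txt) so the crawler can actually see the directive.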

Common Mistakes

  1. Blocking CSS/JS files - Prevents Google from rendering pages properly
  2. Using robots.txt for security - It's public and not a security measure
  3. Blocking entire site accidentally - Forgetting to update after development
  4. Syntax errors - Case sensitivity, missing colons, wrong paths
  5. Conflicting rules - Precedence varies by crawler: Google applies the most specific (longest) matching rule, while some parsers simply use the first match

Testing Your Robots.txt

  • Use the robots.txt report in Google Search Console (the successor to the retired robots.txt Tester)
  • Check that critical pages aren't accidentally blocked
  • Verify CSS and JS files are accessible
  • Test after any changes
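A check along these lines can be scripted with Python's stdlib urllib.robotparser; the rules and paths below are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that accidentally blocks asset folders
robots = """\
User-agent: *
Disallow: /css/
Disallow: /js/
"""

rp = RobotFileParser()
rp.parse(robots.splitlines())

# Critical URLs plus rendering resources that must stay crawlable
critical = ["/", "/products/widget", "/css/main.css", "/js/app.js"]
blocked = [p for p in critical if not rp.can_fetch("Googlebot", p)]
print(blocked)  # ['/css/main.css', '/js/app.js'] -> fix before deploying
```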

External Resources