robots.txt for SEO: A Practical 2026 Guide

robots.txt is a small text file with outsized power: one wrong line can hide your entire site from Google, and one common misunderstanding can leave private pages in search results anyway. It lives at the root of your domain and tells crawlers which URLs they may request. In 2026 it remains one of the first things to check when traffic drops, because it sits upstream of everything else.

This guide covers what robots.txt does and does not do, the syntax that matters, the critical difference between robots.txt and noindex, how WordPress handles the file, and the mistakes a crawl-based audit catches before they cost you rankings.

What Is robots.txt?

robots.txt is a plain-text file at yoursite.com/robots.txt that gives crawlers instructions about which parts of your site they may or may not request. It follows the Robots Exclusion Protocol, which well-behaved bots like Googlebot and Bingbot respect.

Two things are essential to understand from the start:

It is a request, not a wall. Reputable crawlers obey it. Malicious bots and scrapers can ignore it entirely.
It is not a privacy tool. Disallowing a URL does not hide it. Anything truly private must sit behind authentication, not a robots.txt rule.

robots.txt controls crawl access. It is the traffic cop at the entrance, not the lock on the door.

robots.txt Syntax: The Directives That Matter

A robots.txt file is made of groups. Each group names a crawler with User-agent and then lists the rules that apply to it. The directives you actually use are few:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /cart/
Disallow: /checkout/

Sitemap: https://example.com/sitemap_index.xml

What each line does:

User-agent names the bot the rules apply to. * means all crawlers.
Disallow tells the bot not to request a path. Disallow: / blocks the whole site.
Allow carves an exception out of a disallowed path.
Sitemap points crawlers to your XML sitemap. The URL must be absolute, and the line applies regardless of the user-agent groups around it.
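How an Allow line carves an exception out of a Disallow can be sketched in a few lines of Python. This is a simplified illustration, not a full parser: it assumes the precedence Google documents and RFC 9309 describes (the longest matching path wins, and a tie goes to Allow), handles plain path prefixes only, and ignores the * and $ wildcards. The names is_allowed and RULES are made up for this example.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled under one user-agent group.

    `rules` is a list of (directive, pattern) pairs such as
    ("Disallow", "/wp-admin/"). Simplification: patterns are treated as
    plain prefixes, so the * and $ wildcards are not supported.
    """
    best_len = -1
    allowed = True  # a path that matches no rule may be crawled
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns match nothing; skip non-matching rules
        is_allow = directive.lower() == "allow"
        # The longest matching pattern wins; on a tie, Allow wins.
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len = len(pattern)
            allowed = is_allow
    return allowed


# The group from the sample file above.
RULES = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/cart/"),
    ("Disallow", "/checkout/"),
]

print(is_allowed(RULES, "/wp-admin/admin-ajax.php"))  # Allow is the longer match
print(is_allowed(RULES, "/wp-admin/options.php"))     # only the Disallow matches
```

The Allow line wins for admin-ajax.php because its path is longer than /wp-admin/, not because of where it sits in the file. Parsers that apply rules in file order instead can give a different answer for the same file.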

One of the most dangerous lines in SEO is Disallow: / left in a live robots.txt after a site launch. It tells every crawler to stay out of the entire site. It is also an easy mistake to make, because staging sites are often configured with exactly that rule and it gets copied to production.
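A leftover site-wide block is easy to test for. The sketch below uses Python's standard-library urllib.robotparser and asks whether the homepage path may be fetched; the function name blocks_entire_site and the default user agent are choices made for this example. Note that the standard-library parser applies rules in file order and does not implement wildcards, so treat it as a quick check rather than a replica of Google's parser.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows the root path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # If "/" itself is off limits, a prefix rule like "Disallow: /" is in force.
    return not parser.can_fetch(user_agent, "/")


STAGING = "User-agent: *\nDisallow: /\n"
PRODUCTION = "User-agent: *\nDisallow: /wp-admin/\n"

print(blocks_entire_site(STAGING))
print(blocks_entire_site(PRODUCTION))
```

A check like this fits naturally into a deploy script, so that a staging rule copied to production fails the release instead of going live.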

robots.txt vs noindex: A Critical Difference

This is the distinction that trips up even experienced site owners, and getting it wrong does the opposite of what you intend.

robots.txt Disallow controls crawling. It asks a bot not to fetch the URL.
noindex (a meta tag or HTTP header) controls indexing. It tells search engines not to list the page in results.

The trap: if you want a page out of Google, blocking it in robots.txt is the wrong move. When a URL is disallowed, Google cannot fetch it, which means Google never sees the noindex tag. If anything links to that URL, Google can index it anyway, showing the bare URL with no description. To remove a page from results, you do the reverse: allow crawling and add noindex, so Google can fetch the page and read the instruction to drop it.

Goal	Use	Do NOT use
Keep a page out of search results	`noindex` (and allow crawling)	robots.txt `Disallow`
Stop a bot wasting crawl budget on a section	robots.txt `Disallow`	noindex alone
Hide truly private content	Authentication / password	robots.txt or noindex

A page that is both disallowed in robots.txt and tagged noindex is a contradiction: Google cannot read the noindex because it is not allowed to crawl the page. That conflict is one of the most common findings on real sites.
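The conflict can be detected mechanically once you have the robots.txt text and the page's HTML and response headers. The following is a minimal sketch under stated assumptions: it uses only the Python standard library, looks for noindex in a robots or googlebot meta tag or in an X-Robots-Tag header, and relies on urllib.robotparser, which applies rules in file order and ignores wildcards. The function names has_noindex and find_conflict are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Records whether a robots/googlebot meta tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def has_noindex(html, headers=None):
    """True if the page asks not to be indexed via header or meta tag."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    if "noindex" in lowered.get("x-robots-tag", "").lower():
        return True
    meta = _RobotsMetaParser()
    meta.feed(html)
    return meta.noindex


def find_conflict(robots_txt, url, html, headers=None, user_agent="Googlebot"):
    """True if the URL is disallowed AND carries a noindex it can never show."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    disallowed = not parser.can_fetch(user_agent, url)
    return disallowed and has_noindex(html, headers)


ROBOTS = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(find_conflict(ROBOTS, "https://example.com/private/page", PAGE))
```

When find_conflict returns True, the fix is the one described above: lift the Disallow for that URL so the noindex can be read.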

robots.txt on WordPress

WordPress creates a virtual robots.txt at yoursite.com/robots.txt automatically. You will not find a file on disk by default; WordPress generates it on request. SEO plugins hook into this:

Yoast and Rank Math both provide a robots.txt editor in their tools section, writing to that virtual file.
A physical robots.txt file in your site root, if one exists, overrides the virtual one entirely. This is a frequent source of confusion: you edit the plugin setting, but an old physical file is what is actually served.

Two WordPress-specific cautions:

Do not block /wp-content/ wholesale. It contains your themes’ CSS and JS, and your images. Blocking it stops Google from rendering and understanding your pages.
Always verify the live file. Open the URL in a browser. What Google sees is whatever that URL returns; if a physical file is overriding the virtual one, the rules in your plugin’s text box are not the ones being served.
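Verifying the live file can also be scripted. The sketch below fetches robots.txt from a site root and diffs it against the rules you believe you configured, using only the Python standard library. The function names, the "plugin setting" and "live file" labels, and the example.com address are assumptions for illustration; nothing here reads the plugin's settings, so you paste the expected rules in yourself.

```python
import difflib
import urllib.request


def fetch_live_robots(site_root, timeout=10):
    """Download the robots.txt that is actually served at the site root."""
    url = site_root.rstrip("/") + "/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def diff_against_expected(expected, live):
    """Return unified-diff lines between expected and live rules.

    Blank lines and surrounding whitespace are ignored. An empty list
    means the live file matches what you expected.
    """
    expected_lines = [ln.strip() for ln in expected.splitlines() if ln.strip()]
    live_lines = [ln.strip() for ln in live.splitlines() if ln.strip()]
    return list(
        difflib.unified_diff(
            expected_lines,
            live_lines,
            fromfile="plugin setting",
            tofile="live file",
            lineterm="",
        )
    )


EXPECTED = "User-agent: *\nDisallow: /wp-admin/\n"
STALE_PHYSICAL_FILE = "User-agent: *\nDisallow: /\n"

for line in diff_against_expected(EXPECTED, STALE_PHYSICAL_FILE):
    print(line)
```

In real use you would pass fetch_live_robots("https://example.com") as the second argument. Any diff output means something other than your plugin setting is being served, most often an old physical file.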

For the separate question of which AI crawlers to allow or block, see the dedicated guide to robots.txt rules for AI bots. Those rules sit alongside your SEO rules in the same file.

Common robots.txt Mistakes an Audit Catches

robots.txt mistakes are quiet and expensive, because the file looks fine until you realize what it is blocking. A crawl-based audit reads the live file and tests actual access:

Disallow: / in production, blocking the entire site, usually left over from staging.
Blocked CSS/JS, which stops Google from rendering the page correctly.
Important pages disallowed, removing them from crawl and eventually from rankings.
robots.txt used as privacy, leaving “hidden” URLs indexable and even listed in the public robots file as a map of what you wanted to hide.
Missing Sitemap directive, so crawlers are not pointed at your sitemap.
robots.txt vs noindex conflicts, where a page is both blocked and noindexed, so the noindex never takes effect.

Mistake	Why it costs you	The fix
`Disallow: /` live	Blocks crawling of the whole site	Remove it; allow crawling of public content
Blocked CSS / JS	Google misrenders the page	Allow `/wp-content/` assets
Privacy by robots.txt	Pages stay indexable and exposed	Use auth for private content
No `Sitemap` line	Crawlers miss your sitemap	Add the absolute sitemap URL
Disallow + noindex on same URL	noindex never read	Allow crawl so noindex can apply

How to Test and Audit robots.txt

Testing robots.txt means reading the live file, confirming how Google interprets it, and verifying that the pages you care about are actually reachable. Three layers cover it:

Read the live file. Open yoursite.com/robots.txt in a browser. This is the source of truth, not your plugin setting.
Use Google Search Console. Its robots.txt report shows how Google fetched and parsed the file and flags errors.
Run a crawl-based audit. A whole-site crawl tests whether your important URLs are actually reachable, catches blocked assets, finds the robots-vs-noindex conflicts, and confirms the sitemap directive is present.
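The third layer can be approximated on a small scale. The sketch below is not a crawler: it takes the robots.txt text and a hand-picked list of URLs you care about (pages and assets), and reports which are blocked and whether a Sitemap line is present. It uses Python's urllib.robotparser (site_maps() needs Python 3.8 or later), which applies rules in file order and ignores wildcards, so results can differ from Google's on files that depend on either. The function name audit_crawl_access and the sample URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def audit_crawl_access(robots_txt, important_urls, user_agent="Googlebot"):
    """Report blocked URLs and sitemap presence for one robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = [u for u in important_urls if not parser.can_fetch(user_agent, u)]
    sitemaps = parser.site_maps() or []  # site_maps() returns None when absent
    return {
        "blocked": blocked,
        "sitemaps": sitemaps,
        "missing_sitemap": not sitemaps,
    }


ROBOTS_TXT = "User-agent: *\nDisallow: /wp-content/\nDisallow: /cart/\n"
URLS = [
    "https://example.com/",
    "https://example.com/wp-content/themes/site/style.css",
    "https://example.com/blog/",
]

report = audit_crawl_access(ROBOTS_TXT, URLS)
print(report["blocked"])
print(report["missing_sitemap"])
```

Including a stylesheet or script URL in the list is a cheap way to catch the blocked-assets problem, since those files are rarely checked by hand.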

Yoast and Rank Math give you the editor to write the rules. They do not crawl your whole site and tell you that rule three is blocking a section you wanted indexed, or that a page is disallowed and noindexed at the same time. That whole-site verification is what Aetos SEO does: it reads your live robots.txt, tests crawl access, and reports the conflicts so you can fix the highest-impact ones first. It does not edit your robots.txt; it tells you exactly where the rules work against you. For the AI-crawler side of the same file, read the robots.txt for AI bots guide.

Frequently Asked Questions About robots.txt

What is robots.txt?

robots.txt is a plain-text file at the root of your domain that tells crawlers which parts of the site they may or may not request. It guides well-behaved bots like Googlebot, but it is not a security control and does not prevent a page from being indexed if other pages link to it.

Does robots.txt stop a page from appearing in Google?

No. robots.txt controls crawling, not indexing. If you disallow a URL but other pages link to it, Google can still index the URL without its content. To keep a page out of search results, use noindex (a meta tag or HTTP header) and allow crawling so Google can see it.

What is the difference between robots.txt and noindex?

robots.txt blocks crawling: it asks bots not to fetch a URL. noindex blocks indexing: it tells search engines not to list the page in results. They are not interchangeable, and blocking a page in robots.txt actually prevents Google from seeing a noindex tag on it.

Should I block CSS and JavaScript in robots.txt?

No. Google needs to fetch your CSS and JavaScript to render the page the way a user sees it. Blocking those resources can make Google misjudge your layout, mobile usability, and content, which can hurt rankings. Allow your assets.

Where is the robots.txt file in WordPress?

WordPress generates a virtual robots.txt at yoursite.com/robots.txt by default, and SEO plugins like Yoast and Rank Math let you edit it. If a physical robots.txt file exists in the site root, it overrides the virtual one. Always check the live file, not just the plugin setting.

How do I test my robots.txt?

Open yoursite.com/robots.txt in a browser to see the live rules, use Google Search Console’s robots.txt report to confirm how Google reads it, and run a crawl-based audit to catch blocked resources, conflicts with noindex, and a missing sitemap directive across the whole site.