robots.txt Optimized For WordPress
- January 18, 2011
- Posted By: John Kiminas
- WORDPRESS
- 2 comments
Like any other Web site platform, your WordPress installation may contain folders and files that you do not want search engines to index. Several of the standard WordPress directories, as well as the login page, do not need to be indexed by any search engine, so you can use a robots.txt file to exclude them.
What is a robots.txt file, and what does it do?
Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
A robots.txt file is a simple text file that restricts access to your site by the search engine robots that crawl the web. Before these automated bots access the pages of a site, they check whether a robots.txt file exists that prevents them from accessing certain pages. (Only well-behaved robots respect the directives in a robots.txt file, and some may interpret them differently. A robots.txt file is not enforceable, so spammers and troublemakers will ignore it. For this reason, if you need to actually block access, we recommend using an .htaccess file.)
While most search engines won't crawl or index the content of pages blocked by robots.txt, they may still index the URLs if they are found on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), may appear in search engine search results.
In order to use a robots.txt file, you'll need to have access to the root of your domain. If you don't have access to the root of a domain, you can restrict access using the robots meta tag.
How does it work?
It works like this: a robot wants to visit a Web site URL, say http://www.sitedoodle.com/demo.html. Before it does so, it first checks for http://www.sitedoodle.com/robots.txt and finds:
User-agent: *
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
The "User-agent: *" means this section applies to all robots. The "Disallow: /wp-admin/" tells the robot that it should not visit any pages in the "wp-admin" directory.
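To illustrate how a well-behaved robot applies these rules, here is a minimal sketch using the urllib.robotparser module from Python's standard library. It parses the WordPress rules above from a string instead of downloading them, then asks whether a robot may fetch a few URLs. Apart from demo.html, the sample URLs are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# The WordPress rules from the example above.
RULES = """\
User-agent: *
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network request is made.
parser.parse(RULES.splitlines())

# Hypothetical URLs on the example site, used only to exercise the rules.
urls = [
    "http://www.sitedoodle.com/demo.html",
    "http://www.sitedoodle.com/wp-admin/options.php",
    "http://www.sitedoodle.com/wp-login.php",
    "http://www.sitedoodle.com/wp-content/uploads/photo.jpg",
]

for url in urls:
    # The "User-agent: *" record applies to any robot name we pass in.
    allowed = parser.can_fetch("*", url)
    print(("allowed " if allowed else "blocked ") + url)
```

Because /wp-content/uploads/ is not listed, files in it remain crawlable: anything that is not explicitly disallowed is allowed.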
Remember these two important considerations when using /robots.txt:
- robots can ignore your /robots.txt. In particular, malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers will pay no attention to it.
- the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
VERY IMPORTANT: Don't try to use a robots.txt to hide information.
Creating a robots.txt file:
The "robots.txt" file is a text file with one or more records. It usually contains a single record that looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~johndoe/
In this example, three directories are excluded.
Note that you need a separate "Disallow" line for every URL prefix you want to exclude — you cannot say "Disallow: /cgi-bin/ /tmp/" on a single line. Also, do not leave any blank lines in a record, as they are used to delimit multiple records.
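Under the original standard, globbing and regular expressions are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: *bot*", "Disallow: /tmp/*" or "Disallow: *.gif". (Some major search engines, including Google, do recognize '*' and '$' wildcards in paths as an extension, but other robots may not.)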
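Because each prefix needs its own Disallow line and blank lines end a record, it can help to generate a record rather than type it by hand. The following is a minimal sketch; build_record is a hypothetical helper written for this example, not part of WordPress or any library.

```python
def build_record(user_agent, prefixes):
    """Return one robots.txt record: a User-agent line followed by
    one Disallow line per URL prefix, with no blank lines inside."""
    lines = ["User-agent: " + user_agent]
    for prefix in prefixes:
        prefix = prefix.strip()
        # Each Disallow line holds exactly one path starting with "/";
        # something like "/cgi-bin/ /tmp/" on one line is rejected.
        if not prefix.startswith("/") or " " in prefix:
            raise ValueError("invalid Disallow prefix: %r" % prefix)
        lines.append("Disallow: " + prefix)
    return "\n".join(lines) + "\n"

print(build_record("*", ["/cgi-bin/", "/tmp/", "/~johndoe/"]), end="")
```

Running it prints the same three-directory record shown earlier, one Disallow line per directory.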
For a basic WordPress installation, the contents of your robots.txt file should look like this:
User-agent: *
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-content/plugins/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
This configuration may change with future WordPress upgrades, but as of version 3.0.4, this will suffice. Below is a download link to a robots.txt file you may use for your WordPress installation.
What you want to exclude depends on your server. Everything not explicitly disallowed is considered fair game to retrieve. Visit www.robotstxt.org for more details and examples.
Having created your robots.txt file, you'll need a place to put it. Where does it go?
The short answer: in the top-level directory of your web server.
The longer answer: when a bot looks for the "/robots.txt" file for a URL, it strips the path component from the URL (everything from the first single slash) and puts "/robots.txt" in its place.
So, as a web site owner you need to put it in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your web site's main "index.html" home page. Where exactly that is, and how to put the file there, depends on your web server software.
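The URL rewrite a crawler performs can be sketched in a few lines of Python with urllib.parse. The function name robots_url is hypothetical, and real crawlers may also normalize host names or follow redirects, which this sketch does not attempt.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Mimic a crawler: keep the scheme and host (including any port),
    drop the path, query and fragment, and use "/robots.txt" as the path."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.sitedoodle.com/demo.html"))
```

Since the crawler always requests the lowercase path /robots.txt, a file named Robots.TXT may not be found on a case-sensitive web server.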
Remember to use all lowercase for the filename: "robots.txt", not "Robots.TXT".
Now that you have learned about robots.txt files, you can build your own custom file for your Web site. If you are a WordPress user, you may download our optimized robots.txt file to use with any standard WordPress installation.



Thanks for this article. I’ve been trying to block any pages with low-quality content from robots so Google won’t rank me lower for them. I use WordPress for my sites and am wondering how I would do this. I’m using the same robots.txt rules as you have.
If you would like to exclude certain pages from Google using the robots.txt file, you must add each page to your robots.txt file as follows: Disallow: /path-to-page/pagename/.
For example, to exclude the following post, http://www.sitedoodle.com/2011/01/18/robots-txt-optimized-wordpress/, the robots.txt entry would be:
Disallow: /2011/01/18/robots-txt-optimized-wordpress/
Make sure to place each page or post on its own separate line in the robots.txt file.
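If you have many posts to exclude, the Disallow lines can be generated from their full URLs. This is a minimal sketch that assumes all URLs are on the same site as the robots.txt file; disallow_lines is a hypothetical helper, and its output still has to be pasted under the User-agent line of your record.

```python
from urllib.parse import urlsplit

def disallow_lines(page_urls):
    """Convert full page/post URLs into robots.txt Disallow lines.

    Only the path is kept, because Disallow rules match URL paths.
    Assumes every URL belongs to the same site as the robots.txt file.
    """
    lines = []
    for url in page_urls:
        path = urlsplit(url).path
        # An empty "Disallow:" means "allow everything", so refuse to emit it.
        if not path:
            raise ValueError("URL has no path: %r" % url)
        lines.append("Disallow: " + path)
    return lines

posts = ["http://www.sitedoodle.com/2011/01/18/robots-txt-optimized-wordpress/"]
# Each entry goes on its own line in robots.txt.
print("\n".join(disallow_lines(posts)))
```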