A Deeper Look At Robots.txt

admin » 20 April 2009 » In HTML Stuff »

Robots.txt syntax

* User-agent: the robot that the rules which follow apply to (e.g. “Googlebot”)

* Disallow: the pages or directories you want to block robots from accessing (use as many Disallow lines as needed)

* Noindex: the pages you want a search engine both to stay out of AND to leave out of its index (or to de-index if previously indexed). Unofficially supported by Google; unsupported by Yahoo and Live Search.

* Each User-agent/Disallow group should be separated from the next by a blank line; however, no blank lines should appear within a group (between the User-agent line and the last Disallow line).

* The hash symbol (#) marks a comment in a robots.txt file: everything after the # on that line is ignored. A comment may take up a whole line or follow a rule at the end of a line.

* Directories and filenames are case-sensitive: “private”, “Private”, and “PRIVATE” are three different paths to search engines.
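The comment and case-sensitivity rules above can be checked with a small sketch. It uses Python's standard-library urllib.robotparser, which is only one parser and may not behave exactly like any particular search engine's crawler. The rules, the /private/ path, and the robot name “ExampleBot” are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written only to exercise the syntax points above.
ROBOTS_TXT = """\
# A whole-line comment is ignored
User-agent: *
Disallow: /private/  # an end-of-line comment is ignored too
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Only the lowercase path matches the Disallow rule.
    for path in ("/private/report.html",
                 "/Private/report.html",
                 "/PRIVATE/report.html"):
        print(path, is_allowed(ROBOTS_TXT, "ExampleBot", path))
```

With this parser only the lowercase /private/ path is blocked; the other two spellings do not match the rule and stay accessible.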

Let’s look at an example robots.txt file that sets the following rules:

* The robot called “Googlebot” has nothing disallowed and may go anywhere.

* The entire site is closed off to the robot called “msnbot”.

* All other robots should stay out of the /tmp/ directory and away from any directory or file whose path begins with /logs (e.g., /logs/ or /logs.php), as explained with comments.
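The example file itself is missing from this copy of the post, so what follows is a reconstruction from the rules listed above, not the author's original. The robots.txt text is held in a Python string and checked with the standard-library urllib.robotparser, whose matching may differ from a real crawler's. “ExampleBot” and the test paths are made-up names standing in for any other robot and page.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the rules listed above; not the author's original file.
EXAMPLE_ROBOTS_TXT = """\
# Googlebot has nothing disallowed and may go anywhere
User-agent: Googlebot
Disallow:

# msnbot is shut out of the entire site
User-agent: msnbot
Disallow: /

# Every other robot
User-agent: *
Disallow: /tmp/
Disallow: /logs  # directories and files whose path starts with /logs
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# "ExampleBot" is a hypothetical name standing in for any other robot.
checks = [
    ("Googlebot", "/tmp/cache.html"),
    ("msnbot", "/index.html"),
    ("ExampleBot", "/tmp/cache.html"),
    ("ExampleBot", "/logs.php"),
    ("ExampleBot", "/index.html"),
]
for agent, path in checks:
    print(f"{agent:10} {path:16} allowed={parser.can_fetch(agent, path)}")
```

Each group is separated from the next by a blank line and has none inside it, which is what lets the parser tell the three groups apart.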

