A Deeper Look At Robots.txt

admin » 20 April 2009 » In HTML Stuff »

Robots.txt syntax

* User-Agent: the robot that the rules in the group apply to (e.g. “Googlebot”)

* Disallow: the pages or paths you want to block the robot from accessing (use as many Disallow lines as needed)

* Noindex: the pages you want a search engine both to stay out of and to keep out of its index (or to de-index if previously indexed). Unofficially supported by Google; unsupported by Yahoo and Live Search.

* Each User-Agent/Disallow group should be separated from the next by a blank line; however, no blank lines should exist within a group (between the User-Agent line and the last Disallow).

* The hash symbol (#) may be used for comments within a robots.txt file: everything after # on that line is ignored. A comment may take up a whole line or sit at the end of a line.

* Directories and filenames are case-sensitive: “private”, “Private”, and “PRIVATE” are three different names to search engines.
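
The comment and case-sensitivity rules above can be checked with a short Python sketch. It uses the standard library's urllib.robotparser; the rules, the /private/ path and the robot name "ExampleBot" are made up for illustration, and real search engine crawlers may differ in details from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, invented for this illustration only.
RULES = """\
# a whole-line comment
User-agent: *
Disallow: /private/   # an end-of-line comment
"""


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Both comments are ignored, so only the Disallow rule remains.
print(is_allowed(RULES, "ExampleBot", "/private/a.html"))  # False
# Paths are compared case-sensitively, so this one is not blocked.
print(is_allowed(RULES, "ExampleBot", "/Private/a.html"))  # True
```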

Let’s look at an example robots.txt file. The example below specifies the following:

* The robot called “Googlebot” has nothing disallowed and may go anywhere;

* The entire site is closed off to the robot called “msnbot”;

* All robots (other than Googlebot) should not visit the /tmp/ directory, or any directory or file whose path begins with /logs (e.g., /logs or logs.php), as explained with comments.
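
A file matching the three points above could look like the one embedded in the Python sketch below. The listing is a reconstruction from the description, not the post's original example, so its exact lines and comments are assumptions, and "ExampleBot" is a made-up robot name. Python's standard urllib.robotparser is used here only to check that the rules behave as described.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed from the three bullet points; the exact lines are an assumption.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: msnbot
Disallow: /

# Block all other robots from the tmp directory and from anything under /logs
User-agent: *
Disallow: /tmp/
Disallow: /logs  # directories and files whose path starts with /logs
"""


def can_visit(agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under EXAMPLE_ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


checks = [
    ("Googlebot", "/tmp/cache.html"),   # empty Disallow: nothing is blocked
    ("msnbot", "/index.html"),          # "Disallow: /" closes the whole site
    ("ExampleBot", "/tmp/cache.html"),  # falls under the "*" group
    ("ExampleBot", "/logs.php"),        # path starts with /logs
    ("ExampleBot", "/index.html"),      # no rule matches
]
for agent, path in checks:
    print(agent, path, can_visit(agent, path))
```

Each group is separated from the next by a blank line, and the comment line sits outside the groups, in line with the syntax rules listed earlier.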

via A Deeper Look At Robots.txt.
