A Deeper Look At Robots.txt
Robots.txt syntax
* User-agent: the robot the following rules apply to (e.g., “Googlebot”)
* Disallow: the pages or directories you want to block bots from accessing (use as many Disallow lines as needed)
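* Noindex: the pages you want a search engine to both block and keep out of its index (or de-index if previously indexed). This directive was never part of the robots.txt standard: Google supported it only unofficially and stopped honoring it in September 2019, and Yahoo and Live Search do not support it.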
* Each User-agent/Disallow group should be separated from the next by a blank line; however, no blank lines should exist within a group (between the User-agent line and the last Disallow line).
* The hash symbol (#) may be used for comments within a robots.txt file: everything after the # on that line is ignored. A comment may take up a whole line or follow a rule at the end of a line.
* Directories and filenames are case-sensitive: “private”, “Private”, and “PRIVATE” are three different paths to search engines.
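The comment and case-sensitivity rules above can be checked with a short script. What follows is only a sketch: it relies on Python's standard urllib.robotparser module, and the robot name "ExampleBot", the www.example.com address, and the /private/ rule are made up for illustration. This parser is one implementation of the rules, so individual search engines may differ in the details.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
# A whole-line comment: ignored by the parser
User-agent: *
Disallow: /private/   # an end-of-line comment: also ignored
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) agent may fetch the given path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


# Paths are case-sensitive: only the exact lowercase spelling matches the rule.
for path in ("/private/page.html", "/Private/page.html", "/PRIVATE/page.html"):
    print(path, "allowed" if allowed(path) else "blocked")
```

Only the first path matches the Disallow rule; the other two differ in case, so the rule does not apply to them.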
Let’s look at an example robots.txt file. In this example:
* The robot called “Googlebot” has nothing disallowed and may go anywhere.
* The entire site is closed off to the robot called “msnbot”.
* All other robots should stay out of the /tmp/ directory and out of any directory or file whose path begins with /logs (e.g., /logs or /logs.php), as explained with comments in the file.
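A file matching that description might look like the one embedded in the sketch below. It is a reconstruction from the three points above, not a copy of any real site's file: the wording of the comments, the "OtherBot" robot name, and the www.example.com address are assumptions. The script feeds the file to Python's standard urllib.robotparser module to confirm that each group behaves as described.

```python
from urllib.robotparser import RobotFileParser

# Reconstructed example file; the comment wording is an assumption.
EXAMPLE_ROBOTS_TXT = """\
# Googlebot has nothing disallowed and may go anywhere
User-agent: Googlebot
Disallow:

# The entire site is closed off to msnbot
User-agent: msnbot
Disallow: /

# All other robots: stay out of /tmp/ and anything starting with /logs
User-agent: *
Disallow: /tmp/
Disallow: /logs  # directories or files called logs, e.g. /logs or /logs.php
"""

rules = RobotFileParser()
rules.parse(EXAMPLE_ROBOTS_TXT.splitlines())


def may_fetch(agent, path):
    """Return True if the named robot may fetch the path under these rules."""
    return rules.can_fetch(agent, "http://www.example.com" + path)


# "OtherBot" is a hypothetical robot that only the catch-all (*) group covers.
checks = [
    ("Googlebot", "/tmp/cache.html"),
    ("msnbot", "/index.html"),
    ("OtherBot", "/tmp/cache.html"),
    ("OtherBot", "/logs.php"),
    ("OtherBot", "/index.html"),
]
for agent, path in checks:
    print(agent, path, "allowed" if may_fetch(agent, path) else "blocked")
```

A robot follows the group that names it and falls back to the * group only when no group does, which is why Googlebot can still reach /tmp/ here even though the catch-all group disallows it.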