Friday, September 12, 2008

Robots.txt file

When optimizing a Web site, remember to utilize the power of the robots.txt file.

Your robots.txt file tells search-engine crawlers which parts of your web site they may crawl and which parts they should skip (Disallow), such as confidential pages or product download pages.

You must also be aware that your robots.txt file is public: anyone can read it once you publish it. So rather than listing sensitive pages individually (which would reveal their URLs to anyone reading the file), place them together in a separate folder and disallow the whole folder, e.g.:

Disallow: /cgi-bin/

You need a separate "Disallow" line for every URL prefix you want to exclude.
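Putting the pieces together, a robots.txt that blocks several folders for all crawlers might look like the sketch below (the folder names beyond /cgi-bin/ are illustrative, not a recommendation for your site):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Disallow: /downloads/
```

Each Disallow line excludes one URL prefix, and the single "User-agent: *" line applies all of them to every crawler.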

If every page on your website may be crawled, you do not need to list anything; a simple two-line rule allows everything:

User-agent: *
Disallow:

(An empty Disallow value means nothing is excluded.)


You need to put the robots.txt file in your site's root directory, the same one where you have your home page, or crawlers will not find it.

If you use web analytics or traffic reporting programs, these programs will tell you how many times your robots.txt file was accessed.
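If you want to check how crawlers will interpret your rules before publishing, Python's standard library includes a robots.txt parser. A minimal sketch, using rules like the /cgi-bin/ example above (the example.com URLs are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Example rules to test; in practice you would paste your own robots.txt here
rules = """User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Check whether a generic crawler ("*") may fetch specific URLs
print(rp.can_fetch("*", "https://example.com/index.html"))     # True
print(rp.can_fetch("*", "https://example.com/cgi-bin/script")) # False
```

The same parser can also fetch a live file with `rp.set_url(...)` followed by `rp.read()`, which is handy for spot-checking a site you have already published.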

This is just one more piece of information that helps to keep your website searchable.

A free tool to create a robots.txt file is available at:
http://www.xml-sitemaps.com

I do recommend that after creating the file, you manually check it for accuracy before publishing it with your website.

Wishing you awesome and continuing success.
