
Robots.txt round 2

OK, I need to share this story about robots.txt files and how they can kill your site. I keep reading posts from people claiming that robots.txt files are useless and will not help you. My god, if you accidentally put up the wrong file, it could kill your business. Without a doubt, put up a robots.txt file and make sure that it is clean and does exactly what you want. Then go to the Google Site Maps tool and validate it. If you ignore it, there is no limit to how much money it could cost you.

I remember meeting someone from a very large wholesale trading site at the SES NYC show a few years back. He asked me why their site did not show up in Google, and I immediately noticed that it did not have any PR. All pages were PR 0, a huge clue that something dramatic was at play and killing them. Instead of charging them a fortune to figure this out, I was able to quickly determine that not only did they not have a robots.txt file, but they had an error handler spitting out a pretty error page with a 200 OK as the header. This is a BAD IDEA! You cannot hand a search engine a bunch of junk when it is looking for a simple text file. Once that was fixed, lo and behold, they ranked all across the board without any optimization.
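If you want to check your own site for the same problem, here is a minimal sketch in Python. It is not what I ran for that client; the function names `fetch_robots` and `diagnose_robots_response` and the wording of the verdicts are made up for illustration. The idea is simply to look at the status code, the content type, and the body a crawler gets back when it asks for /robots.txt.

```python
import urllib.error
import urllib.request


def fetch_robots(site):
    """Fetch <site>/robots.txt and return (status, content_type, body).

    Hypothetical helper for illustration; 'site' is something like
    'http://www.example.com'.
    """
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, so an honest 404 lands here.
        return err.code, err.headers.get("Content-Type", ""), ""


def diagnose_robots_response(status, content_type, body):
    """Classify what a crawler would see when it asks for /robots.txt."""
    text = body.strip().lower()
    # Rough heuristic (an assumption): HTML markup means an error page.
    looks_like_html = (
        text.startswith("<!doctype")
        or text.startswith("<html")
        or "<body" in text
    )
    if status == 200 and looks_like_html:
        return "bad: error page served with 200 OK"
    if status == 200 and not content_type.startswith("text/plain"):
        return "suspicious: 200 OK but not text/plain"
    if status == 200:
        return "ok: plain-text robots.txt"
    if status == 404:
        return "missing: honest 404, no robots.txt"
    return "check: unexpected status %d" % status
```

To try it, call `diagnose_robots_response(*fetch_robots("http://www.example.com"))` with your own domain in place of the example one. The first verdict is the situation from the story: a pretty HTML page where a text file should be.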

Moral of this story: put up a robots.txt file! Even if it is a blank file, put it up! It’s better than nothing.

Robots.txt

What is a robots.txt file, and why might it be important to you? The simple answer: it keeps spiders out of places you don’t want indexed. Note the stories about password-protected content from colleges getting out.
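To make that concrete, here is a small sketch using Python's standard `urllib.robotparser` module. The rules, the `/private/` directory, and the example.com URLs are all made up for illustration. Keep in mind that this shows how a well-behaved spider reads the file; robots.txt is a request to crawlers, not a lock, so it does not replace real password protection.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every spider to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/grades.html",
    ):
        print(url, is_allowed("Googlebot", url))
```

The first URL comes back allowed and the second one blocked, which is exactly the "keep spiders out" behavior described above.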

How does this affect big business? Ever wonder why your large site has a Google PageRank of 0 on most of its pages?

Most large sites use 404 error handling logic to display a “pretty” not found error page. The problem is that most of these pages respond with a 200 OK status when the page is sent back to Google. Google asks for robots.txt, opens what it gets, and vomits on the data it sees. I have seen several sites not get any pages indexed just from this issue.
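You can keep the pretty page as long as the status code stays honest. Here is a minimal sketch built on Python's standard `http.server`; the `PAGES` table, the `build_response` helper, and the HTML are placeholders I made up, and a real site would do this in whatever server or framework it already runs. The point is only that the friendly page goes out with a 404, not a 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content, for illustration only.
PAGES = {"/": b"<html><body>Home</body></html>"}
PRETTY_404 = b"<html><body><h1>Sorry, we could not find that page.</h1></body></html>"


def build_response(path):
    """Return (status, content_type, body) for a requested path."""
    if path in PAGES:
        return 200, "text/html", PAGES[path]
    # Still a pretty page for humans, but the status tells spiders the truth.
    return 404, "text/html", PRETTY_404


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Serve on localhost:8000 for a quick manual test.
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

With this in place, a request for a missing /robots.txt gets a 404 and the crawler knows there is no file, instead of being handed an HTML page to choke on.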

The easy way to handle issues like this is to put a blank robots.txt file in place. A blank file is easy for the engine to understand.
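As a quick sanity check on the blank-file advice, this sketch feeds an empty file to Python's standard `urllib.robotparser`. The helper name and URL are made up, and this shows how Python's parser reads a blank file, which I am assuming is in line with how the major engines treat it: no rules means nothing is blocked.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_blank_file(user_agent, url):
    """Check a URL against a completely empty robots.txt."""
    parser = RobotFileParser()
    parser.parse([])  # a blank file has no lines, so no rules at all
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(allowed_by_blank_file("Googlebot", "http://www.example.com/any/page.html"))
```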