What is a robots.txt file, and why might it be important to you? The simple answer: it keeps spiders out of places you don’t want indexed. Just look at the password-protected content from colleges that has gotten out.
How does this affect big business? Ever wonder why your large site has a Google PageRank of 0 on most of its pages?
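To make "keeping spiders out" concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the `/private/` path, and the `www.example.com` URLs are made up for illustration. Keep in mind that robots.txt is only a request to well-behaved crawlers, so it is no substitute for real access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask every crawler to stay out of /private/.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite spider checks each URL against the rules before fetching it.
public_allowed = parser.can_fetch("*", "http://www.example.com/index.html")
private_allowed = parser.can_fetch("*", "http://www.example.com/private/grades.html")

print("public page allowed: ", public_allowed)
print("private page allowed:", private_allowed)
```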
Most large sites use 404 error handling logic to display a “pretty” not found page. The problem is that most of these pages respond with a 200 OK status instead of a 404. So when Google requests a robots.txt file that does not exist, it gets the HTML error page back, marked as a success. Google opens the file and vomits on the data it sees. I have seen several sites fail to get any pages indexed just from this issue.
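One way to check whether a site has this problem is to request its robots.txt and see whether an HTML page comes back with a 200 status. The sketch below is a rough heuristic, not a description of how Google itself decides. The function names `looks_like_soft_404` and `check_site` are invented for this example, and the "looks like HTML" test is an assumption that will not catch every case.

```python
import urllib.error
import urllib.request


def looks_like_soft_404(status: int, content_type: str, body: str) -> bool:
    """Guess whether a robots.txt response is really an HTML error page sent as 200."""
    if status != 200:
        return False  # the server reported the error honestly
    is_html_type = "html" in content_type.lower()
    snippet = body.lstrip()[:200].lower()
    is_html_body = snippet.startswith("<!doctype html") or snippet.startswith("<html")
    return is_html_type or is_html_body


def check_site(base_url: str) -> bool:
    """Fetch <base_url>/robots.txt and apply the heuristic above."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
            content_type = resp.headers.get("Content-Type", "")
            body = resp.read(2048).decode("utf-8", errors="replace")
    except urllib.error.HTTPError:
        return False  # a real 404 (or other error status) is not a soft 404
    return looks_like_soft_404(status, content_type, body)
```

Calling `check_site("http://www.example.com")` (a placeholder address) would return `True` if the site answers the robots.txt request with what looks like an HTML page and a 200 status.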
The easy way to handle issues like this is to put a blank robots.txt file in place. A blank file is easy for the engine to understand: it contains no rules, so nothing is blocked.
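As a quick illustration of how a blank file is read, the sketch below feeds an empty rule set to Python's standard `urllib.robotparser`. This shows how one parser treats an empty file and is not a statement about any particular search engine's internals. The user agent name and URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A blank robots.txt contains no lines at all.
blank_parser = RobotFileParser()
blank_parser.parse([])

# With no rules present, every URL is allowed for every crawler.
blank_allows_all = blank_parser.can_fetch("ExampleBot", "http://www.example.com/any/page.html")
print("blank file allows crawling:", blank_allows_all)
```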