Wednesday, September 06, 2006

SEO test for your pages using wget

The post by Matt Cutts today about crawl dates in the Google cache reminded me of how I test pages from the Linux command line. I use wget, a command-line program that comes preinstalled on most Linux distributions, though not on Apple's OS X.

Wget can be used in several ways to test your web server for problems, particularly with dynamic sites that have load balancers.
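For load-balanced sites, wget's own log is a good place to start. Its "Resolving" line lists every address DNS returned for the host, and repeated runs may connect to different addresses. Below is a minimal sketch. It assumes a POSIX shell, wget's default English log messages, and a site that serves plain HTTP on each address. The helper names `resolved_ips` and `fetch_from_ip` are hypothetical and are not part of wget.

The second helper requests the page from one specific address but sends the site's name in a `Host:` header. Wget's `--header` option lets you override that header, so name-based virtual hosting still serves the right site.

```sh
# Hypothetical helper: read a wget log on stdin and print one address per line
# from its "Resolving host... addr1, addr2" line(s). Failed lookups are skipped.
# Assumes wget's default English log messages.
resolved_ips() {
  sed -n \
    -e '/failed:/d' \
    -e 's/^Resolving .*\.\.\. \(.*\)$/\1/p' |
    tr -d ' ' |
    tr ',' '\n'
}

# Hypothetical helper: fetch http://IP/ while sending "Host: HOST", so the
# server at that address serves HOST's site. The page is saved to OUTFILE.
fetch_from_ip() {
  local host="$1" ip="$2" out="$3"
  wget -q --header="Host: $host" -O "$out" "http://$ip/"
}
```

For example, `wget -O /dev/null http://www.buy.com 2>&1 | resolved_ips` lists the addresses. Wget writes its log to stderr, hence the `2>&1`. Then `fetch_from_ip www.buy.com 80.67.74.11 page-11.html` saves the copy served from one of those addresses. Comparing such copies with `cmp` shows whether the servers behind the balancer agree. A balancer that hides several servers behind a single address will not be exposed this way.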

1) Simple test: what is my web server doing when Google stops by?

Command: wget --user-agent=googlebot http://www.aaronshear.com


Output:

bash-3.00$ wget –user-agent=googlebot http://www.aaronshear.com
--09:37:56-- http://%E2%80%93user-agent=googlebot/
=> `index.html.41'
Resolving \342\200\223user-agent=googlebot... failed: Name or service not known.
--09:37:56-- http://www.aaronshear.com/
=> `index.html.41'
Resolving www.aaronshear.com... 68.178.211.42
Connecting to www.aaronshear.com|68.178.211.42|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://www.aaronshear.com/blog/ [following]
--09:37:57-- http://www.aaronshear.com/blog/
=> `index.html.41'
Connecting to www.aaronshear.com|68.178.211.42|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 37,426 156.90K/s

09:37:57 (156.52 KB/s) - `index.html.41' saved [37426]


FINISHED --09:37:57--
Downloaded: 37,426 bytes in 1 files


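This output shows that my web server answers a request for / (the root of my home page) with a 301 Moved Permanently redirect to /blog/, and wget follows that redirect. The first few lines of output also show a mistake. The option was typed with an en dash (–user-agent) instead of two hyphens (--user-agent), so wget treated it as a hostname and failed to resolve it. It then fetched the page with its default user agent rather than as googlebot.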
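Picking the status codes and redirects out of a log like this gets tedious on longer runs. Below is a minimal sketch of a filter that keeps only those lines. The function name `summarize_wget_log` is hypothetical, and it assumes wget's default English log messages in the format shown in the transcript above.

```sh
# Hypothetical helper: summarize a wget log read from stdin.
# Prints one "status:" line per HTTP response wget received and one
# "redirect:" line per Location header it followed.
# Assumes wget's default English log messages.
summarize_wget_log() {
  sed -n \
    -e 's/^HTTP request sent, awaiting response\.\.\. \(.*\)$/status: \1/p' \
    -e 's/^Location: \([^ ]*\).*$/redirect: \1/p'
}
```

Because wget writes its log to stderr, pipe it with `2>&1`, for example: `wget --user-agent=googlebot -O /dev/null http://www.aaronshear.com 2>&1 | summarize_wget_log`. Fed the transcript above, it prints three lines:

- `status: 301 Moved Permanently`
- `redirect: http://www.aaronshear.com/blog/`
- `status: 200 OK`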

I also wanted to show you a simple way to see a load balancer in action on a very popular site. Instead, I came across what looks like a variation of cloaking. Cloaking is in no way, shape, or form always spam! This example only suggests that this web server is looking for Googlebot and doing something with it.

bash-3.00$ wget –user-agent=googlebot http://www.buy.com
--09:39:40-- http://%E2%80%93user-agent=googlebot/
=> `index.html.42'
Resolving \342\200\223user-agent=googlebot... failed: Name or service not known.
--09:39:40-- http://www.buy.com/
=> `index.html.42'
Resolving www.buy.com... 80.67.74.11, 80.67.74.233
Connecting to www.buy.com|80.67.74.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 123,224 --.--K/s

09:39:40 (7.67 MB/s) - `index.html.42' saved [123224]


FINISHED --09:39:40--
Downloaded: 123,224 bytes in 1 files

If I run the same request with wget's stock user agent, I get the following:

bash-3.00$ wget http://www.buy.com
--09:42:56-- http://www.buy.com/
=> `index.html.44'
Resolving www.buy.com... 80.67.74.233, 80.67.74.11
Connecting to www.buy.com|80.67.74.233|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 123,224 --.--K/s

09:42:56 (3.13 MB/s) - `index.html.44' saved [123224]
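A more direct check is to save the page once per user agent and compare the files byte for byte. In the two buy.com runs above, both downloads are 123,224 bytes. In the first run the option was typed with an en dash (–user-agent), so wget treated it as a hostname and fetched the page with its default user agent. A fair comparison needs `--user-agent` spelled with two hyphens.

Real Googlebot also sends a longer string, such as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)`. A server may match on exact case, so that full string is a safer test value than `googlebot`. This is a sketch assuming a POSIX shell with wget and cmp installed; `fetch_as` and `compare_pages` are hypothetical helper names.

```sh
# Hypothetical helper: save URL to OUTFILE as seen with the given User-Agent.
fetch_as() {
  local ua="$1" url="$2" out="$3"
  wget -q --user-agent="$ua" -O "$out" "$url"
}

# Hypothetical helper: report whether two saved pages are byte-for-byte
# identical, along with their sizes in bytes.
compare_pages() {
  local a="$1" b="$2" size_a size_b
  size_a=$(wc -c < "$a" | tr -d ' ')
  size_b=$(wc -c < "$b" | tr -d ' ')
  if cmp -s "$a" "$b"; then
    echo "identical ($size_a bytes)"
  else
    echo "different ($size_a vs $size_b bytes)"
  fi
}
```

For example, run these in order:

1. `fetch_as 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' http://www.buy.com googlebot.html`
2. `wget -q -O stock.html http://www.buy.com`
3. `compare_pages googlebot.html stock.html`

Identical files do not rule out cloaking, because a server can also key on the crawler's IP address, which this test cannot imitate. Differing files may come from rotating ads, timestamps, or session IDs rather than cloaking, so look at `diff googlebot.html stock.html` before drawing conclusions.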

Name: Aaron Shear
Location: San Francisco, California, United States

I have been in the search industry since the late '90s (no, not for 10-20 years). My career started in the early days of search at Inktomi, where I supported large search portals such as MSN, AOL, iWon, Hotbot, and CNet, to name a few. After Inktomi I became a freelance consultant, and around 2002 I consulted for a few of the top SEOs; obviously the market has changed since then. After consulting I joined a small SEO firm, SEO Inc., as CTO. At SEO Inc. I successfully optimized some of its largest clients, including IGN, Sony, VEGAS.com, and Beaches and Sandals Resorts. Even though SEO Inc. was a ton of fun, I still wanted the ultimate SEO challenge, so I moved on to become global head of SEO for Shopping.com, an eBay company. That challenge was an interesting one: how do you optimize a site with 50 million products? Every month I helped the business grow by leaps and bounds. I am now consulting, mostly for enterprise e-commerce clients. Yes, there is more to me than this profile shows, but you will just have to ask.
