Google Webmaster Tools Diagnostics: Web Crawl

Yesterday I wrote my first post on Google Webmaster Tools, which gave an introduction to the service. Today I’ll go over the Diagnostics section, which reports the results of the web crawl and the mobile crawl and also provides a content analysis of your web site.

Let me start with Web Crawl. Here is a screenshot of the Web Crawl Diagnostics page:

Google Webmaster Tools Web Crawl

This page lists seven subsections:

  1. Errors for URLs in Sitemaps
  2. HTTP errors
  3. Not found
  4. URLs not followed
  5. URLs restricted by robots.txt
  6. URLs timed out
  7. Unreachable URLs

I’ve already explained what these sections cover in my previous post about Google Webmaster Tools. The Web Crawl Diagnostics page gives details of the errors encountered: it lists each problematic URL along with the exact reason for the problem (HTTP 404, HTTP 500, a robots.txt restriction, and so on). Normally we can group these problems into two sets. Problems in the first set occur when the bot is unable to reach the content: your server is down, you’ve misconfigured your site, or you’re blocking Googlebot via robots.txt. The second set is about problematic URLs. Some reasons can be:

  • Some external web pages link to non-existent content on your site: for example, a URL pointing to an article on your blog that doesn’t exist.
  • Some pages on your site link to non-existent content elsewhere on your site: you link to a previous post and later decide to delete that post, so the links you’ve provided remain broken.
  • The content is there but the URL is broken: for example, you list items on your web site by their titles (i.e. the web page of “my item” is “http://www.example.com/items/my+item”), and you forgot to handle the special case where a title contains a slash. The web page of “My Item / Item Color” will be “http://www.example.com/items/my+item/item+color”, which will definitely be a problematic URL: you actually have the item in your database, but you’re unable to link to it.

This list can be extended.
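
The slash case from the last bullet usually comes from building URLs by hand. Here is a minimal Python sketch of one way to avoid it, assuming the example.com layout above; `naive_item_url()`, `item_url()` and `title_from_slug()` are hypothetical helper names. It percent-encodes the whole title with the standard library’s `urllib.parse.quote_plus`, so a slash becomes `%2F` instead of starting a new path segment:

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical URL layout taken from the example above.
BASE = "http://www.example.com/items/"


def naive_item_url(title):
    # Buggy version: only spaces are handled, so a "/" in the title
    # ends up as an extra path separator.
    return BASE + title.lower().replace(" ", "+")


def item_url(title):
    # quote_plus() turns spaces into "+" and, because its default safe
    # set is empty, escapes "/" as "%2F" along with other reserved characters.
    return BASE + quote_plus(title.lower())


def title_from_slug(slug):
    # Reverse mapping for the page handler: "%2F" -> "/", "+" -> " ".
    # The title comes back lowercased, so look it up case-insensitively.
    return unquote_plus(slug)


if __name__ == "__main__":
    title = "My Item / Item Color"
    print(naive_item_url(title))  # http://www.example.com/items/my+item+/+item+color
    print(item_url(title))        # http://www.example.com/items/my+item+%2F+item+color
```

Note that some servers refuse encoded slashes in paths (Apache, for instance, answers `%2F` in a path with a 404 unless `AllowEncodedSlashes` is enabled), so on such setups stripping or replacing the slash when generating the slug may be the safer choice.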

There’s not much work to do to reduce the problems in the first set: you have to check your server status. If it was down for a while, the number of problems will drop the next time Googlebot visits. If the problems are robots.txt related, you have to recheck your rules.
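
If you’d like to check both things yourself before the next crawl, here is a rough Python sketch using only the standard library. `check_url()` and `blocked_by_robots()` are hypothetical helper names of my own, and the bucket names only loosely mirror the Web Crawl subsections; they are not Google’s definitions. The robots.txt check uses `urllib.robotparser`, whose rule matching is simpler than Googlebot’s (for example, it doesn’t understand `*` wildcards inside paths), so treat its answer as a first approximation.

```python
import socket
import urllib.error
import urllib.request
from urllib import robotparser

# An opener with an empty ProxyHandler ignores any *_proxy environment
# variables, so the check talks to your server directly.
_DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_url(url, timeout=10.0):
    """Fetch url once and return a (bucket, detail) pair.

    The bucket names are a rough, hypothetical analogue of the Web Crawl
    subsections, not Google's own definitions.
    """
    try:
        with _DIRECT.open(url, timeout=timeout) as resp:
            return "ok", resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        if err.code in (404, 410):
            return "not_found", err.code
        return "http_error", err.code
    except urllib.error.URLError as err:
        # No HTTP answer at all: DNS failure, refused connection, connect timeout.
        if isinstance(err.reason, socket.timeout):
            return "timed_out", str(err.reason)
        return "unreachable", str(err.reason)
    except socket.timeout as err:
        # The request was sent, but the response didn't arrive in time.
        return "timed_out", str(err)


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text disallows url for agent."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)
```

While your server is up, `check_url("http://www.example.com/")` should come back as `("ok", 200)`; anything in the `unreachable` or `timed_out` buckets points to the first set of problems rather than to a bad URL.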

The second set is a bit more complicated, because there can be many different reasons. Knowing which page points to the problematic URL (i.e. which page gave the wrong URL to Googlebot) would be of great use, but until today there was no way of finding this out. Fortunately, a post on the Google Webmaster Central Blog today announced a new feature of Google Webmaster Tools that is exactly what we are looking for: each subsection now has a column named “Linked From” that lists the pages linking to the problematic URL. With this information it’ll be much easier to track down the problem.
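
For links inside your own site, you can build a similar “Linked From” index yourself. Below is a minimal sketch, assuming you already have the HTML of every page on your site in a dict (the `pages` argument, `LinkCollector` and `broken_internal_links()` are hypothetical names). It collects `<a href>` targets with Python’s `html.parser`, resolves them against the page URL, and reports every same-host target that isn’t one of your pages, together with the pages that link to it.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def broken_internal_links(pages):
    """Map each missing same-host URL to the sorted pages that link to it.

    pages: hypothetical {absolute page URL: HTML source} dict that is
    assumed to contain every page that actually exists on the site.
    """
    linked_from = defaultdict(set)
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        host = urlparse(page_url).netloc
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            target, _fragment = urldefrag(urljoin(page_url, href))
            if urlparse(target).netloc == host and target not in pages:
                linked_from[target].add(page_url)
    return {url: sorted(sources) for url, sources in linked_from.items()}
```

Since it only compares against the pages you supply, this can’t see links from external sites (the first bullet above); for those, the new “Linked From” column is still the place to look.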

That’s it for now. I’ll go over the content analysis next time.