Posts Tagged ‘web master tool’
Google Webmaster Tools Diagnostics: Web Crawl
October 14th, 2008 • seo
Tags: google webmaster tools, keyword tool, learn seo, link exchange, live webmaster tools, robots txt, search engine optimization, seo, seo tool, seo tools, sitemap, web master tool, web master tools, webmaster, webmaster central, webmaster resource, webmaster tool, webmasters tools, website monitoring, website optimization, website tools
Yesterday I wrote my first post on Google Webmaster Tools, which gave an introduction to the tools. Today I’ll go over the Diagnostics section, which gives details of the web crawl and mobile crawl results and also provides a content analysis of your web site.
Let me start with Web Crawl. Here is a screenshot of the Web Crawl Diagnostics page:
This page lists seven subsections:
- Errors for URLs in Sitemaps
- HTTP errors
- Not found
- URLs not followed
- URLs restricted by robots.txt
- URLs timed out
- Unreachable URLs
I’ve already explained what these sections cover in my previous post about Google Webmaster Tools. The Web Crawl Diagnostics page gives details of the errors encountered. It provides the problematic URL as well as the exact reason for the problem (HTTP 404, HTTP 500, blocked by robots.txt, and so on). We can roughly group these problems into two sets. Problems in the first set occur when bots are unable to reach the content (your server is down, you’ve misconfigured your site, or you’re blocking Googlebot via robots.txt). The second set is about problematic URLs. Some possible reasons:
- An external web page links to non-existent content on your site: for example, a URL pointing to an article that does not exist in your blog.
- A page on your site links to non-existent content elsewhere on your site: you linked to a previous post and later decided to delete that post, so the links you provided remain broken.
- The content is there but the URL is broken: for example, you list items on your web site by their titles (i.e. the web page of “my item” is “http://www.example.com/items/my+item”), and you forgot to handle the special case where a title contains a slash. The web page of “My Item / Item Color” will be “http://www.example.com/items/my+item/item+color”, which will definitely be a problematic URL. You actually have the item in your database, but you’re unable to link to it.
This list can be extended.
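The slash problem from the last item can be sketched in a few lines of Python. This is only an illustration, not how any particular site builds its URLs: the `item_url` and `title_from_url` helpers and the `BASE` prefix are hypothetical names, and the idea is simply to percent-encode the title so that a slash inside it does not become a path separator.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical URL prefix, taken from the example in the text.
BASE = "http://www.example.com/items/"

def item_url(title: str) -> str:
    # quote_plus turns spaces into "+" and, because its default
    # "safe" set is empty, escapes "/" as %2F instead of leaving it
    # as a path separator.
    return BASE + quote_plus(title.lower())

def title_from_url(url: str) -> str:
    # Reverse the encoding to recover the (lowercased) title.
    return unquote_plus(url[len(BASE):])

print(item_url("My Item"))
print(item_url("My Item / Item Color"))
```

Some web servers treat an encoded slash in a path specially, so it is worth testing a URL like this on your own setup before relying on it.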
There’s not much work to do to reduce the problems in the first set: check your server status. If it was down for a while, the number of problems will drop the next time Googlebot visits. If the problems are robots.txt related, recheck your rules.
The second set is a bit more complicated, because there can be many different causes. Knowing which page points to your site with the wrong URL (and so hands that URL to Googlebot) is of great use. Until today there was no way of knowing this. Fortunately, a new post on the Google Webmaster Central Blog today announced a new feature of Google Webmaster Tools that is exactly what we are looking for. In each subsection there is a column named “Linked From”, under which the pages linking to the problematic URL are listed. Using this information, it’ll be much easier to track down the problem.
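One way to recheck robots.txt rules offline is Python’s standard `urllib.robotparser` module. The sketch below is a rough check only: the `blocked_urls` helper and the sample rules are made up for illustration, and Python’s parser does not necessarily match every detail of how Google interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that the given rules disallow for the agent."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so nothing
    # is fetched over the network here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

# Sample rules, invented for this example.
RULES = """\
User-agent: *
Disallow: /private/
"""

print(blocked_urls(RULES, [
    "http://www.example.com/private/page.html",
    "http://www.example.com/blog/post",
]))
```

Feeding it the URLs you actually want indexed gives a quick list of the ones your own rules would block.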
That’s it for now. I’ll go over the content analysis next time.
Google Webmaster Tools - A Starters Guide
October 13th, 2008 • 2 comments seo
Tags: google webmaster tools, keyword tool, live webmaster tools, robots txt, search engine, search engine optimization, seo, seo tool, seo tools, sitemap, sitemap generator, submit site, submit url, verify a site, web master tool, web master tools, webmaster central, webmaster resource, webmaster tool, webmasters tool, webmasters tools
Hi,
Today I’ll start a series of posts about Google Webmaster Tools. These posts will be tutorial-like, and I’m planning to cover the basic elements of the tools. If you’re already familiar with these tools, don’t waste your time. You can read one of my previous posts about:
or you can go somewhere else.
Anyway, let’s start. First of all, you need a Google account for this. After you log in to your webmaster account you’ll see the dashboard. The dashboard will look like this:
All of your sites will be listed in the dashboard. I’ve marked three points on the screenshot. The first one is “Messages”, the link to the message center. From time to time you will get notices about some of your sites, and these notices will be listed in your message center. Sometimes the message center will be unavailable. I don’t know why, so don’t ask. So what notices can you get? Things like “crawl rate changed”, reconsideration requests and some crawl problems.
- I’ll explain this “crawl rate” thing later.
- Reconsideration requests seem to be important, but honestly I haven’t got any results from one yet. Normally you can ask Google to reconsider a site that has been banned (because Google thinks it is spammy or dangerous, for example). After a reconsideration request, all you can do is wait for a reply. I’ve been waiting for more than a year, and when I get a reply I’ll let you know.
- Crawl problems are the most important ones, as you may guess. You have to follow these notices and try to resolve them immediately.
The second mark is the “add site” form. Using this form you can add your site to your webmaster tools account. Of course you’ll need to verify your site. After you add a site, it’ll appear in the third marked area. Later you can use these links to go directly to that site’s reports. So if you haven’t added a site yet, just type your URL and hit the “Add Site” button. Now you have a site listed in the dashboard and a cross under the “Verified” section. So let’s click it and verify your site. Verification can be done in two ways:
- Using a meta tag: You need to add the provided meta tag to your index page. This means that the meta tag should be accessible from your home page (http://www.example.com). So all you have to do is copy the line and paste it into the header of your site (between the <head> and </head> tags).
- Using an HTML file: You need to create an empty file named exactly as Google says. So if it provides google0e42cde8782c894c.html, you have to create that file in the top level of your web root, and it must be accessible as http://www.example.com/google0e42cde8782c894c.html.
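Before hitting “Verify”, it can help to confirm that the meta tag really ended up inside the head of your home page. Here is a minimal sketch using Python’s standard `html.parser`. The `find_verification_meta` helper is a hypothetical name, and `example-verify` is a placeholder for the meta name, not the one Google uses; substitute whatever name appears in the tag Google gives you.

```python
from html.parser import HTMLParser

class HeadMetaFinder(HTMLParser):
    """Remember the content of a named <meta> tag found inside <head>."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name
        self.in_head = False
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attr = dict(attrs)
            if attr.get("name") == self.meta_name:
                self.content = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_verification_meta(html, meta_name):
    # Returns the tag's content value, or None if the tag is missing
    # or sits outside <head>.
    finder = HeadMetaFinder(meta_name)
    finder.feed(html)
    return finder.content

# "example-verify" and "abc123" are placeholders for this example.
PAGE = ('<html><head><title>Home</title>'
        '<meta name="example-verify" content="abc123">'
        '</head><body></body></html>')
print(find_verification_meta(PAGE, "example-verify"))
```

In practice you would pass it the HTML of your real home page rather than the `PAGE` string above.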
After you choose one of the two ways, just hit the “Verify” button and Google will handle the rest. After verification is completed you can go to the “Overview” section. This is the starting point the next time you click your site in the dashboard. Here is a screenshot of it:
Let’s start from the top. The “Home page crawl” section gives the time of the last crawl of your home page. If your site is new, it’ll take time before anything shows up in this section. In order to get indexed as quickly as possible you can follow my way. It’s a good idea to keep a record of these crawl times. Later you’ll be able to see how often Googlebot visits your site, and of course any changes in frequency. “Index status” gives an overview of your site’s index status: whether or not some of your pages are included in the index, and whether or not some of the pages from your sitemap are included. You can find details of inclusion in other sections. For now let’s skip it.
Below that is an important section: “Web crawl errors”. Let’s go over them:
- Errors for URLs in Sitemaps: This gives the number of erroneous URLs listed in your sitemap. If your sitemap is auto-generated (the output of a plugin etc.), the URL structure will most probably be correct, so the errors will be due to server downtime or something like that. You have to view the “Details” and inspect the errors. If URLs are broken you should remove them from your sitemap. It’s really a bad idea to provide broken links in your sitemap. After you correct the problems, these errors will be gone after the next crawl.
- HTTP errors: This section contains URLs that give an HTTP error (401, 404, 407 etc.): “Article not found”, “Item not found” etc. First of all you have to think about why this URL exists. How was Googlebot able to reach that URL? Who gave a broken link? Maybe you changed your URL structure lately and created some broken links?
- Not found: Again broken links (HTTP 404).
- URLs not followed: Mostly you get these errors due to redirects. You should always be careful with redirects.
- URLs restricted by robots.txt: I’ll go over robots.txt later. If you don’t know what robots.txt is and some URLs are listed in this section, then there is a problem. You can use the robots.txt file to keep some of your URLs from being indexed. So if this list contains a URL that you want to get indexed, then inspect your robots.txt.
- URLs timed out: This section is also important. If Googlebot encountered a timeout, there is probably an issue with your web server. Or perhaps your HTML is too large?
- Unreachable URLs: Get rid of these URLs or make them reachable.
I guess this is enough for this post. I’ll continue later. See you.
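For the first item in the list, removing broken URLs from a sitemap can be scripted. The sketch below assumes a standard sitemaps.org XML sitemap; the `prune_sitemap` and `sitemap_locs` helpers and the sample URLs are hypothetical, and the list of broken URLs is something you would copy from the “Details” view yourself.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def prune_sitemap(sitemap_xml, broken_urls):
    """Return the sitemap XML without the <url> entries that are broken."""
    ET.register_namespace("", NS)  # keep the default namespace on output
    root = ET.fromstring(sitemap_xml)
    broken = set(broken_urls)
    # Iterate over a copy of the list because we remove while looping.
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.find(f"{{{NS}}}loc")
        if loc is not None and loc.text and loc.text.strip() in broken:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")

def sitemap_locs(sitemap_xml):
    """List every <loc> value in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{{{NS}}}loc")]

# Sample sitemap, invented for this example.
SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc></url>"
    "<url><loc>http://www.example.com/deleted-post</loc></url>"
    "</urlset>"
)

pruned = prune_sitemap(SAMPLE, ["http://www.example.com/deleted-post"])
print(sitemap_locs(pruned))
```

If a plugin generates your sitemap, fixing the plugin’s input (for example, deleting the broken post properly) is usually the better long-term fix than editing the output.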


