6 Steps to Optimise Your Site For Crawling And Indexing

August 12th, 2009 | by BillEgan

The web is becoming infinite: Google recently hit a milestone of 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!

The Internet is a big place. Is it even practical for Google to crawl and index all of that?

In fact, they don’t. If you have a large website, Googlebot may be missing large sections of it that Google has designated as “not useful”.

In Google’s own words: “Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections. So multiple times every day, we do the computational equivalent of fully exploring every intersection of every road in the United States. Except it’d be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections.”

So Googlebot is only able to find and crawl a percentage of that content, and of the content it crawls, it is only practical to index a portion.

Then the questions arise: How much of my site will be indexed? How confident can I be that my site will be indexed fully? AND WHAT CAN I DO ABOUT IT?

Google’s Webmaster Tools blog has just published a very useful presentation, which provides advice on getting your pages crawled and indexed by the search engine.

The measures advised come down to six key steps, summarised below:

1. Remove User-Specific Details From URLs

If you have URL parameters that don’t change the content of the page, such as session IDs or sort order, these can be removed from the URL and put into a cookie.
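To make step 1 concrete, here is a minimal Python sketch that strips user-specific parameters from a URL before it is used in a link, keeping the parameters that actually select content. It assumes the session ID and sort order travel in parameters called sessionid, sid and sort; those names are hypothetical, so swap in whatever your own site uses. The session value itself would then live in a cookie rather than in the URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: replace with the ones your site really uses.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "sort"}


def strip_user_params(url: str) -> str:
    """Return the URL with user-specific query parameters removed.

    Parameters that change the page content (e.g. a product id) are kept,
    in their original order. If nothing is left, the '?' disappears too.
    """
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in USER_SPECIFIC_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_user_params("http://www.example.com/products?id=42&sessionid=ab12&sort=price"))
# http://www.example.com/products?id=42
```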

2. How to Optimise Dynamic URLs

You can recognise dynamic URLs by the question mark they contain.

Search engines have problems following links to dynamic content. So if you can recognise these problems, you are halfway to getting your dynamic content indexed. Where practical, use static URLs to reference dynamic content. Otherwise, try to ensure your dynamic URLs are linked to from pages that have static URLs. Finally, consider using paid-inclusion programs.

Jill Whalen’s blog post goes into more detail on this subject.
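The idea of referencing dynamic content through static URLs can be sketched in Python as follows. The URL scheme is purely hypothetical: it assumes a script at /product.php that takes cat and item parameters, and exposes it in links as /cat/item/. In practice the reverse mapping is usually handled by a URL-rewriting rule on the web server; the second function only illustrates what such a rule would do.

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit

# Hypothetical scheme: the dynamic script /product.php?cat=<cat>&item=<item>
# is linked to from other pages as the static-looking path /<cat>/<item>/.


def to_static(dynamic_url: str) -> str:
    """Build the static-looking URL to use in links instead of the dynamic one."""
    parts = urlsplit(dynamic_url)
    params = parse_qs(parts.query)
    # Percent-encode each value so it is safe as a single path segment.
    cat = quote(params["cat"][0], safe="")
    item = quote(params["item"][0], safe="")
    return f"{parts.scheme}://{parts.netloc}/{cat}/{item}/"


def to_dynamic_params(static_path: str) -> dict:
    """Mimic a server-side rewrite rule: recover the parameters the
    dynamic script expects from the static path."""
    cat, item = static_path.strip("/").split("/")
    return {"cat": unquote(cat), "item": unquote(item)}


print(to_static("http://www.example.com/product.php?cat=skates&item=riedell"))
# http://www.example.com/skates/riedell/
```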

3. How to Rein In Infinite Spaces

Do you have a calendar that links to an infinite number of past or future dates?

If so, you have an infinite crawl space on your website, and crawlers could be wasting their (and your!) bandwidth trying to crawl it all.
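One way to rein in an infinite calendar is simply to stop generating links beyond a sensible window, so the crawl space becomes finite. The Python sketch below assumes a hypothetical /calendar/YYYY/MM/ URL pattern and a window of 12 months either side of today; both are illustrative choices, not values recommended by Google.

```python
from datetime import date

# Hypothetical limit: only link to months within this many months of today.
MAX_MONTHS_AWAY = 12


def month_index(d: date) -> int:
    """Count months from year 0 so that month distances are simple subtraction."""
    return d.year * 12 + (d.month - 1)


def add_months(d: date, n: int) -> date:
    """Return the first day of the month n months after (or before) d."""
    idx = month_index(d) + n
    return date(idx // 12, idx % 12 + 1, 1)


def calendar_nav_links(shown: date, today: date) -> list[str]:
    """Return 'previous' and 'next' month links for a calendar page,
    omitting any link that would point outside the allowed window."""
    links = []
    for step in (-1, 1):
        target = add_months(shown, step)
        if abs(month_index(target) - month_index(today)) <= MAX_MONTHS_AWAY:
            links.append(f"/calendar/{target.year}/{target.month:02d}/")
    return links
```

For example, calendar_nav_links(date(2010, 8, 1), date(2009, 8, 12)) returns only the July 2010 link, because September 2010 is more than 12 months away, so a crawler following "next" links eventually runs out of pages.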

4. How to Disallow Actions Googlebot Can’t Perform

Using your robots.txt file, you can disallow crawling of shopping carts, login pages, contact forms, and other pages whose only purpose is an action that a crawler can’t perform.
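To check which URLs a set of robots.txt rules actually blocks, Python’s standard urllib.robotparser module can parse the file and answer can_fetch queries. The paths below are hypothetical examples, not a recommendation for any particular site. Note that this module does simple prefix matching and does not implement Google’s wildcard extensions, so treat it as a quick sanity check rather than a definitive answer about Googlebot’s behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt blocking pages a crawler can't do anything useful with.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /login/
Disallow: /contact/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in (
    "http://www.example.com/cart/checkout",
    "http://www.example.com/contact/",
    "http://www.example.com/skates/riedell/",
):
    verdict = "crawlable" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", verdict)
```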

5. How to Avoid Content Duplication

Google tries hard to index and show pages with distinct information. This filtering means, for instance, that if your site has articles in “regular” and “printer” versions and neither set is blocked in robots.txt or via a noindex meta tag, Google will choose one version to list. If Google perceives that duplicate content may be shown with intent to manipulate rankings and deceive users, they may also make appropriate adjustments in the indexing and ranking of the sites involved.

Here are some tips from Google on how to be proactive about this.
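If you suspect your site serves the same article at several URLs (a regular and a printer version, say), a rough way to find them is to fingerprint each page’s text. The Python sketch below hashes whitespace- and case-normalised text with SHA-256 and groups URLs that share a hash. It assumes you have already extracted the article body from each page, and the page URLs are hypothetical. It only catches copies that are identical after that normalisation, not near-duplicates, and it is not how Google itself detects duplication.

```python
import hashlib
import re
from collections import defaultdict


def content_fingerprint(text: str) -> str:
    """SHA-256 of the text after collapsing whitespace and lower-casing it."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose article text is identical after normalisation."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        groups[content_fingerprint(text)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Hypothetical pages: a regular article, its printer version, and another article.
pages = {
    "/articles/choosing-skates/": "Choosing skates.\nA short guide.",
    "/articles/choosing-skates/print/": "Choosing  Skates. A short guide.",
    "/articles/skate-care/": "Looking after your skates.",
}
print(find_duplicates(pages))
# [['/articles/choosing-skates/', '/articles/choosing-skates/print/']]
```

Once a group like this turns up, you can block the extra versions in robots.txt, add a noindex meta tag, or point them at the preferred version with rel="canonical".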

6. How to Get Your Preferred URLs Indexed

  • Set your preferred domain in Google’s Webmaster Tools (www.example.com vs. example.com)
  • Put canonical URLs in your Sitemap
  • Use the new rel="canonical" link element on any duplicate URLs

Example: <link rel="canonical" href="http://www.example.com/skates/riedell/"/>
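The first two bullet points above (setting a preferred domain and putting canonical URLs in your Sitemap) can be combined in a small Python sketch: rewrite every URL onto the preferred www host, drop fragments, and write each resulting URL into a Sitemap only once. The preferred host www.example.com is a placeholder. The urlset/url/loc structure follows the sitemaps.org protocol, though a real Sitemap may also carry optional fields such as lastmod.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Placeholder preferred domain; use the one you set in Webmaster Tools.
PREFERRED_HOST = "www.example.com"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_preferred(url: str) -> str:
    """Move bare-domain URLs onto the preferred www host and drop any #fragment."""
    parts = urlsplit(url)
    host = parts.netloc
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST
    return urlunsplit((parts.scheme, host, parts.path, parts.query, ""))


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemap XML document listing each canonical URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        canonical = to_preferred(url)
        if canonical in seen:
            continue  # same page once normalised, so list it only once
        seen.add(canonical)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = canonical
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_sitemap([
    "http://example.com/skates/riedell/",
    "http://www.example.com/skates/riedell/#reviews",
    "http://www.example.com/skates/",
]))
```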


5 Responses to “6 Steps to Optimise Your Site For Crawling And Indexing”

By Yutik on Aug 18, 2009

    This article is very useful and explains every step clearly enough. Thank you for this.
    Can you give some advice on what should be done after optimization to increase the number of visitors?

By lee on Sep 11, 2009

    Hi, I have been waiting for Google and Bing to index my blog http://www.diyanswerdirect.com/blog for two weeks now and am getting very frustrated with them. I am also still waiting for them to update my site http://www.diyanswerdirect.com. My question is: when do they update, and how can I get them to crawl or spider my site?


© 2012 Interleado | Privacy Policy | Terms of Service | Limited Company Registered in Ireland | Reg No. 432557 | Website by ...Dotwebs