Announcing crawler improvements for Live Search

February 12, 2008, 08:00 AM by Webmaster Center team | 82 Comments |

Today we're pleased to announce several improvements in the crawler for Live Search that should significantly improve the efficiency with which we crawl and index your web sites. We are always looking for ways to help webmasters, and we hope these features take us a few more steps in the right direction.

  • HTTP Compression: HTTP compression shortens transmission time by compressing static files and application responses, which reduces network load between your servers and our crawler. We support the most common compression methods, gzip and deflate, as defined by RFC 2616 (see sections 14.11 and 14.39). Compression is currently supported by all major browsers and search engines. Use this online tool to check your server for HTTP compression support.

    The following links provide configuration information for IIS and Apache.

  • Conditional Get: We support conditional get as defined by RFC 2616 (Section 14.25). In general, we will not download a page unless it has changed since the last time we crawled it. As per the standard, our crawler will include the "If-Modified-Since" header with the time of the last download in the GET request and, when available, the "If-None-Match" header with the ETag value. If the content hasn't changed, the web server will respond with a 304 HTTP response.

    To check if your site already supports the "If-Modified-Since" HTTP header, you can use this online tool to check your server for HTTP Conditional Get support. Alternatively, you can check using Fiddler for Internet Explorer, or Live Headers for Firefox. Each of these tools allows you to create a custom GET request and send it to your server. You'll want to make sure that your request includes the "If-Modified-Since" header like the following simplified sample:

    GET /sa/3_12_0_163076/webmaster/webmaster_layout.css HTTP/1.1
    Host: webmaster.live.com
    If-Modified-Since: Tue, 22 Jan 2008 01:28:49 GMT

    You should receive a server response similar to the following simplified sample:

    HTTP/1.x 304 Not Modified

    Check out MSDN for more information on using Fiddler for performance tuning.

    If you have not yet configured conditional get on your site, we strongly encourage you to do so. It can significantly reduce server load, and most browsers and crawlers already support this feature. It can be configured on web servers such as IIS and Apache.
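For sites that generate pages dynamically, the server-side half of conditional get can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the logic of any particular web server: the function name `conditional_status` is hypothetical, request headers are assumed to arrive as a plain dictionary with canonical header names, and weak ETags (the `W/` prefix) are not handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, last_modified, etag=None):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified must be a timezone-aware datetime (hypothetical helper).
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None and etag is not None:
        # When both sides have an ETag, it decides the outcome on its own.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # Unparseable date: ignore the header and send the page.
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

Called with the header from the sample request above and a page last changed at that same time, the function returns 304; with an earlier "If-Modified-Since" date it returns 200, and the full page should be sent.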

In addition to these two features, there are many more performance improvements that should help further optimize our crawling. We've also upgraded our user agent to reflect the changes; it is now "msnbot/1.1". If you think you are experiencing any issues with MSNbot, or have any questions about the updates, please use our Crawler Feedback & Discussion form.
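As a rough illustration of the HTTP compression feature described above, the Python sketch below picks gzip or deflate from an "Accept-Encoding" request header and compresses a response body accordingly. The function names `negotiate_encoding` and `encode_body`, the server's preference for gzip over deflate, and the header values used are assumptions for illustration; the exact header the crawler sends is not stated here, and full quality-value parsing is omitted.

```python
import gzip
import zlib


def negotiate_encoding(accept_encoding):
    """Pick gzip or deflate from an Accept-Encoding header value, else None."""
    offered = set()
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        # Skip codings the client explicitly refuses with q=0.
        if params.replace(" ", "") in ("q=0", "q=0.0"):
            continue
        offered.add(name.strip().lower())
    for coding in ("gzip", "deflate"):  # Preference order is an assumption.
        if coding in offered:
            return coding
    return None


def encode_body(body, accept_encoding):
    """Return (content_encoding, payload) for a response body given as bytes."""
    coding = negotiate_encoding(accept_encoding)
    if coding == "gzip":
        return coding, gzip.compress(body)
    if coding == "deflate":
        # HTTP "deflate" is the zlib container format, which zlib.compress emits.
        return coding, zlib.compress(body)
    return None, body
```

A server using something like this would send the returned coding in the "Content-Encoding" response header, or omit that header when the coding is `None`. Repetitive markup compresses well, so the payload for a typical HTML page is usually much smaller than the original body.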

-- Fabrice Canel, Live Search Crawling Team


Comments

Anonymous

Posted On February 13, 2008, 08:39 AM

I am curious how this new design affects data usage. Are there known numbers for the reduction, in percent?


Anonymous

Posted On February 13, 2008, 12:01 PM

How can the Conditional Get be done when your site is delivered dynamically by Apache/PHP using mod-rewrite functions for creating static URLs? From everything that I can tell, each page is new at the time of request. Is this the intended result that the new MSNbot 1.1 is looking for?


Anonymous

Posted On February 13, 2008, 12:54 PM

These improvements arrive a little bit too late, but at least they've finally arrived. I wonder when true competition in the search market will arrive too! For instance, I again ask you guys at Live Search: when are you (by "you" I mean G, Y and M) going to finally deliver a decent image search tool? Image search is a hundred steps behind text search in any engine. The one who delivers image-based (instead of text-based) image search will most certainly change the market rules... But I don't see any SEO company or search engine blog mention image (and video, and music, and...) search as being of any strategic importance whatsoever.


Anonymous

Posted On February 14, 2008, 04:42 AM

thank you very much for this article


Anonymous

Posted On March 26, 2008, 02:59 PM

Google and Orbitz both use gzip compression to deliver compressed versions of their pages to HTTP 1.1-compliant browsers. Google.com has been compressing for a long time. This improvement comes too late for Live Search! BTW, what's the compression rate? Google's typical savings on compressed text files range from 60% to 85%, depending on how redundant the code is.


Anonymous

Posted On May 16, 2008, 12:09 PM

Normally, how long does it take for a newly submitted website to be indexed by the MSNBot crawler? It might be an interesting topic, which I wish to discuss on my website.

Thanks

David Cheong.


Anonymous

Posted On May 23, 2008, 06:12 AM

Thank you for your improvements!


Anonymous

Posted On June 03, 2008, 11:54 PM

Thanks so much for this! This is exactly what I was looking for.


Anonymous

Posted On June 04, 2008, 04:20 AM

Great job,

But don't get too comfortable now, there is still a lot that has to be improved.


Anonymous

Posted On June 13, 2008, 03:06 AM

Thanks for your guide to Conditional Get!

Regards,


Anonymous

Posted On June 15, 2008, 11:02 AM

The MSNBot crawler now supports gzip! Thank you for your improvements!


Ad Manager

Posted On July 25, 2008, 01:53 PM

I find that our site has a lot of pages crawled by MSNBot that show as having high ranks. However, actual searches do not show them. What could be the problem?


sajjad

Posted On August 07, 2008, 03:16 PM

The latest crawler improvements for Live Search are a welcome move and should be useful to webmasters in their endeavour to improve their website's performance on two counts.

First, by improving the quality of their site, thereby offering convenience and quality to the visitor.

Second, by improving the content of their website through the increased feedback they receive from Live.com.


3dsmax-stuff.blogspot.com

Posted On August 07, 2008, 05:17 PM

Thanks for your guide.

Regards to you! :)


alvanp

Posted On August 08, 2008, 01:01 AM

thank you very much for this article

