Bing

Optimizing your very large site for search — Part 2

January 26, 2009, 05:57 PM by Webmaster Center team | 28 Comments

For the large website, there are many critically important issues in optimizing for search. In Part 1 of this series of posts, we discussed the importance of reducing the number of URLs you expose through canonicalization. But there are other ways to reduce the surface area of your site to search engines and focus on pages that matter.

While you may have reduced the number of URLs you expose to Live Search, a large site can still have a large surface area to crawl. In crawling your site, search engines may miss some of your best content, or they may consume bandwidth that you pay for without needing to. This is where HTTP compression and conditional GET can help.

Enabling HTTP compression

Whether or not you are concerned with bandwidth control, setting up HTTP compression is a best practice for every site owner. What is HTTP compression? It is a capability defined in the HTTP 1.1 specification under the name "content-encoding." The specification defines how a web server, when it receives a request for a file, can check whether the client browser (or crawler) is "compression enabled" (the client says so in its Accept-Encoding request header) before serving the file to the client in compressed form.

Most people are familiar with the ZIP file format of data compression where files are added to a ZIP archive and then extracted as needed. This is not how HTTP compression works. HTTP compression is used by web servers to passively compress document files in real time as they are being transferred to the browser. The browser is able to decompress and display the file as intended.
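As a rough illustration of the negotiation described above, the following Python sketch compresses a response body with gzip only when the client's Accept-Encoding header advertises support for it. The helper names (client_accepts_gzip, encode_body) and the sample page are assumptions made for this example, and the header parsing is deliberately simplified. In practice, Apache and IIS do this work for you.

```python
import gzip

def client_accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding header value lists gzip.

    Simplified on purpose: a coding with q=0 is treated as refused,
    and wildcards ("*") are ignored.
    """
    for item in accept_encoding.split(","):
        parts = [p.strip().lower() for p in item.split(";")]
        if parts[0] != "gzip":
            continue
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    return float(param[2:]) > 0
                except ValueError:
                    return False
        return True
    return False

def encode_body(body, accept_encoding):
    """Return (headers, payload) for a response body given as bytes."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if client_accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    headers["Content-Length"] = str(len(body))
    return headers, body

page = b"<html><body>" + b"<p>Hello, crawler!</p>" * 200 + b"</body></html>"

# A compression-enabled client receives a gzip payload...
headers, payload = encode_body(page, "gzip, deflate")
# ...and reverses the encoding before using the document.
restored = gzip.decompress(payload)

# A client that sends no Accept-Encoding header gets the bytes unchanged.
plain_headers, plain_payload = encode_body(page, "")
```

The file on disk is never altered. Only the bytes on the wire change, and only for clients that asked for them.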

Not all files are created equal

Certain file types are not suitable for HTTP compression. For example, files that have already been compressed, such as JPEGs, GIFs, movies, or standard compressed files (e.g. ZIP, gzip, and RAR), are not going to compress further with HTTP compression turned on. However, sites that have a lot of plain text content, including the main HTML files, XML, CSS, and RSS, will benefit from HTTP compression. For example, most standard HTML text files will compress by about half, sometimes more.
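To see the difference for yourself, a small Python experiment along these lines compares how well gzip handles text markup versus data with no redundancy left in it. The sample table is invented for the example, and seeded pseudo-random bytes are used as a stand-in for an already-compressed file such as a JPEG or ZIP. That substitution is an assumption, not a measurement of real image files.

```python
import gzip
import random

def compression_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)

# Text markup repeats tags and words, so it compresses well.
rows = "".join(
    "<tr><td>Product {0}</td><td>Description of product {0}</td></tr>\n".format(i)
    for i in range(500)
)
html = ("<html><body><table>\n" + rows + "</table></body></html>").encode("utf-8")

# Seeded pseudo-random bytes stand in for already-compressed data
# (JPEG, ZIP, and so on), which has little redundancy left to remove.
rng = random.Random(2009)
packed = bytes(rng.getrandbits(8) for _ in range(len(html)))

html_ratio = compression_ratio(html)
packed_ratio = compression_ratio(packed)
print("HTML:   {:.0%} of original size".format(html_ratio))
print("Packed: {:.0%} of original size".format(packed_ratio))
```

Running the same function over a few of your own pages gives a quick estimate of the bandwidth you could save.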

Setting up Apache

If your site is running Apache, you can use the mod_deflate module, which adds an output filter that compresses content in gzip format before it is sent. You can apply the filter site-wide or selectively, compressing only specific MIME types. The MIME type is determined from the Content-Type header of the response, whether that header is generated automatically by Apache, by a CGI script, or by some other dynamic code you create.

To enable compression for all MIME types, apply the SetOutputFilter directive to a website or directory:

# Compress every response served from this directory
<Directory "/web/mysite/php/">
    SetOutputFilter DEFLATE
</Directory>

To enable compression on a specific MIME type (as in this example, “text/html”), use the AddOutputFilterByType directive:

AddOutputFilterByType DEFLATE text/html 

Every site is different. If you need to support older browsers, you may need to have more advanced configurations. You can read more in the mod_deflate documentation.

Setting up IIS 7

Fortunately for most site owners, IIS 7 has HTTP compression for static files enabled by default. However, if you want to compress all files, you have to manually turn on Dynamic Compression.

You can do this by opening IIS Manager and double-clicking the Compression feature.

[Image: IIS7-1]

You'll notice Enable static content compression is selected by default. To enable Dynamic Compression, simply select the Enable dynamic content compression option.

[Image: IIS7-2]

Setting up IIS 6

IIS 6 also includes a native compression system and can be configured to compress both static and dynamic content. To enable HTTP compression in IIS 6, all you have to do is open the website's Properties page and edit the global properties for the site. Under the Service tab, you can configure the options within the HTTP compression section.

[Image: IIS6-1]

Both versions of IIS also cache the compressed files in a directory, which improves performance by eliminating the need to re-compress files on the fly. As with Apache, IIS lets you select which MIME types to compress. TechNet has more information about selective compression in IIS.
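The idea of caching compressed output in a directory can be sketched in a few lines of Python. This is not how IIS implements its cache. The function name cached_gzip, the cache key (path, modification time, and size), and the file layout are all assumptions chosen to illustrate the general technique.

```python
import gzip
import hashlib
import os
import tempfile

def cached_gzip(source_path, cache_dir):
    """Return the gzip-compressed bytes of a file, compressing at most once
    per version of the file."""
    stat = os.stat(source_path)
    # Key on path, modification time, and size so that an edited file
    # gets a fresh cache entry instead of a stale one.
    key = "{}:{}:{}".format(
        os.path.abspath(source_path), stat.st_mtime_ns, stat.st_size
    )
    name = hashlib.sha256(key.encode("utf-8")).hexdigest() + ".gz"
    cache_path = os.path.join(cache_dir, name)
    if not os.path.exists(cache_path):
        with open(source_path, "rb") as source:
            data = source.read()
        with open(cache_path, "wb") as target:
            target.write(gzip.compress(data))
    with open(cache_path, "rb") as cached:
        return cached.read()

with tempfile.TemporaryDirectory() as workdir:
    page = os.path.join(workdir, "index.html")
    cache = os.path.join(workdir, "cache")
    os.mkdir(cache)
    with open(page, "wb") as handle:
        handle.write(b"<html>" + b"<p>hello</p>" * 100 + b"</html>")
    first = cached_gzip(page, cache)   # compresses and stores the result
    second = cached_gzip(page, cache)  # read back from the cache directory
    cached_files = os.listdir(cache)
```

The second call does no compression work at all, which is the saving the cache directory provides for files that are requested repeatedly.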

Did your content change since the last time we visited it?

As with HTTP compression, the official HTTP 1.1 specification lets your server report when a document was last updated. When Live Search crawls your site, we ask whether each document has changed since we last looked at it. If it has, your server gives us the latest version. If it has not, your server just lets us know (with a 304 Not Modified status) and sends us nothing else. This mechanism is referred to as conditional GET, and by implementing it, you can save yourself bandwidth and save us the cost of comparing files we already have in the index.
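The exchange described above comes down to one date comparison on the server. The Python sketch below shows that decision in isolation. The function name conditional_get and its return shape are assumptions for illustration, and the example dates are made up. A dynamic page could run a check like this before doing any expensive rendering.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_get(last_modified, if_modified_since=None):
    """Decide how to answer a GET for a document.

    last_modified: timezone-aware datetime of the document's last change.
    if_modified_since: raw If-Modified-Since header value, or None.
    Returns (status, headers, send_body).
    """
    # HTTP dates are expressed in GMT with one-second resolution.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # Unparseable date: fall back to a full response.
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                # Unchanged since the crawler's copy: send headers only.
                return 304, headers, False
    return 200, headers, True

changed = datetime(2009, 1, 20, 12, 0, 0, tzinfo=timezone.utc)

# First visit: no If-Modified-Since header, so the full document is sent.
first = conditional_get(changed)
# Return visit after the last change: nothing new to send.
repeat = conditional_get(changed, "Mon, 26 Jan 2009 17:57:00 GMT")
# The crawler's copy predates the change: send the new version.
stale = conditional_get(changed, "Thu, 01 Jan 2009 00:00:00 GMT")
```

Note that the Last-Modified header is returned in every case, so the crawler always knows which date to send back on its next visit.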

Additionally, it allows us to spend our crawl time looking at files that we may not have previously indexed, which could improve your coverage over time. The following chart demonstrates the potential gain in coverage with conditional GET versus a site without conditional GET.

[Chart: coverage with conditional GET versus without conditional GET]

Implementing the conditional statements

There are a lot of factors to consider when implementing conditional GET, depending on the web server, programming language, or content management system used. Fortunately, both IIS and Apache have native support for Last-Modified / If-Modified-Since / 304 Not Modified functionality for static files. For dynamic files, you may need to implement a code-based solution. A good pattern for this, and an equally good description of how the crawler responds to conditional GET, can be found in the article “Save bandwidth costs: Dynamic pages can support If-Modified-Since too.”

Testing your setup

Once you have determined the best course for implementing your HTTP compression and conditional GET strategies, you can ensure that your implementation is working with our HTTP Compression and HTTP Conditional Get test tool. Using your robots.txt file, you can test to ensure your configuration is correct and will work with Live Search.
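If you want a quick check of your own alongside the test tool, a short Python script can make the same two requests a crawler would. This is only a sketch and not the tool's actual logic. The function names http_get and check_site and the report format are assumptions, and it only inspects the Content-Encoding and Last-Modified headers (it does not look at ETag values).

```python
import urllib.error
import urllib.request

def http_get(url, extra_headers):
    """Return (status, headers) for a GET request.

    urllib raises HTTPError for a 304 response, so that case is caught
    and reported as an ordinary status code.
    """
    request = urllib.request.Request(url, headers=extra_headers)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.headers
    except urllib.error.HTTPError as error:
        return error.code, error.headers

def check_site(url, fetch=http_get):
    """Report whether a URL supports gzip compression and conditional GET."""
    report = {}
    # Request 1: advertise gzip support and see whether the server uses it.
    status, headers = fetch(url, {"Accept-Encoding": "gzip"})
    encoding = headers.get("Content-Encoding") or ""
    report["compression"] = encoding.lower() == "gzip"
    # Request 2: echo Last-Modified back and expect 304 Not Modified.
    last_modified = headers.get("Last-Modified")
    if last_modified is None:
        report["conditional_get"] = False
    else:
        status, _ = fetch(url, {"If-Modified-Since": last_modified})
        report["conditional_get"] = status == 304
    return report
```

For example, calling check_site with the address of your own robots.txt file (say, http://www.example.com/robots.txt, a placeholder URL) returns a small dictionary with a True or False entry for each feature. The fetch parameter exists so the logic can be exercised without touching the network.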

Coming up next

Now that your pages are compressed and you are telling us when your content is new, we will move on to discussing how to avoid hiding the content you want us to find. As always, if you have additional questions, feel free to ask in our forums.

Jeremiah Andrick -- Program Manager, Live Search Webmaster


Comments

Ian M

Posted On January 29, 2009, 05:11 AM

Could you please provide that graph under a permissive license such as Creative Commons, so that we can use it elsewhere to promote site owners implementing this?


smokie

Posted On February 16, 2009, 03:16 AM

If the website is mainly built on Flash content, then HTTP compression doesn't give any significant reduction in size, right?


Webmaster Center team

Posted On February 19, 2009, 05:59 PM

@all Thanks for the comments

@smokie .swf files are already compressed, so HTTP compression will offer no additional value. But there are other issues with Flash. Be sure to check out Part 3 in the series.

Jeremiah Andrick


Raheel

Posted On February 25, 2009, 02:03 AM

How do I enable HTTP compression for a website built in Yahoo Stores?


BobsBlitz

Posted On March 11, 2009, 08:38 AM

OK, good job with the info.

B


Alok Tiwari

Posted On March 12, 2009, 03:45 PM

What is your comment on shared hosting environments (approx. 92% use them)?

This article will not work there, as site owners have no control over the server.

Could you guys provide any info on the subject?

--

Alok Tiwari

India


Rocky

Posted On March 17, 2009, 01:33 PM

I would like to ask you a question. How is site optimization different for Live Search and for Google?

Won't the search engines take care of our website automatically, if it is good?


Alex

Posted On March 19, 2009, 08:58 PM

Do you have instructions for the Cpanel?


Template Library

Posted On March 30, 2009, 03:20 AM

How can my website get rank in MSN?


Diolt

Posted On April 01, 2009, 02:34 AM

MSN rank does not count for as much as Google's does! Forget MSN ranking and rush to get a Google ranking; rankings in other search engines will follow automatically.

Best SEO practice at Google webmasters!


Quality Directory

Posted On June 07, 2009, 11:36 PM

I'm very concerned about bandwidth control, and I've set up HTTP compression.


markamoment

Posted On June 19, 2009, 12:59 AM

How do I enable HTTP compression for a website built on FREEWebs?


James

Posted On June 23, 2009, 12:17 PM

This is such useful information.

Many thanks


LeonTheGreat

Posted On June 24, 2009, 11:28 AM

What would you do for a dynamic website that uses ASP.NET? I don't have direct access to the server configuration either - is there a way to set this up on the client-side?


angelvoyagera

Posted On July 10, 2009, 07:59 AM

HTTP Conditional Get not enabled (Live.com)

HTTP status code: 304 Not Modified

HTTP conditional GET: enabled

URL generates an ETag value: 4a2770f0-697e-530a1c00

HTTP compression: not enabled (HTTP compression can be enabled for non-304 URLs)

HTTP headers:

   Connection: close

   Vary: Accept-Encoding

   Cache-Control: max-age=86254, public

   Date: Fri, 10 Jul 2009 14:51:37 GMT

   Expires: Sat, 11 Jul 2009 14:49:12 GMT

   ETag: 4a2770f0-697e-530a1c00

   Server: Apache/2.2.3 (CentOS)

Can someone help me?

thanks.

