Bing blogs

This is a place devoted to giving you deeper insight
into the news, trends, people and technology behind Bing.

Webmaster Blog

February 12

Partnering to help solve duplicate content issues

One of the most common challenges search engines run into when indexing a website is identifying and consolidating duplicate pages. Duplicates can occur when any given webpage has multiple URLs that point to it. For example:

  • http://mysite.com
    A webmaster may consider this the authoritative (or canonical) URL for their homepage.
  • http://www.mysite.com
    However, on most websites you can add 'www' and still get the same homepage.
  • http://mysite.com/default.aspx
    You can also often add the specific filename of the homepage and get the same page.
  • http://mysite.com/default.aspx?promo=ABC
    Websites often use parameters to track things like where customers are coming from (in this case, an offline promotion), or to determine how the content on the page is formatted.

These four cases are just a few of the many possibilities. Combined, they could produce more than 10 clone URLs for every page on your site, so if your site has 1 million pages, we could find 10 million or more cloned URLs pointing to them. Identifying your canonical URL amid all that duplicate clutter has been an onerous challenge for search engines as we all work to reduce cost and improve relevance.
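To make the arithmetic concrete, here is a minimal Python sketch that enumerates the clone URLs for a single page by combining independent variants. The host, path, and parameter lists are hypothetical; in particular, `format=print` is an assumed stand-in for the formatting parameters mentioned above, and real sites can expose many more variants.

```python
from itertools import product

# Hypothetical variants for one page; each list is one independent way a URL can differ.
HOSTS = ["mysite.com", "www.mysite.com"]
PATHS = ["/", "/default.aspx"]
QUERIES = ["", "?promo=ABC", "?format=print"]  # "format=print" is an assumed example


def clone_urls(hosts, paths, queries):
    """Return every URL formed by combining one host, one path and one query string."""
    return [f"http://{h}{p}{q}" for h, p, q in product(hosts, paths, queries)]


urls = clone_urls(HOSTS, PATHS, QUERIES)
print(len(urls))              # 12 (2 hosts x 2 paths x 3 query strings)
print(len(urls) * 1_000_000)  # 12000000 clones across a 1-million-page site
```

Because the variants multiply rather than add, each new tracking or formatting parameter grows the duplicate count for every page on the site.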

To help solve this issue, Live Search has partnered with Google and Yahoo to support a new link tag attribute, rel="canonical", that will help webmasters identify the single authoritative (or canonical) URL for a given page. The link tag defines a relationship between a document and an external resource; in this case, that resource is the canonical URL. The tag goes in the <head> section of each duplicate page. The following is an example of the new link tag attribute for canonicalization:

<link rel="canonical" href="http://mysite.com"/>
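As a rough illustration of what a crawler might do with this tag, the following Python sketch uses the standard library's `html.parser` to pull the canonical URL out of a page's markup. The class `CanonicalFinder`, the function `find_canonical`, and the sample page are hypothetical; a real crawler would also deal with malformed markup, multiple conflicting tags, and the other signals described below.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of (name, value).
        # Self-closing tags such as <link ... /> are also routed here by default.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonical = attributes["href"]


def find_canonical(html):
    """Return the canonical href declared in the page, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = '<html><head><link rel="canonical" href="http://mysite.com"/></head><body>...</body></html>'
print(find_canonical(page))  # http://mysite.com
```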

A few notes about the implementation of the new attribute:

  • This tag will be interpreted as a hint by Live Search, not as a command. We'll evaluate it in the context of all the other information we know about the website and try to make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
  • You can use relative or absolute URLs in the “href” attribute of the link tag.
  • The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found at “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://mysite2.com”, the tag will be invalid and ignored.
    • However, the “href” attribute can point to a different subdomain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://www.mysite.com”, the tag will be considered valid.
  • Live Search expects to implement support for this feature sometime in the near future.
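The rules above (relative or absolute “href” values, same domain required, other subdomains allowed) can be sketched in Python with `urllib.parse.urljoin`. This is an illustration under stated assumptions, not Live Search's actual logic: `resolve_canonical` and `registered_domain` are hypothetical names, and `registered_domain` simply keeps the last two labels of the host name, which is wrong for domains such as `example.co.uk`. A real implementation would consult a public-suffix list.

```python
from urllib.parse import urljoin, urlsplit


def registered_domain(host):
    # Simplifying assumption: treat the last two labels as the domain.
    # This mishandles public suffixes such as "co.uk".
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def resolve_canonical(page_url, href):
    """Resolve href against the page URL; return None if the tag should be ignored."""
    # urljoin turns relative hrefs such as "/" into absolute URLs
    # and leaves absolute hrefs unchanged.
    target = urljoin(page_url, href)
    page_host = urlsplit(page_url).hostname or ""
    target_host = urlsplit(target).hostname or ""
    if registered_domain(page_host) != registered_domain(target_host):
        return None  # different domain: the tag is invalid and ignored
    return target  # same domain, possibly another subdomain: the tag is valid


page = "http://mysite.com/default.aspx"
print(resolve_canonical(page, "http://mysite2.com"))     # None
print(resolve_canonical(page, "http://www.mysite.com"))  # http://www.mysite.com
print(resolve_canonical(page, "/"))                      # http://mysite.com/
```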

While we expect this tag to help us solve many of the more complex duplicate content issues, we still strongly recommend that webmasters follow the existing best practices for normalizing their URLs through domain canonicalization and normalization of URL parameters. We’ll provide more details on the link tag after we’ve implemented full support in one of our upcoming releases. In the meantime, we look forward to hearing your feedback on the new tag.
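Normalizing domains and URL parameters on your own side might look something like the minimal Python sketch below, which maps all four example URLs from the top of this post to a single canonical form. The choices it encodes are assumptions for illustration only: that `mysite.com` (without 'www') is the preferred host, that `default.aspx` is the default document, and that `promo` is the only tracking parameter. The function name `normalize` is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only; adjust to your own site.
PREFERRED_HOST = "mysite.com"
DEFAULT_DOCUMENTS = {"default.aspx"}
TRACKING_PARAMS = {"promo"}


def normalize(url):
    """Map a duplicate URL to one canonical form (port and fragment are dropped)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST  # domain canonicalization: drop 'www'
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment.lower() in DEFAULT_DOCUMENTS:
        path = path[: -len(last_segment)]  # "/default.aspx" -> "/"
    # Drop tracking parameters and sort the rest so parameter order cannot create clones.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in TRACKING_PARAMS
    )
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(params), ""))


EXAMPLES = [
    "http://mysite.com",
    "http://www.mysite.com",
    "http://mysite.com/default.aspx",
    "http://mysite.com/default.aspx?promo=ABC",
]
for url in EXAMPLES:
    print(normalize(url))  # http://mysite.com/ for every example
```

Sorting the remaining parameters is a design choice: it makes `?a=1&b=2` and `?b=2&a=1` collapse to the same URL, at the cost of assuming your pages ignore parameter order.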

-- Nathan Buggia, Live Search Webmaster Team

Comments

  • I've got a couple of plugins ready for this, for WordPress, Magento and Drupal: http://yoast.com/canonical-url-links/

  • Hey Nathan,

    Why isn't there any option to reference other domains, for example when http://domain.a/same-content is a duplicate of http://domain.b/same-content?

    Best wishes from Germany,

    Heiko

  • I hope you'll come out with guidelines for what's right and wrong.  For instance, if I create a page that responds to URL parameters like this:

    /page.aspx?category=3

    but I normally rewrite it using an ISAPI filter or some similar technology to be referenced as:

    /page-category-3

    I should be able to put in a rel canonical that rewrites the dynamic URL to the static URL (but obviously changing based on the category parameter).  That's clearly within the spirit of why you'd implement this, but I wouldn't want a quality engineer somewhere to accuse people of spamming using rel canonical.

  • This could be a big boon to the SEO industry, where canonicalization has always been a hot topic.

  • Do you support the HTTP Content-Location header as well?

    It's 10 years old and was designed for (almost) the same purpose...

  • Google's blog suggests that you will be able to use this syntax:

    <link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

    and, if a <base> is defined, this relative syntax:

    <link rel="canonical" href="product.php?item=swedish-fish" />

    Will Live Search support both?

  • I have one absolutely burning question about this tag:

    If I include it on a page which has a meta robots tag of "noindex", and point it to a canonical variant of this page (which can be indexed), does this cause any problems?

    Essentially, we use meta robots "noindex, follow" for things like pagination, different sorting order of products, etc etc - this handles the duplicate content issue (and much better than robots.txt, from a site-owner’s perspective).

    What I want to make sure is that, if I include this new rel=canonical tag, search engines that don’t handle the new tag can still use the "noindex" tag to eliminate duplicate content, and search engines that do use the canonical tag are correctly supported.

    This is the single most important thing I need to know about this new tag. Please could you include this in your webmaster guidelines or a follow up blog post?

    The second most important thing is - is the behaviour of the above standardised with the other search engines which are using it too?

  • This blog has not been useful to me. I understand that we need to add something like this canonical tag, but how do I add it? My goal is for my website to be searchable by Live Search, and I don't think this blog has given me an answer or pointed me toward one. Please help.

  • Is it possible to move an entire site from PHP to ASP using this link tag?

  • As simple as the egg of Columbus, but will it work as promised? Or will it be just another tag stuffed into my <head>?

  • Guys, please help me promote the canonical tag for IPB forums. I want to motivate the IPB creators to implement this simple tag in their CMS, but so far they have all resisted :(

    Your help is needed in this thread:

    http://forums.invisionpower.com/index.php?showtopic=281532

    Thanks!

  • Live Search Webmaster Center Blog

    Official blog of the Live Search Webmaster Center Team.

    Nathan

    Are you still monitoring/responding to the above blog?

    If not, who at MS is?

    Jim

  • Nathan

    Why do the MSN bots ignore 'crawl delay' and 'disallow' directives?

    Many, many related (unanswered) questions on this blog.

    Comments please?  Thanks.

  • This sounds great, but has anyone done any testing yet? I mean, are dup URLs actually being *removed* from indexes by this? How is this going to affect manipulative duplicate content algos?

    Canonicalization issues should be addressed in planning and development, and they are very easily avoided when you’ve structured your website appropriately. Keyword research, develop, deploy. I just don’t trust this *one* tag (anyone remember metas?) to resolve the issues entirely; it’s up to programmers to program accordingly. Dynamic 404s and strict URL structuring are extremely effective, preemptive techniques that people aren’t using as it is. What happens when this tag gets abused or deployed incorrectly?

    Will this tag actually have any effect on ‘big’ sites that *don’t* implement this technique?

    I need to understand the reward and penalty structure of this tag, in direct reference to white hat, and black hat, policies; and what Search Engines have in mind for this consideration.

    This will be interesting to watch unfold over the next several months…

    Arow

  • Nathan,

    I am glad to see Microsoft and others working with Google on the canonical link element. Matt Cutts's blog has an excellent 20 min. video explaining the new element in detail. By the way, great post.