
Bingbot crawler fails on sitemap index with querystring parameters

Webmaster
This group is devoted to Bing Webmaster Tools discussions.

This question is answered

I'm experiencing problems with Bingbot and a sitemap index with querystring parameters.

It seems as if Bing is doing some kind of double encoding of the & (ampersand) sign in the querystring, which causes the subsequent crawl to fail.

For example, if I have this sitemap entry in my sitemap index (with the ampersand escaped once as &amp;, as the XML format requires):

http://www.example.org/sitemap?parameterA=1&amp;parameterB=2

Bingbot will call my server with the already-escaped ampersand escaped a second time:

http://www.example.org/sitemap?parameterA=1&amp;amp;parameterB=2

This can be verified by looking at the server logs. Since the querystring is malformed, the subsequent request for the sitemap fails because parameterB is missing.

The sitemap index is verified to be correctly encoded and passes validation at http://www.validome.org/google/validate.
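
For reference, a minimal sitemap index in the form described by the protocol would look something like this (using the hypothetical URLs from the example above; each ampersand inside a <loc> value is escaped exactly once as &amp;):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.org/sitemap?parameterA=1&amp;parameterB=2</loc>
  </sitemap>
  <sitemap>
    <loc>http://www.example.org/sitemap?parameterA=2&amp;parameterB=2</loc>
  </sitemap>
</sitemapindex>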

Google has no problem with querystring parameters, and will parse my sitemap index and crawl it correctly.

Anyone else experiencing the same issue? How can this be reported to the Bingbot developers?

Verified Answer
  • I have got a message from Bing support confirming that this was a bug and that they have now resolved the issue.

    I can also verify from Webmaster Center that some of the links have been updated; I guess I'll have to wait a few days before all of them are reprocessed.

All Replies
  • Hi,

I suggest that you utilize the URL normalization feature of Webmaster Tools.

    Thanks,

  • Thanks Chris, but did you actually read my question? URL normalization has nothing to do with this problem.

    The sitemap protocol specifically notes that you have to create an index file which points to several smaller "chunks" of sitemap files, since a single sitemap cannot contain more than 50,000 URLs.

    See: www.sitemaps.org/protocol.html

    Since our web site contains several hundred thousand URLs, we need to use a sitemap index to divide the URLs into several sitemaps. This is done by adding a querystring parameter, and that parameter definitely cannot be filtered out, since that would defeat the whole point of separating the sitemaps.

    The problem is that Bingbot does not correctly parse the URLs given in the sitemap index, and when it subsequently requests the individual sitemap files, it does so with invalid URLs due to the double encoding, as the sketch below illustrates.
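
    To make the failure mode concrete, here is a minimal Python sketch using the hypothetical URLs from the question. It assumes nothing about Bingbot's actual implementation; it only shows what escaping an already-escaped ampersand does to the querystring the server receives:

    from xml.sax.saxutils import escape

    url = "http://www.example.org/sitemap?parameterA=1&parameterB=2"

    escaped_once = escape(url)            # correct form for a <loc> value in the sitemap index
    escaped_twice = escape(escaped_once)  # what a double-encoding crawler ends up requesting

    print(escaped_once)
    # http://www.example.org/sitemap?parameterA=1&amp;parameterB=2
    print(escaped_twice)
    # http://www.example.org/sitemap?parameterA=1&amp;amp;parameterB=2

    # Parameters the server sees when the doubly escaped URL is requested:
    query = escaped_twice.split("?", 1)[1]
    params = dict(pair.split("=", 1) for pair in query.split("&"))
    print(params)
    # {'parameterA': '1', 'amp;amp;parameterB': '2'}  -> parameterB is missing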

  • We've got exactly the same problem. The ampersands in our sitemap index URLs are already escaped, but Bing is double-escaping them, leading to malformed query strings when it requests the sitemap. Again, our sitemap index works fine with Google. I've contacted Bing support about the issue.

  • I am suffering from the same problem.
