One of the best parts of publishing online is that, on the Web, anyone can have a worldwide reach. But while going global is easy on the Internet, ensuring that the content you produce will be found by the right audience can be a real challenge. Search engines can have trouble understanding geotargeting because of a few technical limitations: crawlers don't store cookies, they don't send a browser's language preferences, and they can't always tell from a page alone which market it is meant for.
At Live Search, we attempt to overcome these and other challenges by examining the contents of a site, looking for indications that help us determine its intended audience. Sometimes it's easy, for instance, when a site has a country-code top-level domain (ccTLD) that matches the language of its body text. Other times, it's more difficult. Live Search looks at a number of indicators to ascertain a site's intended audience, including the top-level domain, the language of the page content, and location details (such as addresses and phone numbers) that appear on the pages.
While we do our best to read the indicators you give us, none of them alone is a crystal-clear gauge of a site's intended geographic audience, so we take them all into consideration. How, then, do you ensure that we're able to determine the intended audience for your content?
There is no single, fully effective approach to architecting and localizing websites. Books could be, and have been, written on this subject, but there are ways to do it that are friendlier both to your customers and to us. Consider the following recommendations:
When writing the content for your page, are you using keywords that tell the end user which location your content is relevant to? If you're a local business, be sure to include your telephone number (with area and country codes), your physical address (if applicable), and the city, state, and country where you're located. This will help both search engines and customers find you.
If you're already doing this, one thing you may not have considered is targeting additional keywords that also represent the location. For example, if your business is located in the Capitol Hill area of Seattle, Washington, you'll want to list both "Seattle" and "Capitol Hill," because your customers will likely search for both; you might even target the common misspelling "Capital Hill," since some customers will spell it incorrectly. Be careful with ambiguous combinations, though: "Capitol Hill" paired with just "Washington" might be mistakenly read as referring to the District of Columbia back east!
One problem we see from time to time is inconsistent language usage, especially where user-generated content is concerned. These pages can trip up our detection, especially when other best practices aren't being followed.
For example, from time to time, content on MSDN is needed for external markets like France, but the page content hasn't been translated into French, or the content is user-generated and remains in its original language.
Ensure, wherever possible, that you’re speaking the same language in the title tags, description tags, and rest of your page. Consistency is key.
When you're architecting your site, we recommend grouping your localized content by ccTLD, subdomain, or subfolder. Keep all of the content for a region or language grouped together in a single structure. Using the sample URL www.domain.com and French content as an example, the following are all good organizational structures:
- ccTLD: www.domain.fr
- Subdomain: fr.domain.com
- Subfolder: www.domain.com/fr/
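As a quick illustration, the three grouping styles can be sketched as URL templates. This is illustrative only: "domain", "com", and "fr" are sample values, and note that a ccTLD is a country code, which in this example happens to match the language code.

```python
def localized_urls(sld, tld, locale):
    """Return one example URL per organizational structure.

    sld/tld are the second- and top-level domain parts of the sample
    site (e.g. "domain" and "com"); locale is the target market code.
    """
    return {
        "cctld":     f"http://www.{sld}.{locale}/",        # www.domain.fr
        "subdomain": f"http://{locale}.{sld}.{tld}/",      # fr.domain.com
        "subfolder": f"http://www.{sld}.{tld}/{locale}/",  # www.domain.com/fr/
    }
```

Whichever style you pick, the point is the same: every market lives at its own stable, crawlable URL, and content for one market is never mixed into another's structure.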
Don’t mix content intended for one market with the content of another. This is bad for search and can be a bad experience for the customer.
Note: One thing to consider when planning the hierarchy of a global site is how many URLs you produce. Having too many URLs for each market may dilute the overall relevance of your pages and site, and you may not get customers to link to the most important pages on your site. See our part 1 post on large site optimization for more thoughts on this topic.
Sometimes search engines struggle to deliver your content to the right audience due to problems within your (the publisher’s) control. Some of these common mistakes include:
Some sites store the language setting as a preference in a cookie, but provide no navigational method for seeing the content for other markets. The problem with this approach is that the search engines don’t support cookies when crawling, so we never see anything but the default language. This can also create a less-than-optimal user experience. For instance, my friend Mishka may be reading a site here in the US but then switches to the German version of the same site. The site drops a cookie on her computer to note the change. If Mishka then emails a link to the site to her mother, who doesn’t know English, the site won’t find the German language cookie on her mother’s machine, so she’ll see an English page that she can’t read.
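One way to avoid the cookie trap is to treat the URL, not the cookie, as the authoritative source of the language, and use the cookie only as a convenience for visitors who haven't yet picked a localized section. A minimal Python sketch of that resolution order (the function name and the supported-language list are hypothetical):

```python
# Languages this hypothetical site publishes; each lives under its own
# URL prefix, e.g. /de/ for German.
SUPPORTED = ("en", "de", "fr")

def resolve_language(url_path, cookie_lang=None, default="en"):
    """Pick the language for a request, preferring the URL over the cookie."""
    for lang in SUPPORTED:
        if url_path.startswith(f"/{lang}/"):
            return lang  # the URL is authoritative, crawlable, and shareable
    # No language in the URL: fall back to the visitor's saved preference.
    return cookie_lang or default
```

With this order, a link like /de/ that Mishka emails to her mother still renders German even though her mother's machine has no cookie, and a crawler (which sends no cookies at all) sees every language.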
The browser's Accept-Language HTTP header is sent with each page request and tells the site which languages the user prefers to receive content in. If you're using a script to detect this setting and change the content accordingly, it's easy to serve all the languages of the website under the same URL. However, if that's the only way the localized content is accessible, the crawler, which doesn't send this header, will only ever see the default language. Having a distinct URL per language is necessary for ensuring the content is crawled.
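For illustration, here is a minimal Python sketch of Accept-Language handling: parse the header's quality values, pick the best supported language, and use the result only to redirect the visitor to that language's own crawlable URL, never as the sole gate to the localized content. The function name and supported-language set are assumptions for the example.

```python
def preferred_language(accept_language, supported, default="en"):
    """Return the best-matching primary language tag, e.g. "de"."""
    if not accept_language:
        return default  # crawlers typically send no Accept-Language header
    choices = []
    for part in accept_language.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        quality = 1.0
        for param in pieces[1:]:
            param = param.strip()
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        choices.append((quality, tag))
    # Highest quality first; fall back to the default if nothing matches.
    for _, tag in sorted(choices, reverse=True):
        primary = tag.split("-")[0]  # "de-DE" -> "de"
        if primary in supported:
            return primary
    return default
```

The key point from the post: use the result to redirect the visitor to a language-specific URL (such as /de/), so every language remains reachable at its own address even for clients that send no header.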
Search engines may sometimes be able to recognize a market setting in a query parameter if it uses a common nomenclature (such as en-US), but this is still neither optimal nor friendly for search. It's better to follow the pattern for URL hierarchy we described above. For example, a URL like www.domain.com/default.aspx?mkt=en-US uses standard market nomenclature, while something like www.domain.com/default.aspx?m=1 does not.
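If you must accept a market query parameter, a small normalizer can at least map incoming values onto the standard language-REGION form. The following is a minimal sketch; the function name and the accepted input formats are illustrative, not an existing API.

```python
import re

def normalize_market(value):
    """Normalize a market value to standard ll-CC form, e.g. "en-US".

    Returns None for values that don't follow the common nomenclature,
    such as a bare numeric locale ID.
    """
    match = re.fullmatch(r"([A-Za-z]{2})[-_]([A-Za-z]{2})", value.strip())
    if not match:
        return None
    return f"{match.group(1).lower()}-{match.group(2).upper()}"
```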
Search engines are always looking to improve how we detect the location and language of the intended audience for your site. At Live Search, we're considering several possibilities, from meta tag standards to tools in Webmaster Center. Until those tools become available, following the recommended practices and avoiding the common mistakes listed above will help us make those determinations today. As we make improvements in this space, we'll continue to make announcements and to seek your feedback and questions in our forums.
Jeremiah Andrick, Program Manager, Live Search Webmaster Center
Yahoo! already supports the Content-Language HTTP header (or the equivalent http-equiv meta tag); see their official guidelines on it.
Having search engines use the same methods of identifying language/country (this uses ISO codes so it can do both) is really, really important for webmasters.
It's even better when it re-uses existing web standards. HTTP Content-Language (and the HTML "lang" attribute or XHTML's xml:lang attribute) are really THE correct standards for doing this.
Note: Whatever you do, please don't talk about the meta tag name=language - this is an oooold tag which was used by early web browsers before the HTML standards caught up, and is not standardised anywhere and is superseded by the above standards.
So - HTTP Content-Language, and possibly the "lang" / "xml:lang" attributes, is absolutely, most definitely, the only way to do this. Webmasters everywhere will love you for it.
A good way to push the aforementioned standards would be to do another joint declaration with the other big search engines (Google/Yahoo!, possibly get Ask involved). This will massively increase the attention it gets, particularly if Google get on board.
Google already has a way of doing it in Google Webmaster Tools - most webmasters will only do it for Google and not bother doing it for other search engines, even if you provide the same feature in Microsoft's version of Webmaster Tools. Getting a standard which is included on the shared resource of people's web pages rather than the walled garden of a search engine (Google) is important.
@Ian Thanks so much for the comments. I think in terms of identification there is a lot that a partnership could do to help the publisher community. Getting a standard together takes a long time because our engines are very different under the hood. We are committed, where possible, to joining with the other engines. I think your feedback on this point is really good. We will let you know if something materializes as a standard.
Can you please comment on the MSN bot's disregard for robots directives such as 'crawl delay' and 'disallow'?
There are many related (unanswered) questions on this blog.
Every day, the MSN bot slows our servers to a crawl. This is a serious issue.
Good tips that will help improve the site.
Thanks for all your suggestions and indexing tips for making a good site.
This is a rarely explored subject, and you have provided valuable information. Thank you!
It was suggested that creating specific language subdirectories was a better way to organize a multilanguage website.
Well, this is really good.
I was always looking for a post like this. Sometimes we get in trouble when making multilingual sites.
I agree with slot guy: it will be better to have a neat and clean output.
This is a good article that explains the technical limitations that cause search engines problems in understanding the geotargeting of sites. Helpful to me indeed!
This is a less-discussed topic that no one writes about, but it's like the root of a tree. So thank you for writing about it.
I have a site that targets many countries. Each country is in a dedicated folder/directory. I have implemented the best practices you wrote about here, but with little result. I'm looking for more ways for the content I produce for each country to be found by the right target audience. More articles on this topic will be much appreciated.
It's really very helpful for increasing a website's value. The tips really work, but at the same time we need to follow the rules strictly. This update was very instructive and shows the creativity of the writer. The steps described need to be put into practice. Thanks, and looking forward to more.
Why not implement geotargeting like Google has done, so you can just set it via the webmaster tools rather than messing around with HTTP headers?
© 2013 Microsoft