Editor's note: In our continuous effort to make this blog as compelling as possible to our large and diverse audience, we are expanding the scope of the Bing Webmaster Center blog. Starting with this post, we will host occasional posts from "guest bloggers" from within Microsoft who work on search or use search-related technologies in their daily jobs. They will offer the perspective of a user of search engine optimization (SEO) services (just like you!) rather than that of a search engine offering prescriptive SEO advice. Let us know what you think and what topics you'd like to see covered in future posts with a comment here. Thanks for being a member of the Bing Webmaster Center community!
Today's guest blogger is Vincent Wehren, who led the SEO effort for the new Office.com. Office.com has grown to become the 30th largest site in the world, and has tens of millions of pages of indexed content. He's been an SEO for three years and leads the International team responsible for improving content optimization, reducing content duplication in the index, and optimizing site search performance, among many other duties. I am very pleased to introduce him to the SEO community.
-- Rick DeJarnette, Bing Webmaster Center team
***************************************************
Office.com is the companion website to Microsoft Office. With over 200 million unique visitors per month, 6 million content pages in 38 languages, and roughly 500 community contributors, the site offers product and support info as well as productivity content such as templates, images, clip art, and add-ons.
As part of building Office 2010, Office.com went through a complete redesign. The content management system and the server infrastructure behind it were also rebuilt to run on top of SharePoint 2010. As a side benefit, this major revamp gave us some opportunities to improve our SEO capabilities.
Over a couple of posts, I would like to share some of the SEO challenges we faced at Office.com and some of the decisions we made, in the hope that you will find them useful for your site, big or small.
A new URL structure for our pages
Backed by the recommendations that came out of a site review that we did in collaboration with an external SEO vendor, we defined a global SEO strategy and list of priority items to go after. Our core focus during the development phase was on site architecture and other items that required code work to be done by our team of developers.
One of our top priorities was making our URL structure more search engine-friendly. We already had a relatively flat folder structure (never more than one or two folder levels deep), but the URLs of our pages only contained a cryptic document ID, which made sense to our internal content management system (CMS) but not to search engines (or users, for that matter). So, as part of the redesign, we wanted to support keyword-based, search engine-friendly URLs.
The motivation behind this is fairly straightforward:
But that's not really all:
The solution
With a need to scale for hundreds of thousands of articles and a large number of languages, we decided to simply re-use the existing page title and algorithmically build the display URL. We created something that loosely works as follows and which doesn't differ a whole lot from what some other content management systems or blogging software solutions do:
For example, following the above rules, the article "Overview of XML in Excel" in English now can be found at http://office.microsoft.com/en-us/excel-help/overview-of-xml-in-excel-HA010206396.aspx.
On the other hand, our users in Mexico will find the article here: http://office.microsoft.com/es-mx/excel-help/informacion-general-sobre-xml-en-excel-HA010206396.aspx.
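The two example URLs above suggest a pattern along these lines. The following Python sketch is an illustration only, not the actual Office.com code: the function names, the diacritic-stripping step, and the Spanish article title are assumptions inferred from the URLs shown.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lowercase the title, strip diacritics, and join words with hyphens."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Replace every run of characters other than a-z and 0-9 with one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", without_marks.lower()).strip("-")


def build_display_url(locale: str, folder: str, title: str, doc_id: str) -> str:
    """Assemble locale, folder, title slug, and document ID (assumed layout)."""
    return (
        f"http://office.microsoft.com/{locale}/{folder}/"
        f"{slugify(title)}-{doc_id}.aspx"
    )


english_url = build_display_url(
    "en-us", "excel-help", "Overview of XML in Excel", "HA010206396"
)
# The Spanish title below is assumed; only the URL appears in the post.
spanish_url = build_display_url(
    "es-mx", "excel-help", "Información general sobre XML en Excel", "HA010206396"
)
```

Keeping the document ID at the end of the URL means the page can still be looked up by ID even if the title, and therefore the keyword part, changes later. Note that this sketch only handles Latin-script titles; characters that do not decompose to a-z or 0-9 are simply replaced by hyphens.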
URL length and stop words
One thing we did not implement for Office.com, but that you may want to consider for your situation, is limiting the number of keywords in the URL or removing stop words from it.
The argument is that too many keywords dilute the value of each individual keyword and that long URLs receive fewer click-throughs.
We explicitly did not remove stop words because this gets a lot more involved for the large number of languages we support. Also, a lot of our pages are about key terms that in other contexts would qualify as stop words. Good examples are titles such as "What-if scenario" or "If function" in Excel, where the stop words "what" and "if" are actually the most significant words, so stripping them out simply didn't make sense for us.
Also, search engines have started to improve the way they surface the page URL in the search results, making the click-through argument somewhat less of a concern.
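To make the stop-word problem concrete, here is a small Python sketch of a naive stop-word filter. The stop-word list and the function name are hypothetical and exist only for illustration; this is the approach we chose not to take.

```python
# Hypothetical English stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "if", "in", "of", "the", "what"}


def strip_stop_words(title: str) -> str:
    """Naively drop stop words from a title and hyphenate what remains."""
    words = title.lower().replace("-", " ").split()
    return "-".join(word for word in words if word not in STOP_WORDS)


# Works reasonably for an ordinary title:
print(strip_stop_words("Overview of XML in Excel"))  # overview-xml-excel
# But removes the words that matter most in these Excel titles:
print(strip_stop_words("What-if scenario"))  # scenario
print(strip_stop_words("If function"))       # function
```

A real implementation would also need a separate, carefully reviewed list for every supported language, which is a large part of why the cost outweighed the benefit in our case.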
Exceptions to our keyword-based URL strategy
There are cases where we wanted to cement the folder (or what we call a "sub web") name as the ultimate display URL for the page. In those cases we do not expand the page title but just promote the folder as the canonical URL. An example would be the default page of a specific product subfolder such as http://office.microsoft.com/en-us/access-help/. This also has the advantage that if the document ID changes for the index page of this folder, we do not have to redirect from the old page to the new page, which we had to do in the past.
We also didn't end up taking the keyword-based approach for non-Latin character sets, such as for our Japanese, Russian, Arabic, or Hindi sites. This was not because it wasn't technically feasible, but mostly because there was still enough ambiguity around how best to handle URLs in these languages for users, browsers, and search engines alike. However, this is definitely something we would like to explore further in the future.
Fewer URL parameters
In addition to the keyword-based URLs, there was also a push to reduce the use of query parameters and have our URLs be more static overall. Although we didn't manage to remove all dynamic parameters (some of them are still meaningful, as with some click tracking scenarios), we made huge strides in that direction. Not only does that make it easier for search engines to determine the "primary" URL for a resource (there should preferably only ever be one), but it also reduces the URL surface that search engines have to spend time crawling, processing, de-duping, etc., allowing them to spend more time on other pages.
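As an illustration of what a single "primary" URL means in practice, the Python sketch below collapses parameterized variants of a page onto one URL by dropping parameters that do not change the content. The parameter names in TRACKING_PARAMS and the function name are hypothetical; this is not how Office.com handles its parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that only carry tracking data.
TRACKING_PARAMS = {"origin", "ref"}


def primary_url(url: str) -> str:
    """Return the URL with tracking parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), "")
    )


variants = [
    "http://office.microsoft.com/en-us/access-help/?origin=email",
    "http://office.microsoft.com/en-us/access-help/?ref=banner&origin=home",
    "http://office.microsoft.com/en-us/access-help/",
]
# All three variants map to the same primary URL.
unique_urls = {primary_url(url) for url in variants}
```

The fewer such variants a site exposes in the first place, the less of this de-duplication work search engines have to do on its behalf.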
Redirection of the old-style URLs to new URLs and the canonical tag
When making large-scale URL changes on a site that has earned numerous inbound links in the wild, you should redirect the old URLs to the new ones using a 301 redirect.
The 301 redirect makes sure that all ranking power of the old link is concentrated in the new URL. It also helps avoid content duplication problems if both the old and new URL still "work"—which is the case for Office.com.
In addition, you could consider backing up this redirect strategy with the rel="canonical" tag, which is starting to enjoy more and more support from the search engines. The canonical tag tells search engines the preferred URL of the page if there are multiple URLs for the page.
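A minimal Python sketch of the two mechanisms follows. The legacy path in the mapping is invented for illustration (the old URL format is not shown in this post), and the function names are hypothetical; a real deployment would do this in the web server or application framework.

```python
from html import escape

# Hypothetical mapping from a legacy, ID-only path to the new keyword-based URL.
LEGACY_TO_NEW = {
    "/en-us/help/HA010206396.aspx": (
        "http://office.microsoft.com/en-us/excel-help/"
        "overview-of-xml-in-excel-HA010206396.aspx"
    ),
}


def redirect_for(path: str):
    """Return (status, headers) for a legacy path, or None if it is not legacy."""
    target = LEGACY_TO_NEW.get(path)
    if target is None:
        return None
    # 301 signals a permanent move, so engines transfer the old URL's value.
    return 301, {"Location": target}


def canonical_link(preferred_url: str) -> str:
    """Build the canonical tag to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}" />'


response = redirect_for("/en-us/help/HA010206396.aspx")
tag = canonical_link(LEGACY_TO_NEW["/en-us/help/HA010206396.aspx"])
```

The redirect changes where the old URL leads, while the canonical tag is only a hint inside the page itself, which is why the two complement each other.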
For Office.com, we planned to use both 301 redirects and the canonical tag, although we will start doing the full redirection only in a few weeks. Also, we are exclusively advertising the new URLs in our XML Sitemaps—but more about our Sitemap strategy in a later post!
What have you planned for? Are you thinking about search engine-friendly, keyword-based URLs for your site?
If you have any questions, comments, or suggestions, feel free to post them in our SEM forum. Up next: Office.com Sitemaps strategy.
-- Vincent Wehren, Lead Engineer, Office.com International Site & Services
I believe it's a nice way to create search engine-friendly URLs.
It's wise to allow professional guest bloggers to breeze in sometimes and make unbiased posts, as long as you don't open up the gate for "outsiders" to post entries.
What do you mean by "we planned to use both 301 redirects and the canonical tag, although we will start doing the full redirection only in a few weeks."?
Do all the pages with a new URL already have the new canonical tag, and are you planning the 301 redirect only in a few weeks, or are you going to do both together?
I love reading such background information, both from Microsoft/Bing and Google.
@stephan.walcher
We already are using the canonical tag on our pages but are redirecting only a subset of our legacy URLs to the new URLs at the moment. Later this month we will redirect all legacy URLs to the new URLs.
We didn’t want the additional overhead of this massive amount of redirects while we switched over our traffic to the brand new server farms. Now that we’ve established we are meeting and exceeding our end-user performance goals, the team will deploy the additional redirect code.
Although this was a temporary tradeoff from an SEO perspective, this has the benefit that it allows us to monitor the effects of this particular change to indexation, rankings, and traffic in a more isolated fashion. As a side-effect, it also provided us with some interesting data on the canonical tag: for which engines it seemed to be working well even without a full 301-redirect in place.
Does that answer your question?
- Vincent Wehren
How long do you think it will take to do the 301 redirects on a website this big?
It's a brilliant idea, having tips from someone who actually does SEO on a day-to-day basis. Kudos to you!
It is great to read this post. It reassures me that we went the correct way with URLs when we had a new site update.
So many other tidbits in this post that will be tried out.
Knowing about this, I booked many domains.
@Vincent - I'd be interested to see a report on how the different search engines handled the canonical tag for Office.com.
Definitely a useful guide for SEO for any site.
Love your work, Rick. I have to say that one of the best things to happen, as far as I am concerned, is the new developments around the search engines and the canonical tag. Switching a site to a CMS would be deadly if not for the ability to use the canonical tag. I am a lone wolf and do all my own development and optimization; it took untold hours to change all my CMS URLs to SEF URLs.
Some of my sites have thousands of pages, but you will no longer find any URL that contains ID=b44eut87zipitgo or ?67; every single category, sub, etc. now contains SEF URLs. This change was well worth the work involved: some of the sites moved up several pages in the search engines, and many of them that would shift from a page one position to page two now hold their spots.
Bob
thank you nice
It's good information.
Yes, this is the best information for me and it is very helpful to me
but thanks for it