My favorite feature of our recent launch is the Crawl Issues tool, which gives you details about issues Live Search may encounter while crawling and indexing your website. This information can help you better understand what Live Search sees when crawling your site and should ultimately help you improve your results from Live Search.
We report four types of issues:
For each of these types of issues, we show you the first 20 results on the Crawl Issues page, and allow you to download up to the first 1,000 results in a CSV file that opens easily in Excel. For large websites with potentially thousands of issues, we’ve supplied a filter option that allows you to scope the results by subdomain or by subfolder. For example, if you were the webmaster for microsoft.com and there were 250,000 file-not-found results, you could filter them by “support.microsoft.com” or “support.microsoft.com/kb” to see just the issues from a particular section of your website. Generally, we support up to 2 levels of subdomains and 2 levels of subfolders per URL, but a website may have fewer available.
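If you would rather apply the same kind of scoping to a downloaded list yourself, here is a minimal sketch in Python. It is not how the Crawl Issues filter is implemented; the function names `matches_scope` and `filter_urls` are made up for illustration, and treating deeper subdomains (such as mvp.support.microsoft.com) as part of a "support.microsoft.com" scope is an assumption.

```python
from urllib.parse import urlsplit


def matches_scope(url: str, scope: str) -> bool:
    """Return True if url falls under a scope such as 'support.microsoft.com/kb'.

    Hypothetical helper: the scope is 'host' or 'host/folder'.
    """
    # Split the scope into its host part and optional folder part.
    scope_host, _, scope_path = scope.strip().partition("/")
    scope_host = scope_host.lower()

    # Tolerate URLs written without a scheme.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()

    # Assumption: the host itself and any deeper subdomain both match.
    if host != scope_host and not host.endswith("." + scope_host):
        return False
    if not scope_path:
        return True

    prefix = "/" + scope_path.strip("/")
    path = parts.path or "/"
    # Match the folder itself or anything beneath it, but not '/kbase'.
    return path == prefix or path.startswith(prefix + "/")


def filter_urls(urls, scope):
    """Keep only the URLs that fall inside the given scope."""
    return [u for u in urls if matches_scope(u, scope)]
```

Calling `filter_urls(urls, "support.microsoft.com/kb")` on a list of reported URLs would keep only the ones under that subfolder.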
Once you’ve created the filter that gives you just the URLs you need, you can download the results in CSV format and email them to the webmaster that owns that part of the website. This gives them a clear idea of the issues that need to be fixed.
Let’s take an example site and see how you might use this tool. Fortunately, microsoft.com is always willing to help us out here. Microsoft.com is a gigantic website, with more than 300 full-time people working on it, including developers, IT personnel, marketers, and content authors. They also have almost every type of legacy system you can think of, so it is no wonder that they experience almost every type of issue there is. For example, the number of File Not Found errors is about 218,000. That is far too many to deal with at once, so I usually scan through the first couple hundred results to see which parts of the website are affected, or I start with the subdomains that are the most important.
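Scanning a few hundred URLs by eye works, but a quick tally by hostname can make the affected areas stand out faster. The following is a rough sketch only; `count_by_host` is a hypothetical helper, and it assumes you have already pulled the URLs out of the downloaded CSV into a plain list (the CSV column layout is not described here, so it is left out).

```python
from collections import Counter
from urllib.parse import urlsplit


def count_by_host(urls):
    """Tally URLs per hostname, most frequent first."""
    counts = Counter()
    for url in urls:
        # hostname is lowercased by urlsplit; fall back if it is missing.
        host = urlsplit(url).hostname or "(unknown)"
        counts[host] += 1
    return counts.most_common()
```

The subdomains at the top of the result are natural candidates for the first filters to try.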
One of the most highly trafficked portions of the site is the popular support knowledge base (KB articles). These articles are used to store information about all security and other issues, so let’s drill in there. Looking through the 404 pages from that section, one of the first issues that I notice is a series of URLs that look like this: https://mvp.support.microsoft.com/default.aspx/profile/hongfeng.liu. Adding to the mystery, when I pull the page up in a browser, it loads perfectly. Hmmm, is this the first bug in our tools? With a little more research using Live HTTP Headers, I discover this page is the result of some funky redirecting and status codes. Here’s what’s going on:
The “http:” version of the page is 302 redirecting to the “https:” version of the page, which is returning a 404 File Not Found error code while still displaying a valid page. Because the page renders correctly in a browser, it can be difficult to manually detect this type of issue. But now that I’ve figured out the problem, I can use the filter functionality to generate a list of all 160 URLs that appear to have this problem, download them as a CSV file, and email them to the site manager who owns that part of microsoft.com.
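For anyone who wants to check a batch of URLs for this pattern without a browser plug-in, the idea can be sketched in Python using only the standard library. This is an illustration under assumptions, not the tooling used above: `fetch_once` and `trace` are hypothetical names, the script sends a plain GET with default headers, and a server may answer a script differently than it answers a browser or a crawler.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECTS = {301, 302, 303, 307, 308}


def fetch_once(url):
    """Send one GET without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace(url, fetch=fetch_once, max_hops=10):
    """Follow redirects by hand and return the list of (url, status) hops."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        # Stop at the first response that is not a redirect.
        if status not in REDIRECTS or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

A chain whose last hop reports 404 even though the page renders in a browser is the situation described above. The `fetch` parameter exists so the tracing logic can be exercised with canned responses instead of live requests.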
Hopefully folks will find this tool useful in diagnosing issues within their own sites as well. Please let us know if you have any questions or comments.
--Nathan Buggia, Lead Program Manager, Webmaster Center
Thanks for providing us with more information about the crawling and indexing of our sites. Thanks and greetings!
That's a nice list of suggestions..useful to optimize a web page! Thanks a lot
Our data isn't better or worse than Googlebot's, but there are some differences. We provide two reports that Google doesn't:
- Long Dynamic URLs
- Unsupported Content-Types
Google provides similar data for the other reports. However, they may have different coverage of your website and may have implemented their crawler differently, which could give you different results.
Look for us to continue to expand the reports we provide here in the future.
-- Nathan Buggia, Webmaster Team
HI Nathan Buggia,
Great job. I expect more from MSN too; Flash indexing, if it can be implemented, would be great.
I love the HTTP Compression and HTTP Conditional Get Results facility, which is cool.
I have just written on my blog about the adLabs Research Center tools, which give me useful stats like age, gender, etc.
Keyword Forecast Tool:
http://adlab.msn.com/Keyword-Forecast/
Thanx
For some reason I cannot identify, I have a problem with one of my registered sites (my main one!):
1 - The site seems to be registered OK
2 - The backlinks tools shows 600+ backlinks
3 - The Crawl Issues shows no issue
... but none of the pages has been referenced for several months.
I had hoped that the Crawl Issues tool would hint at the source of the problem, but it does not.
Is there some option of the tools that could help me to find the source of the problem?
Everyone knows great content is fundamental to the success of your site. It is the reason people are
The Crawl Issues tool is helpful to attentive webmasters. And your reports about Long Dynamic URLs and Unsupported Content-Types make your search engine unique.
I like the bing community here and just learned quite a bit by visiting here. I think it is cool that we get to know if our sites got any crawl issues by checking in with the tools here. Thank you guys.
Hi, I have been waiting for Google and Bing to index my blog www.diyanswerdirect.com/blog for two weeks now and am getting very frustrated with them. I am also still waiting for them to update my site www.diyanswerdirect.com. My question is: when do they update, and how can I get them to crawl or spider my site?
thanks to share useful info with us.....