Search Results Clustering

As you may have noticed elsewhere, our teammates at MSR Asia released a Search Result Clustering site and toolbar (good job, guys!). It can be used for query disambiguation (example: jaguar) and sub-topic discovery (example: data mining). It was developed by the Web Search and Mining Group at MSR Asia and does all of the clustering on the fly using MSN Search results.

MSRA’s approach to clustering is a little different than other systems you might have seen. Here’s a summary from the project’s publication:

Traditional clustering techniques don't work for this problem because the documents are short, the cluster names should be readable and the algorithm should be efficient for on-the-fly calculation. The method takes on the whole problem in a different way and overcomes the difficulties in traditional clustering methods. It tries to first identify salient topics by identifying distinct and independent keywords, and then classifies the search results into these topics.
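To make the two-step idea in that summary concrete, here is a rough Python sketch: first pick salient phrases from the result snippets, then assign each snippet to the phrases it contains. This is not MSRA's implementation. The function names, the stopword list, and the hand-picked score (document frequency times phrase length) are all assumptions made for illustration. The summary itself does not say how salience is computed, and the paper's title suggests the ranking is learned rather than hand-written as it is here.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def candidate_phrases(tokens, query_terms, max_len=3):
    """Return the set of n-grams (n <= max_len) that do not start or end
    with a stopword and do not contain any query term."""
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            if any(tok in query_terms for tok in gram):
                continue
            phrases.add(" ".join(gram))
    return phrases

def cluster_snippets(snippets, query, top_k=3):
    """Step 1: rank phrases by a hand-picked salience score.
    Step 2: assign each snippet to every top phrase it contains."""
    query_terms = set(tokenize(query))
    per_doc = [candidate_phrases(tokenize(s), query_terms) for s in snippets]

    # Document frequency: in how many snippets does each phrase occur?
    doc_freq = Counter()
    for phrases in per_doc:
        doc_freq.update(phrases)

    # Hypothetical score: document frequency times phrase length in words.
    # Phrases seen in only one snippet are ignored.
    scored = sorted(
        ((df * len(p.split()), p) for p, df in doc_freq.items() if df >= 2),
        key=lambda t: (-t[0], t[1]),
    )
    topics = [p for _, p in scored[:top_k]]

    clusters = defaultdict(list)
    for idx, phrases in enumerate(per_doc):
        for topic in topics:
            if topic in phrases:
                clusters[topic].append(idx)
    return dict(clusters)

snippets = [
    "Jaguar cars: luxury sedans and sports cars",
    "Jaguar big cat habitat in South America",
    "Used Jaguar luxury sedans for sale",
    "The jaguar is a big cat native to the Americas",
]
result = cluster_snippets(snippets, "jaguar", top_k=2)
# result == {"big cat": [1, 3], "luxury sedans": [0, 2]}
```

Because the cluster names are phrases taken straight from the snippets, they stay readable, and a single pass over a couple of hundred short snippets is cheap enough to run at query time.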

If you want to learn more, check out the associated research paper, Learning to Cluster Web Search Results.

Brady Forrest, MSN Search PM

  • Nice work. There are many clustering meta search engines on the web (clusty, mooter, etc.), but it's a nice try. I also made a clustering meta search engine using Google's results last year as a course mini project; I had the same type of algorithm with phrase ranking, but with fewer features. In my experience, MSRA's clusters are better than others (including mine :)).
    It could be better if the user had the facility to specify the number of documents to cluster from the search results, where the default is 200.

    Monu Agrawal, Indian Institute of Science, Bangalore
  • That's cool!

    *stands and flips a coin for a while*

    Oh, you going to say anything else? Like, perhaps, ANYTHING?

    It's been awfully quiet around here lately. I haven't heard a peep from this place in a long time, and apart from minor shouts like this one, I haven't heard anything "significant" for a while.

    It's strange for such a "young" project to be so ...static, y'know? I'd expect you to be fine-tuning this, expanding, trying to get an edge on some minor competitors, like, I don't know, Google.

    I mean, I know you're busy with the second Search Champs and all, but couldn't you find a few minutes to blog? Perhaps comment on the proceedings, mention a couple of nifty things, and oh - I don't know - respond to the market as it inevitably grows more complex and expands? Search is changing, why not mention in what ways? Desktop Search is SUCH a recent invention, even though it shouldn't have been, why not talk about where the industry is moving in holistic terms? That doesn't give features away.

    See, it kind of eliminates the point of having a blog if you don't say anything. After all, isn't a blog meant to communicate? And you're not communicating if you're not talking... about SOMETHING.
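  • Automatic classification will be a very important part of the next step in improving search efficiency. Many users, faced with thousands of search results, do not have the patience to page through them one by one, yet they also do not add more keywords to narrow the search. As a result, the portion of the results that users actually make use of may be only the first few at the very top.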