Bing blogs

This is a place devoted to giving you deeper insight
into the news, trends, people and technology behind Bing.

Search Blog

June 24

New Operators Explained

As we mentioned earlier, in the latest version of MSN Search we’ve added a number of new advanced query operators. These make it easier to find things using MSN Search, and in some cases capture some of the zeitgeist of the Internet.

Filetype:

One of the most requested operators was filetype:, which lets you filter documents by their file type. Currently, MSN Search supports html, txt, and pdf, as well as the primary Office document types: doc and rtf for Word, xls for Excel, and ppt for PowerPoint. This is outstanding for finding official forms, which are usually in PDF or DOC format. For example, try 1040 filetype:pdf for the official IRS Form 1040 for US taxes, or brazil visa filetype:pdf for the visa application for visiting Brazil, which comes in handy for those of us traveling to SIGIR in Salvador, Brazil later this year.
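If you compose queries programmatically, the filetype: restriction is just a suffix on the keyword query. The sketch below is a hypothetical helper, not an MSN Search API: the name `filetype_query` is made up, and the client-side check simply mirrors the types listed above.

```python
# Hypothetical helper (not an MSN Search API): builds a query string that
# uses the filetype: operator described in this post.
SUPPORTED_FILETYPES = {"html", "txt", "pdf", "doc", "rtf", "xls", "ppt"}


def filetype_query(terms: str, filetype: str) -> str:
    """Append a filetype: restriction to a plain keyword query."""
    ext = filetype.lower().lstrip(".")
    # Client-side check against the types listed above; the search engine
    # itself may treat unsupported types differently.
    if ext not in SUPPORTED_FILETYPES:
        raise ValueError(f"filetype:{ext} is not one of the supported types")
    return f"{terms} filetype:{ext}"


if __name__ == "__main__":
    print(filetype_query("1040", "pdf"))          # 1040 filetype:pdf
    print(filetype_query("brazil visa", ".PDF"))  # brazil visa filetype:pdf
```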

Link: and LinkDomain:

We shipped version 1.0 with the Link: operator, which finds pages that link to a single page, as in link:search.msn.com. We’ve added a variation of that, LinkDomain:, which returns pages that link to any page in a given domain. This is a great way for all you bloggers out there to see how many people are linking to you, one way or another, and where they’re linking. For example, to see pages that link into MSN, just issue the query linkdomain:msn.com.
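The difference between the two operators can be illustrated on a tiny, made-up link graph. This is a toy model, not how MSN Search is implemented: `link_sources` requires the link target to be exactly the given page, while `linkdomain_sources` accepts any page on the domain. Treating subdomains such as search.msn.com as part of msn.com is an assumption.

```python
from urllib.parse import urlparse

# Made-up link graph for illustration: (linking page, link target) pairs.
LINKS = [
    ("blog-a.example/post1", "http://search.msn.com/"),
    ("blog-b.example/post7", "http://www.msn.com/news"),
    ("blog-c.example/about", "http://example.org/"),
]


def _parse(url: str):
    # urlparse only recognizes the host when a scheme is present.
    return urlparse(url if "://" in url else "http://" + url)


def _page_key(url: str) -> str:
    """Normalize a URL to host + path, ignoring the scheme and a trailing slash."""
    p = _parse(url)
    return (p.hostname or "") + p.path.rstrip("/")


def link_sources(target: str) -> list:
    """link:-style match: pages linking to exactly this page."""
    key = _page_key(target)
    return [src for src, dst in LINKS if _page_key(dst) == key]


def linkdomain_sources(domain: str) -> list:
    """linkdomain:-style match: pages linking to any page in the domain.

    Assumption: subdomains such as search.msn.com count as part of msn.com.
    """
    domain = domain.lower()
    hits = []
    for src, dst in LINKS:
        host = _parse(dst).hostname or ""
        if host == domain or host.endswith("." + domain):
            hits.append(src)
    return hits


print(link_sources("search.msn.com"))  # ['blog-a.example/post1']
print(linkdomain_sources("msn.com"))   # ['blog-a.example/post1', 'blog-b.example/post7']
```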

Contains:

One of our new, experimental operators is Contains:. The query contains:<foo> searches for pages that have a hyperlink to a file with the extension <foo>. For example, contains:wmv finds pages that link to WMV files. You can add other search terms to narrow your search; for example, the infamous news clip of the exploding whale is easily found via exploding whale contains:wmv. This works for any file type that our spider sees a link to on the Web, so it’s a great way to find binary files that our crawler doesn’t download and process: audio and video files, images, other binaries, and so forth. We’re really just starting to explore the utility of this feature, so we can’t wait to see what you come up with!
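To make the described behavior concrete, here is a minimal sketch of the matching rule as stated: a page qualifies if one of its hyperlinks points at a file with the requested extension. This is an illustration under that assumption, not MSN Search's crawler code; `LinkCollector` and `links_to_extension` are hypothetical names, and the sample page is made up.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_to_extension(html: str, ext: str) -> bool:
    """True if the page has a hyperlink to a file ending in .<ext>."""
    collector = LinkCollector()
    collector.feed(html)
    wanted = "." + ext.lower().lstrip(".")
    for href in collector.hrefs:
        # Look only at the URL path, so query strings like ?src=news are ignored.
        _, found = posixpath.splitext(urlparse(href).path)
        if found.lower() == wanted:
            return True
    return False


page = '<p>Watch <a href="/clips/exploding_whale.WMV?src=news">the clip</a></p>'
print(links_to_extension(page, "wmv"))  # True
print(links_to_extension(page, "mp3"))  # False
```

Note that the page itself never needs to be a WMV file; only the link target's extension matters, which is why this can surface files the crawler never downloads.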

Blog search in 4 lines of code

One trick to use with Contains: is searching for blogs. As it turns out, most blogs, at least most of the ones worth reading nowadays, have an RSS feed somewhere on them. Contains: is a great way to find pages that link to a feed, since a feed is usually a file with an RSS, XML, RDF, or ATOM extension. Interested in finding blogs on African cichlids? Try African cichlids contains:(rss xml rdf atom). Or perhaps you’re a Steelers fan: try Steelers contains:(rss xml rdf atom). It’s a quick hack, but we’ve found that it works surprisingly well!
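The blog-search trick is a fixed suffix appended to any topic, so it is easy to wrap in a helper. The function name `blog_query` is hypothetical; the extension list is the one suggested in this post.

```python
# Feed-style extensions suggested in this post.
FEED_EXTENSIONS = ("rss", "xml", "rdf", "atom")


def blog_query(topic: str) -> str:
    """Hypothetical helper: restrict a topic to pages that link to a feed file."""
    return f"{topic} contains:({' '.join(FEED_EXTENSIONS)})"


for topic in ("African cichlids", "Steelers"):
    print(blog_query(topic))
# African cichlids contains:(rss xml rdf atom)
# Steelers contains:(rss xml rdf atom)
```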


InURL:, InAnchor:, InTitle:, InBody:

Ever have one of those moments when you’re looking for something you saw once upon a time but can’t for the life of you remember exactly what it was? Perhaps a page where you clicked on a link titled simply “trebuchet” that had a list of things like “gog” and “magog,” but now you can’t track it down? These operators are for you. For example, inAnchor:trebuchet inBody:gog inBody:magog will take you directly to the site you want.
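One way to picture how these operators combine is a toy matcher in which every term must match and an operator term only counts in its own field. This is an assumed model for illustration, not MSN Search's ranking code: the field names, the `matches` function, and the sample page are all hypothetical. The "anchor" field holds the text of links pointing to the page, which is what InAnchor: searches.

```python
import re

# Hypothetical toy model of field-restricted operators. Field names are
# assumptions; "anchor" holds the text of links pointing TO the page.
OPERATOR_FIELDS = {"inurl": "url", "inanchor": "anchor",
                   "intitle": "title", "inbody": "body"}


def words(text: str) -> set:
    """Lowercase alphanumeric words in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: str, doc: dict) -> bool:
    """Every term must match; operator terms match only in their own field."""
    everything = " ".join(doc.values())
    for term in query.split():
        op, sep, word = term.partition(":")
        field = OPERATOR_FIELDS.get(op.lower()) if sep else None
        if field is not None:
            if not words(word) <= words(doc.get(field, "")):
                return False
        elif not words(term) <= words(everything):
            # Plain terms may match in any field.
            return False
    return True


doc = {  # made-up page
    "url": "http://siege.example/engines",
    "title": "Medieval siege engines",
    "anchor": "trebuchet",
    "body": "A list of names: gog, magog, and more.",
}
print(matches("inAnchor:trebuchet inBody:gog inBody:magog", doc))  # True
print(matches("inTitle:trebuchet", doc))                           # False
```

Operator names are compared case-insensitively here, matching the post's mixed use of inAnchor: and inanchor:.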


Various limitations with these operators:

There are currently a few limitations with these operators, especially inUrl: and inAnchor:. As a commenter on a previous post noticed, inUrl: doesn’t work like the Google operator; it follows the same logic as the rest. So, inurl:trebuchet will find documents containing “trebuchet” somewhere in the domain or path of the URL; however, inurl:www.trebuchet.com/models doesn’t work yet (although we will be considering that as a feature for a future release!). Also note that these operators don't take multi-word phrases (yet): for example, to find pages that use "Darth Vader" as the anchor text, you'll need to use inAnchor:Darth inAnchor:Vader.
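Both workarounds above can be sketched as small helpers. `expand_phrase` rewrites a phrase into one single-word operator term per word, and `url_words` is a hypothetical tokenizer showing the kind of words inUrl: can match, which is why inurl:trebuchet works while a full URL does not. Both names are illustrative assumptions, and the tokenization rule is a guess, not MSN Search's actual one.

```python
import re


def expand_phrase(operator: str, phrase: str) -> str:
    """Work around single-word operators: one operator term per word.

    Presumably this matches pages whose field contains every word, in any
    order and not necessarily adjacent, so it is looser than a phrase match.
    """
    return " ".join(f"{operator}:{w}" for w in phrase.split())


def url_words(url: str) -> list:
    """Hypothetical tokenizer: split a URL into the words inUrl: could match."""
    return re.findall(r"[a-z0-9]+", url.lower())


print(expand_phrase("inAnchor", "Darth Vader"))  # inAnchor:Darth inAnchor:Vader
print(url_words("www.trebuchet.com/models"))      # ['www', 'trebuchet', 'com', 'models']
```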


Finding pages that link to a certain page with certain terms:

Danny Sullivan over at SearchEngineWatch.com asked us how to use these operators to find pages that link to a target page using specific anchor text. To continue his example, both George Bush and Michael Moore have a number of people who link to them using the term “miserable failure.” But who has more links? The query inanchor:miserable inanchor:failure link:www.whitehouse.gov/president/gwbbio.html won’t do what you expect. Instead, it returns documents that link to George Bush’s bio page and that have some OTHER page linking to them using the terms “miserable” and “failure.” That’s why you only get a handful of pages, including other presidents’ bios. InAnchor: doesn’t work well with Link: and LinkDomain: right now, and we’ll be doing something about that in a future release. In the meantime, use InBody:, which also matches text found in the links on the page. So, inbody:miserable inbody:failure link:www.whitehouse.gov/president/gwbbio.html and inbody:miserable inbody:failure link:www.michaelmoore.com are the way to go. It’s not perfect, but it’ll get you most of the way there.
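To run the same comparison against several targets, the recommended InBody: plus Link: pattern can be generated from a phrase and a list of pages. `backlinks_with_text` is a hypothetical helper name; the two targets are the ones from the example above.

```python
def backlinks_with_text(phrase: str, target: str) -> str:
    """Hypothetical helper: one inbody: term per word, plus a link: target."""
    terms = " ".join(f"inbody:{w}" for w in phrase.split())
    return f"{terms} link:{target}"


targets = [
    "www.whitehouse.gov/president/gwbbio.html",
    "www.michaelmoore.com",
]
for target in targets:
    print(backlinks_with_text("miserable failure", target))
```

Generating the queries this way keeps the word terms identical across targets, so the result counts are compared like for like.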

Erik Selberg
Program Manager, Relevance
MSN Search


Comments

  • It is very nice to see MSN Search adding more ways for users to refine their search results. I especially like the LinkDomain: feature.

    Keep up the good work!
  • Off-topic:
    any comments on the new toolbar version? :-)
  • This functionality is great, and reinforces my recurring thought that computing moved from text-based DOS/Bash prompts to Windows/KDE graphical interfaces and is now moving back to text-based input methods.

    I'm exaggerating to make a point -- here's a quick explanation along with past entries on the topic of the New Command-Line:

    http://spaces.msn.com/members/boikej/Blog/cns!1pMlHWbwh3d14m5WSwXVu9IQ!246.entry
  • This is a very welcome addition, thank you!

    Nevertheless, there are still some blind spots where it is very hard (read: impossible) to find some information, because there is no way to filter pages containing keywords in images or links. I don't think gathering this metadata requires rocket science...

    Are operators like these on the wishlist:

    image.src: (keywords in image's URL),
    image.alt: (keywords in image's alt attribute),
    link.all: (keywords anywhere in link's URL),
    link.host: (keywords in link's host),
    link.text: (keywords in anchor text),
    url.host: (keywords in host),
    url.path: (keywords in path),
    depth:[number] (page's depth), and
    filesize: (document's size)?


    Keep up the good work!
  • Hi Eric, or whoever on the MSN team reads this. I applied online today, but I feel my resume will fall through the cracks.

    I'm the creator of Nata1, at one time the most popular (still?) .NET search engine on the market, open source and free. My current version is much more powerful than the Google Mini, and maybe the Appliance as well; it gives full control over relevance.

    I'd like to talk to you guys about what I'm trying to do, because I know it's truly unique: a new and fresh approach to search that no one else has, built on a rule engine, a nice API, and the power of human beings. I hope I show up on your radar, as I'm offering something better than Google Mini for free. Paul AtSign Nata1.com
  • My Fav' is the "contains:wmv". Finally, a way to quickly find windows media files of the exploding whale!