Bing

“Do what I mean, not what I say!” [Part 2 of 2]

October 29, 2007, 01:37 AM by Bing | 15 Comments

Continuing with our “Do what I mean, not what I say!” blog post from last time, here are some additional categories we tackled with this initiative.

Equivalencies

“We do really badly on the query ca chp,” a coworker complained in one email.

“Ca chp?” I thought.  “What the heck does that mean?” 

It turned out to be pretty simple: “ca” was short for California, and “chp” was short for California Highway Patrol. My coworker obviously knew what he meant by the query ca chp, but I didn’t, and our search engine definitely didn’t. After seeing many customer complaints of this sort, we took them as further confirmation that to truly improve the relevance of our search engine, we had to move past simple keyword matching and into understanding the intent of your query.

So when you search for crossroads mall in OKC, we take this to mean crossroads mall in Oklahoma City. When you search for Julia child bio, we’ll also look for Julia child biography to give you better results. But of course, the same word can mean something different in another context. Hence, when you search for nw university we’ll search for northwestern university, but if you search for nw co-ed soccer we’ll search for northwest co-ed soccer instead.
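To make the idea of context-dependent expansion concrete, here is a minimal Python sketch. The rule table `RULES`, the function `expand_query`, and the "first matching rule wins" policy are hypothetical illustrations, not how Live Search actually implements this. The rewritten variant is returned alongside the original query rather than replacing it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rewrite rules. For each abbreviation, candidate expansions are
# tried in order: a rule with a context word fires only if that word also
# appears in the query, and a rule with None is the default.
RULES: Dict[str, List[Tuple[Optional[str], str]]] = {
    "okc": [(None, "oklahoma city")],
    "bio": [(None, "biography")],
    "nw": [("university", "northwestern"), (None, "northwest")],
}


def expand_query(query: str) -> List[str]:
    """Return the normalized query, plus a rewritten variant if any rule fired."""
    tokens = query.lower().split()
    context = set(tokens)
    rewritten: List[str] = []
    for tok in tokens:
        expansion = tok
        for ctx, candidate in RULES.get(tok, []):
            if ctx is None or ctx in context:
                expansion = candidate
                break  # first matching rule wins
        rewritten.append(expansion)
    variants = [" ".join(tokens)]
    if rewritten != tokens:
        variants.append(" ".join(rewritten))
    return variants


print(expand_query("crossroads mall in OKC"))
# ['crossroads mall in okc', 'crossroads mall in oklahoma city']
print(expand_query("nw university"))
# ['nw university', 'northwestern university']
print(expand_query("nw co-ed soccer"))
# ['nw co-ed soccer', 'northwest co-ed soccer']
```

Searching the variant alongside the original, rather than substituting it, is one way to limit the damage when a guessed expansion is wrong.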

Intelligent “stop word” retention
 
Another area that fell under the “Do what I mean, not what I say!” category was “stop words.”

What are “stop words” you ask?

Well, in search engine parlance, they are words that often carry little “meaning” in a query, such as a, the, and in, so it may not matter much whether they appear on the desired results page. For example, if the query were the aurora borealis, you probably wouldn’t care whether the word “the” appeared on the top page returned, since “the” carries little meaning here. Hence, it may be perfectly acceptable to drop it from the query when retrieving pages.

However, if your query was The Office (the title of a popular television show), it would be absolutely ridiculous to drop the word “the,” since the query would essentially change meaning. We received a lot of emails about how we were doing just that. In fact, we previously dropped all stop words routinely, and we knew this needed dramatic improvement.

In our recent release, we’ve overhauled this logic. If you search for something where the stop words carry crucial meaning, we can now sense that and recognize that “the” in The Office, or the “A” in Avenue A, is crucial. By contrast, if you query for something like the aurora borealis, we recognize that the word “the” isn’t as crucial as the other query words.
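The difference between dropping stop words everywhere and keeping them where they matter can be sketched in a few lines of Python. The stop-word list, the hand-made `PROTECTED_PHRASES` list, and `required_terms` are all assumptions for illustration; a real engine would presumably learn which phrases are meaningful from data rather than hard-code them.

```python
from typing import List, Set

# A small illustrative stop-word list (not any engine's real list).
STOP_WORDS: Set[str] = {"a", "an", "the", "in", "of", "on", "and", "to"}

# Hypothetical phrases in which stop words carry meaning (titles, addresses).
PROTECTED_PHRASES: List[List[str]] = [["the", "office"], ["avenue", "a"]]


def required_terms(query: str) -> List[str]:
    """Return the query terms a result page must contain.

    Stop words are dropped unless they fall inside a protected phrase.
    """
    tokens = query.lower().split()
    protected = [False] * len(tokens)
    # Mark every token covered by an occurrence of a protected phrase.
    for phrase in PROTECTED_PHRASES:
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase:
                for j in range(i, i + n):
                    protected[j] = True
    return [t for t, keep in zip(tokens, protected) if keep or t not in STOP_WORDS]


print(required_terms("the aurora borealis"))  # ['aurora', 'borealis']
print(required_terms("The Office"))           # ['the', 'office']
print(required_terms("Avenue A"))             # ['avenue', 'a']
```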
 
Thanks, and let us know what you think!


Comments

Dheeraj Kumar

Posted On October 29, 2007, 05:02 AM

Quite interesting, but when I searched for "Julia Bio", it didn't search for "Julia Biography" :(


Tommy Kristoffersen

Posted On October 29, 2007, 05:31 AM

Try the search term "Julia child bio"


Dheeraj Kumar

Posted On October 29, 2007, 05:52 AM

What I am trying to say is: if _bio_ is being expanded to _biography_ in the case of "Julia Child Bio", then why not in the case of "Julia Bio"?


Alexis Kauffmann

Posted On October 29, 2007, 02:51 PM

Live Search still has a long way to go in understanding human behaviour on the web. Translating intentions into algorithm takes time and hard work.


Michael Zerman

Posted On October 29, 2007, 10:10 PM

Hi there,

"Translating intentions into algorithmS takes time and hard work," said Alexis Kauffmann, above.

I suggest you (MSN, AOL, Goog, Yahoo and ASK) stop wasting your time fine-tuning algorithms, as researchers have been doing since the early 1990s.

The way forward is to educate searchers about the ease of searching, so that searchers know how to form queries that return the required results. Learning to search competently is much, much easier than learning Word or Excel.

May I respectfully suggest that the MSN Live Search team read my brief (700-word) article at the URL above, and take note of my conclusions in the "Four Tips for Newbie Searchers" section.

Regards,

Michael Zerman

Adelaide, AUSTRALIA


Cheryl Oxenham

Posted On October 30, 2007, 03:00 AM

I think this is a fantastic evolution of Live Search. However, there is one thing that continually annoys the hell out of people here in the UK.

When will all the boffins at MSN realise there is a world outside the USA?

We are a .COM domain based in the UK. Just because we are not in the USA, it shouldn't mean we have to have a .CO.UK domain name. Indeed, we bought our domain over 6 years ago, didn't realise how much trouble it would cause, and I'm afraid to say things seem to be getting worse. If we were to set up the .co.uk domain as an alias for the .COM, we might even get penalised for duplicate content.

Can the clever people at MSN finally sort this mess out by adding an extra tag like UK_COM so we can specify our market? And please remember, NOT all .COMs are based in the USA!


Anonymous

Posted On October 30, 2007, 04:13 PM

@Deeraj & Tommy:   Thanks for sending the example that didn't yet perform exactly as you expected/desired.   Rest assured that we're committed to continually improving and refining our "Do what I mean, not what I say!" technology until we deliver what you're expecting the first time around.  This will be an area that will see continual refinements, improvements, and tuning (most shipped unannounced) over time. :)

Thanks!

Luke DeLorme

Program Manager

Search Relevance - Microsoft


Michael Zerman

Posted On October 30, 2007, 07:10 PM

Hi Luke,

+Julia +Child +bio*

This above could be your response to Deeraj and Tommy, explaining how the "+" sign means the term MUST be included in the results, and that the "*" (wildcard) will find all of biography, biographer, bio and even biopic.

Rather than promising Deeraj and Tommy a further refinement, a more finely-tuned algorithm or other unnecessary fiddling, whose benefits they have to wait an unstated period of time to receive.

MSN will never beat any other free-text based engine simply on the basis of technical prowess. Your history of bought services (LookSmart, Wisenut, Yahoo, Google, etc) to underpin the MSN search offering shows this to be the case.

The way forward for MSN to improve the value proposition of its search offering is to teach the searcher how to search. Not to continue the incessant registration of search patents.

Amigo, it's threatening for technicians, R&D types and algorithmists, but the education of searchers is the necessary paradigm shift.

And it's the way for MSN to start earning on its pure search offering (i.e., improving ROI), rather than spending for little fiscal return.

Michael Zerman

Adelaide, Australia
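The two query operators Michael Zerman describes can be sketched in Python. This is a simplified reading of his description (a "+" term must appear in the page; a trailing "*" matches any word with that prefix), not the documented syntax of Live Search or any other engine, and the helper names `term_matches` and `passes_required` are hypothetical. Unprefixed terms are ignored for filtering here; in a real engine they would presumably still affect ranking.

```python
import re
from typing import Set


def term_matches(term: str, words: Set[str]) -> bool:
    """A trailing '*' matches any word with that prefix; otherwise require an exact word."""
    if term.endswith("*"):
        prefix = term[:-1]
        return any(w.startswith(prefix) for w in words)
    return term in words


def passes_required(query: str, doc_text: str) -> bool:
    """True if the document contains every '+'-prefixed query term."""
    words = set(re.findall(r"[a-z0-9]+", doc_text.lower()))
    for raw in query.lower().split():
        if raw.startswith("+") and not term_matches(raw[1:], words):
            return False
    return True


print(passes_required("+Julia +Child +bio*", "A biopic about Julia Child"))  # True
print(passes_required("+Julia +Child +bio*", "Julia Child recipes"))         # False
```

One trade-off of prefix wildcards is precision: bio* would also match unrelated words such as "biology".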


Dheeraj Kumar

Posted On October 31, 2007, 06:17 AM

@Luke,

I know there is a long way to go and there is no fixed destination; search engines are supposed to improve each day.

@Michael Zerman,

I didn't like the your proposal at all because Search Engines are made to give results what user wants.

bio --> biography can be handled with a regular expression in the search term. But what about Soccer --> Football and vice versa, or Bharat --> India?

Certainly, the examples I gave here are kind of difficult to crack, but we should be working to help users as best we can.

Try out:

http://searchradar.webaroo.com/s?searchQuery=Football

http://searchradar.webaroo.com/s?searchQuery=Bharat

Wishing the MSN team good luck,

Dheeraj


Michael Zerman

Posted On October 31, 2007, 08:26 PM

Hi Deeraj

A few responses to your post, all with good intention my friend.

1. "I didn't like the your proposal at all because Search Engines are made to give results what user wants."

I agree totally, but the problem is the user generally can't articulate what they want. They think they want "soccer", but the search engine only thinks "football". Or vice-versa.

2. The searcher wants Brazil, because they are a first-language English user, but the results want "Brasil" because 80 million Brazilians use that spelling.

3. Of course you don't like my proposal: it undermines your attempt to sell to a major player in the search arena a search property you are developing, much like Cuill, Lexxe, Webaroo, Scruffy or the twenty other search startups and directories that constantly visit my modest two-page website.

4. VCs are constantly looking for a Google-killer, meaning they are lemming-like in their investment priorities. The shift in investment will occur when the search majors (Goog, Yahoo, MSN, ASK and others) agree that a better-formed query is more profitable than a better-resolved fuzzy pair.

5. So Deeraj, did [+Julia +Child +bio*] provide a good result set?

Regards, and best wishes for success with your Webaroo start-up (which needs to handle 404s for missing favicon.ico files better).

Michael Zerman

Adelaide, Australia


Dheeraj Kumar

Posted On November 01, 2007, 12:15 AM

1) I think we are using the wrong space to discuss this. Please feel free to mail me at dksidana AT gmail DOT com for further discussion; any other forum is also fine.

2) My name is Dheeraj; plz note the H in second place.

3) None of the above comments were made on behalf of my company (Webaroo).

-Withdrawing,

Dheeraj


Michael Zerman

Posted On November 01, 2007, 02:12 AM

Please pardon me; I also dislike it when people misspell my first or family name.

It's not necessary to conduct this conversation in private, if at all.

The point is simple - will ROI for search engines, whether established or startups, be improved by better algorithms or by better queries from searchers?

That's all.

Best wishes,

Michael Zerman



