There are plenty of bots out there and, as a result, some conventions have arisen. Well-behaved bots identify themselves with a unique user-agent. They also follow the robots.txt conventions, which allow webmasters to control how their sites are crawled.
Here at Live Search, our crawlers are identified by the user-agent ‘MSNBot’. The name may seem a little unintuitive for a Live Search crawler, but many webmasters depend on it, so we have chosen not to change it. To make things a little more transparent, we also identify our different types of crawlers. The complete list is as follows:
MSNBot: Main web crawler (www.live.com)
MSNBot-Media: Images & all other media (images.live.com)
MSNBot-NewsBlogs: News and blogs (search.live.com/news)
MSNBot-Products: Products & shopping (products.live.com)
MSNBot-Academic: Academic search (academic.live.com)
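Because each crawler announces its own user-agent, a robots.txt rule can name one of them without mentioning the others. The sketch below uses Python's standard urllib.robotparser to show how such a file reads. The robots.txt content, the `allowed` helper, and the example.com URLs are assumptions made up for illustration, and the result reflects how Python's parser interprets the file, not a statement about how any particular Live Search crawler resolves rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the media crawler out and say nothing
# about the other crawlers.
ROBOTS_TXT = """\
User-agent: MSNBot-Media
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("MSNBot", "MSNBot-Media"):
        print(agent, allowed(agent, "http://www.example.com/page.html"))
```

Under this parser the rule applies only to the user-agent it names, so the main crawler is still allowed to fetch pages.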
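But what about crawlers that aren’t so well-behaved? After all, anyone could call themselves ‘MSNBot’ and proceed to be as rude and aggressive as they like. Fortunately, there is a way you can catch these impersonators: do a reverse DNS lookup on the IP address the crawler connects from, then do a forward DNS lookup on the resulting host name and confirm that it resolves back to the same IP address.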
By verifying the crawler’s identity, you can catch masquerading crawlers. When you do catch one, you can simply return an HTTP error, thus blocking it from seeing your content.
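Verifying a crawler’s identity with a reverse and forward DNS lookup can be sketched in Python as follows. This is a minimal sketch and not the official procedure: the `verify_crawler` helper is hypothetical, and the trusted host name suffix is an assumption that does not come from this post, so replace it with whatever domain the search engine documents for its crawlers. The resolver functions are passed as parameters only so that the logic can be exercised without network access.

```python
import socket

# ASSUMPTION: placeholder suffix for genuine crawler host names.
# It is not stated in the post; substitute the documented domain.
TRUSTED_SUFFIXES = (".search.live.com",)


def verify_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                   reverse=socket.gethostbyaddr,
                   forward=socket.gethostbyname_ex):
    """Return True if `ip` passes a reverse-then-forward DNS check."""
    # Step 1: reverse lookup, IP address -> host name.
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False

    # Step 2: the host name must fall under a trusted domain.
    if not hostname.lower().rstrip(".").endswith(tuple(suffixes)):
        return False

    # Step 3: forward lookup, host name -> IP addresses. The original
    # address must be among them, otherwise the reverse record was forged.
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses
```

A request whose user-agent claims to be ‘MSNBot’ but whose address fails this check can then be answered with an HTTP error. DNS lookups are slow, so in practice the result for each IP address is worth caching.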
We are constantly looking for your feedback to help improve our engine – please send it our way using this link.
Brent Hands, Program Manager, Live Search
Wow. Thanks MS! Google makes a big deal about their Googlebot being so big and great that they cannot release the IP address, and says not to worry about fake bots using the Googlebot name. You guys, on the other hand, have given out the IP address and encourage people to do a reverse DNS lookup on it. Looks like Google is dropping the ball again!
Could you give a more in-depth explanation of identifying the bot with reverse and forward DNS lookups?
Also, I don't want my site's images to show up in the listings. Can I disallow MSNBot-Media in robots.txt?
Will that ensure that the rest of my site, i.e. the content, is still crawled?
@ Explorer5 Actually MS is following Google's lead on this one. See http://googlewebmastercentral.blogspot.com/2006/09/how-to-verify-googlebot.html
Following Google, MSN has now also implemented a method for confirming the authenticity of the MSN bots. The reason is presumably that a large number of scraper bots now use the user agents of the major search engines in order not to stand out...
Nice work Brent, this is useful stuff, only a couple more engines to follow.
Microsoft has jumped on the bandwagon for allowing you to verify if the useragent, MSNBot, that is crawling your site, is truly from Microsoft or being spoofed by some content scraper. Google has released information in the past for verifying...
Live Search's WebLog : Search robots in disguise...
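Even if we know the method, there is still no way to deal with it effectively… I just saw an article mentioning that, after Google previously published a way to verify GoogleBot, Microsoft Live Search has now published a way to verify MSNBot as well.
The reason this kind of verification is needed, ...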
The Live Search blog gives recommendations on how to recognize bots that pass themselves off as MSN bots....
this sounds great, yahoo should do the same... hehe
MSN Live Search has five spiders in total, each with its own distinct purpose.
What about other Google or MSN bot products, including newly bought sites?
There is an ever-increasing number of bots to keep track of when looking at web stats. MSN are making...