Bing blogs

This is a place devoted to giving you deeper insight
into the news, trends, people and technology behind Bing.

msnbot and non-existent files

Webmaster

This group is devoted to Bing Webmaster Tools discussions.



for many many years, msnbot has been crawling my sites looking for files that have never existed... i'm trying to figure out why...

the filenames have changed slightly in recent times but they have been similar in structure since the beginning... they are something like 000092601_00002.temp0001.htm... in other words, 9 numbers underscore 5 numbers dot temp 4 numbers dot htm... the search for these is all over my server's directory tree...
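A quick way to pick these requests out of a log is to run a regular expression over the last path segment of each request. The sketch below assumes a filename shape built from the description in this post and from the two msnbot log lines quoted near the end of the thread, which show a 32-hex-character prefix, hex characters after "temp", and ".html" as well as ".htm". The names BOGUS_NAME and is_bogus_probe are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern inferred from this thread, not from any msnbot documentation:
# a 9-digit number OR 32 hex characters, "_", 5 digits, ".temp",
# 4 hex characters, then ".htm" or ".html".
BOGUS_NAME = re.compile(
    r"(?:\d{9}|[0-9A-Fa-f]{32})_\d{5}\.temp[0-9A-Fa-f]{4}\.html?"
)


def is_bogus_probe(request_path: str) -> bool:
    """Return True if the last path segment has the bogus-file shape."""
    path = urlsplit(request_path).path   # drop any query string
    filename = path.rsplit("/", 1)[-1]   # same name shape in any directory
    return BOGUS_NAME.fullmatch(filename) is not None
```

Matching only the last path segment is deliberate, since the requests turn up all over the directory tree; fullmatch keeps names like "...temp0001.htm.bak" from slipping through.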

i'll emphasize once more that these files have never existed on my site and i have no clue how msnbot may have picked them up...

now, how can i get msnbot to stop polluting my logs looking for them???

All Replies
  • JSherrod
    It seems to me that the answer lies with dynip.com.  Even though your code is straight html, you are a subdomain of a much larger domain that you don't control. Perhaps you could ask them.

    i highly doubt that asking them anything would result in anything positive... they are not a redirection service, for one thing... they are a DNS registration service... as such, their service is at the DNS level, not the http level...

    JSherrod
    Search engines just don't make up urls to crawl.  If you read through the forums you will see that people have enough issues getting their real content crawled, much less non-existent content.

    yes, i know they shouldn't but it is possible that it is some kind of internal situation to msnbot... remember, this problem has gone back many years... not just 2 or 3 but 10 or so, at least...

    as an example, i had one spider that was walking my file libraries and downloading only 32K of everything that was available... quite a few of the files i offer are much larger than 32K... when i was finally able to get hold of one of their coders, it turned out that they were not paying attention to the MIME type or the format of the data they were getting back... since my site hosts over 4000 binary file archives, this was a huge problem...

    this may also be similar to a problem i had with another provider that forced all their users thru their proxy server, but their firewall or proxy server was set to only accept connections that finished within 3 - 5 seconds... anything longer resulted in their proxy server re-requesting the URL for their cache... this resulted in huge bandwidth consumption as well as failed updates of their proxy's cache... fixing that took over 6 months and numerous calls, paid for out of my pocket, to their tech support before i finally got to one of their actual engineers and the problem was finally seen and figured out... before that, i was always dismissed out of hand because i was not one of their customers :?

  • Archie

    wkitty42

    if this is not where support hangs out, then bing has more to fix in their links and pages because this is where they directed me for msnbot support...

    As far as I can tell there is only one Bing representative who actually posts on these forums so a reply from Bing is not guaranteed if you ask for support here

    then they (bing) need to "FixTheirShit<tm>" because coming in thru their webmaster support links throws us in here, which is obviously of no help... has someone redefined what "web master" really means?? it would seem so, as it seemingly applies to anyone who is running a web site no matter what their qualifications :?

    FWIW: i've been doing this stuff since before the internet was "the internet", and i won't even speak of the 30+ years as an applications coder :(

  • wkitty42

    Archie

    wkitty42

    if this is not where support hangs out, then bing has more to fix in their links and pages because this is where they directed me for msnbot support...

    As far as I can tell there is only one Bing representative who actually posts on these forums so a reply from Bing is not guaranteed if you ask for support here

    then they (bing) need to "FixTheirShit<tm>" because coming in thru their webmaster support links throws us in here, which is obviously of no help... has someone redefined what "web master" really means?? it would seem so, as it seemingly applies to anyone who is running a web site no matter what their qualifications :?

    FWIW: i've been doing this stuff since before the internet was "the internet", and i won't even speak of the 30+ years as an applications coder :(

    I agree with what you've said but unfortunately that just isn't the case.  To contact Bing follow the steps I posted earlier and they will hopefully help you.

  • good reply archie

  • I've been seeing my server logs littered with the same 404 errors from msnbot.  It is constantly making bogus requests for what appears to be a randomly generated file name using  the file naming methodology described above.  It is really annoying and I wish Microsoft would stop this practice.

    The 32-character GUID at the beginning of the file name does seem to be pretty stable. It is almost as if Microsoft is trying to litter server logs with the bogus GUID for some reason known only to them.

  • question is, why come here asking questions and then respond with borderline flame posts when people are trying to help you and understand the issue. I suggest using the support page already provided.

  • Brett Yount

    question is, why come here asking questions and then respond with borderline flame posts when people are trying to help you and understand the issue. I suggest using the support page already provided.

    How many times are you going to repost this rude non-answer? I did not see the original post in this thread as borderline flaming, nor any of his replies. The only flaming I've seen is in response to those asking questions. I certainly wasn't flaming anyone in my post just before your last reply.

    Contacting Bling in the manner you suggest will be ineffective at best at resolving this problem, as in all likelihood they will just ignore any such support request. It also won't provide any answers for others who, like me, will Google this issue and find that this is just about the only forum thread out there that discusses the bogus requests msnbot is submitting.

    The most effective route to resolving this problem is a thread in this forum where others, like myself, can stop by and express their concern about this issue. Maybe, just maybe, someone who actually works on the Bling and/or msnbot projects will bring this issue to someone's attention and msnbot will be modified to stop behaving badly.

    I know personally I find msnbot's behavior entirely unacceptable.  I can see a bot testing once in a very great while (e.g. once a week, month, etc.) to make sure 404 errors are being properly flagged in response headers, but sending out repeated bogus requests to the same domain  many times a day amounts to abuse of resources.   It is especially a nuisance on websites like mine where a great deal of scripting is in place to try and redirect users to the page they were trying to get to when they followed a bad link from somewhere else.  The noise msnbot is creating makes it just that much harder to find legitimate 404 errors that need to be fixed.

  • Brett Yount
    question is, why come here asking questions and then respond with borderline flame posts when people are trying to help you and understand the issue. I suggest using the support page already provided.

    well, brett, to be honest, as i originally posted, this has been going on for years... not just with bing but with msnbot since it first started visiting my site and indexing it...

    as far as trying to understand the post, i don't know how much plainer it can be... yes, there is and has been a level of frustration involved... as a support technician, i'm very aware of both sides of the fence and sadly the support techs end up taking the brunt of the problem and anger...

    all i've asked for is an explanation of where msnbot has gotten these invalid URLs for my site and how to terminate them...

    my apologies for unloading several years of frustration and anger...

  • Brett Yount
    question is, why come here asking questions and then respond with borderline flame posts when people are trying to help you and understand the issue. I suggest using the support page already provided.

    FWIW: the links to the support page(s) all end back in here...

    like that old text adventure game... "you're in a maze of twisty passages" :rolleyes: :(

  • wkitty42

    all i've asked for is an explanation of where msnbot has gotten these invalid URLs for my site and how to terminate them...

    I think msnbot is manufacturing the invalid URLs on its own. It is like they are intentionally trying to get 404 errors.

    wkitty42

    Brett Yount
    question is, why come here asking questions and then respond with borderline flame posts when people are trying to help you and understand the issue. I suggest using the support page already provided.

    FWIW: the links to the support page(s) all end back in here...

    like that old text adventure game... "you're in a maze of twisty passages" :rolleyes: :(

    Agreed!

    What I have learned about any of Microsoft's support resources is that unless Google can land you on exactly the right page all hope is lost in finding what you need on any Microsoft support site. They all suck for navigation.

    In regards to these invalid URLs msnbot is requesting, there should be a page explaining them linked to directly from the URL provided in msnbot's user agent string.

  • Without wishing to simplify things too much, is your server set up to send out a 404 error upon these requests?

    Very strange and illogical for the bot to just make up URLs.

    Unless someone is intentionally linking to these URLs just to keep you on your toes!

  • AG Property Greece
    Without wishing to simplify things too much, is your server set up to send out a 404 error upon these requests?

    In my case, yes my server is set to send out 404 errors for invalid URLs. 

    AG Property Greece
    Very strange and illogical for the bot to just make up URLs.

    The only logical reason I can think of for sending out invalid requests is to detect whether the server is correctly configured to issue 404 errors. Spammers will sometimes set their domains to always serve a "page" with a 200 status regardless of what URL is requested. Sending out invalid requests to see whether 404 errors come back would be one way of detecting a domain that is trying to spam the SERPs.
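The check speculated about in this post (often called soft-404 detection) can be sketched in a few lines: request a name that almost certainly does not exist and see whether the server answers 404/410 or a normal page. This is a hypothetical illustration of the idea, not msnbot's actual logic, which the thread never establishes; fetch_status, random_probe_path and serves_real_404 are invented names, and network errors are simply left to propagate.

```python
import secrets
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status for a GET of url (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the status code is on the exception.
        return err.code


def random_probe_path() -> str:
    # 32 random hex characters: a name that almost certainly does not exist.
    return "/" + secrets.token_hex(16) + ".html"


def serves_real_404(base_url: str,
                    fetch: Callable[[str], int] = fetch_status) -> bool:
    """True if a made-up URL gets 404/410 rather than a 'soft 404' page."""
    status = fetch(base_url.rstrip("/") + random_probe_path())
    return status in (404, 410)
```

Passing fetch in as a parameter keeps the decision logic testable without touching the network.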

    The thing is, to test this msnbot doesn't need to make multiple requests per day. One request per day, week, month should be sufficient.
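The "once per day, week, or month" policy suggested here amounts to a per-host throttle. A minimal sketch, assuming a crawler that asks before each probe; ProbeThrottle and its interval are hypothetical and say nothing about how msnbot actually schedules requests.

```python
import time
from typing import Callable, Dict


class ProbeThrottle:
    """Allow at most one probe per host per interval (hypothetical policy)."""

    def __init__(self, interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval_seconds
        self.clock = clock
        self._last: Dict[str, float] = {}  # host -> time of last allowed probe

    def should_probe(self, host: str) -> bool:
        now = self.clock()
        last = self._last.get(host)
        if last is not None and now - last < self.interval:
            return False  # probed this host too recently
        self._last[host] = now
        return True
```

With interval_seconds=86400 a host sees at most one made-up request per day, instead of the several per day reported in this thread.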

    AG Property Greece
    Unless someone is intentionally linking to these URLs just to keep you on your toes!

    Unlikely that multiple sites would see this issue.  My feeling is that if folks did a search through their logs for msnbot hits many would find they have these invalid page requests in their logs as well.

     

  • AG Property Greece
    Without wishing to simplify things too much, is your server set up to send out a 404 error upon these requests?

    ummm... of course... is this not the default behavior for a web server?

    AG Property Greece
    Very strange and illogical for the bot to just make up URLs.

    i agree...

    AG Property Greece
    Unless someone is intentionally linking to these URLs just to keep you on your toes!

    that would indicate that these links should be able to be found and they are not... this problem has been ongoing for some 10 years or so... it is not something that is new... it is a very old problem with MSNBOT and only MSNBOT... google, yahoo, and all of the other several dozen search indexing bots out there that visit my sites on a daily basis have never searched for such... this is a MSNBOT problem, pure and simple...

  • Hi wkitty42,

    Can you share your url with us? We are a group of volunteers starting a new initiative in a community. Your post provided us valuable information to work on. You have done a marvellous job!

  • Here are a couple of server log entries I have for the bad msnbot requests:

    '65.55.207.126'|Tue, 15 Dec 2009 20:39:49 -0500|'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'|'*/*'|'/ADBF3C7AB534E8356F30D8AC05291640_00000.temp019f.html'|''
    '65.55.207.28'|Wed, 16 Dec 2009 05:46:22 -0500|'msnbot/2.0b (+http://search.msn.com/msnbot.htm)'|'*/*'|'/000166709_00001.temp00be.html'|''

    Note that the requested file is obviously a totally manufactured file name, which is designed to cause a 404 error. Seeing maybe one or two of these per week in the server logs wouldn't be too much of an annoyance, but when you see multiple requests per day it is really annoying.
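To put a number on "multiple requests per day", entries in the format above can be parsed and the bogus msnbot hits counted per day. This sketch assumes the field order ip|date|user-agent|accept|path|referer, inferred only from these two sample lines (it is this poster's own log format, not a standard one), and reuses the filename shape described at the top of the thread; parse_line and bogus_msnbot_hits_per_day are hypothetical names.

```python
import re
from collections import Counter
from email.utils import parsedate_to_datetime
from urllib.parse import urlsplit

# Filename shape described in this thread (an assumption, not documented by Bing).
BOGUS = re.compile(r"(?:\d{9}|[0-9A-Fa-f]{32})_\d{5}\.temp[0-9A-Fa-f]{4}\.html?")


def parse_line(line):
    """Split one entry: ip|date|user-agent|accept|path|referer (assumed order)."""
    fields = [f.strip().strip("'") for f in line.rstrip("\n").split("|")]
    if len(fields) != 6:
        return None  # not in the expected format
    ip, date, agent, _accept, path, _referer = fields
    return ip, parsedate_to_datetime(date), agent, path


def bogus_msnbot_hits_per_day(lines):
    """Count msnbot requests for bogus-looking names, keyed by local date."""
    per_day = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _ip, when, agent, path = parsed
        name = urlsplit(path).path.rsplit("/", 1)[-1]
        if "msnbot" in agent and BOGUS.fullmatch(name):
            per_day[when.date().isoformat()] += 1
    return per_day
```

Anything that does not match stays out of the count, so the remaining 404s in the log are the ones worth fixing.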