How to Create a robots.txt File
Use the robots.txt file to control which directories and files on your web server a crawler compliant with the Robots Exclusion Protocol (REP), also known as a robot or bot, is permitted to visit. In your robots.txt file, you can also implement a Crawl-delay: directive, which throttles bot crawling activity, and you can add one or more Sitemap: directives pointing to your website's sitemap or sitemaps.
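As a sketch, a minimal robots.txt combining these directives might look like this (the directory name and sitemap URL are placeholders):

```
User-agent: *
Disallow: /tmp/
Crawl-delay: 10
Sitemap: http://www.your-url.com/sitemap.xml
```

Note that Crawl-delay: is not part of the original REP and is interpreted differently by different search engines, so check each engine's documentation before relying on it.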
- Identify which directories and files on your web server you want to block from being crawled
- Identify whether or not you need to specify additional instructions for a particular search engine bot beyond a generic set of crawling directives
- Use a text editor to create the robots.txt file and directives to block content
- Optional: Add a reference to your sitemap file (if you have one)
- Check for errors by validating your robots.txt file
- Upload the robots.txt file to the root directory of your site
Identify which directories and files on your web server you want to block from the crawler
- Examine your web server for published content that you do not want to be visited by search engines.
- Create a list of the site’s publicly accessible files and directories on your web server that you want to disallow. Example: You might want bots to skip crawling directories such as /cgi-bin, /scripts, and /tmp (or their equivalents, if they exist in your server’s architecture).
|If you have added a URL pattern to your robots.txt file that matches URLs Bing considers significant or important for your site, you will receive an alert in the Message Center. It will also be flagged for your attention in the Crawl Information feature.|
Identify whether or not you need to specify additional instructions for a particular search engine bot beyond a generic set of crawling directives
- Examine your web server’s referrer logs to see if there are bots crawling your site that you want to block beyond the generic directives that apply to all bots.
|Bingbot, upon finding a specific set of instructions for itself, will ignore the directives listed in the generic section, so you may need to repeat all of the generic directives, in addition to the Bingbot-specific directives, in its own section of the file.|
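For example, if you want Bingbot to honor the same blocks as all other bots plus a crawl delay, repeat the generic directives in its own section (the directory names are the placeholder examples used in this article):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /tmp/

User-agent: bingbot
Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /tmp/
Crawl-delay: 5
```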
Use a text editor to create the robots.txt file and add REP directives to block content
- Using a text editor, such as Windows Notepad, create a new file named robots.txt (use this exact file name, in all lowercase).
- Bots are referenced as user-agents in the robots.txt file. At the beginning of the file, start the first section of directives, applicable to all bots, by adding this line: User-agent: *
- Create a list of Disallow: directives listing the content you want blocked. Example: Using the directory examples above, the set of directives would look like this:
- User-agent: *
- Disallow: /cgi-bin/
- Disallow: /scripts/
- Disallow: /tmp/
- Each Disallow: directive can reference only one path, so you’ll need to create a new Disallow: directive for each pattern to be blocked. You can, however, use wildcard characters.
- You can also use an Allow: directive to permit access to specific files within a directory whose contents are otherwise blocked.
- For more information on using wildcards and on creating Disallow and Allow directives, see the Webmaster Center blog article Prevent a bot from getting “lost in space”.
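As a sketch of how Allow: and wildcards can combine (the file and directory names are hypothetical; wildcard support such as * and $ is an extension to the original REP honored by the major search engines):

```
User-agent: *
Disallow: /scripts/
Allow: /scripts/public.js
Disallow: /*.gif$
```

Here the $ anchors the pattern to the end of the URL, so only URLs ending in .gif are blocked by the last directive.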
If you want to add customized directives for specific bots that are not appropriate for all bots, such as crawl-delay:, add them in a custom section after the first, generic section, changing the User-agent reference to a specific bot. For a list of applicable bot names, see the Robots Database.
Note: Adding sets of directives customized for individual bots is not a recommended strategy. Because you typically need to repeat the directives from the generic section, custom sections complicate file maintenance. Furthermore, lapses in maintaining these customized sections are a common source of crawling problems with search engine bots.
Optional: Add a reference to your sitemap file (if you have one)
- If you have created a Sitemap file listing the most important pages on your site, you can point the bot to it by referencing it in its own line at the end of the file.
- Example: A sitemap file is typically saved to the root directory of a site. The Sitemap: directive line would look like this:
- Sitemap: http://www.your-url.com/sitemap.xml
Check for errors by validating your robots.txt file
- Once the robots.txt file is complete, it’s a good idea to validate it. To do so, use one of the robots.txt validation tools available on the web.
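As a quick local sanity check (a sketch, not a substitute for a full validator), Python's standard-library robotparser can confirm that the rules you wrote block what you intended. The rules and URLs below are the placeholder examples used in this article:

```python
from urllib import robotparser

# The directives to check (the placeholder examples from this article).
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /scripts/
Disallow: /tmp/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A URL inside a disallowed directory should be blocked for any bot...
print(parser.can_fetch("*", "http://www.your-url.com/cgi-bin/search.cgi"))  # False
# ...while other content remains crawlable.
print(parser.can_fetch("*", "http://www.your-url.com/index.html"))          # True
```

This only checks the logic of the directives; it does not detect syntax warnings the way a dedicated validator does.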
Upload the robots.txt file to the root directory of your site so that crawlers can find it at the top level of your domain (for example, http://www.your-url.com/robots.txt).