What is a robots.txt file and a robots.txt checker?
The robots.txt file is a simple text file that webmasters create to tell Googlebot and other search engine crawlers which areas of a domain may be crawled and which may not. A reference to the site’s XML sitemap can also be included in the robots.txt file. Robots.txt is part of the robots exclusion protocol (REP), a group of web standards that regulate how robots crawl the web, access and index content, and serve that content up to users. The REP also includes directives such as meta robots, as well as page-, subdirectory-, or site-wide instructions for how search engines should treat links (such as “follow” or “nofollow”). In practice, a robots.txt file indicates whether certain user agents (web-crawling software) can or cannot crawl parts of a website; these crawl instructions are specified by “disallowing” or “allowing” the behavior of certain (or all) user agents.
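For example, a minimal robots.txt file that allows every crawler to access everything and points crawlers to the XML sitemap could look like this (the domain and sitemap location are placeholders, not values from this guide):
User-agent: *
Disallow:

Sitemap: https://www.yoursite.com/sitemap.xml
An empty Disallow line means nothing is blocked, and the Sitemap line uses a full URL and may appear anywhere in the file.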
What is it?
Check if your website is using a robots.txt file. When search engine robots crawl a website, they typically first access the site’s robots.txt file, which tells Googlebot and other crawlers what is and is not allowed to be crawled on your site. To pass this test, you must create and properly install a robots.txt file. You can use any program that produces a plain text file, or you can use an online tool (Google Search Console, formerly Google Webmaster Tools, offers this feature).
Remember to use all lower case for the filename: robots.txt, not ROBOTS.TXT.
A simple robots.txt file looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /pages/thankyou.html
This would block all search engine robots from visiting the “cgi-bin” and “images” directories and the page “http://www.yoursite.com/pages/thankyou.html”.
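If you want to re-open a specific path inside a blocked directory, many major crawlers (including Googlebot and Bingbot) also honour an Allow line, although not every robot supports it. A small sketch, using placeholder paths:
User-agent: *
Disallow: /images/
Allow: /images/logo.png
Here everything under /images/ is blocked except the one file that is explicitly allowed.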
TIPS:
If you are creating your own robots.txt file, note that although the wildcard (*) is used in the User-agent line (meaning “any robot”), the original standard does not allow it in the Disallow line; some major crawlers such as Googlebot and Bingbot do support limited wildcard matching in Disallow paths, but not all robots do.
Do not put blank lines inside a record, because blank lines are used to delimit multiple records.
Once you have your robots.txt file, upload it to the top-level (root) directory of your web server. After that, make sure you set the permissions on the file so that visitors (such as search engines) can read it.
You need a separate Disallow line for every URL prefix you want to exclude.
Notice that before the Disallow lines comes the line User-agent: *. The User-agent: part specifies which robot the rules apply to. Major known crawlers are: Googlebot (Google), Googlebot-Image (Google Image Search), Baiduspider (Baidu) and Bingbot (Bing).
Regular expressions are not supported in either the User-agent or Disallow lines.
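For instance, a robots.txt file that applies different rules to different crawlers might look like this; note the blank line separating the two records and the separate Disallow line for each URL prefix (the directory names are placeholders only):
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/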
robots.txt Tools
1. www.tools.seobook.com/robots-txt/analyzer/
2. www.technicalseo.com/tools/robots-txt/
3. www.seositecheckup.com
4. www.sitechecker.pro/robots-tester/
5. www.en.ryte.com/free-tools/robots-txt/
6. www.screamingfrog.co.uk/robots-txt-tester/
7. www.websiteplanet.com/webtools/robots-txt/