What is robots.txt?

robots.txt

robots.txt is the file on a website that bots (short for robots; also called web crawlers) access first. It must be located in the root directory of the website. Webmasters use it to give web crawlers instructions on how they should handle the site.
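Because the file always sits in the root directory, a crawler can derive its address from any page URL on the same site. The following is a minimal sketch of that step; the function name robots_txt_url and the example.com address are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that serves page_url.

    Hypothetical helper: keeps scheme and host, replaces the path with
    /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Any page on the site maps to the same file in the root directory.
print(robots_txt_url("https://example.com/blog/post.html?x=1"))
# https://example.com/robots.txt
```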

For example, robots.txt lets webmasters exclude the entire website or specific parts of it from the search engine index, or ask crawlers not to follow the site's links to other domains.

The requests contained in robots.txt are purely advisory and in no way binding: although the major search engines, Google, Microsoft and Yahoo in particular, have committed to complying with them, so-called "bad" web crawlers can simply ignore the instructions.

Possible instructions (with examples) in robots.txt are:

  • User-agent:
    Addresses the instructions that follow to an individual crawler (Sidewinder) or to all crawlers (*)
  • Disallow:
    Excludes the entire website (/) or certain parts of it (/Temp/ or /*.pdf$) from crawling
  • Allow:
    Explicitly permits crawling of certain parts within a section of the website that was excluded via Disallow
  • Crawl-delay:
    Limits the crawling speed by setting a wait time between requests, in seconds (120)
  • Sitemap:
    URL of the sitemap of a website (http://www.OnlineMarketing.de/Sitemap.xml)
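How a cooperating crawler might evaluate these directives can be sketched with Python's standard-library module urllib.robotparser. The rule set, the crawler name ExampleBot, and the host example.com are assumptions chosen for illustration. Note that this particular parser applies the first matching rule in file order and does not interpret wildcard patterns such as /*.pdf$, so the Allow line is placed before the Disallow line it overrides; other crawlers may resolve such conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set built from the directives listed above.
# Allow precedes Disallow because this parser uses the first matching rule.
ROBOTS_TXT = """\
User-agent: Sidewinder
Disallow: /

User-agent: *
Crawl-delay: 120
Allow: /Temp/PermanentTemp
Disallow: /Temp/

Sitemap: http://www.OnlineMarketing.de/Sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://example.com"  # assumed host, for illustration only

# Sidewinder is excluded from the whole site.
print(parser.can_fetch("Sidewinder", BASE + "/index.html"))  # False

# Every other crawler falls under the "*" group.
print(parser.can_fetch("ExampleBot", BASE + "/index.html"))  # True
print(parser.can_fetch("ExampleBot", BASE + "/Temp/file.html"))  # False
print(parser.can_fetch("ExampleBot", BASE + "/Temp/PermanentTemp/a.html"))  # True

# Crawl-delay in seconds, and the sitemap URLs (site_maps needs Python 3.8+).
print(parser.crawl_delay("ExampleBot"))  # 120
print(parser.site_maps())  # ['http://www.OnlineMarketing.de/Sitemap.xml']
```

Nothing in this check is enforced by the website: the crawler decides for itself whether to call can_fetch before requesting a page, which is why the instructions only bind crawlers that choose to respect them.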

Example excerpt from a robots.txt file:

    # robots.txt for example.com

    # Bots to be excluded
    User-agent: Sidewinder
    Disallow: /

    User-agent: Microsoft.URL.Control
    Disallow: /

    # Directories or files that should generally not be crawled
    User-agent: *
    Disallow: /default.html
    Disallow: /Temp/
    Disallow: /Privat/Geburtstage.html

    # Exception to the directories or files that should not be crawled
    User-agent: *
    Allow: /Temp/PermanentTemp