robots.txt File

robots.txt is just a text file

The robots.txt file is just a plain text document: you can create it with any text editor (such as Windows Notepad) and name it robots with the .txt file extension. However, once you place rules inside this document and upload it to your web server's root folder, you can better control how search engine bots such as Googlebot crawl your site.

Important to Note About Using robots.txt File

Most website owners will NOT need to use this file, nor waste time learning how to Disallow or Allow certain sections of their website, because robots.txt directives are not for controlling which URLs Google indexes. The only real use cases for robots.txt rules are:

  1. If you want to prevent certain parts of a website from being crawled by user-agents such as Googlebot
  2. If Googlebot is requesting too many URLs and causing server performance issues (see the sketch after this list)
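For the second use case, here is a minimal sketch that tells all rule-abiding crawlers to skip a server-heavy section. The /internal-search/ path is a hypothetical placeholder; substitute whichever section of your own site is causing the load:

    User-agent: *
    Disallow: /internal-search/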

Stop Wasting Time Trying to Create Robots.txt File for SEO

I know there are many so-called SEO experts out there confusing business website owners about using the robots.txt file for search engine optimization purposes. Do not use it for SEO. Simple.

Instead, focus on creating useful, engaging, and optimized content for your ideal customers, because that will bring better results for your business website. Once again, you would use the robots.txt file NOT for SEO, but rather to either control Google's crawling so that it doesn't burden a web server with thousands of URL requests, or to respond when Google Search Console complains about "Submitted URL blocked by robots.txt".

How to Create & Use a robots.txt File (Video Tutorial)

Robot Exclusion Protocol Definition

The real definition is that it's a standard rather than a formal protocol. The robots exclusion standard, also known as the Robots Exclusion Protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. Here's a simple usage:

    User-agent: *
    Disallow: /pathtofoldertostopcrawleraccess/
    Disallow: /pathtofiletostopcrawleraccess.html

What is a User-Agent

Besides a browser, a user agent could be a bot crawling webpages (such as Googlebot), a website download manager, or another application accessing the Web. When we use the asterisk symbol * after User-agent: we address ALL crawlers that obey the robots exclusion standard rules (because some may not) within the robots.txt file.
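For instance, this sketch (with hypothetical paths) defines two groups: the first applies only to Googlebot, while the catch-all * group covers every other rule-abiding crawler:

    User-agent: Googlebot
    Disallow: /drafts/

    User-agent: *
    Disallow: /drafts/
    Disallow: /staging/

Note that a crawler follows only the most specific group that matches it, so Googlebot here would obey the first group and ignore the * group; that is why /drafts/ is repeated in both.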

What is robots.txt Disallow

Anytime you use Disallow: /path-to-URL you are telling user-agents NOT to access (as in crawl) that URL. You can have many different Disallow rules.
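A group can stack as many Disallow lines as needed. The paths below are hypothetical, shown only to illustrate the layout:

    User-agent: *
    Disallow: /checkout/
    Disallow: /tmp/
    Disallow: /old-page.html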

What is robots.txt Allow

Anytime you use Allow: /path-to-URL-to-Allow-Crawl-Access you are granting user-agents access (as in permission to crawl) to that URL. You can have many different Allow rules. How can this be useful?

    User-agent: *
    Disallow: /entirefoldertoNOTaccess/*
    Allow: /entirefoldertoNOTaccess/butgrantaccesstofile.js

You would only use the Allow directive when certain parts of a site are disallowed, and yet you need to grant crawl access to some individual files within those disallowed parts.

Where is the robots.txt File Located?

For most self-installed websites, the file's location is the web server root (home) directory (usually within public_html). That means that even if a website is installed inside a subdomain, you do not place this file in subdomain folders or other subfolders; it has to reside in the root (home) directory.
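As an illustration (using example.com as a stand-in for your own domain), crawlers only ever request the file directly under the host name:

    https://www.example.com/robots.txt         <-- crawlers request this
    https://www.example.com/blog/robots.txt    <-- never checked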

I Can’t Find My robots.txt File on My Web Server

Keep in mind that some content management systems (e.g. Blogger, Wix) generate this file automatically (as in virtually), which means you won't find an actual file. Furthermore, some plugins may also generate this file automatically, so your web server's home directory may not even contain a file called robots.txt. You can still view the generated output by loading yourdomain.com/robots.txt in a browser.

How to Set Up the robots.txt File in Blogger

How to Disallow Subdomains

Because the robots.txt file must reside in the root (home) directory, and because web crawlers don't check for robots.txt files in subdirectories, you should NOT create a robots.txt file in the subdomain's folder. Instead, create one (SEE the first video tutorial) in the root (home) directory of your web hosting server, and then use this format to disallow a subdomain's folder:

    User-agent: *
    Disallow: /subdomainfoldername/*

Notice the * symbol, which takes care of everything within that folder. Also remember that on many shared hosts a subdomain's files live in nothing more than a subdirectory of the main site, which is why this folder-based rule works; note, though, that crawlers visiting the subdomain's own hostname will request robots.txt at that subdomain's root.

robots.txt file WordPress Example

Below is a sample you can safely use for most WordPress-built sites:

    User-agent: Googlebot
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache*
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache*
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*

robots.txt file WordPress Example with Sitemap URL

    User-agent: Googlebot
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache*
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache*
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*

    #DELETE This Line. Remove #hashtags below
    #Sitemap: https://CHANGE/page-sitemap.xml
    #Sitemap: https://CHANGE/post-sitemap.xml

More Advanced Sample robots.txt file for WordPress

Do NOT use the sample below; it's provided only to demonstrate complex usage:

    User-agent: Googlebot
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache/*
    Disallow: /wp-includes/*.php$
    Disallow: /wp-includes/blocks/*.php$
    Disallow: /wp-includes/customize/*.php$
    Disallow: /wp-includes/rest-api/*.php$
    Disallow: /wp-includes/rest-api/endpoints/*.php$
    Disallow: /wp-includes/Requests/*.php$
    Disallow: /wp-includes/Requests/*/*.php$
    Disallow: /wp-includes/SimplePie/*.php$
    Disallow: /wp-includes/SimplePie/*/*.php$
    Disallow: /wp-includes/SimplePie/*/*/*.php$
    Disallow: /wp-includes/widgets/*.php$
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*
    Allow: /wp-content/themes/rankya/*.css
    Allow: /wp-content/themes/rankya/*/*.css
    Allow: /wp-content/themes/rankya/js/*
    Allow: /wp-content/themes/rankya/*/*/front*
    Allow: /wp-includes/js/wp-em*.js
    Allow: /wp-includes/js/*-reply.min.js
    Allow: /wp-includes/js/jq*/*.js
    Allow: /wp-content/uploads/*

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /wp-admin/$
    Disallow: /author/admin/
    Disallow: /wp-content/cache/*
    Disallow: */trackback/$
    Disallow: /comments/feed*
    Disallow: /wp-login.php?*
    Disallow: *?comments=
    Disallow: *?replytocom
    Disallow: *?s=
    Allow: /*.js*
    Allow: /*.css*
    Allow: /wp-admin/admin-ajax.php
    Allow: /wp-admin/admin-ajax.php?action=*
    Disallow: /2016/
    Disallow: /2017/
    Allow: /wp-content/themes/rankya/*.css
    Allow: /wp-content/themes/rankya/*/*.css
    Allow: /wp-content/themes/rankya/js/*
    Allow: /wp-content/themes/rankya/*/*/front*
    Allow: /wp-includes/js/wp-em*.js
    Allow: /wp-includes/js/*-reply.min.js
    Allow: /wp-includes/js/jq*/*.js
    Allow: /wp-content/uploads/*

    User-agent: Googlebot-image
    Disallow:

    Sitemap: https://www.example.com/page-sitemap.xml
    Sitemap: https://www.example.com/post-sitemap.xml

Learn More About How Google Treats Robots.txt Rules

When you want to control how Google's website crawlers crawl publicly accessible websites, understand that you only use the robots.txt file to control crawling, as opposed to Google's indexing. Google, like most well-behaved web crawlers, obeys the rules within the robots.txt file.
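To make that crawling-versus-indexing distinction concrete, here is a minimal sketch (the /private-folder/ path is hypothetical). Blocking the path stops rule-abiding bots from fetching those pages, but it does not remove the URLs from Google's index:

    # Blocks crawling only; a blocked URL can still appear in Google's
    # index (without a description) if other pages link to it.
    User-agent: *
    Disallow: /private-folder/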

How to Test robots.txt Rules in Google Search Console

Simply visit the Search Console robots.txt Tester tool and select your verified website property. It will then take you to the legacy tools area for testing your robots.txt file directives.

