Robots.txt File Complete Guide | What Is a robots.txt File?


Today we are going to discuss a small but quite powerful file within your website, known as the robots.txt file. Now, it’s an important file when it comes to technical SEO, and we’re going to explore what the file does, how it works, and the implications it has on your SEO.

What Is a Robots.txt File?

First of all, what is the robots.txt file and where can you find it? Well, the robots.txt file is a small file held on almost all websites that instructs Google and other crawlers on how to handle the URLs and sections of your website. You can try this now if you like: go to the base URL of pretty much any website, add a forward slash, and type in robots.txt. You'll be taken to a plain text page showing a few lines of different directives.


I’m going to break down what those things are and the implications they have on SEO.

Parts of a Robots.txt File

First of all, the robots.txt file is simply a file that instructs web crawlers, like Google's crawler that crawls your content in order to index your website, what to do when they hit certain areas of your site.

Now, the options of what to do come under two categories: "Allow", in terms of letting the crawler go to an area of the website, index it, and find the content, or "Disallow", where you don't want the crawler to find specific pages and areas of your content. By using these two options of Allow and Disallow, you can instruct crawlers like Googlebot, or any other crawlers across the web, to access or not access specific areas of your website. Technically, there are usually two parameters here.

User Agent

The first one will be the user agent. Now, the user agent is the name of the crawler. If you just want to address Google, it will be Googlebot. There is Bingbot as well, and a number of other crawler types out there for different search engines and different platforms, so you have to name the user agent you want to allow or block from areas of your site. Now, if you just put an asterisk (*) here, this character means everything; it means anyone and everyone. You don't need to define every crawler you can think of within your robots.txt file. Simply putting an asterisk in there will indicate to all crawlers whether or not they can crawl or access different areas of your website.

Disallow

The next part is where you add your Disallow line. Here you give the definition of the URL or the section of your website you don't want crawlers to visit. After the forward slash of your website, what are the pages, subfolders, or sections of your site you don't want to be crawled? By defining this in the second line of your robots.txt file, you can actually tell different crawlers not to access those areas of your website.
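Putting those two parameters together, a minimal robots.txt might look like this (the paths here are purely illustrative examples):

```
User-agent: *
Disallow: /admin/
Disallow: /checkout/
```

Anything not covered by a Disallow line is crawlable by default, so you only need to list the areas you want kept out.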

Why Use the Disallow Directive?

Why would you do this? Well, some areas might pose a security risk. You wouldn't necessarily want Google to crawl very sensitive data. You might have a platform in the background of your website, maybe a software-as-a-service product that holds a lot of secure details or information, or maybe the area just doesn't provide value to users of Google. Maybe you don't want that content indexed because it could be harmful to your rankings. One caveat here: Disallow blocks crawling, not indexing, so a disallowed URL can still appear in search results if other sites link to it; truly sensitive pages need proper access controls or a noindex directive instead. There are a number of different reasons you might want to do this, and by using the robots.txt file you can instruct all the crawlers out there to crawl or not crawl different areas of your website.

Complexities in the Robots.txt File

Most robots.txt files are pretty simple, and the majority you see will have maybe just a couple of lines covering a couple of areas of the website, but some might be a lot more complex as well. Some might actually include crawl delays. If you don't want crawlers to hit your website too quickly, you can add a Crawl-delay line telling the bots to wait a specified number of seconds between requests. Bear in mind this directive is non-standard: crawlers such as Bing and Yandex respect it, but Google ignores it.
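As a sketch, a ten-second delay for all crawlers that honour the directive would look like this:

```
User-agent: *
Crawl-delay: 10
```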

Some tips

  • If you have an XML sitemap, and you should, include a directive to the XML sitemap within your robots.txt file to tell crawlers where to find your sitemap, so they have a good understanding of your content as well.
  • Your robots.txt file should always be uploaded to your root directory. When you go to your browser, type your URL in, and then add a forward slash followed by robots.txt, all in lower case, that should give access to your robots.txt file.
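For example, the Sitemap line (using a placeholder domain here) goes on its own line, usually at the top or bottom of the file, and must be a full absolute URL:

```
Sitemap: https://example.com/sitemap.xml

User-agent: *
Disallow:
```

An empty Disallow value, as above, simply means nothing is blocked.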

Pattern Matching

Now, with bigger and more complex websites, you might want to include something called pattern matching. I won't go into detail on this because, generally speaking, a lot of websites don't need it, but it lets you instruct Google, Bing, or any other bots crawling your website, via the robots.txt file, to handle different pages based on a set of rules. Again, I won't discuss this fully in this article, but it's something to be aware of and something you can learn more about.
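As a brief sketch, the two pattern-matching characters most major crawlers support are the asterisk (*), which matches any sequence of characters, and the dollar sign ($), which matches the end of a URL; the paths below are purely illustrative:

```
User-agent: *
# Block any URL containing a query string
Disallow: /*?
# Block all PDF files
Disallow: /*.pdf$
```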

Conclusion

Now, you can obviously go ahead and test your robots.txt file as well, to make sure you're not blocking off good areas of your website that you want Google or other bots to crawl. This can be done in Google Search Console, and I've explained this in other articles across this blog, but it's really important to test this file because otherwise you might block your entire website from all crawlers without even realizing. It's more of a technical element, so it's important to get it right and make sure you have the file optimized in the right way.
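Besides Search Console, you can sanity-check a set of rules locally before uploading it. This sketch uses Python's standard urllib.robotparser module against a hypothetical rule set; the example.com URLs and the /admin/ path are assumptions for illustration only:

```python
from urllib import robotparser

# A hypothetical robots.txt, parsed from a string so no network access is needed.
rules = """
User-agent: *
Disallow: /admin/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Check whether a given user agent may fetch a given URL under these rules.
print(rp.can_fetch("*", "https://example.com/admin/login"))  # False
print(rp.can_fetch("*", "https://example.com/blog/post"))    # True
```

The same parser can also fetch a live file with set_url() and read() if you want to test the version already deployed on your site.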

If you don’t have a robots.txt file, don’t worry; they’re really easy to create, and as long as you have access to the root directory of your website, you can upload one. All you need to do is open up Notepad, write your directives, and upload the file to your root directory. That’s it. You’ll have it in place, and you’ll be able to instruct crawlers and bots what to do when they get to your website.

Let me know in the comments how you’re getting on optimizing your robots.txt file or whether you think these technical areas of SEO are more difficult to understand.
