How to Fix Blocked by Robots.txt

Introduction

Robots.txt is a file that website owners use to tell search engines which parts of their site they don’t want crawled. It helps guide crawlers such as Googlebot on where they are allowed to go. The robots.txt file is placed at the root of a website (www.your-website.com/robots.txt), and it can contain rules that block crawlers from specific pages, folders, or file types.

It is useful for keeping sensitive information private, managing server load, or controlling what gets indexed by search engines. In short, it’s a way to set boundaries for web crawlers and keep certain parts of a site hidden from search engines.

Effective management of the robots.txt file helps in SEO by preventing search engines from wasting crawl budgets on unimportant pages and ensuring that valuable content is crawled and indexed efficiently. This, in turn, can improve the website’s visibility and ranking in search engine results.

For example, in a typical robots.txt file (like the sketch below), you might see that:

  • Some URLs are not allowed to be crawled (e.g. /job/).
  • Some bots are forbidden from crawling the site at all (e.g. LinkedInBot).
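
A minimal sketch of what such a robots.txt file might look like (the /job/ path and the LinkedInBot rule are illustrative placeholders, not taken from any particular site):

  # Block all crawlers from the /job/ section
  User-agent: *
  Disallow: /job/

  # Block LinkedInBot from the entire site
  User-agent: LinkedInBot
  Disallow: /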

Resolving Blocked by Robots.txt

If your robots.txt file contains directives stopping Googlebot from accessing your site, you’ll need to remove or adjust them to allow proper crawling.

To resolve the “Indexed, though blocked by robots.txt” status in Google Search Console, follow these steps:

  1. Examine Affected URLs:

    • In Google Search Console, you can find which URLs are blocked from crawling by robots.txt. You can also view your robots.txt file directly on your website (www.your-website.com/robots.txt).
    • Decide whether the URLs in the list should be indexed: determine whether they contain content that is valuable to your visitors.
  2. Fixing the Robots.txt File:

    • If a page was accidentally disallowed in robots.txt, remove or correct the directive in your robots.txt file that is causing the blockage, as sketched below.
    • After you remove the Disallow directive, Googlebot will start crawling the page again over the following weeks.
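
For instance, if /blog/ had been disallowed by mistake, the fix is simply to delete that line (the paths here are hypothetical examples):

  Before (blocks the blog by accident):
    User-agent: *
    Disallow: /blog/
    Disallow: /wp-admin/

  After (only the admin area stays blocked):
    User-agent: *
    Disallow: /wp-admin/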

If your site is on WordPress, SEO plugins like Yoast SEO or Rank Math offer built-in tools in their dashboard for editing the robots.txt file. Remember to validate the fix in Google Search Console after editing the file to ensure changes are recognized.

  • Accidental Blocking: Remove or adjust overly broad Disallow directives that may be blocking content you want indexed.
    • Example (see the snippet after this list):
      • Before: Disallow: /folder/ blocks the entire folder.
      • After: adding Allow: /folder/page.html lets crawlers reach that specific page.
  • Intentional Blocking: Confirm the blocks align with the sections of your site you wish to keep away from crawlers. Use meta tags to exclude specific pages if necessary.
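
A hedged sketch of how such directives can be combined (the folder and page names are placeholders):

  User-agent: *
  Disallow: /folder/
  Allow: /folder/page.html

Here /folder/ remains blocked for all crawlers, while the more specific Allow rule lets them reach the single page /folder/page.html.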

Identifying the “Blocked by robots.txt” error in Google Search Console

To begin, use Google Search Console to identify which URLs are affected. The tool offers a Page Indexing (Index Coverage) report that reveals details about “Indexed, though blocked by robots.txt” warnings and crawl errors. If a URL is labeled ‘Submitted URL blocked by robots.txt’, a Disallow directive is preventing Google’s crawlers from accessing the page. Using the robots.txt Tester in the dashboard, you can highlight errors and warnings that could point to misconfigurations in the robots.txt file.

  1. Sign in to your Google Search Console and navigate to the “Pages” option under the “Indexing” section on the left sidebar
  2. This brings you to the Page Indexing Report, where you’ll find an overview of your site’s indexing status
  3. Scroll through the report to view the list of indexing issues Google has detected. Note that this list varies for each site, as different websites have different indexing issues
  4. Look for entries with the error message “Blocked by robots.txt”
  5. If you don’t see this, your website likely isn’t experiencing this problem
  6. If you do see it, address the error for every page labeled this way, except for pages you deliberately block (such as admin pages).

You can also use Google’s robots.txt tester to scan your robots.txt file for syntax warnings or other issues. Here’s how:

  1. At the bottom of the tester page, enter a specific URL to check if it’s blocked by the robots.txt file.
  2. Select a user agent from the dropdown menu to test against.
  3. Click “Test” to run the check.
  4. Alternatively, you can manually check your robots.txt file by visiting yourdomain.com/robots.txt in your browser to see its contents.
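
If you prefer to check programmatically, Python’s standard-library urllib.robotparser applies the same kind of rules a well-behaved crawler would. A small sketch (the domain and paths are placeholders):

  from urllib.robotparser import RobotFileParser

  # Download and parse the live robots.txt file
  rp = RobotFileParser()
  rp.set_url("https://www.your-website.com/robots.txt")
  rp.read()

  # Ask whether a given user agent may fetch a given URL
  print(rp.can_fetch("Googlebot", "https://www.your-website.com/job/"))   # False if /job/ is disallowed
  print(rp.can_fetch("Googlebot", "https://www.your-website.com/blog/"))  # True if /blog/ is not blocked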

Difference between robots.txt and noindex tag

Don’t confuse robots.txt with the “noindex” tag; they serve different purposes:

  • Robots.txt: The robots.txt file tells search engine crawlers NOT to visit certain pages. It is used to tell web crawlers, like Googlebot, which parts of your site should not be crawled. It doesn’t prevent pages from being indexed if they’re already known to search engines through other means, like external links.
  • Noindex: The noindex tag does NOT forbid a crawler from visiting a page, but it explicitly tells search engines not to show that page in the search results. If a page is crawled but has a “noindex” tag, it won’t appear in search engine results.

In summary, robots.txt manages crawling, while “noindex” controls indexing. To hide content from search engines, use “noindex.” To stop crawlers from accessing certain pages or directories, use robots.txt.
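
As a quick illustration, the two mechanisms live in different places (the /private/ path is a placeholder): a robots.txt rule sits in the file at your site root, while noindex is a meta tag in the page’s HTML head (or an equivalent X-Robots-Tag HTTP header).

  # robots.txt (controls crawling)
  User-agent: *
  Disallow: /private/

  <!-- In the page's <head> (controls indexing) -->
  <meta name="robots" content="noindex">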

When should I use robots.txt or noindex?

If you want to remove pages from Google Search, you should first noindex them so that Google knows these pages should be removed from its index. Once the pages have dropped out of Google’s index, you can block crawlers from ever visiting them again.

If you instead block the crawler from visiting the pages before they are deindexed, they might still show up in the search results for a long time. Crawlers can no longer visit the page, but the last time they did, the page was allowed to be indexed, and that is what the search engine remembers. You might even trigger the “Indexed, though blocked by robots.txt” warning.

  • If you want a page to be deindexed, use the noindex tag instead of robots.txt. The noindex tag ensures Google won’t show the page in search results, even if it’s crawled.
  • Make sure to update your robots.txt file to reflect the desired crawling rules.

These steps should help you resolve the “Indexed, though blocked by robots.txt” issue and ensure your site’s content is properly indexed or deindexed, depending on your needs. For more detailed instructions, refer to Google’s guide on implementing the noindex tag.
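
Putting the order of operations together, a hedged sketch for a hypothetical /old-campaign/ section could look like this:

  Step 1: keep the section crawlable and add noindex to each page
    <meta name="robots" content="noindex">
    (robots.txt: no Disallow rule for /old-campaign/ yet)

  Step 2: once the pages have dropped out of Google's index, optionally block crawling
    User-agent: *
    Disallow: /old-campaign/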

Frequently Asked Questions

Navigating the complexities of robots.txt can often lead to questions about proper configuration and troubleshooting. This section aims to provide clarity on frequently asked questions regarding the handling of robots.txt files.

Q: What does it mean to be “Blocked by Robots.txt”?

A: Being “blocked by robots.txt” means that search engine crawlers are prevented from crawling certain pages of a website due to Disallow directives in the robots.txt file.

Q: How do I troubleshoot “Indexed, though blocked by robots.txt”?

A: If pages are indexed even though they are blocked by robots.txt, make sure Google can actually reach the page and add a noindex tag to its meta tags so it drops out of the index; only reinstate the robots.txt block once the page is deindexed. For already indexed pages, you can also use Google Search Console’s “Removals” tool to request removal. Double-check your sitemap to ensure it doesn’t include pages meant to be blocked by robots.txt. If you need more help, consult a technical expert.

Q: How do I know if my site is blocked by robots.txt?

A: You can check whether your site is blocked by robots.txt by looking for the “Blocked by robots.txt” status in Google Search Console. If there is an issue, it will be highlighted in the Page Indexing (Coverage) report.

Q: How can I fix the issue of being blocked by robots.txt in Google Search Console?

A: To fix the issue of being blocked by robots.txt, you need to edit your robots.txt file so that search engine crawlers are allowed to access the pages you want to have indexed.

Q: Why is it important to fix the “Blocked by Robots.txt” error?

A: Fixing the “Blocked by Robots.txt” error is crucial for SEO as it ensures that your important pages are being indexed by search engines, leading to better visibility in search results.

Q: Can a WordPress site be affected by the “Blocked by Robots.txt” issue?

A: Yes, a WordPress site can be affected by the “Blocked by Robots.txt” issue if there are errors in the robots.txt file that prevent indexing of important pages.

Q: How does Yoast or other SEO plugins help in fixing the “Blocked by Robots.txt” issue?

A: Yoast or other SEO plugins can assist in fixing the “Blocked by Robots.txt” issue by providing tools to easily edit and update the robots.txt file within the WordPress dashboard.

Q: What is the purpose of a robots.txt file?

A: A robots.txt file is used to guide web crawlers like Googlebot on which parts of your website they should or shouldn’t access. By setting rules in this file, you can control which pages are indexed and shown in Google search results. Proper use of robots.txt can help you manage your site’s visibility to search engines and protect sensitive areas from being crawled.

Q: How do I fix the rules in my robots.txt file?

A: To fix the rules, update your robots.txt file and ensure your Disallow and Allow directives are correctly set for the user-agents you intend to manage. Correct any syntax errors, adjust the configuration, and test your changes for accuracy. If unsure, consult an SEO expert to make sure you’re using the robots.txt file properly.

Q: What happens if I misconfigure my robots.txt file?

A: Misconfiguring a robots.txt file can lead to unintended consequences. If you mistakenly block important pages, they won’t be indexed by search engines, affecting your site’s SEO. Conversely, if you forget to block sensitive pages, they might appear in search results. To avoid errors, review your robots.txt file regularly and use Google Search Console to check for crawling issues.

Q: How can I block a specific URL using robots.txt?

A: To block a specific URL using robots.txt, add a Disallow directive followed by the URL path you want to block to your robots.txt file, then confirm the change with the robots.txt tester or Google Search Console.
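
For example, to block one hypothetical page and one folder for all crawlers:

  User-agent: *
  Disallow: /private-page.html
  Disallow: /internal/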

Q: How do I create a robots.txt file in Shopify?

A: Shopify automatically generates a default robots.txt file for your store. If you need to customize it, you can do so using the “robots.txt.liquid” template in your Shopify theme. This allows you to add or change Disallow directives, User-agent rules, and other configurations.
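
As a rough sketch of what such a customization might look like, based on the structure of Shopify’s default robots.txt.liquid template (the /search-filter/ path is a placeholder; double-check Shopify’s current documentation before using it):

  {% for group in robots.default_groups %}
    {{- group.user_agent }}
    {%- for rule in group.rules -%}
      {{ rule }}
    {%- endfor -%}

    {%- comment -%} Custom rule added only for the wildcard group {%- endcomment -%}
    {%- if group.user_agent.value == '*' -%}
      {{ 'Disallow: /search-filter/' }}
    {%- endif -%}

    {%- if group.sitemap != blank -%}
      {{ group.sitemap }}
    {%- endif -%}
  {% endfor %}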

Q: What are common robots.txt errors that can lead to being blocked?

A: Common robots.txt mistakes that lead to pages being blocked include incorrect directives, syntax errors, and unintentionally blocking all search engine crawlers.
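
The classic example of unintentional blocking is a leftover rule from a staging or development environment that blocks everything:

  User-agent: *
  Disallow: /

A single slash after Disallow blocks the entire site for every crawler; an empty Disallow: line (nothing after the colon) allows everything.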

Having website indexing issues?

Check out our blogs on the most common indexing issues and how to fix them. Fix your page indexing issues

Looking for an SEO Consultant?

Find the best SEO Consultant in Singapore (and worldwide). Best SEO Consultant

Is this you?

💸 You have been spending thousands of dollars on buying backlinks over the last few months. Your rankings are only growing slowly.

❌ You have been writing more and more blog posts, but traffic is not really growing.

😱 You are stuck. Something is wrong with your website, but you don’t know what.

Let the SEO Copilot give you the clicks you deserve.