What are page indexing issues

Author
Stephan

Introduction

Page indexing issues occur when search engine crawlers have difficulty crawling a website’s pages or cannot access them. With the recent rise in AI content generation, website publishers who generate AI articles in bulk (hundreds of articles per day or week) also face indexing issues. In the latter case, the underlying problem is usually low-quality (thin) content. If Google already has plenty of quality answers for a given search query, why would it show your article?

Indexing issues can lead to web pages not being indexed and consequently not appearing in Google’s search engine results pages (SERPs). A page that is not indexed will not generate any organic traffic or engage potential customers.

Not all indexing issues are problematic, though. We have created an overview of the most common statuses. Some of those shown below are not an issue at all, and others are only mildly harmful.

A brief summary of the results of the index coverage report:

Crawled – currently not indexed: Not a systematic problem. The page’s quality is too poor to show up in the search results. This is not a structural technical issue for your whole website. If you want it to appear in the search results, you need to improve the content.

Discovered – currently not indexed: Google is aware that the page exists but has not crawled it yet. This often resolves itself over time; if it persists, improve the page’s content and internal linking so that Google prioritises crawling it.

Alternate Page with proper canonical tag: Not a problem. Likely, this is even a good thing. Google has chosen the right version among all canonicalized versions.

Duplicate without user-selected canonical: This is not a serious issue, but you should declare the canonical. If you do not do this, you run the risk that Google chooses the “wrong” version of the page and users see the “wrong” version.

Blocked by robots.txt: If you did this on purpose (check your robots.txt): this is not a problem and does not require your attention (unless there are hundreds of pages affected).

Blocked due to access forbidden (403): This happens often for protected pages (e.g. pages only logged-in members can see). Not a huge problem, but you can consider removing the pages from the sitemap and adding a noindex tag.

Excluded by ‘noindex’ tag: If you did this on purpose, this is not a problem and does not require your attention.

Soft 404: Soft 404 means that Google crawled the page, received a 200 OK status, but found text along the lines of ‘This page does not exist’. If this happens often, you should fix the error, but it is not critical.

Not found (404): This is a problem and should be fixed. If you have ten 404 errors and 100 pages in total, this is a serious issue. If you have 10,000 pages and five 404 errors exist, you do not need to worry as much.

In the following, we will discuss all possible issues and their severity in detail.

Crawled – currently not indexed

This means that Google has crawled the page but chooses not to show it in the search results.

Why would Google do that? Often the page is very low quality.

Low quality means that the page has very few words and does not bring any value to a potential user. This is also a common occurrence with large (e-commerce) stores that have very low authority (DR<20) and thousands of low-quality pages. Google will not show them in the search results.

This can also happen if other people have written great articles and your article is not good enough.  If there are 100 good articles on a topic and your article is missing a lot of the main points, why would a search engine show your article?

Orphan pages often face these issues. An orphan page is a page no other page on your website links to. Search engines can find it through the sitemap but as long as no other page on your domain links to it, it’s likely not considered to be very important.

What can you do?

1.) Make the page better (=add more value for the user). Add more text and internal links to the page.

2.) If this only affects a few pages (~10), you can submit the pages one by one to Google Search Console. Should you need to index a large number of pages, you can use the SEO Copilot.

Verdict: Not a systematic problem. The page’s quality is too poor to show up in the search results. This is not a structural technical issue for your whole website.

Alternate Page with proper canonical tag

The page was not indexed because a different version of this page was selected as the main version. For example, some e-commerce websites have a main and a mobile version of a page. Canonicals are used to avoid duplicate content and select the main version among many versions.

www.myecommercestore.com/phones/new-phone (self-canonical)

www.myecommercestore.com/phones/mobile/new-phone (canonicalised to first)
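In HTML, this relationship is declared with a canonical link tag in the `<head>` of each variant. A minimal sketch for the example above (the URLs are illustrative):

```html
<!-- Placed on both the main page and the mobile variant,
     pointing to the main version -->
<link rel="canonical" href="https://www.myecommercestore.com/phones/new-phone">
```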

Verdict: Not a problem. Likely, this is even a good thing. Google has chosen the right version among all canonicalised versions.

Duplicate without user-selected canonical

Google detected duplicate pages that do not have canonical tags pointing the search engine to the “main” version of the page.

Verdict: This is not a serious issue, but you should declare the canonical. If you do not do this, you run the risk that Google chooses the “wrong” version of the page and users see the “wrong” version.

Blocked by robots.txt

Google tried to crawl this page, but it was excluded by the robots.txt. Presumably, you modified the robots.txt yourself and blocked the crawler from accessing the given pages. It is not surprising that the page is not indexed, because it was blocked by the robots.txt.
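A minimal robots.txt sketch that blocks one directory while leaving the rest of the site crawlable (the domain and paths are illustrative):

```
# robots.txt — must live at the root of the domain
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
```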

Verdict: This is not a problem and does not require your attention (unless there are hundreds of pages affected).

Blocked due to access forbidden (403)

This error typically happens when the search engine crawler tries to access protected pages (like the login area) of a website.

Verdict: Not a very serious problem, but you should consider removing these pages from your sitemap and adding a noindex tag.

Excluded by ‘noindex’ tag

Google tried to crawl this page, but a noindex tag was found. Presumably, you set the noindex tag yourself. Noindex tags indicate to the search engine that a given page should not be shown in the search results. It is not surprising that the page is not indexed, because it has the noindex tag.
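A noindex directive is typically set as a meta tag in the page’s `<head>`:

```html
<!-- Tells search engines not to index this page -->
<meta name="robots" content="noindex">
```

For non-HTML resources, the same effect can be achieved with an `X-Robots-Tag: noindex` HTTP response header. Note that the crawler must be able to fetch the page to see the tag, so a noindexed page should not also be blocked by robots.txt.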

Verdict: This is not a problem and does not require your attention.

Soft 404

The page returned a 200 OK status, but its content reads like an error page (“Page not found”). While hard 404 errors return a 404 status code for a missing page, soft 404s occur when a non-existent or deleted page returns a 200 OK status instead. Both of these issues can lead to indexing problems, wasting crawl budget and decreasing the chances of other valuable pages being indexed.

Verdict: This is an error you should fix, but it is not critical.

Not found (404)

The page could not be loaded because it was not found. This is typically a serious issue. While crawling, the search engine crawler ran into a broken page. You should fix this issue. It might be that the search engine crawler cannot crawl certain pages of your website, and in turn you will lose organic traffic.

The root cause can be errors on your website or pages that do not exist anymore. You can either fix the error on the page or simply redirect the broken page to another page.
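On an Apache server, for example, the redirect option can be sketched with a one-line rule in the site’s .htaccess (the paths are illustrative):

```
# Permanently (301) redirect the deleted page to its closest replacement
Redirect 301 /old-product /products/new-product
```

A 301 signals to search engines that the move is permanent, so ranking signals are consolidated on the target page.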

Verdict: This is a problem and should be fixed. If you have ten 404 errors and 100 pages in total, this is a serious issue. If you have 10,000 pages and five 404 errors exist, you do not need to worry as much.

Common Page Indexing Issues

Page indexing is essential for a website’s visibility in search results. However, various issues can hinder the indexing process, affecting a website’s online presence. This section discusses some of the most common page indexing issues and how to address them.

One common indexing issue is when a site’s URLs are blocked by the robots.txt file. This file tells search engine crawlers which parts of a website to access or avoid. When a page is mistakenly blocked by robots.txt, it prevents crawlers like Googlebot from accessing that page, leading to indexing issues. Website owners should thoroughly review their robots.txt file to ensure that no essential pages are blocked.

Sitemap problems can also cause indexing issues. Google Search Console’s reports help identify and rectify them. Making sure that the sitemap is submitted and regularly updated ensures that Google’s crawlers can access and index important pages without any hiccups.

Duplicate content poses another challenge in the indexing process. Search engines find it difficult to determine which of the duplicate versions to index and rank. Therefore, website owners should remove or merge duplicate content to avoid confusion and improve the site’s indexing.

Crawlers have a limited crawl budget, which refers to the number of pages a search engine crawler can access and index in a particular timeframe. To optimize your site’s crawl budget, focus on fixing broken links, avoiding 404 errors, and prioritizing important content. Otherwise, the crawler may waste valuable time on irrelevant pages, resulting in indexing issues.

404 errors and Soft 404s can impact indexation. While 404 errors indicate that a page doesn’t exist, Soft 404s occur when a non-existent or deleted page returns a 200 OK status code instead of a 404. Both of these issues can lead to indexing problems, wasting crawl budget and decreasing the chances of other valuable pages being indexed. Identifying and fixing these errors helps improve indexing.

It’s essential to ensure that pages don’t include the noindex tag unintentionally. This meta tag instructs search engine crawlers not to index a page. Accidentally including this tag on important pages can lead to indexing issues.

Server errors (5xx) like 500 Internal Server Error or 503 Service Unavailable can also prevent crawlers from accessing and indexing your pages. Diligently monitor your website’s server logs and fix any errors that could impact indexing.

Finally, access or authorization issues such as Access Forbidden (403) or Unauthorized Request (401) could lead to indexing problems when crawlers can’t access specific pages. Rectifying these issues by adjusting restricted content access helps improve indexing.

In summary, addressing issues like blocked URLs in robots.txt, sitemap errors, duplicate content, crawl budget optimization, fixing 404s and Soft 404s, removing noindex tags, resolving server errors, and correcting access restrictions can all contribute to a better indexing experience. By staying vigilant and proactive in solving these common problems, website owners can significantly improve their search engine presence.

The Importance of Quality Content

Quality content is essential in today’s digital landscape, as it plays a significant role in attracting traffic from Google and enhancing user engagement. When a website consistently provides informative, relevant, and well-written content, it not only improves its chances of being indexed by search engines but also creates a positive user experience, leading to higher conversions.

One of the key ranking factors in Google’s search algorithm is the quality of content present on a website. High-quality content is more likely to be indexed and ranked higher in search results, which in turn, drives organic traffic to the site. This underscores the need for website owners to invest time and resources in developing valuable content that serves the needs of their target audience.

In addition to drawing traffic, quality content plays a crucial role in maximizing conversions. Engaging content captures the attention of visitors, encouraging them to explore the website further and ultimately become loyal customers or clients. A well-structured website with clear and concise information demonstrates expertise in the subject matter and builds trust among users, increasing the likelihood of conversion.

To produce quality content, focus on addressing the needs and concerns of your target audience, presenting information in a clear and easily digestible manner, and avoiding misleading or false claims. This approach will not only improve your website’s chances of being indexed but also contribute to a positive user experience, ultimately driving success in terms of both traffic and conversions.

Identifying and Fixing 404 Errors

One common page indexing issue is the occurrence of 404 errors. These errors happen when a user tries to access a web page that doesn’t exist on the server. There are two main types of 404 errors: hard 404s and soft 404s.

Hard 404 errors occur when a non-existent page returns a “not found” message to users and a 404 status to search engines. On the other hand, soft 404 errors happen when a non-existent page displays a “not found” message to users but returns a 200 OK status to search engines [1].

To identify and fix 404 errors, you can use Google Search Console. This tool allows you to see which pages of your website have indexing issues, including 404 errors [2]. First, verify your website with Google Search Console to gain access to the page indexing report. Here, you can check for any hard or soft 404 errors affecting your site.

When you find 404 errors using the console, it’s essential to address them to prevent negative impacts on user experience and search engine ranking. For hard 404 errors, you can either create the missing page, set up a redirect to a relevant existing page, or remove any links pointing to the non-existent page.

Fixing soft 404 errors usually involves ensuring that the server returns the correct 404 status code to search engines when a page is not found. This can be achieved by checking the server settings, removing incorrect redirects, or updating the headers sent by the server.
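As a rough illustration, the distinction between hard and soft 404s can be sketched as a small classifier over the status code and page body. The “not found” phrases below are assumptions for the sketch; real search engine detection is far more sophisticated:

```python
def classify_response(status_code: int, body: str) -> str:
    """Roughly classify a crawled response as 'ok', 'hard 404', or 'soft 404'."""
    not_found_phrases = ("page not found", "does not exist", "no longer available")
    looks_missing = any(phrase in body.lower() for phrase in not_found_phrases)
    if status_code == 404:
        return "hard 404"   # correct status code for a missing page
    if status_code == 200 and looks_missing:
        return "soft 404"   # error content served with a 200 OK status
    return "ok"

print(classify_response(404, "Page not found"))          # hard 404
print(classify_response(200, "Oops, page not found!"))   # soft 404
print(classify_response(200, "Welcome to our store"))    # ok
```

The fix for a soft 404 is to make the first branch apply: configure the server so that missing pages actually return a 404 (or 410) status code.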

In conclusion, identifying and fixing 404 errors is crucial for maintaining a well-optimized website and ensuring a positive user experience. By using Google Search Console, you can efficiently detect and address hard and soft 404 errors, improving your site’s page indexing and overall performance.

Footnotes

  1. https://neilpatel.com/blog/fix-404-errors/ 
  2. https://support.google.com/webmasters/answer/7440203?hl=en 

The Role of Robots.txt in Page Indexing

Robots.txt is a crucial element in controlling the indexing of your website by search engine crawlers. A robots.txt file tells crawlers which URLs on your site they can access, and it helps prevent overloading your site with requests. However, improper use of robots.txt can lead to indexing issues, so it’s important to understand how it functions and how to manage it properly.

One of the common problems involving robots.txt is when a URL is inadvertently blocked by robots.txt. Sometimes, a page may get mistakenly blocked due to misconfiguration of the robots.txt file. To investigate this issue, you can use the Google Search Console to check the status of a page’s indexing. If it says “Blocked by robots.txt,” you’ll need to fix the issue to ensure proper indexing of your content.

To properly manage the robots.txt file, it’s important to be familiar with its structure and follow certain guidelines. A robots.txt file should be a text file named “robots.txt” (case sensitive) and located in the top-level directory of your canonical domain. Additionally, it must include the specific directives for crawlers like User-agent, Allow, and Disallow.

In the context of page indexing, a common reason for pages being blocked by robots.txt is the presence of a “Disallow” directive for that URL. To resolve this, you can either remove the Disallow rule or modify it to allow the crawler access to the specific URL. Always ensure that your robots.txt file is properly configured to avoid accidentally blocking essential content from being indexed by search engines.
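You can check the effect of a Disallow rule locally with Python’s standard-library robots.txt parser; the rules and URLs here are illustrative:

```python
from urllib.robotparser import RobotFileParser

# Parse an in-memory robots.txt with a single Disallow rule
parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# The blocked directory is disallowed; everything else stays crawlable
print(parser.can_fetch("*", "https://www.example.com/private/report"))  # False
print(parser.can_fetch("*", "https://www.example.com/blog/post"))       # True
```

Running your own rules through such a check before deploying helps catch a Disallow pattern that accidentally matches important URLs.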

In summary, robots.txt plays a significant role in managing the indexing of your website’s pages. By carefully configuring the file and monitoring its directives, you can avoid accidentally blocking URLs from search engine crawlers and help ensure a more efficient and targeted indexing process.

Overcoming Redirect and Server Errors

When dealing with page indexing issues, it’s crucial to address redirect errors and server errors (5xx) as they can impede Google’s ability to crawl and index your website. By addressing these issues, you’ll ensure that your site’s content is properly indexed and becomes more accessible to users.

Redirect errors occur when a user or search engine crawler tries to access a URL but the redirect does not resolve cleanly. Common causes include redirect loops, redirect chains that are too long, and redirects pointing to broken or invalid URLs. Ensuring your redirects are set up correctly and resolve to the desired target URL can help you avoid this type of indexing problem. If you’re unsure about your redirects’ configuration or notice errors in Google Search Console, consider using tools that can help you identify and fix redirect-related issues.

Server errors, typically classified as 5xx errors, occur when a server fails to process a request, which can be a significant hindrance to page indexing. Addressing server errors usually involves checking your website’s server and fixing the underlying issues causing the errors. You can monitor server errors by leveraging the Page Indexing report in Google Search Console, which can help you spot these issues early on and act promptly to resolve them.

In some cases, server errors might be related to your website’s hosting service or infrastructure. Engaging with your hosting provider or web developer to address the underlying technical issues can be an efficient way to resolve server errors and improve your website’s indexing status.

By following these guidelines to address redirect errors and server errors, you’ll be well on your way to overcoming common page indexing challenges and improving your website’s visibility on search engines.

Is this you?

💸 You have been spending thousands of dollars on buying backlinks in recent months. Your rankings are only growing slowly.


❌ You have been writing more and more blog posts, but traffic is not really growing.


😱 You are stuck. Something is wrong with your website, but you don’t know what.



Let the SEO Copilot give you the clicks you deserve.