Symptom

Pages deindexed in bulk — diagnose the indexation collapse

A sudden mass move of URLs from "Indexed" to "Crawled - currently not indexed" or "Discovered - currently not indexed" in Search Console.

What you see in Search Console

The loss shows up as a stair-step drop in the Page indexing report in Google Search Console: typically around 30% of indexed pages in a week, sometimes as much as 80% in a single day. The corresponding rise lands in two buckets: "Crawled - currently not indexed" (Google fetched the page and decided it wasn't worth keeping) and "Discovered - currently not indexed" (Google knows the URL, usually from your sitemap or internal links, but has not crawled it). Traffic does not always drop one-for-one, because the deindexed URLs were often long-tail pages that contributed impressions but few clicks. The leading indicator most operators miss is the Discovered bucket growing first; deindexation follows 2 to 3 weeks later, when Google re-evaluates what it already has. Many bulk-deindexation events trace back to the scaled-content-abuse policy announced on March 5, 2024, or to the site-reputation-abuse policy enforced from early May 2024, hitting one specific template family.

Likely causes

Quality threshold tripped on a programmatic template
Google appears to maintain a quality budget per host. Once a template crosses an internal threshold for thinness or duplication, Google stops indexing new URLs from that pattern and gradually drops existing ones. This is the single most common cause of bulk deindexation on programmatic sites, and it is invisible in the aggregate Page indexing report (formerly Coverage): you have to segment by URL pattern.
Canonical conflicts pointing to a single URL
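If a template renders rel=canonical pointing to the homepage or a category root rather than to the page itself, Google de-duplicates and keeps only the canonical target. The pages disappear from the index and show up as "Alternate page with proper canonical tag" or, when Google overrides the declared target, as "Duplicate, Google chose different canonical than user." URL Inspection shows both the user-declared and the Google-selected canonical for any single URL. A single bad template change can collapse thousands of URLs in one crawl cycle.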
Soft 404 detection on near-empty pages
Pages that return HTTP 200 but render an empty state — "No results found," "Coming soon," or a category with zero items — get classified as soft 404 and removed from the index. This often happens after a database migration drops content rows or a feature flag hides content paths.
Robots noindex accidentally shipped at template level
A staging-only noindex directive (a meta robots tag or an X-Robots-Tag header) that survives a deploy is the fastest way to deindex a site. Combined with a CDN edge cache, the noindex can persist on cached responses for hours after the underlying source is fixed. Always verify the live response, not just the source code.
Sitemap declaring URLs Google has decided not to crawl
If your sitemap lists URLs that consistently come back as "Discovered - currently not indexed," Google trusts your sitemap less and crawls fewer URLs from it next cycle. This is a slow-burn cause: each weekly sitemap submission shrinks the indexed footprint until you stop declaring URLs Google has signaled it doesn't want.
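
The three technical causes above (noindex, a canonical pointing elsewhere, and soft 404 empty states) can be screened for on a single fetched response. The following is a minimal sketch, not a replacement for URL Inspection. The function name diagnose_page, the finding labels, and the EMPTY_STATE_PHRASES list are assumptions made for illustration; adapt the phrases to whatever your templates actually render.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical empty-state phrases; replace with the strings your templates emit.
EMPTY_STATE_PHRASES = ("no results found", "coming soon")


class _HeadSignals(HTMLParser):
    """Collects meta robots directives and the first rel=canonical href."""

    def __init__(self):
        super().__init__()
        self.robots = []
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("robots", "googlebot"):
            self.robots.append(a.get("content", "").lower())
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href", "").strip()


def _normalize(url):
    """Compare URLs by scheme, host, path, and query, ignoring a trailing slash."""
    p = urlsplit(url)
    return (p.scheme.lower(), p.netloc.lower(), p.path.rstrip("/"), p.query)


def diagnose_page(url, status, headers, html):
    """Return the cause labels that apply to one fetched response."""
    findings = []
    parser = _HeadSignals()
    parser.feed(html)

    # noindex can arrive as a meta tag or as an X-Robots-Tag response header.
    header_robots = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_robots = value.lower()
    if "noindex" in header_robots or any("noindex" in r for r in parser.robots):
        findings.append("noindex")

    # A canonical that resolves to any URL other than the page itself.
    if parser.canonical:
        target = urljoin(url, parser.canonical)
        if _normalize(target) != _normalize(url):
            findings.append("canonical-points-elsewhere")

    # HTTP 200 with an empty-state message is a soft 404 candidate.
    text = html.lower()
    if status == 200 and any(phrase in text for phrase in EMPTY_STATE_PHRASES):
        findings.append("possible-soft-404")

    return findings
```

Run it over a handful of URLs from each template: if every URL from one template returns the same label, the cause is almost certainly in the template rather than in individual pages.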

Diagnostic steps

  1. Open Search Console → Page indexing and screenshot the trend chart for both "Indexed" and the largest "Not indexed" buckets. You want a baseline before you start changing things.
  2. Click into each "Not indexed" reason and export the top fifty URLs. Pattern-match by URL prefix to identify whether the loss is template-wide or scattered.
  3. Run URL Inspection on five sample URLs from the affected template. The live test will surface canonical conflicts, robots directives, and soft 404 classifications individually.
  4. Run pseolint on your sitemap with the affected template included. Focus on tech/canonical-consistency, tech/canonical-noindex-conflict, tech/soft-404, and tech/robots-noindex-conflict findings first.
  5. Verify the live HTTP response for affected URLs using curl with a Googlebot user agent. What your CMS renders and what the edge serves can differ when CDN headers override origin.
  6. Compare your sitemap to your indexed URLs. If your sitemap declares 50,000 URLs and only 8,000 are indexed, prune the sitemap to the indexed set plus URLs you have a credible plan to make indexable.
  7. If the cause is quality-driven (no technical issue found), pick the top 10% of deindexed URLs by historical clicks, rewrite them with substantive unique content, and resubmit only those. Let the rest stay deindexed rather than re-asking Google to consider them.
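
The prefix pattern-matching in step 2 can be done by hand, but a few lines of code make the template-wide versus scattered distinction obvious. This is a rough sketch that assumes the first path segment identifies the template (for example /product/...). The function names and the example.com URLs are illustrative only; raise depth if your templates live deeper in the path.

```python
from collections import Counter
from urllib.parse import urlsplit


def template_key(url, depth=1):
    """Reduce a URL to its first `depth` path segments, e.g. /product/*."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) <= depth:
        # Shallow URLs (home page, top-level pages) have no template prefix.
        return "/" + "/".join(segments)
    return "/" + "/".join(segments[:depth]) + "/*"


def summarize_by_template(urls, depth=1):
    """Count affected URLs per template prefix, largest group first."""
    counts = Counter(template_key(u, depth) for u in urls)
    total = sum(counts.values())
    return [(key, n, n / total) for key, n in counts.most_common()]


# Stand-in for the URLs exported from one "Not indexed" reason.
exported = [
    "https://example.com/product/red-widget",
    "https://example.com/product/blue-widget",
    "https://example.com/product/green-widget",
    "https://example.com/blog/launch-notes",
]
for key, count, share in summarize_by_template(exported):
    print(f"{key:<14}{count:>4}  {share:.0%}")
```

If one prefix holds most of the exported URLs, treat the loss as template-wide and start with that template; an even spread across prefixes points to a host-level cause instead.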

Case study

An e-commerce marketplace lost 47,000 indexed product pages over a 21-day window starting October 4, 2024. The audit traced the loss to a template change shipped on September 30, 2024, which began emitting rel=canonical pointing to the parent category for any product with fewer than 3 reviews. The change was meant as a quality signal, but Google read it as "these pages are duplicates." The fix was a one-line template revert. Indexation recovered to 80% of its previous level within 2 crawl cycles (about 10 days) without any content changes, and roughly $186,000 of monthly product-discovery revenue returned within 45 days of the fix.

Frequently asked questions

How fast does deindexation reverse once I fix the cause?

For technical causes (canonical, robots, soft 404), recovery starts within the next crawl cycle — typically 2 to 7 days for high-priority hosts. For quality causes, recovery is gated by Google's willingness to re-crawl and re-evaluate, which can take 4 to 12 weeks.

Should I use the Indexing API to push URLs back in?

Only if your site is in a category Google explicitly supports for the Indexing API (job postings, livestreams). Using it for other content types signals manipulation and does not reliably index pages. URL Inspection's "Request indexing" works for one-off cases but is not a bulk recovery tool.

My sitemap shows fewer URLs than I have. Should I declare all of them?

No. Declaring URLs Google has already decided not to index trains the algorithm to trust your sitemap less. Trim the sitemap to URLs that are actually indexed plus a small buffer of new URLs you genuinely want crawled. A 95% indexed-vs-declared ratio is a strong signal; a 20% ratio is a red flag.
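
As a rough illustration of the indexed-vs-declared ratio, the sketch below reads the URLs out of a sitemap and compares them with a set of indexed URLs. It assumes you already have the indexed set from somewhere, such as a Search Console export; the function names are hypothetical and the code makes no calls to any Google API.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Extract every <loc> value from a sitemap document.

    For a sitemap index file these are child sitemap URLs, not pages.
    """
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text}


def sitemap_coverage(declared, indexed):
    """Share of declared URLs that are indexed (0.0 for an empty sitemap)."""
    declared = set(declared)
    if not declared:
        return 0.0
    return len(declared & set(indexed)) / len(declared)


def pruned_sitemap(declared, indexed, keep_new=()):
    """Declared URLs that are indexed, plus a hand-picked buffer of new URLs."""
    return sorted((set(declared) & set(indexed)) | set(keep_new))
```

A coverage value near 0.95 matches the healthy ratio described above, while a value near 0.20 is the red-flag case. Keep the keep_new buffer small and deliberate, since it is the only part of the pruned sitemap Google has not already accepted.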

Does the "Discovered - currently not indexed" bucket count as a penalty?

It is not a manual action and is not a penalty in the formal sense, but it is a quality signal: Google saw the URL, evaluated the cost-benefit of crawling it, and chose not to. Treat it as a vote against that URL pattern's perceived value to users.

Can a CDN cause bulk deindexation by itself?

Yes — most often through edge-cached noindex headers, edge-cached 5xx errors during incidents, or edge-level redirects creating canonical loops. Always test the live edge response with the URL Inspection tool's "Test live URL" rather than trusting your origin's response.
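
For a quick scripted look at what the edge actually returns, something like the following can complement "Test live URL". It is a sketch under assumptions: it sends a Googlebot user-agent string from your own machine, so a CDN that verifies Googlebot by IP address may respond differently than it does to the real crawler. The names fetch_as_googlebot and edge_findings and the finding labels are made up for this example.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses instead of silently following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_as_googlebot(url, timeout=10):
    """Return (status, headers) for one URL without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # 3xx (not followed), 4xx, and 5xx responses all arrive here.
        return err.code, dict(err.headers.items())


def edge_findings(status, headers):
    """Flag the edge-level problems named above for one response."""
    h = {k.lower(): v for k, v in headers.items()}
    findings = []
    if "noindex" in h.get("x-robots-tag", "").lower():
        findings.append("noindex-header")
    if 500 <= status <= 599:
        findings.append("server-error")
    if 300 <= status <= 399:
        findings.append("redirect")
    # A positive Age header suggests the response came from a cache.
    age = h.get("age", "").strip()
    if findings and age.isdigit() and int(age) > 0:
        findings.append("served-from-cache")
    return findings
```

Calling edge_findings(*fetch_as_googlebot(url)) on a few affected URLs shows whether a problem is being served from cache; if it is, purge the edge cache after fixing the origin and then re-test.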

What recovery looks like

Technical-cause recoveries land within 1 to 3 crawl cycles — typically 7 days for high-traffic sites, 21 days for low-traffic sites — once Google re-fetches affected URLs and confirms the issue is gone. Quality-cause recoveries require both your fixes shipping and Google's quality re-evaluation, which is gated to the roughly 75-day cadence of core updates. Expect a partial re-indexation within 30 days if you've trimmed the sitemap aggressively, and full recovery (or a stable new equilibrium) within 90 days. Watch the Discovered-to-Indexed conversion rate in Search Console or Ahrefs Site Audit as your leading indicator: when it climbs above 80%, your sitemap is back in good standing.
