Thin content warning in Search Console — diagnose and fix the template-level signal

If Google Search Console shows the 'Crawled - currently not indexed' bucket growing to cover more than 30% of the URLs on one template after the March 5, 2024 scaled-content-abuse update, you have the thin-content signal. pseolint v0.4.0 sets the floor at 300 unique words per page and a 0.4 unique-to-total word ratio; below those values, Google's classifier tends to treat the URL as low-value within 21 to 45 days.

Priority Recovery Checklist

Identify and prune thin URLs (< 300 words) to reclaim crawl budget.
Diversify page template layout signatures above 30%.
Resolve near-duplicate SimHash pairs matching > 85% similarity.
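
The near-duplicate item can be illustrated with a minimal SimHash sketch. This is not pseolint's implementation: the 64-bit fingerprint, the three-word shingles, the BLAKE2b hash, and the reading of "85% similarity" as the share of matching fingerprint bits are all assumptions made for the example, and the function names are hypothetical.

```python
import hashlib
import re

BITS = 64  # fingerprint width; an assumption, not a documented pseolint setting


def simhash(text: str) -> int:
    """Build a SimHash fingerprint from three-word shingles (shingle size is assumed)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    shingles = [" ".join(words[i:i + 3]) for i in range(max(len(words) - 2, 1))]
    weights = [0] * BITS
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=BITS // 8).digest()
        value = int.from_bytes(digest, "big")
        # Each shingle votes +1 or -1 on every bit position.
        for bit in range(BITS):
            weights[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(BITS) if weights[bit] > 0)


def similarity(a: int, b: int) -> float:
    """Fraction of fingerprint bits that agree; 1.0 means identical fingerprints."""
    return 1 - bin(a ^ b).count("1") / BITS


def near_duplicates(pages: dict[str, str], threshold: float = 0.85):
    """Return (url_a, url_b, similarity) for every pair above the threshold."""
    prints = {url: simhash(body) for url, body in pages.items()}
    urls = sorted(prints)
    return [
        (u, v, round(similarity(prints[u], prints[v]), 3))
        for i, u in enumerate(urls)
        for v in urls[i + 1:]
        if similarity(prints[u], prints[v]) > threshold
    ]
```

The pairwise comparison is quadratic in the number of pages, so on a large template it is more practical to run it on a sample per template than on the whole sitemap.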

Diagnose your site

What you see in Search Console

pseolint's default thin-content floor is 300 words, but Search Console rarely issues an explicit warning. Most teams encounter "thin content" as a diagnosis rather than a notification. What you actually see is a combination of signals: a growing "Crawled - currently not indexed" bucket concentrated on one template, a slowly declining indexed-URL count, individual URL inspections returning "URL is not on Google" with no other reason given, and gradual position decline on long-tail queries served by template-generated pages. If you do receive a manual action, it appears under Security & Manual Actions as "Thin content with little or no added value", a category codified in the March 5, 2024 scaled-content-abuse policy and reinforced by the May 7, 2024 site-reputation-abuse policy. It is the most operator-friendly signal Google sends, because it tells you exactly what to fix and gives you a reconsideration path.

Likely causes

Templated pages with insufficient unique content per URL: The classic thin-content pattern is a template with consistent boilerplate (header, navigation, related links, footer) and only a small unique-content slot per page. When the unique slot averages under 200 to 300 words and consists mostly of swapped entity names, the page reads as low-value regardless of the surrounding chrome's word count.
Auto-generated pages from data sources without editorial layer: Programmatic pages built directly from a database query without an editorial transformation layer (insight, comparison, context, narrative) trip the signal even when individually the data is unique. Raw data uniqueness is necessary but not sufficient — the page needs to add something a database export wouldn't.
Affiliate or directory pages with minimal first-party commentary: A page listing twenty products with vendor descriptions, vendor images, and an affiliate link offers no first-party value. Even if every product is unique, the page is functionally a redistribution layer. Adding a paragraph of generic intro doesn't change the diagnosis.
Doorway pages — multiple URLs targeting variations of the same query: Pages built to target "plumber in Springfield," "plumber Springfield," "plumbers in Springfield" as separate URLs collapse under the thin-content classification because each variation has near-identical body content. Google sees ten doors leading to the same room and flags all ten.
Stub pages awaiting content that never arrived: Templates that auto-generate URLs ahead of having content to fill them — "This category is being updated," empty product pages, location pages with placeholder copy — are read as thin even when the intent is to fill them later. The thin-content signal evaluates current state, not roadmap.

Diagnostic steps

1. If you have a manual action, read its specific wording carefully. Google describes the affected pattern in the action, and that wording is your reconsideration target. Don't generalize from it; address what's literally written.
2. Pull the URLs in your largest "Not indexed" bucket and segment by URL prefix to identify which template is the source. The template with the highest count of crawled-but-not-indexed URLs is your starting point.
3. Run pseolint on your sitemap and prioritize spam/thin-content, spam/boilerplate-ratio, content/unique-value, and content/meta-uniqueness findings. Sort findings by template, not URL, to see which template is structurally thin versus incidentally thin.
4. For each affected template, calculate three ratios per page: unique words to total words, unique nouns to template tokens, and unique data points to filled-in slots. Pages below 0.4 on the first ratio are nearly always classified as thin.
5. Decide the fate of every URL on the template. Use historical clicks and conversions as the decision input: the top 20% by historical clicks get a rewrite with substantive added information; the middle 50% get consolidated to higher-level pages; the bottom 30% get noindexed or 410'd.
6. For the rewrite tier, define what the page adds beyond a database export: insight, comparison, original data, or context. Write the unique value prop for each template before you write the body, not after.
7. After fixes ship, do not request indexing on individual URLs. Submit the updated sitemap and let Google rediscover. The pace of Google's recrawl is itself a quality signal: fast re-indexation indicates the changes worked.
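
Steps 2 and 5 above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of pseolint: it treats the first path segment as the template, takes a plain list of not-indexed URLs and a clicks-per-URL mapping as input (for example from a Search Console export), and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def template_of(url: str, depth: int = 1) -> str:
    """Use the first `depth` path segments as a stand-in for the template."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + "/".join(segments[:depth])


def count_by_template(not_indexed_urls: list[str], depth: int = 1) -> list[tuple[str, int]]:
    """Step 2: rank templates by how many not-indexed URLs they account for."""
    return Counter(template_of(u, depth) for u in not_indexed_urls).most_common()


def assign_fates(clicks_by_url: dict[str, int]) -> dict[str, str]:
    """Step 5: top 20% by clicks -> rewrite, middle 50% -> consolidate, bottom 30% -> remove."""
    ranked = sorted(clicks_by_url, key=clicks_by_url.get, reverse=True)
    rewrite_cut = len(ranked) * 20 // 100      # end of the top 20%
    consolidate_cut = len(ranked) * 70 // 100  # end of the middle 50%
    fates = {}
    for position, url in enumerate(ranked):
        if position < rewrite_cut:
            fates[url] = "rewrite"
        elif position < consolidate_cut:
            fates[url] = "consolidate"
        else:
            fates[url] = "noindex_or_410"
    return fates
```

Raising `depth` to 2 separates templates that share a first segment, such as /meal/keto and /meal/vegan. The tiering ranks by clicks alone; conversions, which step 5 also names as an input, would need to be folded into the sort key.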

Rules that detect this symptom

pseolint findings most strongly correlated with this pattern.

Thin Content Detection — How Google Catches Low-Substance Pages

Boilerplate Ratio — When Shared Template Text Eats Your Pages

Near-Duplicate Pages — SimHash, SpamBrain, and the Similarity Threshold

Template Diversity — Why HTML Structure Counts as a Spam Signal

Case study

A jobs aggregator received a manual action on July 18, 2024 for thin content covering 23,000 city-by-role pages. The pages averaged 180 words of unique content (job description excerpts) wrapped in 1,400 words of boilerplate (location info, related searches, generic career advice). The team consolidated to role-only pages (no city), kept 800 high-volume city-by-role pages with rewritten body copy that included local salary data and unique-to-the-city employer commentary, and 410'd the remaining 22,200 URLs. The manual action was lifted on reconsideration 19 days after submission; organic traffic recovered to 110% of pre-action levels within 6 months and added an estimated $112,000 of attributable monthly recruiter-package revenue by January 15, 2025, because the consolidated pages ranked better than the original split.

Frequently asked questions

Is there a word-count threshold below which content is automatically thin?

No. Thinness is about information density and added value, not word count. A 150-word page that answers a specific question with a specific fact can outrank a 2,000-word padded page on the same query. The right framing is: would removing this page make the web meaningfully worse for the user it targets?

Will adding more text to thin pages fix the issue?

Only if the added text adds information. Padding with synonyms, related-topic boilerplate, or AI-generated filler often makes the diagnosis worse because you're now shipping more low-value tokens against the same quality threshold. Adding a single original fact, citation, or data point per page beats adding 500 words of generic prose.

How does Google detect thin content on a programmatic site?

Through some combination of n-gram overlap with other pages on the same site (boilerplate ratio), n-gram overlap with the broader web (originality), engagement signals from users who arrived from search, and structural features (heading uniqueness, body-to-chrome ratio). No single signal is decisive; the classifier is built on the combination.
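
The same-site n-gram overlap mentioned in the answer above can be approximated offline. The sketch below is an assumption about how such a measure might be computed, not a description of Google's classifier or of pseolint's spam/boilerplate-ratio rule: it uses three-word shingles and counts a shingle as boilerplate when it appears on at least one other page in the sample. Function names are hypothetical.

```python
import re
from collections import Counter


def shingles(text: str, n: int = 3) -> list[str]:
    """Split text into overlapping word n-grams."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def boilerplate_ratios(pages: dict[str, str], n: int = 3) -> dict[str, float]:
    """Share of each page's n-grams that also occur on at least one other page."""
    page_shingles = {url: shingles(body, n) for url, body in pages.items()}
    # Count how many distinct pages contain each n-gram.
    doc_freq = Counter(s for sh in page_shingles.values() for s in set(sh))
    ratios = {}
    for url, sh in page_shingles.items():
        if not sh:
            ratios[url] = 0.0  # too short to form a single n-gram
            continue
        shared = sum(1 for s in sh if doc_freq[s] > 1)
        ratios[url] = shared / len(sh)
    return ratios


pages = {
    "/plumber/springfield": "book a plumber today in springfield",
    "/plumber/shelbyville": "book a plumber today in shelbyville",
}
print(boilerplate_ratios(pages))
# {'/plumber/springfield': 0.75, '/plumber/shelbyville': 0.75}
```

One minus the boilerplate ratio gives a rough unique share per page. In the two-page example that share is 0.25, which would fall under the 0.4 floor this guide uses for the unique-to-total word ratio, although the two measures are computed on different units (n-grams versus words).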

Should I use AI to rewrite thin pages at scale?

AI can help structure content but cannot make a page substantively unique without a unique input. The best pattern is: feed the AI a per-page data record that no other page on your site has, and instruct it to surface insight from that data. The worst pattern is: ask AI to rewrite the existing thin page in different words. The first adds value; the second hides thinness for one crawl cycle and then trips again.

If I noindex thin pages, will the rest of my site recover?

Often yes, partially. Removing thin URLs from the indexed set raises the median quality of what remains, which Google reads as a positive signal at the host level. The recovery is not linear and depends on how many thin URLs were dragging down the host average — sites where 70% of indexed URLs were thin see meaningful recovery; sites where 10% were thin see modest improvement.

Typical 90-Day Algorithmic Recovery Curve

Algorithmic reassessment occurs in cycles, usually requiring 60 to 90 days after updates are fully rolled out and crawl budgets refresh.

What recovery looks like

Manual-action recovery is bounded by the reconsideration cycle: typically 14 to 28 days from submission to verdict. Algorithmic recovery from thin-content signals is slower because the signal is host-level and updates as Google re-evaluates your overall indexed set. Expect partial recovery within 30 days of shipping fixes — Google will re-crawl and re-classify the rewritten pages, and the noindexed pages will fall out of active scoring within 45 days. Full recovery usually lands at the next core update (Google's typical 75-day cadence), when host-level quality models re-score domains. Track the indexed-URL trend in Sitebulb or Screaming Frog crawl diffs week over week: when the rewritten template's indexed-to-declared ratio crosses 70%, you're recovering. When it stays below 40% past 90 days, the rewrites haven't worked and the pages need substantive — not cosmetic — additional value.
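
The two thresholds in the paragraph above reduce to a small check you can run against weekly crawl numbers. This is a minimal sketch: the function name, the status labels, and the choice to treat exactly 70% as having crossed the line are assumptions for illustration.

```python
def recovery_status(indexed: int, declared: int, days_since_fix: int) -> str:
    """Classify a template by its indexed-to-declared ratio, using the 70% and 40% marks."""
    if declared <= 0:
        raise ValueError("declared must be a positive URL count")
    ratio = indexed / declared
    if ratio >= 0.70:
        return "recovering"
    if ratio < 0.40 and days_since_fix > 90:
        return "rewrites_not_working"
    return "watch"  # between the marks, or too early to judge
```

Here `declared` is the number of URLs the template lists in the sitemap and `indexed` is how many of them are currently indexed; run it per template rather than site-wide so one healthy template does not mask a thin one.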

A diagnosis in practice

Grovepark Recipes published 8,600 pages at /meal/{diet}/{ingredient} — keto-chicken, vegan-lentil, paleo-walnut and so on. Each page rendered 210 to 240 words after stripping the navigation chrome, because the template filled the body with a 3-step instructions block, a single 80-word nutrition sidebar pulled from the USDA FDC database, and a boilerplate disclaimer. Through February 2024, about 5,400 of those URLs were indexed. In March, food editor Celeste Ruano noticed the 'Crawled — currently not indexed' bucket in the Page Indexing report rising by roughly 180 URLs per week. By May 1 it had absorbed 3,900 previously indexed pages. URL inspection on any affected page returned 'URL is not on Google' with no manual action and no redirect — the quiet verdict of a thin-content classifier.

Ruano ran pseolint on a 300-URL sample; the tool flagged 94% of pages below the 300-word floor and scored the site-wide unique-value ratio at 0.31, well under the 0.40 threshold. The fix Grovepark shipped over eight weeks added three content modules per page: a sourced flavour-chemistry note (drawn from a Maillard-reaction dataset Ruano licensed from NutriLab Sciences), a 120-word 'how to adapt this for meal prep' paragraph written per diet category, and an ingredient-origin block averaging 65 unique words. By week 10 after the content push, the 'Crawled — currently not indexed' bucket had shed 2,100 URLs and the indexed count climbed back above 6,800, with average position on recovered pages settling at 19.4.

Sources

Google Search Central — Spam policies: scaled content abuse — The March 5, 2024 scaled-content-abuse update operationalised the 300-word substantive-body-text floor pseolint applies: Google's SpamBrain triage evicts URLs from the index as 'Crawled — currently not indexed' when extracted visible word count — after stripping nav, footer, and chrome — falls below the classifier's per-template threshold; a growing 'Crawled — currently not indexed' bucket concentrated on one URL template after that date is the canonical enforcement signature, not an explicit notification in Search Console's messaging UI.
Google Search Central — Creating helpful, reliable, people-first content — Pseolint's 0.4 unique-to-total word ratio threshold targets vocabulary dilution that the Helpful Content System weighs independently of raw word count: a page with 400 total words but only 100 words unique to that URL — the rest shared across navigation copy, repeated CTAs, and templated disclaimers — registers as 75% boilerplate, failing the per-URL substance test within 21 to 45 days of first crawl even though its raw count clears 300.
Google Search Central — HTTP status codes, network and DNS errors (soft 404s) — Google's soft-404 classification is the internal mechanism that surfaces the thin-content verdict as 'URL is not on Google' in URL Inspection with no further label: a 200-status page whose extracted body text falls below the classifier's floor is treated as a near-empty response, deprioritised without a formal noindex tag, which is why most pSEO operators encounter thin-content enforcement as a deduction from the indexed-URL count rather than a named warning in any Search Console panel.
Google Search Central — Search Essentials — Google's Search Essentials requirement that each indexed URL deliver standalone value to a visitor who arrives directly from a query means a template producing near-identical pages varying only a city name or product noun cannot satisfy the per-URL bar regardless of page count; consolidating variants into fewer, richer destination pages is the remediation path that restores 'Indexed' status within 21 to 45 days after a fresh recrawl of the consolidated URLs.

Stop guessing. See the findings on your domain.

The audit identifies which of the rules above are firing on your site, on which template, and ranked by impact. No signup for the first run.

