Thin content warning in Search Console — diagnose and fix the template-level signal

If the "Crawled - currently not indexed" bucket in Google Search Console has grown to cover more than 30% of the URLs on a single template since the March 5, 2024 scaled-content-abuse update, you are looking at the thin-content signal. pseolint v0.4.0 sets its floor at 300 unique words per page and a unique-to-total word ratio of 0.4; below those thresholds, Google's classifier reliably treats the URL as low-value within 21 to 45 days.

What you see in Search Console

pseolint's default thin-content floor is 300 words, but Search Console itself rarely issues an explicit thin-content warning; most teams encounter "thin content" as a diagnosis rather than a notification. What you actually see is a combination of signals: a growing "Crawled - currently not indexed" bucket concentrated on one template, a slowly declining indexed-URL count, individual URL inspections returning "URL is not on Google" with no other reason given, and gradual position decline on long-tail queries served by template-generated pages. If you do receive a manual action, it appears under Security & Manual Actions as "Thin content with little or no added value". That category was codified in the March 5, 2024 scaled-content-abuse policy and reinforced by the May 7, 2024 site-reputation-abuse policy. A manual action is the most operator-friendly signal Google sends, because it tells you exactly what to fix and gives you a reconsideration path.

Likely causes

Templated pages with insufficient unique content per URL
The classic thin-content pattern: a template with consistent boilerplate (header, navigation, related links, footer) and only a small unique-content slot per page. When the unique slot averages under 200-300 words and consists mostly of swapped entity names, the page reads as low-value regardless of the surrounding chrome's word count.
Auto-generated pages from data sources without editorial layer
Programmatic pages built directly from a database query without an editorial transformation layer (insight, comparison, context, narrative) trip the signal even when individually the data is unique. Raw data uniqueness is necessary but not sufficient — the page needs to add something a database export wouldn't.
Affiliate or directory pages with minimal first-party commentary
A page listing twenty products with vendor descriptions, vendor images, and an affiliate link offers no first-party value. Even if every product is unique, the page is functionally a redistribution layer. Adding a paragraph of generic intro doesn't change the diagnosis.
Doorway pages — multiple URLs targeting variations of the same query
Pages built to target "plumber in Springfield," "plumber Springfield," "plumbers in Springfield" as separate URLs collapse under the thin-content classification because each variation has near-identical body content. Google sees ten doors leading to the same room and flags all ten.
Stub pages awaiting content that never arrived
Templates that auto-generate URLs ahead of having content to fill them — "This category is being updated," empty product pages, location pages with placeholder copy — are read as thin even when the intent is to fill them later. The thin-content signal evaluates current state, not roadmap.

Diagnostic steps

  1. If you have a manual action, read its specific wording carefully. Google describes the affected pattern in the action, and that wording is your reconsideration target. Don't generalize from it; address what's literally written.

  2. Pull the URLs in your largest "Not indexed" bucket and segment by URL prefix to identify which template is the source. The template with the highest count of crawled-but-not-indexed URLs is your starting point.

  3. Run pseolint on your sitemap and prioritize spam/thin-content, spam/boilerplate-ratio, content/unique-value, and content/meta-uniqueness findings. Sort findings by template, not URL, to see which template is structurally thin versus incidentally thin.

  4. For each affected template, calculate three ratios per page: unique words to total words, unique nouns to template tokens, and unique data points to filled-in slots. Pages below 0.4 on the first ratio are nearly always classified as thin.

  5. Decide the fate of every URL on the template. Use historical clicks and conversions as the decision input: the top 20% by historical clicks get a rewrite with substantive added information; the middle 50% get consolidated to higher-level pages; the bottom 30% get noindexed or 410'd.

  6. For the rewrite tier, define what the page adds beyond a database export: insight, comparison, original data, or context. Write the unique value prop for each template before you write the body, not after.

  7. After fixes ship, do not request indexing on individual URLs. Submit the updated sitemap and let Google rediscover the pages. The pace of Google's re-crawl is itself a quality signal: fast re-indexation indicates the changes worked.
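
Steps 2 and 5 above can be scripted against a Search Console export. The following is a minimal sketch, not pseolint's implementation: it assumes the first path segment identifies the template (your routing may differ), and the function names `template_of`, `count_by_template`, and `assign_tiers` are hypothetical. The 20/50/30 split comes from step 5; using clicks alone, without conversions, is a simplification.

```python
from collections import Counter
from urllib.parse import urlparse


def template_of(url: str) -> str:
    """Assumption: the first path segment stands in for the URL template."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return "/" + segments[0] if segments else "/"


def count_by_template(urls: list[str]) -> list[tuple[str, int]]:
    """Count not-indexed URLs per template, largest bucket first."""
    return Counter(template_of(u) for u in urls).most_common()


def assign_tiers(clicks_by_url: dict[str, int]) -> dict[str, str]:
    """Top 20% by clicks -> rewrite, middle 50% -> consolidate, bottom 30% -> remove."""
    ranked = sorted(clicks_by_url, key=clicks_by_url.get, reverse=True)
    n = len(ranked)
    rewrite_cut = round(n * 0.20)      # end of the rewrite tier
    consolidate_cut = round(n * 0.70)  # end of the consolidate tier
    tiers: dict[str, str] = {}
    for i, url in enumerate(ranked):
        if i < rewrite_cut:
            tiers[url] = "rewrite"
        elif i < consolidate_cut:
            tiers[url] = "consolidate"
        else:
            tiers[url] = "remove"  # noindex or 410, per step 5
    return tiers
```

Feed `count_by_template` the URL column from the "Not indexed" export, then run `assign_tiers` only on the URLs belonging to the worst template. On small templates the rounding makes the tier sizes approximate.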

Case study

A jobs aggregator received a manual action on July 18, 2024 for thin content covering 23,000 city-by-role pages. The pages averaged 180 words of unique content (job description excerpts) wrapped in 1,400 words of boilerplate (location info, related searches, generic career advice). The team consolidated to role-only pages (no city), kept 800 high-volume city-by-role pages with rewritten body copy that included local salary data and unique-to-the-city employer commentary, and 410'd the remaining 22,200 URLs. The manual action was lifted on reconsideration 19 days after submission; organic traffic recovered to 110% of pre-action levels within 6 months and added an estimated $112,000 of attributable monthly recruiter-package revenue by January 15, 2025, because the consolidated pages ranked better than the original split.
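
To connect the case-study numbers to the thresholds quoted earlier (300 unique words, 0.4 unique-to-total ratio), here is a small worked calculation. It is a sketch under assumptions: the helper names are hypothetical, and treating a page as thin when it fails either threshold is an assumption about how the two floors combine, not documented pseolint behavior.

```python
import math


def unique_ratio(unique_words: int, boilerplate_words: int) -> float:
    """Unique words divided by total words on the page."""
    total = unique_words + boilerplate_words
    return unique_words / total if total else 0.0


def is_thin(unique_words: int, boilerplate_words: int,
            min_words: int = 300, min_ratio: float = 0.4) -> bool:
    """Assumption: failing either floor marks the page as thin."""
    return (unique_words < min_words
            or unique_ratio(unique_words, boilerplate_words) < min_ratio)


def unique_words_for_ratio(boilerplate_words: int, min_ratio: float = 0.4) -> int:
    """Smallest unique-word count that reaches min_ratio for a fixed boilerplate size."""
    # Solve u / (u + b) >= r for u, which gives u >= r * b / (1 - r).
    return math.ceil(min_ratio * boilerplate_words / (1 - min_ratio))


# Case-study pages: 180 unique words inside 1,400 words of boilerplate.
ratio = unique_ratio(180, 1400)            # about 0.114, far below 0.4
needed = unique_words_for_ratio(1400)      # 934 unique words to reach 0.4
```

The second number shows why the team cut boilerplate and page count instead of only adding copy: with 1,400 words of chrome, each page would have needed roughly five times its original unique content to clear the ratio.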

Frequently asked questions

Is there a word-count threshold below which content is automatically thin?

No. Thinness is about information density and added value, not word count. A 150-word page that answers a specific question with a specific fact can outrank a 2,000-word padded page on the same query. The right framing is: would removing this page make the web meaningfully worse for the user it targets?

Will adding more text to thin pages fix the issue?

Only if the added text adds information. Padding with synonyms, related-topic boilerplate, or AI-generated filler often makes the diagnosis worse because you're now shipping more low-value tokens against the same quality threshold. Adding a single original fact, citation, or data point per page beats adding 500 words of generic prose.

How does Google detect thin content on a programmatic site?

Through some combination of n-gram overlap with other pages on the same site (boilerplate ratio), n-gram overlap with the broader web (originality), engagement signals from users who arrived from search, and structural features (heading uniqueness, body-to-chrome ratio). No single signal is decisive; the classifier is built on the combination.
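
The same-site n-gram overlap mentioned above can be approximated with word shingles and Jaccard similarity. This is a common near-duplicate technique offered as an illustration; it is not Google's classifier, and the function names and the choice of 3-word shingles are assumptions.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of n-word sequences (shingles) in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_overlap(page_a: str, page_b: str, n: int = 3) -> float:
    """Shared shingles divided by all distinct shingles across both pages."""
    a, b = shingles(page_a, n), shingles(page_b, n)
    if not a and not b:
        return 0.0  # nothing to compare
    return len(a & b) / len(a | b)


# Two template pages that differ only by the swapped city name.
a = "Find a plumber in Springfield today"
b = "Find a plumber in Shelbyville today"
overlap = jaccard_overlap(a, b)  # 2 shared shingles out of 6 distinct
```

Running this pairwise across a sample of pages from one template gives a rough boilerplate signal: consistently high overlap suggests the unique slot is too small relative to the chrome.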

Should I use AI to rewrite thin pages at scale?

AI can help structure content but cannot make a page substantively unique without a unique input. The best pattern is: feed the AI a per-page data record that no other page on your site has, and instruct it to surface insight from that data. The worst pattern is: ask AI to rewrite the existing thin page in different words. The first adds value; the second hides thinness for one crawl cycle and then trips again.

If I noindex thin pages, will the rest of my site recover?

Often yes, partially. Removing thin URLs from the indexed set raises the median quality of what remains, which Google reads as a positive signal at the host level. The recovery is not linear and depends on how many thin URLs were dragging down the host average — sites where 70% of indexed URLs were thin see meaningful recovery; sites where 10% were thin see modest improvement.

What recovery looks like

Manual-action recovery is bounded by the reconsideration cycle: typically 14 to 28 days from submission to verdict. Algorithmic recovery from thin-content signals is slower because the signal is host-level and updates as Google re-evaluates your overall indexed set. Expect partial recovery within 30 days of shipping fixes — Google will re-crawl and re-classify the rewritten pages, and the noindexed pages will fall out of active scoring within 45 days. Full recovery usually lands at the next core update (Google's typical 75-day cadence), when host-level quality models re-score domains. Track the indexed-URL trend in Sitebulb or Screaming Frog crawl diffs week over week: when the rewritten template's indexed-to-declared ratio crosses 70%, you're recovering. When it stays below 40% past 90 days, the rewrites haven't worked and the pages need substantive — not cosmetic — additional value.
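
The 70% and 40% checkpoints above can be turned into a weekly status check. This is a minimal sketch: `recovery_status` is a hypothetical helper, "declared" is taken to mean URLs listed in the sitemap for the rewritten template, and the "inconclusive" label for everything in between is an assumption.

```python
def recovery_status(indexed: int, declared: int, days_since_fix: int) -> str:
    """Classify a rewritten template by its indexed-to-declared ratio."""
    if declared <= 0:
        raise ValueError("declared must be a positive URL count")
    ratio = indexed / declared
    if ratio >= 0.70:
        return "recovering"
    if ratio < 0.40 and days_since_fix > 90:
        return "rewrites not working"
    return "inconclusive"  # keep tracking week over week
```

For example, 600 indexed out of 800 declared URLs (0.75) reads as recovering, while 300 out of 800 (0.375) at day 120 signals that the rewrites need more substantive value.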