Thin content checker — find pages Google sees as filler
Find every page on your site that Google might classify as thin (low on substance, templated, or AI-padded) in one 60-second crawl.
What it does
The thin-content scanner samples your sitemap and fetches up to 200 pages on the free tier (up to 500 on Pro manual re-audits at $19/month). It grades each page against the substance heuristics SpamBrain appears to use: visible word count against pseolint's 300-word floor (configurable per archetype), lexical uniqueness compared to sibling pages, presence of original media or sourced data, and the ratio of unique to recycled phrasing. The median check takes 60 seconds and is powered by @pseolint/core v0.4.3 (MIT-licensed at github.com/ouranos-labs/pseolint). The output is a per-page substance score plus a domain-wide breakdown of how much of your indexable surface area falls below the practical threshold where Google starts ignoring or demoting pages.
Why it matters
Thin content is the oldest cause of pSEO failure and the one most operators still get wrong. Helpful Content updates in 2022 and 2023 made it a ranking factor; the March 5, 2024 scaled content abuse update (https://developers.google.com/search/docs/essentials/spam-policies) made it a policy violation, and the site reputation abuse policy, which took effect on May 5, 2024, extended that to third-party content hosted to exploit a domain's reputation (parasite SEO). The cost has changed too: historically, a thin page just didn't rank. Today, a critical mass of thin pages can pull your entire domain's quality signal down, which means your good pages stop ranking too. The pseolint scanner uses a 300-word default floor (configurable per archetype) and weights findings as info=5, warning=12, error=25 in the overall score. If your site has a long tail of auto-generated location pages, AI-spun product variants, or templated comparison articles, the question is no longer whether some are thin. It's whether enough of them are thin to taint the rest.
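To make the weighting concrete, here is a minimal Python sketch of how word-count findings could be turned into a weighted penalty. It uses the severity weights quoted above and the 300-word warning and 250-word error cutoffs described under How it works. The function names are hypothetical, and how pseolint maps the penalty onto its final score is not specified on this page, so the sketch stops at the summed penalty.

```python
# Severity weights quoted in the text above.
SEVERITY_WEIGHTS = {"info": 5, "warning": 12, "error": 25}

# Word-count cutoffs quoted on this page: warn under 300, error under 250.
WARN_BELOW = 300
ERROR_BELOW = 250


def word_count_severity(word_count, warn_below=WARN_BELOW, error_below=ERROR_BELOW):
    """Return 'error', 'warning', or None for a page's post-strip word count.

    Hypothetical helper, not part of the pseolint API.
    """
    if word_count < error_below:
        return "error"
    if word_count < warn_below:
        return "warning"
    return None


def weighted_penalty(severities):
    """Sum the weights of all findings; None entries (no finding) are skipped."""
    return sum(SEVERITY_WEIGHTS[s] for s in severities if s is not None)


# Hypothetical word counts for five sampled pages.
counts = [120, 260, 299, 300, 950]
findings = [word_count_severity(c) for c in counts]
print(findings)                    # ['error', 'warning', 'warning', None, None]
print(weighted_penalty(findings))  # 49
```

One error (25) plus two warnings (12 each) gives a penalty of 49 for this sample; pages at or above the floor contribute nothing.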
How it works
- Sample URLs from your sitemap up to your tier's page limit (500 on Pro at $19/month), with extra weight given to URL patterns that look mass-generated. Median crawl-plus-audit time is 60 seconds.
- For each page, strip nav, footer, and template chrome to isolate the actual unique main content.
- Score each page on visible word count (post-strip), lexical diversity, sentence-level uniqueness vs sibling pages, and presence of structured data, media, or citations. By default, a page under 300 words raises a warning and a page under 250 words raises an error.
- Cross-compare pages within the same URL pattern using 64-bit SimHash fingerprints — pages clustering at a Hamming distance of 8 or less are flagged as near-duplicate, and pages with Jaccard shingle overlap above 85% are escalated as templated boilerplate.
- Surface the worst offenders first, with a substance score and a one-line diagnosis (under-length, near-duplicate, AI-padded, no unique research). The infrastructure runs on Next.js 15 with Inngest-backed background crawls so audits stay snappy even on a 500-URL Pro run.
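The sibling comparison step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not pseolint's implementation: the 3-word shingles, the BLAKE2b hash, and all function names are choices made for the example. Only the 64-bit fingerprint width, the Hamming cutoff of 8, and the 85% Jaccard cutoff come from the steps above.

```python
import hashlib
import re


def words(text):
    """Lowercased alphanumeric tokens of the already-stripped main content."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, k=3):
    """Set of k-word shingles (k=3 is an assumption, not a pseolint setting)."""
    w = words(text)
    if len(w) < k:
        return {tuple(w)} if w else set()
    return {tuple(w[i:i + k]) for i in range(len(w) - k + 1)}


def hash64(feature):
    """Stable 64-bit hash of a string (BLAKE2b truncated to 8 bytes)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash64(text, k=3):
    """64-bit SimHash: each shingle votes +1 or -1 on every bit position."""
    votes = [0] * 64
    for sh in shingles(text, k):
        h = hash64(" ".join(sh))
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def jaccard(a, b):
    """Jaccard similarity of two shingle sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare(page_a, page_b, max_hamming=8, min_jaccard=0.85):
    """Apply the two cutoffs quoted in the steps above to a pair of pages."""
    dist = hamming(simhash64(page_a), simhash64(page_b))
    overlap = jaccard(shingles(page_a), shingles(page_b))
    return {
        "hamming": dist,
        "jaccard": overlap,
        "near_duplicate": dist <= max_hamming,
        "templated_boilerplate": overlap > min_jaccard,
    }


# Two hypothetical location pages that differ only by a swapped city name.
a = "Best plumbers in Austin. Compare prices, reviews and availability for plumbers near you."
b = "Best plumbers in Dallas. Compare prices, reviews and availability for plumbers near you."
print(compare(a, b))
```

The exact distances depend on the hash and shingle size chosen here. On pages this short, a single swapped word changes a large share of the shingles; on full-length templated pages, the swapped variable touches proportionally fewer shingles and the overlap climbs toward the cutoff.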
What you get
- A list of every audited page sorted by substance score, worst first.
- A domain-level percentage: what share of your sampled surface area is below the practical thin-content threshold.
- Near-duplicate clusters — groups of pages that share so much copy that Google likely canonicalizes them or drops most of the cluster.
- Word-count distribution chart so you can see whether thin pages are a long tail or a clustered template problem.
- Specific recommendations per page — whether to expand, merge, redirect, or noindex.
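As a rough illustration of the domain-level percentage and the word-count distribution, the Python sketch below computes both from a list of per-page word counts. It assumes word count alone stands in for the substance score, which is a simplification: the real report combines several signals. The function names and the 100-word bucket size are hypothetical.

```python
from collections import Counter


def thin_share(word_counts, floor=300):
    """Percentage of sampled pages whose word count falls below the floor."""
    if not word_counts:
        return 0.0
    below = sum(1 for c in word_counts if c < floor)
    return 100.0 * below / len(word_counts)


def histogram(word_counts, bucket=100):
    """Count pages per word-count bucket, keyed by the bucket's lower bound."""
    buckets = Counter((c // bucket) * bucket for c in word_counts)
    return dict(sorted(buckets.items()))


# Hypothetical word counts for eight sampled pages.
counts = [80, 120, 260, 299, 300, 450, 950, 1200]
print(thin_share(counts))  # 50.0
print(histogram(counts))   # {0: 1, 100: 1, 200: 2, 300: 1, 400: 1, 900: 1, 1200: 1}
```

A histogram with one tall bucket near the bottom points to a clustered template problem; a thin spread across many low buckets looks more like a long tail.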
FAQ
- What word count counts as thin content?
- There is no fixed Google threshold, and anyone who tells you 300 words is the line is making it up. What matters is substance relative to user intent. A glossary definition can rank fine at 80 words; a buyer's guide at 800 words can be thin if it's all generic platitudes. The pseolint default of 300 words is a tunable heuristic, not a Google rule, and you can set it per archetype (200 for product comparators, 350 for guide-style hubs). Our scanner weighs word count alongside lexical uniqueness, originality, and sibling-page comparison, so a 200-word page with a unique data point beats a 1,500-word page of recycled boilerplate.
- How do you handle pages with lots of dynamic content like product listings?
- We treat structured listings (tables, product cards, schema-marked items) as substance, since they typically represent real, queryable data. The scanner downgrades pages where the only variation between siblings is a swapped variable inside otherwise identical prose — that's the pattern SpamBrain seems to weight most heavily.
- Can the scanner tell if my content is AI-generated?
- Indirectly. We don't run an AI-detection classifier (those are unreliable), but we flag the structural fingerprints that AI-spun content tends to leave: uniform paragraph length, generic transitional phrases, low entity density, no first-person voice or sourced claims. Google has been clear that AI-generated content is fine if it's helpful: the March 5, 2024 scaled content abuse policy targets unhelpful content produced at scale, however it is made, rather than AI itself. The scanner finds the unhelpful kind, and pairs cleanly with the dedicated aeo/* rules pseolint ships for AI-answer-engine grounding.
- What's the fix for a page flagged as thin?
- There are four options and the right one depends on the page. Expand: add a unique data point, original quote, or genuine user-relevant detail. Merge: consolidate three near-duplicate pages into one strong canonical. Redirect: 301 to the closest substantive page. Noindex: keep it accessible to users but remove it from Google's view. Each finding suggests one of these based on the failure mode.
- Will fixing thin pages immediately recover lost rankings?
- Usually no. Recovery from a quality-signal hit took 30 to 90 days in the cases we have observed, because Google needs time to re-crawl and re-evaluate the affected pages. The March 2024 Core Update rollout itself took 45 days to fully propagate. Fixing thin content is necessary but not sufficient: you also need Google to recrawl, re-render, and re-score, which is why you should pair content fixes with a sitemap resubmit and a few high-quality external links pointing at the recovered pages. Compared to paid alternatives like Ahrefs Site Audit at $129/month or Semrush at $139.95/month, the pseolint scanner is free for the substance check and $19/month for monitoring.
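The per-archetype floors from the first answer can be pictured as a simple lookup with a default. This Python sketch is illustrative only: the archetype keys and function names are hypothetical, and it does not reflect pseolint's actual configuration format. The three numbers are the ones quoted in the FAQ.

```python
# Floors quoted in the FAQ; the archetype keys are hypothetical labels.
DEFAULT_FLOOR = 300
ARCHETYPE_FLOORS = {
    "product-comparator": 200,
    "guide-hub": 350,
}


def floor_for(archetype=None):
    """Word-count floor for an archetype, falling back to the default."""
    return ARCHETYPE_FLOORS.get(archetype, DEFAULT_FLOOR)


def below_floor(word_count, archetype=None):
    """True if the page's post-strip word count is under its archetype's floor."""
    return word_count < floor_for(archetype)


print(below_floor(240, "product-comparator"))  # False
print(below_floor(240, "guide-hub"))           # True
print(below_floor(240))                        # True
```

The same 240-word page passes as a product comparator but falls short as a guide hub or under the default, which is why the floor only makes sense alongside the other substance signals.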
Related tools
Want every rule, not just this lens? The full audit on the homepage runs the complete SpamBrain + AEO rule set and produces the same shareable report — same backend, broader output.