Unique Value — Counting the Words That Appear on No Other Page
content/unique-value counts the distinct words on each page that appear on no other page in the audit, and fires an error when a page has fewer than 100 of them. It is a vocabulary-level proxy for the question Google's scaled-content-abuse policy has asked since March 5, 2024: does this URL add anything genuinely new?
What it detects
content/unique-value answers a sharper question than word count: of all the distinct words on this page, how many appear on no other page in the audit? The rule tokenises each page's main content — lower-cased, split on whitespace, with leading and trailing punctuation stripped so 'word', 'word.' and '(word)' count as one token — and builds a frequency map of which words appear on which pages.
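A minimal TypeScript sketch of that tokenisation and page-frequency map, assuming each page's main content has already been extracted as plain text. The function names are hypothetical, not the @pseolint/core API, and the exact set of characters treated as edge punctuation is an assumption.

```typescript
// Hypothetical sketch of the tokenisation described above; not pseolint's code.
// Assumption: these are the characters stripped from the edges of each token.
const EDGE_PUNCTUATION = new Set(".,;:!?'\"()[]{}<>“”‘’…-".split(""));

/** Lower-case, split on whitespace, strip leading/trailing punctuation. */
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/\s+/)
    .map((raw) => {
      let start = 0;
      let end = raw.length;
      while (start < end && EDGE_PUNCTUATION.has(raw.charAt(start))) start++;
      while (end > start && EDGE_PUNCTUATION.has(raw.charAt(end - 1))) end--;
      return raw.slice(start, end);
    })
    .filter((token) => token.length > 0); // drop empty and punctuation-only tokens
}

/** Map each word to the set of page URLs whose content contains it. */
function buildPageFrequency(pages: Map<string, string>): Map<string, Set<string>> {
  const pagesByWord = new Map<string, Set<string>>();
  pages.forEach((text, url) => {
    for (const word of tokenize(text)) {
      let urls = pagesByWord.get(word);
      if (urls === undefined) {
        urls = new Set<string>();
        pagesByWord.set(word, urls);
      }
      urls.add(url); // a Set, so repeats on one page are recorded once
    }
  });
  return pagesByWord;
}
```

With this map, a word is page-exclusive exactly when its URL set has size 1; 'word', 'word.' and '(word)' all tokenise to `word`.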
A word counts toward a page's uniqueness only if it appears on exactly one page in the whole audited set. Words that appear on even one other page (navigation labels, shared legal blocks, an industry term every page uses) are 'shared' and do not count, however useful they are. If a page has fewer than 100 of these page-exclusive words, the rule fires an error and reports the split: how many distinct words are unique, how many are shared, and how many distinct words the page has in total. The point is to make visible that a 1,500-word page can still carry only 90 unique words, 6% of its length, if the other 1,410 of its words also live on its siblings.
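The pass/fail logic can be sketched as follows, assuming each page has already been reduced to its set of distinct tokens. `checkUniqueValue`, its result shape, and the `minUnique` parameter are hypothetical names; the default of 100 and the message wording follow this page's description and its example finding.

```typescript
// Hypothetical sketch of the unique-value check; not @pseolint/core's implementation.
// Assumes each page is already reduced to its set of distinct tokens.
interface UniqueValueResult {
  url: string;
  unique: number; // distinct words that appear on no other page
  shared: number; // distinct words that appear on at least one other page
  distinct: number; // unique + shared
  message?: string; // set only when the page falls below the floor
}

function checkUniqueValue(
  wordsByPage: Map<string, Set<string>>,
  minUnique = 100,
): UniqueValueResult[] {
  // Page frequency: how many pages contain each word.
  const pageCount = new Map<string, number>();
  wordsByPage.forEach((words) => {
    words.forEach((w) => pageCount.set(w, (pageCount.get(w) ?? 0) + 1));
  });

  const results: UniqueValueResult[] = [];
  wordsByPage.forEach((words, url) => {
    let unique = 0;
    words.forEach((w) => {
      if (pageCount.get(w) === 1) unique++; // appears on exactly one page
    });
    const distinct = words.size;
    const shared = distinct - unique;
    const result: UniqueValueResult = { url, unique, shared, distinct };
    if (unique < minUnique) {
      result.message =
        `${url} has only ${unique} page-unique words (min ${minUnique}); ` +
        `${shared} of its ${distinct} distinct words also appear on other pages.`;
    }
    results.push(result);
  });
  return results;
}
```

Counting per page (document frequency) rather than per occurrence matters: a word used twenty times on one page and nowhere else still has a page frequency of one, so it stays unique.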
Why it matters
This rule catches the failure that spam/thin-content misses. A page can clear the 300-word thin-content floor with room to spare and still be almost entirely boilerplate with an entity swapped in: long, but not original. content/unique-value measures originality directly by asking what vocabulary exists on this page and nowhere else on your site, which is much closer to the question Google asks when deciding whether a URL earns its own slot in the index.
The most expensive mistake on programmatic sites is adding real, useful data that is shared along an axis and expecting it to count. A regulation repeated across every page for that role, a spec block shared across a product line, a city's statutes echoed on each of that city's pages: all genuinely helpful, all shared, all worth zero toward this metric. The words that move it are the page-specific ones, such as a distinct lead, this record's particular facts, or an example that exists only here. That is the difference between a database export and a page worth ranking.
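A toy illustration of why per-axis data scores zero, using hand-written word lists for two hypothetical documents on the same role axis rather than real pages or pseolint code. Appending the same regulation block to both pages leaves each page's unique count unchanged, because every regulation word then appears on two pages.

```typescript
// Toy illustration with hand-written word lists; not real pages or pseolint code.
function uniqueCounts(pages: Record<string, string[]>): Record<string, number> {
  // Document frequency: on how many pages does each word appear?
  const pageCount = new Map<string, number>();
  Object.keys(pages).forEach((url) => {
    new Set(pages[url]).forEach((w) => pageCount.set(w, (pageCount.get(w) ?? 0) + 1));
  });
  // A page's score is the number of its distinct words that appear on one page only.
  const counts: Record<string, number> = {};
  Object.keys(pages).forEach((url) => {
    counts[url] = Array.from(new Set(pages[url])).filter((w) => pageCount.get(w) === 1).length;
  });
  return counts;
}

// Two hypothetical documents on the same role axis, each with its own specifics.
const before = {
  "/nurse/contract": ["nurse", "contract", "overtime", "rota"],
  "/nurse/handbook": ["nurse", "handbook", "uniform", "parking"],
};

// The same (useful) regulation text appended to every page on that axis.
const regulation = ["registration", "licence", "renewal", "audit"];
const after = {
  "/nurse/contract": [...before["/nurse/contract"], ...regulation],
  "/nurse/handbook": [...before["/nurse/handbook"], ...regulation],
};

console.log(uniqueCounts(before)); // 3 page-unique words on each page
console.log(uniqueCounts(after)); // still 3 each: every regulation word is shared
```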
A page that fails
/api/stripe-vs-square and /api/stripe-vs-paypal on a fintech directory. Each is 900 words, comfortably past the thin-content floor. But the shared 'What is a payment API' intro, the identical feature glossary, and the same integration checklist mean each page carries only sixty-odd words that appear nowhere else: 11% of its distinct vocabulary is page-specific and 89% is shared. The rule fires an error: '/api/stripe-vs-square has only 64 page-unique words (min 100); 510 of its 574 distinct words also appear on other pages.'
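The percentages in this example follow directly from the reported split, and the same numbers give the gap to the default floor:

```latex
\begin{aligned}
\text{unique share} &= \frac{64}{574} \approx 0.111 \;(11\%) \\
\text{shared share} &= \frac{510}{574} \approx 0.889 \;(89\%) \\
\text{page-unique words still needed} &= 100 - 64 = 36
\end{aligned}
```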
A page that passes
The same two pages, rebuilt so each leads with provider-specific material: real Stripe Radar fraud-tooling detail on one, Square's in-person hardware fees on the other, each with its own code sample and pricing edge cases. The shared glossary moves to a linked reference page. Now each page carries two hundred-plus words that appear on no other URL, over 35% of its distinct vocabulary, which clears the 100-word floor with room to spare, and the rule passes.
How to fix it
1. Write a page-specific lead. The fastest 100 unique words are usually the opening paragraph: the one thing true of this entity and nothing else. Boilerplate intros are the first thing to cut.
2. Move shared blocks to a shared URL. A glossary, a methodology note, or a legal disclaimer that repeats across pages should live on one page the others link to, rather than being embedded everywhere, where it dilutes uniqueness.
3. Stop counting per-axis data as unique. Content repeated across pages on the same axis (a role's regulations across that role's documents) is useful but shared. Only text that exists on exactly one page moves the metric.
4. Bind distinct records, not shared ones. If two pages pull the same fields from your data source, they will share vocabulary; differentiate the records or merge the pages.
5. Read the shared/unique split the finding reports. It tells you exactly how many page-unique words you need to add and confirms that the problem is overlap, not length.
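Steps 4 and 5 can be combined in a small triage helper: given each page's distinct words, it reports the page-unique count, how many more page-exclusive words the page needs to reach the floor, and which siblings it overlaps with most. The `triage` function and its return shape are hypothetical and not part of pseolint; it assumes pages are already tokenised into sets.

```typescript
// Hypothetical triage helper; not part of pseolint.
// Assumes each page is already reduced to its set of distinct tokens.
interface Triage {
  unique: number; // target's distinct words found on no other page
  needed: number; // page-exclusive words still missing to reach the floor
  overlap: Array<[string, number]>; // [sibling URL, shared distinct words], largest first
}

function triage(
  targetUrl: string,
  wordsByPage: Map<string, Set<string>>,
  minUnique = 100,
): Triage {
  const target = wordsByPage.get(targetUrl);
  if (target === undefined) throw new Error(`unknown page: ${targetUrl}`);

  const sharedWords = new Set<string>(); // target words seen on any sibling
  const overlap: Array<[string, number]> = [];
  wordsByPage.forEach((words, url) => {
    if (url === targetUrl) return;
    let n = 0;
    target.forEach((w) => {
      if (words.has(w)) {
        n++;
        sharedWords.add(w);
      }
    });
    overlap.push([url, n]);
  });
  overlap.sort((a, b) => b[1] - a[1]); // biggest overlap first

  const unique = target.size - sharedWords.size;
  return { unique, needed: Math.max(0, minUnique - unique), overlap };
}
```

A single sibling at the top of `overlap` with a large count usually means both pages bind the same record fields (step 4); a large `needed` with little overlap anywhere points to a page that is short on distinct words overall.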
SpamBrain context
Originality has been the spine of Google's quality guidance for over a decade (the Search Quality Rater Guidelines have used 'no added value' as a Lowest-quality marker since 2014), but the March 5, 2024 scaled-content-abuse update made it enforceable at scale: the policy names 'large amounts of unoriginal content that provides little to no value to users' as a violation, 'no matter how it's created'.
content/unique-value (in @pseolint/core, MIT-licensed at github.com/ouranos-labs/pseolint) is pseolint's most direct measure of that clause. Where spam/thin-content counts total substantive words and spam/boilerplate-ratio measures shared sentence blocks, this rule counts the page-exclusive vocabulary that survives comparison against every other page in the audit. It is an integrity-category error, not a warning, because a page below the 100-word floor is by definition contributing almost nothing the rest of the site does not already say — which is precisely the condition Google's deduplication and quality systems are built to demote.
Frequently asked questions
- How is this different from the thin-content rule?
- spam/thin-content counts total substantive words and fires below 300; content/unique-value counts only the words that appear on no other page and fires below 100. A page can pass thin-content with 1,000 words and still fail unique-value if 950 of those words are boilerplate shared with its siblings. Length and originality are different axes — this rule measures the second one.
- Does useful, accurate data count toward uniqueness?
- Only if it appears on exactly one page. This trips up pSEO teams constantly: a regulation, spec, or statistic that is genuinely useful but repeats across every page on the same axis is 'shared' and counts for nothing here. The metric moves only on page-specific text. The fix is not to remove the shared data but to add material that exists nowhere else on the site.
- Why count distinct words rather than total words?
- Because repetition should not be rewarded. Counting distinct tokens, then keeping only those that appear on exactly one page in the audit, separates the genuinely page-specific vocabulary from filler. Saying 'San Francisco' fifty times yields two distinct tokens ('san' and 'francisco'), not a hundred, which is the right behaviour for a rule trying to measure how much new information a page actually contributes.
- Will the count change as I add or remove pages?
- Yes — uniqueness is relative to the audited set. A word that is unique today becomes shared the moment a second page uses it, so adding near-identical pages can lower the unique count on pages that previously passed. That is intentional: it reflects how Google evaluates your site as a whole, not each URL in isolation.
- I sell telescopes — every page repeats the same optics glossary. Does that count?
- The glossary does not, but the instrument's own numbers do. A refractor page stating its 102-millimetre aperture, its 660-millimetre focal length, the supplied 25-millimetre eyepiece, and the dovetail mount it ships on carries vocabulary no sibling listing repeats. A computerised go-to altazimuth mount and a manual equatorial tripod differentiate two products that would otherwise read alike. Move the shared 'what is magnification' explainer to one reference URL, and each telescope's distinct aperture, focal ratio, and eyepiece kit becomes the page-unique substance the rule counts. The same logic holds beyond telescopes: a heritage-orchard nursery that lists the rootstock, the chill-hours requirement, and the pollination group for each apple cultivar gives every page words no sibling listing repeats.
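The answers on distinct counting and on audit-set size can be checked with a toy example. This sketch is hypothetical, not pseolint's code, and assumes a bare lower-case whitespace split instead of the rule's punctuation stripping.

```typescript
// Toy demo; tokenisation here is a bare lower-case whitespace split (an
// assumption, simpler than the rule's punctuation stripping).
function distinctWords(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\s+/).filter((w) => w.length > 0));
}

/** Distinct words on `url` that appear on no other page in `pages`. */
function pageUniqueCount(url: string, pages: Map<string, string>): number {
  const target = distinctWords(pages.get(url) ?? "");
  const elsewhere = new Set<string>();
  pages.forEach((text, other) => {
    if (other !== url) distinctWords(text).forEach((w) => elsewhere.add(w));
  });
  let count = 0;
  target.forEach((w) => {
    if (!elsewhere.has(w)) count++;
  });
  return count;
}

// Repetition adds nothing: fifty "San Francisco"s are two distinct tokens.
const site = new Map<string, string>([
  ["/sf", "San Francisco ".repeat(50) + "cable car"],
  ["/la", "Los Angeles freeway"],
]);
console.log(pageUniqueCount("/sf", site)); // 4: san, francisco, cable, car

// Uniqueness is relative: a near-copy turns all of /sf's words into shared ones.
site.set("/sf-copy", "San Francisco cable car");
console.log(pageUniqueCount("/sf", site)); // 0
```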
Related rules
- spam/thin-content: Thin Content Detection. Google's March 2024 core update, announced on March 5, 2024 alongside the scaled-content-abuse policy, folded the Helpful Content System (launched August 25, 2022) into core ranking, and Google later reported 45% less low-quality, unoriginal content in Search. The spam/thin-content rule flags every URL under 300 words of substantive body text (default), after stripping nav and footer chrome with readability-style extraction heuristics.
- spam/boilerplate-ratio: Boilerplate Ratio. 60% is the default boilerplateMaxRatio: pseolint identifies sentence-level blocks appearing on 80%+ of pages, then flags any URL where those repeated blocks account for more than that share of its words (warning severity, weight 12).
- spam/near-duplicate: Near-Duplicate Pages. 85% SimHash similarity is the pseolint default threshold, and every page pair at or above it is flagged. SimHash comes from Charikar's 2002 similarity-estimation paper; Google researchers applied it to near-duplicate detection for web crawling in 2007.
Want to know whether this rule actually fires on your site?
Run pseolint against your sitemap. The audit is free, takes about a minute, and returns a per-URL list of every rule that fired — including this one — with the exact metric values so you can prioritise the fix queue.