Site Reputation Abuse — Detecting Parasite Sections on a Trusted Host
Google's May 7, 2024 site-reputation-abuse policy demotes subfolders that borrow a host's reputation without earning it. The links/host-section-divergence rule flags a URL section (e.g. /coupons/, /deals/) only when it diverges from the rest of the host on at least 2 of 4 independent structural signals. It deliberately fires on the minority section, never on a balanced multi-topic split.
What it detects
The rule groups every crawled URL by its first path segment (/coupons/, /reviews/, /best/). It then tests each section that holds at least 10 pages while leaving at least 10 pages in the rest of the host. It only considers sections that are a strict minority of the corpus (under 50%). Reputation abuse is, by definition, a small parasite section riding a larger host, so a 50/50 split is read as a multi-topic site and skipped. For each qualifying section it measures four signals against the rest of the host:
- Inbound-link integration: the fraction of section pages that receive at least one internal link from outside the section. Flagged when under 0.20 (the section is an island the host barely references).
- Topic divergence: the Jaccard distance between the top-100 TF-IDF terms of the section and those of the rest of the host. Flagged above 0.75 (under ~25% vocabulary overlap).
- Template isolation: the fraction of section pages whose structureSignature also appears anywhere else on the host. Flagged when under 0.10 (the section ships its own template, which the host never uses).
- Authorship mismatch: flagged when section and host byline coverage differ by at least 0.40 and one pool is mostly anonymous (≤0.30) while the other is mostly bylined (≥0.70).
A section that trips 2 or more signals emits a warning naming the section, the signal values, and a 20-URL sample. A section that trips 3 or more and holds over 50 pages escalates to error. The rule reasons about structure, not contracts. It cannot read a revenue-share agreement or see a manual action, only the structural fingerprint those arrangements leave behind.
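The grouping and eligibility gates can be sketched as follows. This is a minimal sketch under stated assumptions, not pseolint's implementation. The section key is taken to be the first path segment, and the input is a flat list of crawled URLs or paths. `sectionPrefix`, `eligibleSections`, `SectionCandidate` and the constants are hypothetical names, not the package's exported API.

```typescript
// Hypothetical sketch of the section grouping and eligibility gates.
// Assumptions: section key = first path segment; input = crawled URLs or paths.

interface SectionCandidate {
  prefix: string;    // e.g. "/coupons/"
  pageCount: number; // pages inside the section
  restCount: number; // pages on the rest of the host
  share: number;     // pageCount / total pages
}

const MIN_SECTION_PAGES = 10; // floor for the section itself
const MIN_REST_PAGES = 10;    // floor for the rest of the host

// "https://example.com/coupons/deal-1?ref=x" -> "/coupons/"; the root page has no section.
function sectionPrefix(url: string): string | null {
  const path = url.replace(/^[a-z][a-z0-9+.-]*:\/\/[^/]+/i, "").split(/[?#]/)[0];
  const segments = path.split("/").filter((s) => s.length > 0);
  return segments.length === 0 ? null : `/${segments[0]}/`;
}

function eligibleSections(urls: string[]): SectionCandidate[] {
  // Count pages per first-segment prefix.
  const counts = new Map<string, number>();
  for (const url of urls) {
    const prefix = sectionPrefix(url);
    if (prefix !== null) counts.set(prefix, (counts.get(prefix) ?? 0) + 1);
  }
  const total = urls.length;
  const candidates: SectionCandidate[] = [];
  counts.forEach((pageCount, prefix) => {
    const restCount = total - pageCount;
    const share = pageCount / total;
    // Both page floors must hold, and the section must be a strict minority.
    if (pageCount >= MIN_SECTION_PAGES && restCount >= MIN_REST_PAGES && share < 0.5) {
      candidates.push({ prefix, pageCount, restCount, share });
    }
  });
  return candidates;
}
```

With 1,200 /news/ URLs and 180 /coupons/ URLs, only /coupons/ qualifies, because /news/ is the majority. In an even 20/20 split between two prefixes, neither side passes the strict `share < 0.5` test.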
Why it matters
Site reputation abuse — colloquially 'parasite SEO' — became an explicit Google spam policy on May 7, 2024 (https://developers.google.com/search/docs/essentials/spam-policies#site-reputation-abuse), and unlike most quality signals it is enforced partly by hand: affected domains receive a 'Third-party content abuse' manual action in Search Console with a defined reconsideration path. The policy targets a specific asymmetry — a high-authority host lends its reputation to a section of content that was produced by or for a third party with minimal first-party editorial involvement, so the section ranks on borrowed trust rather than its own. The classic shapes are a /coupons/ or /deals/ subfolder run by a syndication partner under a newspaper's domain, a vendor-generated /locations/ template on a directory site, or a sponsored /best/ directory with no real editorial review. Enforcement is surgical: field reports after the May 2024 and November 5, 2024 waves show 70% to 100% traffic loss confined to the offending subfolder while the rest of the domain is untouched. The four signals this rule reads are the same structural tells a reviewer looks for — is the section cross-linked from the host's own navigation, does it talk about the same things, was it built with the host's design system, and is it signed by the same people. None of those is conclusive alone, which is why the rule requires at least two to agree before it says anything.
A page that fails
A regional news domain with 1,200 editorial articles and a 180-page /coupons/ section supplied by an affiliate network. The coupon pages receive almost no inbound links from the newsroom's own pages (inbound-integration 0.06), share under 20% of their vocabulary with the news content (topic-divergence 0.81), render from a template the rest of the site never uses (template-isolation 0.04), and carry no bylines while the editorial side is 90% bylined (authorship mismatch: 0.00 vs 0.90). All four signals trip and the section holds more than 50 pages, so the rule fires at error severity — the structural signature of exactly the arrangement the May 2024 policy was written to catch.
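The topic-divergence value in this example is a Jaccard distance. A value of 0.81 therefore means the two top-term sets share 19% of their combined (union) vocabulary.

Below is a minimal sketch of that measurement under stated assumptions, not pseolint's implementation:
- a naive lowercase tokenizer;
- a smoothed IDF computed over every page on the host;
- term frequency summed across each pool.

pseolint's actual tokenization and weighting are not documented here. `topTfidfTerms` and `jaccardDistance` are hypothetical names.

```typescript
// Hypothetical sketch of the topic-divergence signal: top-k TF-IDF terms per
// pool, then Jaccard distance between the two term sets.
// Assumptions: naive tokenizer, smoothed IDF over the whole host, summed TF.

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function topTfidfTerms(pool: string[], allDocs: string[], k = 100): Set<string> {
  // Document frequency: how many host pages contain each term at least once.
  const df = new Map<string, number>();
  for (const doc of allDocs) {
    new Set(tokenize(doc)).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));
  }
  // Term frequency summed over the pool's pages.
  const tf = new Map<string, number>();
  for (const doc of pool) {
    for (const t of tokenize(doc)) tf.set(t, (tf.get(t) ?? 0) + 1);
  }
  const n = allDocs.length;
  const scored: Array<[string, number]> = [];
  tf.forEach((count, term) => {
    const idf = Math.log((1 + n) / (1 + (df.get(term) ?? 0))) + 1; // smoothed IDF
    scored.push([term, count * idf]);
  });
  // Highest score first; ties broken alphabetically so the result is stable.
  scored.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
  return new Set(scored.slice(0, k).map(([term]) => term));
}

// 1 - |A ∩ B| / |A ∪ B|; 0 means identical sets, 1 means no overlap.
function jaccardDistance(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0; // two empty sets: treat as identical
  let intersection = 0;
  a.forEach((t) => {
    if (b.has(t)) intersection++;
  });
  const union = a.size + b.size - intersection;
  return 1 - intersection / union;
}

const TOPIC_DIVERGENCE_THRESHOLD = 0.75; // flagged when the distance exceeds this
```

Comparing sets of top terms, rather than full frequency vectors, keeps the measure insensitive to how much each pool publishes. A 180-page section and a 1,200-page newsroom each contribute at most k terms.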
A page that passes
The same news domain, but the /reviews/ section is produced in-house: every review is linked from the relevant news category, written by named staff who also write the news, and built with the site's standard article template. Inbound integration is 0.74, topic vocabulary overlaps the host's coverage (topic-divergence 0.38), the template is shared (template-isolation 0.61), and byline coverage matches the rest of the host. Zero signals trip. The section is a genuine part of the publication, not a parasite riding its authority — and the rule stays silent, because structural integration is exactly what the policy asks for.
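In the passing example, a template-isolation value of 0.61 means that 61% of /reviews/ pages share a structure signature with at least one page elsewhere on the host. The sketch below computes that ratio. It assumes each page already carries an opaque `structureSignature` string; how the signature is derived from the HTML is outside its scope. `PageFingerprint`, `templateSharedFraction` and `templateIsolationTrips` are hypothetical names.

```typescript
// Hypothetical sketch of the template-isolation signal.
// Assumption: structureSignature is a precomputed, opaque string per page.

interface PageFingerprint {
  url: string;
  structureSignature: string;
}

const TEMPLATE_ISOLATION_THRESHOLD = 0.1; // flagged when the shared fraction is below this

// Fraction of section pages whose signature also appears on the rest of the host.
function templateSharedFraction(section: PageFingerprint[], rest: PageFingerprint[]): number {
  if (section.length === 0) return 0;
  const hostSignatures = new Set(rest.map((p) => p.structureSignature));
  const shared = section.filter((p) => hostSignatures.has(p.structureSignature)).length;
  return shared / section.length;
}

function templateIsolationTrips(section: PageFingerprint[], rest: PageFingerprint[]): boolean {
  return templateSharedFraction(section, rest) < TEMPLATE_ISOLATION_THRESHOLD;
}
```

A section built on the host's standard article template scores high even if a few pages use a special layout. A section rendered entirely from a partner's own template scores 0 and trips the signal.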
How to fix it
1. Decide per section whether you actually own it editorially. If a third party produces the content with minimal first-party review, the honest fixes are to integrate it properly or to move it off the host, not to game the four signals.
2. Integrate, option A: cross-link the section from your primary navigation and from topically related host pages so it stops reading as an island. Low inbound integration is the cheapest signal to flip and often the most diagnostic.
3. Integrate, option B: share authorship and schema. Put real, named reviewers on the pages who actually vet them, and align the section's template with the rest of the host so it isn't a structurally foreign body.
4. Separate, the clean alternative: move the section to a subdomain or a partner-owned domain and 301 the old URLs. It stops borrowing your reputation, which is the point of the enforcement, and stops being a liability.
5. Do not try to defeat the rule by sprinkling a few host links into the section while leaving it editorially third-party. The policy is about substance, not surface signals. A reviewer applies the same "would a reasonable user see this as the host's own content" test that the rule only approximates.
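Step 2 targets the inbound-integration signal. The sketch below shows how such a fraction could be computed, under two assumptions:
- the crawl yields a list of internal links already resolved to same-host paths;
- section membership is a plain path-prefix test.

`InternalLink` and `inboundIntegration` are hypothetical names, not pseolint's API.

```typescript
// Hypothetical sketch of the inbound-link integration signal.
// Assumptions: links are resolved same-host paths; membership = path prefix.

interface InternalLink {
  from: string; // source path, e.g. "/news/election-results"
  to: string;   // target path, e.g. "/coupons/grocery-deals"
}

const INBOUND_INTEGRATION_THRESHOLD = 0.2; // flagged when below this

// Fraction of section pages linked to by at least one page outside the section.
function inboundIntegration(sectionPages: string[], prefix: string, links: InternalLink[]): number {
  if (sectionPages.length === 0) return 0;
  const reachedFromHost = new Set<string>();
  for (const link of links) {
    const targetInSection = link.to.startsWith(prefix);
    const sourceOutside = !link.from.startsWith(prefix);
    // Links between two section pages are ignored; only host-to-section links count.
    if (targetInSection && sourceOutside) reachedFromHost.add(link.to);
  }
  const integrated = sectionPages.filter((p) => reachedFromHost.has(p)).length;
  return integrated / sectionPages.length;
}
```

Cross-linking within /coupons/ cannot raise this value; only links from the host's own pages do. The measure also counts pages reached, not the number of links. A single navigation hub that links to every section page would therefore lift it to 1.0.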
SpamBrain context
Site reputation abuse was announced in the March 5, 2024 spam-policy update and took effect on May 7, 2024 (https://developers.google.com/search/docs/essentials/spam-policies#site-reputation-abuse), closing a loophole that the scaled-content-abuse and doorway policies left open: content that is individually passable but exists only to monetise a host's accumulated authority. Google has been explicit that the arrangement, not the topic, is what's penalised — a disclosed but otherwise-passive partnership is still in scope. This rule (shipped in @pseolint/core v0.5.1, MIT-licensed at github.com/ouranos-labs/pseolint) is the structural complement to the spam/* family: where spam/doorway-pattern and spam/near-duplicate look within a template for duplication, links/host-section-divergence looks across a host for a section that doesn't belong to it. It is deliberately conservative — the minority gate, the dual 10-page floors, and the 2-of-4 threshold exist to avoid crying abuse on legitimate multi-topic sites — and it scores at the engine's default rule weight rather than a hand-tuned spam weight, so treat a finding as a prompt to audit the arrangement, not as a verdict that you have been penalised. What it cannot do is read intent: it sees an unintegrated, off-topic, separately-templated, unsigned section and tells you it looks like parasite content. Whether it is depends on facts only you and your contracts hold.
Frequently asked questions
- Does this rule detect that content is literally 'third-party'?
- No, and it doesn't claim to. It has no way to read a revenue-share contract or know who authored a page. It measures four structural proxies — inbound integration, topic overlap, template sharing, and byline coverage — that genuine first-party sections tend to satisfy and parasite sections tend to fail. A finding means the section looks structurally like the pattern Google's policy targets; confirming it requires looking at the actual arrangement.
- Why does it only fire on the smaller section, not both halves of a split?
- By design. The rule requires the divergent section to be a strict minority of the corpus (under 50%). Site reputation abuse is a small section riding a large host's reputation; a roughly even split between two topics is a multi-topic site, not abuse. The minority gate is what stops the rule from emitting a symmetric, useless finding on both halves of a 50/50 site.
- How many signals have to trip before it warns?
- At least 2 of the 4. One signal alone is too noisy — plenty of legitimate sections are lightly cross-linked or use a distinct template. Requiring two independent signals to agree keeps the false-positive rate low. Three or more on a section larger than 50 pages escalates the finding from warning to error.
- I run a genuine in-house section that still trips this. What now?
- Look at which signals fired. If it's inbound integration, your section is under-linked from the rest of the site — usually worth fixing for users regardless. If it's authorship, add real bylines. If it's topic and template divergence on content that's legitimately yours, the rule is a false positive on your content type; document the decision and ignore the warning. The rule surfaces a structural pattern; it doesn't assume your intent.
- Will Google penalise the whole domain or just the subfolder?
- Site reputation abuse enforcement is characteristically subfolder-scoped — the affected section loses ranking while the rest of the domain is left intact, which is precisely the asymmetry the policy is designed to remove. That's also why this rule reports at the section level and names the specific prefix, rather than scoring the whole site down.
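The decision logic in these answers, the 2-of-4 gate plus escalation, can be sketched with the thresholds stated on this page. The sketch assumes the four signal values have already been computed for a section that passed the eligibility gates. `SectionSignals`, `authorshipMismatch`, `tripped` and `severity` are hypothetical names, not pseolint's API.

```typescript
// Hypothetical sketch of the 2-of-4 gate and severity escalation, using the
// thresholds documented on this page. Signal values are assumed precomputed.

interface SectionSignals {
  inboundIntegration: number; // fraction of section pages linked from outside
  topicDivergence: number;    // Jaccard distance of top-term sets
  templateShared: number;     // fraction of section pages sharing a host template
  sectionBylined: number;     // byline coverage inside the section (0..1)
  hostBylined: number;        // byline coverage on the rest of the host (0..1)
}

type Severity = "none" | "warning" | "error";

function authorshipMismatch(section: number, host: number): boolean {
  const anonymousVsBylined = (x: number, y: number): boolean => x <= 0.3 && y >= 0.7;
  return Math.abs(section - host) >= 0.4 &&
    (anonymousVsBylined(section, host) || anonymousVsBylined(host, section));
}

// Names of the signals that trip for one section.
function tripped(s: SectionSignals): string[] {
  const hits: string[] = [];
  if (s.inboundIntegration < 0.2) hits.push("inbound-integration");
  if (s.topicDivergence > 0.75) hits.push("topic-divergence");
  if (s.templateShared < 0.1) hits.push("template-isolation");
  if (authorshipMismatch(s.sectionBylined, s.hostBylined)) hits.push("authorship");
  return hits;
}

function severity(s: SectionSignals, pageCount: number): Severity {
  const n = tripped(s).length;
  if (n >= 3 && pageCount > 50) return "error"; // 3+ signals on a large section
  if (n >= 2) return "warning";                 // at least 2 of 4 must agree
  return "none";
}
```

Consider the failing /coupons/ example: inbound 0.06, topic 0.81, template 0.04, bylines 0.00 vs 0.90, 180 pages. All four signals trip and `severity` returns `"error"`. The same values on a 40-page section would stay at `"warning"`.

With these bounds, the explicit 0.40 difference check never decides the outcome. A pool at ≤0.30 and another at ≥0.70 already differ by at least 0.40, so the check is kept only to mirror the prose.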
Related rules
- spam/doorway-pattern (Doorway Pages): Google has banned doorway pages since the March 16, 2015 Search Central post. pseolint's spam/doorway-pattern rule mirrors SpamBrain's convergence logic by requiring 3 independent signals to stack (SimHash near-duplicate above 0.85, entity-swap, and structural confirmation) before firing at error severity (weight 25). It is the highest-confidence spam pattern reported by @pseolint/core v0.4.3.
- spam/template-diversity (Template Diversity): 30% is the default minUniqueRatio threshold. pseolint warns when fewer than 30% of pages carry a structurally distinct HTML skeleton, the floor at which SpamBrain (rebuilt August 25, 2022) starts reading a domain as one template rather than N designed pages.
- spam/boilerplate-ratio (Boilerplate Ratio): 60% is the default boilerplateMaxRatio. pseolint identifies sentence-level blocks appearing on 80%+ of pages, then flags any URL whose word count is dominated by those repeated blocks (warning severity, weight 12).
Want to know whether this rule actually fires on your site?
Run pseolint against your sitemap. The audit is free, takes about a minute, and returns a per-URL list of every rule that fired — including this one — with the exact metric values so you can prioritise the fix queue.