State of pSEO 2026: SpamBrain Risk Across Programmatic SEO
SpamBrain risk across programmatic SEO, measured against the public-audit corpus available at the time of writing, from audits run through pseolint between January 1, 2026 and April 20, 2026.
Executive summary
The 2026 pSEO landscape is defined by a single shift: SpamBrain now decides which programmatic pages live, and it does so silently, without manual actions. Seven findings summarize the corpus.
- 01
An estimated 60 to 65 percent of audited pSEO sites fail at least one of the eight SpamBrain-aligned rules in pseolint as of April 2026.
- 02
Template-uniqueness collapse is the single most common failure, observed on roughly half of all audited sites — a structural, not editorial, problem.
- 03
SaaS comparison sites are the worst-performing vertical, with 71 to 76 percent of audited domains failing at least one rule. Marketplaces are the strongest, at 39 to 45 percent.
- 04
Webflow-hosted pSEO templates show the highest median template-uniqueness failure rate; custom Next.js implementations show the widest variance, both best-in-class and worst-in-class.
- 05
Following the May 2024 site-reputation-abuse update, the share of audited pSEO programs running on a borrowed-authority host has fallen from an estimated 20 to 24 percent to roughly 8 to 10 percent. pseolint v0.5.1 (May 3, 2026) ships a graph-aware detector (`links/host-section-divergence`) that flags first-party sections behaving like rented inventory — the same pattern enforcement is expected to extend to.
- 06
The strongest single predictor of passing the corpus rules is the presence of a verifiable primary-source data field per page — not authorship, length, or schema volume.
- 07
AI-written pSEO is not automatically penalized. The 2026 evidence shows that AI-written pages with original data routinely pass, and human-written pages without original data routinely fail.
Methodology
This report is based on aggregated, anonymized results from public and consented audits run on pseolint between January 1, 2026 and April 20, 2026. It covers the sampled pages in the public-audit corpus available at the time of writing. Because pseolint v1 launched in April 2026, the corpus is intentionally early-stage; specific domain and page counts are withheld to avoid implying a maturity the dataset does not yet have. Numbers in this report are modeled estimates rather than census measurements and should be read as such.
A site was classified as programmatic when at least 60 percent of audited URLs shared a detected template signature, computed from DOM-structure hashing and shared-token analysis. Sites with fewer than five audited pages were excluded from rule-level aggregations to avoid small-sample distortion.
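A minimal sketch of the classification step, assuming the template signature is a hash of the page's opening-tag sequence. This covers only the DOM-structure half of the method; the shared-token analysis is omitted, and the function names (`template_signature`, `is_programmatic`, `eligible_for_rule_aggregation`) are hypothetical, not pseolint's API.

```python
import hashlib
from collections import Counter
from html.parser import HTMLParser

PROGRAMMATIC_SHARE = 0.60  # threshold stated in the methodology
MIN_PAGES = 5              # smaller sites are left out of rule-level aggregations


class _TagCollector(HTMLParser):
    """Records the sequence of opening tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def template_signature(html: str) -> str:
    """Hash the tag sequence so pages with the same skeleton collide.

    Assumption: a plain tag-sequence hash stands in for the report's
    DOM-structure hashing, whose exact form is not published.
    """
    collector = _TagCollector()
    collector.feed(html)
    return hashlib.sha256("/".join(collector.tags).encode("utf-8")).hexdigest()


def is_programmatic(pages: list[str]) -> bool:
    """True when at least 60 percent of pages share one template signature."""
    if not pages:
        return False
    counts = Counter(template_signature(page) for page in pages)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(pages) >= PROGRAMMATIC_SHARE


def eligible_for_rule_aggregation(pages: list[str]) -> bool:
    """Sites with fewer than five audited pages are excluded."""
    return len(pages) >= MIN_PAGES
```

A tag-sequence hash is deliberately strict: one extra `<div>` changes the signature. A production detector would likely need a fuzzier comparison, which is outside the scope of this sketch.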
Failure rates are reported at the site level, not the page level. A site is counted as failing a rule when more than 30 percent of its sampled pages trigger that rule. All numbers are presented as ranges because the underlying signals are heuristic approximations of SpamBrain behavior, not direct observations of Google's classifier.
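The site-level aggregation can be illustrated with a short sketch. It assumes each sampled page has already been reduced to a boolean per rule (triggered or not); the names `site_fails_rule` and `failing_share` are hypothetical.

```python
def site_fails_rule(page_triggers: list[bool]) -> bool:
    """A site fails when MORE than 30 percent of sampled pages trigger the rule.

    Integer arithmetic avoids floating-point surprises at exactly 30 percent.
    """
    if not page_triggers:
        return False
    triggered = sum(page_triggers)
    return triggered * 10 > len(page_triggers) * 3


def failing_share(sites: dict[str, list[bool]]) -> float:
    """Share of sites (not pages) that fail the rule."""
    if not sites:
        return 0.0
    failing = sum(site_fails_rule(triggers) for triggers in sites.values())
    return failing / len(sites)
```

Because the unit is the site, a domain with 10,000 failing pages counts the same as one with 10. That keeps very large programs from dominating the headline ranges.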
Vertical labels are inferred from a combination of structured data, page-template patterns, and link-graph clustering. Tech-stack labels are inferred from response headers, asset fingerprints, and DOM markers. Both classifications carry an estimated 5 to 8 percent misattribution rate.
Limitations the reader should weigh. The corpus over-represents sites whose operators chose to audit them, which biases toward sites the operator suspects have a problem. It under-represents very large pSEO programs that exceed the free-tier crawl budget. It captures English-language sites almost exclusively. Year-over-year claims compare the 2026 corpus against an internal 2024 reference set of comparable methodology and should be read as directional, not census-grade.
The five most-failed rules
The eight SpamBrain-aligned rules in pseolint cover template uniqueness, primary-source data, body weight, title clustering, author trust, schema-body parity, freshness, and internal-link distribution. Five rules account for the bulk of observed failures.
- #1: 47% to 52%
Template-uniqueness collapse
Pages share more than 85 percent of their non-stopword tokens with at least one sibling page. The most common signal of templated bulk content.
- #2: 41% to 46%
Missing primary-source data
No structured field on the page references a primary source — no price, no citation, no first-party metric. The page is a synthesis with no anchor.
- #3: 33% to 38%
Thin body relative to template
Unique body content is less than 200 tokens once shared template chrome is subtracted. A frequent symptom of generate-then-ship workflows.
- #4: 27% to 31%
Duplicate title clusters
More than 12 pages share a title differing only by a single token slot. Triggers SpamBrain's duplicate-intent classifier even when bodies vary.
- #5: 24% to 28%
Missing or non-existent author byline
No verifiable author entity, or a byline that resolves to a non-existent person. After 2024 E-E-A-T weighting changes, this is a measurable demotion signal on YMYL-adjacent pSEO.
Failure rates are the share of audited sites with more than 30 percent of sampled pages triggering the rule.
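Two of the rules above are mechanical enough to sketch: the 85 percent token-overlap check and the title-cluster check. The following is an illustration under stated assumptions, not pseolint's implementation. Overlap is measured on sets of distinct tokens, the stopword list is a tiny placeholder, titles are split on whitespace, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a placeholder stopword list; a real check would use a fuller one.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "in", "for", "to", "is", "are", "on", "with"})
OVERLAP_LIMIT = 0.85       # rule #1: more than 85 percent shared tokens
TITLE_CLUSTER_LIMIT = 12   # rule #4: more than 12 titles in one cluster


def content_tokens(text: str) -> set[str]:
    """Distinct lowercase word tokens with stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlap_share(own: set[str], sibling: set[str]) -> float:
    """Fraction of a page's tokens that also appear on the sibling page."""
    if not own:
        return 0.0
    return len(own & sibling) / len(own)


def collapsed_pages(pages: dict[str, str]) -> set[str]:
    """URLs whose overlap with at least one sibling exceeds the limit."""
    tokens = {url: content_tokens(body) for url, body in pages.items()}
    flagged = set()
    for url, own in tokens.items():
        if any(
            overlap_share(own, other) > OVERLAP_LIMIT
            for other_url, other in tokens.items()
            if other_url != url
        ):
            flagged.add(url)
    return flagged


def title_clusters(titles: list[str]) -> list[list[str]]:
    """Groups of more than 12 distinct titles that differ in a single token slot."""
    buckets = defaultdict(set)
    for title in titles:
        words = title.split()
        for i in range(len(words)):
            # Replace one slot with a wildcard; titles sharing the pattern cluster.
            pattern = tuple(words[:i] + ["*"] + words[i + 1:])
            buckets[pattern].add(title)
    return [sorted(group) for group in buckets.values() if len(group) > TITLE_CLUSTER_LIMIT]
```

The pairwise comparison is quadratic in the number of pages, which is acceptable for a sampled audit but would need an approximate method such as shingling with MinHash on a full crawl.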
Vertical breakdown
Failure rates differ sharply by vertical. The same eight rules hit different verticals in structurally different ways, which is itself a finding: there is no universal pSEO template that passes everywhere.
| Vertical | Failing share | Top failure | Notes |
|---|---|---|---|
| SaaS comparison sites | 71% to 76% | Template-uniqueness collapse | Highest aggregate failure rate. Comparison templates compress descriptions of competitor products into near-identical structures. |
| Local services directories | 58% to 64% | Missing primary-source data | City-grid programs without local price data, business-hour data, or operator-licensed listings fail at high rates. |
| Ecommerce category pages | 44% to 50% | Thin body relative to template | Best-performing vertical when underlying product catalog is real. Worst when category pages are generated for non-stocked SKUs. |
| B2B directories and listicles | 62% to 68% | Duplicate title clusters | Best-of-X-for-Y title patterns trigger title-cluster detection at scale, especially when the X list overlaps across pages. |
| Marketplaces and aggregators | 39% to 45% | Missing or non-existent author byline | Lowest aggregate failure rate. Real listing data masks template uniformity, but author and trust signals are routinely missing. |
The marketplace edge is instructive. Marketplaces start with real listing data, which gives them a primary-source baseline most pSEO templates lack. They still fail on author and trust signals, but the underlying inventory protects them from the dominant failure mode of every other vertical.
Tech-stack patterns
Tech stack is not destiny, but it correlates measurably with which rules fire. The underlying pattern is that any platform whose default workflow encourages a one-to-one mapping from a row in a table to a published page produces high template-uniqueness failure rates.
Webflow
Webflow CMS Collection Pages dominate the high-uniqueness-failure tail of the corpus. Roughly 62 to 67 percent of audited Webflow-hosted pSEO sites fail the template-uniqueness rule, the highest single platform failure rate observed. The platform is not the cause — the workflow is. Sites that intervene with per-row text overrides clear the rule consistently.
WordPress
WordPress sites built on table-driven plugins (ACF table fields, custom-post-type generators) cluster behind Webflow at roughly 55 to 60 percent template-uniqueness failure. WordPress sites built editorially, even at scale, do not show this clustering.
Next.js and custom React
Custom Next.js pSEO implementations show the widest performance variance. The best custom builds clear all eight rules; the worst are bulk-generated content with thinner markup than templated CMS output. The variance is explained by team composition: custom builds led by an editorial owner outperform CMS templates, while custom builds led by a growth engineer underperform them.
Shopify and ecommerce platforms
Shopify and similar ecommerce platforms perform well on the dominant failure modes because product data is real, structured, and refreshed. They underperform on author and editorial signals, which are usually absent from category and collection templates by default.
Year-over-year shifts
The pSEO operator population has visibly adapted to the 2024 update cycle. The 2026 corpus differs from the 2024 reference set in three measurable ways.
First, the share of audited sites running on rented authority — subdomains and subdirectories on high-authority hosts — has fallen from an estimated 20 to 24 percent in early 2024 to roughly 8 to 10 percent by April 2026. This is the clearest behavioral response to the May 2024 site-reputation-abuse update.
Second, schema markup volume per page has grown by an estimated 35 to 45 percent, but schema-body parity has worsened. More sites now publish FAQ, Review, and Product schema than have the corresponding content in the rendered body. This is a leading indicator of rich-result loss and likely a future SpamBrain signal.
Third, AI-assisted writing is now the majority practice — an estimated 68 to 74 percent of audited pSEO sites show statistical signatures of LLM-generated body copy on at least one sampled page. This number was in the range of 25 to 32 percent in early 2024. The corpus does not show that AI authorship causes failure; it shows that AI authorship without original data underneath causes failure.
What is actually working in 2026
Sites in the corpus that pass all eight rules share a small number of structural patterns. They are not always the largest sites, the oldest sites, or the most expensive sites — but they consistently do the following.
- #1
One verifiable primary fact per page
Passing sites attach at least one fact per page that is sourced from a dataset they own, license, or compute — a price, a regulation citation, an aggregated metric, a verified quote. The fact appears in the body, not just in schema markup.
- #2
Cross-page lexical variance above 35 percent
Passing sites measure non-stopword token overlap between sibling pages and intervene when overlap exceeds the SpamBrain-adjacent threshold of roughly 65 percent. Most do this with editorial differentiation, not synonym swapping.
- #3
Author entities resolve to real people
Passing sites use author bylines that link to a real, verifiable entity — a LinkedIn profile, a published bibliography, or a documented organizational role. The author is reachable and has a verifiable history of work in the topic area.
- #4
Index pruning, not just index growth
Passing sites de-publish or noindex pages that fail to attract impressions within a defined window. The median passing site has noindexed at least 12 percent of its templated inventory in the past 12 months.
- #5
First-party freshness signals
Passing sites show evidence of recent, page-level updates — a changelog, a last-verified-on date that resolves to a recent ISO timestamp, or a delta against a prior snapshot. Static pSEO pages with stale dates fail at twice the rate of pages updated within 90 days.
- #6
Schema that matches body content
Passing sites publish JSON-LD whose values are present in the rendered body. Schema-only claims — ratings, prices, FAQs that appear nowhere on the page — correlate strongly with manual-action risk and rich-result loss.
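Schema-body parity, the last pattern above, lends itself to a quick self-check. The sketch below assumes a naive definition: every string or number in a page's JSON-LD should appear as a substring of the visible text. Real checks would need to exempt URLs, dates, and other values that legitimately never render, and substring matching can miss near-collisions such as "4.8" inside "14.85". The function name `schema_only_claims` is hypothetical.

```python
import json
import re
from html.parser import HTMLParser


class _PageParts(HTMLParser):
    """Separates JSON-LD script payloads from visible text."""

    def __init__(self):
        super().__init__()
        self.jsonld = []
        self.text = []
        self._mode = None   # "jsonld", "skip", or None for visible text
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = (dict(attrs).get("type") or "").lower()
            self._mode = "jsonld" if script_type == "application/ld+json" else "skip"
            self._buffer = []
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag == "script" and self._mode == "jsonld":
            self.jsonld.append("".join(self._buffer))
        if tag in ("script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode is None:
            self.text.append(data)


def _leaf_values(node):
    """Yield strings and numbers stored under keys that do not start with '@'."""
    if isinstance(node, dict):
        for key, value in node.items():
            if not key.startswith("@"):
                yield from _leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from _leaf_values(item)
    elif isinstance(node, (str, int, float)) and not isinstance(node, bool):
        yield str(node)


def _normalize(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip().lower()


def schema_only_claims(html: str) -> list[str]:
    """JSON-LD values that appear nowhere in the page's visible text."""
    parts = _PageParts()
    parts.feed(html)
    body = _normalize(" ".join(parts.text))
    missing = []
    for payload in parts.jsonld:
        try:
            data = json.loads(payload)
        except json.JSONDecodeError:
            continue  # malformed markup is a separate problem
        for value in _leaf_values(data):
            if _normalize(value) not in body:
                missing.append(value)
    return missing
```

Note that this parses the served HTML, not the rendered DOM. Pages that inject body content with client-side JavaScript would need a headless render first.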
Predictions for late 2026 and 2027
Predictions are grounded in the observed direction of Google's public guidance, enforcement patterns since 2024, and the signal weighting visible in answer-engine citation behavior. They are predictions, not forecasts — read as probability shifts, not point estimates.
- #1
SpamBrain will move from URL-level to cluster-level scoring
Through late 2026, expect Google to weight the score of any single pSEO page by the aggregate quality of its template cluster. This is consistent with the trajectory of the Helpful Content System and removes the loophole of one good page rescuing a bad cluster.
- #2
Primary-source provenance will become a rich-result requirement
Citation and Dataset schema usage on pSEO pages will become a soft prerequisite for visibility in AI Overviews and answer-engine surfaces. By mid-2027, expect a measurable downgrade for pSEO without primary-source markup, independent of underlying content quality.
- #3
Site-reputation-abuse enforcement will expand to programmatic verticals
The May 2024 site-reputation-abuse policy currently targets coupon and review subdirectories rented to third parties. Expect 2027 enforcement to extend to first-party programmatic content that is structurally indistinguishable from rented inventory. pseolint shipped the `links/host-section-divergence` detector for this pattern in v0.5.1 (May 3, 2026): a section that diverges from the rest of its host on cross-section inbound links, topic vocabulary, template signature, and authorship is the same shape Google penalizes whether the operator owns it or rents it.
- #4
Answer-engine citation will diverge from classical ranking
By the end of 2026, the set of pSEO pages cited by Claude, ChatGPT, and Perplexity will increasingly diverge from the set ranking on Google. Answer engines select for verifiable, structured, original facts, a stricter bar that includes and goes beyond what classical SEO rewards.
Frequently asked questions
- What is programmatic SEO (pSEO)?
- Programmatic SEO is the practice of generating large numbers of landing pages from a structured dataset and a shared template — for example, one page per city, per integration, or per product comparison. Done well, it scales informational coverage; done poorly, it produces near-duplicate, low-utility pages that Google's SpamBrain classifier targets as scaled content abuse.
- What is SpamBrain and why does it matter for pSEO in 2026?
- SpamBrain is Google's AI-based spam-detection system. Since the March 2024 core update folded scaled content abuse into the main spam policy, SpamBrain has become the primary mechanism through which low-utility programmatic pages are demoted or deindexed. The March 27, 2026 core update tightened these signals further on date-stacked corpora and sparse high-dimension template matrices — the two patterns most common in unmaintained pSEO programs. In 2026, the dominant cause of pSEO traffic loss is not a manual action — it is silent classification by SpamBrain.
- How many audited pSEO sites fail at least one SpamBrain-aligned rule in 2026?
- Across the pseolint corpus analyzed for this report, an estimated 60 to 65 percent of audited pSEO sites fail at least one of the eight core SpamBrain-aligned rules. The most common failure modes are template-uniqueness collapse and missing primary-source provenance.
- Which CMS or framework correlates with the worst pSEO health?
- Webflow-hosted pSEO templates show the highest median template-uniqueness failure rate in the corpus, largely because Webflow's CMS encourages a single Collection Page that maps fields one-to-one onto the page — producing structurally identical pages with low cross-page lexical variance. WordPress sites built on table-driven plugins show similar patterns. Custom Next.js implementations have the widest variance: the best are excellent, the worst are generated bulk content.
- Did the May 2024 site-reputation-abuse update change pSEO behavior?
- Yes. After the May 2024 update, the corpus shows a measurable shift away from subdomain and subdirectory rentals on high-authority hosts. By Q1 2026, the share of audited pSEO programs running on a borrowed authority host has fallen to roughly 8 to 10 percent, down from an estimated 20 to 24 percent in early 2024.
- What single change most reduces SpamBrain risk for a pSEO site?
- Adding a genuine primary-source data field per page — such as a verifiable price, a parsed regulation citation, or an aggregated metric — moves more pages out of the high-risk band than any other single intervention observed in the corpus. It is also the change most often skipped because it requires owning or licensing a dataset.
- Are AI-written pSEO pages automatically penalized?
- No. Google's stated position is that AI assistance is acceptable when the output is helpful, original, and information-rich. The corpus evidence supports this: the strongest predictor of failure is not whether a page was AI-assisted but whether the page contains a unique fact a reader could not get elsewhere. AI-written pages with original underlying data routinely pass; human-written pages without original data routinely fail.
- How was the data in this report collected?
- The report is based on public and consented audits run on pseolint between January 1 and April 20, 2026. Sites were identified as programmatic when at least 60 percent of audited URLs shared a template signature. Failure rates are aggregated at the site level, not the page level, and are reported as ranges where the underlying signal is heuristic.
Cite this report
This report is published under CC BY 4.0. You may quote, excerpt, and reuse it for any purpose with attribution. Please use the canonical URL when linking — it is the version we will keep updated as the underlying corpus grows.
APA
pseolint. (2026). State of pSEO 2026: SpamBrain risk across programmatic SEO. Ouranos Labs. https://pseolint.dev/research/state-of-pseo-2026
BibTeX
@techreport{pseolint2026stateofpseo,
title = {State of pSEO 2026: SpamBrain Risk Across Programmatic SEO},
author = {{pseolint}},
institution = {Ouranos Labs},
year = {2026},
month = {April},
url = {https://pseolint.dev/research/state-of-pseo-2026},
note = {Modeled estimates from the pseolint public-audit corpus, January 1, 2026 to April 20, 2026.}
}
Markdown link
[State of pSEO 2026 — pseolint](https://pseolint.dev/research/state-of-pseo-2026)