Doorway pages checker for programmatic SEO
Spot doorway-shaped clusters on your site — repeated templates with one variable swapped — that Google's doorway policy explicitly targets.
What it does
The doorway-page detector identifies clusters of pages on your site that match the structural definition in Google's doorway policy (https://developers.google.com/search/docs/essentials/spam-policies#doorway-pages): pages that exist primarily to rank for query variants and funnel users to the same destination, distinguished only by a swapped city, service modifier, or product noun. It crawls your sitemap (up to 200 pages on the free tier, up to 500 per manual re-audit on Pro at $19/month), groups URLs by template, and compares the rendered content within each cluster using 64-bit SimHash signatures with a 0.85 near-duplicate threshold plus an entity-swap detector. The median run takes 60 seconds and is powered by the MIT-licensed pseolint engine (v0.4.3). You get a list of doorway-shaped clusters ranked by risk, with the option to see which pages are likely to survive doorway-policy enforcement and which are not.
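As a rough illustration of the near-duplicate comparison, the sketch below computes a 64-bit SimHash over word shingles and treats two pages as near-duplicates when at least 85% of their fingerprint bits agree. It is not pseolint's code: the function names, the 3-word shingle size, the BLAKE2b hash, and the reading of the 0.85 threshold as a bit-agreement ratio are all assumptions made for the example.

```python
import hashlib
import re


def simhash64(text, shingle_size=3):
    """Return a 64-bit SimHash fingerprint of the text (illustrative only)."""
    tokens = re.findall(r"\w+", text.lower())
    # Overlapping word shingles; short texts collapse to a single shingle.
    count = max(1, len(tokens) - shingle_size + 1)
    shingles = [" ".join(tokens[i:i + shingle_size]) for i in range(count)]

    weights = [0] * 64
    for shingle in shingles:
        digest = hashlib.blake2b(shingle.encode("utf-8"), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        # Each shingle votes +1 or -1 on every bit position.
        for bit in range(64):
            weights[bit] += 1 if (value >> bit) & 1 else -1

    fingerprint = 0
    for bit in range(64):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def similarity(fingerprint_a, fingerprint_b):
    """Share of the 64 bits on which two fingerprints agree (0.0 to 1.0)."""
    differing_bits = bin(fingerprint_a ^ fingerprint_b).count("1")
    return 1 - differing_bits / 64


def is_near_duplicate(text_a, text_b, threshold=0.85):
    """Assumed interpretation: near-duplicate when bit agreement >= threshold."""
    return similarity(simhash64(text_a), simhash64(text_b)) >= threshold
```

Under this reading, a 0.85 threshold tolerates a Hamming distance of at most 9 of the 64 bits. A production engine would also strip template chrome before hashing, as described in the steps below.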
Why it matters
Google's doorway pages policy (https://developers.google.com/search/docs/essentials/spam-policies#doorway-pages) was spelled out in a March 16, 2015 post on the Webmaster Central blog (now Search Central) and has been on the books ever since. Enforcement accelerated as SpamBrain, Google's AI-based spam classifier first deployed in 2018, was upgraded over successive spam updates, and then again after the March 5, 2024 update added the scaled-content-abuse policy, which extended doorway-style demotions to AI-spun funnels. The site-reputation-abuse policy, announced in the same update and enforced from May 2024, then went after parasite hosting on otherwise reputable domains: high-profile manual actions hit subdomains operated by major media companies and affiliate networks within the first 30 days of enforcement. For doorway pages themselves, manual actions are rare. What usually happens instead is algorithmic demotion of the entire cluster, sometimes the entire site, with no notification, and recovery typically takes roughly a 90-day re-crawl window. Programmatic SEO is particularly exposed because the same cost-saving template that lets you ship 50,000 pages overnight is also the structural fingerprint the policy was written to demote. The detector exists to draw the line between programmatic-but-substantive (a template that genuinely varies useful information per page) and programmatic-but-doorway (a template where the only variation is the keyword you're trying to rank for).
How it works
- Discover URL templates by clustering your sitemap on path-segment patterns (e.g. /plumbers/[city] becomes one cluster).
- Sample representative pages from each cluster, starting with the largest clusters and the most templated-looking ones.
- Strip template chrome and diff the unique main content between sibling pages within each cluster.
- Score each cluster on a doorway-risk axis using a 3-signal stack: near-duplicate content (SimHash similarity of at least 85%), an entity-swap match, and an identical structureSignature/meta. This is the same stack that pseolint's spam/doorway-pattern rule requires before it fires at error severity (weight 25).
- Cross-check internal linking — doorway clusters typically link only to themselves and a single conversion destination, which is itself a strong signal.
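Two of the steps above, template discovery and the entity-swap check, can be sketched in a few lines. This is a simplified illustration, not the pseolint implementation: the function names, the "[slug]" placeholder, and the rule that only the final path segment varies are assumptions, and the entity-swap check only handles single-word swaps.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def cluster_by_template(urls, min_cluster_size=2):
    """Group URLs whose paths differ only in the final segment.

    Assumption: the last path segment is the variable part of the template,
    so /plumbers/austin and /plumbers/dallas both map to /plumbers/[slug].
    """
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        if not segments:
            continue  # the homepage has no template siblings
        groups[tuple(segments[:-1])].append(url)

    clusters = {}
    for prefix, members in groups.items():
        if len(members) >= min_cluster_size:
            template = "/" + "/".join(prefix + ("[slug]",))
            clusters[template] = sorted(members)
    return clusters


def is_entity_swap(text_a, text_b):
    """True when two texts are word-for-word identical except for one
    consistently swapped token (for example a city name)."""
    tokens_a = re.findall(r"\w+", text_a.lower())
    tokens_b = re.findall(r"\w+", text_b.lower())
    if len(tokens_a) != len(tokens_b):
        return False  # multi-word entities are out of scope for this sketch
    swaps = {(a, b) for a, b in zip(tokens_a, tokens_b) if a != b}
    return len(swaps) == 1
```

For example, `cluster_by_template` would put `https://example.com/plumbers/austin` and `https://example.com/plumbers/dallas` into one `/plumbers/[slug]` cluster. Note that this heuristic also lumps unrelated top-level pages together under `/[slug]`, so a real detector needs a stricter notion of "same template", such as the structure signature mentioned above.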
What you get
- A list of every URL cluster on your site, ranked by doorway risk score.
- Per-cluster detail: page count, content-diversity percentage, conversion-destination overlap, sample diffs between sibling pages.
- A flag for clusters that match the patterns named in Google's doorway policy examples (location pages with no localized content, near-duplicate service pages, intermediate funnel pages).
- A recommendation per cluster: keep as-is, deepen content, consolidate to a single canonical, or sunset.
- Internal-link map showing how doorway clusters relate to the rest of your site (often a useful clue for what to keep vs cut).
FAQ
- What makes a page a doorway page in Google's eyes?
- Google's doorway policy (https://developers.google.com/search/docs/essentials/spam-policies#doorway-pages) describes them as pages designed to rank for similar queries that funnel users to the same destination, with the variations between them being keyword-driven rather than user-driven. The clearest examples are city pages for a national service business where the only differences between /plumbers/austin and /plumbers/dallas are the city name and a stock photo. The policy also covers near-duplicate service-modifier pages (cheap, best, top, near-me variants) and intermediate funnel pages whose only purpose is to capture a SERP click and bounce users to the real conversion page. The doorway guidance was spelled out in March 2015; Google's wider spam policies were last expanded in 2024 with the scaled-content-abuse and site-reputation-abuse additions.
- Are all programmatic location pages doorway pages?
- No. The policy targets pages built to capture keyword variants, not templated pages as such. A pizza chain's /locations/[store] pages are not doorways if each page genuinely serves a local-intent user: store hours, address, phone, menu specifics, real photos. The detector grades on whether the variation between sibling pages is substantive enough to justify the page existing, or whether it's purely keyword-targeting with cosmetic differences.
- Can I have some doorway pages and not get penalized?
- Possibly, depending on scale and intent. Google's enforcement seems to be tolerant of small numbers of borderline pages on otherwise high-quality sites and aggressive on sites where doorways are the dominant pattern. The risk threshold isn't published. The detector errs toward conservatism — flagging clusters that have the structural fingerprint, then letting you make the editorial call on whether they're worth defending or not.
- What's the difference between a doorway page and thin content?
- Thin content is about substance per page: does this single URL have enough unique value to deserve indexing? Doorway is about pattern across pages: does this cluster of URLs exist primarily to capture keyword variants rather than serve distinct user needs? A page can be thin without being doorway (a single under-researched blog post) and doorway without being individually thin (a 2,000-word city page that says nothing genuinely local). The detector handles the doorway side; the thin-content scanner handles the page-level substance side.
- If I have a doorway cluster, what's the fastest fix?
- The fastest fix is usually consolidation: pick the strongest representative page in the cluster, expand it to cover the whole topic, and 301 the rest to it. This preserves link equity, removes the doorway pattern, and lets Google re-evaluate the consolidated page on its own merits, typically over a 30 to 60 day recrawl window. The slower but higher-ceiling fix is to deepen each page into something genuinely user-distinct, but if you can't credibly do that across 1,000 city pages, consolidation is the safer bet. Compared to running this analysis manually in Screaming Frog (£199/year) or Sitebulb ($35/month), the pseolint detector is free for one-shot audits and $19/month for scheduled monitoring across multiple domains.
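The consolidation fix in the last answer can be planned with a short script. The sketch below picks one canonical page per cluster and maps every other URL to it, ready to be turned into 301 rules in whatever server or CDN you use. The function name and the `scores` input (any per-URL strength metric you trust, such as organic clicks or referring domains) are hypothetical and not part of the detector's output.

```python
def build_redirect_map(cluster_urls, scores):
    """Return (canonical_url, {old_url: canonical_url}) for one cluster.

    `scores` maps URL -> strength metric; URLs without a score count as 0.
    Ties are broken by URL so the result is deterministic.
    """
    if not cluster_urls:
        raise ValueError("cluster is empty")
    canonical = max(cluster_urls, key=lambda url: (scores.get(url, 0), url))
    redirects = {url: canonical for url in cluster_urls if url != canonical}
    return canonical, redirects


if __name__ == "__main__":
    cluster = [
        "https://example.com/plumbers/austin",
        "https://example.com/plumbers/dallas",
        "https://example.com/plumbers/houston",
    ]
    clicks = {
        "https://example.com/plumbers/austin": 120,
        "https://example.com/plumbers/dallas": 340,
    }
    canonical, redirects = build_redirect_map(cluster, clicks)
    for old_url in sorted(redirects):
        print(f"301 {old_url} -> {canonical}")
```

The script only produces the mapping. Expanding the canonical page to cover the whole topic is still the editorial part of the fix.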
Related tools
Want every rule, not just this lens? The full audit on the homepage runs the complete SpamBrain + AEO rule set and produces the same shareable report — same backend, broader output.