Regurgitated Content — When Your Directory Is Just the Google Places API Reskinned
content/regurgitated-content is a low-confidence v1 heuristic that raises a warning when a page shows at least 2 of 5 tells of Google Places regurgitation: a 'Powered by Google' attribution, googleusercontent.com images making up over 60% of the page's images, a Static Maps or Maps embed, Places API JavaScript, or an aggregator footprint of 5 or more unsigned star-rating blocks.
What it detects
content/regurgitated-content looks for one shape: a page that lifts business names, reviews, addresses, and photos straight from the Google Places API and presents them as a directory with nothing of its own added on top. It reads five independent signals per page and fires only when at least 2 of them are present.
The signals are specific:
- Google Places attribution: a 'powered by google' string, or a noopener anchor pointing at google.com/maps.
- Google images dominate: once a page has 3 or more images, this signal fires when over 60% of them are hosted on googleusercontent.com, the Places photo endpoint, or Street View pixels.
- Static Maps or Maps embed: a maps.googleapis.com/maps/api/staticmap source, or a google.com/maps/embed iframe.
- Places API JavaScript: a google.maps.places.PlacesService or AutocompleteService marker in the markup.
- Aggregator footprint: 5 or more elements carrying a star rating (Unicode stars, a 4.5/5 fraction, or the word 'stars') on a page that shows fewer than 2 of 3 E-E-A-T signals (author, published date, an /about link).
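The five checks and the 2-of-5 threshold can be sketched in Python. This is a minimal illustration assuming raw HTML as input, not @pseolint/core's implementation: the function names, the regular expressions, the Street View URL pattern, and the byline and /about heuristics behind the E-E-A-T count are all assumptions, and "elements carrying a star rating" are approximated here as text nodes between tags.

```python
import re

# Illustrative patterns only; not @pseolint/core's actual implementation.
GOOGLE_IMAGE = re.compile(
    r"googleusercontent\.com"
    r"|maps\.googleapis\.com/maps/api/place/photo"  # Places photo endpoint
    r"|maps\.googleapis\.com/maps/api/streetview",  # assumed Street View pattern
    re.I,
)
# Unicode stars, an "N/5" fraction, or the word "star(s)".
STAR_RATING = re.compile(r"[★☆]|\b\d(?:\.\d)?\s*/\s*5\b|\bstars?\b", re.I)


def has_places_attribution(html: str) -> bool:
    """Signal 1: a 'powered by google' string or a noopener link to google.com/maps."""
    if re.search(r"powered by google", html, re.I):
        return True
    for tag in re.findall(r"<a\b[^>]*>", html, re.I):
        lowered = tag.lower()
        if "noopener" in lowered and "google.com/maps" in lowered:
            return True
    return False


def google_images_dominate(html: str) -> bool:
    """Signal 2: with 3 or more images, over 60% of them are Google-hosted."""
    srcs = re.findall(r"<img\b[^>]*\bsrc=[\"']([^\"']+)", html, re.I)
    if len(srcs) < 3:
        return False
    google = sum(1 for src in srcs if GOOGLE_IMAGE.search(src))
    return google / len(srcs) > 0.60


def has_maps_embed(html: str) -> bool:
    """Signal 3: a Static Maps source or a google.com/maps/embed iframe."""
    return bool(re.search(
        r"maps\.googleapis\.com/maps/api/staticmap|google\.com/maps/embed", html, re.I))


def has_places_js(html: str) -> bool:
    """Signal 4: PlacesService or AutocompleteService referenced in the markup."""
    return bool(re.search(
        r"google\.maps\.places\.(?:PlacesService|AutocompleteService)", html))


def eeat_count(html: str) -> int:
    """How many of author, published date, and /about link the page exposes (0-3)."""
    author = re.search(r"rel=[\"']author|name=[\"']author|class=[\"'][^\"']*byline", html, re.I)
    date = re.search(r"<time\b[^>]*datetime=|article:published_time", html, re.I)
    about = re.search(r"href=[\"'][^\"']*/about", html, re.I)
    return sum(1 for hit in (author, date, about) if hit)


def aggregator_footprint(html: str) -> bool:
    """Signal 5: 5+ rated text nodes on a page with fewer than 2 E-E-A-T signals."""
    text_nodes = re.findall(r">([^<]+)<", html)
    rated = sum(1 for text in text_nodes if STAR_RATING.search(text))
    return rated >= 5 and eeat_count(html) < 2


def regurgitation_signals(html: str) -> dict[str, bool]:
    return {
        "places_attribution": has_places_attribution(html),
        "google_images_dominate": google_images_dominate(html),
        "maps_embed": has_maps_embed(html),
        "places_js": has_places_js(html),
        "aggregator_footprint": aggregator_footprint(html),
    }


def fires(html: str, threshold: int = 2) -> bool:
    """The verdict: at least `threshold` of the five signals are present."""
    return sum(regurgitation_signals(html).values()) >= threshold
```

A page with 9 of 11 images on googleusercontent.com, a Maps embed iframe, and five unsigned '4.6/5 stars' blocks trips three signals, so `fires` returns True; a page whose only tell is the map embed stays below the threshold.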
Severity is fixed at warning and confidence is low. This is a v1 heuristic that reasons about structure, never about a licence: it cannot read a Places API contract or know whether you have permission. It only sees the fingerprint that raw redistribution leaves behind.
Why it matters
The Places API is a fine data source. The problem this rule names is using it as the entire product — a redistribution layer with no proprietary value, where every fact, photo, and rating on the page is something a reader could have pulled from Google Maps in one tap. When a directory adds nothing a user cannot already get from the source, the page is competing with Google using Google's own data, which is a losing position in the index and an obvious scaled-content tell.
The 2-of-5 threshold is deliberately loose because each signal alone is innocent — plenty of legitimate pages embed one map. Two signals together start to describe a page whose substance is borrowed: Google-hosted photos plus a Static Maps embed, or Places attribution plus a wall of unsigned star ratings. The pattern, not any single tell, is what the heuristic is reaching for.
Because confidence is deliberately low, a finding here is a prompt to audit, not a verdict. A genuine local guide that embeds a map and quotes a couple of reviews can trip two signals while adding real editorial value the rule cannot see. Treat the warning as 'this page looks like a thin redistribution layer — confirm it adds something the API does not.'
A page that fails
TikiFinder, a 600-page craft-cocktail-lounge directory, ships a page per bar that is a pure Places API reskin. The lounge's name, address, and 5 most recent reviews come straight from the API; 9 of its 11 photos are googleusercontent.com hero shots of the bar's signature mai tai and ceramic tiki mugs (82% Google-hosted); a Static Maps embed pins the entrance; and a star-rating block repeats '4.6/5 stars' under every review with no byline, no published date, and no /about page. Four of the five signals trip. There is not one sentence about the rum flight, the bitters program, or the garnish work that a reader could not have read on Google Maps 12 seconds earlier.
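The 82% figure can be checked by hand, assuming the share is taken over all 11 images on the page as the signal description implies:

```latex
\frac{9}{11} \approx 0.818 = 81.8\% > 60\%
```

That rounds to the quoted 82% and clears the 60% bar with room to spare.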
A page that passes
The same TikiFinder page, rebuilt as an actual guide. The embedded map and a single attributed Google review stay, and that is fine, but the page now leads with 300 words the API does not hold. The editor visited and ranked the lounge's 8 rum flights, photographed the house orgeat and the hand-carved tiki mug collection with the directory's own camera (so only 18% of images are Google-hosted), named the bartender who built the bitters menu, and signed the piece with a byline and a published date. Two signals remain, which still meets the 2-of-5 threshold, so the low-confidence warning can still appear; the difference is that an audit now finds proprietary tasting notes, original garnish photography, and a named author, substance the raw Places API never had.
How to fix it
1. Add proprietary substance the API does not hold, such as original tasting notes, a ranked verdict, or a first-person visit log, so the page is more than a redistribution layer.
2. Shoot and host your own photography. Once Google-hosted images fall to 60% or less of the page's images (your own photos outnumbering the googleusercontent.com hero shots is a safe target), the Google-images-dominate signal stops firing and the page stops looking lifted.
3. Keep one attributed Google review if you like, but write your own editorial summary alongside it rather than republishing a wall of 5-plus star-rating blocks verbatim.
4. Add E-E-A-T signals: a named byline, a published date, and an /about page describing how you evaluate each venue. Any two of the three clear the aggregator-footprint signal, and all three answer the trust question.
5. Use the embedded map as a convenience, not the content. One Static Maps embed is fine when the words around it are yours and not the API's.
6. If a page genuinely has nothing to add beyond the Places data, merge it or cut it rather than shipping a thin reskin that competes with Google using Google's own facts.
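The photography fix can be made concrete. Because the image signal fires only when Google-hosted images exceed 60% of a page's images, the number of self-hosted photos needed to switch it off follows from a little algebra: g / (g + n) > 0.6 is equivalent to 2g > 3n. A minimal Python sketch, assuming every added photo is self-hosted, the strict "over 60%" comparison described above, and the 3-image minimum; the function name is hypothetical:

```python
def own_photos_needed(google_images: int) -> int:
    """Smallest number of self-hosted photos that keeps the image signal off.

    Assumes the signal fires only when a page has 3+ images and
    google / total > 0.60 (strictly), and that every added photo is self-hosted.
    """
    if google_images < 3:
        # With 0 own photos the page has fewer than 3 images, so the signal is off.
        return 0
    # google / (google + own) > 0.6  <=>  2 * google > 3 * own,
    # so the signal stops once 3 * own >= 2 * google: own >= ceil(2g / 3).
    return -(-2 * google_images // 3)  # integer ceiling division, no floats
```

For TikiFinder's 9 googleusercontent.com shots, `own_photos_needed(9)` returns 6: 9 of 15 is exactly 60%, which no longer counts as over. Outnumbering the Google images clears the bar with a margin.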
SpamBrain context
Google's scaled-content-abuse policy, effective March 5, 2024, targets pages produced at scale that add little value of their own regardless of how they were made — and a directory that is a thin wrapper over the Places API is one of the cleanest examples. The data is accurate, the page renders fine, and yet the URL contributes nothing a reader could not get from the source in one tap. That is the gap between a database export and a page worth ranking.
content/regurgitated-content (in @pseolint/core, MIT-licensed at github.com/ouranos-labs/pseolint) is a v1 heuristic, and it is honest about its limits. It reads five structural tells and fires at warning with low confidence on 2 or more, because that is the level of certainty a structure-only check can responsibly claim. It does not run external corpus comparison — n-gram overlap against Wikipedia or review aggregators is deferred to a later version — so it cannot prove a page is regurgitated, only that it wears the fingerprint.
What it cannot do is read intent or licence. It sees Google-hosted photos, a Static Maps embed, and a wall of unsigned ratings, and it tells you the page looks like a redistribution layer with no proprietary value. Whether that is true depends on whether you added anything the API does not already give a reader for free — a judgment only your content can settle.
Frequently asked questions
- Why does my legitimate local guide trip this rule?
- Because two innocent signals can co-occur. A genuine guide that embeds a Static Map and quotes one attributed Google review can trip 2 of the 5 tells even though it adds real editorial value. This is a low-confidence v1 heuristic: it reads structure, not substance, so it cannot see your original tasting notes or your on-the-ground reporting. Treat a finding as a prompt to confirm the page adds something the Places API does not, not as a verdict that it is spam.
- Is embedding a Google Map a problem on its own?
- No. One map embed is a single signal, and the rule needs at least 2 of the 5 to fire. Maps are a useful convenience and plenty of valuable pages use them. The heuristic is reaching for a combination, not the embed alone: a map plus Google-hosted photos plus lifted reviews plus no authorship, which together describe a page whose entire substance is borrowed from the API.
- We run a tiki-bar directory that embeds Google reviews — how do we pass?
- Add proprietary value the Places API does not hold so that the borrowed pieces stop defining the page. For a craft-cocktail lounge, that means your own ranked verdict on its 8 rum flights, original photography of the mai tai and the hand-carved tiki mugs so your images outnumber the googleusercontent.com ones, signed editorial notes on the bitters and garnish program, and a named byline with a published date. Keep one attributed review if you want, but make the page about your judgment, not a reskin of Google Maps.
- Why is confidence low and severity only a warning?
- Because a structure-only heuristic cannot prove regurgitation — it can only spot the fingerprint. A page that lifts everything from the API and a thoughtful local guide that happens to embed a map can look similar in markup, so the rule deliberately under-claims: warning severity, low confidence, and a 2-of-5 threshold chosen to surface the pattern without crying spam on every page with a map. A future version may add external corpus comparison to raise confidence; v1 stays honest about what markup alone can tell.
- What counts as the aggregator-footprint signal exactly?
- It fires when a page shows 5 or more elements carrying a star rating — Unicode stars, a numeric fraction like 4.6/5, or the literal word 'stars' — while exposing fewer than 2 of 3 E-E-A-T signals (an author, a published date, or an /about link). It is the shape of a review-aggregator page that republishes ratings at scale without taking responsibility for them. Add a byline and an /about page and this signal stops firing, because the page is no longer anonymous redistribution.
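A minimal sketch of the aggregator-footprint check using Python's standard-library html.parser. It is an illustration under assumptions, not the rule's code: the star-rating regex, counting each element at most once, and the markers treated as author (a rel=author link, a meta author tag, or a byline class), date (a time element with datetime, or article:published_time), and /about link are all hypothetical choices.

```python
import re
from html.parser import HTMLParser

# Unicode stars, an "N/5" fraction, or the word "star(s)".
STAR = re.compile(r"[★☆]|\b\d(?:\.\d)?\s*/\s*5\b|\bstars?\b", re.I)
VOID = {"area", "br", "hr", "img", "input", "link", "meta", "source", "wbr"}


class FootprintParser(HTMLParser):
    """Hypothetical sketch: counts rated elements and E-E-A-T hints."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[list] = []  # open elements as [tag, already_counted]
        self.rated_elements = 0
        self.author = self.date = self.about = False

    def handle_starttag(self, tag, attrs):
        a = {k: v or "" for k, v in attrs}
        if ("author" in a.get("rel", "").split()
                or a.get("name") == "author"
                or "byline" in a.get("class", "")):
            self.author = True
        if (tag == "time" and "datetime" in a) or a.get("property") == "article:published_time":
            self.date = True
        if tag == "a" and a.get("href", "").rstrip("/").endswith("/about"):
            self.about = True
        if tag not in VOID:
            self.stack.append([tag, False])

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerate sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Count each element at most once, however many ratings its text holds.
        if self.stack and not self.stack[-1][1] and STAR.search(data):
            self.stack[-1][1] = True
            self.rated_elements += 1


def aggregator_footprint(html: str, min_ratings: int = 5) -> bool:
    parser = FootprintParser()
    parser.feed(html)
    parser.close()
    eeat = sum([parser.author, parser.date, parser.about])
    return parser.rated_elements >= min_ratings and eeat < 2
```

Five '4.6/5 stars' spans on an unsigned page fire the signal; adding a byline and an /about link brings the E-E-A-T count to 2 and switches it off, as the answer above describes.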
Related rules
- content/unique-value (Unique Value): counts the distinct words on each page that appear on no other page in the audit, and fires an error below a 100-word floor. It is a page-specific-vocabulary proxy for the question Google's scaled-content-abuse policy has asked since March 5, 2024: does a URL add anything genuinely new?
- spam/thin-content (Thin Content Detection): flags every URL under 300 words of substantive body text (the default), after stripping nav and footer chrome via SpamBrain-style readability heuristics. Google's Helpful Content System launched on August 25, 2022 and was folded into core ranking with the March 5, 2024 update, after which Google reported 45% less low-quality, unoriginal content in search results; this rule mirrors that floor.
- content/value-add (Value-Add Score): a second-pass composite that reads seven other rules' findings (originality, freshness, citable facts, the four-category E-E-A-T count, translation, cliche reuse, and Wikipedia paraphrase), weights each at one-seventh, averages them into a single 0-to-1 score, and fires an error below 50% or a critical below 30%, the synthesis SpamBrain has rewarded since the March 5, 2024 update.
Want to know whether this rule actually fires on your site?
Run pseolint against your sitemap. The audit is free, takes about a minute, and returns a per-URL list of every rule that fired — including this one — with the exact metric values so you can prioritise the fix queue.