feat: Automate Google Search Console indexing via Indexing API + smart sitemap diff#840
dhananjay6561 wants to merge 3 commits into `main`
Conversation
Pull request overview
Adds automated Google indexing for docs deploys by diffing the current sitemap against a cached previous sitemap and notifying Google via the Indexing API, with a secondary GSC sitemap ping. This is intended to reduce Google discovery lag for new/updated/removed docs pages after each main deploy.
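The diff step described above might look roughly like this (a minimal hypothetical sketch; the actual `scripts/google-index.js` may parse the sitemap differently):

```javascript
// Hypothetical sketch — the real scripts/google-index.js may differ.

// Parse <loc> / <lastmod> pairs out of a sitemap.xml string.
function parseSitemap(xml) {
  const entries = new Map();
  for (const m of xml.matchAll(/<url>([\s\S]*?)<\/url>/g)) {
    const loc = /<loc>(.*?)<\/loc>/.exec(m[1]);
    const lastmod = /<lastmod>(.*?)<\/lastmod>/.exec(m[1]);
    if (loc) entries.set(loc[1], lastmod ? lastmod[1] : '');
  }
  return entries;
}

// New or changed-lastmod URLs become URL_UPDATED candidates;
// URLs present before but missing now become URL_DELETED candidates.
function diffSitemaps(prevXml, currXml) {
  const prev = parseSitemap(prevXml);
  const curr = parseSitemap(currXml);
  const updated = [...curr]
    .filter(([url, mod]) => prev.get(url) !== mod)
    .map(([url]) => url);
  const deleted = [...prev.keys()].filter((url) => !curr.has(url));
  return { updated, deleted };
}
```

Only the URLs in `updated` and `deleted` are then sent to the Indexing API, which is what keeps the per-deploy quota usage small.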
Changes:
- Emit per-page `<lastmod>` values in the generated sitemap using git commit dates to enable reliable change detection.
- Add a new Node script to diff sitemaps and submit `URL_UPDATED`/`URL_DELETED` notifications to the Google Indexing API, plus a GSC sitemap ping.
- Extend the main deploy workflow to restore/cache the previous sitemap and run the Google indexing step post-deploy.
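The `<lastmod>` emission in the first change above can be sketched as a `docusaurus.config.js` fragment (assuming the classic preset's sitemap options; the `changefreq`/`priority` values here are illustrative, not the PR's actual diff):

```javascript
// docusaurus.config.js — sketch, assuming @docusaurus/preset-classic.
// changefreq and priority values are illustrative assumptions.
const config = {
  presets: [
    [
      'classic',
      {
        sitemap: {
          // Emit a git-derived <lastmod> per page so the post-deploy
          // diff can tell which pages actually changed.
          lastmod: 'date',
          changefreq: 'weekly',
          priority: 0.5,
        },
      },
    ],
  ],
};

module.exports = config;
```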
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `scripts/google-index.js` | New script to diff sitemaps and submit Indexing API notifications with retry + rate limiting, plus GSC sitemap ping. |
| `docusaurus.config.js` | Enables git-based `<lastmod>` emission in sitemap.xml to support smart diffing. |
| `.github/workflows/main.yml` | Fetch full git history for correct `<lastmod>`, restore/cache previous sitemap, and run Google indexing after deploy. |
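The retry + rate limiting described for `scripts/google-index.js` could look roughly like this (a hypothetical sketch; helper names and defaults are assumptions, not the PR's actual code):

```javascript
// Hypothetical sketch — helper names, defaults, and structure are assumptions.

// Retry up to `retries` extra times with exponential backoff, as described
// for 429/5xx/network errors.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: propagate
      // back off 500ms, 1s, 2s, ... before the next attempt
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Simple burst limiter: allow at most `maxPerSecond` calls per rolling second.
function makeRateLimiter(maxPerSecond = 10) {
  let windowStart = Date.now();
  let count = 0;
  return async function acquire() {
    const now = Date.now();
    if (now - windowStart >= 1000) {
      windowStart = now;
      count = 0;
    }
    if (++count > maxPerSecond) {
      await new Promise((r) => setTimeout(r, 1000 - (now - windowStart)));
      windowStart = Date.now();
      count = 1;
    }
  };
}
```

Each Indexing API call would then be wrapped as `withRetry(() => submit(url))` after awaiting `acquire()`.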
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
amaan-bhati left a comment:
Thank you for this well-engineered addition! The approach here is thoughtful: using `fetch-depth: 0` to get accurate `lastmod` dates for the sitemap diff, caching the previous sitemap baseline between deploys, and submitting only the URLs that actually changed are all the right design decisions. The script's retry logic, exponential backoff, burst rate control, and graceful degradation on quota exhaustion show careful attention to production reliability.
Issues to consider:

1. The `fetch-depth: 0` change affects the `actions/checkout@v4` step that all jobs in this workflow share, not just the Google Indexing step. A full git history checkout can meaningfully slow down CI for large repositories. Consider whether `fetch-depth: 0` is needed only for the Docusaurus build step (to compute `lastmod` dates) and whether it can be scoped more narrowly if the workflow has multiple jobs.
2. The Google Indexing API step uses `continue-on-error: true`, which means a misconfigured service account, a missing secret, or API quota exhaustion will not fail the deploy. This is probably the right call for a non-blocking SEO signal, but the reasoning should be documented in a comment in the workflow YAML so future maintainers don't accidentally remove it thinking it's an oversight.
3. The `GOOGLE_SERVICE_ACCOUNT_JSON` secret setup requires granting the service account "Owner" access in Google Search Console and enabling the Web Search Indexing API on the Cloud project. These are non-trivial one-time setup steps. Please add a `SETUP.md` or a section in the PR description (or `README`) documenting exactly how to provision this so the next maintainer can recreate it.
4. `npm install --no-save google-auth-library@10` runs inside the CI step on every deploy. Pinning to a major version (`@10`) is good, but a minor or patch release of `google-auth-library` could still introduce a breaking change. Consider adding a lock file or pinning to a specific exact version for more deterministic deploys.
5. Upgrading `actions/setup-node@v3` to `v4` is bundled into this PR. While this is a good hygiene change, bundling it with a new feature makes the diff harder to review and the change harder to bisect if something breaks. It's a minor point but worth noting.
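The `continue-on-error` point above could be addressed with a YAML comment along these lines (a sketch only; the step name, script path, and env wiring are assumptions, not the PR's actual workflow):

```yaml
# Sketch — step name, script path, and secret wiring are assumptions.
- name: Notify Google Indexing API
  # continue-on-error is deliberate: indexing is a non-blocking SEO signal.
  # A misconfigured service account, missing secret, or exhausted quota
  # must never fail the docs deploy itself. Do not remove this flag
  # without understanding that tradeoff.
  continue-on-error: true
  run: node scripts/google-index.js
  env:
    GOOGLE_SERVICE_ACCOUNT_JSON: ${{ secrets.GOOGLE_SERVICE_ACCOUNT_JSON }}
```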
This is a meaningful improvement to Keploy's docs discoverability on Google. The setup documentation gap is the most important thing to close before merging.
What & Why
Google doesn't support IndexNow (we already have that for Bing/Yandex). Without this PR, Google discovers new/updated docs on its own crawl schedule — days or weeks late. This PR pushes URLs directly to Google the moment a deploy completes.
How it works
After every deploy to `main`:
1. Diff the freshly generated `sitemap.xml` against the cached sitemap from the last deploy.
2. Submit new/changed URLs as `URL_UPDATED` to the Google Indexing API.
3. Submit removed URLs as `URL_DELETED`.
Files changed
- `docusaurus.config.js`: `lastmod: "date"` makes Docusaurus emit git-based last-modified dates in the sitemap so the diff can detect which pages actually changed
- `scripts/google-index.js`: `URL_UPDATED`/`URL_DELETED` submissions, retry logic (3x with backoff on 429/5xx/network errors), burst rate limiting (10 req/s), GSC sitemap ping, and quota-safe baseline gating
- `.github/workflows/main.yml`: `fetch-depth: 0` on checkout (required for correct `lastmod` dates), a sitemap cache restore step, and a Google Indexing API step after deploy
Deployment / Setup (one-time)
- Add a repo secret `GOOGLE_SERVICE_ACCOUNT_JSON` (paste the full service account key JSON)
- Add a `GSC_SITE_URL` secret if the GSC property URL differs from `https://keploy.io/`

That's it — next push to `main` triggers everything automatically.
Quota
Google's default Indexing API quota is 200 `URL_UPDATED` requests/day. Smart diffing means a typical deploy (10–20 changed pages) uses only 10–20 of those 200.
If quota is exceeded, skipped URLs are counted as failures → the script exits non-zero → the sitemap baseline is not advanced → the next deploy re-diffs from the same old baseline and automatically retries all missed URLs. Nothing is silently dropped.
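The retry-on-next-deploy behavior can be sketched as a small decision function (hypothetical; the actual script's structure may differ):

```javascript
// Hypothetical sketch of the quota-safe baseline gating described above.
// Every submission result reduces to one decision: advance the cached
// sitemap baseline only if nothing failed; otherwise exit non-zero so the
// workflow keeps the old baseline and the next deploy re-diffs and retries.
function gateBaseline(results) {
  const failed = results.filter((r) => !r.ok).map((r) => r.url);
  return {
    failed,
    advanceBaseline: failed.length === 0,
    exitCode: failed.length === 0 ? 0 : 1,
  };
}
```

Because quota-skipped URLs are recorded as `ok: false`, exhausting the daily quota automatically defers those URLs to the next deploy instead of dropping them.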