
feat: Automate Google Search Console indexing via Indexing API + smart sitemap diff #840

Open
dhananjay6561 wants to merge 3 commits into main from feat/automate-indexing


Member

dhananjay6561 commented Apr 24, 2026

What & Why

Google doesn't support IndexNow (we already have that for Bing/Yandex). Without this PR, Google discovers new/updated docs on its own crawl schedule — days or weeks late. This PR pushes URLs directly to Google the moment a deploy completes.

How it works

After every deploy to main:

  1. Diffs new sitemap.xml against the cached sitemap from the last deploy
  2. Submits only new/changed URLs → URL_UPDATED to Google Indexing API
  3. Submits removed URLs → URL_DELETED
  4. Pings GSC Sitemap API as a secondary signal
  5. Caches the sitemap for the next deploy's diff, but only when all submissions completed. If quota was hit or anything failed, the baseline is preserved so skipped URLs are automatically picked up on the next run.
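
The diff in steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/google-index.js: the function names `parseSitemap` and `diffSitemaps` are hypothetical, and the regex-based parsing assumes a flat sitemap with plain `<url>`, `<loc>`, and `<lastmod>` elements.

```javascript
// Hypothetical helpers for illustration only; names and parsing strategy are assumptions.

// Build a Map of <loc> -> <lastmod> ("" when absent) from sitemap XML.
// Note: XML entities such as &amp; are not decoded in this sketch.
function parseSitemap(xml) {
  const entries = new Map();
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  let match;
  while ((match = urlBlock.exec(xml)) !== null) {
    const loc = /<loc>\s*([^<]+?)\s*<\/loc>/.exec(match[1]);
    const lastmod = /<lastmod>\s*([^<]+?)\s*<\/lastmod>/.exec(match[1]);
    if (loc) entries.set(loc[1], lastmod ? lastmod[1] : "");
  }
  return entries;
}

// A URL counts as "updated" when it is new or its lastmod differs from the
// cached sitemap, and as "removed" when it is missing from the current one.
function diffSitemaps(previousXml, currentXml) {
  const previous = parseSitemap(previousXml);
  const current = parseSitemap(currentXml);
  const updated = [];
  const removed = [];
  for (const [url, lastmod] of current) {
    if (!previous.has(url) || previous.get(url) !== lastmod) updated.push(url);
  }
  for (const url of previous.keys()) {
    if (!current.has(url)) removed.push(url);
  }
  return { updated, removed };
}
```

Under these assumptions, `updated` would feed the URL_UPDATED submissions and `removed` the URL_DELETED ones.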

Files changed

| File | Change |
| --- | --- |
| docusaurus.config.js | Added lastmod: "date", which makes Docusaurus emit git-based last-modified dates in the sitemap so the diff can detect which pages actually changed |
| scripts/google-index.js | New script: handles auth, sitemap diffing, URL_UPDATED/URL_DELETED submissions, retry logic (3x with backoff on 429/5xx/network errors), burst rate limiting (10 req/s), GSC sitemap ping, and quota-safe baseline gating |
| .github/workflows/main.yml | Added fetch-depth: 0 on checkout (required for correct lastmod dates), a sitemap cache restore step, and a Google Indexing API step after deploy |
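
For readers unfamiliar with the retry behaviour listed for scripts/google-index.js, here is a minimal sketch of retry with exponential backoff on 429/5xx/network errors. The name `withRetry`, the delay values, and the injected `send` and `sleep` parameters are assumptions for illustration. It also assumes "3x" means up to three retries after the first attempt, which the description does not state.

```javascript
// Hypothetical sketch; not the PR's actual implementation.
const isRetryable = (status) => status === 429 || (status >= 500 && status <= 599);
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `send` is any async function resolving to an object with a numeric `status`.
async function withRetry(send, { retries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await send();
      if (!isRetryable(res.status)) return res; // success or non-retryable error
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // a thrown error is treated as a retryable network failure
    }
    // Back off 500 ms, 1000 ms, 2000 ms, ... between attempts.
    if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw lastError;
}
```

Injecting `sleep` keeps the sketch testable without real delays; a production version might also add jitter.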

Deployment / Setup (one-time)

  1. Google Cloud Console → enable Web Search Indexing API on the service account's project
  2. GSC → Settings → Users & permissions → add service account email as Owner
  3. GitHub Secrets → add GOOGLE_SERVICE_ACCOUNT_JSON (paste the full service account key JSON)
  4. Optionally add GSC_SITE_URL secret if the GSC property URL differs from https://keploy.io/

That's it — next push to main triggers everything automatically.
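
As a rough illustration of how a script could consume the two secrets above, the sketch below validates them at startup. The helper name `loadConfig` is hypothetical; only the secret names and the default property URL come from this description. `client_email` and `private_key` are standard fields of a service account key file.

```javascript
// Hypothetical helper; not taken from scripts/google-index.js.
function loadConfig(env = process.env) {
  const raw = env.GOOGLE_SERVICE_ACCOUNT_JSON;
  if (!raw) throw new Error("GOOGLE_SERVICE_ACCOUNT_JSON is not set");

  let credentials;
  try {
    credentials = JSON.parse(raw);
  } catch {
    throw new Error("GOOGLE_SERVICE_ACCOUNT_JSON is not valid JSON");
  }

  // Fail early with a clear message if the pasted key is incomplete.
  for (const field of ["client_email", "private_key"]) {
    if (!credentials[field]) {
      throw new Error(`service account key is missing "${field}"`);
    }
  }

  // GSC_SITE_URL is optional and falls back to the default property URL.
  const siteUrl = env.GSC_SITE_URL || "https://keploy.io/";
  return { credentials, siteUrl };
}
```

Failing fast here would surface a missing or malformed secret in the workflow log rather than as an opaque authentication error later.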

Quota

Google's default limit is 200 URL_UPDATED/day. Smart diffing means a typical deploy (10–20 changed pages) uses 10–20 of those 200.

If quota is exceeded, skipped URLs are counted as failures → the script exits non-zero → the sitemap baseline is not advanced → the next deploy re-diffs from the same old baseline and automatically retries all missed URLs. Nothing is silently dropped.
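
The gating rule described above can be sketched as a small pure function. The result shape (`status: "ok"` versus anything else) and the name `summarize` are assumptions made for this example, not the script's actual data model.

```javascript
// Hypothetical sketch of quota-safe baseline gating.
// Each result is assumed to look like { url, status }, where any status other
// than "ok" (for example "quota_skipped" or "error") counts as a failure.
function summarize(results) {
  const failed = results.filter((r) => r.status !== "ok").length;
  return {
    submitted: results.length - failed,
    failed,
    // Only advance the cached sitemap when nothing failed or was skipped,
    // so the next deploy re-diffs against the same old baseline otherwise.
    advanceBaseline: failed === 0,
    exitCode: failed === 0 ? 0 : 1,
  };
}
```

A caller would copy the new sitemap over the cached one only when `advanceBaseline` is true, then exit with `exitCode`.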

Contributor

Copilot AI left a comment


Pull request overview

Adds automated Google indexing for docs deploys by diffing the current sitemap against a cached previous sitemap and notifying Google via the Indexing API, with a secondary GSC sitemap ping. This is intended to reduce Google discovery lag for new/updated/removed docs pages after each main deploy.

Changes:

  • Emit per-page <lastmod> values in the generated sitemap using git commit dates to enable reliable change detection.
  • Add a new Node script to diff sitemaps and submit URL_UPDATED / URL_DELETED notifications to the Google Indexing API, plus a GSC sitemap ping.
  • Extend the main deploy workflow to restore/cache the previous sitemap and run the Google indexing step post-deploy.
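
For context, the first change in the list above might look like the following fragment in docusaurus.config.js. Only the `lastmod: "date"` option is taken from this PR. The surrounding classic-preset layout is an assumption about a typical Docusaurus setup, not a copy of the repository's config.

```javascript
// docusaurus.config.js (fragment, illustrative only)
module.exports = {
  presets: [
    [
      "classic",
      {
        sitemap: {
          // Emit a <lastmod> element per URL as a YYYY-MM-DD date.
          lastmod: "date",
        },
      },
    ],
  ],
};
```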

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| scripts/google-index.js | New script to diff sitemaps and submit Indexing API notifications with retry + rate limiting, plus GSC sitemap ping. |
| docusaurus.config.js | Enables git-based <lastmod> emission in sitemap.xml to support smart diffing. |
| .github/workflows/main.yml | Fetch full git history for correct <lastmod>, restore/cache previous sitemap, and run Google indexing after deploy. |
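
To make the rate-limited submission concrete, here is a minimal sketch of sending notifications in bursts of 10 per second. The names `buildNotification` and `submitInBursts` and the injected `send` and `sleep` parameters are hypothetical. The `{ url, type }` body matches what the Indexing API's `urlNotifications:publish` method accepts.

```javascript
// Hypothetical sketch; not the PR's actual implementation.
const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Request body for POST https://indexing.googleapis.com/v3/urlNotifications:publish
function buildNotification(url, type) {
  if (type !== "URL_UPDATED" && type !== "URL_DELETED") {
    throw new Error(`unsupported notification type: ${type}`);
  }
  return { url, type };
}

// Send at most `perSecond` notifications concurrently, then wait one second
// before the next burst. `send` is an async function taking one notification.
async function submitInBursts(notifications, send, { perSecond = 10, sleep = pause } = {}) {
  const results = [];
  for (let i = 0; i < notifications.length; i += perSecond) {
    const burst = notifications.slice(i, i + perSecond);
    results.push(...(await Promise.all(burst.map((n) => send(n)))));
    if (i + perSecond < notifications.length) await sleep(1000);
  }
  return results;
}
```

In a real script, `send` would wrap an authenticated HTTP call; here it is left injectable so the batching logic can be exercised on its own.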


Comment thread scripts/google-index.js Outdated
Comment thread scripts/google-index.js Outdated
Comment thread scripts/google-index.js Outdated
Comment thread scripts/google-index.js Outdated
Comment thread .github/workflows/main.yml Outdated
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.



Comment thread .github/workflows/main.yml Outdated
Comment thread scripts/google-index.js Outdated
Comment thread scripts/google-index.js Outdated
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Member

amaan-bhati left a comment


Thank you for this well-engineered addition! The approach here is thoughtful - using fetch-depth: 0 to get accurate lastmod dates for the sitemap diff, caching the previous sitemap baseline between deploys, and only submitting URLs that actually changed are all the right design decisions. The script's retry logic, exponential backoff, burst rate control, and graceful degradation on quota exhaustion show careful attention to production reliability.

Issues to consider:

  • The fetch-depth: 0 change affects the actions/checkout@v4 step that all jobs in this workflow share, not just the Google Indexing step. A full git history checkout can meaningfully slow down CI for large repositories. Consider whether fetch-depth: 0 is needed only for the Docusaurus build step (to compute lastmod dates) and whether it can be scoped more narrowly if the workflow has multiple jobs.

  • The Google Indexing API step uses continue-on-error: true, which means a misconfigured service account, missing secret, or API quota exhaustion will not fail the deploy. This is probably the right call for a non-blocking SEO signal, but the reasoning should be documented in a comment in the workflow YAML so future maintainers don't remove it thinking it's an oversight.

  • The GOOGLE_SERVICE_ACCOUNT_JSON secret setup requires granting the service account "Owner" access in Google Search Console and enabling the Web Search Indexing API on the Cloud project. These are non-trivial one-time setup steps. Please add a SETUP.md or a section in the PR description (or README) documenting exactly how to provision this so the next maintainer can recreate it.

  • The npm install --no-save google-auth-library@10 runs inside the CI step on every deploy. Pinning to a major version (@10) is good, but a minor or patch release of google-auth-library could still introduce a breaking change. Consider adding a lock file or pinning to a specific exact version for more deterministic deploys.

  • Upgrading actions/setup-node@v3 to v4 is bundled into this PR. While this is a good hygiene change, bundling it with a new feature makes the diff harder to review and the change harder to bisect if something breaks. It's a minor point but worth noting.

This is a meaningful improvement to Keploy's docs discoverability on Google. The setup documentation gap is the most important thing to close before merging.
