diff --git a/.github/workflows/db-backup.yml b/.github/workflows/db-backup.yml index e317d24..7a3d246 100644 --- a/.github/workflows/db-backup.yml +++ b/.github/workflows/db-backup.yml @@ -7,8 +7,13 @@ on: jobs: backup: - name: PostgreSQL Backup to GCS + name: PostgreSQL Backup (${{ matrix.environment }}) runs-on: ubuntu-latest + strategy: + fail-fast: false + matrix: + environment: [staging, production] + environment: ${{ matrix.environment }} steps: - name: Dump and upload uses: appleboy/ssh-action@v1 @@ -19,9 +24,12 @@ jobs: port: ${{ secrets.SERVER_PORT || 22 }} script: | set -euo pipefail - STAMP="$(date +%Y%m%d)" - DUMP="/tmp/paperscout-${STAMP}.dump" + STAMP="$(date -u +%Y%m%dT%H%M%SZ)" + RUN_KEY="${{ github.run_id }}-${{ github.run_attempt }}" + DUMP="/tmp/paperscout-${{ matrix.environment }}-${STAMP}-${RUN_KEY}.dump" + DEST="gs://insights-db-backups/paperscout/${{ matrix.environment }}/paperscout-${STAMP}-${RUN_KEY}.dump" + trap 'rm -f "$DUMP"' EXIT sudo -u postgres pg_dump -Fc paperscout > "$DUMP" - gsutil cp "$DUMP" "gs://paperscout-backups/paperscout-${STAMP}.dump" + gsutil cp "$DUMP" "$DEST" rm -f "$DUMP" diff --git a/.gitignore b/.gitignore index e39fbab..8754a17 100644 --- a/.gitignore +++ b/.gitignore @@ -37,3 +37,4 @@ build/ Icon? .com.apple.timemachine.donotpresent .VolumeIcon.icns +.cursor diff --git a/CHANGELOG.md b/CHANGELOG.md index 3afa3ec..684237c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,8 +9,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added +- Post the same Slack **status** summary as the interactive command to `NOTIFICATION_CHANNEL` once when the process starts (when that channel is configured). - Open-source hygiene: contributing guide, security policy, code of conduct, onboarding and handoff docs, pre-commit (Ruff), GitHub issue templates, Dependabot, CodeQL, CODEOWNERS template, and `.gitattributes`. 
+### Changed + +- Documentation: deployment URLs (Slack Request URL behind nginx `/paperscout/`), clone URL in server setup, staging-style placeholders. +- `db-backup.yml`: matrix parallel backups for `staging` / `production` using environment-level SSH secrets; uploads under `gs://insights-db-backups/paperscout/<environment>/` with unique temp files and object keys (UTC timestamp + `run_id` + `run_attempt` + environment); `EXIT` trap removes temp dump on failure. `SERVER_SETUP` restore examples updated (`--no-owner`, listing/copy by object name). + ## [0.1.0] - 2026-05-05 ### Added diff --git a/README.md b/README.md index 321ace7..18cc923 100644 --- a/README.md +++ b/README.md @@ -113,15 +113,17 @@ python -m paperscout Once the scout is running and reachable at a public URL: 1. Go back to **Event Subscriptions** in the Slack app config -2. Set **Request URL** to `https://your-server.com/slack/events` -3. Slack will send a challenge request -- the scout responds automatically +2. Set **Request URL** depending on how traffic reaches Bolt: + - **Reverse proxy (recommended for production/staging):** If nginx terminates TLS and proxies under a path prefix (see [`deploy/paperscout.conf`](deploy/paperscout.conf)), Slack must use that prefix. Example: `https://your-domain.example.org/paperscout/slack/events` — not `https://your-domain.example.org/slack/events`. + - **Direct to the app (local dev or ngrok without nginx):** Bolt serves `/slack/events` at the container root. Example: `https://staging.example.org/slack/events` or `https://abc123.ngrok-free.app/slack/events`. +3. Slack will send a challenge request — the scout responds automatically 4. Click **Save Changes** -For local testing with ngrok: +For local testing with ngrok (traffic straight to `PORT`, no path prefix): ```bash ngrok http 3000 -# Use the ngrok URL: https://abc123.ngrok.io/slack/events +# Use: https://<your-ngrok-host>/slack/events ``` ### 8. 
Invite the Scout @@ -191,7 +193,7 @@ curl -sf http://localhost:9102/health See [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md) for the full Ubuntu 22.04 provisioning guide, and [`.github/workflows/cd.yml`](.github/workflows/cd.yml) for the CD pipeline. -Database backups run daily via [`.github/workflows/db-backup.yml`](.github/workflows/db-backup.yml), uploading `pg_dump` snapshots to Google Cloud Storage. +Database backups run daily via [`.github/workflows/db-backup.yml`](.github/workflows/db-backup.yml): **matrix jobs** for **`staging`** and **`production`** run **in parallel**, each using that **GitHub Environment’s** SSH secrets (same names as CD: `SERVER_HOST`, `SERVER_USER`, `SERVER_SSH_KEY`, optional `SERVER_PORT`). Dumps are uploaded to **`gs://insights-db-backups/paperscout/<environment>/`** so staging and production stay under the shared **`paperscout`** prefix in the bucket. ## Scout Commands @@ -331,7 +333,7 @@ paperscout/ .github/workflows/ ci.yml Test matrix on push/PR to main cd.yml SSH deploy (git pull + build) on push to main - db-backup.yml Daily pg_dump to Google Cloud Storage + db-backup.yml Matrix pg_dump (staging + production) to GCS insights-db-backups/paperscout/<environment>/ ``` ### PostgreSQL Schema @@ -453,8 +455,8 @@ A `concurrency` group keyed by branch prevents overlapping deploys to the same e The `.github/workflows/db-backup.yml` workflow runs daily at 3 AM UTC (and supports manual dispatch): -1. SSHes into the server and runs `pg_dump` on the host's PostgreSQL -2. Uploads the dump to Google Cloud Storage (`gs://paperscout-backups/`) -3. Old backups are auto-pruned by a GCS lifecycle rule (30 days) +1. Runs **two jobs in parallel** (matrix: `staging`, `production`), each bound to the matching **GitHub Environment** so SSH secrets match that tier’s server (same secret names as CD). +2. 
On each host, runs `pg_dump` and uploads to **`gs://insights-db-backups/paperscout/<environment>/`**, using object keys that include UTC time plus the GitHub Actions run id and attempt so backups do not collide on reruns. +3. Configure lifecycle rules on the bucket/prefixes as needed (for example, pruning objects older than 30 days). -CD secrets and variables are configured per **GitHub Environment** (`production` and `staging`); see the table in [Deployment](#deployment). Other secrets (e.g. database backups) are documented in [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md#9-github-secrets-checklist). +SSH credentials for backups live under **each environment** (`staging`, `production`), not at the repository level — parallel to [Deployment](#deployment). See [`deploy/SERVER_SETUP.md`](deploy/SERVER_SETUP.md#9-github-secrets-and-environments). diff --git a/deploy/SERVER_SETUP.md b/deploy/SERVER_SETUP.md index ecacdff..e345143 100644 --- a/deploy/SERVER_SETUP.md +++ b/deploy/SERVER_SETUP.md @@ -100,10 +100,14 @@ rm /tmp/paperscout.dump ``` If the dump is stored in GCS (from the daily backup workflow), -download it directly on the new server instead: +download it directly on the new server instead — use the prefix that matches +the environment you are restoring (**`staging`** or **`production`**). 
Object +names include UTC time and the workflow run id (see §8); pick the file you need, +for example: ```bash -gsutil cp gs://paperscout-backups/paperscout-<DATE>.dump /tmp/paperscout.dump +gsutil ls gs://insights-db-backups/paperscout/<environment>/ +gsutil cp gs://insights-db-backups/paperscout/<environment>/paperscout-<NAME>.dump /tmp/paperscout.dump pg_restore -U paperscout -h localhost -d paperscout --no-owner /tmp/paperscout.dump rm /tmp/paperscout.dump ``` @@ -180,7 +184,7 @@ Clone the repo into `/opt/paperscout`: ```bash sudo mkdir -p /opt -sudo git clone https://github.com/<org>/<repo>.git /opt/paperscout +sudo git clone https://github.com/cppalliance/paperscout.git /opt/paperscout sudo chown -R <user>:<group> /opt/paperscout ``` @@ -222,6 +226,14 @@ curl -sf http://localhost:9101/health | python3 -m json.tool docker compose logs -f paperscout ``` +### Example: staging-style host + +If you use a **separate** staging deployment (second clone path and GitHub Environment `staging`), typical placeholders are: + +- TLS / DNS: `sudo certbot --nginx -d staging.example.org` (replace with your real staging hostname when provisioning). +- Health check on the staging machine after mapping ports (see README CD table): `curl -sf http://localhost:9102/health` — use whatever port your staging compose publishes for health instead of `9102` if different. +- Slack **Request URL** when nginx proxies under `/paperscout/`: `https://staging.example.org/paperscout/slack/events`. + --- ## 7. 
Restoring from a GCS backup (optional) @@ -229,8 +241,9 @@ docker compose logs -f paperscout If migrating from another server with an existing database: ```bash -gsutil cp gs://paperscout-backups/paperscout-<DATE>.dump /tmp/paperscout.dump -pg_restore -U paperscout -h localhost -d paperscout -c /tmp/paperscout.dump +gsutil ls gs://insights-db-backups/paperscout/<environment>/ +gsutil cp gs://insights-db-backups/paperscout/<environment>/paperscout-<NAME>.dump /tmp/paperscout.dump +pg_restore -U paperscout -h localhost -d paperscout -c --no-owner /tmp/paperscout.dump rm /tmp/paperscout.dump ``` @@ -238,26 +251,31 @@ rm /tmp/paperscout.dump ## 8. Database backups -The `db-backup.yml` GitHub Actions workflow SSHes into the server daily -and runs `pg_dump` + `gsutil cp` to upload to GCS. The VM's service -account handles authentication automatically — no credentials needed. +The `db-backup.yml` workflow runs **two parallel matrix jobs** (`staging` and +`production`). Each job uses the **GitHub Environment** with the same name, so +SSH secrets (`SERVER_HOST`, etc.) resolve per tier — matching CD. Each run uploads to: -The GCS bucket `paperscout-backups` should have a lifecycle rule to -auto-delete objects older than 30 days (configured in the Cloud Console -under the bucket's **Lifecycle** tab). +```text +gs://insights-db-backups/paperscout/<environment>/paperscout-<STAMP>-<RUN_ID>-<RUN_ATTEMPT>.dump +``` + +Object keys include the workflow run id so same-day reruns do not overwrite objects; each matrix job uses its own temp file on the host. --- -## 9. GitHub Secrets checklist +## 9. GitHub secrets and environments + +**Continuous deployment** (`cd.yml`) and **database backups** (`db-backup.yml`) +both use the **`staging`** and **`production`** GitHub Environments. 
Configure the **same SSH secret names** in each environment (values differ per server): -Configure these in the repo under **Settings → Secrets and variables → Actions**: +| Secret | Purpose | +| ---------------- | -------------------------------------------------------- | +| `SERVER_HOST` | SSH target host for that environment’s VM | +| `SERVER_USER` | SSH username (e.g. `<deploy-user>`) | +| `SERVER_SSH_KEY` | Private SSH key for the deploy user | +| `SERVER_PORT` | SSH port (optional; default `22`) | -| Secret | Purpose | -| ---------------- | ----------------------------------- | -| `SERVER_HOST` | Server IP or hostname | -| `SERVER_USER` | SSH username (e.g. `<deploy-user>`) | -| `SERVER_SSH_KEY` | Private SSH key for the deploy user | -| `SERVER_PORT` | SSH port (optional, defaults to 22) | +CD also uses **environment Variables** (`DEPLOY_PATH`, `DEPLOY_BRANCH`, `HEALTH_PORT`) — see the README Deployment table. Backup jobs only need the secrets above. `GITHUB_TOKEN` is provided automatically by GitHub Actions. -GCS authentication uses the VM's service account — no extra secrets needed. +GCS uploads use the VM's service account (`gsutil`) — ensure each server can write to `gs://insights-db-backups/paperscout/<environment>/`. diff --git a/docs/onboarding.md b/docs/onboarding.md index f321c67..3a03c35 100644 --- a/docs/onboarding.md +++ b/docs/onboarding.md @@ -99,7 +99,7 @@ python -m paperscout - **Slack HTTP app** listens on `PORT` (default **3000**). - **Health** endpoint listens on `health_port` from settings (default **8080**) — `GET /health`. -For Slack Event Subscriptions you need a public URL (e.g. ngrok); see [README](../README.md#7-set-the-request-url). +For Slack Event Subscriptions you need a public URL (e.g. ngrok). With nginx and a `/paperscout/` prefix, the Request URL must include that path; see [README — Set the Request URL](../README.md#7-set-the-request-url). 
## Deployment (summary) diff --git a/src/paperscout/__main__.py b/src/paperscout/__main__.py index 388f009..a4c4e2b 100644 --- a/src/paperscout/__main__.py +++ b/src/paperscout/__main__.py @@ -14,7 +14,14 @@ from .db import init_db, init_pool from .health import start_health_server from .monitor import Scheduler -from .scout import MessageQueue, create_app, notify_channel, notify_users, register_handlers +from .scout import ( + MessageQueue, + create_app, + enqueue_startup_status, + notify_channel, + notify_users, + register_handlers, +) from .sources import ISOProber, WG21Index from .storage import ProbeState, UserWatchlist @@ -131,6 +138,8 @@ def _on_poll_result(result): ) bolt_thread.start() + enqueue_startup_status(mq, state, paper_count_fn) + await scheduler.run_forever() diff --git a/src/paperscout/scout.py b/src/paperscout/scout.py index be0614f..1ccd272 100644 --- a/src/paperscout/scout.py +++ b/src/paperscout/scout.py @@ -440,8 +440,8 @@ def _show_watchlist( ) -def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> None: - """Post loaded paper count, last poll, probe settings.""" +def format_status_message(state: ProbeState, paper_count_fn) -> str: + """Mrkdwn body for the interactive ``status`` command and startup channel post.""" from datetime import datetime as _dt from datetime import timezone as _tz @@ -449,21 +449,35 @@ def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> last_str = ( _dt.fromtimestamp(last, tz=_tz.utc).strftime("%Y-%m-%d %H:%M:%S UTC") if last else "never" ) - say( - text=( - f"*Paperscout Status*\n" - f"• Papers loaded: {paper_count_fn():,}\n" - f"• Last poll: {last_str}\n" - f"• Poll interval: {settings.poll_interval_minutes} min\n" - f"• Discovered via probe: {len(state.get_all_discovered())}\n" - f"• ISO probing: {'enabled' if settings.enable_iso_probe else 'disabled'}\n" - f"• Alert window: {settings.alert_modified_hours}h\n" - f"• Cold cycle: 1/{settings.cold_cycle_divisor}" 
- ), - **reply_opts, + return ( + f"*Paperscout Status*\n" + f"• Papers loaded: {paper_count_fn():,}\n" + f"• Last poll: {last_str}\n" + f"• Poll interval: {settings.poll_interval_minutes} min\n" + f"• Discovered via probe: {len(state.get_all_discovered())}\n" + f"• ISO probing: {'enabled' if settings.enable_iso_probe else 'disabled'}\n" + f"• Alert window: {settings.alert_modified_hours}h\n" + f"• Cold cycle: 1/{settings.cold_cycle_divisor}" ) +def _handle_status(state: ProbeState, paper_count_fn, say, reply_opts: dict) -> None: + """Post loaded paper count, last poll, probe settings.""" + say(text=format_status_message(state, paper_count_fn), **reply_opts) + + +def enqueue_startup_status( + mq: MessageQueue, + state: ProbeState, + paper_count_fn, +) -> None: + """Post *status* summary to ``NOTIFICATION_CHANNEL`` once at process start.""" + channel = settings.notification_channel + if not channel: + return + mq.enqueue(channel, format_status_message(state, paper_count_fn)) + + def _handle_version(say, reply_opts: dict) -> None: """Post package version string.""" from . 
import __version__ diff --git a/tests/test_scout.py b/tests/test_scout.py index d4b13a6..e00e34a 100644 --- a/tests/test_scout.py +++ b/tests/test_scout.py @@ -19,6 +19,8 @@ _paper_link, _reply_opts, _show_watchlist, + enqueue_startup_status, + format_status_message, notify_channel, notify_users, register_handlers, @@ -483,6 +485,35 @@ def test_status_after_poll(self, fake_pool): assert "100" in text and "never" not in text +class TestFormatStatusMessage: + def test_matches_handle_status_output(self, fake_pool): + state = ProbeState(fake_pool) + say = MagicMock() + with patch("paperscout.scout.settings", _make_settings()): + expected = format_status_message(state, lambda: 42) + _handle_status(state, lambda: 42, say, {}) + assert say.call_args[1]["text"] == expected + + +class TestEnqueueStartupStatus: + def test_enqueues_when_channel_configured(self, fake_pool): + mq = MagicMock() + state = ProbeState(fake_pool) + with patch("paperscout.scout.settings", _make_settings(channel="C-alerts")): + enqueue_startup_status(mq, state, lambda: 7) + mq.enqueue.assert_called_once() + assert mq.enqueue.call_args[0][0] == "C-alerts" + assert "Paperscout Status" in mq.enqueue.call_args[0][1] + assert "7" in mq.enqueue.call_args[0][1] + + def test_skips_when_no_channel(self, fake_pool): + mq = MagicMock() + state = ProbeState(fake_pool) + with patch("paperscout.scout.settings", _make_settings(channel="")): + enqueue_startup_status(mq, state, lambda: 1) + mq.enqueue.assert_not_called() + + # ── register_handlers ─────────────────────────────────────────────────────────