Skip to content

fix(backup): verify DB dump and archive integrity to prevent silent corrupt backups#483

Open
mrrobot47 wants to merge 3 commits into
EasyEngine:developfrom
mrrobot47:fix/backup-integrity
Open

fix(backup): verify DB dump and archive integrity to prevent silent corrupt backups#483
mrrobot47 wants to merge 3 commits into
EasyEngine:developfrom
mrrobot47:fix/backup-integrity

Conversation

@mrrobot47

Copy link
Copy Markdown
Member

Summary

Closes three silent-data-loss holes in the backup path so a failed/corrupt backup can never upload as "successful" and overwrite a good one.

Fixes

  1. mysqldump failure produced a silent empty backup. The dump runs via a shell > redirect, which truncates/creates the target before mysqldump runs — so a failed dump left a 0-byte file that passed the old fs->exists()-only check, got archived, and uploaded as success (after which cleanup_old_backups() could prune the good ones). The dump now runs via EE::launch('ee shell … --command=…') so its exit code is captured (EE::run_command() returns void and couldn't), and is validated by return_code === 0 AND exists AND filesize > 0.
  2. Unchecked mv could ship a database-less backup. After the dump, the staging mv into sql/ was unchecked; if it failed (cross-device, permissions, disk-full, …), sql/ was left empty, 7z u on an empty dir exits 0, and 7z t passed — a backup with no database shipping as success. The mv result is now checked and the destination asserted (exists + filesize > 0) before archiving.
  3. No archive integrity verification before upload. A new verify_archive_integrity() runs 7z t (treating exit ≥ 2 as failure, so a non-fatal 7z warning doesn't abort) on every archive rclone_upload() ships — the primary <site>.zip, conf.zip (nginx + php), and the optional custom-docker-compose zip (warn-and-exclude, preserving its optional semantics) — aborting before upload if any is corrupt.

Plus: DB credentials are escapeshellarg()-quoted in the mysqldump command (matching restore_db()/get_db_size()); the comment notes this is best-effort layer-1 quoting (the inner ee shell bash -c "…" layer still can't carry arbitrary `/"/$).

Verification

php -l passes. Reviewed by two independent reviewers (correctness + design); both empirically confirmed in containers that ee shell --command propagates the inner exit code, that a failed mv/empty sql/ slips past 7z t (the merge-blocker, now fixed), and that the exit-code check catches mid-stream mysqldump failures (OOM→137, lost-connection→3, disk-full→5).

Merge note

Built on develop independently of the EasyDash callback PR (#481); both happened to introduce error code 4003 for different failures, so whichever merges second should renumber its 4003 to a free code (trivial). This PR's 4003 = "database dump failed to stage" (ERROR_TYPE_DATABASE).

Three related fixes in Site_Backup_Restore.php:

- backup_db(): the mysqldump shell redirect (`> file`) creates/truncates
  the target before mysqldump runs, so a failed dump left a 0-byte file
  that passed exists() and was uploaded as a "successful" backup. Capture
  the dump's exit code (via `ee shell` through EE::launch) and assert
  filesize() > 0 before treating the dump as valid.
- Add a `7z t` integrity test (verify_archive_integrity()) on the primary
  site/wp-content archive after creation and before rclone_upload(), so a
  corrupt archive aborts instead of overwriting a good remote backup.
- Escape DB user/password/host/name with escapeshellarg() in the mysqldump
  command, matching restore_db()/get_db_size().
Follow-up hardening from review of the backup-integrity changes:

- backup_db(): the dump-staging `mv` ran with its return value ignored.
  A failed mv (cross-device, permissions, disk-full, etc.) left sql/ empty;
  `7z u` on an empty dir exits 0 and the new `7z t` check passes, so a
  backup with NO database shipped as "successful". Now check the mv return
  and assert the destination exists and is non-empty, failing loudly
  otherwise. Also escapeshellarg the mv arguments.
- verify_archive_integrity(): use EE::launch with `return_code < 2` instead
  of EE::exec, so a non-fatal 7z warning (exit 1) no longer aborts the
  whole backup. Matches every other 7z call in this file.
- Extend integrity verification to the config archives: backup_nginx_conf()
  and backup_php_conf() now verify conf.zip, and the optional custom
  docker-compose archive is integrity-tested too (warn-and-exclude, since
  that archive is optional) so no archive ships to remote storage unverified.
- Soften the mysqldump credential-escaping comment: it is best-effort
  layer-1 quoting; the inner `ee shell` bash -c wrapper still cannot carry
  arbitrary shell metacharacters.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the site backup workflow to prevent silent corruption from being treated as a successful backup, ensuring bad artifacts are detected before they can be uploaded and overwrite good backups.

Changes:

  • Capture and validate mysqldump exit status and ensure the resulting dump file exists and is non-empty.
  • Check mv staging of the dump into sql/ and assert the staged dump exists and is non-empty before archiving.
  • Add 7z t integrity verification for produced archives (including optional custom docker-compose archive, with warn-and-exclude semantics).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/helper/Site_Backup_Restore.php Outdated
Comment on lines +692 to +695
// Best-effort layer-1 quoting of DB credentials (consistent with restore_db()/get_db_size()).
// NOTE: the value still passes through a second double-quoted `bash -c "$command"` layer
// inside `ee shell` that escapeshellarg cannot protect, so a password containing ` " or $
// can still break the dump. Fully hardening that inner wrapper is out of scope here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ac60000 — the comment now references only get_db_size() (which does use escapeshellarg()), since restore_db() uses manual single-quoting. (restore_db() itself is switched to escapeshellarg() in the separate restore PR.)

The comment claimed consistency with restore_db(), which uses manual single-quoting rather than escapeshellarg(); reference get_db_size() (which does) so the note isn't misleading.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants