fix(backup): clean up staging/temp files on abnormal exit and tighten disk-space pre-check#482
Conversation
Resource-cleanup, disk-estimate accuracy and minor correctness fixes in
Site_Backup_Restore, all scoped to non-lock/non-archive concerns:
- Register shutdown handlers that purge the local staging dir
(EE_BACKUP_DIR/<site_url>) and the leftover temp SQL query files on any
abnormal exit. The success path clears the tracked state so the handlers
no-op; previously an error/crash leaked the half-built archive (disk fill)
or a stale partial download a later restore could reuse.
- pre_backup_check(): size the full set of dirs actually archived per site
type (app/ + config/, not just app/htdocs) plus DB, add headroom for the
transient uncompressed dump and growing archive, and fail up front when
free space is undeterminable instead of treating it as 0.
- dir_size()/get_db_size(): warn instead of silently counting an
unreadable/unparseable size as 0, so the estimate isn't quietly low.
- cleanup_old_backups(): drop the off-by-one (`> N + 1`) so retention
triggers at exactly N, matching the array_slice that keeps N.
- list_remote_backups(): filter blank lines so the empty-listing
("No remote backups found") path is actually reachable.
Address review findings on the cleanup/robustness changes: - restore(): set $staging_dir and register cleanup_staging_dir only AFTER pre_restore_check() has acquired this site's lock (mirroring backup()). Registering it earlier let an early exit (invalid backup id, or a lock held by another process) delete the staging dir a concurrent backup/restore of the same site was actively writing into -- data loss. - cleanup_staging_dir()/cleanup_temp_query_files(): wrap each shutdown safety-net body in try/catch ( \Throwable ). Symfony Filesystem::remove() throws IOException on a failed unlink (e.g. permission denied under the www-data-owned web root); a throw in the first shutdown handler would abort the remaining handlers, leaking the global lock and skipping the EasyDash failure callback. - pre_backup_check(): count the DB size once. $site_size already includes it, so multiplying ( $site_size + $db_size ) over-counted the DB ~2.2x and false-rejected DB-heavy sites; require ceil( $site_size * 1.1 ) instead. - dir_size(): build the recursive iterator with CATCH_GET_CHILD (and LEAVES_ONLY) so an untraversable subdirectory is skipped instead of throwing UnexpectedValueException and aborting the whole backup.
There was a problem hiding this comment.
Pull request overview
This PR hardens the backup/restore helper by ensuring staging artifacts and temporary query files are cleaned up on abnormal exits, and by improving disk-space preflight estimation/robustness so failures don’t leave behind large partial state.
Changes:
- Add shutdown-time cleanup for per-site staging directories and temporary SQL query files to prevent disk fill and stale partial artifacts after crashes/interrupts.
- Tighten disk-space pre-check sizing (include
app/+config/, add 10% headroom, guarddisk_free_space()), and makedir_size()resilient to missing/untraversable paths. - Fix minor correctness issues in remote backup listing (empty output) and retention logic (off-by-one).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| $this->fs->remove( EE_BACKUP_DIR . '/' . $this->site_data['site_url'] . '.lock' ); | ||
| EE::error( "Site app directory does not exist: $site_root/app" ); |
There was a problem hiding this comment.
Fixed in e92c356 — per-site lock removals now go through a best-effort try_remove_site_lock() (try/catch), so a Filesystem::remove() failure on an error path can no longer mask the intended EE::error().
| $this->fs->remove( EE_BACKUP_DIR . '/' . $this->site_data['site_url'] . '.lock' ); | ||
| EE::error( 'Unable to determine free disk space for backup directory.' ); |
There was a problem hiding this comment.
Fixed in e92c356 — per-site lock removals now go through a best-effort try_remove_site_lock() (try/catch), so a Filesystem::remove() failure on an error path can no longer mask the intended EE::error().
| $db_size = 0; | ||
| $site_size = $this->dir_size( $site_root . '/app' ); | ||
| $site_size += $this->dir_size( $site_root . '/config' ); | ||
|
|
||
| EE::debug( 'Site size (files): ' . $site_size ); |
There was a problem hiding this comment.
Fixed in e92c356 — inlined the single-use $db_size directly into $site_size.
Addresses code-review feedback: Filesystem::remove() throws on a failed unlink, so removing the lock on an error path could raise an uncaught exception and mask the intended EE::error(). Route all per-site lock removals through a best-effort try_remove_site_lock() helper. Also inline the single-use $db_size into $site_size.
Summary
Resource-cleanup, disk-estimate accuracy, and minor-correctness hardening for the backup/restore flow.
Changes
EE_BACKUP_DIR/<site>) and a restore's downloaded archive were removed only on success; any error/OOM/Ctrl-C left them to fill the disk (and a stale partial download could be reused). Aregister_shutdown_functionnow purges the staging dir on abnormal exit, with the tracked path cleared after the success-path remove so it only fires on failure. Inrestore()the tracking is armed only after the per-site lock is acquired, so a lock-conflict/bad-id early-exit can't delete a dir a concurrent operation of the same site is writing into.query.sql/db_size_query.sqlwritten into the live web root are removed inline; a shutdown handler is a safety net for an interrupt between write and remove. Both cleanup handlers are wrapped intry/catchso a failed unlink (Symfony'sFilesystem::remove()throws) can never abort the other shutdown handlers (lock release, dash callback).app/+config/, not justapp/htdocs), guardsfalse === disk_free_space()and a missingapp/, and warns (instead of silently counting 0) when a file or the DB size can't be read. Required space isceil(total_source * 1.1)(DB counted once — it's already in the source total — plus 10% slack), avoiding both the old under-count and a DB double-count that would false-reject DB-heavy sites.dir_size()robustness. Built withCATCH_GET_CHILD | LEAVES_ONLYso an untraversable subdirectory is skipped rather than throwingUnexpectedValueExceptionand aborting the whole backup; a missing dir returns 0 (some archived dirs are optional).cleanup_old_backups()off-by-one. Retention triggered atN+1; now triggers atNto match thearray_slicethat keepsN.list_remote_backups()empty result.explode(PHP_EOL, trim(''))yields[''], making the "No remote backups found" branch unreachable; now filters blank lines.Verification
php -lpasses. Reviewed by two independent reviewers (correctness + design); both must-fixes they raised — the concurrent-restore staging-dir ordering (data-loss), the unguarded shutdown handlers, the DB double-count false-reject, and thedir_sizeabort — are incorporated and confirmed.Merge note
Built on
develop(file-existence per-site lock). The parallel lock-robustness PR (#480) converts that lock toflock; the new disk-check error paths here still callfs->remove(<site>.lock)per the current pattern, so a trivial reconciliation will be needed depending on merge order.