Skip to content

repository: make check and delete work at N=1 with sha256 pack ids#9783

Open
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:repository-check-delete-sha256-pack-ids
Open

repository: make check and delete work at N=1 with sha256 pack ids#9783
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:repository-check-delete-sha256-pack-ids

Conversation

@mr-raj12

@mr-raj12 mr-raj12 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Description

borg check, delete and compact all assumed that a pack file's name is the chunk_id it holds. That is true only at N=1 while pack ids equal chunk ids. With BORG_TESTONLY_SHA256_PACK_ID=1 (and later at N>1) a pack is named sha256(pack_bytes), so the file name is the pack_id, not the chunk_id.

The object header already carries the chunk_id, meta size and data size (repoobj.py), so the chunk index can be rebuilt from the headers (real chunk_ids) instead of the file names, and removal can find the pack through the index.

Changes:

  • repoobj.py: add RepoObj.iter_object_headers(), which walks a pack's object headers and yields (chunk_id, obj_offset, obj_size). Works for one object per pack (N=1) and for several (N>1).
  • repository.py:
    • delete() becomes a logged no-op. A pack may hold several objects, so we can not remove one object by dropping its pack without losing the others; real removal goes through store_delete at the pack level.
    • list() iterates the chunk index and yields chunk_ids, instead of listing packs/ (which would yield pack_ids).
  • cache.py: build_chunkindex_from_repo() rebuilds the index from store_list + iter_object_headers (real chunk_id/offset/size) rather than calling Repository.list(), which would recurse into the very index it is building. Adds an init_flags argument so compact can start the chunks as unused and compute usage itself.
  • compact_cmd.py: --stats builds the index from a real header scan instead of a pack_id == chunk_id fake; the delete loop drops each unused chunk's pack via store_delete.
  • archive.py / debug_cmd.py: check --repair and debug delete-obj resolve the pack via the chunk index and then store_delete it; debug delete-obj maps a missing pack (stale index entry) back to "not found".
  • tests: a delete_chunk helper drops a chunk's pack at the store level (used by the check / extract / mount damage tests), since Repository.delete no longer removes anything; test_empty_repository drops packs through the store; test objects get a unique chunk_id so identical payloads do not collapse into one pack under sha256.

Verified with the local test subset, with and without BORG_TESTONLY_SHA256_PACK_ID=1: the repository- and archive-level missing-chunk checks pass under the switch and nothing regresses without it. Archive --verify-data corruption detection under sha256 pack ids (a few check --verify-data tests) is left for a separate change.

refs #8572

Checklist

  • PR is against master
  • New code has tests and docs where appropriate
  • Tests pass (ran the relevant test subset)
  • Commit messages are clean and reference related issues

@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 84.21053% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.84%. Comparing base (2eef6c8) to head (4121509).
⚠️ Report is 14 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/borg/cache.py 69.23% 3 Missing and 1 partial ⚠️
src/borg/repository.py 83.33% 1 Missing and 2 partials ⚠️
src/borg/archive.py 50.00% 1 Missing ⚠️
src/borg/archiver/debug_cmd.py 88.88% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9783      +/-   ##
==========================================
- Coverage   84.88%   84.84%   -0.05%     
==========================================
  Files          92       92              
  Lines       15172    15166       -6     
  Branches     2275     2281       +6     
==========================================
- Hits        12879    12867      -12     
- Misses       1589     1594       +5     
- Partials      704      705       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

Comment thread src/borg/testsuite/archiver/check_cmd_test.py Outdated
Comment thread src/borg/testsuite/repository_test.py Outdated
Comment thread src/borg/repoobj.py Outdated
Comment thread src/borg/repository.py Outdated
@mr-raj12 mr-raj12 force-pushed the repository-check-delete-sha256-pack-ids branch from 28c817b to f2ca1ff Compare June 16, 2026 20:34
A pack is named sha256(pack_bytes), not chunk_id. delete becomes a no-op
(real removal is store_delete at the pack level); list iterates the chunk
index; build_chunkindex_from_repo reads pack headers for real chunk_ids and
gains init_flags so compact computes usage; tests drop a chunk's pack.
@mr-raj12 mr-raj12 force-pushed the repository-check-delete-sha256-pack-ids branch from f2ca1ff to 4121509 Compare June 16, 2026 20:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants