repository: make check and delete work at N=1 with sha256 pack ids by mr-raj12 · Pull Request #9783 · borgbackup/borg

mr-raj12 · 2026-06-16T10:04:05Z

Description

borg check and per-object delete both assumed that a pack file's name is the chunk_id it holds. That is true only at N=1 while pack ids equal chunk ids. With BORG_TESTONLY_SHA256_PACK_ID=1 (and later at N>1) a pack is named sha256(pack_bytes), so the file name is the pack_id, not the chunk_id.

Two places broke under that:

Repository.check() rebuilt the chunk index from the bare packs/ listing and used the file name as the chunk_id. With sha256 pack ids it indexed every object under the wrong id and wrote that wrong index back to the repo cache, so the archive check that runs next reported all chunks as missing.
Repository.delete(id) deleted packs/{id}, which does not exist once the pack is named by content.

The object header already carries the chunk_id, meta size and data size (repoobj.py), so check can recover the real (chunk_id, offset, size) of each object from the header instead of trusting the file name, and delete can find the pack through the chunk index.

Changes:

repoobj.py: add RepoObj.iter_object_headers(), which walks a pack's object headers and yields (chunk_id, obj_offset, obj_size). Handles one object per pack (N=1) and several (N>1).
repository.py: check() rebuilds the index from iter_object_headers(); delete() resolves the pack via the chunk index, drops it, and removes the id from the index.
repository_test.py: give test objects a real chunk_id so identical payloads do not collapse into one pack under sha256, and make the two pack_id == chunk_id assertions aware of the switch.
check_cmd_test.py: test_empty_repository drops packs through the store, since list() yields pack ids, not chunk ids.

Verified with the local test subset, with and without BORG_TESTONLY_SHA256_PACK_ID=1: the repository-level checks pass under the switch and nothing regresses without it. Archive-level check --repair under sha256 pack ids is left for a separate change.

refs #8572

Checklist

PR is against master
New code has tests and docs where appropriate
Tests pass (ran the relevant test subset)
Commit messages are clean and reference related issues

check rebuilds the chunk index from object headers (real chunk_id) instead of the pack file name; delete resolves the pack via the chunk index, not pack_id==chunk_id.

codecov · 2026-06-16T10:25:12Z

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.88%. Comparing base (2eef6c8) to head (28c817b).
⚠️ Report is 5 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/borg/repository.py	71.42%	1 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #9783   +/-   ##
=======================================
  Coverage   84.88%   84.88%           
=======================================
  Files          92       92           
  Lines       15172    15180    +8     
  Branches     2275     2277    +2     
=======================================
+ Hits        12879    12886    +7     
- Misses       1589     1591    +2     
+ Partials      704      703    -1

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

ThomasWaldmann · 2026-06-16T10:14:32Z

        pytest.skip("only works locally")
    check_cmd_setup(archiver)
    with Repository(archiver.repository_location, exclusive=True) as repository:
+        # list() yields pack_ids (== the packs/* file names), so drop the packs directly via the


Repository.list yielded chunk ids. That it yields pack ids is just because it has not been adapted to packs yet, so that is a bug, not a feature.

ThomasWaldmann · 2026-06-16T10:17:00Z

+            # embed the chunk_id in the object header so each object's bytes -- and thus its
+            # sha256 pack id -- are unique; identical bytes would otherwise collapse to a single
+            # pack under BORG_TESTONLY_SHA256_PACK_ID.
+            repository.put(H(x), fchunk(b"SOMEDATA", chunk_id=H(x)))  # put() updates _chunks via PackWriter


better move that comment to fchunk definition.

ThomasWaldmann · 2026-06-16T10:20:50Z

+        Each object's identity and extent are recovered from its on-disk header, so callers do
+        not have to assume the pack file name equals the chunk_id. That assumption only holds at
+        N=1 without sha256 pack ids; once a pack is named sha256(pack_bytes) (N>1, or with the
+        BORG_TESTONLY_SHA256_PACK_ID switch) the file name is the pack_id, not the chunk_id.
+        Works for a single-object pack (N=1) and for multi-object packs (N>1).
+        """


you will have to fix/remove all that text talking about the N=1 packid==chunkid hack later.

ThomasWaldmann · 2026-06-16T10:30:23Z

+        key = "packs/" + bin_to_hex(entry.pack_id)
        try:
            self.store.delete(key)
        except StoreObjectNotFound:
            raise self.ObjectNotFound(id, str(self._location)) from None
+        del self.chunks[id]


I repeat myself: do not introduce code that is very harmful for the general case.

The API user wants to delete one chunk and you delete the whole pack here.

Did you read my last review? Did you write this code yourself? Did you use AI and not read its output?

Maybe just do the index lookup, log a warning like "ignoring deletion of CHUNKID in PACKID" and just do nothing else for now until we have a better idea.

Also: TODO fix this later

repository: make check and delete work at N=1 with sha256 pack_ids

28c817b

check rebuilds the chunk index from object headers (real chunk_id) instead of the pack file name; delete resolves the pack via the chunk index, not pack_id==chunk_id.

ThomasWaldmann requested changes Jun 16, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

repository: make check and delete work at N=1 with sha256 pack ids#9783

repository: make check and delete work at N=1 with sha256 pack ids#9783
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:repository-check-delete-sha256-pack-ids

mr-raj12 commented Jun 16, 2026

Uh oh!

codecov Bot commented Jun 16, 2026 •

edited

Loading

Uh oh!

ThomasWaldmann Jun 16, 2026

Uh oh!

ThomasWaldmann Jun 16, 2026

Uh oh!

ThomasWaldmann Jun 16, 2026

Uh oh!

ThomasWaldmann Jun 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mr-raj12 commented Jun 16, 2026

Description

Checklist

Uh oh!

codecov Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ThomasWaldmann Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

ThomasWaldmann Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 16, 2026 •

edited

Loading

ThomasWaldmann Jun 16, 2026 •

edited

Loading