Skip to content

repository: make check and delete work at N=1 with sha256 pack ids#9783

Open
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:repository-check-delete-sha256-pack-ids
Open

repository: make check and delete work at N=1 with sha256 pack ids#9783
mr-raj12 wants to merge 1 commit into
borgbackup:masterfrom
mr-raj12:repository-check-delete-sha256-pack-ids

Conversation

@mr-raj12

Copy link
Copy Markdown
Contributor

Description

borg check and per-object delete both assumed that a pack file's name is the chunk_id it holds. That is true only at N=1 while pack ids equal chunk ids. With BORG_TESTONLY_SHA256_PACK_ID=1 (and later at N>1) a pack is named sha256(pack_bytes), so the file name is the pack_id, not the chunk_id.

Two places broke under that:

  • Repository.check() rebuilt the chunk index from the bare packs/ listing and used the file name as the chunk_id. With sha256 pack ids it indexed every object under the wrong id and wrote that wrong index back to the repo cache, so the archive check that runs next reported all chunks as missing.
  • Repository.delete(id) deleted packs/{id}, which does not exist once the pack is named by content.

The object header already carries the chunk_id, meta size and data size (repoobj.py), so check can recover the real (chunk_id, offset, size) of each object from the header instead of trusting the file name, and delete can find the pack through the chunk index.

Changes:

  • repoobj.py: add RepoObj.iter_object_headers(), which walks a pack's object headers and yields (chunk_id, obj_offset, obj_size). Handles one object per pack (N=1) and several (N>1).
  • repository.py: check() rebuilds the index from iter_object_headers(); delete() resolves the pack via the chunk index, drops it, and removes the id from the index.
  • repository_test.py: give test objects a real chunk_id so identical payloads do not collapse into one pack under sha256, and make the two pack_id == chunk_id assertions aware of the switch.
  • check_cmd_test.py: test_empty_repository drops packs through the store, since list() yields pack ids, not chunk ids.

Verified with the local test subset, with and without BORG_TESTONLY_SHA256_PACK_ID=1: the repository-level checks pass under the switch and nothing regresses without it. Archive-level check --repair under sha256 pack ids is left for a separate change.

refs #8572

Checklist

  • PR is against master
  • New code has tests and docs where appropriate
  • Tests pass (ran the relevant test subset)
  • Commit messages are clean and reference related issues

check rebuilds the chunk index from object headers (real chunk_id) instead of the pack file name; delete resolves the pack via the chunk index, not pack_id==chunk_id.
@codecov

codecov Bot commented Jun 16, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.88%. Comparing base (2eef6c8) to head (28c817b).
⚠️ Report is 5 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/borg/repository.py 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #9783   +/-   ##
=======================================
  Coverage   84.88%   84.88%           
=======================================
  Files          92       92           
  Lines       15172    15180    +8     
  Branches     2275     2277    +2     
=======================================
+ Hits        12879    12886    +7     
- Misses       1589     1591    +2     
+ Partials      704      703    -1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

pytest.skip("only works locally")
check_cmd_setup(archiver)
with Repository(archiver.repository_location, exclusive=True) as repository:
# list() yields pack_ids (== the packs/* file names), so drop the packs directly via the

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repository.list yielded chunk ids. That it yields pack ids is just because it has not been adapted to packs yet, so that is a bug, not a feature.

Comment on lines +82 to +85
# embed the chunk_id in the object header so each object's bytes -- and thus its
# sha256 pack id -- are unique; identical bytes would otherwise collapse to a single
# pack under BORG_TESTONLY_SHA256_PACK_ID.
repository.put(H(x), fchunk(b"SOMEDATA", chunk_id=H(x))) # put() updates _chunks via PackWriter

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better move that comment to fchunk definition.

Comment thread src/borg/repoobj.py
Comment on lines +45 to +50
Each object's identity and extent are recovered from its on-disk header, so callers do
not have to assume the pack file name equals the chunk_id. That assumption only holds at
N=1 without sha256 pack ids; once a pack is named sha256(pack_bytes) (N>1, or with the
BORG_TESTONLY_SHA256_PACK_ID switch) the file name is the pack_id, not the chunk_id.
Works for a single-object pack (N=1) and for multi-object packs (N>1).
"""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will have to fix/remove all that text talking about the N=1 packid==chunkid hack later.

Comment thread src/borg/repository.py
Comment on lines +822 to +827
key = "packs/" + bin_to_hex(entry.pack_id)
try:
self.store.delete(key)
except StoreObjectNotFound:
raise self.ObjectNotFound(id, str(self._location)) from None
del self.chunks[id]

@ThomasWaldmann ThomasWaldmann Jun 16, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I repeat myself: do not introduce code that is very harmful for the general case.

The API user wants to delete one chunk and you delete the whole pack here.

Did you read my last review? Did you write this code yourself? Did you use AI and not read its output?

Maybe just do the index lookup, log a warning like "ignoring deletion of CHUNKID in PACKID" and just do nothing else for now until we have a better idea.

Also: TODO fix this later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants