Skip to content

fix(pipeline): preserve ADR across full re-index (#516)fix(pipeline): preserve ADR across full re-index (#516)#539

Open
RithvikReddy0-0 wants to merge 2 commits into
DeusData:mainfrom
RithvikReddy0-0:fix/516-adr-persistence
Open

fix(pipeline): preserve ADR across full re-index (#516)fix(pipeline): preserve ADR across full re-index (#516)#539
RithvikReddy0-0 wants to merge 2 commits into
DeusData:mainfrom
RithvikReddy0-0:fix/516-adr-persistence

Conversation

@RithvikReddy0-0

Copy link
Copy Markdown

What does this PR do?

Fixes #516. ADRs stored via manage_adr were silently deleted on a full
re-index. Captures the ADR before the DB is deleted and restores it after the
rebuild. Only the full-dump path was affected (incremental never rewrites the DB).

Verified: index → store ADR → change/add files → re-index → ADR survives
(previously returned no_adr).

Checklist

  • Every commit is signed off (git commit -s) — required, CI rejects
    unsigned commits (DCO, see CONTRIBUTING.md)
  • Tests pass locally (make -f Makefile.cbm test)
  • Lint passes (make -f Makefile.cbm lint-ci)
  • New behavior is covered by a test (reproduce-first for bug fixes)

Built and manually verified in WSL; ran clang-format. Full make test requires
the Unix toolchain and timed out locally — happy to add a regression test if
pointed to the right test file.

manage_adr stores ADRs in project_summaries, but a full re-index
(triggered by file changes or new files) deletes the DB in
try_incremental_or_delete_db and rebuilds it from the graph buffer,
which writes an empty project_summaries table. file_hashes were
re-persisted after the rebuild but project_summaries were not, so the
ADR was silently lost.

Fix: capture the ADR before the DB is unlinked, stash it on the
pipeline struct, and restore it after the rebuilt DB is reopened in
dump_and_persist_hashes. The incremental path is unaffected (it never
rewrites the DB). Verified: ADR now survives a full re-index.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
@DeusData

Copy link
Copy Markdown
Owner

Thanks @RithvikReddy0-0 — the ADR save/restore around the delete/rebuild path is the right fix for #516.

Two things before merge:

  1. Add a reproduce-first test (the checklist boxes are unchecked) — index a repo with an ADR, trigger a full re-index, assert the ADR survives. That's the regression guard the project requires for every fix.
  2. Please double-check the lifetime of p->saved_adr: on an early-exit path (an error before dump_and_persist_hashes), is it freed? If the pipeline teardown doesn't free(p->saved_adr), that's a leak — add the free on the appropriate path.

(For info: the doubled commit title is just a cosmetic git artifact — the diff itself is clean and not duplicated.) 🙏

@DeusData

Copy link
Copy Markdown
Owner

Confirming the two concerns from a closer pass:

  1. The saved_adr leak is real — on the cbm_gbuf_dump_to_sqlite failure early-return in dump_and_persist_hashes, p->saved_adr is freed neither there nor in cbm_pipeline_free, so it leaks on that error path. Free it on that path (or add it to cbm_pipeline_free).
  2. cbm_store_adr_store's return value isn't checked, so a failed restore silently drops the ADR — the same symptom as the original bug. Please surface that error.

Both can go in alongside the reproduce-first test. 🙏

…ata#516)

Addresses review on DeusData#539.

- Free p->saved_adr in cbm_pipeline_free so it is not leaked on error
  paths that exit before the restore in dump_and_persist_hashes (e.g. a
  cbm_gbuf_dump_to_sqlite failure).
- Check cbm_store_adr_store's return value on restore and log an error
  instead of silently dropping the ADR (the original DeusData#516 symptom).
- Add reproduce-first test pipeline_adr_survives_full_reindex: index,
  store an ADR, force a full re-index by adding files, assert the ADR
  survives unchanged. Passes against the fix.

Signed-off-by: RithvikReddy0-0 <rithvikreddymukkara@gmail.com>
@RithvikReddy0-0

Copy link
Copy Markdown
Author

Thanks — all three addressed in 2f434d4.

Leak. free(p->saved_adr) now lives in cbm_pipeline_free, so it's released on every exit path including the cbm_gbuf_dump_to_sqlite failure early-return. The successful-restore path still sets it to NULL after freeing in dump_and_persist_hashes, so there's no double-free.

Unchecked restore. cbm_store_adr_store's return is now checked; a failed restore logs pipeline.err phase=adr_restore instead of silently dropping the ADR.

Test. Added pipeline_adr_survives_full_reindex (reproduce-first): index a repo, store an ADR, force a full re-index by adding files past the incremental threshold, assert the ADR is present and unchanged. Passes against the fix (pipeline_adr_survives_full_reindex PASS).

One note from running the full suite locally: 5619 passed, 1 failed, and the single failure is incr_full_index (tests/test_incremental.c:302 — RSS delta 2560MB vs 2048MB ceiling on a 1100-file project). That's unrelated to this change — the fix only adds a strdup of an ≤8KB ADR string, which can't move RSS by ~512MB. It's the ASAN/UBSan-instrumented test binary inflating memory past a threshold tuned for a native non-instrumented run, and the file isn't touched by this PR. Flagging it for transparency rather than claiming a clean sweep.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

manage_adr data loss: ADRs in project_summaries are deleted during graph re-indexing

2 participants