Skip to content

Persist compact JSON freeze-frame on fault confirm#491

Open
mfaferek93 wants to merge 4 commits into
mainfrom
feat/freeze-frame-json
Open

Persist compact JSON freeze-frame on fault confirm#491
mfaferek93 wants to merge 4 commits into
mainfrom
feat/freeze-frame-json

Conversation

@mfaferek93

@mfaferek93 mfaferek93 commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

On the transition to CONFIRMED, snapshot capture now also persists a compact JSON freeze-frame of the captured topic values to SQLite, keyed by fault_code and retained across clear_fault (unlike rosbag snapshots).

  • Capture set is config-driven via the existing per-fault-code snapshot topic resolution; a fault code with no capture configuration writes no freeze-frame row (a lookup returns not-found).
  • Rows survive clear_fault, startup reclassification, and a DB reopen; a genuine re-capture replaces the row.
  • Note: timer-driven confirmations do not enqueue snapshot capture (existing capture-path scope), so they produce no freeze-frame either.

Tested: freeze-frame unit + storage suites on both backends and capture-pool end-to-end; full fault_manager suite passes.

Closes #476

On a transition to CONFIRMED, write a compact JSON dict of the resolved
capture-set topic values to a new SQLite freeze_frames table, keyed by
fault_code and retained across clear_fault. Values come from the captured
/plc/* topics via SnapshotCapture::resolve_topics; an unconfigured fault
code resolves to no capture set, so no frame is written (no overhead).

Refs #476
Copilot AI review requested due to automatic review settings July 2, 2026 11:00

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds persistent “freeze-frame” capture support to the fault manager so that, on snapshot capture, a compact JSON object of captured topic values is stored per fault_code and retained across clear_fault and DB reopen (SQLite + in-memory), with accompanying unit tests.

Changes:

  • Extend FaultStorage with store_freeze_frame() / get_freeze_frame() and implement them for SQLite and in-memory backends.
  • Persist a new freeze_frames SQLite table and write freeze-frame JSON during SnapshotCapture::capture().
  • Add unit tests covering freeze-frame storage semantics (replace-on-recapture, survives clear, persists across reopen) and an end-to-end capture test that validates JSON content.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/ros2_medkit_fault_manager/test/test_sqlite_storage.cpp Adds SQLite freeze-frame storage tests (CRUD, replace, survive clear, persist reopen).
src/ros2_medkit_fault_manager/test/test_snapshot_capture.cpp Adds end-to-end capture test that validates freeze-frame JSON content; adds “unconfigured/disabled” cases.
src/ros2_medkit_fault_manager/test/test_fault_manager.cpp Adds in-memory freeze-frame storage behavior tests.
src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp Creates freeze_frames table and implements store/get queries.
src/ros2_medkit_fault_manager/src/snapshot_capture.cpp Accumulates per-capture JSON dict and persists it as a freeze-frame keyed by fault_code.
src/ros2_medkit_fault_manager/src/fault_storage.cpp Implements in-memory freeze-frame persistence.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/sqlite_fault_storage.hpp Adds freeze-frame API overrides to SQLite storage interface.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/snapshot_capture.hpp Updates capture helpers to also fill a freeze-frame JSON object.
src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_storage.hpp Introduces FreezeFrameData and adds freeze-frame API to FaultStorage.
src/ros2_medkit_fault_manager/CMakeLists.txt Adds std_msgs dependency for snapshot capture test publisher usage.
Comments suppressed due to low confidence (1)

src/ros2_medkit_fault_manager/src/snapshot_capture.cpp:330

  • capture_topic_from_cache() holds cache_mutex_ while calling storage_->store_snapshot() and while parsing JSON for the freeze-frame. On the background-capture path this can block the subscription callback from updating the cache, increasing contention and latency. Copy the cached payload under the lock, then release the lock before SQLite I/O and JSON parsing.
  std::lock_guard<std::mutex> lock(cache_mutex_);

  auto it = message_cache_.find(topic);
  if (it == message_cache_.end() || it->second.data.empty()) {
    RCLCPP_DEBUG(node_->get_logger(), "No cached data for topic '%s'", topic.c_str());
    return false;
  }

  const auto & cached = it->second;

Comment thread src/ros2_medkit_fault_manager/src/snapshot_capture.cpp
Comment thread src/ros2_medkit_fault_manager/test/test_snapshot_capture.cpp
Document that an unconfigured fault code writes no freeze_frames row
(lookup returns nullopt); no empty {} row is ever persisted. Join the
test publisher thread via RAII so an assertion failure cannot terminate.

Refs #476

@bburda bburda left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings outside the diff:

  • README not updated: ros2_medkit_fault_manager/README.md documents the snapshot subsystem, including that snapshots are deleted on clear, but not the new freeze-frame captured on the same confirm and retained across clear_fault - the exact contrasting behavior that section already describes. Per the docs gate, add it.

  • Two test-coverage gaps on the new code: the background_capture=true cache path (freeze_frame[topic] = json::parse(cached.data) with the raw-string fallback) is never exercised - the new e2e test uses background_capture=false only; and the empty-{} row case (a configured fault whose topics all fail to sample writes a {} frame, so get_freeze_frame() is non-null) is not asserted, though OnDemandCaptureHandlesNonExistentTopic already drives that path and could assert it in one line.

Comment thread src/ros2_medkit_fault_manager/src/snapshot_capture.cpp
A re-confirm with source publishers down captured nothing and replaced
the retained frame with {}. Skip the store when nothing was captured and
a non-empty frame exists; also cover the background-cache and empty
first-capture paths in tests.

Refs #476
… clear

clear_fault deletes per-topic snapshots, so the retained freeze_frames
row was written but never read. get_fault now falls back to it when no
snapshots remain. Document freeze-frame semantics and code-bounded
storage in the README and storage interface.

Refs #476
@mfaferek93

Copy link
Copy Markdown
Collaborator Author

Both outside-the-diff points addressed in 88f54a7: the fault_manager README now documents the freeze-frame (retained across clear, contrasted with snapshots), and the two test gaps are covered - BackgroundCaptureCachesFreezeFrame exercises the background cache path incl. the json parse branch, and OnDemandCaptureHandlesNonExistentTopic now asserts the {} row.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compact JSON freeze-frame in fault snapshots

3 participants