Persist compact JSON freeze-frame on fault confirm#491
Conversation
On a transition to CONFIRMED, write a compact JSON dict of the resolved capture-set topic values to a new SQLite freeze_frames table, keyed by fault_code and retained across clear_fault. Values come from the captured /plc/* topics via SnapshotCapture::resolve_topics; an unconfigured fault code resolves to no capture set, so no frame is written (no overhead). Refs #476
There was a problem hiding this comment.
Pull request overview
Adds persistent “freeze-frame” capture support to the fault manager so that, on snapshot capture, a compact JSON object of captured topic values is stored per fault_code and retained across clear_fault and DB reopen (SQLite + in-memory), with accompanying unit tests.
Changes:
- Extend
FaultStoragewithstore_freeze_frame()/get_freeze_frame()and implement them for SQLite and in-memory backends. - Persist a new
freeze_framesSQLite table and write freeze-frame JSON duringSnapshotCapture::capture(). - Add unit tests covering freeze-frame storage semantics (replace-on-recapture, survives clear, persists across reopen) and an end-to-end capture test that validates JSON content.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_fault_manager/test/test_sqlite_storage.cpp | Adds SQLite freeze-frame storage tests (CRUD, replace, survive clear, persist reopen). |
| src/ros2_medkit_fault_manager/test/test_snapshot_capture.cpp | Adds end-to-end capture test that validates freeze-frame JSON content; adds “unconfigured/disabled” cases. |
| src/ros2_medkit_fault_manager/test/test_fault_manager.cpp | Adds in-memory freeze-frame storage behavior tests. |
| src/ros2_medkit_fault_manager/src/sqlite_fault_storage.cpp | Creates freeze_frames table and implements store/get queries. |
| src/ros2_medkit_fault_manager/src/snapshot_capture.cpp | Accumulates per-capture JSON dict and persists it as a freeze-frame keyed by fault_code. |
| src/ros2_medkit_fault_manager/src/fault_storage.cpp | Implements in-memory freeze-frame persistence. |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/sqlite_fault_storage.hpp | Adds freeze-frame API overrides to SQLite storage interface. |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/snapshot_capture.hpp | Updates capture helpers to also fill a freeze-frame JSON object. |
| src/ros2_medkit_fault_manager/include/ros2_medkit_fault_manager/fault_storage.hpp | Introduces FreezeFrameData and adds freeze-frame API to FaultStorage. |
| src/ros2_medkit_fault_manager/CMakeLists.txt | Adds std_msgs dependency for snapshot capture test publisher usage. |
Comments suppressed due to low confidence (1)
src/ros2_medkit_fault_manager/src/snapshot_capture.cpp:330
capture_topic_from_cache()holdscache_mutex_while callingstorage_->store_snapshot()and while parsing JSON for the freeze-frame. On the background-capture path this can block the subscription callback from updating the cache, increasing contention and latency. Copy the cached payload under the lock, then release the lock before SQLite I/O and JSON parsing.
std::lock_guard<std::mutex> lock(cache_mutex_);
auto it = message_cache_.find(topic);
if (it == message_cache_.end() || it->second.data.empty()) {
RCLCPP_DEBUG(node_->get_logger(), "No cached data for topic '%s'", topic.c_str());
return false;
}
const auto & cached = it->second;
Document that an unconfigured fault code writes no freeze_frames row
(lookup returns nullopt); no empty {} row is ever persisted. Join the
test publisher thread via RAII so an assertion failure cannot terminate.
Refs #476
bburda
left a comment
There was a problem hiding this comment.
Additional findings outside the diff:
-
README not updated:
ros2_medkit_fault_manager/README.mddocuments the snapshot subsystem, including that snapshots are deleted on clear, but not the new freeze-frame captured on the same confirm and retained across clear_fault - the exact contrasting behavior that section already describes. Per the docs gate, add it. -
Two test-coverage gaps on the new code: the
background_capture=truecache path (freeze_frame[topic] = json::parse(cached.data)with the raw-string fallback) is never exercised - the new e2e test usesbackground_capture=falseonly; and the empty-{}row case (a configured fault whose topics all fail to sample writes a{}frame, soget_freeze_frame()is non-null) is not asserted, thoughOnDemandCaptureHandlesNonExistentTopicalready drives that path and could assert it in one line.
A re-confirm with source publishers down captured nothing and replaced
the retained frame with {}. Skip the store when nothing was captured and
a non-empty frame exists; also cover the background-cache and empty
first-capture paths in tests.
Refs #476
… clear clear_fault deletes per-topic snapshots, so the retained freeze_frames row was written but never read. get_fault now falls back to it when no snapshots remain. Document freeze-frame semantics and code-bounded storage in the README and storage interface. Refs #476
|
Both outside-the-diff points addressed in 88f54a7: the fault_manager README now documents the freeze-frame (retained across clear, contrasted with snapshots), and the two test gaps are covered - BackgroundCaptureCachesFreezeFrame exercises the background cache path incl. the json parse branch, and OnDemandCaptureHandlesNonExistentTopic now asserts the {} row. |
On the transition to CONFIRMED, snapshot capture now also persists a compact JSON freeze-frame of the captured topic values to SQLite, keyed by fault_code and retained across clear_fault (unlike rosbag snapshots).
Tested: freeze-frame unit + storage suites on both backends and capture-pool end-to-end; full fault_manager suite passes.
Closes #476