Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 7 additions & 1 deletion docs/api/rest.rst
Original file line number Diff line number Diff line change
Expand Up @@ -707,11 +707,17 @@ Query and manage faults.
- ``freeze_frame``: Topic data captured at fault confirmation
- ``rosbag``: Recording file available via bulk-data endpoint

**Response codes:**

- **200:** Fault details
- **404:** Fault not found, or reported by an app outside this entity's scope
- **503:** Fault manager unavailable

``DELETE /api/v1/components/{id}/faults/{fault_code}``
Clear a fault.

- **204:** Fault cleared
- **404:** Fault not found
- **404:** Fault not found, or reported by an app outside this entity's scope

``DELETE /api/v1/components/{id}/faults``
Clear all faults for an entity.
Expand Down
7 changes: 5 additions & 2 deletions src/ros2_medkit_fault_manager/src/fault_manager_node.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -530,9 +530,12 @@ void FaultManagerNode::handle_clear_fault(
return;
}

// Process through correlation engine first (to get auto-clear list)
// Process through correlation engine first (to get auto-clear list).
// `skip_correlation_auto_clear` lets the caller opt out of cascade-clearing
// correlated symptom fault codes. Per-entity DELETE routes set it to true
// so they cannot reach across entity boundaries via the correlation graph.
std::vector<std::string> auto_cleared_codes;
if (correlation_engine_) {
if (correlation_engine_ && !request->skip_correlation_auto_clear) {
auto clear_result = correlation_engine_->process_clear(request->fault_code);
auto_cleared_codes = clear_result.auto_cleared_codes;
}
Expand Down
3 changes: 2 additions & 1 deletion src/ros2_medkit_gateway/CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,9 @@ Changelog for package ros2_medkit_gateway
Unreleased
----------

**Fixes:**
**Breaking Changes:**

* Per-entity fault routes are now correctly scoped to the entity's hosted apps. ``GET /api/v1/{entity-path}/faults/{fault_code}``, ``DELETE /api/v1/{entity-path}/faults/{fault_code}``, ``GET /api/v1/{entity-path}/faults``, and ``DELETE /api/v1/{entity-path}/faults`` previously fell back to a prefix match against the entity's ``namespace_path``; when that was empty (host-derived / synthetic components, manifest components without a ``namespace`` field, Areas, Functions, and Apps with a wildcard ``ros_binding.namespace_pattern``) the scope filter was silently disabled and the routes exposed - and on ``DELETE``, cleared - faults reported by apps that belonged to entirely different entities. All four handlers now resolve the addressed entity to its hosted-app FQN set (via the new ``HandlerContext::resolve_entity_source_fqns`` helper) and apply a strict all-sources scope check: a fault counts as in scope only when **every** entry in its ``reporting_sources`` is owned by the entity (exact FQN match, or strict path-child via ``<fqn>/...``). Per-fault routes return ``404 Resource Not Found`` for any fault that fails the check; collection routes return an empty ``items`` array. The underlying ``GetFault.srv`` contract is unchanged; ``ClearFault.srv`` gains a new ``skip_correlation_auto_clear`` request flag so per-entity DELETE can opt out of cascade-clearing correlated symptom fault codes that may live in other entities. Per-entity collection responses no longer include the global ``muted_count`` / ``cluster_count`` / ``muted_faults`` / ``clusters`` correlation metadata; those remain on the global ``GET /api/v1/faults`` route. Behavior changes visible to clients: (a) faults reported by apps outside the addressed entity are no longer returned or cleared via that entity's route, (b) **mixed-source** faults that include at least one out-of-entity reporter are likewise rejected with ``404`` on per-fault routes and excluded from per-entity collection responses (use the global ``GET /api/v1/faults`` to see them), (c) per-entity DELETE no longer cascade-clears correlated symptoms outside the entity (`#395 <https://github.com/selfpatch/ros2_medkit/issues/395>`_)
* ``GET /api/v1/updates/{id}/status`` no longer returns ``404`` for a registered-but-idle package; ``POST /api/v1/updates`` now seeds a ``pending`` status, so the endpoint returns ``200 {"status": "pending"}`` immediately after registration. ``404`` is reserved for packages that are not registered. Clients that used ``404`` as a signal for "registered but nothing started yet" must adapt (`#378 <https://github.com/selfpatch/ros2_medkit/issues/378>`_)

**Features:**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -79,8 +79,10 @@ class FaultManager {
/// Get a specific fault by code (JSON result - "fault" body only).
FaultResult get_fault(const std::string & fault_code, const std::string & source_id = "");

/// Clear a fault.
FaultResult clear_fault(const std::string & fault_code);
/// Clear a fault. When `skip_correlation_auto_clear` is true the fault
/// manager will not cascade-clear correlated symptom fault codes - per-entity
/// DELETE routes set this to keep their clear inside the entity boundary.
FaultResult clear_fault(const std::string & fault_code, bool skip_correlation_auto_clear = false);

/// Get snapshots for a fault (optional topic filter).
FaultResult get_snapshots(const std::string & fault_code, const std::string & topic = "");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,12 @@ class FaultServiceTransport {

virtual FaultResult get_fault(const std::string & fault_code, const std::string & source_id) = 0;

virtual FaultResult clear_fault(const std::string & fault_code) = 0;
/// Clear a fault by its fault_code.
/// `skip_correlation_auto_clear`, when true, asks the fault manager to NOT
/// cascade-clear correlated symptom fault codes. Per-entity DELETE routes
/// set this to true so the clear cannot reach faults reported by apps
/// outside the addressed entity via the correlation graph.
virtual FaultResult clear_fault(const std::string & fault_code, bool skip_correlation_auto_clear = false) = 0;

virtual FaultResult get_snapshots(const std::string & fault_code, const std::string & topic) = 0;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#pragma once

#include <nlohmann/json.hpp>
#include <set>
#include <string>

#include "ros2_medkit_gateway/http/handlers/handler_context.hpp"
Expand Down Expand Up @@ -105,6 +106,34 @@ class FaultHandlers {
const nlohmann::json & env_data_json,
const std::string & entity_path);

/**
* @brief Scope check used by per-entity fault routes.
*
* Returns true iff every entry in `fault["reporting_sources"]` is in scope
* for `source_fqns`. A source matches a scope FQN when it equals the FQN
* or is a strict path-child (i.e. `<fqn>/<...>`), so similarly named nodes
* like `/ns/node` and `/ns/node_extra` are not conflated.
*
* The "all sources must match" semantic (rather than "any source") is
* deliberate: it blocks two cross-entity escalation paths.
*
* 1. GET would otherwise return a response whose `reporting_sources` and
* environment data carry identities of nodes the caller has no
* business reading.
* 2. DELETE would otherwise let a viewer of entity A clear the aggregated
* fault record for a fault that entity B also reports, because the
* underlying `ClearFault.srv` has no scope argument.
*
* An empty scope set, an empty `reporting_sources` array, a missing
* `reporting_sources` field, or any non-string source entry all return
* false - there is no vacuous "all match" case.
*
* Public for direct unit testing; called by `handle_get_fault`,
* `handle_clear_fault`, and indirectly via `filter_faults_by_sources` by
* the collection routes.
*/
static bool fault_in_source_scope(const nlohmann::json & fault, const std::set<std::string> & source_fqns);

private:
HandlerContext & ctx_;
};
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include <nlohmann/json.hpp>
#include <optional>
#include <rclcpp/rclcpp.hpp>
#include <set>
#include <sstream>
#include <string>
#include <tl/expected.hpp>
Expand Down Expand Up @@ -339,6 +340,44 @@ class HandlerContext {
static std::vector<std::string> resolve_app_host_fqns(const ThreadSafeEntityCache & cache,
const std::vector<std::string> & app_ids);

/**
* @brief Resolve an entity to the set of source FQNs it owns.
*
* Returns the FQNs of every ROS 2 node within the entity's scope. Used by
* fault handlers to filter `reporting_sources` so that per-entity routes
* never expose faults reported from outside the addressed entity.
*
* Mapping per entity type:
* - App: the app's `effective_fqn()` (single entry, or empty set if unbound
* or if `ros_binding.namespace_pattern` is a wildcard - by design
* `effective_fqn()` returns "" for those, so the entity simply has no
* addressable ROS node and the scope check excludes every fault).
* - Component: `effective_fqn()` of every hosted app via
* `get_apps_for_component()`.
* - Area: `effective_fqn()` of every app under the area, walking
* `get_subareas()` recursively so nested areas (e.g. ``powertrain ->
* engine``) resolve to the union of their descendants' apps.
* - Function: `effective_fqn()` of every app the function hosts directly
* plus, for hosts that are component IDs, the apps inside those
* components.
* - Unknown: empty set.
*
* Apps whose `effective_fqn()` is empty are silently dropped from the set
* so the scope check cannot match arbitrary FQNs against an empty prefix.
*
* An empty result means "no apps are in scope" and callers must NEVER
* interpret that as "no filter" - any fault must be treated as out of
* scope. The exact response (404 for per-fault routes, an empty `items`
* array for collection lists, 204 for collection clears) is up to the
* caller.
*
* @param cache Entity cache for lookups
* @param entity Resolved entity info (from `validate_entity_for_route`)
* @return Set of FQNs that uniquely scopes faults to this entity
*/
static std::set<std::string> resolve_entity_source_fqns(const ThreadSafeEntityCache & cache,
const EntityInfo & entity);

private:
GatewayNode * node_;
CorsConfig cors_config_;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ class Ros2FaultServiceTransport : public FaultServiceTransport {

FaultResult get_fault(const std::string & fault_code, const std::string & source_id) override;

FaultResult clear_fault(const std::string & fault_code) override;
FaultResult clear_fault(const std::string & fault_code, bool skip_correlation_auto_clear) override;

FaultResult get_snapshots(const std::string & fault_code, const std::string & topic) override;

Expand Down
4 changes: 2 additions & 2 deletions src/ros2_medkit_gateway/src/core/managers/fault_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -41,8 +41,8 @@ FaultResult FaultManager::get_fault(const std::string & fault_code, const std::s
return transport_->get_fault(fault_code, source_id);
}

FaultResult FaultManager::clear_fault(const std::string & fault_code) {
return transport_->clear_fault(fault_code);
FaultResult FaultManager::clear_fault(const std::string & fault_code, bool skip_correlation_auto_clear) {
return transport_->clear_fault(fault_code, skip_correlation_auto_clear);
}

FaultResult FaultManager::get_snapshots(const std::string & fault_code, const std::string & topic) {
Expand Down
Loading
Loading