Skip to content

fix(onvif_motion_recording): replace blocking cond_wait with timedwait to fix recording never stops#353

Merged
matteius merged 1 commit intoopensensor:mainfrom
origin2000:fix/motion-recording-never-stops
Apr 3, 2026
Merged

fix(onvif_motion_recording): replace blocking cond_wait with timedwait to fix recording never stops#353
matteius merged 1 commit intoopensensor:mainfrom
origin2000:fix/motion-recording-never-stops

Conversation

@origin2000
Copy link
Copy Markdown
Contributor

Problem

ONVIF motion-triggered recordings never stopped after motion ended.

The root cause is in pop_event() in src/video/onvif_motion_recording.c.
The original implementation used pthread_cond_wait() which blocks indefinitely
when the event queue is empty:

After a "motion stopped" event was processed, last_motion_time was updated —
but the thread immediately went back to sleep. With no further events arriving,
update_recording_state() was never called again, so:

  • The 2-second grace period (RECORDING → FINALIZING) was never evaluated
  • The post-buffer timeout (FINALIZING → stop) was never evaluated
  • stop_motion_recording_internal() was never called
  • The recording ran indefinitely

This is a classic sleeping state machine bug: a state machine that depends
on elapsed time to trigger transitions, but only advances when a new external
event arrives.

Fix

Replace pthread_cond_wait with pthread_cond_timedwait using a 1-second
timeout. When the timeout fires, pop_event() returns a new code -2
("periodic tick"). The event loop handles this by iterating over all active
recording contexts and calling update_recording_state() on each — without
requiring an incoming event.

Return-code contract for pop_event():

Return Meaning
0 Event successfully dequeued — process it
-1 Shutting down — exit the event loop
-2 1-second timeout — tick the state machine, then continue

Files changed

  • src/video/onvif_motion_recording.c

Testing

Tested on Rock Pi 4C+ running the lightNVR Docker container with a TP-Link
C545D dual-lens camera (wide-angle lens as ONVIF master, PTZ lens as RTSP
slave via motion_trigger_source). Recordings now reliably stop after the
configured post-buffer period following the last detected motion event.

Notes

The old pthread_cond_wait code is preserved as a commented block with
inline // OLD CODE / // <-- bug: annotations for historical context.

…t to fix recording never stops

The event processor thread used pthread_cond_wait() which blocks indefinitely
when the event queue is empty. After a 'motion stopped' event was processed,
the thread went back to sleep and never called update_recording_state() again.
This meant the RECORDING->FINALIZING->stop transitions were never evaluated
and recordings ran forever.

Fix: replace pthread_cond_wait with pthread_cond_timedwait (1s timeout).
On timeout, pop_event() returns -2 (periodic tick) and the event loop
calls update_recording_state() for all active recording contexts, driving
the state machine forward even when no new events arrive.

Tested on Rock Pi 4C+ with TP-Link C545D camera (wide ONVIF master +
PTZ RTSP slave via motion_trigger_source). Recordings now stop reliably
after the configured post-buffer period.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 2, 2026

✅ All contributors have signed the CLA. Thank you!
Posted by the CLA Assistant Lite bot.

@origin2000
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

1 similar comment
@origin2000
Copy link
Copy Markdown
Contributor Author

I have read the CLA Document and I hereby sign the CLA

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes ONVIF motion-triggered recordings that never stop by ensuring the motion-recording state machine continues to advance even when no new events arrive.

Changes:

  • Replace indefinite pthread_cond_wait() in pop_event() with pthread_cond_timedwait() (1s timeout) and introduce a -2 “periodic tick” return code.
  • Update the event processor loop to handle the new timeout return by ticking update_recording_state() across active contexts.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +161 to +167
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += 1;
int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts);
if (rc == ETIMEDOUT) {
pthread_mutex_unlock(&event_queue.mutex);
return -2; // periodic tick
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pop_event(): pthread_cond_timedwait() errors other than ETIMEDOUT/0 (e.g., EINVAL) are currently ignored, which can lead to a tight loop or silently skipping shutdown semantics. Handle non-timeout errors explicitly (log + unlock + return -1, or map to a distinct error code) and consider checking clock_gettime()'s return value as well.

Suggested change
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += 1;
int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts);
if (rc == ETIMEDOUT) {
pthread_mutex_unlock(&event_queue.mutex);
return -2; // periodic tick
}
if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
log_error("pop_event: clock_gettime failed: %s", strerror(errno));
pthread_mutex_unlock(&event_queue.mutex);
return -1;
}
ts.tv_sec += 1;
int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts);
if (rc == ETIMEDOUT) {
pthread_mutex_unlock(&event_queue.mutex);
return -2; // periodic tick
}
if (rc != 0) {
log_error("pop_event: pthread_cond_timedwait failed: %s", strerror(rc));
pthread_mutex_unlock(&event_queue.mutex);
return -1;
}

Copilot uses AI. Check for mistakes.
Comment on lines 279 to 298
char dir_path[MAX_PATH_LENGTH];
snprintf(dir_path, sizeof(dir_path), "%s/%s/%s/%s/%s",
config->storage_path, stream_path, year, month, day);
config->storage_path, stream_name, year, month, day);

// Create directories if they don't exist
char temp_path[MAX_PATH_LENGTH];
snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_path);
snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_name);
mkdir(temp_path, 0755);

snprintf(temp_path, sizeof(temp_path), "%s/%s/%s", config->storage_path, stream_path, year);
snprintf(temp_path, sizeof(temp_path), "%s/%s/%s", config->storage_path, stream_name, year);
mkdir(temp_path, 0755);

snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/%s", config->storage_path, stream_path, year, month);
snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/%s", config->storage_path, stream_name, year, month);
mkdir(temp_path, 0755);

mkdir(dir_path, 0755);

// Generate full file path
snprintf(path, path_size, "%s/%s_%s_motion.mp4", dir_path, stream_path, timestamp);
snprintf(path, path_size, "%s/%s_%s_motion.mp4", dir_path, stream_name, timestamp);

Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generate_recording_path(): stream_name is now used directly in filesystem paths (mkdir/snprintf) after removing sanitize_stream_name(). This reintroduces path traversal / invalid-path risks (e.g., "../", slashes, spaces, non-ASCII) and is inconsistent with other path-building code that sanitizes stream names. Please restore sanitization (or otherwise validate/encode stream_name) before using it in directory/file names.

Copilot uses AI. Check for mistakes.
Comment on lines +519 to +525
pthread_mutex_lock(&contexts_mutex);
for (int i = 0; i < g_config.max_streams; i++) {
if (recording_contexts[i].active && recording_contexts[i].enabled) {
pthread_mutex_unlock(&contexts_mutex);
update_recording_state(&recording_contexts[i], now);
pthread_mutex_lock(&contexts_mutex);
}
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event_processor_thread_func(): the periodic tick loop reads recording_contexts[i].enabled while only holding contexts_mutex. enabled is mutated under ctx->mutex elsewhere (e.g., enable_motion_recording), so this is a data race (UB). Either drop the enabled check (update_recording_state() already checks enabled under ctx->mutex) or read enabled under the per-context mutex / make it atomic.

Copilot uses AI. Check for mistakes.
Comment on lines 253 to 286
static int generate_recording_path(const char *stream_name, char *path, size_t path_size) {
if (!stream_name || !path) {
return -1;
}

// Make sure we're using a valid path.
char stream_path[MAX_STREAM_NAME];
sanitize_stream_name(stream_name, stream_path, MAX_STREAM_NAME);

// Get storage path from config
extern config_t* get_streaming_config(void);
const config_t *config = get_streaming_config();
if (!config) {
log_error("Failed to get configuration");
return -1;
}

// Create timestamp string
time_t now = time(NULL);
struct tm tm_buf;
const struct tm *tm_info = localtime_r(&now, &tm_buf);
char timestamp[32];
strftime(timestamp, sizeof(timestamp), "%Y%m%d_%H%M%S", tm_info);

// Create directory structure: /recordings/camera_name/YYYY/MM/DD/
char year[8], month[4], day[4];
strftime(year, sizeof(year), "%Y", tm_info);
strftime(month, sizeof(month), "%m", tm_info);
strftime(day, sizeof(day), "%d", tm_info);

char dir_path[MAX_PATH_LENGTH];
snprintf(dir_path, sizeof(dir_path), "%s/%s/%s/%s/%s",
config->storage_path, stream_path, year, month, day);
config->storage_path, stream_name, year, month, day);

// Create directories if they don't exist
char temp_path[MAX_PATH_LENGTH];
snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_path);
snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_name);
mkdir(temp_path, 0755);
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR scope: this change set modifies recording path generation (removes sanitize_stream_name and changes directory/file naming) in addition to the timedwait fix described in the PR. If this path change is intentional, it should be called out in the PR description; if not, it should be reverted to keep the PR focused and avoid unintended storage layout/security changes.

Copilot uses AI. Check for mistakes.
@matteius
Copy link
Copy Markdown
Contributor

matteius commented Apr 3, 2026

@origin2000 Thanks for this PR -- Co-pilot found a few things, perhaps we could get them resolved before I merge?

@matteius matteius merged commit 34a8b34 into opensensor:main Apr 3, 2026
5 of 6 checks passed
matteius added a commit that referenced this pull request Apr 3, 2026
- Validate clock_gettime() return and handle unexpected pthread_cond_timedwait()
  errors in pop_event() to prevent tight loops
- Restore sanitize_stream_name() in generate_recording_path() to prevent
  path traversal via crafted stream names
- Remove racy read of ctx->enabled under contexts_mutex (update_recording_state
  already checks it under the per-context mutex)
- Remove commented-out OLD CODE blocks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@origin2000
Copy link
Copy Markdown
Contributor Author

It's fine for now. I'm working on db timestamps (start_time / end_time). lots is caused by the single onvif endpoint that has events from both lens and lightnvr polling with wide and ptz channel. in addition there is no VideoSourceConfigurationToken to allow for distinction of the two sources which creates a consumer-racing-problem. stay tuned. will be camera agnostic. source is done. test runs on my local 32.3. will need test on 32.4 too. I'm in Munich, Germany so we have a time difference ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants