fix(onvif_motion_recording): replace blocking cond_wait with timedwait to fix recording never stops#353
Conversation
…t to fix recording never stops The event processor thread used pthread_cond_wait() which blocks indefinitely when the event queue is empty. After a 'motion stopped' event was processed, the thread went back to sleep and never called update_recording_state() again. This meant the RECORDING->FINALIZING->stop transitions were never evaluated and recordings ran forever. Fix: replace pthread_cond_wait with pthread_cond_timedwait (1s timeout). On timeout, pop_event() returns -2 (periodic tick) and the event loop calls update_recording_state() for all active recording contexts, driving the state machine forward even when no new events arrive. Tested on Rock Pi 4C+ with TP-Link C545D camera (wide ONVIF master + PTZ RTSP slave via motion_trigger_source). Recordings now stop reliably after the configured post-buffer period.
|
✅ All contributors have signed the CLA. Thank you! |
|
I have read the CLA Document and I hereby sign the CLA |
1 similar comment
|
I have read the CLA Document and I hereby sign the CLA |
There was a problem hiding this comment.
Pull request overview
This PR fixes ONVIF motion-triggered recordings that never stop by ensuring the motion-recording state machine continues to advance even when no new events arrive.
Changes:
- Replace indefinite
pthread_cond_wait()inpop_event()withpthread_cond_timedwait()(1s timeout) and introduce a-2“periodic tick” return code. - Update the event processor loop to handle the new timeout return by ticking
update_recording_state()across active contexts.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| clock_gettime(CLOCK_REALTIME, &ts); | ||
| ts.tv_sec += 1; | ||
| int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts); | ||
| if (rc == ETIMEDOUT) { | ||
| pthread_mutex_unlock(&event_queue.mutex); | ||
| return -2; // periodic tick | ||
| } |
There was a problem hiding this comment.
pop_event(): pthread_cond_timedwait() errors other than ETIMEDOUT/0 (e.g., EINVAL) are currently ignored, which can lead to a tight loop or silently skipping shutdown semantics. Handle non-timeout errors explicitly (log + unlock + return -1, or map to a distinct error code) and consider checking clock_gettime()'s return value as well.
| clock_gettime(CLOCK_REALTIME, &ts); | |
| ts.tv_sec += 1; | |
| int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts); | |
| if (rc == ETIMEDOUT) { | |
| pthread_mutex_unlock(&event_queue.mutex); | |
| return -2; // periodic tick | |
| } | |
| if (clock_gettime(CLOCK_REALTIME, &ts) != 0) { | |
| log_error("pop_event: clock_gettime failed: %s", strerror(errno)); | |
| pthread_mutex_unlock(&event_queue.mutex); | |
| return -1; | |
| } | |
| ts.tv_sec += 1; | |
| int rc = pthread_cond_timedwait(&event_queue.cond, &event_queue.mutex, &ts); | |
| if (rc == ETIMEDOUT) { | |
| pthread_mutex_unlock(&event_queue.mutex); | |
| return -2; // periodic tick | |
| } | |
| if (rc != 0) { | |
| log_error("pop_event: pthread_cond_timedwait failed: %s", strerror(rc)); | |
| pthread_mutex_unlock(&event_queue.mutex); | |
| return -1; | |
| } |
| char dir_path[MAX_PATH_LENGTH]; | ||
| snprintf(dir_path, sizeof(dir_path), "%s/%s/%s/%s/%s", | ||
| config->storage_path, stream_path, year, month, day); | ||
| config->storage_path, stream_name, year, month, day); | ||
|
|
||
| // Create directories if they don't exist | ||
| char temp_path[MAX_PATH_LENGTH]; | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_path); | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_name); | ||
| mkdir(temp_path, 0755); | ||
|
|
||
| snprintf(temp_path, sizeof(temp_path), "%s/%s/%s", config->storage_path, stream_path, year); | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s/%s", config->storage_path, stream_name, year); | ||
| mkdir(temp_path, 0755); | ||
|
|
||
| snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/%s", config->storage_path, stream_path, year, month); | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s/%s/%s", config->storage_path, stream_name, year, month); | ||
| mkdir(temp_path, 0755); | ||
|
|
||
| mkdir(dir_path, 0755); | ||
|
|
||
| // Generate full file path | ||
| snprintf(path, path_size, "%s/%s_%s_motion.mp4", dir_path, stream_path, timestamp); | ||
| snprintf(path, path_size, "%s/%s_%s_motion.mp4", dir_path, stream_name, timestamp); | ||
|
|
There was a problem hiding this comment.
generate_recording_path(): stream_name is now used directly in filesystem paths (mkdir/snprintf) after removing sanitize_stream_name(). This reintroduces path traversal / invalid-path risks (e.g., "../", slashes, spaces, non-ASCII) and is inconsistent with other path-building code that sanitizes stream names. Please restore sanitization (or otherwise validate/encode stream_name) before using it in directory/file names.
| pthread_mutex_lock(&contexts_mutex); | ||
| for (int i = 0; i < g_config.max_streams; i++) { | ||
| if (recording_contexts[i].active && recording_contexts[i].enabled) { | ||
| pthread_mutex_unlock(&contexts_mutex); | ||
| update_recording_state(&recording_contexts[i], now); | ||
| pthread_mutex_lock(&contexts_mutex); | ||
| } |
There was a problem hiding this comment.
event_processor_thread_func(): the periodic tick loop reads recording_contexts[i].enabled while only holding contexts_mutex. enabled is mutated under ctx->mutex elsewhere (e.g., enable_motion_recording), so this is a data race (UB). Either drop the enabled check (update_recording_state() already checks enabled under ctx->mutex) or read enabled under the per-context mutex / make it atomic.
| static int generate_recording_path(const char *stream_name, char *path, size_t path_size) { | ||
| if (!stream_name || !path) { | ||
| return -1; | ||
| } | ||
|
|
||
| // Make sure we're using a valid path. | ||
| char stream_path[MAX_STREAM_NAME]; | ||
| sanitize_stream_name(stream_name, stream_path, MAX_STREAM_NAME); | ||
|
|
||
| // Get storage path from config | ||
| extern config_t* get_streaming_config(void); | ||
| const config_t *config = get_streaming_config(); | ||
| if (!config) { | ||
| log_error("Failed to get configuration"); | ||
| return -1; | ||
| } | ||
|
|
||
| // Create timestamp string | ||
| time_t now = time(NULL); | ||
| struct tm tm_buf; | ||
| const struct tm *tm_info = localtime_r(&now, &tm_buf); | ||
| char timestamp[32]; | ||
| strftime(timestamp, sizeof(timestamp), "%Y%m%d_%H%M%S", tm_info); | ||
|
|
||
| // Create directory structure: /recordings/camera_name/YYYY/MM/DD/ | ||
| char year[8], month[4], day[4]; | ||
| strftime(year, sizeof(year), "%Y", tm_info); | ||
| strftime(month, sizeof(month), "%m", tm_info); | ||
| strftime(day, sizeof(day), "%d", tm_info); | ||
|
|
||
| char dir_path[MAX_PATH_LENGTH]; | ||
| snprintf(dir_path, sizeof(dir_path), "%s/%s/%s/%s/%s", | ||
| config->storage_path, stream_path, year, month, day); | ||
| config->storage_path, stream_name, year, month, day); | ||
|
|
||
| // Create directories if they don't exist | ||
| char temp_path[MAX_PATH_LENGTH]; | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_path); | ||
| snprintf(temp_path, sizeof(temp_path), "%s/%s", config->storage_path, stream_name); | ||
| mkdir(temp_path, 0755); |
There was a problem hiding this comment.
PR scope: this change set modifies recording path generation (removes sanitize_stream_name and changes directory/file naming) in addition to the timedwait fix described in the PR. If this path change is intentional, it should be called out in the PR description; if not, it should be reverted to keep the PR focused and avoid unintended storage layout/security changes.
|
@origin2000 Thanks for this PR -- Co-pilot found a few things, perhaps we could get them resolved before I merge? |
- Validate clock_gettime() return and handle unexpected pthread_cond_timedwait() errors in pop_event() to prevent tight loops - Restore sanitize_stream_name() in generate_recording_path() to prevent path traversal via crafted stream names - Remove racy read of ctx->enabled under contexts_mutex (update_recording_state already checks it under the per-context mutex) - Remove commented-out OLD CODE blocks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
It's fine for now. I'm working on db timestamps (start_time / end_time). lots is caused by the single onvif endpoint that has events from both lens and lightnvr polling with wide and ptz channel. in addition there is no VideoSourceConfigurationToken to allow for distinction of the two sources which creates a consumer-racing-problem. stay tuned. will be camera agnostic. source is done. test runs on my local 32.3. will need test on 32.4 too. I'm in Munich, Germany so we have a time difference ;-) |
Problem
ONVIF motion-triggered recordings never stopped after motion ended.
The root cause is in
pop_event()insrc/video/onvif_motion_recording.c.The original implementation used
pthread_cond_wait()which blocks indefinitelywhen the event queue is empty:
After a "motion stopped" event was processed,
last_motion_timewas updated —but the thread immediately went back to sleep. With no further events arriving,
update_recording_state()was never called again, so:RECORDING → FINALIZING) was never evaluatedFINALIZING → stop) was never evaluatedstop_motion_recording_internal()was never calledThis is a classic sleeping state machine bug: a state machine that depends
on elapsed time to trigger transitions, but only advances when a new external
event arrives.
Fix
Replace
pthread_cond_waitwithpthread_cond_timedwaitusing a 1-secondtimeout. When the timeout fires,
pop_event()returns a new code-2("periodic tick"). The event loop handles this by iterating over all active
recording contexts and calling
update_recording_state()on each — withoutrequiring an incoming event.
Return-code contract for
pop_event():0-1-2Files changed
src/video/onvif_motion_recording.cTesting
Tested on Rock Pi 4C+ running the lightNVR Docker container with a TP-Link
C545D dual-lens camera (wide-angle lens as ONVIF master, PTZ lens as RTSP
slave via
motion_trigger_source). Recordings now reliably stop after theconfigured post-buffer period following the last detected motion event.
Notes
The old
pthread_cond_waitcode is preserved as a commented block withinline
// OLD CODE/// <-- bug:annotations for historical context.