fix(video): check encoder started up before trying to flush frames on destruction#5257

Open
Kishi85 wants to merge 1 commit into LizardByte:master from Kishi85:remove-video-frame-flushing-on-destruction

@Kishi85 Kishi85 commented Jun 6, 2026

Contributor

Description

This PR aims to fix #4943 by skipping the frame flush in ~avcodec_encode_session_t() when the encoder has not started up yet. Whether the encoder has started is determined by checking avcodec_ctx->frame_num > 0. This check is sufficient because the first frame sent sets up ctx->pic_end; without that, the following FFmpeg code segfaults:

        // Fix timestamps if we hit end-of-stream before the initial decode
        // delay has elapsed.
        if (ctx->input_order <= ctx->decode_delay)
            ctx->dts_pts_diff = ctx->pic_end->pts - ctx->first_pts;

This code sequence expects ctx->pic_end != NULL. If the encoder has not seen any frame yet, that expectation is not met and the code segfaults, because ctx->pic_start and ctx->pic_end are still NULL.
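To make the failure mode concrete, here is a minimal sketch using mock types. MockPicture, MockEncoder, and flush_if_started are hypothetical names invented for illustration; they only imitate the fields mentioned above and are not the FFmpeg or Sunshine structures. The mock reports the NULL access instead of crashing.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a queued picture; the real FFmpeg type has many more fields.
struct MockPicture {
  int64_t pts = 0;
};

// Hypothetical, heavily simplified model of the encoder state discussed above.
struct MockEncoder {
  int64_t frame_num = 0;     // frames submitted so far (imitates avcodec_ctx->frame_num)
  int64_t input_order = 0;
  int64_t decode_delay = 0;
  int64_t first_pts = 0;
  int64_t dts_pts_diff = 0;
  std::unique_ptr<MockPicture> pic_end;  // stays null until the first frame arrives

  // Submitting a frame sets up pic_end, as the first real frame does.
  void send_frame(int64_t pts) {
    if (!pic_end) {
      first_pts = pts;
    }
    pic_end = std::make_unique<MockPicture>();
    pic_end->pts = pts;
    ++frame_num;
    ++input_order;
  }

  // End-of-stream path. Returns false where the real code would dereference
  // a null pic_end; the mock reports the problem instead of segfaulting.
  bool send_eof() {
    if (input_order <= decode_delay) {
      if (!pic_end) {
        return false;
      }
      dts_pts_diff = pic_end->pts - first_pts;
    }
    return true;
  }
};

// The guard proposed in this PR: only flush once at least one frame was sent.
bool flush_if_started(MockEncoder &enc) {
  if (enc.frame_num > 0) {
    assert(enc.pic_end != nullptr);  // invariant: a sent frame implies pic_end is set
    return enc.send_eof();
  }
  return true;  // nothing was queued, so there is nothing to flush
}
```

In this model, calling send_eof() on a freshly constructed MockEncoder returns false (the crash case), while flush_if_started() skips the flush and returns true.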

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

@Kishi85 Kishi85 changed the title from "fix(video): remove frame flushing on destruction to avoid segfault due to racecondition" to "fix(video): remove frame flushing to avoid segfault due to racecondition" Jun 6, 2026
@Kishi85 Kishi85 force-pushed the remove-video-frame-flushing-on-destruction branch from ec0fcc7 to 4ccc01b June 6, 2026 08:14
@Kishi85 Kishi85 changed the title from "fix(video): remove frame flushing to avoid segfault due to racecondition" to "fix(video): remove explicit frame flushing to avoid segfault due to racecondition" Jun 6, 2026
@ReenigneArcher ReenigneArcher requested a review from cgutman June 6, 2026 23:04
@codecov

codecov Bot commented Jun 6, 2026

Bundle Report

Bundle size has no change ✅

@cgutman

cgutman commented Jun 21, 2026

Collaborator

This flushing was a workaround to address a memory leak each time streaming is started and stopped on AMD AMF encoders, so please test that to make sure you aren't reintroducing that.

I agree that it shouldn't be necessary, but it also shouldn't be crashing either. Flushing is a totally valid thing to do here. The crash probably means there's a deeper use-after-free or race condition that we're just hiding by removing the flushing code.

@Kishi85

Kishi85 commented Jun 22, 2026

Contributor Author

I also think this is some kind of race condition that occurs when the hw context is torn down (by closing the capture instance), as the specific problem here is accessing ctx->pic_end, which is NULL, as noted in #4943.

It would be interesting to know whether we have a way to ensure that this context is still valid when the frame flush runs (as that is where it segfaults), or whether the fix can be done solely in FFmpeg.

@Kishi85 Kishi85 force-pushed the remove-video-frame-flushing-on-destruction branch from 4ccc01b to 3731283 June 22, 2026 18:27
@Kishi85 Kishi85 changed the title from "fix(video): remove explicit frame flushing to avoid segfault due to racecondition" to "fix(video): check encoder started up before trying to flush frames on destruction" Jun 22, 2026
@Kishi85

Kishi85 commented Jun 22, 2026

Contributor Author

I think I've found a better solution and, finally, the root cause of this issue: the encoder has never seen a single frame (avcodec_ctx->frame_num is 0) when we try to flush frames. This also explains why ctx->pic_end is NULL when the segfault hits, as there was never a frame that could have been used to set up start/end.

So my updated solution is to check whether the encoder has started up before trying to flush frames, and to skip this step otherwise. The check is avcodec_ctx->frame_num > avcodec_ctx->delay. It was chosen specifically because of the following comment in the section that was segfaulting before:
https://github.com/FFmpeg/FFmpeg/blob/15504610b0dc12c56e5e9f94ff06c873382368f5/libavcodec/hw_base_encode.c#L508:

        // Fix timestamps if we hit end-of-stream before the initial decode
        // delay has elapsed.

This seems to fix the issue without removing the frame flushing entirely, leaving the workaround for AMD AMF intact except when the encoder has not seen enough frames. During encoder startup the delay seems to be 0 for the encoders I was able to test here, so in practice it just checks whether the encoder has seen a frame at all before flushing.

@cgutman Does this look like a better solution to you?

@Kishi85

Kishi85 commented Jun 26, 2026

Contributor Author

I'm not sure whether we could just check for avcodec_ctx->frame_num > 0 instead of waiting for the full encoder delay (which seems to be 0 for most encoders), skipping the segfaulting frame reordering. That will likely depend on whether ctx->pic_end is already set up after the first frame. That way we could be sure that we never leak a frame: even when the encoder delay has not yet fully passed we'd be able to drain without issues, which would make this logically the same as before, just without the segfault.

I'll have another look at the code, do another test run, and update the PR if this is working.

EDIT: Looks like ctx->pic_start and ctx->pic_end are set up from the first valid frame sent:
https://github.com/FFmpeg/FFmpeg/blob/15504610b0dc12c56e5e9f94ff06c873382368f5/libavcodec/hw_base_encode.c#L491-L497
That means it's enough to check for >0 to avoid the segfault.
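The difference between the two candidate checks can be sketched as follows. EncoderState and both function names are hypothetical and only model the two fields being compared; this is an illustration of the reasoning above, not code from the PR.

```cpp
#include <cstdint>

// Hypothetical snapshot of the two encoder context fields that the checks read.
struct EncoderState {
  int64_t frame_num;  // frames sent to the encoder so far
  int delay;          // encoder delay in frames
};

// Earlier revision of the check: flush only after the full delay has passed.
constexpr bool flush_after_delay(const EncoderState &s) {
  return s.frame_num > s.delay;
}

// Revised check: flush as soon as a single frame has been sent.
constexpr bool flush_after_first_frame(const EncoderState &s) {
  return s.frame_num > 0;
}
```

For an encoder with a delay of 2 that has received one frame, the first check would skip the flush and leave that frame undrained, while the second would flush it. Both checks skip the flush when no frame was sent, and they are identical whenever the delay is 0.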

@Kishi85 Kishi85 force-pushed the remove-video-frame-flushing-on-destruction branch from 3731283 to 5885788 June 26, 2026 06:42
@Kishi85

Kishi85 commented Jun 26, 2026

Contributor Author

Updated the PR to check that at least one frame has been sent to the encoder, which ensures ctx->pic_end is set up when flushing.
If no frame was sent there is nothing to flush, so that part is simply skipped. This avoids the segfault in FFmpeg while retaining the same overall logical behaviour.
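The resulting destructor behaviour can be illustrated with a mock, assuming an encoder that buffers one packet per submitted frame. MockCodec and MockSession are hypothetical names made up for this sketch; the return-code convention (0 for success) imitates the send/receive style used in the destructor, but none of this is the actual Sunshine or FFmpeg code.

```cpp
#include <queue>

// Hypothetical mock encoder that buffers one packet per submitted frame.
struct MockCodec {
  long frame_num = 0;      // frames submitted so far
  bool eof_sent = false;   // set once a flush was requested
  int drained = 0;         // packets pulled out during the flush
  std::queue<int> pending;

  void send_frame(int id) {
    pending.push(id);
    ++frame_num;
  }

  // Requests a flush; returns 0 on success.
  int send_eof() {
    eof_sent = true;
    return 0;
  }

  // Returns 0 and fills `out` while packets remain, non-zero once empty.
  int receive_packet(int &out) {
    if (pending.empty()) {
      return -1;
    }
    out = pending.front();
    pending.pop();
    return 0;
  }
};

// Hypothetical session wrapper whose destructor performs the guarded flush.
class MockSession {
public:
  explicit MockSession(MockCodec &codec): codec_(codec) {}
  MockSession(const MockSession &) = delete;
  MockSession &operator=(const MockSession &) = delete;

  ~MockSession() {
    // Skip the flush entirely when no frame was ever sent.
    if (codec_.frame_num > 0 && codec_.send_eof() == 0) {
      int pkt = 0;
      while (codec_.receive_packet(pkt) == 0) {
        ++codec_.drained;
      }
    }
  }

private:
  MockCodec &codec_;  // must outlive the session
};
```

Destroying a MockSession over an untouched MockCodec never calls send_eof(), while destroying one after two send_frame() calls drains both buffered packets.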

@psyke83

psyke83 commented Jun 26, 2026

Contributor

This still doesn't resolve crashing on mode or display switch with Vulkan on my system, but adding an arbitrary sleep to the destructor actually seems to resolve the issue. This is not the right solution but perhaps it might help identify the proper area to fix.

I tested the position of the sleep and it seems to work OK right before or after the flush block, but adding it after avcodec_ctx.reset() doesn't work.

diff --git a/src/video.cpp b/src/video.cpp
index 86e95f37..1dedb559 100644
--- a/src/video.cpp
+++ b/src/video.cpp
@@ -418,6 +418,7 @@ namespace video {
     avcodec_encode_session_t(avcodec_encode_session_t &&other) noexcept = default;
 
     ~avcodec_encode_session_t() {
+      std::this_thread::sleep_for(1ms);
       // Flush any remaining frames in the encoder if the encoder started up (frame num > 0)
       if (avcodec_ctx->frame_num > 0 && avcodec_send_frame(avcodec_ctx.get(), nullptr) == 0) {
         packet_raw_avcodec pkt;

The crash that occurs on master or with this PR (and no sleep insertion) is easily reproduced either by spamming the display switch key combo (it often happens on the first switch, and usually within about 5 switch attempts) or with this script:

#!/bin/bash

OUTPUT="1" # Change this to your output ID from kscreen-doctor -o
RES_A="1920x1080@60"
RES_B="3840x2160@60"
TIME=$1
SUNSHINE_PID="$(pidof sunshine)"

if [[ -z "$TIME" ]]; then
  TIME=5
fi

for i in {1..100}
do
   if [[ "$(pidof sunshine)" != "$SUNSHINE_PID" ]]; then
     echo "Sunshine PID changed, aborting"
     break
   fi
   echo "Iteration $i: Switching to $RES_B"
   kscreen-doctor output.$OUTPUT.mode.$RES_B
   sleep $TIME
   echo "Iteration $i: Switching to $RES_A"
   kscreen-doctor output.$OUTPUT.mode.$RES_A
   sleep $TIME
done

echo "Stress test complete."

Running with a 0 sec timeout triggers the crash every time on my system, but it can survive 50 or so of the 100 cycles sometimes. If you can't reproduce the issue, it might be an issue that's exposed with a slower CPU; my 5700X is getting old, so manually throttling your CPU frequency may help to reproduce the issue.

Finally, adding the sleep to the destructor on master branch alone doesn't resolve the display switch crashing, so your flush check is definitely helping the situation. There may be multiple reasons for these crashes and the flush check you added is definitely resolving at least one of them.

Edit: I can still reproduce a rare crash on display switch with the sleep directly after the flush block, but the sleep interval may simply be too short to work around the issue. When I have time I will test this more thoroughly and open a separate issue if I don't make any progress.

@Kishi85

Kishi85 commented Jun 26, 2026

Contributor Author

I've seen one instance of a crash that was specifically related to Vulkan, but its stack trace is entirely different from the one this PR fixes:

Stack trace of thread 139838:
#0  0x00005650d89c016f ff_vk_free_buf (sunshine + 0x63216f)
#1  0x00005650d89c023d free_data_buf (sunshine + 0x63223d)
#2  0x00005650d8a89d07 buffer_pool_flush (sunshine + 0x6fbd07)
#3  0x00005650d8a89f16 buffer_replace (sunshine + 0x6fbf16)
#4  0x00005650d896e1ce av_packet_unref (sunshine + 0x5e01ce)
#5  0x00005650d896e23a av_packet_free (sunshine + 0x5e023a)
#6  0x00005650d876a362 _ZN5video18packet_raw_avcodecD2Ev (sunshine + 0x3dc362)
#7  0x00005650d876a38a _ZN5video18packet_raw_avcodecD0Ev (sunshine + 0x3dc38a)
#8  0x00005650d8737106 _ZNKSt14default_deleteIN5video12packet_raw_tEEclEPS1_ (sunshine + 0x3a9106)
#9  0x00005650d87302cf _ZNSt10unique_ptrIN5video12packet_raw_tESt14default_deleteIS1_EED2Ev (sunshine + 0x3a22cf)
#10 0x00005650d871f8ce _ZN6stream20videoBroadcastThreadERN5boost4asio21basic_datagram_socketINS1_2ip3udpENS1_15any_io_executorEEE (sunshine + 0x3918ce)
#11 0x00005650d874ffde _ZSt13__invoke_implIvPFvRN5boost4asio21basic_datagram_socketINS1_2ip3udpENS1_15any_io_executorEEEEJSt17reference_wrapperIS6_EEET_St14__invoke_otherOT0_DpOT1_ (sun>
#12 0x00005650d874fcd3 _ZSt8__invokeIPFvRN5boost4asio21basic_datagram_socketINS1_2ip3udpENS1_15any_io_executorEEEEJSt17reference_wrapperIS6_EEENSt15__invoke_resultIT_JDpT0_EE4typeEOSD_D>
#13 0x00005650d874f8dd _ZNSt6thread8_InvokerISt5tupleIJPFvRN5boost4asio21basic_datagram_socketINS3_2ip3udpENS3_15any_io_executorEEEESt17reference_wrapperIS8_EEEE9_M_invokeIJLm0ELm1EEEEv>
#14 0x00005650d874f6e0 _ZNSt6thread8_InvokerISt5tupleIJPFvRN5boost4asio21basic_datagram_socketINS3_2ip3udpENS3_15any_io_executorEEEESt17reference_wrapperIS8_EEEEclEv (sunshine + 0x3c16e>
#15 0x00005650d874f580 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJPFvRN5boost4asio21basic_datagram_socketINS4_2ip3udpENS4_15any_io_executorEEEESt17reference_wrapperIS9_EEEEEE6_M_r>
#16 0x00005650d9925d56 execute_native_thread_routine (sunshine + 0x1597d56)
#17 0x00007fc04b2ac6ec n/a (libc.so.6 + 0xac6ec)
#18 0x00007fc04b3477bc n/a (libc.so.6 + 0x1477bc)

This PR only fixes the specific issue of ctx->pic_end being NULL in hw_base_encode_send_frame(), which causes a segfault in the destructor as explained in the description. In theory it can happen with any encoder (although I've only tried vaapi and vulkan).

Do you have a stack trace or systemd coredump with debug symbols by any chance? It would be interesting to see what it looks like, as it seems to be linked to the destructor call as well. If it's like my stack trace above, it might be a double free between the encoder and the av_packet network stack, which a sleep workaround would help with, and it was likely masked by the flush segfault before this PR was applied.

That new vulkan-only crash would be a separate issue with a separate fix though.

EDIT: Unfortunately, I also cannot reproduce this with your script on a 0 second timeout (even throttled). With 0 seconds it's green-screening from time to time, but it never crashes.

@psyke83

psyke83 commented Jun 26, 2026

Contributor

The problem is that I can't effectively test this PR with Vulkan due to the multiple conflicting issues - but I can confirm that it fixes VAAPI. The backtrace against master from a display switch crash shows the hw_base_encode_send_frame site:

#0  0x0000564cc8e495ee in hw_base_encode_send_frame (frame=<optimized out>, ctx=<optimized out>, avctx=<optimized out>) at libavcodec/hw_base_encode.c:508

⚠ warning: 508 libavcodec/hw_base_encode.c: No such file or directory
[Current thread is 1 (Thread 0x7f8b149fd6c0 (LWP 142011))]
(gdb) bt
#0  0x0000564cc8e495ee in hw_base_encode_send_frame (frame=<optimized out>, ctx=<optimized out>, avctx=<optimized out>) at libavcodec/hw_base_encode.c:508
#1  ff_hw_base_encode_receive_packet (ctx=0x7f8aedba2558, avctx=0x7f8aecac6e80, pkt=0x7f8aecb1d900) at libavcodec/hw_base_encode.c:587
#2  0x0000564cc8dd0044 in encode_receive_packet_internal (avctx=avctx@entry=0x7f8aecac6e80, avpkt=0x7f8aecb1d900) at libavcodec/encode.c:363
#3  0x0000564cc8dd0287 in avcodec_send_frame (avctx=0x7f8aecac6e80, frame=0x0) at libavcodec/encode.c:514
#4  0x0000564cc8c6e403 in video::avcodec_encode_session_t::~avcodec_encode_session_t (this=0x7f8aed40f890, this=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/video.cpp:422
#5  0x0000564cc8c80319 in video::avcodec_encode_session_t::~avcodec_encode_session_t (this=0x7f8aed40f890, this=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/video.cpp:430
#6  std::default_delete<video::encode_session_t>::operator() (this=<optimized out>, __ptr=0x7f8aed40f890) at /usr/include/c++/16/bits/unique_ptr.h:92
#7  std::unique_ptr<video::encode_session_t, std::default_delete<video::encode_session_t> >::~unique_ptr (this=<optimized out>, this=<optimized out>)
    at /usr/include/c++/16/bits/unique_ptr.h:408
#8  0x0000564cc8c6a6ed in video::encode_run (frame_nr=<synthetic pointer>: <optimized out>, mail=Python Exception <class 'gdb.error'>: value has been optimized out
, images=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., disp=Python Exception <class 'gdb.error'>: value has been optimized out

   , encode_device=std::unique_ptr<platf::encode_device_t> = {...}, reinit_event=..., encoder=<optimized out>, channel_data=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2446
#9  video::capture_async (mail=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., channel_data=0x7f8b2c0064e0) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2877
#10 video::capture (mail=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., channel_data=0x7f8b2c0064e0) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2907
#11 stream::videoThread (session=0x7f8b2c0064e0) at /usr/src/debug/sunshine/sunshine/src/stream.cpp:2097
#12 0x0000564cc9d05909 in execute_native_thread_routine ()
#13 0x00007f8b5f897739 in start_thread (arg=<optimized out>) at pthread_create.c:454
#14 0x00007f8b5f91bedc in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
(gdb) 

I'll move my UAF/deadlock issue with Vulkan, which is not targeted by this PR, into another PR or issue. For the purposes of testing, though, I added the 1ms sleep to the master branch to try to reproduce the hw_base_encode_send_frame-type crash on display switch with Vulkan, which seemed to be successful:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00005647d3f8262e in hw_base_encode_send_frame (frame=<optimized out>, ctx=<optimized out>, avctx=<optimized out>) at libavcodec/hw_base_encode.c:508

⚠ warning: 508 libavcodec/hw_base_encode.c: No such file or directory
[Current thread is 1 (Thread 0x7fd5c4e156c0 (LWP 151453))]
(gdb) bt
#0  0x00005647d3f8262e in hw_base_encode_send_frame (frame=<optimized out>, ctx=<optimized out>, avctx=<optimized out>) at libavcodec/hw_base_encode.c:508
#1  ff_hw_base_encode_receive_packet (ctx=0x7fd53f4fc218, avctx=0x7fd5b48d4540, pkt=0x7fd5b493b440) at libavcodec/hw_base_encode.c:587
#2  0x00005647d3f09084 in encode_receive_packet_internal (avctx=avctx@entry=0x7fd5b48d4540, avpkt=0x7fd5b493b440) at libavcodec/encode.c:363
#3  0x00005647d3f092c7 in avcodec_send_frame (avctx=0x7fd5b48d4540, frame=0x0) at libavcodec/encode.c:514
#4  0x00005647d3da7425 in video::avcodec_encode_session_t::~avcodec_encode_session_t (this=0x7fd5b48d6a70, this=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/utility.h:1015
#5  0x00005647d3db9359 in video::avcodec_encode_session_t::~avcodec_encode_session_t (this=0x7fd5b48d6a70, this=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/video.cpp:431
#6  std::default_delete<video::encode_session_t>::operator() (this=<optimized out>, __ptr=0x7fd5b48d6a70) at /usr/include/c++/16/bits/unique_ptr.h:92
#7  std::unique_ptr<video::encode_session_t, std::default_delete<video::encode_session_t> >::~unique_ptr (this=<optimized out>, this=<optimized out>)
    at /usr/include/c++/16/bits/unique_ptr.h:408
#8  0x00005647d3da36ed in video::encode_run (frame_nr=<synthetic pointer>: <optimized out>, mail=Python Exception <class 'gdb.error'>: value has been optimized out
, images=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., disp=Python Exception <class 'gdb.error'>: value has been optimized out

   , encode_device=std::unique_ptr<platf::encode_device_t> = {...}, reinit_event=..., encoder=<optimized out>, channel_data=<optimized out>) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2447
#9  video::capture_async (mail=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., channel_data=0x7fd5e4005c10) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2878
#10 video::capture (mail=Python Exception <class 'gdb.error'>: value has been optimized out
, config=..., channel_data=0x7fd5e4005c10) at /usr/src/debug/sunshine/sunshine/src/video.cpp:2908
#11 stream::videoThread (session=0x7fd5e4005c10) at /usr/src/debug/sunshine/sunshine/src/stream.cpp:2097
#12 0x00005647d4e3e949 in execute_native_thread_routine ()
#13 0x00007fd61e097739 in start_thread (arg=<optimized out>) at pthread_create.c:454
#14 0x00007fd61e11bedc in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

So I think I can confirm that your flush check is definitely working as intended for VAAPI and most likely for Vulkan too.

@Kishi85

Kishi85 commented Jun 26, 2026

Contributor Author

Thanks for confirming the fix and verifying that the Vulkan crash is indeed a separate issue. I'll keep an eye out for your issue/PR on that and will help as much as I can with fixing that one as well.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Coredump with SIGSEGV when switching displays

3 participants