trickle_publisher: `_resolve_next_seq` fallback drops first video segment + logs noisy warning

## Symptom

Two issues from the same code path in [`trickle_publisher.py:_resolve_next_seq`](https://github.com/livepeer/livepeer-python-gateway/blob/main/src/livepeer_gateway/trickle_publisher.py):

### (1) Noisy warning at every session start

```text
WARNING livepeer_gateway.trickle_publisher: Trickle /next missing Lp-Trickle-Latest header
```

Fires once per publisher (events + data + media-out) at every `LivePipeline` session start. Visible in every example pipeline (`live_grayscale`, `live_depth`, `live_transcribe`).

### (2) First video segment dropped on the `-out` channel

Real bug, not just noise. Orchestrator log signature:

```text
POST segment stream=<sid>-out idx=0 next=0      ← URL /-1, server resolved to slot 0
POST segment stream=<sid>-out idx=0 next=0      ← URL /0,  duplicate write to slot 0
POST segment stream=<sid>-out idx=0 next=0      ← retry after the drop
POST segment stream=<sid>-out idx=1 next=1      ← in-sync from here onwards
```

Runner-side log:

```text
WARNING ... MediaPublish[<sid>-out] dropped segment seq=0 mid-stream; draining pipe
livepeer_gateway.trickle_publisher.TrickleSegmentWriteError: Trickle segment writer timed out after 5.0s
WARNING ... Trickle segment close suppressed seq=0
```

The first MPEG-TS video segment never makes it through. Subsequent segments are fine because the publisher's local counter is back in sync with the server's `nextWrite` from segment 1.

## Root cause

The publisher's `_resolve_next_seq()` ([line 346](https://github.com/livepeer/livepeer-python-gateway/blob/main/src/livepeer_gateway/trickle_publisher.py#L346)) does:

```python
async def _resolve_next_seq(self) -> int:
    url = f"{self.url}/next"
    try:
        resp = await self._session.get(url)
        latest = resp.headers.get("Lp-Trickle-Latest")
        resp.release()
        if latest is not None:
            return int(latest)
        else:
            _LOG.warning("Trickle /next missing Lp-Trickle-Latest header")
    except Exception:
        _LOG.warning("Trickle /next request failed", exc_info=True)
    return -1
```

The `/next` endpoint is in [livepeer/go-livepeer#3884](https://github.com/livepeer/go-livepeer/pull/3884) (Josh's "Serverless" PR, still open) — both sides authored as a designed pair on the same day:

| When | Where | Author |
|---|---|---|
| 2026-03-26 01:11 PDT | `_resolve_next_seq` lands in this repo (commit `5007e2c`) | Josh Allmann |
| 2026-03-26 12:21 PDT | `/next` endpoint added to go-livepeer (commit `8da1c692`, in PR #3884) | Josh Allmann |

The Python side merged to `main`. The Go side hasn't merged to `master`. So on master, every probe hits a 400 (no `/next` route), no header is set, the warning fires, and `_resolve_next_seq` returns `-1`.

The caller in [`next()`](https://github.com/livepeer/livepeer-python-gateway/blob/main/src/livepeer_gateway/trickle_publisher.py#L382):

```python
if self.seq < 0:
    send_reset = True
    self.seq = await self._resolve_next_seq()    # = -1
self._next_state = await self.preconnect(self.seq, send_reset=send_reset)  # POST /-1
self.seq += 1                                    # = 0
preconnect_task = asyncio.create_task(self._preconnect_task(self.seq))     # POST /0 in background
```

Server-side, `getForWrite(idx == -1)` resolves locally to `nextWrite` (= 0 for a fresh channel) and creates `segment[0]`. The publisher's *next* POST then targets URL `/0` — server's `getForWrite(0)` finds the existing `segment[0]` and returns it. Two HTTP handlers now both writing to `segment[0]`. On the events channel the writes are tiny JSON heartbeats and the race is invisible. On the video `-out` channel the first POST is mid-stream when the second arrives → the server's segment-write semantics race → 5 s later the first writer trips its timeout → seq=0 dropped.

## Fix

One line, both symptoms covered:

```python
async def _resolve_next_seq(self) -> int:
    url = f"{self.url}/next"
    try:
        resp = await self._session.get(url)
        latest = resp.headers.get("Lp-Trickle-Latest")
        resp.release()
        if latest is not None:
            return int(latest)
        # Common pre-#3884 — server has no /next endpoint, so we get no header.
        # Treat as fresh channel and start at slot 0; matches pytrickle's approach.
        _LOG.debug("Trickle /next missing Lp-Trickle-Latest header — assuming fresh channel")
    except Exception:
        _LOG.warning("Trickle /next request failed", exc_info=True)
    return 0  # fresh-channel default; the -1 sentinel race causes duplicate POSTs to slot 0
```

Two changes:

1. **`return 0` (was `-1`)** — eliminates the duplicate POST → first video segment delivers cleanly.
2. **`debug` (was `warning`)** — eliminates the operator-visible noise, matches what PR [#6](https://github.com/livepeer/livepeer-python-gateway/pull/6) already does for symptom (1).

Both pre-#3884 (probe fails, fresh-channel default kicks in) and post-#3884 (probe succeeds, returns server-reported `nextWrite`) work correctly.

## Why not just wait for #3884

Option, but unbounded timeline. PR #6 already partially addresses (1) by demoting the warning, but doesn't touch the `return -1` line — so even after #6 merges into our branches, symptom (2) persists. The `return 0` change is the load-bearing fix.

## Caveat — resume semantics

Returning `0` on probe failure means a publisher reconnecting on a non-fresh channel pre-#3884 would overwrite slot 0 instead of resuming from `nextWrite`. **No regression**: today's `-1` path also breaks resume on master (the sentinel resolves to `nextWrite` server-side, then the publisher's local counter increments from -1 to 0, posting to `/0` and overwriting all existing segments). Resume only works post-#3884 anyway.

## Out of scope

- Removing `_resolve_next_seq()` entirely — it's correct and useful post-#3884 for resume.
- Server-side fixes — covered by [livepeer/go-livepeer#3884](https://github.com/livepeer/go-livepeer/pull/3884).

## Confirmed impact on the data channel

Observable in `live_transcribe`'s SSE-based caller test (`examples/runner/live_transcribe/test.sh`). Runner emits 5 transcripts; SSE subscriber on the gateway's `data_url` proxy receives only 3:

```
runner _LOG (all 5):                        SSE subscriber output (3 of 5):
  transcript[0]: ask not                      transcript[0]: ask not
  transcript[1]: What your country can do…    (missing — `seq=0` race)
  transcript[2]: ask what you can do for…     transcript[2]: ask what you can do for…
  transcript[3]: And so am I fellow Americans transcript[3]: And so am I fellow Americans
  transcript[4]: Ask!                         (missing — separate teardown race)
```

`transcript[1]` is dropped by exactly the bug above: the publisher's first POST goes to URL `/-1` (server resolves to slot 0), and the second POST goes to URL `/0` — server finds the existing `segment[0]` and races it. transcript[0]'s write wins; transcript[1]'s overwrites or is dropped depending on timing.

This matches the video-channel `seq=0 dropped + writer timed out 5.0s` we saw before, just with smaller payloads and different visibility. Same root cause, same one-line fix.

(`transcript[4]` is a separate teardown timing issue between the runner's `on_stream_stop` flush and the gateway's SSE end signal — not part of this issue.)

## Context

Surfaced while testing `live_transcribe` end-to-end. The noisy warning is in every example pipeline; the duplicate-POST loss is in any pipeline that emits on the video output channel OR the data channel. The data-channel impact is now directly observable via the SSE test in `live_transcribe`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

trickle_publisher: `_resolve_next_seq` fallback drops first video segment + logs noisy warning #12

Symptom

(1) Noisy warning at every session start

(2) First video segment dropped on the `-out` channel

Root cause

Fix

Why not just wait for #3884

Caveat — resume semantics

Out of scope

Confirmed impact on the data channel

Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

When	Where	Author
2026-03-26 01:11 PDT	`_resolve_next_seq` lands in this repo (commit `5007e2c`)	Josh Allmann
2026-03-26 12:21 PDT	`/next` endpoint added to go-livepeer (commit `8da1c692`, in PR #3884)	Josh Allmann

trickle_publisher: _resolve_next_seq fallback drops first video segment + logs noisy warning #12

Description

Symptom

(1) Noisy warning at every session start

(2) First video segment dropped on the -out channel

Root cause

Fix

Why not just wait for #3884

Caveat — resume semantics

Out of scope

Confirmed impact on the data channel

Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

trickle_publisher: `_resolve_next_seq` fallback drops first video segment + logs noisy warning #12

(2) First video segment dropped on the `-out` channel