
Raw callbacks (data records as bytes)#130

Closed
alexandervlpl wants to merge 4 commits into databento:main from Quantum-Signals:raw-callback

Conversation

@alexandervlpl

Description

This adds support for callbacks that handle data records as bytes via decode_raw() as implemented in databento/dbn#117 (see that PR for details).

This is a workaround for databento/dbn#116, but it is also a more efficient code path that eliminates ~10K object allocations per second for subscribers that don't need them.
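As a rough illustration of the raw code path (this is not the PR's actual implementation), a consumer of raw callbacks can frame records itself using the DBN record header, whose first byte gives the record length in 32-bit words:

```python
import struct

# Hypothetical sketch: split a DBN byte stream into per-record byte slices
# without decoding them into Python objects. In DBN, header byte 0 is the
# record length in 4-byte units.

def iter_raw_records(buf: bytes):
    """Yield each DBN record as a raw bytes slice, without decoding."""
    view = memoryview(buf)
    offset = 0
    while offset < len(view):
        length_words = view[offset]       # u8: record length in 4-byte units
        record_len = length_words * 4
        yield bytes(view[offset:offset + record_len])
        offset += record_len

# Two fake 16-byte records (length field = 4 words -> 16 bytes each).
rec = struct.pack("<BBHIQ", 4, 0xAB, 1, 42, 0)
records = list(iter_raw_records(rec + rec))
assert len(records) == 2 and records[0] == rec
```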

@alexandervlpl alexandervlpl changed the title Raw callback (data records as bytes) Raw callbacks (data records as bytes) Mar 31, 2026
@nmacholl
Collaborator

nmacholl commented Mar 31, 2026

While we don't merge public PRs, we will consider this as a feature request. I'll need some convincing though.

Is this use case not served by Live.add_stream()?

@alexandervlpl
Author

We've found that the callback path performs best for our use case, and getting bytes there would be useful. But I will take another look at add_stream(), thanks for the suggestion.

@alexandervlpl
Author

alexandervlpl commented Apr 1, 2026

Actually, I see that Python DBNRecord objects are created before dispatching via either callbacks or streams, so we gain nothing from add_stream(). And anything like symbol resolution, filtering, and metrics becomes much harder compared to the callback approach.

@nmacholl
Collaborator

nmacholl commented Apr 1, 2026

What is gained by going through the DBN decoder is:

  • non-fragmented DBN
  • validation
  • version upgrades

I still don't fully understand your use case, but there are other options here.
If the stream is not usable for you, you can implement a subclass of the DatabentoLiveProtocol.

@alexandervlpl
Author

alexandervlpl commented Apr 3, 2026

The idea is precisely to keep at least some of the decoder benefits you mention, and to avoid re-implementing all the fine work you've done on the protocol, while still providing a fast path that processes the actual data records with minimal overhead. In our case, this PR gives a ~2x reduction in slow-reader warnings, so significantly less skipping. We could probably achieve that with our own DatabentoLiveProtocol, but that looks like a lot of complex re-implementation, more than a simple subclass.

So to me the proposed split between decoded Python control (system) records and raw-bytes data records seems logical as an additional code path, by no means replacing the existing ones.

Our use case has been passing bytes(MBP10Msg) to a minimal native encoder while handling everything else (metrics, error handling, logging, filtering, etc.) in Python. But I can see some effective Python-only implementations as well.
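A minimal Python-only sketch of that control/data split (the rtype values and decode_control() helper below are illustrative placeholders, not authoritative DBN constants or library API):

```python
# Hypothetical dispatcher: decode control records into Python objects, but
# hand data records to the callback as raw bytes. In DBN, header byte 1 is
# the record type (rtype); the specific values below are illustrative.

CONTROL_RTYPES = {0x15, 0x16, 0x17}  # placeholder control-record types

def decode_control(record: bytes) -> dict:
    """Stand-in for a real DBN decoder; returns a minimal dict."""
    return {"rtype": record[1], "raw": record}

def dispatch(record: bytes, on_control, on_raw_data):
    rtype = record[1]                        # second header byte: rtype
    if rtype in CONTROL_RTYPES:
        on_control(decode_control(record))   # decoded Python object
    else:
        on_raw_data(record)                  # raw-bytes fast path

seen = []
data_record = bytes([4, 0x0A]) + bytes(14)   # fake 16-byte data record
dispatch(data_record, seen.append, seen.append)
assert seen == [data_record]                 # arrived as undecoded bytes
```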

@nmacholl
Collaborator

nmacholl commented Apr 3, 2026

For the live protocol, you would just need to re-implement the _process_dbn() function. Then you can use that protocol with the event loop's create_connection() directly, or simply replace the existing protocol with your own.
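Something along these lines, sketched with a stand-in base class since the real DatabentoLiveProtocol isn't reproduced here; only the _process_dbn() hook name comes from the comment above, the rest is hypothetical:

```python
# Hypothetical sketch of the suggested override. StandInLiveProtocol and
# RawProtocol are stand-ins for illustration; only the _process_dbn() name
# is taken from the discussion.

class StandInLiveProtocol:
    """Minimal stand-in for the real DatabentoLiveProtocol."""

    def data_received(self, data: bytes) -> None:
        # The real protocol feeds the network buffer into DBN decoding here.
        self._process_dbn(data)

    def _process_dbn(self, data: bytes) -> None:
        raise NotImplementedError  # real class decodes into DBNRecord objects

class RawProtocol(StandInLiveProtocol):
    """Forward the undecoded DBN buffer straight to a user callback."""

    def __init__(self, raw_callback):
        self.raw_callback = raw_callback

    def _process_dbn(self, data: bytes) -> None:
        self.raw_callback(data)  # skip per-record Python object creation

chunks = []
RawProtocol(chunks.append).data_received(b"\x04\x0a" + b"\x00" * 14)
assert chunks == [b"\x04\x0a" + b"\x00" * 14]
```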

That said, I am going to close this PR. I've created a roadmap item to track this idea here. You can vote or comment on it if you'd like.

I'm not keen on increasing the public API surface for what seems like a niche use case, but that may change in the future.

@nmacholl nmacholl closed this Apr 3, 2026
