Skip to content

feat(common): multi-network streaming query execution#2112

Open
Theodus wants to merge 2 commits intomainfrom
theodus/multi-network-stream
Open

feat(common): multi-network streaming query execution#2112
Theodus wants to merge 2 commits intomainfrom
theodus/multi-network-stream

Conversation

@Theodus
Copy link
Copy Markdown
Member

@Theodus Theodus commented Apr 8, 2026

Implement timestamp-aligned streaming for queries spanning multiple networks. Materialized multi-network output is not included in this PR.

Multi-network execution:

  • Branch on blocks_tables.len() for single vs multi-network paths
  • Add blocks_table_fetch_by_timestamp() for timestamp-keyed block lookups
  • Add latest_src_watermark_multi() for cross-network watermark consensus
  • Shared timestamp window: start_ts = min(all starts), end_ts = min(all ends)
  • Rewind all networks on any single-network reorg
  • Reject end_block for multi-network queries in spawn()
  • Force WatermarkColumn::Ts for multi-network (block numbers are incomparable across networks)

Integration tests:

  • Add anvil_rpc_a/anvil_rpc_b manifest fixtures for two-network testing
  • Add set_block_timestamp_interval() and set_next_block_timestamp() to Anvil fixture for deterministic block timestamps
  • Three tests with deterministic timestamp alignment: UNION ALL across networks, CROSS JOIN across networks, and reorg on one network triggering rewind-all behavior

@Theodus Theodus requested review from LNSD and leoyvens April 8, 2026 15:23
@Theodus Theodus force-pushed the theodus/multi-network-stream branch from 189c35b to 9d4f764 Compare April 8, 2026 16:06
@Theodus Theodus mentioned this pull request Apr 8, 2026
16 tasks
@Theodus Theodus marked this pull request as draft April 8, 2026 22:30
Implement timestamp-aligned streaming for queries spanning multiple
networks. Materialized multi-network output is not included in this PR.

Multi-network execution:
- Branch on `blocks_tables.len()` for single vs multi-network paths
- Add `blocks_table_fetch_by_timestamp()` for timestamp-keyed block
lookups
- Add `latest_src_watermark_multi()` for cross-network watermark
consensus
- Shared timestamp window: `start_ts = min(all starts)`,
`end_ts = min(all ends)`
- Rewind all networks on any single-network reorg
- Reject `end_block` for multi-network queries in `spawn()`
- Force `WatermarkColumn::Ts` for multi-network (block numbers are
incomparable across networks)

Integration tests:
- Add `anvil_rpc_a`/`anvil_rpc_b` manifest fixtures for two-network
testing
- Add `set_block_timestamp_interval()` and
`set_next_block_timestamp()` to Anvil fixture for deterministic
block timestamps
- Three tests with deterministic timestamp alignment: UNION ALL
across networks, CROSS JOIN across networks, and reorg on one
network triggering rewind-all behavior
@Theodus Theodus force-pushed the theodus/multi-network-stream branch from 9d4f764 to 457509b Compare April 8, 2026 22:59
@Theodus Theodus marked this pull request as ready for review April 8, 2026 23:22
///
/// In single-network mode this value is a block count. In multi-network mode it
/// is interpreted as a second-based interval because block numbers are
/// incomparable across networks.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means what we want it to mean. Microbatch sizes are a social construct anyways 😄

Best thing would be to start thinking about making these sizes adaptive somehow.

// The cursor stores the previous end block's timestamp, but the Delta
// filter uses an inclusive lower bound (_ts >= start). Fetch the actual
// start block to get the correct timestamp and avoid re-including the
// previous batch's rows.
Copy link
Copy Markdown
Collaborator

@leoyvens leoyvens Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra SQL queries in this path are never great. The trick we did for block numbers was to set the start as prev.number + 1, under the semantics that the start block number is not required to correspond to a block that exists. Could a similar trick apply here, prev.timestamp + 1?

I remember we rediscussed theses semantics when skipped Solana slots actually came up, but don't recall exactly where we landed. But the number + 1 code is still there 🤷

Copy link
Copy Markdown
Member Author

@Theodus Theodus Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not great. I'm hoping to find a way to optimize as a followup, which is part of the reason I've kept the single-network path isolated from these changes.

_block_num + 1 is still a valid lower bound for the scan, even with block-number gaps. _ts + 1 is different, because adjacent segments can legitimately share the same timestamp (especially for sub-second block times, which we normalize to whole seconds). I ran into that on local anvil chains while preparing the demo.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that makes sense.

But just in terms of readability the way the timestamp is handled in SegmentStart is quite hacky, a dead value is written in next_microbatch_start_for_network only to be overwritten here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be improved in 6bbb40a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants