PS-11136: non-GTID transactions cause one storage flush per transaction, bypassing size/interval checkpointing#127

Open
kamil-holubicki wants to merge 1 commit into Percona-Lab:0.2 from kamil-holubicki:PS-11136

@kamil-holubicki
Collaborator

https://perconadev.atlassian.net/browse/PS-11136

Problem:
In non-GTID (anonymous transaction) replication mode, PBS flushes its in-memory event buffer to the storage backend on every transaction boundary, ignoring the configured 'checkpoint_size_bytes' and 'checkpoint_interval_seconds' thresholds. For object-store backends this turns into one PUT per transaction.

Cause:
'storage::write_event()' had a fast-path keyed on 'at_transaction_boundary && transaction_gtid.is_empty()' whose intent was "flush regardless of thresholds because this is the file-final ROTATE/STOP event". The condition wasn't tight enough: anonymous transactions also satisfy it (they never populate 'transaction_gtid_'), so every XID terminating an anonymous transaction was misidentified as a file terminator and forced a synchronous flush.

Solution:
Removed the fast-path. The file-final ROTATE/STOP event is still flushed - just through the already-existing 'storage::close_binlog()' call on the 'process_rotate_or_stop_event()' / artificial-rotate rename paths, which is the natural place for a file-boundary flush. GTID-mode behavior is unchanged.
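
For illustration only, the effect of the fast-path can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not the PBS implementation: 'flush_model', 'replay_anonymous', the event sizes, and the choice to evaluate the size threshold at transaction boundaries are all hypothetical, the interval threshold is left out, and 'std::string::empty()' stands in for 'transaction_gtid.is_empty()'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the flush decision; NOT the actual PBS code.
struct flush_model {
  std::size_t checkpoint_size_bytes;  // size threshold (assumed semantics)
  bool legacy_fast_path;              // true = behavior before this PR
  std::size_t buffered_bytes;
  std::size_t flush_count;

  flush_model(std::size_t size_threshold, bool legacy)
      : checkpoint_size_bytes(size_threshold),
        legacy_fast_path(legacy),
        buffered_bytes(0),
        flush_count(0) {
    assert(checkpoint_size_bytes > 0);
  }

  void flush() {
    buffered_bytes = 0;
    ++flush_count;
  }

  void write_event(std::size_t event_size, bool at_transaction_boundary,
                   const std::string &transaction_gtid) {
    buffered_bytes += event_size;
    // The removed condition: it was meant to catch the file-final
    // ROTATE/STOP event, but an anonymous XID also has an empty GTID.
    const bool fast_path = legacy_fast_path && at_transaction_boundary &&
                           transaction_gtid.empty();
    // Assumption: the size threshold is checked at transaction boundaries.
    const bool threshold_reached =
        at_transaction_boundary && buffered_bytes >= checkpoint_size_bytes;
    if (fast_path || threshold_reached) flush();
  }

  // File-boundary flush, as on the rotate/stop path described above.
  void close_binlog() {
    if (buffered_bytes != 0) flush();
  }
};

// Replays `transactions` anonymous transactions of 128 bytes each
// (a 100-byte body event plus a 28-byte XID) and returns the flush count.
std::size_t replay_anonymous(bool legacy_fast_path, std::size_t transactions) {
  flush_model model(1024, legacy_fast_path);
  for (std::size_t i = 0; i < transactions; ++i) {
    model.write_event(100, false, "");  // event inside the transaction
    model.write_event(28, true, "");    // XID: boundary, empty GTID
  }
  model.close_binlog();
  return model.flush_count;
}
```

With these made-up numbers, replaying ten anonymous transactions gives ten flushes with the fast-path enabled and two without it: one when the 1024-byte threshold is reached and one from 'close_binlog()'.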

--echo
--echo *** Waiting for pull mode to read existing transactions.
--sleep 10

Can we do better here? Maybe make several attempts to grep the PBS log file until we encounter something like 'next event position: <expected_position>'.
<expected_position> can probably be extracted from SHOW BINLOG EVENTS.
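
The bounded-retry idea could look roughly like the sketch below. It is written in C++ only to show the shape of the loop; an actual fix would live in the MTR test itself. The helper names 'log_contains' and 'wait_for_log_line' are hypothetical, and the exact text of the PBS log line is an assumption taken from the "something like" wording above.

```cpp
#include <cassert>
#include <chrono>
#include <fstream>
#include <string>
#include <thread>

// Returns true if any line of the file at `path` contains `needle`.
// A file that cannot be opened is treated as "not found yet".
bool log_contains(const std::string &path, const std::string &needle) {
  std::ifstream log(path);
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(needle) != std::string::npos) return true;
  }
  return false;
}

// Polls the log up to `attempts` times, sleeping `delay` between tries,
// instead of relying on one fixed sleep.
bool wait_for_log_line(const std::string &path, const std::string &needle,
                       unsigned attempts, std::chrono::milliseconds delay) {
  assert(attempts > 0);
  for (unsigned attempt = 0; attempt < attempts; ++attempt) {
    if (log_contains(path, needle)) return true;
    if (attempt + 1 < attempts) std::this_thread::sleep_for(delay);
  }
  return false;
}
```

Compared with a fixed '--sleep 10', a loop like this returns as soon as the expected position shows up and fails explicitly when it never does.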