Skip to content

RUBY-3701 Apply backpressure in with_transaction#2992

Open
comandeo-mongo wants to merge 2 commits intomongodb:masterfrom
comandeo-mongo:3701-with-transaction-backpressure
Open

RUBY-3701 Apply backpressure in with_transaction#2992
comandeo-mongo wants to merge 2 commits intomongodb:masterfrom
comandeo-mongo:3701-with-transaction-backpressure

Conversation

@comandeo-mongo
Copy link
Contributor

No description provided.

Copilot AI review requested due to automatic review settings March 3, 2026 16:39
@comandeo-mongo comandeo-mongo requested a review from a team as a code owner March 3, 2026 16:39
@comandeo-mongo comandeo-mongo requested a review from jamis March 3, 2026 16:39
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces retry backpressure to Mongo::Session#with_transaction by adding a jittered exponential backoff between transaction retry attempts, along with new specs intended to validate the backoff behavior (including a prose-style integration spec).

Changes:

  • Add retry attempt tracking and sleep-based exponential backoff to Mongo::Session#with_transaction.
  • Introduce private helper methods/constants for backoff calculation and deadline interaction.
  • Add new specs for backoff calculation and a prose test measuring the added delay under commit failures.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
lib/mongo/session.rb Adds attempt tracking, backoff sleep, and helper methods/constants for retry backpressure in with_transaction.
spec/mongo/session_transaction_spec.rb Adds unit-style examples intended to validate backoff math/jitter behavior.
spec/mongo/session_transaction_prose_spec.rb Adds a prose/integration spec measuring end-to-end delay when jitter is enabled using failCommand.
Comments suppressed due to low confidence (1)

lib/mongo/session.rb:527

  • Backoff is only applied between outer transaction attempts (before start_transaction). When commit_transaction raises UnknownTransactionCommitResult, this code uses retry to immediately retry the commit without any delay, meaning commit retry loops can still hot-spin. If the intent is “apply backpressure in with_transaction” (and to satisfy the prose spec expectations), consider applying the same backoff between commit retries as well (e.g., track a separate commit-attempt counter or route commit retries through the same backoff path).
          rescue Mongo::Error => e
            if e.label?('UnknownTransactionCommitResult')
              if deadline_expired?(deadline) ||
                e.is_a?(Error::OperationFailure::Family) && e.max_time_ms_expired?
              then
                transaction_in_progress = false
                raise
              end
              wc_options = case v = commit_options[:write_concern]
                when WriteConcern::Base
                  v.options
                when nil
                  {}
                else
                  v
                end
              commit_options[:write_concern] = wc_options.merge(w: :majority)
              retry

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


# Exponential backoff settings for with_transaction retries.
BACKOFF_INITIAL = 0.005
BACKOFF_MAX = 0.5
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BACKOFF_INITIAL and BACKOFF_MAX are introduced as Mongo::Session constants; constants are part of the public surface area even when defined under private. If these are intended to be internal tuning knobs, consider marking them private_constant (similar to other internal constants like Mongo::Server::RoundTripTimeCalculator::RTT_WEIGHT_FACTOR) to avoid unintentionally expanding the public API.

Suggested change
BACKOFF_MAX = 0.5
BACKOFF_MAX = 0.5
private_constant :BACKOFF_INITIAL, :BACKOFF_MAX

Copilot uses AI. Check for mistakes.
mode: { times: times },
data: {
failCommands: ['commitTransaction'],
errorCode: 251,
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failCommand fail point here only sets errorCode: 251 and does not specify any errorLabels. In the driver code, UnknownTransactionCommitResult is not added for generic OperationFailure codes like 251 (only for retryable/wtimeout/maxTimeMS, etc.), so this test may not actually exercise the intended retry path on some server versions/configurations. To make the prose test deterministic, configure the fail point to return the expected transaction label(s) via data[:errorLabels] (and/or use an error code that is guaranteed to carry the desired label).

Suggested change
errorCode: 251,
errorCode: 251,
errorLabels: ['UnknownTransactionCommitResult'],

Copilot uses AI. Check for mistakes.
Comment on lines +272 to +304
# Test backoff formula: jitter * min(BACKOFF_INITIAL * 1.5^(attempt-1), BACKOFF_MAX)
backoff_initial = Mongo::Session::BACKOFF_INITIAL
backoff_max = Mongo::Session::BACKOFF_MAX

# Test attempt 1: 1.5^0 = 1
expected_attempt_1 = backoff_initial * (1.5 ** 0)
expect(expected_attempt_1).to eq(0.005)

# Test attempt 2: 1.5^1 = 1.5
expected_attempt_2 = backoff_initial * (1.5 ** 1)
expect(expected_attempt_2).to eq(0.0075)

# Test attempt 3: 1.5^2 = 2.25
expected_attempt_3 = backoff_initial * (1.5 ** 2)
expect(expected_attempt_3).to eq(0.01125)

# Test cap at BACKOFF_MAX
expected_attempt_large = [backoff_initial * (1.5 ** 20), backoff_max].min
expect(expected_attempt_large).to eq(backoff_max)
end

it 'applies jitter to backoff' do
# Jitter should be a random value between 0 and 1
# When multiplied with backoff, it should reduce the actual sleep time
backoff = 0.100 # 100ms
jitter_min = 0
jitter_max = 1

actual_min = jitter_min * backoff
actual_max = jitter_max * backoff

expect(actual_min).to eq(0)
expect(actual_max).to eq(0.100)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These examples don’t exercise the production backoff implementation: they only re-compute the math from the constants (and in the jitter case, just multiply fixed numbers), so they would still pass even if backoff_seconds_for_retry were broken or jitter were not applied. Consider stubbing Random.rand and asserting against backoff_seconds_for_retry (via send since it’s private), and use a tolerance matcher for floats to avoid brittleness.

Suggested change
# Test backoff formula: jitter * min(BACKOFF_INITIAL * 1.5^(attempt-1), BACKOFF_MAX)
backoff_initial = Mongo::Session::BACKOFF_INITIAL
backoff_max = Mongo::Session::BACKOFF_MAX
# Test attempt 1: 1.5^0 = 1
expected_attempt_1 = backoff_initial * (1.5 ** 0)
expect(expected_attempt_1).to eq(0.005)
# Test attempt 2: 1.5^1 = 1.5
expected_attempt_2 = backoff_initial * (1.5 ** 1)
expect(expected_attempt_2).to eq(0.0075)
# Test attempt 3: 1.5^2 = 2.25
expected_attempt_3 = backoff_initial * (1.5 ** 2)
expect(expected_attempt_3).to eq(0.01125)
# Test cap at BACKOFF_MAX
expected_attempt_large = [backoff_initial * (1.5 ** 20), backoff_max].min
expect(expected_attempt_large).to eq(backoff_max)
end
it 'applies jitter to backoff' do
# Jitter should be a random value between 0 and 1
# When multiplied with backoff, it should reduce the actual sleep time
backoff = 0.100 # 100ms
jitter_min = 0
jitter_max = 1
actual_min = jitter_min * backoff
actual_max = jitter_max * backoff
expect(actual_min).to eq(0)
expect(actual_max).to eq(0.100)
backoff_initial = Mongo::Session::BACKOFF_INITIAL
backoff_max = Mongo::Session::BACKOFF_MAX
# Stub jitter to 1.0 so that backoff is equal to the base value
allow(Random).to receive(:rand).and_return(1.0)
# Attempt 1: 1.5^0 = 1
backoff_attempt_1 = session.send(:backoff_seconds_for_retry, 1)
expected_attempt_1 = [backoff_initial * (1.5 ** 0), backoff_max].min
expect(backoff_attempt_1).to be_within(1e-9).of(expected_attempt_1)
# Attempt 2: 1.5^1 = 1.5
backoff_attempt_2 = session.send(:backoff_seconds_for_retry, 2)
expected_attempt_2 = [backoff_initial * (1.5 ** 1), backoff_max].min
expect(backoff_attempt_2).to be_within(1e-9).of(expected_attempt_2)
# Attempt 3: 1.5^2 = 2.25
backoff_attempt_3 = session.send(:backoff_seconds_for_retry, 3)
expected_attempt_3 = [backoff_initial * (1.5 ** 2), backoff_max].min
expect(backoff_attempt_3).to be_within(1e-9).of(expected_attempt_3)
# Large attempt should be capped at BACKOFF_MAX
backoff_attempt_large = session.send(:backoff_seconds_for_retry, 20)
expected_attempt_large = [backoff_initial * (1.5 ** 19), backoff_max].min
expect(backoff_attempt_large).to be_within(1e-9).of(expected_attempt_large)
end
it 'applies jitter to backoff' do
backoff_initial = Mongo::Session::BACKOFF_INITIAL
attempt = 2
base_backoff = [backoff_initial * (1.5 ** (attempt - 1)), Mongo::Session::BACKOFF_MAX].min
# First call: jitter 0.25
# Second call: jitter 0.75
allow(Random).to receive(:rand).and_return(0.25, 0.75)
backoff_low_jitter = session.send(:backoff_seconds_for_retry, attempt)
backoff_high_jitter = session.send(:backoff_seconds_for_retry, attempt)
expect(backoff_low_jitter).to be_within(1e-9).of(base_backoff * 0.25)
expect(backoff_high_jitter).to be_within(1e-9).of(base_backoff * 0.75)
expect(backoff_high_jitter).to be > backoff_low_jitter

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with copilot here -- these tests would be far more meaningful if they tested the code that performs the calculations.

Comment on lines 528 to 535
elsif e.label?('TransientTransactionError')
if Utils.monotonic_time >= deadline
transaction_in_progress = false
raise
end
last_error = e
@state = NO_TRANSACTION_STATE
next
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the TransientTransactionError commit path, the code does next without clearing transaction_in_progress. With the new backoff check at the top of the loop (raise(last_error) before starting a new transaction), this can cause the ensure block to think the callback broke out of the loop and emit a misleading warning / attempt an abort when exiting. Set transaction_in_progress = false before next (and similarly before any early raises taken on retry paths) so the ensure logic reflects actual state.

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@jamis jamis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just agreeing with copilot about testing the calculations. The tests should probably call the computation methods, rather than performing the calculations directly.

Comment on lines +272 to +304
# Test backoff formula: jitter * min(BACKOFF_INITIAL * 1.5^(attempt-1), BACKOFF_MAX)
backoff_initial = Mongo::Session::BACKOFF_INITIAL
backoff_max = Mongo::Session::BACKOFF_MAX

# Test attempt 1: 1.5^0 = 1
expected_attempt_1 = backoff_initial * (1.5 ** 0)
expect(expected_attempt_1).to eq(0.005)

# Test attempt 2: 1.5^1 = 1.5
expected_attempt_2 = backoff_initial * (1.5 ** 1)
expect(expected_attempt_2).to eq(0.0075)

# Test attempt 3: 1.5^2 = 2.25
expected_attempt_3 = backoff_initial * (1.5 ** 2)
expect(expected_attempt_3).to eq(0.01125)

# Test cap at BACKOFF_MAX
expected_attempt_large = [backoff_initial * (1.5 ** 20), backoff_max].min
expect(expected_attempt_large).to eq(backoff_max)
end

it 'applies jitter to backoff' do
# Jitter should be a random value between 0 and 1
# When multiplied with backoff, it should reduce the actual sleep time
backoff = 0.100 # 100ms
jitter_min = 0
jitter_max = 1

actual_min = jitter_min * backoff
actual_max = jitter_max * backoff

expect(actual_min).to eq(0)
expect(actual_max).to eq(0.100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with copilot here -- these tests would be far more meaningful if they tested the code that performs the calculations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants