[build] retry Bazel on transient GitHub CDN 5xx errors during repo fetch by titusfortner · Pull Request #17568 · SeleniumHQ/selenium

titusfortner · 2026-05-25T01:36:05Z

I'm seeing a bunch of 502 Gateway errors trying to run tests:
https://github.com/SeleniumHQ/selenium/actions/runs/26376849996

Options:

I thought --repository_downloader_retries was the answer until I realized that Bazel considers 502 errors non-transient and doesn't retry them.
Figure out mirrors - this looks complicated
Ask EngFlow to let us use remote download, but we're already getting a lot from them
Use Github Cache - I have another PR to investigate this incoming
Retry the whole run when we detect a problem

This PR is Option 5

💥 What does this PR do?

Wraps our current execution in a retry loop that parses the output log we're already generating and if it shows there was a 502 Gateway error, it reruns.

🔧 Implementation Notes

The log being parsed is overwritten on future runs so it won't flag on 502 errors from previous attempts
Github log will still show everything
Updated to use ENV for the inputs.run rather than positional arg so quote syntax doesn't collide

💡 Additional Considerations

I'm not sure how common this is going to be going forward
I think this may cause issues with mirrors if we use them

🤖 AI assistance

No substantial AI assistance used
AI assisted (complete below)
- Tool(s):
- What was generated:
- I reviewed all AI output and can explain the change

🔄 Types of changes

Bug fix (backwards compatible)

qodo-code-review · 2026-05-25T01:36:18Z

Review Summary by Qodo

Add retry logic for Bazel on transient CDN errors

🐞 Bug fix ✨ Enhancement

Walkthroughs

Description

• Implements retry mechanism for transient GitHub CDN 5xx errors
• Detects HTTP 502/503/504 errors in Bazel output logs
• Retries failed Bazel commands up to 3 times with exponential backoff
• Preserves original exit codes for non-transient failures

Diagram

flowchart LR
  A["Bazel Command"] --> B["Execute with Log Capture"]
  B --> C{"Exit Code = 0?"}
  C -->|Yes| D["Success"]
  C -->|No| E{"5xx Error Detected?"}
  E -->|No| F["Exit with Error"]
  E -->|Yes| G{"Max Retries?"}
  G -->|Yes| H["Exit Failure"]
  G -->|No| I["Wait & Retry"]
  I --> B

File Changes

1. scripts/github-actions/run-bazel-with-retry.sh ✨ Enhancement +44/-0

New retry wrapper script for Bazel CDN errors

• New shell script that wraps Bazel commands with retry logic
• Detects transient HTTP 5xx errors using regex pattern matching
• Implements exponential backoff (15s, 30s, 45s) between retry attempts
• Logs output to file and preserves non-transient error exit codes

scripts/github-actions/run-bazel-with-retry.sh

2. .github/workflows/bazel.yml ✨ Enhancement +1/-5

Integrate retry script into Bazel workflow

• Replaces inline Bazel execution with call to retry wrapper script
• Removes manual log file creation and tee piping logic
• Simplifies "Run Bazel" step by delegating retry handling to script

.github/workflows/bazel.yml

qodo-code-review · 2026-05-25T01:36:19Z

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (1)

1. Tee failures ignored 🐞 Bug ☼ Reliability ⭐ New

Description

run-bazel-with-retry.sh only considers Bazel’s exit code, so if tee fails to write the console log
(e.g., permission/disk-full), the step can still exit 0 and silently drop build/bazel-console.log.
This can hide infrastructure problems and break downstream tooling that parses that log on failures.

Code

scripts/github-actions/run-bazel-with-retry.sh[20]

Evidence
The script pipes output to tee but only reads PIPESTATUS[0] (Bazel), so tee write failures are
ignored; meanwhile downstream parsing relies on that log existing.
scripts/github-actions/run-bazel-with-retry.sh[19-22]
scripts/github-actions/rerun-failures.sh[4-10]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`run-bazel-with-retry.sh` pipes Bazel output to `tee`, but it only uses Bazel’s exit code (`PIPESTATUS[0]`). If `tee` fails (can’t write the log), the script may still report success and/or proceed with retry logic based on incomplete logs.

## Issue Context
Downstream logic expects `build/bazel-console.log` to be present and readable when diagnosing failures.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[20-21]

### Implementation guidance
After the pipeline, capture both statuses (Bazel + tee) and fail fast on `tee` errors, e.g.:
- `BAZEL_EXIT_CODE=${PIPESTATUS[0]}`
- `TEE_EXIT_CODE=${PIPESTATUS[1]}`
- if `TEE_EXIT_CODE != 0`, print an error and `exit $TEE_EXIT_CODE` (or a fixed non-zero code).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~Retry check never matches~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

run-bazel-with-retry.sh searches the log for the literal text "BAZEL_ERROR_PATTERN" (missing $),
so 502/503/504 fetch errors will never be detected and the Bazel run will not be retried.

Code

scripts/github-actions/run-bazel-with-retry.sh[R28-36]

Evidence

The script defines a regex in BAZEL_ERROR_PATTERN, but grep is invoked with the quoted literal
"BAZEL_ERROR_PATTERN" (no $), so it will only match if the log contains that exact text, not the
intended 5xx error lines.

scripts/github-actions/run-bazel-with-retry.sh[16-40]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The retry condition uses `grep -qE "BAZEL_ERROR_PATTERN"` and `grep -E "BAZEL_ERROR_PATTERN"`, which greps for the literal string `BAZEL_ERROR_PATTERN` instead of the regex stored in the `BAZEL_ERROR_PATTERN` variable. This prevents the retry loop from ever triggering on 5xx CDN errors.

## Issue Context
The script defines `BAZEL_ERROR_PATTERN='GET returned 50[234] '`, but never dereferences it when calling `grep`.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[16-36]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. tee/grep missing -- 📘 Rule violation ⛨ Security ⭐ New

Description

The script passes $LOG_FILE to tee and grep without a -- argument separator, so a log path
beginning with - could be interpreted as options (option/argument injection) instead of a
filename. This violates the safe argument-handling requirement for CI scripts and can cause
incorrect behavior or unexpected flags being applied.

Code

scripts/github-actions/run-bazel-with-retry.sh[R20-34]

Evidence

PR Compliance ID 14 requires safe argument handling in build/CI scripts. The new code invokes `tee
"$LOG_FILE" and grep ... "$LOG_FILE" without --, allowing a -`-prefixed log filename (via
env/config) to be parsed as options.

scripts/github-actions/run-bazel-with-retry.sh[20-34]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scripts/github-actions/run-bazel-with-retry.sh` passes `$LOG_FILE` as a positional argument to `tee`/`grep` without a `--` separator. If `$LOG_FILE` begins with `-`, these tools may treat it as an option rather than a filename.

## Issue Context
This script is used in GitHub Actions. Rule requires safe argument handling; adding `--` prevents option parsing for filename-like arguments.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[20-34]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

4. ~~bash -c executes CMD~~ ✓ Resolved 📘 Rule violation ⛨ Security

Description

The workflow passes a single command string into run-bazel-with-retry.sh, which then executes it
via bash -c, enabling shell interpretation/injection and brittle argument parsing. This violates
the safe argument-handling requirement for build/CI scripts.

Code

scripts/github-actions/run-bazel-with-retry.sh[R11-22]

Evidence
PR Compliance ID 16 requires safe argument handling in build/CI scripts. The script builds CMD
from a single string argument and runs it via bash -c, which performs shell interpretation rather
than executing a safely-quoted argv vector.
scripts/github-actions/run-bazel-with-retry.sh[11-22]
.github/workflows/bazel.yml[295-305]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scripts/github-actions/run-bazel-with-retry.sh` executes a caller-provided command string using `bash -c "$CMD"`, which relies on shell parsing and can lead to command injection or incorrect argument handling.

## Issue Context
This script is invoked from `.github/workflows/bazel.yml` with `"${{ inputs.run }}"`, so the called command is provided as a single string. Compliance requires safe argument handling for build/CI scripts.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[11-22]
- .github/workflows/bazel.yml[295-305]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

5. ~~Retries lose Bazel exit~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

If CDN errors persist through the last attempt, the script exits with code 1 instead of Bazel’s
actual exit code, contradicting the script’s stated behavior and potentially breaking downstream
error handling.

Code

scripts/github-actions/run-bazel-with-retry.sh[R43-44]

Evidence
The header comment claims the script exits with Bazel’s actual exit code, but after breaking out of
the retry loop it unconditionally exits 1, discarding the previously captured BAZEL_EXIT_CODE.
scripts/github-actions/run-bazel-with-retry.sh[2-6]
scripts/github-actions/run-bazel-with-retry.sh[19-44]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When retries are exhausted, the script unconditionally `exit 1` rather than returning the last Bazel exit code it captured. This violates the script’s own contract (“exits with Bazel's actual exit code”).

## Issue Context
The script already captures Bazel’s exit code via `BAZEL_EXIT_CODE=${PIPESTATUS[0]}`. Preserve and return that value after the retry loop when it terminates due to max attempts.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[19-44]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Previous review results

Review updated until commit 2ba409c

Results up to commit 61b582b

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0)

1. ~~Retry check never matches~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

run-bazel-with-retry.sh searches the log for the literal text "BAZEL_ERROR_PATTERN" (missing $),
so 502/503/504 fetch errors will never be detected and the Bazel run will not be retried.

Code

scripts/github-actions/run-bazel-with-retry.sh[R28-36]

Evidence

The script defines a regex in BAZEL_ERROR_PATTERN, but grep is invoked with the quoted literal
"BAZEL_ERROR_PATTERN" (no $), so it will only match if the log contains that exact text, not the
intended 5xx error lines.

scripts/github-actions/run-bazel-with-retry.sh[16-40]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The retry condition uses `grep -qE "BAZEL_ERROR_PATTERN"` and `grep -E "BAZEL_ERROR_PATTERN"`, which greps for the literal string `BAZEL_ERROR_PATTERN` instead of the regex stored in the `BAZEL_ERROR_PATTERN` variable. This prevents the retry loop from ever triggering on 5xx CDN errors.

## Issue Context
The script defines `BAZEL_ERROR_PATTERN='GET returned 50[234] '`, but never dereferences it when calling `grep`.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[16-36]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. ~~bash -c executes CMD~~ ✓ Resolved 📘 Rule violation ⛨ Security

Description

The workflow passes a single command string into run-bazel-with-retry.sh, which then executes it
via bash -c, enabling shell interpretation/injection and brittle argument parsing. This violates
the safe argument-handling requirement for build/CI scripts.

Code

scripts/github-actions/run-bazel-with-retry.sh[R11-22]

Evidence
PR Compliance ID 16 requires safe argument handling in build/CI scripts. The script builds CMD
from a single string argument and runs it via bash -c, which performs shell interpretation rather
than executing a safely-quoted argv vector.
scripts/github-actions/run-bazel-with-retry.sh[11-22]
.github/workflows/bazel.yml[295-305]
Best Practice: Learned patterns

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`scripts/github-actions/run-bazel-with-retry.sh` executes a caller-provided command string using `bash -c "$CMD"`, which relies on shell parsing and can lead to command injection or incorrect argument handling.

## Issue Context
This script is invoked from `.github/workflows/bazel.yml` with `"${{ inputs.run }}"`, so the called command is provided as a single string. Compliance requires safe argument handling for build/CI scripts.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[11-22]
- .github/workflows/bazel.yml[295-305]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. ~~Retries lose Bazel exit~~ ✓ Resolved 🐞 Bug ≡ Correctness

Description

If CDN errors persist through the last attempt, the script exits with code 1 instead of Bazel’s
actual exit code, contradicting the script’s stated behavior and potentially breaking downstream
error handling.

Code

scripts/github-actions/run-bazel-with-retry.sh[R43-44]

Evidence
The header comment claims the script exits with Bazel’s actual exit code, but after breaking out of
the retry loop it unconditionally exits 1, discarding the previously captured BAZEL_EXIT_CODE.
scripts/github-actions/run-bazel-with-retry.sh[2-6]
scripts/github-actions/run-bazel-with-retry.sh[19-44]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When retries are exhausted, the script unconditionally `exit 1` rather than returning the last Bazel exit code it captured. This violates the script’s own contract (“exits with Bazel's actual exit code”).

## Issue Context
The script already captures Bazel’s exit code via `BAZEL_EXIT_CODE=${PIPESTATUS[0]}`. Preserve and return that value after the retry loop when it terminates due to max attempts.

## Fix Focus Areas
- scripts/github-actions/run-bazel-with-retry.sh[19-44]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

… doesn't collide

qodo-code-review · 2026-05-25T02:57:46Z

Persistent review updated to latest commit 2ba409c

qodo-code-review · 2026-05-25T02:59:49Z

+BAZEL_ERROR_PATTERN='GET returned 50[234] '
+
+for i in $(seq 1 "$BAZEL_MAX_ATTEMPTS"); do
+  bash -c "$BAZEL_RUN_CMD" 2>&1 | tee "$LOG_FILE"


1. Tee failures ignored 🐞 Bug ☼ Reliability

run-bazel-with-retry.sh only considers Bazel’s exit code, so if tee fails to write the console log (e.g., permission/disk-full), the step can still exit 0 and silently drop build/bazel-console.log. This can hide infrastructure problems and break downstream tooling that parses that log on failures.

Agent Prompt

## Issue description `run-bazel-with-retry.sh` pipes Bazel output to `tee`, but it only uses Bazel’s exit code (`PIPESTATUS[0]`). If `tee` fails (can’t write the log), the script may still report success and/or proceed with retry logic based on incomplete logs. ## Issue Context Downstream logic expects `build/bazel-console.log` to be present and readable when diagnosing failures. ## Fix Focus Areas - scripts/github-actions/run-bazel-with-retry.sh[20-21] ### Implementation guidance After the pipeline, capture both statuses (Bazel + tee) and fail fast on `tee` errors, e.g.: - `BAZEL_EXIT_CODE=${PIPESTATUS[0]}` - `TEE_EXIT_CODE=${PIPESTATUS[1]}` - if `TEE_EXIT_CODE != 0`, print an error and `exit $TEE_EXIT_CODE` (or a fixed non-zero code).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

[ci] retry Bazel on transient GitHub CDN 5xx errors during repo fetch

61b582b

selenium-ci added the B-build Includes scripting, bazel and CI integrations label May 25, 2026

qodo-code-review Bot reviewed May 25, 2026

View reviewed changes

Comment thread scripts/github-actions/run-bazel-with-retry.sh Outdated

titusfortner mentioned this pull request May 25, 2026

[build] monitor disk and cache usage throughout a bazel job #17569

Merged

3 tasks

use ENV for the inputs.run rather than positional arg so quote syntax…

2ba409c

… doesn't collide

qodo-code-review Bot reviewed May 25, 2026

View reviewed changes

titusfortner mentioned this pull request May 25, 2026

[build] prune old codeql caches #17570

Merged

3 tasks

titusfortner marked this pull request as draft May 25, 2026 16:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[build] retry Bazel on transient GitHub CDN 5xx errors during repo fetch#17568

[build] retry Bazel on transient GitHub CDN 5xx errors during repo fetch#17568
titusfortner wants to merge 2 commits into
trunkfrom
bazel_retry

titusfortner commented May 25, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot commented May 25, 2026

Uh oh!

qodo-code-review Bot commented May 25, 2026 •

edited

Loading

Uh oh!

Uh oh!

qodo-code-review Bot commented May 25, 2026 •

edited

Loading

Uh oh!

qodo-code-review Bot May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

titusfortner commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

💥 What does this PR do?

🔧 Implementation Notes

💡 Additional Considerations

🤖 AI assistance

🔄 Types of changes

Uh oh!

qodo-code-review Bot commented May 25, 2026

Review Summary by Qodo

Walkthroughs

File Changes

Uh oh!

qodo-code-review Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review by Qodo

Previous review results

Uh oh!

Uh oh!

qodo-code-review Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qodo-code-review Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

titusfortner commented May 25, 2026 •

edited

Loading

qodo-code-review Bot commented May 25, 2026 •

edited

Loading

qodo-code-review Bot commented May 25, 2026 •

edited

Loading