Skip to content

exec_ssh_cmd: only retry commands that failed due to SSH errors#433

Open
petrutlucian94 wants to merge 1 commit into
cloudbase:masterfrom
petrutlucian94:avoid_unnecessary_retries
Open

exec_ssh_cmd: only retry commands that failed due to SSH errors#433
petrutlucian94 wants to merge 1 commit into
cloudbase:masterfrom
petrutlucian94:avoid_unnecessary_retries

Conversation

@petrutlucian94
Copy link
Copy Markdown
Member

@petrutlucian94 petrutlucian94 commented May 18, 2026

We used to retry all SSH commands that failed, even those that don't actually exist.

We're still retrying commands that are expected to fail, wasting time unnecessarily during os morphing:

  coriolis.exception.CoriolisException: Command "readlink -en /dev/sda2"
  failed on host '172.17.0.3:22' with exit code: 1
  2026-05-18 12:14:11.783 1667765 WARNING coriolis.osmorphing.osmount.base
  [-] Target not found for symlink: /dev/sda2. Original link path will be returned

What makes things even worse is that the caller functions are often retried as well, having the "retry_on_error" decorator without any parameters. Commands that are expected to fail end up being retried 25 times.

We'll update "exec_ssh_cmd" so that by default it will perform retries only in case of SSH errors.

For convenience, we're exposing the retry helper arguments (number of retries, retry interval and terminal exceptions). In addition to that, we'll introduce a new argument for the list of retried exceptions.

If the caller needs the actual commands to be retried as well, "SSHCommandFailed" may be added to the list of retried exceptions.

Comment thread coriolis/utils.py
LOG.exception(ex)
raise
except Exception as ex:
except retried_exceptions as ex:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to keep both terminal_exceptions and retried_exceptions for this method. We should go with one of them.
I also think that this is unnecessary. Why not just add SSHCommandFailed to terminal_exceptions for _exec_ssh_cmd?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's meant to be flexible, allowing us to set a broader class of exceptions through retried_exceptions and a few potential exception subclasses that shouldn't be retried, specified through terminal_exceptions.

There's a comment saying that:

# By default, we'll perform retires only in case of SSH failures.
#
# Add "SSHCommandFailed" to the list of retried exceptions if failed
# commands should be retried as well.

The default list of terminal exceptions ensures that we won't retry missing commands for example.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized that there's a typo, fixing it :).

We used to retry all SSH commands that failed, even those that
don't actually exist.

We're still retrying commands that are expected to fail, wasting
time unnecessarily during os morphing:

  coriolis.exception.CoriolisException: Command "readlink -en /dev/sda2"
  failed on host '172.17.0.3:22' with exit code: 1
  2026-05-18 12:14:11.783 1667765 WARNING coriolis.osmorphing.osmount.base
  [-] Target not found for symlink: /dev/sda2. Original link path will be returned

What makes things even worse is that the caller functions are
often retried as well, having the "retry_on_error" decorator
without any parameters. Commands that are expected to fail end up being
retried 25 times.

We'll update "exec_ssh_cmd" so that by default it will perform
retries only in case of SSH errors.

For convenience, we're exposing the retry helper arguments (number
of retries, retry interval and terminal exceptions). In addition
to that, we'll introduce a new argument for the list of retried
exceptions.

If the caller needs the actual commands to be retried as well,
"SSHCommandFailed" may be added to the list of retried exceptions.
@petrutlucian94 petrutlucian94 force-pushed the avoid_unnecessary_retries branch from 9393d78 to 9dfe7d3 Compare May 22, 2026 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants