exec_ssh_cmd: only retry commands that failed due to SSH errors#433
exec_ssh_cmd: only retry commands that failed due to SSH errors#433petrutlucian94 wants to merge 1 commit into
Conversation
| LOG.exception(ex) | ||
| raise | ||
| except Exception as ex: | ||
| except retried_exceptions as ex: |
There was a problem hiding this comment.
I don't think it makes sense to keep both terminal_exceptions and retried_exceptions for this method. We should go with one of them.
I also think that this is unnecessary. Why not just add SSHCommandFailed to terminal_exceptions for _exec_ssh_cmd?
There was a problem hiding this comment.
It's meant to be flexible, allowing us to set a broader class of exceptions through retried_exceptions and a few potential exception subclasses that shouldn't be retried, specified through terminal_exceptions.
There's a comment saying that:
# By default, we'll perform retires only in case of SSH failures.
#
# Add "SSHCommandFailed" to the list of retried exceptions if failed
# commands should be retried as well.
The default list of terminal exceptions ensures that we won't retry missing commands for example.
There was a problem hiding this comment.
Just realized that there's a typo, fixing it :).
We used to retry all SSH commands that failed, even those that don't actually exist. We're still retrying commands that are expected to fail, wasting time unnecessarily during os morphing: coriolis.exception.CoriolisException: Command "readlink -en /dev/sda2" failed on host '172.17.0.3:22' with exit code: 1 2026-05-18 12:14:11.783 1667765 WARNING coriolis.osmorphing.osmount.base [-] Target not found for symlink: /dev/sda2. Original link path will be returned What makes things even worse is that the caller functions are often retried as well, having the "retry_on_error" decorator without any parameters. Commands that are expected to fail end up being retried 25 times. We'll update "exec_ssh_cmd" so that by default it will perform retries only in case of SSH errors. For convenience, we're exposing the retry helper arguments (number of retries, retry interval and terminal exceptions). In addition to that, we'll introduce a new argument for the list of retried exceptions. If the caller needs the actual commands to be retried as well, "SSHCommandFailed" may be added to the list of retried exceptions.
9393d78 to
9dfe7d3
Compare
We used to retry all SSH commands that failed, even those that don't actually exist.
We're still retrying commands that are expected to fail, wasting time unnecessarily during os morphing:
What makes things even worse is that the caller functions are often retried as well, having the "retry_on_error" decorator without any parameters. Commands that are expected to fail end up being retried 25 times.
We'll update "exec_ssh_cmd" so that by default it will perform retries only in case of SSH errors.
For convenience, we're exposing the retry helper arguments (number of retries, retry interval and terminal exceptions). In addition to that, we'll introduce a new argument for the list of retried exceptions.
If the caller needs the actual commands to be retried as well, "SSHCommandFailed" may be added to the list of retried exceptions.