Suggest ssh-keygen -R on SSH connect host key mismatch #5645
Open
anton-107 wants to merge 1 commit into
When a Databricks compute is recreated it keeps the same deterministic SSH connection name but gets a new host key, so the stale known_hosts entry trips OpenSSH's strict checking and `ssh connect` exits 255 with "Host key verification failed." Until now that landed in the generic "container is likely missing an OpenSSH server" branch, which is misleading and offers no fix. Tee ssh's stderr through a bounded tail buffer so we can detect the host-key failure after exit, and when it occurs print an actionable hint telling the user to run `ssh-keygen -R <host>` (with `-f <file>` when `--user-known-hosts-file` is set) and reconnect.
Co-authored-by: Isaac
Integration test report
Commit: 748fc00
22 interesting tests: 13 SKIP, 7 KNOWN, 2 flaky
Changes
When a Databricks compute is recreated it keeps the same deterministic SSH connection name (e.g. `databricks-cpu-<hash>`) but gets a new host key. The stale `known_hosts` entry then trips OpenSSH's strict checking and `databricks ssh connect` exits 255 with `Host key verification failed.` Previously this landed in the generic "the cluster's container image is likely missing an OpenSSH server" branch, which is misleading for this case and gives the user no way forward:
Now the CLI recognizes the host-key failure and prints an actionable suggestion instead:
When `--user-known-hosts-file` is set, the suggested command appends `-f <file>` (since `ssh-keygen -R` defaults to `~/.ssh/known_hosts`).
Implementation
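As a rough illustration of the hint logic (a sketch under stated assumptions, not the code in this PR): the helper name `suggestKeygenCommand` and its parameters are hypothetical. Only the `Host key verification failed` message and the `ssh-keygen -R <host>` and `-f <file>` forms are taken from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// hostKeyFailure is the message OpenSSH prints when strict host key
// checking rejects the host.
const hostKeyFailure = "Host key verification failed"

// suggestKeygenCommand is a hypothetical helper, not the PR's
// hostKeyChangedHint. It returns the command to suggest and true when the
// captured stderr tail contains the host-key failure message, and "" and
// false otherwise. knownHostsFile is empty when the user did not pass
// --user-known-hosts-file. The file path is not shell-quoted here.
func suggestKeygenCommand(stderrTail, host, knownHostsFile string) (string, bool) {
	if !strings.Contains(stderrTail, hostKeyFailure) {
		return "", false
	}
	cmd := "ssh-keygen -R " + host
	if knownHostsFile != "" {
		// ssh-keygen -R edits ~/.ssh/known_hosts unless -f names another file.
		cmd += " -f " + knownHostsFile
	}
	return cmd, true
}

func main() {
	// The host name and file path below are made-up example values.
	tail := "Host key verification failed.\n"
	if cmd, ok := suggestKeygenCommand(tail, "databricks-cpu-example", "/tmp/known_hosts"); ok {
		fmt.Println("Run this, then reconnect:", cmd)
	}
}
```

Returning a boolean alongside the command lets the caller fall through to its other exit-255 branches when the message is absent.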
`spawnSSHClient` previously piped ssh's stderr straight to `os.Stderr`, so the CLI never saw the failure message. It now tees stderr through a small bounded tail buffer (`tailWriter`, capped at 4 KB) so the user still sees the live output while the CLI retains the tail to inspect after exit. On exit code 255, a new first branch (`hostKeyChangedHint`) checks for OpenSSH's fixed `Host key verification failed` message and emits the hint; the existing server-logs and missing-sshd branches are unchanged for genuine connection drops.
Tests
Added unit tests in `client_internal_test.go`:
- `TestHostKeyChangedHint`: host-key failure (with and without a custom known_hosts file) and an unrelated-failure case.
- `TestTailWriterRetainsTail`: tail retention and short-write passthrough.
Also manually verified end-to-end against a live serverless connection: corrupted the recorded host key in an isolated known_hosts file and confirmed that the new suggestion appears, then that running the suggested `ssh-keygen -R` resolves it.
This pull request and its description were written by Isaac.
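For readers unfamiliar with the pattern, the bounded tail buffer described under Implementation can be sketched roughly as follows. This is a minimal illustration and may differ from the PR's `tailWriter`: the type name `boundedTail`, its fields, and the 16-byte cap in `main` are assumptions made for the example. Only `io.MultiWriter` and the idea of keeping the last N bytes come from standard Go and the description.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// boundedTail is a hypothetical io.Writer that retains only the last max
// bytes written to it. It assumes max > 0.
type boundedTail struct {
	max int
	buf []byte
}

// Write always reports the full length of p as written, so it never causes
// a short-write error in an io.MultiWriter.
func (t *boundedTail) Write(p []byte) (int, error) {
	n := len(p)
	if n >= t.max {
		// A single write at least as large as the cap replaces the buffer
		// with the last max bytes of p.
		t.buf = append(t.buf[:0], p[n-t.max:]...)
		return n, nil
	}
	if over := len(t.buf) + n - t.max; over > 0 {
		// Drop the oldest bytes to make room for p.
		t.buf = append(t.buf[:0], t.buf[over:]...)
	}
	t.buf = append(t.buf, p...)
	return n, nil
}

// String returns the retained tail.
func (t *boundedTail) String() string { return string(t.buf) }

func main() {
	// live stands in for os.Stderr. With a real child process, the
	// MultiWriter would be assigned to the exec.Cmd's Stderr field.
	var live bytes.Buffer
	tail := &boundedTail{max: 16}
	w := io.MultiWriter(&live, tail)

	fmt.Fprint(w, "debug1: connecting to example host\n")
	fmt.Fprint(w, "Host key verification failed.\n")

	fmt.Printf("live output: %d bytes\n", live.Len())
	fmt.Printf("retained tail: %q\n", tail.String())
}
```

Capping the buffer keeps memory use fixed however much the child process writes, while the final error line, which is what the CLI inspects after exit, stays in the retained tail.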