fix: Handle Redis connection errors gracefully instead of panicking#111
Open
fix: Handle Redis connection errors gracefully instead of panicking#111
Conversation
When Redis goes down (e.g. master failover), a broken pipe error would cause `get_queued_query_count` to panic due to an `.unwrap()`, which then poisoned the RwLock used in metrics callbacks, cascading into further panics and leaving pods unresponsive until the liveness probe killed them. - Replace `.unwrap()` with proper error propagation in `get_queued_query_count` - Handle poisoned locks, closed channels, and panicked threads gracefully in both metrics callbacks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NickLarsenNZ
approved these changes
Apr 17, 2026
| .scard::<_, Option<u64>>(queued_query_set_name(cluster_group)) | ||
| .await | ||
| .unwrap() | ||
| .context(GetQueuedQueryCountSnafu { cluster_group })? |
| match receiver.blocking_recv() { | ||
| Some(v) => Some(v), | ||
| None => { | ||
| error!("cluster_counts_per_state metrics channel closed"); |
Member
There was a problem hiding this comment.
Some of the strings could be interpolated (so if the metric name changes, then so does the error). Not sure how feasible that is, so totally optional right now.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Maybe fixes #109
When Redis goes down (e.g. master failover), a broken pipe error would cause
get_queued_query_countto panic due to an.unwrap(), which then poisoned the RwLock used in metrics callbacks, cascading into further panics and leaving pods unresponsive until the liveness probe killed them..unwrap()with proper error propagation inget_queued_query_count