Conversation
aarongable
left a comment
There was a problem hiding this comment.
I have a few minor comments about the code, but nothing big.
My major concern is that this simply doesn't feel like a prober. It's doing so much work, and has so many opportunities for failure, all of which get wrapped up in big accumulated errors. In my mind, the ideal prober is short, simple, only emits metrics (not meaningful logs), and we have alerts for every single metric it exposes. I understand that it is useful for this to live within boulder-observer, so we don't have to deploy and monitor Yet Another Component, but I'm registering my moderate discomfort with the complexity here.
beautifulentropy
left a comment
There was a problem hiding this comment.
Overall, the structure of this change looks good to me. I do have a few comments, mostly concerning some places where we could panic. I also noticed there isn't a corresponding update to cmd/boulder-observer/README.md.
| } | ||
|
|
||
| if resp.StatusCode != http.StatusOK { | ||
| return nil, fmt.Errorf("http status %d for %q: %s", resp.StatusCode, url, string(body[:400])) |
There was a problem hiding this comment.
body[:400] could possibly panic slice bounds out of range for shorter error messages.
| return nil, err | ||
| } | ||
|
|
||
| pemIndex := slices.Index(header, "X.509 Certificate (PEM)") |
There was a problem hiding this comment.
The comment for slices.Index() says:
Index returns the index of the first occurrence of v in s, or -1 if not present.
We should guard against a missing column name here by checking for -1. It would also be useful for the error to mention something about contents being malformed.
| ownerIndex := slices.Index(header, "CA Owner") | ||
| crlIndex := slices.Index(header, "JSON Array of Partitioned CRLs") | ||
| skidIndex := slices.Index(header, "Subject Key Identifier") | ||
| certificateNameIndex := slices.Index(header, "Certificate Name") |
There was a problem hiding this comment.
Similar issue with index possibly being -1 here as well.
| crlAgeLimit: ageLimitDuration, | ||
| crlRegexp: re, | ||
|
|
||
| crlsSuccess: crlsSuccess, |
There was a problem hiding this comment.
I see this being initialized but I don't see it being used.
| body, err = getBody(ctx, url) | ||
| if err == nil { | ||
| return body, nil | ||
| } | ||
| time.Sleep(time.Duration(backoff) * time.Millisecond) | ||
| } | ||
| return nil, err |
There was a problem hiding this comment.
This appears to sleep even if the context is done/canceled. We could add a small check and an early return to avoid that:
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
return nil, err
}
Prior work: letsencrypt/crl-monitor#88
Fixes #8618