Added initial implementation of kosli snapshot cloud-run server/#4986#833
Added initial implementation of kosli snapshot cloud-run server/#4986#833ToreMerkely wants to merge 6 commits intomainfrom
Conversation
ToreMerkely
commented
Apr 28, 2026
- feat(snapshot): add hidden cloud-run skeleton command (#4986)
- feat(cloudrun): add internal/cloudrun package wrapping the Cloud Run Admin API (#4986)
- feat(snapshot): wire cloud-run command end-to-end via internal/cloudrun (#4986)
- feat(snapshot): add service filtering flags to cloud-run command (#4986)
- feat(cloudrun): friendly auth and not-found error messages (#4986)
- fix(cloudrun): align snapshot payload with server's snapshot-examples doc (#4986)
|
Claude finished @mbevc1's task in 5m 4s —— View job PR Review:
|
| func New(ctx context.Context) (*Client, error) { | ||
| services, err := run.NewServicesClient(ctx) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | ||
| } | ||
| revisions, err := run.NewRevisionsClient(ctx) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | ||
| } | ||
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | ||
| } |
There was a problem hiding this comment.
Resource leak — GCP gRPC clients are never closed.
run.NewServicesClient and run.NewRevisionsClient open gRPC connections. The current code never calls .Close() on them. For a short-lived CLI command this isn't critical (the OS reclaims on exit), but it's best practice and avoids noisy warnings from the gRPC library.
Consider adding a Close method to Client (and to the apiClient interface or alongside it) so the caller can defer client.Close():
| func New(ctx context.Context) (*Client, error) { | |
| services, err := run.NewServicesClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| revisions, err := run.NewRevisionsClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | |
| } | |
| func New(ctx context.Context) (*Client, error) { | |
| services, err := run.NewServicesClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| revisions, err := run.NewRevisionsClient(ctx) | |
| if err != nil { | |
| services.Close() | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | |
| } | |
| // Close releases the underlying gRPC connections. | |
| func (c *Client) Close() error { | |
| if g, ok := c.api.(*gcpAPI); ok { | |
| sErr := g.services.Close() | |
| rErr := g.revisions.Close() | |
| if sErr != nil { | |
| return sErr | |
| } | |
| return rErr | |
| } | |
| return nil | |
| } |
Note the second change: if NewRevisionsClient fails, the already-created services client should be closed to avoid leaking that connection too.
Then in the command: defer client.Close() after the newCloudRunClient call.
There was a problem hiding this comment.
Adding Close is a fair comment
| return cloudrun.Classify(err, o.project, o.region) | ||
| } | ||
|
|
||
| filtered := services[:0] |
There was a problem hiding this comment.
Nit: in-place slice filtering is correct but non-obvious.
filtered := services[:0] reuses the backing array of services. It's safe here because range copies each element before the append overwrites earlier positions, and services is never read again. But it's a subtle idiom — a plain var filtered []cloudrun.Service (or make([]cloudrun.Service, 0, len(services))) would be immediately clear with no mental overhead, and the allocation cost is negligible for the expected number of services.
Not a bug, just a readability suggestion.
| for _, t := range svc.GetTraffic() { | ||
| name := t.GetRevision() | ||
| if name == "" { | ||
| name = shortName(svc.GetLatestReadyRevision()) | ||
| } |
There was a problem hiding this comment.
Consider checking t.GetType() explicitly for LATEST resolution.
Currently, any traffic target with an empty Revision field falls through to LatestReadyRevision. This works because the Cloud Run API contract guarantees only LATEST targets have an empty revision — but an explicit type check would make the intent self-documenting and guard against future API changes:
name := t.GetRevision()
if t.GetType() == runpb.TrafficTargetAllocationType_TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST || name == "" {
name = shortName(svc.GetLatestReadyRevision())
}Minor defensive improvement — not a bug today.
| ListServices(ctx context.Context, project, region string) ([]cloudrun.Service, error) | ||
| } | ||
|
|
||
| var newCloudRunClient = func(ctx context.Context) (cloudRunLister, error) { |
There was a problem hiding this comment.
Package-level mutable var for test seam — works but has a known trade-off.
This pattern (save/restore in SetupTest/TearDownTest) is correct for sequential test execution, but if tests ever run in parallel (t.Parallel()), the shared mutable global would race. Since cmd/kosli tests use suite.Suite and don't parallelize, this is fine today. Just flagging for awareness — the testify suite pattern naturally serializes, so no action needed unless the test approach changes.
| func (c *Client) toService(ctx context.Context, raw *runpb.Service) (Service, error) { | ||
| svc := Service{ | ||
| Name: shortName(raw.GetName()), | ||
| URI: raw.GetUri(), | ||
| } | ||
| revNames := trafficRevisionNames(raw) | ||
| for _, revShort := range revNames { | ||
| fullName := raw.GetName() + "/revisions/" + revShort | ||
| rev, err := c.api.getRevision(ctx, fullName) | ||
| if err != nil { | ||
| return Service{}, fmt.Errorf("getting revision %s: %w", fullName, err) | ||
| } | ||
| svc.Revisions = append(svc.Revisions, toRevision(rev)) | ||
| } | ||
| return svc, nil |
There was a problem hiding this comment.
N+1 API calls per service (ListServices + GetRevision per traffic target).
For each service, the code makes one GetRevision call per traffic target. With many services or multi-revision traffic splits, this could add up. The handover doc already notes this:
"services excluded by name still cost their revision-fetch round-trips. If that becomes a bottleneck, push the filter into cloudrun.ListServices."
Good that it's tracked. For a future slice, consider batching or pushing the name filter into ListServices so excluded services skip the per-revision API calls entirely.
| type RevisionData struct { | ||
| RevisionName string `json:"revisionName"` | ||
| ServiceName string `json:"serviceName,omitempty"` | ||
| Digests map[string]string `json:"digests"` | ||
| CreatedAt int64 `json:"creationTimestamp"` | ||
| } |
There was a problem hiding this comment.
Digests will serialize as null when the revision has no containers.
If rev.Digests is nil (e.g. a revision with zero containers), the JSON field becomes "digests": null rather than "digests": {}. Depending on the server's validation, this might be rejected. The toRevision function in cloudrun.go:127 does make(map[string]string, len(rev.GetContainers())) which returns an empty (non-nil) map for zero containers, so this is fine in practice from the current code path. But if RevisionData is ever constructed directly with a nil Digests, it could surprise.
No action needed now — just noting the invariant to preserve.
Registers `kosli snapshot cloud-run ENVIRONMENT-NAME` as a hidden, in-development command for Google Cloud Run. The command takes required `--project` and `--region` flags, forces dry-run mode, and prints a placeholder line — no GCP API calls or HTTP requests yet. This is the first slice of the Cloud Run feature; subsequent slices will add the GCP client wrapper, the end-to-end happy path, filtering, auth UX, and finally unhide the command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Admin API (#4986) Slice 2 of the snapshot cloud-run feature. Adds a small, unit-tested wrapper around cloud.google.com/go/run/apiv2 that lists services in a GCP project + region and, for each service, returns one Revision per entry in the service's traffic configuration (any percent including 0%). TrafficTargetAllocationType_LATEST resolves via the service's LatestReadyRevision, and revisions referenced more than once are deduped. Digest extraction mirrors the ECS fallback in internal/aws/aws.go: use a @sha256: substring when present, else leave the digest empty rather than calling a registry. Production code reaches GCP through an unexported apiClient seam so tests substitute a fake without touching ADC. No command wiring yet; that's the next slice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…un (#4986) Slice 3 of the snapshot cloud-run feature. The command now calls cloudrun.New + ListServices, flattens the result into an EnvRequest payload, and submits a PUT to /report/cloud-run via kosliClient.Do. The PreRunE-forced dry-run keeps the request from actually leaving the client, so the (still-missing) server-side endpoint is not on the path yet. Each artifact carries the GCP project and region in addition to service_name, so revision rows are self-describing — a small extension of the EcsEnvRequest shape. The command depends on a local cloudRunLister interface and a package- level newCloudRunClient variable, letting tests swap in a stub without touching ADC; integration was sanity-checked against the real hello-world-cli-demo project and produced the expected digest-pinned artifact. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 4 of the snapshot cloud-run feature. Adds --services, --services-regex, --exclude, and --exclude-regex, mirroring the ECS service filtering shape and reusing the existing filters.ResourceFilterOptions struct. PreRunE rejects the four include/exclude mutex pairs. Filtering is applied in the command after cloudrun.ListServices returns. Services excluded by name still incur their per-revision API round-trips; pushing the filter into the GCP wrapper to skip those calls is a tractable follow-up if the round-trip cost becomes a bottleneck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 6 of the snapshot cloud-run feature. cloudrun.Classify maps gRPC Unauthenticated / PermissionDenied / NotFound responses into actionable messages and preserves the underlying error via %w; other codes pass through. The Unauthenticated message names all three credential sources (env var, gcloud command, GCE/GKE metadata server or Workload Identity) because the production deployment is a cluster cron job, not a local-dev gcloud session. Classification lives in internal/cloudrun (the package owns GCP knowledge) but is applied at the command boundary, not inside Client.ListServices. Doing it inside the Client would double-wrap real-call errors when the command also classified them, and it would bypass the stubbed test path entirely. Calling Classify once at the command boundary covers both real and stub error sources. cloudrun.New now wraps construction errors with a generic "GCP client setup failed" prefix; the cluster case rarely fails here, and the SDK message is preserved for diagnosis. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… doc (#4986) Three changes brings the payload into line with the conventions documented in the server repo's out-snapshot-examples.txt: - Added the top-level "type": "cloud-run" literal that every other env-type report ships explicitly. Without it, the (still-to-come) CloudRunReport model on the server would only accept the URL default; with it, the request is unambiguous. - Renamed the per-artifact "service_name" to "serviceName" (camelCase) matching the doc's K8S/ECS examples. The existing CLI's ECS code uses snake_case, but a new type is better off following the doc. - Dropped the per-artifact "project" and "region" fields. The doc notes extra="forbid" on every Pydantic model, so unilaterally picking custom field names risks 422 once the server defines its CloudRunReport. Project + region are already in the URL and flags, mirroring how ECS reports don't carry account/region per artifact. Verified end-to-end against the hello-world-cli-demo project; payload now matches the doc shape exactly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
48c4c0e to
fa367b3
Compare
| func New(ctx context.Context) (*Client, error) { | ||
| services, err := run.NewServicesClient(ctx) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | ||
| } | ||
| revisions, err := run.NewRevisionsClient(ctx) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | ||
| } | ||
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | ||
| } |
There was a problem hiding this comment.
Agreeing with the earlier review and @mbevc1's comment — the gRPC clients should be closed. Also, if NewRevisionsClient fails, the already-created services client leaks.
| func New(ctx context.Context) (*Client, error) { | |
| services, err := run.NewServicesClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| revisions, err := run.NewRevisionsClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | |
| } | |
| func New(ctx context.Context) (*Client, error) { | |
| services, err := run.NewServicesClient(ctx) | |
| if err != nil { | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| revisions, err := run.NewRevisionsClient(ctx) | |
| if err != nil { | |
| services.Close() | |
| return nil, fmt.Errorf("GCP client setup failed: %w", err) | |
| } | |
| return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil | |
| } | |
| // Close releases the underlying gRPC connections. | |
| func (c *Client) Close() error { | |
| if g, ok := c.api.(*gcpAPI); ok { | |
| sErr := g.services.Close() | |
| rErr := g.revisions.Close() | |
| if sErr != nil { | |
| return sErr | |
| } | |
| return rErr | |
| } | |
| return nil | |
| } |
Then in snapshotCloudRun.go after newCloudRunClient(ctx), add a defer client.Close() (which means cloudRunLister needs a Close() error method, or add the defer with a type-assert).
| ctx := context.Background() | ||
| client, err := newCloudRunClient(ctx) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| services, err := client.ListServices(ctx, o.project, o.region) | ||
| if err != nil { | ||
| return cloudrun.Classify(err, o.project, o.region) | ||
| } |
There was a problem hiding this comment.
Once Close() is added to the client, add a defer here:
client, err := newCloudRunClient(ctx)
if err != nil {
return err
}
if closer, ok := client.(io.Closer); ok {
defer closer.Close()
}Or add Close() error to cloudRunLister directly — it's a private interface, so that's fine. The stub can return nil.
| func (g *gcpAPI) listServices(ctx context.Context, project, region string) ([]*runpb.Service, error) { | ||
| parent := fmt.Sprintf("projects/%s/locations/%s", project, region) | ||
| it := g.services.ListServices(ctx, &runpb.ListServicesRequest{Parent: parent}) | ||
| var out []*runpb.Service | ||
| for { | ||
| svc, err := it.Next() | ||
| if err == iterator.Done { | ||
| return out, nil | ||
| } | ||
| if err != nil { | ||
| return nil, fmt.Errorf("listing Cloud Run services in %s: %w", parent, err) | ||
| } | ||
| out = append(out, svc) | ||
| } | ||
| } |
There was a problem hiding this comment.
Nit: unbounded growth pattern. If a project has many services, out will grow via repeated append. Since you already have len(rawServices) above, pre-allocating is fine there, but for listServices you could consider using the iterator's page size hint. Not a real issue for typical Cloud Run deployments (tens of services), just noting.
| func ToEnvRequest(services []Service) *EnvRequest { | ||
| artifacts := []*RevisionData{} | ||
| for _, svc := range services { |
There was a problem hiding this comment.
Consider what happens when services is nil vs empty:
ToEnvRequest(nil)→Artifactsis[](non-nil, due to the[]*RevisionData{}literal) — good, serializes as"artifacts": []ToEnvRequest([]Service{})→ same — good
This is correctly handled. Nice.
| func trafficRevisionNames(svc *runpb.Service) []string { | ||
| seen := map[string]struct{}{} | ||
| out := []string{} | ||
| for _, t := range svc.GetTraffic() { | ||
| name := t.GetRevision() | ||
| if name == "" { | ||
| name = shortName(svc.GetLatestReadyRevision()) | ||
| } | ||
| if name == "" { | ||
| continue | ||
| } | ||
| if _, dup := seen[name]; dup { | ||
| continue | ||
| } | ||
| seen[name] = struct{}{} | ||
| out = append(out, name) | ||
| } | ||
| return out | ||
| } |
There was a problem hiding this comment.
The dedup logic is clean, but as the earlier review noted, an explicit t.GetType() == LATEST check would make the fallback to LatestReadyRevision self-documenting rather than relying on the empty-revision contract. Low priority since the API contract is stable, but worth considering.
Also: if LatestReadyRevision is itself empty (e.g., a brand-new service that hasn't had a successful deployment), the name == "" check on line 114 correctly skips it — good edge case handling.