Skip to content

Added initial implementation of kosli snapshot cloud-run server/#4986#833

Open
ToreMerkely wants to merge 6 commits intomainfrom
4986-google-cloud-run-1
Open

Added initial implementation of kosli snapshot cloud-run server/#4986#833
ToreMerkely wants to merge 6 commits intomainfrom
4986-google-cloud-run-1

Conversation

@ToreMerkely
Copy link
Copy Markdown
Contributor

  • feat(snapshot): add hidden cloud-run skeleton command (#4986)
  • feat(cloudrun): add internal/cloudrun package wrapping the Cloud Run Admin API (#4986)
  • feat(snapshot): wire cloud-run command end-to-end via internal/cloudrun (#4986)
  • feat(snapshot): add service filtering flags to cloud-run command (#4986)
  • feat(cloudrun): friendly auth and not-found error messages (#4986)
  • fix(cloudrun): align snapshot payload with server's snapshot-examples doc (#4986)

@ToreMerkely ToreMerkely enabled auto-merge (squash) April 28, 2026 12:41
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 28, 2026

Claude finished @mbevc1's task in 5m 4s —— View job


PR Review: kosli snapshot cloud-run

  • Phase 1: Static analysis (go build, go vet, go mod tidy)
  • Phase 2: Read and review all changed files
  • Phase 3: Post inline comments and summary

Static Analysis

All three checks pass cleanly: go build ./..., go vet ./..., and go mod tidy with no drift in go.mod/go.sum.


Summary

This is a well-structured PR that adds kosli snapshot cloud-run as a hidden, dry-run-only command across 6 incremental slices. The code closely follows the established ECS snapshot pattern, which makes it easy to review and maintain.

What needs attention (1 actionable item):

  1. gRPC client resource leakrun.NewServicesClient and run.NewRevisionsClient open gRPC connections that are never closed. If NewRevisionsClient fails, the already-created services client also leaks. Add a Close() method to Client and defer client.Close() in the command. See inline comment for the suggested fix. Both @mbevc1 and the prior automated review flagged this — it's the one thing I'd address before merging.

Minor observations (no action required):

  • The trafficRevisionNames fallback to LatestReadyRevision when Revision is empty works correctly per the API contract, but an explicit GetType() == LATEST check would make intent clearer. See inline.
  • The filtered := services[:0] slice reuse (line 98 of snapshotCloudRun.go) is correct but a subtle idiom — the prior review already flagged this as a readability nit.
  • N+1 API calls per service (one GetRevision per traffic target) is a known trade-off documented in the handover doc — fine for now.

What's done well:

  • Clean vertical slicing — each of the 6 commits is independently coherent and reviewable. The handover doc tracks decisions and rationale clearly.
  • Thorough test coverage — 9 unit tests in internal/cloudrun covering traffic splits, dedup, LATEST resolution, multi-container, tag-pinned images, and zero-traffic edge cases. 13 command-level tests covering arg validation, flag mutual exclusion, filtering (include/exclude by name and regex), and auth error UX.
  • Good use of existing patternsfilters.ResourceFilterOptions, cmdTestCase table tests, cloudRunLister interface seam for testability, and Classify for friendly error messages all follow established codebase conventions.
  • Defensive design — hidden + forced dry-run until the feature is complete, EnvRequest.Artifacts initialized as non-nil empty slice, Classify preserves the original error via %w for unwrapping.
  • Payload alignment — the out-snapshot-examples.txt reference was followed carefully, and the Slice 3.5 correction (dropping project/region per artifact to avoid extra="forbid" rejection) shows good attention to the server contract.

Overall this is solid work. The gRPC close is the only must-fix; everything else is minor or already tracked for future slices.

Comment on lines +53 to +63
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource leak — GCP gRPC clients are never closed.

run.NewServicesClient and run.NewRevisionsClient open gRPC connections. The current code never calls .Close() on them. For a short-lived CLI command this isn't critical (the OS reclaims on exit), but it's best practice and avoids noisy warnings from the gRPC library.

Consider adding a Close method to Client (and to the apiClient interface or alongside it) so the caller can defer client.Close():

Suggested change
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
services.Close()
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
// Close releases the underlying gRPC connections.
func (c *Client) Close() error {
if g, ok := c.api.(*gcpAPI); ok {
sErr := g.services.Close()
rErr := g.revisions.Close()
if sErr != nil {
return sErr
}
return rErr
}
return nil
}

Note the second change: if NewRevisionsClient fails, the already-created services client should be closed to avoid leaking that connection too.

Then in the command: defer client.Close() after the newCloudRunClient call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding Close is a fair comment

return cloudrun.Classify(err, o.project, o.region)
}

filtered := services[:0]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: in-place slice filtering is correct but non-obvious.

filtered := services[:0] reuses the backing array of services. It's safe here because range copies each element before the append overwrites earlier positions, and services is never read again. But it's a subtle idiom — a plain var filtered []cloudrun.Service (or make([]cloudrun.Service, 0, len(services))) would be immediately clear with no mental overhead, and the allocation cost is negligible for the expected number of services.

Not a bug, just a readability suggestion.

Comment on lines +109 to +113
for _, t := range svc.GetTraffic() {
name := t.GetRevision()
if name == "" {
name = shortName(svc.GetLatestReadyRevision())
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider checking t.GetType() explicitly for LATEST resolution.

Currently, any traffic target with an empty Revision field falls through to LatestReadyRevision. This works because the Cloud Run API contract guarantees only LATEST targets have an empty revision — but an explicit type check would make the intent self-documenting and guard against future API changes:

name := t.GetRevision()
if t.GetType() == runpb.TrafficTargetAllocationType_TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST || name == "" {
    name = shortName(svc.GetLatestReadyRevision())
}

Minor defensive improvement — not a bug today.

ListServices(ctx context.Context, project, region string) ([]cloudrun.Service, error)
}

var newCloudRunClient = func(ctx context.Context) (cloudRunLister, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package-level mutable var for test seam — works but has a known trade-off.

This pattern (save/restore in SetupTest/TearDownTest) is correct for sequential test execution, but if tests ever run in parallel (t.Parallel()), the shared mutable global would race. Since cmd/kosli tests use suite.Suite and don't parallelize, this is fine today. Just flagging for awareness — the testify suite pattern naturally serializes, so no action needed unless the test approach changes.

Comment on lines +85 to +99
func (c *Client) toService(ctx context.Context, raw *runpb.Service) (Service, error) {
svc := Service{
Name: shortName(raw.GetName()),
URI: raw.GetUri(),
}
revNames := trafficRevisionNames(raw)
for _, revShort := range revNames {
fullName := raw.GetName() + "/revisions/" + revShort
rev, err := c.api.getRevision(ctx, fullName)
if err != nil {
return Service{}, fmt.Errorf("getting revision %s: %w", fullName, err)
}
svc.Revisions = append(svc.Revisions, toRevision(rev))
}
return svc, nil
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

N+1 API calls per service (ListServices + GetRevision per traffic target).

For each service, the code makes one GetRevision call per traffic target. With many services or multi-revision traffic splits, this could add up. The handover doc already notes this:

"services excluded by name still cost their revision-fetch round-trips. If that becomes a bottleneck, push the filter into cloudrun.ListServices."

Good that it's tracked. For a future slice, consider batching or pushing the name filter into ListServices so excluded services skip the per-revision API calls entirely.

Comment on lines +18 to +23
type RevisionData struct {
RevisionName string `json:"revisionName"`
ServiceName string `json:"serviceName,omitempty"`
Digests map[string]string `json:"digests"`
CreatedAt int64 `json:"creationTimestamp"`
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digests will serialize as null when the revision has no containers.

If rev.Digests is nil (e.g. a revision with zero containers), the JSON field becomes "digests": null rather than "digests": {}. Depending on the server's validation, this might be rejected. The toRevision function in cloudrun.go:127 does make(map[string]string, len(rev.GetContainers())) which returns an empty (non-nil) map for zero containers, so this is fine in practice from the current code path. But if RevisionData is ever constructed directly with a nil Digests, it could surprise.

No action needed now — just noting the invariant to preserve.

ToreMerkely and others added 6 commits April 28, 2026 18:25
Registers `kosli snapshot cloud-run ENVIRONMENT-NAME` as a hidden,
in-development command for Google Cloud Run. The command takes
required `--project` and `--region` flags, forces dry-run mode, and
prints a placeholder line — no GCP API calls or HTTP requests yet.
This is the first slice of the Cloud Run feature; subsequent slices
will add the GCP client wrapper, the end-to-end happy path, filtering,
auth UX, and finally unhide the command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Admin API (#4986)

Slice 2 of the snapshot cloud-run feature. Adds a small, unit-tested
wrapper around cloud.google.com/go/run/apiv2 that lists services in a
GCP project + region and, for each service, returns one Revision per
entry in the service's traffic configuration (any percent including
0%). TrafficTargetAllocationType_LATEST resolves via the service's
LatestReadyRevision, and revisions referenced more than once are
deduped. Digest extraction mirrors the ECS fallback in
internal/aws/aws.go: use a @sha256: substring when present, else leave
the digest empty rather than calling a registry. Production code
reaches GCP through an unexported apiClient seam so tests substitute a
fake without touching ADC.

No command wiring yet; that's the next slice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…un (#4986)

Slice 3 of the snapshot cloud-run feature. The command now calls
cloudrun.New + ListServices, flattens the result into an EnvRequest
payload, and submits a PUT to /report/cloud-run via kosliClient.Do.
The PreRunE-forced dry-run keeps the request from actually leaving the
client, so the (still-missing) server-side endpoint is not on the path
yet. Each artifact carries the GCP project and region in addition to
service_name, so revision rows are self-describing — a small extension
of the EcsEnvRequest shape.

The command depends on a local cloudRunLister interface and a package-
level newCloudRunClient variable, letting tests swap in a stub without
touching ADC; integration was sanity-checked against the real
hello-world-cli-demo project and produced the expected
digest-pinned artifact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 4 of the snapshot cloud-run feature. Adds --services,
--services-regex, --exclude, and --exclude-regex, mirroring the ECS
service filtering shape and reusing the existing
filters.ResourceFilterOptions struct. PreRunE rejects the four
include/exclude mutex pairs.

Filtering is applied in the command after cloudrun.ListServices
returns. Services excluded by name still incur their per-revision
API round-trips; pushing the filter into the GCP wrapper to skip
those calls is a tractable follow-up if the round-trip cost becomes
a bottleneck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slice 6 of the snapshot cloud-run feature. cloudrun.Classify maps
gRPC Unauthenticated / PermissionDenied / NotFound responses into
actionable messages and preserves the underlying error via %w; other
codes pass through. The Unauthenticated message names all three
credential sources (env var, gcloud command, GCE/GKE metadata server
or Workload Identity) because the production deployment is a cluster
cron job, not a local-dev gcloud session.

Classification lives in internal/cloudrun (the package owns GCP
knowledge) but is applied at the command boundary, not inside
Client.ListServices. Doing it inside the Client would double-wrap
real-call errors when the command also classified them, and it would
bypass the stubbed test path entirely. Calling Classify once at the
command boundary covers both real and stub error sources.

cloudrun.New now wraps construction errors with a generic "GCP client
setup failed" prefix; the cluster case rarely fails here, and the SDK
message is preserved for diagnosis.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… doc (#4986)

Three changes brings the payload into line with the conventions
documented in the server repo's out-snapshot-examples.txt:

- Added the top-level "type": "cloud-run" literal that every other
  env-type report ships explicitly. Without it, the (still-to-come)
  CloudRunReport model on the server would only accept the URL
  default; with it, the request is unambiguous.
- Renamed the per-artifact "service_name" to "serviceName" (camelCase)
  matching the doc's K8S/ECS examples. The existing CLI's ECS code
  uses snake_case, but a new type is better off following the doc.
- Dropped the per-artifact "project" and "region" fields. The doc
  notes extra="forbid" on every Pydantic model, so unilaterally
  picking custom field names risks 422 once the server defines its
  CloudRunReport. Project + region are already in the URL and flags,
  mirroring how ECS reports don't carry account/region per artifact.

Verified end-to-end against the hello-world-cli-demo project; payload
now matches the doc shape exactly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mbevc1 mbevc1 force-pushed the 4986-google-cloud-run-1 branch from 48c4c0e to fa367b3 Compare April 28, 2026 17:25
Comment on lines +53 to +63
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreeing with the earlier review and @mbevc1's comment — the gRPC clients should be closed. Also, if NewRevisionsClient fails, the already-created services client leaks.

Suggested change
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
func New(ctx context.Context) (*Client, error) {
services, err := run.NewServicesClient(ctx)
if err != nil {
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
revisions, err := run.NewRevisionsClient(ctx)
if err != nil {
services.Close()
return nil, fmt.Errorf("GCP client setup failed: %w", err)
}
return &Client{api: &gcpAPI{services: services, revisions: revisions}}, nil
}
// Close releases the underlying gRPC connections.
func (c *Client) Close() error {
if g, ok := c.api.(*gcpAPI); ok {
sErr := g.services.Close()
rErr := g.revisions.Close()
if sErr != nil {
return sErr
}
return rErr
}
return nil
}

Then in snapshotCloudRun.go after newCloudRunClient(ctx), add a defer client.Close() (which means cloudRunLister needs a Close() error method, or add the defer with a type-assert).

Comment on lines +88 to +96
ctx := context.Background()
client, err := newCloudRunClient(ctx)
if err != nil {
return err
}
services, err := client.ListServices(ctx, o.project, o.region)
if err != nil {
return cloudrun.Classify(err, o.project, o.region)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once Close() is added to the client, add a defer here:

client, err := newCloudRunClient(ctx)
if err != nil {
    return err
}
if closer, ok := client.(io.Closer); ok {
    defer closer.Close()
}

Or add Close() error to cloudRunLister directly — it's a private interface, so that's fine. The stub can return nil.

Comment on lines +170 to +184
func (g *gcpAPI) listServices(ctx context.Context, project, region string) ([]*runpb.Service, error) {
parent := fmt.Sprintf("projects/%s/locations/%s", project, region)
it := g.services.ListServices(ctx, &runpb.ListServicesRequest{Parent: parent})
var out []*runpb.Service
for {
svc, err := it.Next()
if err == iterator.Done {
return out, nil
}
if err != nil {
return nil, fmt.Errorf("listing Cloud Run services in %s: %w", parent, err)
}
out = append(out, svc)
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unbounded growth pattern. If a project has many services, out will grow via repeated append. Since you already have len(rawServices) above, pre-allocating is fine there, but for listServices you could consider using the iterator's page size hint. Not a real issue for typical Cloud Run deployments (tens of services), just noting.

Comment on lines +28 to +30
func ToEnvRequest(services []Service) *EnvRequest {
artifacts := []*RevisionData{}
for _, svc := range services {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider what happens when services is nil vs empty:

  • ToEnvRequest(nil)Artifacts is [] (non-nil, due to the []*RevisionData{} literal) — good, serializes as "artifacts": []
  • ToEnvRequest([]Service{}) → same — good

This is correctly handled. Nice.

Comment on lines +106 to +124
func trafficRevisionNames(svc *runpb.Service) []string {
seen := map[string]struct{}{}
out := []string{}
for _, t := range svc.GetTraffic() {
name := t.GetRevision()
if name == "" {
name = shortName(svc.GetLatestReadyRevision())
}
if name == "" {
continue
}
if _, dup := seen[name]; dup {
continue
}
seen[name] = struct{}{}
out = append(out, name)
}
return out
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dedup logic is clean, but as the earlier review noted, an explicit t.GetType() == LATEST check would make the fallback to LatestReadyRevision self-documenting rather than relying on the empty-revision contract. Low priority since the API contract is stable, but worth considering.

Also: if LatestReadyRevision is itself empty (e.g., a brand-new service that hasn't had a successful deployment), the name == "" check on line 114 correctly skips it — good edge case handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants