[FIX] Remove VERSION arg from Dockerfile base stages to fix CI cache#1864
[FIX] Remove VERSION arg from Dockerfile base stages to fix CI cache#1864chandrasekharan-zipstack wants to merge 4 commits intomainfrom
Conversation
The ARG VERSION + LABEL version="${VERSION}" in the base stage caused
every Docker layer's cache key to depend on the version tag. Since the
tag changes every build (rc.245, rc.246, ...), BuildKit could never
match cached layers — resulting in 0% cache hit rate for all Python
services in CI.
The frontend (55% cached) was unaffected because its Dockerfile has no
ARG VERSION in any stage.
Removed from: backend, prompt-service, platform-service, runner,
x2text-service, worker-unified, tool-sidecar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary by CodeRabbit
WalkthroughBuild version handling is refactored across seven Dockerfiles by removing the top-level Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Greptile SummaryThis PR fixes a 0% Docker layer cache hit rate for all 7 Python services in CI by removing Key changes:
Effect on build graph: The Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
subgraph BEFORE["BEFORE (cache broken)"]
A1["base stage\nARG VERSION=rc.245\nLABEL version='rc.245'"] -->|"VERSION changes → cache miss"| B1["ext-dependencies stage\n(rebuilt every time)"]
B1 --> C1["production stage\n(rebuilt every time)"]
C1 --> D1["ENTRYPOINT / CMD"]
end
subgraph AFTER["AFTER (cache fixed)"]
A2["base stage\nLABEL maintainer + description\n(no VERSION arg)"] -->|"stable → cache hit ✅"| B2["ext-dependencies stage\n(cached ✅)"]
B2 -->|"deps unchanged → cache hit ✅"| C2["production stage\n(app code layer)"]
C2 --> E2["ARG VERSION=rc.246\nENV UNSTRACT_APPS_VERSION\n(thin layer, rebuilt)"]
E2 --> D2["ENTRYPOINT / CMD"]
end
style BEFORE fill:#ffe0e0,stroke:#cc0000
style AFTER fill:#e0ffe0,stroke:#00aa00
style A1 fill:#ffcccc
style B1 fill:#ffcccc
style C1 fill:#ffcccc
style A2 fill:#ccffcc
style B2 fill:#ccffcc
style C2 fill:#ccffcc
style E2 fill:#fffacc
Reviews (4): Last reviewed commit: "Revert "[MISC] Trivial log message clean..." | Re-trigger Greptile |
|
Note Unit test generation is a public access feature. Expect some limitations and changes as we gather feedback and continue to improve it. Generating unit tests... This may take up to 20 minutes. |
|
✅ Created PR with unit tests: #1865 |
Fix double period in backend health check log and remove ellipsis from prompt-service init log. Used to test Docker layer cache reuse after app code changes (ext-dependencies and nuitka-compile stages should remain cached). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add ARG VERSION + ENV UNSTRACT_APPS_VERSION at the end of the production stage in all Python Dockerfiles. This makes the build version available at runtime (os.environ["UNSTRACT_APPS_VERSION"]) without affecting layer caching — ARG is scoped to the production stage and placed after all expensive build steps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
| ARG VERSION=dev | ||
| ENV UNSTRACT_APPS_VERSION=${VERSION} |
There was a problem hiding this comment.
UNSTRACT_APPS_VERSION env var set but never consumed
UNSTRACT_APPS_VERSION is now baked into every image as a runtime environment variable, but a search of the entire codebase shows zero consumers of this variable — no Python code, Helm chart, or CI script reads it.
Setting an ENV here does still create a Docker layer (unlike LABEL), which means it will be rebuilt every time VERSION changes. The cache-fix goal is fully achieved by simply omitting the ARG/ENV entirely at the end. If the intent is to make the version introspectable at runtime (e.g. for a /health or /version endpoint), that's a good goal — but the consuming code hasn't been added yet.
The same applies to all seven Dockerfiles in this PR (platform.Dockerfile, prompt.Dockerfile, runner.Dockerfile, tool-sidecar.Dockerfile, worker-unified.Dockerfile, x2text.Dockerfile).
If you want to keep it for future use, consider adding a comment explaining its purpose so reviewers don't treat it as dead configuration.
Prompt To Fix With AI
This is a comment left during a code review.
Path: docker/dockerfiles/backend.Dockerfile
Line: 82-83
Comment:
**`UNSTRACT_APPS_VERSION` env var set but never consumed**
`UNSTRACT_APPS_VERSION` is now baked into every image as a runtime environment variable, but a search of the entire codebase shows zero consumers of this variable — no Python code, Helm chart, or CI script reads it.
Setting an `ENV` here does still create a Docker layer (unlike `LABEL`), which means it will be rebuilt every time `VERSION` changes. The cache-fix goal is fully achieved by simply omitting the `ARG`/`ENV` entirely at the end. If the intent is to make the version introspectable at runtime (e.g. for a `/health` or `/version` endpoint), that's a good goal — but the consuming code hasn't been added yet.
The same applies to all seven Dockerfiles in this PR (`platform.Dockerfile`, `prompt.Dockerfile`, `runner.Dockerfile`, `tool-sidecar.Dockerfile`, `worker-unified.Dockerfile`, `x2text.Dockerfile`).
If you want to keep it for future use, consider adding a comment explaining its purpose so reviewers don't treat it as dead configuration.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
@chandrasekharan-zipstack isn't this a valid comment? Why this needs to be env? Can't we add lables at the end?
There was a problem hiding this comment.
Yes, the comment is valid. Here's the key distinction:
ENV vs LABEL at the end:
ENV UNSTRACT_APPS_VERSION=${VERSION}→ Creates a runtime environment variable inside the container. Useful if application code needs to read it (e.g.,/healthendpoint showing version)LABEL version="${VERSION}"→ Pure image metadata, doesn't affect runtime. Can be inspected withdocker inspect
Both would rebuild that final layer when VERSION changes, which is fine since all the expensive layers (dependencies, code) are already cached.
The issue: No code currently reads UNSTRACT_APPS_VERSION from the environment, so it's unused runtime pollution. If the goal is just version metadata, LABEL at the end would be cleaner:
# At the very end
ARG VERSION=dev
LABEL version="${VERSION}"If there's a plan to consume this in Python code (e.g., expose in an API), then ENV makes sense but should be documented.
There was a problem hiding this comment.
@ritwik-g the idea of adding it as an env was so that, we can later support showing the version in the UI as well by reading this. If its a label, we can't read this env
cc: @hari-kuriakose
This reverts commit c71bd40.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docker/dockerfiles/worker-unified.Dockerfile (1)
20-29: Harden apt install against transient CI network timeouts.Given the observed timeout failures, add apt retries/timeouts here to reduce flaky builds.
🔧 Suggested resilience tweak for apt network flakiness
RUN apt-get update \ - && apt-get --no-install-recommends install -y \ + && apt-get -o Acquire::Retries=5 -o Acquire::http::Timeout="30" --no-install-recommends install -y \ build-essential \ curl \ gcc \ libmagic-dev \ libssl-dev \ pkg-config \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docker/dockerfiles/worker-unified.Dockerfile` around lines 20 - 29, The apt commands in the RUN layer (the "RUN apt-get update" and "apt-get --no-install-recommends install -y" lines) are failing intermittently due to transient network timeouts; make them more resilient by adding apt retry and timeout options (for example use -o Acquire::Retries=3 and -o Acquire::http::Timeout=30/-o Acquire::https::Timeout=30) to both the update and install invocations, and consider setting DEBIAN_FRONTEND=noninteractive for the layer; update the RUN line that contains "apt-get update" and "apt-get --no-install-recommends install -y" to include these apt options and retry semantics so CI builds are less flaky.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@docker/dockerfiles/worker-unified.Dockerfile`:
- Around line 20-29: The apt commands in the RUN layer (the "RUN apt-get update"
and "apt-get --no-install-recommends install -y" lines) are failing
intermittently due to transient network timeouts; make them more resilient by
adding apt retry and timeout options (for example use -o Acquire::Retries=3 and
-o Acquire::http::Timeout=30/-o Acquire::https::Timeout=30) to both the update
and install invocations, and consider setting DEBIAN_FRONTEND=noninteractive for
the layer; update the RUN line that contains "apt-get update" and "apt-get
--no-install-recommends install -y" to include these apt options and retry
semantics so CI builds are less flaky.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2b15eda5-b782-4825-a007-49b82d869dd0
📒 Files selected for processing (7)
docker/dockerfiles/backend.Dockerfiledocker/dockerfiles/platform.Dockerfiledocker/dockerfiles/prompt.Dockerfiledocker/dockerfiles/runner.Dockerfiledocker/dockerfiles/tool-sidecar.Dockerfiledocker/dockerfiles/worker-unified.Dockerfiledocker/dockerfiles/x2text.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (4)
- docker/dockerfiles/prompt.Dockerfile
- docker/dockerfiles/tool-sidecar.Dockerfile
- docker/dockerfiles/x2text.Dockerfile
- docker/dockerfiles/platform.Dockerfile



What
Remove
ARG VERSIONandLABEL version="${VERSION}"from thebasestage of all 7 Python Dockerfiles.Why
The
ARG VERSIONused inLABEL version="${VERSION}"made every Docker layer's cache key depend on the version tag. Since the tag changes every build (rc.245, rc.246, ...), BuildKit could never match cached layers — 0% cache hit rate for all Python services in CI.The frontend (55% cached) was unaffected because its Dockerfile has no
ARG VERSIONin any build stage.Evidence from build run #23239022262:
67550562986): cache manifest loads successfully but zero layers match67550562697): cache manifest loads and multiple layers hitCACHEDHow
Removed
ARG VERSION=devand theversion="${VERSION}"from theLABELinstruction in thebasestage of all Python Dockerfiles. The version label was unused — nothing in Helm charts, CI, or Kubernetes reads it. The image tag itself (e.g.,backend:rc.246) carries the version.Affected: backend, prompt-service, platform-service, runner, x2text-service, worker-unified, tool-sidecar.
Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
No. The
versionlabel was purely decorative metadata on the Docker image. No code, Helm chart, or deployment reads it. The image tag already carries the version. Local test confirmed: building withVERSION=cache-test-1thenVERSION=cache-test-2produces 100% cache hit (0.0s build).Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
Screenshots
N/A
Checklist
I have read and understood the Contribution Guidelines.