
[SPARK-56864][INFRA][PYTHON] Consolidate python-ps-minimum image into python-minimum#55872

Open · zhengruifeng wants to merge 2 commits into apache:master from zhengruifeng:remove-python-ps-minimum



@zhengruifeng (Contributor) commented May 14, 2026

What changes were proposed in this pull request?

This PR consolidates the python-ps-minimum Docker image and its CI workflow into the existing python-minimum image, eliminating a near-duplicate image and its redundant scheduled workflow.

Specifically:

  • Updates the label on dev/spark-test-image/python-minimum/Dockerfile to cover both PySpark and Pandas API on Spark.
  • Deletes dev/spark-test-image/python-ps-minimum/Dockerfile.
  • Deletes .github/workflows/build_python_ps_minimum.yml.
  • Adds "pyspark-pandas": "true" to .github/workflows/build_python_minimum.yml so Pandas API on Spark minimum-deps coverage is preserved.
  • Drops the python-ps-minimum entries from .github/workflows/build_infra_images_cache.yml (the paths trigger and the build/push step).
  • Removes the build_python_ps_minimum.yml badge from README.md.

Why are the changes needed?

To save CI resources. The two Dockerfiles were nearly identical. The only functional differences were in BASIC_PIP_PKGS:

| Package | python-minimum | python-ps-minimum |
|---|---|---|
| numpy | pinned ==1.22.4 | unpinned |
| scikit-learn | included | omitted |

Everything else (base image, apt packages, Python version, venv setup, CONNECT_PIP_PKGS) was the same. Maintaining both images doubles the image build/cache cost and runs a duplicate scheduled workflow without commensurate test value. Reusing python-minimum (which has the stricter pin and a superset of packages) for the Pandas API on Spark minimum-deps job keeps coverage while halving the image footprint and the associated CI runtime.

Does this PR introduce any user-facing change?

No. CI-only change.

How was this patch tested?

Existing CI. The merged build_python_minimum.yml now runs both pyspark and pyspark-pandas jobs against the python-minimum image.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

@zhengruifeng zhengruifeng changed the title [INFRA] Consolidate python-ps-minimum image into python-minimum [SPARK-56864][INFRA][PYTHON] Consolidate python-ps-minimum image into python-minimum May 14, 2026
@zhengruifeng zhengruifeng marked this pull request as ready for review May 14, 2026 10:36
@dongjoon-hyun (Member) left a comment


+1, LGTM.

cc @peter-toth


3 participants