
Update Protobuf Bazel Workspace #7094

Open
psamanoelton wants to merge 22 commits into tensorflow:master from psamanoelton:protobuf
Conversation

@psamanoelton

TensorBoard 2.21: Bazel 7.7.0 + protobuf 6.31.1 upgrade

Summary

This PR upgrades the local TensorBoard 2.21 branch to:

  • Bazel 7.7.0
  • protobuf 6.31.1
  • gRPC 1.74.0

The goal is to align TensorBoard's local/source build more closely with TensorFlow 2.21 while preserving:

  • local Bazel repo loading
  • query / analysis
  • wheel build
  • installed wheel runtime
  • full local bazel test //tensorboard/...

What changed

1. Bazel 7.7.0 migration

  • Updated .bazelversion to 7.7.0.
  • Updated WORKSPACE Bazel version checks from 6.x to 7.7.0+.
  • Added Bazel 7 compatibility settings in .bazelrc:
    • --noenable_bzlmod
    • platform-specific C++17 flags
    • test-only --incompatible_default_to_explicit_init_py
    • test env cleanup for Python/runfiles leakage
  • Added .bazelignore entries to avoid Bazel traversing local cache directories.
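Putting the bullets above together, the `.bazelrc` additions might look roughly like this (the flag names are taken from the list above; the exact per-platform C++17 flag spelling and config names are assumptions, not a copy of the actual diff):

```
# Sketch of the Bazel 7 compatibility settings described above (illustrative).

# Stay on WORKSPACE-based dependency loading under Bazel 7.
common --noenable_bzlmod

# Platform-specific C++17 flags (exact spellings are assumptions).
build:linux --cxxopt=-std=c++17
build:macos --cxxopt=-std=c++17
build:windows --cxxopt=/std:c++17

# Test-only: give every Python package an explicit __init__.py in runfiles.
test --incompatible_default_to_explicit_init_py
```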

2. protobuf 6.31.1 alignment

3. WORKSPACE / external repo compatibility

4. rules_closure / Soy / Java compatibility

5. TensorBoard compat / Bazel Python behavior

6. Test and runtime environment hardening

7. Plugin summary import fixes

8. Miscellaneous compatibility fixes

Validation

Validated locally in Docker against a fresh clone of the protobuf branch.

Commands used:

pip install "tensorflow==2.21.0"
pip install -r ./tensorboard/pip_package/requirements.txt \
            -r ./tensorboard/pip_package/requirements_dev.txt
pip install --force-reinstall "urllib3==1.26.20" "six>=1.12,<2"

bazel build //tensorboard/pip_package:build_pip_package && \
    mkdir -p /tmp/tb_wheel && \
    ./bazel-bin/tensorboard/pip_package/build_pip_package /tmp/tb_wheel && \
    pip install /tmp/tb_wheel/tensorboard-*.whl
bazel run //tensorboard/pip_package:test_pip_package -- \
  --tf-version "tensorflow==2.21"
bazel run //tensorboard/data/server:update_protos
bazel test //tensorboard/... --test_output=errors --keep_going --cache_test_results=no

Additional manual runtime check:

from tensorboard.summary.writer.event_file_writer import EventFileWriter
from tensorboard.compat.proto.event_pb2 import Event
from tensorboard.compat.proto.summary_pb2 import Summary
import time

writer = EventFileWriter("logs/test_run")
for step in range(5):
    summary = Summary(value=[Summary.Value(tag="loss", simple_value=step * 0.5)])
    event = Event(wall_time=time.time(), step=step, summary=summary)
    writer.add_event(event)
writer.close()

Important notes for reviewers

Why test --incompatible_default_to_explicit_init_py is test-only

Using this flag globally fixed Bazel test runfiles but broke wheel packaging by changing the package tree copied into the pip wheel build. Scoping it to test keeps:

  • Bazel Python tests green
  • pip package build and test_pip_package green

Why third_party/safe_html_types is vendored

This is the largest part of the diff. It is included because the older Closure/Soy stack used by TensorBoard expects safe-html-types classes that remain compatible with protobuf-java 6.x. Without this vendored copy, the Bazel 7 / protobuf 6 migration fails later in Java/Soy tooling rather than in TensorBoard code directly.
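The vendored copy is wired into the build with a local_repository rule, roughly along these lines (the path appears in the diff later in this thread; the repository name here is an assumption):

```
# Illustrative WORKSPACE wiring for the vendored safe-html-types copy.
# The repository name is an assumption; the path is from the diff.
local_repository(
    name = "com_google_safe_html_types",
    path = "third_party/safe_html_types",
)
```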

Follow-up ideas

These are not required for this PR, but may be worth considering later:

  • Revisit whether pip runtime protobuf should be pinned exactly to 6.31.1 instead of allowing newer 6.x/7.x.
  • Investigate why pip install --force-reinstall "urllib3==1.26.20" "six>=1.12,<2" is needed (71 tests fail without this pip install).
  • Consider refactoring some of the expanded WORKSPACE compatibility wiring into more focused third_party helpers.
  • Revisit whether some subprocess environment cleanup in tensorboard/manager.py can be reduced now that test-only Bazel init-py behavior is explicit.
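The last follow-up idea refers to filtering Bazel test/runfiles variables out of child-process environments. A minimal sketch of that kind of cleanup, assuming a prefix-based filter (the prefix list, function name, and behavior are illustrative assumptions, not the actual tensorboard/manager.py code):

```python
# Hypothetical sketch of subprocess environment cleanup; the prefix list
# and function name are assumptions, not the real manager.py implementation.
_BAZEL_ENV_PREFIXES = ("RUNFILES_", "BAZEL_", "TEST_")

def clean_subprocess_env(env):
    """Return a copy of `env` without Bazel test/runfiles leakage."""
    return {
        key: value
        for key, value in env.items()
        if not key.startswith(_BAZEL_ENV_PREFIXES)
    }

# Example: only non-Bazel variables survive the filter.
print(clean_subprocess_env(
    {"PATH": "/usr/bin", "RUNFILES_DIR": "/tmp/rf", "TEST_TMPDIR": "/tmp/tt"}
))
# → {'PATH': '/usr/bin'}
```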

Final status

  • Repo loading: passing
  • bazel query //...: passing
  • Focused build: passing
  • Wheel build: passing
  • Installed wheel smoke test: passing
  • update_protos_test: passing
  • Full bazel test //tensorboard/...: passing

Member

@arcra arcra left a comment

As discussed offline, I'd like some optional and somewhat independent changes to be done in separate PRs, ideally.

Also, not sure what issues you were finding with the urllib and six packages, but there were a couple related PRs: #7016 and #7013 (we vendor the urllib3 package, IIRC, and according to that second PR, the six library shouldn't be necessary anymore).

Comment thread .github/workflows/ci.yml
sudo apt-get update
sudo apt-get install -y libgbm-dev libxss1 libasound2
sudo apt-get install -y "${py_abi}-dev" libgbm-dev libxss1 libasound2
# Bazel's system_python repository resolves headers relative to the
Member

Is this a problem that started from upgrading to bazel version 7.7? Or from running in our self-hosted container? Or with the new protobuf version?

Perhaps this comment explains it... although it's not super clear to me either. Or maybe an incompatibility of the setup-python action in a self-hosted runner, as described here.

Generally, this seems hacky for something that I expect should be easily supported (setting up python).

Having said this, I think the environment is fairly complex, so if this works, I guess I'm ok with it, but it will just be complex to maintain. If we could make it work in a simpler way, that would be my preference.

Perhaps we can try either:

  • Using the env var mentioned in the "advanced usage" documentation of the setup-python action (linked above)
  • Using a newer setup-python version that might handle this better
  • Removing the setup-python action and installing the specific Python version via shell commands directly in the container

We can leave this as a follow-up, but if possible, I'd like this change to be submitted as a standalone PR.

Author

This is primarily a CI environment issue:

  • Bazel's system_python repo resolves headers relative to the interpreter exposed by setup-python
  • In the self-hosted container, that include path does not already contain Python.h
  • The fix is needed for Bazel repo loading/builds in that environment

This is not caused by protobuf logic directly.

It is hacky, and it would be reasonable to split this out later with one of:

  • newer setup-python
  • shell-installed Python in container
  • alternate Bazel python toolchain wiring

Comment thread .github/workflows/ci.yml
# in our WORKSPACE file.
BAZEL_VERSION: '6.5.0'
BAZEL_SHA256SUM: 'a40ac69263440761199fcb8da47ad4e3f328cbe79ffbf4ecc14e5ba252857307'
BAZEL_VERSION: '7.7.0'
Member

FWIW, my suggestion to upgrade Bazel version to 7.7 was to use bazel modules, which should ideally help with dependency management.

Are we getting any benefit from doing this upgrade? Does it help updating some dependencies or something?

Author

  • TensorFlow 2.21 uses Bazel 7.7.0
  • The upgrade unblocked protobuf 6.31.1 and related dependency alignment
  • This branch already relies on several custom repository overrides, patches, and local repos
  • Moving to bzlmod would be a larger dependency-management refactor than the minimum needed to align with TF 2.21, but this could be considered in a future split.

def protobuf_java_export(**kwargs):
java_export(
javacopts = JAVA_RELEASE_OPTS,
- # https://github.com/bazelbuild/rules_jvm_external/issues/1245
Member

Can we add comments for why we are patching these changes? (all of these files)

Author

Adding comments to these files.

Comment thread patches/README.md
@@ -2,6 +2,11 @@

We use [patch-package](https://www.npmjs.com/package/patch-package) to apply
Member

Is this still accurate? (i.e. are we still using patch-package somewhere?) Or should we remove this comment?

It's unclear with the current phrasing. If this is still used, can you rephrase to make it clear that both methods are used for different libraries? (and on what this depends or why we're using two methods?)

Author

patch-package is still the authoring/generation mechanism for npm patches. This branch applies the generated patch artifacts through Bazel's yarn_install(post_install_patches=...), which avoids the less reliable install-time invocation inside the repository rule.
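A sketch of that wiring, combining the attributes visible in the WORKSPACE diff above (the repository name, package_json/yarn_lock labels, and patch label are illustrative assumptions):

```
# Illustrative: apply pre-generated patch-package artifacts through Bazel
# rather than running patch-package inside the repository rule.
# Labels and the patch filename are assumptions.
yarn_install(
    name = "npm",
    package_json = "//:package.json",
    yarn_lock = "//:yarn.lock",
    package_json_remove = ["scripts.postinstall"],  # skip install-time patching
    patch_args = ["-p1"],
    post_install_patches = ["//patches:example_lib.patch"],
)
```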

Comment thread tensorboard/compat/BUILD
srcs = ["__init__.py"],
srcs_version = "PY3",
# Keep the package's real __init__.py on this target instead of depending
# on :compat so Bazel 7 test runfiles expose the lazy tf/tf2 exports from
Member

Is this bazel-7-specific? We might forget to update this if we update the project to use a new bazel version. Could we just say "bazel", without the specific version?

Author

This is a real fix, not cosmetic:

  • Bazel test runfiles were exposing a synthesized/incorrect package layout
  • that broke from tensorboard.compat import tf / tf2
  • the failure reproduced in both Bazel tests and wheel/runtime smoke tests

Comment thread WORKSPACE
path = "third_party/safe_html_types",
)

# rules_closure's Soy toolchain still expects safe-html-types classes that are
Member

Can this comment be above the previous lines where the local repository is defined?

Comment thread WORKSPACE
@@ -53,19 +84,108 @@ py_repositories()

http_archive(
Member

I thought you'd use the tb_http_archive rule here. Why is it only used in some places?

Comment thread WORKSPACE
)

java_import_external(
name = "com_google_template_soy",
Member

Can you add comments for why we added all of these dependencies? What needs this template soy package? And why are we adding the extra_build_file_content?

Comment thread WORKSPACE
package_json_remove = ["scripts.postinstall"],
patch_args = ["-p1"],
# Under Bazel 7.7.0 on this stack, invoking patch-package from the
# repository rule is fragile. Apply the existing git-format patches
Member

What does "fragile" mean in this context? It would sometimes not work correctly?

Comment thread WORKSPACE
omit_com_google_common_html_types = True,
omit_com_google_protobuf = True,
omit_com_google_protobuf_js = True,
omit_com_google_template_soy = True,
Member

Does this mean the rules_closure_dependencies are the ones that require the new dependencies above?

Author

I will answer all the WORKSPACE comments in this one...

  • safe_html_types is the main true vendored code in this PR
  • it is a build-time Java dependency for the existing Closure/Soy toolchain, not TensorBoard runtime code
  • the local copy is used because this branch needs a protobuf-6-compatible adjusted version of those classes, not just any upstream release
  • the omit_* entries are exactly what stop rules_closure_dependencies from re-introducing older/conflicting transitive versions of the same Java deps
  • tb_http_archive is used where we needed common mirror/patch/link behavior; plain http_archive remains where that helper was not yet applied or where we stayed closer to upstream declarations
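Concretely, the omit_* linkage from the diff above looks like this (the attribute names are taken verbatim from the quoted WORKSPACE excerpt):

```
# Illustrative: omit the Java deps this WORKSPACE now declares itself, so
# rules_closure_dependencies does not re-introduce older/conflicting
# transitive versions of the same dependencies.
rules_closure_dependencies(
    omit_com_google_common_html_types = True,
    omit_com_google_protobuf = True,
    omit_com_google_protobuf_js = True,
    omit_com_google_template_soy = True,
)
```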

Suggested actions:

  • Keep vendoring for now unless we can prove an exact upstream archive works unchanged with protobuf 6.31.1.
  • Improve comments in WORKSPACE:
    • move the vendor rationale above local_repository
    • add one comment before com_google_template_soy
    • add one comment before rules_closure_dependencies(...) explaining the omit_* linkage
  • Expand third_party/safe_html_types/README.md:
    • what is vendored
    • why it is needed
    • why it is safe
    • why it is local_repository instead of a plain archive in this branch
