Skip to content

[runners-spark] Use robust constructor resolution in EncoderFactory#38271

Open
tkaymak wants to merge 2 commits intoapache:masterfrom
tkaymak:spark-encoderfactory-robustness
Open

[runners-spark] Use robust constructor resolution in EncoderFactory#38271
tkaymak wants to merge 2 commits intoapache:masterfrom
tkaymak:spark-encoderfactory-robustness

Conversation

@tkaymak
Copy link
Copy Markdown
Contributor

@tkaymak tkaymak commented Apr 23, 2026

Summary

Replace (Constructor<X>) X.class.getConstructors()[0] for StaticInvoke, Invoke,
and NewInstance in the shared structured-streaming EncoderFactory with a
primaryConstructor() helper that picks the constructor with the most parameters.

Class.getConstructors() returns constructors in JVM-defined order that is not
guaranteed stable, so resolving the widest constructor explicitly makes the lookup
robust to future Spark releases that add overloaded constructors. The downstream
switch (...getParameterCount()) in invoke(...) and newInstance(...) already
dispatches on parameter count, so the widest constructor matches the dispatch table.

Behavior

No-op against currently supported Spark 3 versions: StaticInvoke, Invoke, and
NewInstance only have one public constructor each in Spark 3.1.x through 3.5.x,
so getConstructors()[0] and the widest constructor resolve to the same one.
The change is purely defensive.

Precedent

The same fix has already landed in the Spark 4 override in #38255 (commit
9c071c5e). This PR ports it to the shared base so both code paths stay consistent.

Note on overlap with #38255

#38255 also touches this file at a different line (a43868508b,
correcting the second invoke(Expression obj, ...) switch to key on
INVOKE_CONSTRUCTOR.getParameterCount() instead of STATIC_INVOKE_CONSTRUCTOR).
The two changes are independent — this PR can land before or after #38255

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request improves the robustness of the EncoderFactory within the Spark structured-streaming runner. By replacing direct array indexing for constructor retrieval with a logic-based selection of the widest constructor, the code becomes more resilient to potential changes in Spark's internal class structures across different versions. This change is purely defensive and maintains current functionality while aligning the shared base with existing patterns in the Spark 4 overrides.

Highlights

  • Robust Constructor Resolution: Replaced unstable index-based constructor access with a new primaryConstructor() helper method that explicitly selects the constructor with the highest parameter count.
  • Defensive Coding: Ensured compatibility across different Spark versions by making constructor lookup resilient to changes in JVM-defined constructor ordering.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@tkaymak
Copy link
Copy Markdown
Contributor Author

tkaymak commented Apr 24, 2026

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the robustness of the EncoderFactory by replacing unstable constructor indexing with a helper method that selects the constructor with the most parameters. A review comment suggests adding a check for empty constructor arrays to prevent a potential ArrayIndexOutOfBoundsException during static initialization.

Comment on lines +48 to +49
Constructor<?>[] ctors = cls.getConstructors();
Constructor<?> widest = ctors[0];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation assumes that getConstructors() will always return at least one element. If it returns an empty array (for example, if all public constructors were removed or made non-public in a future Spark version), ctors[0] will throw an ArrayIndexOutOfBoundsException. Since the goal of this PR is to improve robustness, adding a check with a descriptive error message is recommended to avoid a cryptic crash during static initialization.

    Constructor<?>[] ctors = cls.getConstructors();
    if (ctors.length == 0) {
      throw new IllegalStateException("No public constructors found for " + cls.getName());
    }
    Constructor<?> widest = ctors[0];

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0621ce5.

@tkaymak tkaymak changed the title [WIP][runners-spark] Use robust constructor resolution in EncoderFactory [runners-spark] Use robust constructor resolution in EncoderFactory Apr 24, 2026
@tkaymak tkaymak marked this pull request as ready for review April 24, 2026 16:50
tkaymak added 2 commits April 24, 2026 16:53
Replace `(Constructor<X>) X.class.getConstructors()[0]` for
StaticInvoke, Invoke, and NewInstance with a `primaryConstructor()`
helper that picks the constructor with the most parameters.
Class.getConstructors() returns constructors in JVM-defined order
that is not guaranteed stable, so resolving the widest constructor
explicitly makes the lookup robust to future Spark releases that
add overloaded constructors.

Today this is a no-op: StaticInvoke / Invoke / NewInstance only
have one public constructor each in Spark 3.1.x through 3.5.x, so
getConstructors()[0] and the widest constructor resolve to the
same one. The change is purely defensive.

Same fix has already landed in the Spark 4 override in PR apache#38255
(commit 9c071c5, flagged by Gemini Code Assist round 1). Porting
it to the shared base keeps both code paths consistent.
If Class.getConstructors() returns empty (e.g. all public constructors
removed in a future Spark version), throw an IllegalStateException with
the class name instead of letting ctors[0] surface as a cryptic
ArrayIndexOutOfBoundsException during static initialization.
@tkaymak tkaymak force-pushed the spark-encoderfactory-robustness branch from b4f8ffd to 0621ce5 Compare April 24, 2026 16:53
@github-actions
Copy link
Copy Markdown
Contributor

Assigning reviewers:

R: @kennknowles added as fallback since no labels match configuration

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant