[runners-spark] Use robust constructor resolution in EncoderFactory#38271
tkaymak wants to merge 2 commits into apache:master from
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request improves the robustness of the EncoderFactory within the Spark structured-streaming runner. By replacing direct array indexing for constructor retrieval with a logic-based selection of the widest constructor, the code becomes more resilient to potential changes in Spark's internal class structures across different versions. This change is purely defensive and maintains current functionality while aligning the shared base with existing patterns in the Spark 4 overrides.

Highlights
/gemini review
Code Review
This pull request improves the robustness of the EncoderFactory by replacing unstable constructor indexing with a helper method that selects the constructor with the most parameters. A review comment suggests adding a check for empty constructor arrays to prevent a potential ArrayIndexOutOfBoundsException during static initialization.
```java
Constructor<?>[] ctors = cls.getConstructors();
Constructor<?> widest = ctors[0];
```
The current implementation assumes that getConstructors() will always return at least one element. If it returns an empty array (for example, if all public constructors were removed or made non-public in a future Spark version), ctors[0] will throw an ArrayIndexOutOfBoundsException. Since the goal of this PR is to improve robustness, adding a check with a descriptive error message is recommended to avoid a cryptic crash during static initialization.
Constructor<?>[] ctors = cls.getConstructors();
if (ctors.length == 0) {
throw new IllegalStateException("No public constructors found for " + cls.getName());
}
Constructor<?> widest = ctors[0];

Replace `(Constructor<X>) X.class.getConstructors()[0]` for StaticInvoke, Invoke, and NewInstance with a `primaryConstructor()` helper that picks the constructor with the most parameters. Class.getConstructors() returns constructors in JVM-defined order that is not guaranteed stable, so resolving the widest constructor explicitly makes the lookup robust to future Spark releases that add overloaded constructors. Today this is a no-op: StaticInvoke / Invoke / NewInstance only have one public constructor each in Spark 3.1.x through 3.5.x, so getConstructors()[0] and the widest constructor resolve to the same one. The change is purely defensive. Same fix has already landed in the Spark 4 override in PR apache#38255 (commit 9c071c5, flagged by Gemini Code Assist round 1). Porting it to the shared base keeps both code paths consistent.
If Class.getConstructors() returns empty (e.g. all public constructors removed in a future Spark version), throw an IllegalStateException with the class name instead of letting ctors[0] surface as a cryptic ArrayIndexOutOfBoundsException during static initialization.
Force-pushed from b4f8ffd to 0621ce5
Assigning reviewers: R: @kennknowles added as fallback since no labels match configuration.

Note: If you would like to opt out of this review, comment

Available commands:

The PR bot will only process comments in the main thread (not review comments).
Summary
Replace `(Constructor<X>) X.class.getConstructors()[0]` for `StaticInvoke`, `Invoke`, and `NewInstance` in the shared structured-streaming `EncoderFactory` with a `primaryConstructor()` helper that picks the constructor with the most parameters. `Class.getConstructors()` returns constructors in JVM-defined order that is not guaranteed stable, so resolving the widest constructor explicitly makes the lookup robust to future Spark releases that add overloaded constructors. The downstream `switch (...getParameterCount())` in `invoke(...)` and `newInstance(...)` already dispatches on parameter count, so the widest constructor matches the dispatch table.
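A widest-constructor helper of the kind described here can be sketched in a few lines. The class name below is illustrative and this is an assumed shape, not Beam's actual implementation:

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;
import java.util.Comparator;

public class ConstructorResolution {
  // Hypothetical sketch: pick the public constructor with the most
  // parameters instead of relying on the JVM-defined (unstable) order
  // of getConstructors().
  @SuppressWarnings("unchecked")
  static <T> Constructor<T> primaryConstructor(Class<T> cls) {
    return (Constructor<T>)
        Arrays.stream(cls.getConstructors())
            .max(Comparator.comparingInt(Constructor::getParameterCount))
            .orElseThrow(
                () ->
                    new IllegalStateException(
                        "No public constructors found for " + cls.getName()));
  }

  public static void main(String[] args) {
    // StringBuilder's public constructors all take 0 or 1 arguments,
    // so the widest one has a single parameter.
    Constructor<StringBuilder> ctor = primaryConstructor(StringBuilder.class);
    System.out.println(ctor.getParameterCount());
  }
}
```

The `orElseThrow` branch also covers the empty-array case flagged in the review comment above, so a missing public constructor fails with a descriptive message rather than an `ArrayIndexOutOfBoundsException`.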
Behavior
No-op against currently supported Spark 3 versions: `StaticInvoke`, `Invoke`, and `NewInstance` only have one public constructor each in Spark 3.1.x through 3.5.x, so `getConstructors()[0]` and the widest constructor resolve to the same one. The change is purely defensive.
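For context, the switch-on-arity dispatch pattern the summary mentions can be illustrated with a small self-contained sketch. The names and the single-String-argument assumption are hypothetical, not Spark's or Beam's actual code:

```java
import java.lang.reflect.Constructor;

public class DispatchSketch {
  // Hypothetical stand-in for a NewInstance-style call site: dispatch on
  // the resolved constructor's parameter count, mirroring the
  // switch (...getParameterCount()) pattern described in the PR. Because
  // the dispatch keys on arity, resolving the widest constructor up front
  // keeps the lookup and the dispatch table aligned.
  static Object newInstance(Constructor<?> ctor) throws Exception {
    switch (ctor.getParameterCount()) {
      case 0:
        return ctor.newInstance();
      case 1:
        // Assume a single String argument for this sketch.
        return ctor.newInstance("seed");
      default:
        throw new IllegalStateException(
            "Unexpected arity: " + ctor.getParameterCount());
    }
  }

  public static void main(String[] args) throws Exception {
    Object sb = newInstance(StringBuilder.class.getConstructor(String.class));
    System.out.println(sb);
  }
}
```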
Precedent
The same fix has already landed in the Spark 4 override in #38255 (commit 9c071c5e). This PR ports it to the shared base so both code paths stay consistent.

Note on overlap with #38255
#38255 also touches this file at a different line (a43868508b, correcting the second `invoke(Expression obj, ...)` switch to key on `INVOKE_CONSTRUCTOR.getParameterCount()` instead of `STATIC_INVOKE_CONSTRUCTOR`). The two changes are independent; this PR can land before or after #38255.