HIVE-29514: Optimize UDF Unhex and improve its test coverage#6471
HIVE-29514: Optimize UDF Unhex and improve its test coverage#6471tanishq-chugh wants to merge 7 commits into
Conversation
| if (val == -1) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Please add a test for invalid character in an odd-length input.
Example: G.
There was a problem hiding this comment.
Hi @soumyakanti3578 , Thanks for checking this!
Added the case in commit: a530e08
| while (i < len) { | ||
| int high = decodeHexChar(textBytes[i++]); | ||
| int low = decodeHexChar(textBytes[i++]); | ||
|
|
||
| if (high == -1 || low == -1) { | ||
| return null; | ||
| } | ||
|
|
||
| result[resIdx++] = (byte) ((high << 4) | low); | ||
| } |
There was a problem hiding this comment.
How about:
while (i < len) {
int high, low;
if ((high = decodeHexChar(textBytes[i++])) == -1 ||
(low = decodeHexChar(textBytes[i++])) == -1) {
return null;
}
result[resIdx++] = (byte) ((high << 4) | low);
}
so that we don't compute low if high == -1. Consider adding tests for when high == -1 and low == -1.
| Text hexEmpty = new Text(""); | ||
| byte[] expectedEmpty = new byte[0]; | ||
| assertArrayEquals(expectedEmpty, udf.evaluate(hexEmpty)); | ||
| } |
There was a problem hiding this comment.
It would be nice to have tests for
- mixed case:
aABb9 - boundary values
- lower case, as all tests are for upper case right now
Also, maybe not in this file, but are there any tests for hex(unhex(...)) and unhex(hex(..))?
There was a problem hiding this comment.
Have added these test cases in commit: 254027f
Also, maybe not in this file, but are there any tests for hex(unhex(...)) and unhex(hex(..))?
I see the UNHEX(HEX(...)) case being used in ba_table_udfs.q QTest, but i don't think HEX(UNHEX(...)) is a valid scenario, what do you think? please feel free to correct me if i am wrong here.
thomasrebele
left a comment
There was a problem hiding this comment.
I agree with @soumyakanti3578's comments. Otherwise LGTM.
|



What changes were proposed in this pull request?
Improve the UDF Unhex and its test coverage
Why are the changes needed?
Better performance
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual testing + java test