Skip to content

Comments

Fix: map_from_arrays() with NULL inputs causes native crash#3356

Open
kazantsev-maksim wants to merge 33 commits intoapache:mainfrom
kazantsev-maksim:map_from_array_fix
Open

Fix: map_from_arrays() with NULL inputs causes native crash#3356
kazantsev-maksim wants to merge 33 commits intoapache:mainfrom
kazantsev-maksim:map_from_array_fix

Conversation

@kazantsev-maksim
Copy link
Contributor

@kazantsev-maksim kazantsev-maksim commented Feb 1, 2026

Which issue does this PR close?

Closes #3327

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Tested with existing unit tests

-- Comet bug: map_from_arrays(NULL, NULL) causes native crash "map key cannot be null"
-- https://github.com/apache/datafusion-comet/issues/3327
query ignore(https://github.com/apache/datafusion-comet/issues/3327)
query spark_answer_only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fully native?

Suggested change
query spark_answer_only
query

-- https://github.com/apache/datafusion-comet/issues/3327
query ignore(https://github.com/apache/datafusion-comet/issues/3327)
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add tests where v is NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be the same behavior: in the original data, value == null when key == null.

statement
INSERT INTO test_map_from_arrays VALUES (array('a', 'b', 'c'), array(1, 2, 3)), (array(), array()), (NULL, NULL)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some additional tests locally and found that the null check needs to cover both inputs, not just the keys. The Spark docs say map_from_arrays "returns null if either the keys array or values array is null", but the current keyIsNotNullExpr only checks expr.left.

I added the row (array('x'), NULL) to the test data and a query SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL AND v IS NULL, which crashes with CometNativeException: Compute error: concat requires input of at least one array.

Here's a suggested updated test file that covers the missing cases. Note that I did use AI to generate these tests, so it is possible that there are mistakes.

-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
CREATE TABLE test_map_from_arrays(k array<string>, v array<int>) USING parquet

statement
INSERT INTO test_map_from_arrays VALUES
  (array('a', 'b', 'c'), array(1, 2, 3)),
  (array(), array()),
  (NULL, NULL),
  (array('x'), NULL),
  (NULL, array(99))

-- basic functionality
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL AND v IS NOT NULL

-- both inputs NULL should return NULL
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v IS NULL

-- keys not null but values null should return NULL (Spark behavior)
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NOT NULL AND v IS NULL

-- keys null but values not null should return NULL (Spark behavior)
query
SELECT map_from_arrays(k, v) FROM test_map_from_arrays WHERE k IS NULL AND v IS NOT NULL

-- all rows including nulls
query spark_answer_only
SELECT map_from_arrays(k, v) FROM test_map_from_arrays

-- literal arguments
query spark_answer_only
SELECT map_from_arrays(array('a', 'b'), array(1, 2))

-- literal null arguments
query
SELECT map_from_arrays(NULL, array(1, 2))

query
SELECT map_from_arrays(array('a'), NULL)

query
SELECT map_from_arrays(NULL, NULL)

To fix the serde, the condition should check both inputs for null. Something like IsNotNull(expr.left) AND IsNotNull(expr.right) instead of just IsNotNull(expr.left).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tanks @andygrove, fixed this.

@codecov-commenter
Copy link

codecov-commenter commented Feb 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.03%. Comparing base (f09f8af) to head (d82900a).
⚠️ Report is 927 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3356      +/-   ##
============================================
+ Coverage     56.12%   60.03%   +3.91%     
- Complexity      976     1477     +501     
============================================
  Files           119      175      +56     
  Lines         11743    16178    +4435     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9713    +3122     
- Misses         4012     5114    +1102     
- Partials       1140     1351     +211     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

map_from_arrays() with NULL inputs causes native crash

3 participants