[MINOR] Fix CholeskyTest crash when residual is exactly zero #2487

Merged: Baunsgaard merged 1 commit into apache:main from Baunsgaard:fix-cholesky-empty-output on Jun 9, 2026
@Baunsgaard (Contributor)

CholeskyTest reconstructs A from its Cholesky factor and asserts that the 1x1 residual D = sum(A-B) is approximately zero. The output was read back with dmlOut.keySet().iterator().next(), which assumes at least one cell is present. When the residual is exactly 0.0, the sparse text writer omits the cell entirely, so the result map comes back empty and the iterator throws NoSuchElementException. A perfect reconstruction therefore caused the test to error out instead of passing.
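
To illustrate the failure mode, here is a minimal sketch in plain Java. It is not the test's actual code: the class `EmptyResultSketch`, the helper `scalarOrZero`, and the `Map<String, Double>` stand-in for the result map are all hypothetical names chosen for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

public class EmptyResultSketch {
    // Hypothetical helper: return the single cell of a 1x1 result,
    // or 0.0 when a sparse writer omitted the (zero) cell entirely.
    static double scalarOrZero(Map<String, Double> cells) {
        return cells.isEmpty() ? 0.0 : cells.values().iterator().next();
    }

    // Returns true if reading the first key of the map throws,
    // which is what happens for an empty map.
    static boolean firstKeyThrows(Map<String, Double> cells) {
        try {
            cells.keySet().iterator().next();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Double> empty = new HashMap<>();    // residual was exactly 0.0
        Map<String, Double> nonZero = new HashMap<>();  // residual was tiny but non-zero
        nonZero.put("1,1", 1e-16);

        System.out.println("unguarded read throws on empty map: " + firstKeyThrows(empty));
        System.out.println("guarded read, empty map: " + scalarOrZero(empty));
        System.out.println("guarded read, one cell:  " + scalarOrZero(nonZero));
    }
}
```

Checking for emptiness before iterating keeps the assertion path identical for both outcomes, so a tolerance check against zero can run either way.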

This is not data-dependent flakiness: the input matrix is already seeded, so A is identical on every run. The variability comes from the reduction order of sum(A-B), which differs across Spark partitions and CP threads. Because floating-point addition is not associative, the residual lands on either an exact 0.0 (empty output) or a tiny non-zero value depending on execution, which is why only some runs (notably testLargeCholeskyDenseSP) failed. The fix treats an empty output as 0.0, making the assertion robust to both outcomes, and drops the now-unused MatrixValue import.
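
The effect of reduction order can be reproduced in isolation. The sketch below is an assumption-laden illustration, not SystemDS code: the class `ReductionOrderSketch` and the four sample values are made up for this example. It sums the same values in two different orders, once landing on a tiny non-zero residual and once on an exact 0.0.

```java
public class ReductionOrderSketch {
    // Plain left-to-right accumulation: ((v[0] + v[1]) + v[2]) + ...
    static double sumInOrder(double[] v) {
        double s = 0.0;
        for (double x : v) {
            s += x;
        }
        return s;
    }

    public static void main(String[] args) {
        // Same multiset of values, two different accumulation orders,
        // standing in for different partition or thread schedules.
        double[] orderA = {0.1, 0.2, 0.3, -0.6};
        double[] orderB = {0.2, 0.3, 0.1, -0.6};

        double a = sumInOrder(orderA); // rounding error survives: tiny non-zero value
        double b = sumInOrder(orderB); // rounding happens to cancel: exactly 0.0

        System.out.println("order A residual: " + a);
        System.out.println("order B residual: " + b);
        System.out.println("order A is exactly zero: " + (a == 0.0));
        System.out.println("order B is exactly zero: " + (b == 0.0));
    }
}
```

Both results are within any reasonable tolerance of zero, but only one of them would be written as a cell by a writer that skips zeros.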

Error: https://github.com/apache/systemds/actions/runs/27172521960/job/80214522399

@github-project-automation (bot) moved this to In Progress in SystemDS PR Queue on Jun 9, 2026
@Baunsgaard merged commit d90ecf6 into apache:main on Jun 9, 2026
43 of 44 checks passed
@github-project-automation (bot) moved this from In Progress to Done in SystemDS PR Queue on Jun 9, 2026