Add sparse-data unit test for xvector-based linear regression #643
Conversation
⚠️ Review failed: the pull request is closed.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@engine/minskyTensorOps.cc`:
- Around line 1197-1205: The sumy accumulation currently includes y values where
x is non-finite, breaking the paired-data constraint; create a maskedY BinOp
(analogous to fx/maskedX) that returns y only when both x and y are finite (use
the existing mask lambda or a variant), then replace the existing
sumy.setArgument(y,args) call with
sumy.setArgument(make_shared<BinOp>(maskedY,y,spreadX),args) so sumy, sumx,
sumxx, sumxy and count all use the same pairwise masking logic.
Pull request overview
This PR fixes linear regression calculations when the independent variable is derived from a tensor’s x-vector and the dependent data contains sparse (non-finite) entries.
Changes:
- Corrects the axis validity guard to use `y->rank()` instead of the current op's `rank()`.
- Masks `sumx` and `sumxx` computations so they only include finite (x,y) pairs, aligning them with `count`/`sumxy`.
```diff
 auto mask=[](double x, double y){return isfinite(x) && isfinite(y);};
 auto fx=[](double x, double y){return isfinite(x) && isfinite(y)? x:0;};
 auto fxy=[](double x, double y){return isfinite(x) && isfinite(y)? x*y: 0;};
 auto maskedX=make_shared<BinOp>(fx,spreadX,y);
 sumx.setArgument(maskedX,args);
 sumyy.setArgument(make_shared<BinOp>(fxy,y,y),args);
-sumxx.setArgument(make_shared<BinOp>(fxy,spreadX,spreadX),args);
+sumxx.setArgument(make_shared<BinOp>(fxy,maskedX,spreadX),args);
 sumxy.setArgument(make_shared<BinOp>(fxy,y,spreadX),args);
-count.setArgument
-  (make_shared<BinOp>([](double x,double y) {return isfinite(x)*isfinite(y);},y,spreadX),args);
+count.setArgument(make_shared<BinOp>(mask,y,spreadX),args);
```
There was a problem hiding this comment.
The new masking logic (maskedX / fxy / count) changes linearRegression behavior for sparse inputs, but there doesn’t appear to be a unit test covering missing/non-finite y values when x is derived from y’s x-vector (the scenario this fix targets). Please add a gtest case (likely in test/testTensorOps.cc near the existing *LinearRegression tests) that includes NaN/Inf entries in y and verifies the regression output matches the expected line using only finite (x,y) pairs.
Added xvectorValueLinearRegressionSparse test in test/testTensorOps.cc (commit bff0c30). The test creates a y-vector with xvector values {0,1,2,3,4,5} and data {1, NaN, 3, 4, Inf, 6}, then verifies the regression correctly identifies y = x + 1 using only the four finite pairs (0,1),(2,3),(3,4),(5,6).
🧹 Nitpick comments (1)
engine/minskyTensorOps.cc (1)

Line 1203: Consider: `sumyy` doesn't apply the same pairwise masking. Unlike the other aggregates, `sumyy` only checks if `y` is finite, not whether `spreadX` is also finite for each element. This is currently harmless since `sumyy` isn't used in `computeScaleAndOffset()`, but could cause subtle bugs if it's used in future extensions (e.g., R² calculation).

Optional fix for consistency:

```diff
-sumyy.setArgument(make_shared<BinOp>(fxy,y,y),args);
+sumyy.setArgument(make_shared<BinOp>(fxy,maskedY,y),args);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@engine/minskyTensorOps.cc` at line 1203, sumyy currently only masks on y's finiteness; update the argument passed to sumyy.setArgument so it applies the same pairwise masking used elsewhere (i.e., ensure the BinOp combines y and spreadX finiteness like the other aggregates). Locate the sumyy.setArgument call and replace the single-input BinOp(make_shared<BinOp>(fxy,y,y),...) with a BinOp that uses both y and spreadX (matching the pattern used by the other aggregates) so computeScaleAndOffset and any future users get consistent pairwise masking.
…hen x is derived from x-vector Agent-Logs-Url: https://github.com/highperformancecoder/minsky/sessions/af68bc6c-5736-4b28-b042-2bcf698ddc71 Co-authored-by: highperformancecoder <3075825+highperformancecoder@users.noreply.github.com>
…m:highperformancecoder/minsky into bugfix/self-linear-regression-sparse-data
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/testTensorOps.cc`:
- Around line 1865-1866: The test currently uses EXPECT_EQ(result.size(),
toVal.size()) followed by indexing result[_i], which can OOB if sizes differ;
change the non-fatal check to a fatal one by replacing EXPECT_EQ with ASSERT_EQ
for the size comparison (or alternatively iterate up to result.size() instead of
toVal.size()) so that the loop in the for (size_t _i=0; _i<toVal.size(); ++_i)
EXPECT_NEAR(result[_i], toVal[_i], 1e-4) cannot run when sizes mismatch; update
the expectation that compares result.size() and toVal.size() accordingly.
ℹ️ Review info
⚙️ Run configuration: defaults; Review profile: CHILL; Plan: Pro; Run ID: 47ced0ca-9577-42cb-b231-b7947f077d22
📒 Files selected for processing (2): engine/minskyTensorOps.cc, test/testTensorOps.cc
✅ Files skipped from review due to trivial changes (1): engine/minskyTensorOps.cc
Linear regression using a y tensor's x-vector for x-values was silently producing incorrect results when y contained non-finite entries (`NaN`/`Inf`), because `sumx` and `sumxx` were not masked to finite pairs.

Changes
- (`test/testTensorOps.cc`): Adds `xvectorValueLinearRegressionSparse`, a gtest case with `y = {1, NaN, 3, 4, Inf, 6}` over x-vector `{0,1,2,3,4,5}`. Verifies that only the four finite pairs are used and the regression recovers `y = x + 1` across all output slots.