Update GradientBoost to xgboost >= 2.0 API (xgb.DMatrix) #48

Draft
instantepiphany wants to merge 1 commit into master from jc-xgboost-3-compat-fix-xgb.DMatrix
@instantepiphany

Contributor

Summary

`GradientBoost()` was written against the pre-2.0 xgboost API (a raw matrix plus a `label =` argument). Since xgboost 2.0, `xgb.cv()` and `xgb.train()` enforce `inherits(data, "xgb.DMatrix")`, and `xgb.cv()` no longer accepts `label`. The legacy `xgboost()` wrapper also renamed `data` to `x` and `label` to `y`. As of the version currently shipping in NixR (xgboost 3.2.0.1), every Gradient Boosting test in the regression suite fails before the first round of training with:

```
Error: inherits(data, "xgb.DMatrix") is not TRUE
```

(The error is emitted by `stopifnot()` at xgboost `R/xgb.cv.R:118`, which quotes the unevaluated expression — i.e. the variable is named `data`, not `dada`.)

Surfaced by the Standard R Tests / Machine Learning - Gradient Boosting.Q regression run #7389: log.htm. The Q file's `tree` RItem and its siblings (`xgb*`, `importance tree`, `gradient.boost`, `imputation`, `tree with grid`) all funnel through `GradientBoost()`, so they fail the same way.

Changes

  • Build `xgb.DMatrix(data = numeric.data$X, label = numeric.data$y)` once and reuse it.
  • `xgb.cv()` calls now pass `data = dtrain` with no `label =`. Inside the grid-search worker, `set.seed(seed)` is called explicitly before each `xgb.cv` (the old `seed =` parameter was dropped from `xgb.cv`).
  • Replace `xgboost(data = …, label = …, params = …, nrounds = …)` with `xgb.train(data = dtrain, params = …, nrounds = …)` so the returned object is still an `xgb.Booster` and the downstream contract on `result$original` (`$params`, `$call`, `$feature_names`, `$raw`, `$handle`) is unchanged.
  • Read `xgbcv$early_stop$best_iteration` instead of `xgbcv$best_iteration` (relocated in xgb.cv's return shape).
  • `@importFrom` / NAMESPACE: drop `xgboost`, add `xgb.DMatrix` and `xgb.train`.
  • DESCRIPTION: bump version to 1.2.5 and require `xgboost (>= 2.0.0)`.
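
As a rough illustration of the `best_iteration` relocation listed above, the lookup could be wrapped in a small helper. This is only a sketch: `best_cv_iteration` is a hypothetical name that does not appear in this PR, and the fallback to the old top-level field is an assumption for illustration, not something the PR implements. Plain lists stand in for `xgb.cv()` results so the snippet runs without xgboost installed.

```r
# Hypothetical helper, not part of this PR: read the best iteration from an
# xgb.cv() result, preferring the relocated field described in this PR.
best_cv_iteration <- function(xgbcv) {
  # Location this PR reads from.
  best <- xgbcv$early_stop$best_iteration
  # Assumption: fall back to the old top-level field when the new one is absent.
  if (is.null(best)) {
    best <- xgbcv$best_iteration
  }
  # NULL is returned when neither field is present.
  best
}

# Plain lists standing in for the two return shapes.
new_shape <- list(early_stop = list(best_iteration = 7L))
old_shape <- list(best_iteration = 4L)

best_cv_iteration(new_shape)
best_cv_iteration(old_shape)
```

Because DESCRIPTION now requires `xgboost (>= 2.0.0)`, the PR reads `xgbcv$early_stop$best_iteration` directly; a fallback like this would only matter if both return shapes had to be supported.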

Reproduction

Reproduce the original failure (raw matrix into modern xgb.cv):

```bash
cd /path/to/r-server
nix run .#R -- --vanilla -e 'set.seed(1)
x <- matrix(rnorm(40), 20, 2)
y <- sample(0:1, 20, replace = TRUE)
xgboost::xgb.cv(data = x, label = y, nrounds = 2, nfold = 3,
                params = list(objective = "binary:logistic"), verbose = 0)'

# -> Error: inherits(data, "xgb.DMatrix") is not TRUE
# -> Warning: Parameter(s) have been removed from this function: label.
```

Confirm the new code path works against the same xgboost:

```bash
nix run .#R -- --vanilla -e 'library(xgboost)
set.seed(1)
x <- matrix(rnorm(60), 30, 2)
y <- sample(0:1, 30, replace = TRUE)
dtrain <- xgb.DMatrix(data = x, label = y)
xgb.cv(data = dtrain, nrounds = 5, nfold = 3,
       params = list(objective = "binary:logistic", num_class = 1, max_depth = 3),
       verbose = 0)
xgb.train(data = dtrain,
          params = list(objective = "binary:logistic", num_class = 1, max_depth = 3),
          nrounds = 3, verbose = 0)'

# -> xgb.cv ok; xgb.train returns class "xgb.Booster"
```

Build this branch as an installable R package via the NixR adhoc-deployment branch (override the flipMultivariates input to point at your local checkout of this branch):

```bash
nix build 'git+ssh://git@github.com/displayr/NixR?ref=untracked-update-nixpkgs#pkgs.x86_64-linux.rPackages.flipMultivariates' \
--override-input flipMultivariates .
```

Test plan

  • `Standard R Tests / Machine Learning - Gradient Boosting.Q` passes against an R server using this branch (re-run of the failing job above).
  • Spot-check both grid-search and non-grid-search code paths (`tree` vs `tree with grid` RItems in the same Q file).
  • `Importance` output still renders (`importance tree` RItem) — exercises `xgb.importance(model = x$original)`.
  • `predict.GradientBoost` smoke test — exercises `x$original$params$objective`, `x$original$feature_names`, and the `xgb.load.raw(x$original$raw)` salvage path in `R/variables.R`.

🤖 Generated with Claude Code

xgboost >= 2.0 enforces inherits(data, "xgb.DMatrix") inside xgb.cv()
and xgb.train(), so passing a raw matrix now fails with
`inherits(data, "xgb.DMatrix") is not TRUE`. The `label` arg has also
been removed from xgb.cv(), and xgboost()'s legacy signature was
renamed (data -> x, label -> y).

Build an xgb.DMatrix once from numeric.data$X/y, pass it to xgb.cv,
switch the fit calls to xgb.train so the existing $original (xgb.Booster)
contract is preserved, and read best_iteration from $early_stop where
the new xgb.cv now exposes it. Bump xgboost requirement to (>= 2.0.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

instantepiphany marked this pull request as draft on June 25, 2026 07:58