feat: consolidate to llguidance from xgrammar #1077

Open
jakelorocco wants to merge 2 commits into generative-computing:main from jakelorocco:fix/consolidate-llguidance

Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented May 14, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Moves the granite formatters onto llguidance. There wasn't a strong reason to keep xgrammar, and both libraries work; all intrinsic tests continue to pass.

Moved _GuidanceLogitsProcessor into util so it can be imported by the HuggingFace backend instead of living in two places.

Also fixed one spot in the util function where model was used even though it could have been None.
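A minimal sketch of the kind of guard that fix implies (function and attribute names here are hypothetical; the actual util function differs):

```python
def resolve_vocab_size(tokenizer, model=None) -> int:
    """Pick the largest vocab size reported by the available components.

    `model` may be None for callers that only have a tokenizer, so it
    must be guarded before touching `model.vocab_size`.
    """
    sizes = [tokenizer.vocab_size, len(tokenizer)]
    if model is not None:
        sizes.append(model.vocab_size)
    return max(sizes)
```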

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Attribution

  • AI coding assistants used

… instead of xgrammar

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@github-actions github-actions bot added the enhancement (New feature or request) label on May 14, 2026
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco jakelorocco marked this pull request as ready for review May 14, 2026 20:45
@jakelorocco jakelorocco requested review from a team as code owners May 14, 2026 20:45
Contributor

@ajbozarth ajbozarth left a comment

On a quick read-through this looks good, Claude found a handful of small items, but nothing big:

ll_matcher.consume_token(input_ids.tolist()[-1])  # type: ignore[attr-defined]
err = ll_matcher.get_error()  # type: ignore[attr-defined]
if err:
    logging.warning("Error in LLMatcher: %s", err)
Contributor

The previous version in huggingface.py used MelleaLogger.get_logger().warning(...). Suggest keeping that here — the rest of the HF backend standardizes on it (~15 callsites), and users who configure the mellea logger (level, handlers) won't see warnings routed through the root logger otherwise.
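The distinction matters because logging.warning(...) routes through the root logger. A minimal illustration of the difference (the "mellea" logger name and wrapper function are assumptions for the sketch; Mellea's actual MelleaLogger wrapper may differ):

```python
import logging

# Warnings emitted on a named logger respect whatever level/handlers the
# user configured for that logger; logging.warning(...) uses the root
# logger instead, bypassing that configuration.
logger = logging.getLogger("mellea")


def warn_matcher_error(err):
    """Log an LLMatcher error, if any, on the package logger."""
    if err:
        logger.warning("Error in LLMatcher: %s", err)
```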

        self.new_sampling: bool = False
        self.batch_size: int = -1

    def __call__(self, batch_input_ids, batch_scores):  # type: ignore[no-untyped-def]
Contributor

AGENTS.md §5 requires types on core functions. The previous version had (batch_input_ids: torch.Tensor, batch_scores: torch.Tensor) -> torch.Tensor, and bitmasks: list[torch.Tensor] on line 127. You can keep the runtime imports lazy and still restore the annotations by adding import torch to the existing if TYPE_CHECKING: block at the top of the file.
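The pattern the comment suggests, sketched with a hypothetical stand-in class: with `from __future__ import annotations`, the hints are never evaluated at runtime, so torch can stay a type-check-only import.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Resolved by mypy/pyright only; torch is never imported at runtime here.
    import torch


class Processor:
    """Hypothetical stand-in for _GuidanceLogitsProcessor."""

    def __call__(self, batch_input_ids: torch.Tensor, batch_scores: torch.Tensor) -> torch.Tensor:
        # Real code would apply the grammar bitmask to batch_scores here.
        return batch_scores
```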

"""Apply the grammar's allowed-token bitmask to ``batch_scores`` in place."""
# Third Party
import llguidance
import llguidance.torch
Contributor

Nit: these run on every generation step. sys.modules caches it so cost is small, but hoisting into __init__ (or module-level lazy) would be cleaner — you can't reach __call__ without going through the constructor.
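The hoist the comment describes, sketched with the stdlib's `json` standing in for `llguidance.torch`:

```python
class LazyImportProcessor:
    """Import a heavy dependency once, at construction, not per call."""

    def __init__(self) -> None:
        # Hoisted lazy import: __call__ can't run without __init__ having
        # executed, so the hot path never pays even the sys.modules lookup.
        import json  # stands in for llguidance.torch

        self._json = json

    def __call__(self, payload: str) -> dict:
        return self._json.loads(payload)
```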

request["extra_body"]["structured_outputs"]["json"]
# NOTE: Mellea's structured output for hf sets `whitespace_flexible` to False.
# Not doing so here to match the previous granite formatter behavior.
# defaults={"whitespace_flexible": False},
Contributor

Either commit to the choice (delete the dead code) or pass defaults=... explicitly so the intent lives in code rather than a comment. As written, this is the kind of note that rots — a future reader will wonder whether it was forgotten.

"Request specifies constrained decoding, but no "
"tokenizer object was passed to this function."
"model object was passed to this function."
)
Contributor

The xgrammar version needed model for model.vocab_size in the max(...) reconciliation. The llguidance path doesn't appear to use model anywhere in this branch, so this check (and the model argument requirement for constrained decoding generally) can probably be dropped.
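If the model requirement is dropped, the check would reduce to something like the following (function name hypothetical; error text taken from the diff above):

```python
def require_tokenizer(tokenizer) -> None:
    """Validate inputs for the llguidance constrained-decoding path.

    Only the tokenizer is needed here; the old xgrammar path also
    required `model` so it could read model.vocab_size.
    """
    if tokenizer is None:
        raise ValueError(
            "Request specifies constrained decoding, but no "
            "tokenizer object was passed to this function."
        )
```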

# Gather together all the possibilities and pick the biggest one.
vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size)
if ll_tokenizer is None:
    ll_tokenizer = llguidance.hf.from_tokenizer(tokenizer)  # type: ignore[arg-type]
Contributor

The previous code did vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size) to defend against HF's vocab-size disagreement across components (resized embeddings, added special tokens, etc.). Worth confirming llguidance.hf.from_tokenizer handles the model.vocab_size > tokenizer.vocab_size case — the old comment ("of course they do") implied this had bitten someone before.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

consolidate between llguidance and xgrammar
fix: unpin xgrammar when bug is fixed

2 participants