feat: consolidate to llguidance from xgrammar #1077
Conversation
… instead of xgrammar
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
ajbozarth left a comment
On a quick read-through this looks good; Claude found a handful of small items, but nothing big:
ll_matcher.consume_token(input_ids.tolist()[-1])  # type: ignore[attr-defined]
err = ll_matcher.get_error()  # type: ignore[attr-defined]
if err:
    logging.warning("Error in LLMatcher: %s", err)
The previous version in huggingface.py used MelleaLogger.get_logger().warning(...). Suggest keeping that here — the rest of the HF backend standardizes on it (~15 callsites), and users who configure the mellea logger (level, handlers) won't see warnings routed through the root logger otherwise.
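A minimal sketch of the suggested change, assuming MelleaLogger and its get_logger() accessor are the same helper the rest of huggingface.py uses (the import path below is hypothetical, and ll_matcher/input_ids are the objects already in scope in the diff):

```python
# Sketch only: route matcher errors through the mellea logger rather than the
# root logger. The import path for MelleaLogger is a guess, not the real one.
from mellea.logging import MelleaLogger  # hypothetical import path

ll_matcher.consume_token(input_ids.tolist()[-1])  # type: ignore[attr-defined]
err = ll_matcher.get_error()  # type: ignore[attr-defined]
if err:
    MelleaLogger.get_logger().warning("Error in LLMatcher: %s", err)
```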
self.new_sampling: bool = False
self.batch_size: int = -1

def __call__(self, batch_input_ids, batch_scores):  # type: ignore[no-untyped-def]
AGENTS.md §5 requires types on core functions. The previous version had (batch_input_ids: torch.Tensor, batch_scores: torch.Tensor) -> torch.Tensor, and bitmasks: list[torch.Tensor] on line 127. You can keep the runtime imports lazy and still restore the annotations by adding import torch to the existing if TYPE_CHECKING: block at the top of the file.
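A sketch of what that could look like, assuming the file has (or can gain) an `if TYPE_CHECKING:` block; only the annotation plumbing is the point here, the exact file layout is an assumption:

```python
# Sketch only: torch stays a lazy runtime dependency but becomes visible to the
# type checker, so the annotations from the previous version can come back.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Third Party
    import torch  # needed only for annotations


class _GuidanceLogitsProcessor:
    def __call__(
        self, batch_input_ids: "torch.Tensor", batch_scores: "torch.Tensor"
    ) -> "torch.Tensor":
        """Apply the grammar's allowed-token bitmask to ``batch_scores`` in place."""
        # Third Party
        import llguidance
        import llguidance.torch

        ...
```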
| """Apply the grammar's allowed-token bitmask to ``batch_scores`` in place.""" | ||
| # Third Party | ||
| import llguidance | ||
| import llguidance.torch |
Nit: these run on every generation step. sys.modules caches them, so the cost is small, but hoisting into __init__ (or a module-level lazy import) would be cleaner, since you can't reach __call__ without going through the constructor.
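Roughly, assuming the constructor takes the compiled grammar and the llguidance tokenizer (the argument names here are guesses):

```python
# Sketch only: do the lazy import once in __init__ and reuse the cached module
# object in __call__, which runs on every generation step.
class _GuidanceLogitsProcessor:
    def __init__(self, grammar: str, ll_tokenizer) -> None:  # arg names are assumptions
        # Third Party
        import llguidance
        import llguidance.torch

        self._llguidance = llguidance
        self.grammar = grammar
        self.ll_tokenizer = ll_tokenizer

    def __call__(self, batch_input_ids, batch_scores):  # type: ignore[no-untyped-def]
        llguidance = self._llguidance  # no per-step import or sys.modules lookup
        ...
```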
| request["extra_body"]["structured_outputs"]["json"] | ||
| # NOTE: Mellea's structured output for hf sets `whitespace_flexible` to False. | ||
| # Not doing so here to match the previous granite formatter behavior. | ||
| # defaults={"whitespace_flexible": False}, |
Either commit to the choice (delete the dead code) or pass defaults=... explicitly so the intent lives in code rather than a comment. As written, this is the kind of note that rots — a future reader will wonder whether it was forgotten.
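If the explicit route wins, something along these lines; build_json_grammar is a hypothetical stand-in for whatever builder the surrounding code actually calls, and the kwarg comes from the commented-out line above:

```python
# Sketch only: let the passed value document the whitespace decision instead of a NOTE.
def build_json_grammar(schema: dict, defaults: dict | None = None) -> str:
    """Hypothetical stand-in for the real grammar builder."""
    ...


grammar = build_json_grammar(
    schema={"type": "object"},
    # Explicitly keep the previous granite formatter behavior (value assumed here).
    defaults={"whitespace_flexible": True},
)
```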
| "Request specifies constrained decoding, but no " | ||
| "tokenizer object was passed to this function." | ||
| "model object was passed to this function." | ||
| ) |
The xgrammar version needed model for model.vocab_size in the max(...) reconciliation. The llguidance path doesn't appear to use model anywhere in this branch, so this check (and the model argument requirement for constrained decoding generally) can probably be dropped.
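For example, the guard could shrink to a tokenizer-only check (a sketch; tokenizer is the object already in scope in the diff):

```python
# Sketch only: keep the constrained-decoding precondition but stop requiring
# `model`, since the llguidance path doesn't appear to use it in this branch.
if tokenizer is None:
    raise ValueError(
        "Request specifies constrained decoding, but no "
        "tokenizer object was passed to this function."
    )
```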
# Gather together all the possibilities and pick the biggest one.
vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size)
if ll_tokenizer is None:
    ll_tokenizer = llguidance.hf.from_tokenizer(tokenizer)  # type: ignore[arg-type]
The previous code did vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size) to defend against HF's vocab-size disagreement across components (resized embeddings, added special tokens, etc.). Worth confirming llguidance.hf.from_tokenizer handles the model.vocab_size > tokenizer.vocab_size case — the old comment ("of course they do") implied this had bitten someone before.
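One cheap way to surface that case, assuming the llguidance tokenizer exposes a vocab_size attribute (that attribute name is an assumption; tokenizer and model are the objects already in scope in the diff):

```python
# Sketch only: warn when the model's embedding size exceeds what the llguidance
# tokenizer (and therefore the allowed-token bitmask) covers.
import logging

import llguidance.hf

ll_tokenizer = llguidance.hf.from_tokenizer(tokenizer)  # type: ignore[arg-type]
model_vocab = getattr(model, "vocab_size", None)
if model_vocab is not None and model_vocab > ll_tokenizer.vocab_size:  # attribute assumed
    logging.warning(
        "Model reports %d tokens but the llguidance tokenizer covers %d; "
        "the token bitmask may be too small.",
        model_vocab,
        ll_tokenizer.vocab_size,
    )
```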
Type of PR
Misc PR

Description
Moves the granite formatters onto llguidance. There wasn't a strong reason to keep xgrammar, and both work. All intrinsic tests continue to pass.
Moved _GuidanceLogitsProcessor to util so that it could be imported by the HuggingFace backend and not live in two spots.

Also fixed one spot in the util function where model was being used even though it could've been None.

Testing
Attribution