feat: consolidate to llguidance from xgrammar #1077

Open
jakelorocco wants to merge 2 commits into generative-computing:main from jakelorocco:fix/consolidate-llguidance

Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented May 14, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Moves the granite formatters onto llguidance. There wasn't a strong reason to keep xgrammar, and both libraries work; all intrinsic tests continue to pass.

Moved _GuidanceLogitsProcessor into util so it can be imported by the HuggingFace backend instead of living in two places.

Also fixed one spot in the util function where model was used even though it could have been None.
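A minimal sketch of the kind of guard that fix implies (function and attribute names here are hypothetical; the actual util function differs):

```python
def resolve_vocab_size(tokenizer, model=None) -> int:
    """Pick the largest vocab size reported by the available components.

    `model` may be None for callers that only have a tokenizer, so it
    must be guarded before touching `model.vocab_size`.
    """
    sizes = [tokenizer.vocab_size, len(tokenizer)]
    if model is not None:
        sizes.append(model.vocab_size)
    return max(sizes)
```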

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Attribution

  • AI coding assistants used

… instead of xgrammar

Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
Assisted-by: CLAUDE:OPUS
Signed-off-by: Jake LoRocco <jake.lorocco@ibm.com>
@github-actions github-actions bot added the enhancement (New feature or request) label on May 14, 2026
@github-actions
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@jakelorocco jakelorocco marked this pull request as ready for review May 14, 2026 20:45
@jakelorocco jakelorocco requested review from a team as code owners May 14, 2026 20:45
Contributor

@ajbozarth ajbozarth left a comment

On a quick read-through this looks good, Claude found a handful of small items, but nothing big:

ll_matcher.consume_token(input_ids.tolist()[-1])  # type: ignore[attr-defined]
err = ll_matcher.get_error()  # type: ignore[attr-defined]
if err:
    logging.warning("Error in LLMatcher: %s", err)
Contributor

The previous version in huggingface.py used MelleaLogger.get_logger().warning(...). Suggest keeping that here — the rest of the HF backend standardizes on it (~15 callsites), and users who configure the mellea logger (level, handlers) won't see warnings routed through the root logger otherwise.
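The distinction matters because logging.warning(...) routes through the root logger. A minimal illustration of the difference (the "mellea" logger name and wrapper function are assumptions for the sketch; Mellea's actual MelleaLogger wrapper may differ):

```python
import logging

# Warnings emitted on a named logger respect whatever level/handlers the
# user configured for that logger; logging.warning(...) uses the root
# logger instead, bypassing that configuration.
logger = logging.getLogger("mellea")


def warn_matcher_error(err):
    """Log an LLMatcher error, if any, on the package logger."""
    if err:
        logger.warning("Error in LLMatcher: %s", err)
```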

        self.new_sampling: bool = False
        self.batch_size: int = -1

    def __call__(self, batch_input_ids, batch_scores):  # type: ignore[no-untyped-def]
Contributor

AGENTS.md §5 requires types on core functions. The previous version had (batch_input_ids: torch.Tensor, batch_scores: torch.Tensor) -> torch.Tensor, and bitmasks: list[torch.Tensor] on line 127. You can keep the runtime imports lazy and still restore the annotations by adding import torch to the existing if TYPE_CHECKING: block at the top of the file.
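The pattern the comment suggests, sketched with a hypothetical stand-in class: with `from __future__ import annotations`, the hints are never evaluated at runtime, so torch can stay a type-check-only import.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Resolved by mypy/pyright only; torch is never imported at runtime here.
    import torch


class Processor:
    """Hypothetical stand-in for _GuidanceLogitsProcessor."""

    def __call__(self, batch_input_ids: torch.Tensor, batch_scores: torch.Tensor) -> torch.Tensor:
        # Real code would apply the grammar bitmask to batch_scores here.
        return batch_scores
```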

"""Apply the grammar's allowed-token bitmask to ``batch_scores`` in place."""
# Third Party
import llguidance
import llguidance.torch
Contributor

Nit: these run on every generation step. sys.modules caches it so cost is small, but hoisting into __init__ (or module-level lazy) would be cleaner — you can't reach __call__ without going through the constructor.
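The hoist the comment describes, sketched with the stdlib's `json` standing in for `llguidance.torch`:

```python
class LazyImportProcessor:
    """Import a heavy dependency once, at construction, not per call."""

    def __init__(self) -> None:
        # Hoisted lazy import: __call__ can't run without __init__ having
        # executed, so the hot path never pays even the sys.modules lookup.
        import json  # stands in for llguidance.torch

        self._json = json

    def __call__(self, payload: str) -> dict:
        return self._json.loads(payload)
```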

request["extra_body"]["structured_outputs"]["json"]
# NOTE: Mellea's structured output for hf sets `whitespace_flexible` to False.
# Not doing so here to match the previous granite formatter behavior.
# defaults={"whitespace_flexible": False},
Contributor

Either commit to the choice (delete the dead code) or pass defaults=... explicitly so the intent lives in code rather than a comment. As written, this is the kind of note that rots — a future reader will wonder whether it was forgotten.

"Request specifies constrained decoding, but no "
"tokenizer object was passed to this function."
"model object was passed to this function."
)
Contributor

The xgrammar version needed model for model.vocab_size in the max(...) reconciliation. The llguidance path doesn't appear to use model anywhere in this branch, so this check (and the model argument requirement for constrained decoding generally) can probably be dropped.
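If the model requirement is dropped, the check would reduce to something like the following (function name hypothetical; error text taken from the diff above):

```python
def require_tokenizer(tokenizer) -> None:
    """Validate inputs for the llguidance constrained-decoding path.

    Only the tokenizer is needed here; the old xgrammar path also
    required `model` so it could read model.vocab_size.
    """
    if tokenizer is None:
        raise ValueError(
            "Request specifies constrained decoding, but no "
            "tokenizer object was passed to this function."
        )
```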

# Gather together all the possibilities and pick the biggest one.
vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size)
if ll_tokenizer is None:
    ll_tokenizer = llguidance.hf.from_tokenizer(tokenizer)  # type: ignore[arg-type]
Contributor

The previous code did vocab_size = max(tokenizer.vocab_size, len(tokenizer), model.vocab_size) to defend against HF's vocab-size disagreement across components (resized embeddings, added special tokens, etc.). Worth confirming llguidance.hf.from_tokenizer handles the model.vocab_size > tokenizer.vocab_size case — the old comment ("of course they do") implied this had bitten someone before.

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

consolidate between llguidance and xgrammar
fix: unpin xgrammar when bug is fixed

2 participants