feat(langchain): add generation metadata hook by Edison-A-N · Pull Request #1686 · langfuse/langfuse-python

Edison-A-N · 2026-06-03T09:52:07Z

What does this PR do?

Adds a public LangChain CallbackHandler extension point for response-derived generation metadata, addressing https://github.com/orgs/langfuse/discussions/14015.

Users can now either pass generation_metadata_extractor to CallbackHandler or override get_generation_metadata(...) in a subclass. The hook receives the LangChain LLMResult and can return metadata that is merged into the ended Langfuse generation observation, while preserving existing run-start metadata.

Fixes https://github.com/orgs/langfuse/discussions/14015

Type of change

Verification

List the main commands you ran:

uv run --frozen pytest tests/unit/test_langchain.py::test_chat_model_callback_adds_response_derived_generation_metadata tests/unit/test_langchain.py::test_chat_model_callback_supports_generation_metadata_subclass_hook tests/unit/test_langchain.py::test_chat_model_callback_ends_generation_when_metadata_hook_fails
uv run --frozen pytest tests/unit/test_langchain.py
uv run --frozen ruff check langfuse/langchain/CallbackHandler.py tests/unit/test_langchain.py
uv run --frozen mypy langfuse/langchain/CallbackHandler.py --no-error-summary
git diff --check

Checklist

I self-reviewed the diff using code_review.md.
I added or updated tests for behavior changes.
I updated docs, examples, or .env.template if needed.
I did not hand-edit generated files; if generated files changed, I used the upstream regeneration path.
I did not commit secrets or credentials.

Greptile Summary

This PR adds a response-derived generation metadata hook to the LangChain CallbackHandler, letting users inject custom metadata (e.g., provider request IDs from response headers) into Langfuse generation observations after the LLM responds. Users can supply a generation_metadata_extractor callable at construction time or override the new get_generation_metadata method in a subclass.

Adds LangchainGenerationMetadataExtractor Protocol and get_generation_metadata public override point; the hook result is merged with start-time metadata in on_llm_end.
Introduces _generation_metadata_by_run_id to carry the start-time observation metadata forward to the end event so it can be re-applied alongside response-derived fields; cleanup is wired into on_llm_error and _reset.
Three new unit tests cover the extractor path, the subclass-override path, and the resilience path (hook raises → generation still ends correctly).

Confidence Score: 4/5

Safe to merge. The new hook is well-isolated behind a try/except so a failing extractor never blocks generation completion, and the internal state dict is cleaned up on all exit paths.

The logic is straightforward and the three new tests cover the critical new paths. Two minor gaps hold the score below 5: the LangchainGenerationMetadataExtractor Protocol is not re-exported from the public package entry point, and on_llm_end now unconditionally passes a metadata argument to update() even when both the extractor and initial metadata are absent — a behavioural change from before the PR that no existing test exercises directly.

langfuse/langchain/init.py — the new Protocol should be exported here alongside CallbackHandler.

Sequence Diagram

sequenceDiagram
    participant LC as LangChain
    participant CB as LangchainCallbackHandler
    participant Meta as _generation_metadata_by_run_id
    participant Obs as LangfuseObservation

    LC->>CB: on_chat_model_start / on_llm_start
    CB->>CB: __on_llm_action → compute observation_metadata
    CB->>Obs: "start_observation(metadata=observation_metadata)"
    CB->>Meta: "store[run_id] = observation_metadata (if not None)"

    LC->>CB: on_llm_end(response)
    CB->>Meta: "pop initial_metadata (default {})"
    CB->>CB: get_generation_metadata(response, run_id, ...)
    note over CB: try/except – hook failure is logged, generation still ends
    CB->>Obs: "update(metadata={**initial_metadata, **generation_metadata})"
    CB->>Obs: end()

    alt LLM error path
        LC->>CB: on_llm_error(error)
        CB->>Meta: pop(run_id, None) – discard stored metadata
        CB->>Obs: update(level, status_message).end()
    end

    alt Chain reset path
        CB->>Meta: pop all run_ids in root_run_state
    end

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/langchain/__init__.py:1-5
**`LangchainGenerationMetadataExtractor` not re-exported from the public package**

`LangchainGenerationMetadataExtractor` is the typed interface users need to annotate their extractor functions, but it is only reachable through the internal path `langfuse.langchain.CallbackHandler`. Any user who wants to type-annotate an extractor (e.g. `extractor: LangchainGenerationMetadataExtractor`) must reach into the private module instead of using `from langfuse.langchain import ...`. Adding it to the `__init__.py` import and `__all__` list makes the public API complete.

### Issue 2 of 2
langfuse/langchain/CallbackHandler.py:1378-1382
**`metadata` always passed to `update()` even without an extractor**

Before this PR, `on_llm_end` never passed a `metadata` argument to `generation.update()`. Now it always does, even for the no-extractor / no-initial-metadata path where both `initial_metadata` and `generation_metadata` are empty, resulting in `update(metadata={})`. While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the `metadata` keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.

_{Reviews (1): Last reviewed commit: "feat(langchain): add generation metadata..." | Re-trigger Greptile}

Greptile also left 1 inline comment on this PR.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

CLAassistant · 2026-06-03T09:52:26Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

greptile-apps · 2026-06-03T09:57:09Z

+                    metadata={
+                        **initial_metadata,
+                        **(generation_metadata or {}),
+                    },
                ).end()


metadata always passed to update() even without an extractor

Before this PR, on_llm_end never passed a metadata argument to generation.update(). Now it always does, even for the no-extractor / no-initial-metadata path where both initial_metadata and generation_metadata are empty, resulting in update(metadata={}). While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the metadata keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.

Prompt To Fix With AI

This is a comment left during a code review. Path: langfuse/langchain/CallbackHandler.py Line: 1378-1382 Comment: **`metadata` always passed to `update()` even without an extractor** Before this PR, `on_llm_end` never passed a `metadata` argument to `generation.update()`. Now it always does, even for the no-extractor / no-initial-metadata path where both `initial_metadata` and `generation_metadata` are empty, resulting in `update(metadata={})`. While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the `metadata` keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes. How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

feat(langchain): add generation metadata hook

580dd98

claude Bot reviewed Jun 3, 2026

View reviewed changes

greptile-apps Bot reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(langchain): add generation metadata hook#1686

feat(langchain): add generation metadata hook#1686
Edison-A-N wants to merge 1 commit into
langfuse:mainfrom
Edison-A-N:feat/langchain-generation-metadata-hook

Edison-A-N commented Jun 3, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

claude Bot left a comment

Uh oh!

CLAassistant commented Jun 3, 2026

Uh oh!

greptile-apps Bot Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Edison-A-N commented Jun 3, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Type of change

Verification

Checklist

Greptile Summary

Confidence Score: 4/5

Sequence Diagram

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

CLAassistant commented Jun 3, 2026

Uh oh!

greptile-apps Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Edison-A-N commented Jun 3, 2026 •

edited by greptile-apps Bot

Loading