Skip to content

feat(langchain): add generation metadata hook#1686

Open
Edison-A-N wants to merge 1 commit into
langfuse:mainfrom
Edison-A-N:feat/langchain-generation-metadata-hook
Open

feat(langchain): add generation metadata hook#1686
Edison-A-N wants to merge 1 commit into
langfuse:mainfrom
Edison-A-N:feat/langchain-generation-metadata-hook

Conversation

@Edison-A-N
Copy link
Copy Markdown

@Edison-A-N Edison-A-N commented Jun 3, 2026

What does this PR do?

Adds a public LangChain CallbackHandler extension point for response-derived generation metadata, addressing https://github.com/orgs/langfuse/discussions/14015.

Users can now either pass generation_metadata_extractor to CallbackHandler or override get_generation_metadata(...) in a subclass. The hook receives the LangChain LLMResult and can return metadata that is merged into the ended Langfuse generation observation, while preserving existing run-start metadata.

Fixes https://github.com/orgs/langfuse/discussions/14015

Type of change

  • Bug fix
  • New feature
  • Breaking change
  • Refactor
  • Documentation update
  • Tooling, CI, or repo maintenance

Verification

List the main commands you ran:

uv run --frozen pytest tests/unit/test_langchain.py::test_chat_model_callback_adds_response_derived_generation_metadata tests/unit/test_langchain.py::test_chat_model_callback_supports_generation_metadata_subclass_hook tests/unit/test_langchain.py::test_chat_model_callback_ends_generation_when_metadata_hook_fails
uv run --frozen pytest tests/unit/test_langchain.py
uv run --frozen ruff check langfuse/langchain/CallbackHandler.py tests/unit/test_langchain.py
uv run --frozen mypy langfuse/langchain/CallbackHandler.py --no-error-summary
git diff --check

Checklist

  • I self-reviewed the diff using code_review.md.
  • I added or updated tests for behavior changes.
  • I updated docs, examples, or .env.template if needed.
  • I did not hand-edit generated files; if generated files changed, I used the upstream regeneration path.
  • I did not commit secrets or credentials.

Greptile Summary

This PR adds a response-derived generation metadata hook to the LangChain CallbackHandler, letting users inject custom metadata (e.g., provider request IDs from response headers) into Langfuse generation observations after the LLM responds. Users can supply a generation_metadata_extractor callable at construction time or override the new get_generation_metadata method in a subclass.

  • Adds LangchainGenerationMetadataExtractor Protocol and get_generation_metadata public override point; the hook result is merged with start-time metadata in on_llm_end.
  • Introduces _generation_metadata_by_run_id to carry the start-time observation metadata forward to the end event so it can be re-applied alongside response-derived fields; cleanup is wired into on_llm_error and _reset.
  • Three new unit tests cover the extractor path, the subclass-override path, and the resilience path (hook raises → generation still ends correctly).

Confidence Score: 4/5

Safe to merge. The new hook is well-isolated behind a try/except so a failing extractor never blocks generation completion, and the internal state dict is cleaned up on all exit paths.

The logic is straightforward and the three new tests cover the critical new paths. Two minor gaps hold the score below 5: the LangchainGenerationMetadataExtractor Protocol is not re-exported from the public package entry point, and on_llm_end now unconditionally passes a metadata argument to update() even when both the extractor and initial metadata are absent — a behavioural change from before the PR that no existing test exercises directly.

langfuse/langchain/init.py — the new Protocol should be exported here alongside CallbackHandler.

Sequence Diagram

sequenceDiagram
    participant LC as LangChain
    participant CB as LangchainCallbackHandler
    participant Meta as _generation_metadata_by_run_id
    participant Obs as LangfuseObservation

    LC->>CB: on_chat_model_start / on_llm_start
    CB->>CB: __on_llm_action → compute observation_metadata
    CB->>Obs: "start_observation(metadata=observation_metadata)"
    CB->>Meta: "store[run_id] = observation_metadata (if not None)"

    LC->>CB: on_llm_end(response)
    CB->>Meta: "pop initial_metadata (default {})"
    CB->>CB: get_generation_metadata(response, run_id, ...)
    note over CB: try/except – hook failure is logged, generation still ends
    CB->>Obs: "update(metadata={**initial_metadata, **generation_metadata})"
    CB->>Obs: end()

    alt LLM error path
        LC->>CB: on_llm_error(error)
        CB->>Meta: pop(run_id, None) – discard stored metadata
        CB->>Obs: update(level, status_message).end()
    end

    alt Chain reset path
        CB->>Meta: pop all run_ids in root_run_state
    end
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
langfuse/langchain/__init__.py:1-5
**`LangchainGenerationMetadataExtractor` not re-exported from the public package**

`LangchainGenerationMetadataExtractor` is the typed interface users need to annotate their extractor functions, but it is only reachable through the internal path `langfuse.langchain.CallbackHandler`. Any user who wants to type-annotate an extractor (e.g. `extractor: LangchainGenerationMetadataExtractor`) must reach into the private module instead of using `from langfuse.langchain import ...`. Adding it to the `__init__.py` import and `__all__` list makes the public API complete.

### Issue 2 of 2
langfuse/langchain/CallbackHandler.py:1378-1382
**`metadata` always passed to `update()` even without an extractor**

Before this PR, `on_llm_end` never passed a `metadata` argument to `generation.update()`. Now it always does, even for the no-extractor / no-initial-metadata path where both `initial_metadata` and `generation_metadata` are empty, resulting in `update(metadata={})`. While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the `metadata` keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.

Reviews (1): Last reviewed commit: "feat(langchain): add generation metadata..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@CLAassistant
Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Comment on lines +1378 to 1382
metadata={
**initial_metadata,
**(generation_metadata or {}),
},
).end()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 metadata always passed to update() even without an extractor

Before this PR, on_llm_end never passed a metadata argument to generation.update(). Now it always does, even for the no-extractor / no-initial-metadata path where both initial_metadata and generation_metadata are empty, resulting in update(metadata={}). While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the metadata keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.

Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/langchain/CallbackHandler.py
Line: 1378-1382

Comment:
**`metadata` always passed to `update()` even without an extractor**

Before this PR, `on_llm_end` never passed a `metadata` argument to `generation.update()`. Now it always does, even for the no-extractor / no-initial-metadata path where both `initial_metadata` and `generation_metadata` are empty, resulting in `update(metadata={})`. While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the `metadata` keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants