feat(langchain): add generation metadata hook#1686
Conversation
|
|
| metadata={ | ||
| **initial_metadata, | ||
| **(generation_metadata or {}), | ||
| }, | ||
| ).end() |
There was a problem hiding this comment.
metadata always passed to update() even without an extractor
Before this PR, on_llm_end never passed a metadata argument to generation.update(). Now it always does, even for the no-extractor / no-initial-metadata path where both initial_metadata and generation_metadata are empty, resulting in update(metadata={}). While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the metadata keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.
Prompt To Fix With AI
This is a comment left during a code review.
Path: langfuse/langchain/CallbackHandler.py
Line: 1378-1382
Comment:
**`metadata` always passed to `update()` even without an extractor**
Before this PR, `on_llm_end` never passed a `metadata` argument to `generation.update()`. Now it always does, even for the no-extractor / no-initial-metadata path where both `initial_metadata` and `generation_metadata` are empty, resulting in `update(metadata={})`. While the current OTel attribute implementation treats an empty dict as a no-op and existing span attributes are unaffected, this is a silent behavioural change. Consider guarding the `metadata` keyword with a truthiness check so the argument is omitted when there is nothing to set — this makes the intent explicit and avoids surprising interactions if the serialisation layer changes.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
What does this PR do?
Adds a public LangChain
CallbackHandlerextension point for response-derived generation metadata, addressing https://github.com/orgs/langfuse/discussions/14015.Users can now either pass
generation_metadata_extractortoCallbackHandleror overrideget_generation_metadata(...)in a subclass. The hook receives the LangChainLLMResultand can return metadata that is merged into the ended Langfuse generation observation, while preserving existing run-start metadata.Fixes https://github.com/orgs/langfuse/discussions/14015
Type of change
Verification
List the main commands you ran:
Checklist
code_review.md..env.templateif needed.Greptile Summary
This PR adds a response-derived generation metadata hook to the LangChain
CallbackHandler, letting users inject custom metadata (e.g., provider request IDs from response headers) into Langfuse generation observations after the LLM responds. Users can supply ageneration_metadata_extractorcallable at construction time or override the newget_generation_metadatamethod in a subclass.LangchainGenerationMetadataExtractorProtocol andget_generation_metadatapublic override point; the hook result is merged with start-time metadata inon_llm_end._generation_metadata_by_run_idto carry the start-time observation metadata forward to the end event so it can be re-applied alongside response-derived fields; cleanup is wired intoon_llm_errorand_reset.Confidence Score: 4/5
Safe to merge. The new hook is well-isolated behind a try/except so a failing extractor never blocks generation completion, and the internal state dict is cleaned up on all exit paths.
The logic is straightforward and the three new tests cover the critical new paths. Two minor gaps hold the score below 5: the LangchainGenerationMetadataExtractor Protocol is not re-exported from the public package entry point, and on_llm_end now unconditionally passes a metadata argument to update() even when both the extractor and initial metadata are absent — a behavioural change from before the PR that no existing test exercises directly.
langfuse/langchain/init.py — the new Protocol should be exported here alongside CallbackHandler.
Sequence Diagram
sequenceDiagram participant LC as LangChain participant CB as LangchainCallbackHandler participant Meta as _generation_metadata_by_run_id participant Obs as LangfuseObservation LC->>CB: on_chat_model_start / on_llm_start CB->>CB: __on_llm_action → compute observation_metadata CB->>Obs: "start_observation(metadata=observation_metadata)" CB->>Meta: "store[run_id] = observation_metadata (if not None)" LC->>CB: on_llm_end(response) CB->>Meta: "pop initial_metadata (default {})" CB->>CB: get_generation_metadata(response, run_id, ...) note over CB: try/except – hook failure is logged, generation still ends CB->>Obs: "update(metadata={**initial_metadata, **generation_metadata})" CB->>Obs: end() alt LLM error path LC->>CB: on_llm_error(error) CB->>Meta: pop(run_id, None) – discard stored metadata CB->>Obs: update(level, status_message).end() end alt Chain reset path CB->>Meta: pop all run_ids in root_run_state endPrompt To Fix All With AI
Reviews (1): Last reviewed commit: "feat(langchain): add generation metadata..." | Re-trigger Greptile