fix(langchain): extract model name for ChatHuggingFace #1728

Merged
hassiebp merged 2 commits into langfuse:main from vismaytiwari:fix-langchain-chathuggingface-model
Jun 29, 2026
@vismaytiwari vismaytiwari commented Jun 28, 2026

Contributor

What does this PR do?

Fixes langfuse/langfuse#14103

When a generation runs through ChatHuggingFace, Langfuse recorded it without a model name and logged "Langfuse was not able to parse the LLM model". ChatHuggingFace is not LangChain-serializable, so it reaches the callback as a not_implemented stub with no kwargs; the model id is only present in the repr string as model_id='...'. _extract_model_name had no entry for it, so extraction fell through and returned None.

This adds a repr-pattern entry that reads model_id, matching how the other repr-only models (HuggingFaceHub, Ollama, etc.) are already handled. ChatHuggingFace exposes model_id regardless of the underlying backend (HuggingFaceEndpoint, HuggingFaceHub, or HuggingFacePipeline), so this covers those cases.
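
The repr-pattern lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual langfuse implementation: `MODELS_BY_PATTERN`, `extract_from_repr`, the stub's `id` path, the repr text, and the example model id are all assumptions made for the example. Only the `("ChatHuggingFace", "model_id", None)` tuple and the `model_id='...'` repr form are taken from this PR.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical, trimmed-down lookup table: (class name, repr field, default).
# The real table contains many more entries; only this one comes from the PR.
MODELS_BY_PATTERN = [
    ("ChatHuggingFace", "model_id", None),
]


def extract_from_repr(serialized: Dict[str, Any]) -> Optional[str]:
    """Read the model name out of a not_implemented stub's repr string."""
    id_path = serialized.get("id") or [None]
    class_name = id_path[-1]
    text = serialized.get("repr", "")
    for name, field, default in MODELS_BY_PATTERN:
        if name != class_name:
            continue
        # re.search returns the first occurrence of field='...' in the repr.
        match = re.search(rf"{field}='(.*?)'", text)
        return match.group(1) if match else default
    return None


# Assumed shape of a non-serializable model as it reaches the callback:
# no kwargs, only an id path and a repr string.
stub = {
    "lc": 1,
    "type": "not_implemented",
    "id": ["langchain_huggingface", "chat_models", "huggingface", "ChatHuggingFace"],
    "repr": (
        "ChatHuggingFace(llm=HuggingFaceEndpoint(repo_id='example-org/example-model'), "
        "model_id='example-org/example-model')"
    ),
}

model_name = extract_from_repr(stub)
```

A class name that is missing from the table yields `None`, which corresponds to the fall-through behaviour described above for `ChatHuggingFace` before this change.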

Type of change

  • Bug fix

Verification

I confirmed the model id is only available in the repr by dumping a real ChatHuggingFace via langchain_core.load.dumpd, then wrote a deterministic unit test using that serialized shape so it needs no live HuggingFace call. The test returns None before the change and the correct model id after.

uv run --frozen pytest tests/unit/test_langchain_utils.py
uv run --frozen ruff check langfuse/langchain/utils.py tests/unit/test_langchain_utils.py
uv run --frozen ruff format --check langfuse/langchain/utils.py tests/unit/test_langchain_utils.py
uv run --frozen mypy langfuse/langchain/utils.py --no-error-summary

Checklist

  • I self-reviewed the diff using code_review.md.
  • I added or updated tests for behavior changes.
  • I updated docs, examples, or .env.template if needed.
  • I did not hand-edit generated files; if generated files changed, I used the upstream regeneration path.
  • I did not commit secrets or credentials.

Greptile Summary

Adds ChatHuggingFace to the repr-based model-name extraction table so that Langfuse records a model name when LangChain callbacks arrive from ChatHuggingFace, which serialises as a not_implemented stub with the model id only available in the repr string.

  • langfuse/langchain/utils.py: Inserts ("ChatHuggingFace", "model_id", None) into models_by_pattern, following the same approach already used for HuggingFaceHub, Ollama, DeepInfra, and others that expose their model identifier only via repr.
  • tests/unit/test_langchain_utils.py: Adds a new unit test file with a deterministic serialized stub matching the real dumpd shape of ChatHuggingFace, verifying the regex extracts the correct model_id.

Confidence Score: 5/5

Safe to merge — a one-line addition to a lookup table with a corresponding unit test.

The change is minimal and self-contained: one new tuple in models_by_pattern using the same repr-regex mechanism already proven by multiple other entries. The unit test confirms the extraction works correctly for the primary backend shape. The only gap is that the HuggingFacePipeline-backend repr variant is not explicitly tested. In that variant the repr may contain model_id twice, once for the inner HuggingFacePipeline and once for ChatHuggingFace itself, but since both occurrences carry the same value in practice this is a test-coverage gap rather than a functional defect.

No files require special attention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_extract_model_name called] --> B{Match by ID path\nmodels_by_id list}
    B -- match --> Z[Return model name]
    B -- no match --> C{AzureOpenAI\nspecial case?}
    C -- yes --> D[Extract from\ninvocation_params / kwargs]
    D --> Z
    C -- no --> E{Match by repr\nmodels_by_pattern list}
    E -- includes new ChatHuggingFace entry\nre.search model_id='.+' in repr --> F{repr contains\nmodel_id='...'?}
    F -- yes --> Z
    F -- no --> G[Return None default]
    E -- no match --> H{Catch-all path\nkwargs / serialized}
    H -- found --> Z
    H -- not found --> I[Return None]
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
tests/unit/test_langchain_utils.py:29-32
**Test only covers HuggingFaceEndpoint backend shape**

The PR description correctly notes that `ChatHuggingFace` works with three backends — `HuggingFaceEndpoint`, `HuggingFaceHub`, and `HuggingFacePipeline`. `HuggingFacePipeline` itself has a `model_id` attribute in its own repr, so a wrapped repr could look like `ChatHuggingFace(llm=HuggingFacePipeline(model_id='...', ...), model_id='...')`. The regex uses `re.search` (first match), meaning it would pick up the inner `HuggingFacePipeline`'s `model_id` rather than `ChatHuggingFace`'s. In practice both should be the same value, but adding a parameterised test for each backend shape would guard against any future repr change where the two values might differ.

Reviews (1): Last reviewed commit: "fix(langchain): extract model name for C..."

Greptile also left 1 inline comment on this PR.
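
The first-match behaviour raised in the review issue above can be checked in isolation. The sketch below is illustrative only: the nested repr string, the helper names, and the deliberately different `inner/model` and `outer/model` values are assumptions chosen to make the effect visible, and the regex is assumed to have the `model_id='...'` form described in this PR.

```python
import re
from typing import List, Optional

# Assumed pattern: capture whatever sits between the quotes after model_id=.
MODEL_ID = re.compile(r"model_id='(.*?)'")


def first_model_id(text: str) -> Optional[str]:
    """Return the first model_id found in the repr, as re.search would."""
    match = MODEL_ID.search(text)
    return match.group(1) if match else None


def all_model_ids(text: str) -> List[str]:
    """Return every model_id occurrence, in order of appearance."""
    return MODEL_ID.findall(text)


# Hypothetical repr of ChatHuggingFace wrapping a HuggingFacePipeline.
# The two ids differ here only to show which one is picked up.
NESTED_REPR = (
    "ChatHuggingFace(llm=HuggingFacePipeline(model_id='inner/model'), "
    "model_id='outer/model')"
)

picked = first_model_id(NESTED_REPR)
found = all_model_ids(NESTED_REPR)
```

With this input, `picked` is the inner pipeline's id and `found` lists both. A parameterised test over one repr per backend, as the review suggests, would assert on `picked` for each shape.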

ChatHuggingFace is not LangChain-serializable, so it is passed to the
callback as a not_implemented stub with no kwargs; the model id is only
present in the repr string as model_id='...'. _extract_model_name had no
entry for it, so generations from a ChatHuggingFace model were recorded
without a model name. Add a repr-pattern entry that reads model_id.

@claude claude Bot left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Comment thread tests/unit/test_langchain_utils.py
@hassiebp hassiebp merged commit b45f987 into langfuse:main Jun 29, 2026
1 check passed
@hassiebp

Collaborator

Thanks for your contribution, @vismaytiwari!

Development

Successfully merging this pull request may close these issues.

bug: <short description> Langfuse was not able to parse the LLM model.
