Skip to content

fix(har): correct attribute name mismatches and dict init bug#1

Open
andreabedini wants to merge 3 commits intomkb79:mainfrom
andreabedini:fix/har-attribute-names
Open

fix(har): correct attribute name mismatches and dict init bug#1
andreabedini wants to merge 3 commits intomkb79:mainfrom
andreabedini:fix/har-attribute-names

Conversation

@andreabedini
Copy link
Copy Markdown

@andreabedini andreabedini commented Apr 20, 2026

Hi @mkb79, thank you for this project! I had to make a couple of changes to make it work with my session capture.

Summary

hc_har.py references attribute names that don't exist on the parser's dataclasses, causing AttributeError at runtime whenever hc-har is invoked.

Bug 1 — wrong attribute names in har_from_session

The parser dataclasses (ConnectionFrame, RequestMainInfo, RequestHeader, RequestBody, ResponseHeader, ResponseBody) all expose connection_id and request_id, but the HAR builder accessed item.conn_id / item.req_id:

AttributeError: 'ConnectionFrame' object has no attribute 'conn_id'

Fixed by renaming all seven access sites to match the parser's field names.

Bug 2 — list literal instead of dict in _headers_to_dict_multi

# before (crashes on first .setdefault() call)
out: dict[str, list[str]] = []

# after
out: dict[str, list[str]] = {}

Test plan

  • Run hc-har <session_file> — previously crashed immediately with AttributeError, now completes successfully
  • Verify the output .har file is valid JSON and contains the expected number of entries

…ar.py

The HAR exporter referenced `item.conn_id` and `item.req_id` on parser
dataclasses that expose `connection_id` and `request_id` respectively,
causing AttributeError at runtime. Also fixes `_headers_to_dict_multi`
which initialised `out` as a list (`[]`) instead of a dict (`{}`).

After these fixes, `har_from_session()` successfully exports 90 entries
from a real-world session file.
Copilot AI review requested due to automatic review settings April 20, 2026 06:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes runtime crashes in the HAR export path (hc-har) by aligning hc_har.py with the parser dataclass field names and correcting an invalid dict initialization.

Changes:

  • Replace incorrect parser attribute access (conn_id/req_id) with connection_id/request_id throughout har_from_session.
  • Fix _headers_to_dict_multi to initialize out as a dict rather than a list.
  • Minor docstring/formatting adjustments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Owner

@mkb79 mkb79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review notes after checking the local checkout and PR diff:

  1. The core fix looks correct. ConnectionFrame and RequestMainInfo expose connection_id and request_id, so replacing the previous item.conn_id / item.req_id access in hc_har.py fixes a real runtime crash in the HAR export path. The _headers_to_dict_multi change from [] to {} is also correct.

  2. Before merging, I would clean up two small PR-introduced formatting issues: the docstring now contains the literal text UTF\u20118 instead of UTF-8 / UTF‑8, and src/httpcatcher_parser/hc_har.py no longer ends with a trailing newline. These are not runtime blockers, but they are unnecessary noise in the patch.

  3. There is still a naming inconsistency in the surrounding HAR code: hc_parser.py consistently exposes request_id / connection_id, while hc_har.py still uses internal names like req_id / conn_id in ReqAgg, _ensure, comments, and maps. This does not currently crash, but it is the kind of mixed naming that made the original bug easy to introduce. I would consider renaming the HAR aggregator fields to request_id and connection_id in a follow-up or in this PR if you want to keep the naming aligned.

  4. A separate existing HAR issue: when decompress_response=True, response.bodySize is currently based on len(body_out), i.e. the decompressed body. For HAR semantics, it should likely represent the on-wire response body size (len(resp_raw)), while content.size can remain the decoded content size.

  5. Header lookups are only partially case-insensitive. _headers_to_dict_multi preserves the original header casing, but later lookups only check variants such as Content-Type and content-type. Since HTTP header names are case-insensitive, a normalized lookup helper would make this more robust for Content-Type, Content-Encoding, Location, etc.

  6. Minor existing style inconsistency: in hc_parser.py, from __future__ import annotations appears before the module docstring, so httpcatcher_parser.hc_parser.__doc__ is None. The file also says comments/docstrings are English, but MetricsHarvester still has a German docstring.

Verification I ran locally:

  • python3 -m compileall -q src passes.
  • PYTHONPATH=src python3 -m httpcatcher_parser.hc_har --help works.
  • A smoke test with hc_sessions/2025_08_28__13_25_58 wrote a HAR file with 17 entries.
  • pytest -q reports no tests ran, because the repository currently has no test files.

Overall: I would merge the functional fix after cleaning up the docstring escape and missing trailing newline. The other points are follow-up candidates unless you want this PR to also do a small consistency pass around the HAR exporter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants