
Recover PG-vendored C types collapsed to int during header parsing #15

Open
estebanzimanyi wants to merge 1 commit into MobilityDB:master from estebanzimanyi:fix/recover-collapsed-c-types

@estebanzimanyi
Member

Problem

Some MEOS header sets reach libclang with bool / int64 / Timestamp /
TimestampTz / H3Index already collapsed to int (or int *) at the
preprocessor level. The real type name is gone before parsing, so it cannot be
recovered from the AST. The extracted IDL then records int where the source
declares one of those types, and downstream binding generators mis-map them:

  • bool → int (should be a boolean)
  • int64 / H3Index → int (should be 64-bit / long)
  • TimestampTz * out-param → int *: generators size the result buffer at
    4 bytes for an 8-byte native write (a buffer under-allocation, observed as
    IndexOutOfBounds / native-heap corruption in a JMEOS consumer).
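To make the size mismatch concrete, here is a small sketch that uses Python's ctypes as a stand-in for a generated binding. JMEOS itself is Java, so this is only an illustration. It assumes a platform where C int is 4 bytes (true on the mainstream ones), and the timestamp value is hypothetical.

```python
import ctypes

# Hypothetical TimestampTz value. PostgreSQL stores TimestampTz as int64
# microseconds since 2000-01-01, so present-day values exceed 32 bits.
TS_MICROS = 760_000_000_000_000  # roughly 24 years after 2000-01-01

# Slot a generator allocates when the IDL says "int *" (the collapsed type).
collapsed_slot = ctypes.c_int()
# Slot it should allocate for "TimestampTz *": an 8-byte signed integer.
correct_slot = ctypes.c_int64(TS_MICROS)

print("bytes allocated for int *:      ", ctypes.sizeof(collapsed_slot))
print("bytes written via TimestampTz *:", ctypes.sizeof(correct_slot))

# Even ignoring the overrun, a 32-bit view of the value loses its high bits:
# ctypes does no overflow checking and silently truncates.
truncated = ctypes.c_int(TS_MICROS).value
print("value seen through int:", truncated)
```

The native side writes all 8 bytes of correct_slot's width into whatever buffer it is handed, so a 4-byte allocation leaves the upper half landing outside it.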

Fix

A post-parse pass (parser/typerecover.py) recovers these types from the raw
header declaration text, which still spells the real type, and rewrites the
corresponding IDL entry. It is wired into run.py right after parse_all_headers.

It is idempotent and a no-op on correctly parsed headers: it only rewrites a
type that is currently "int" / "int *" and whose header declaration spells
a recoverable type. Functions that genuinely return int (e.g. intspan_width)
are left untouched.
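A minimal sketch of the return-type rule described above. It is not the code in parser/typerecover.py. The IDL entry shape ({"name", "return_type"}), the helper names, and the declaration strings are all hypothetical and used only for illustration. The guard only fires on a parsed "int" / "int *", and only when the header spells a recoverable type at the same pointer depth. That is what keeps genuine int returns untouched and makes a second run a no-op.

```python
import re
from typing import Optional, Tuple

# Hypothetical recoverable set for this sketch (scalar cases from the PR).
RECOVERABLE = {"bool", "int64", "Timestamp", "TimestampTz", "H3Index"}


def normalise(t: str) -> str:
    """Canonical spelling: 'TimestampTz*' or 'TimestampTz  *' -> 'TimestampTz *'."""
    stars = t.count("*")
    base = " ".join(t.replace("*", " ").split())
    return base + (" " + "*" * stars if stars else "")


def base_and_depth(t: str) -> Tuple[str, int]:
    """'const TimestampTz *' -> ('TimestampTz', 1); const is ignored."""
    words = [w for w in t.replace("*", " ").split() if w != "const"]
    return " ".join(words), t.count("*")


def declared_return_type(decl: str, name: str) -> Optional[str]:
    """Return type spelled before 'name(' in a raw 'extern ...;' declaration."""
    m = re.search(r"extern\s+(.*?)\b" + re.escape(name) + r"\s*\(", decl)
    return normalise(m.group(1)) if m else None


def recover_return(entry: dict, decl: str) -> bool:
    """Rewrite entry['return_type'] only if it collapsed to int; True if changed."""
    parsed_base, parsed_depth = base_and_depth(entry["return_type"])
    if parsed_base != "int" or parsed_depth > 1:
        return False  # not "int" / "int *": already correct, leave it alone
    spelled = declared_return_type(decl, entry["name"])
    if spelled is None:
        return False
    spelled_base, spelled_depth = base_and_depth(spelled)
    if spelled_base not in RECOVERABLE or spelled_depth != parsed_depth:
        return False  # genuinely int (e.g. intspan_width) or not recoverable
    entry["return_type"] = spelled
    return True


# Illustrative IDL entries and header text.
idl = [
    {"name": "temporal_start_timestamptz", "return_type": "int"},  # collapsed
    {"name": "intspan_width", "return_type": "int"},               # genuine int
]
headers = {
    "temporal_start_timestamptz":
        "extern TimestampTz temporal_start_timestamptz(const Temporal *temp);",
    "intspan_width": "extern int intspan_width(const Span *s);",
}
first = sum(recover_return(e, headers[e["name"]]) for e in idl)
second = sum(recover_return(e, headers[e["name"]]) for e in idl)
print(first, second, [e["return_type"] for e in idl])  # 1 0 ['TimestampTz', 'int']
```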

Validation

  • Unit test over scalar bool / int64 / TimestampTz * / H3Index returns and
    params, plus a genuine-int control: all are recovered correctly, the control
    is unchanged, and a re-run is a 0/0 no-op (idempotent).
  • Parses the real MEOS headers cleanly (3548 extern decls).

This makes the IDL, and any binding regenerated from it, reproducible without
external post-processing scripts.

The build that avoids host-symbol collisions prefix-renames PG types, and the
parse runs without pg_config.h, so opaque PG-vendored types reach libclang
already macro-collapsed and appear as int / int * / int ** in the parsed IDL.
This post-parse pass recovers each one from the header declaration text,
preserving const qualifiers and pointer levels, and rewrites a type only when
the function's parsed type actually collapsed to int.
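The parameter side can be sketched the same way. This is a hypothetical illustration, not the PR's implementation. It assumes simple prototypes with named parameters (no function-pointer parameters, whose inner commas would break the split) and a hypothetical {"params": [{"name", "type"}]} entry shape. The rewritten type is taken verbatim from the header spelling, so const and every '*' carry over. The pointer-depth check prevents an "int *" from being "recovered" into a type with a different indirection level.

```python
import re
from typing import List, Tuple

# Base types the PR lists as recoverable.
RECOVERABLE = {
    "bool", "int64", "Timestamp", "TimestampTz", "H3Index", "text",
    "GSERIALIZED", "Interval", "DateADT", "Datum", "size_t",
    "GBOX", "BOX3D", "AFFINE",
}


def normalise(t: str) -> str:
    """'const Interval*' -> 'const Interval *'; 'int  * *' -> 'int **'."""
    stars = t.count("*")
    base = " ".join(t.replace("*", " ").split())
    return base + (" " + "*" * stars if stars else "")


def split_type(t: str) -> Tuple[str, int]:
    """Base name without const, plus pointer depth: 'const int **' -> ('int', 2)."""
    words = [w for w in t.replace("*", " ").split() if w != "const"]
    return " ".join(words), t.count("*")


def spelled_param_types(decl: str) -> List[str]:
    """Parameter types as written in a simple prototype with named params."""
    inside = decl[decl.index("(") + 1 : decl.rindex(")")]
    types = []
    for param in inside.split(","):
        param = param.strip()
        if param in ("", "void"):
            continue
        # Everything before the trailing identifier is the type.
        m = re.match(r"^(.*?)(\w+)\s*$", param)
        types.append(normalise(m.group(1)))
    return types


def recover_params(entry: dict, decl: str) -> int:
    """Rewrite collapsed int params, keeping const and pointer depth; return count."""
    changed = 0
    for param, spelled in zip(entry["params"], spelled_param_types(decl)):
        parsed_base, parsed_depth = split_type(param["type"])
        spelled_base, spelled_depth = split_type(spelled)
        if (parsed_base == "int" and spelled_base in RECOVERABLE
                and parsed_depth == spelled_depth):
            param["type"] = spelled  # header spelling carries const and '*'s
            changed += 1
    return changed


# Illustrative prototype and a parsed entry with two collapsed params.
decl = ("extern Temporal *temporal_tprecision(const Temporal *temp, "
        "const Interval *duration, TimestampTz origin);")
entry = {"name": "temporal_tprecision", "params": [
    {"name": "temp", "type": "const Temporal *"},
    {"name": "duration", "type": "const int *"},  # collapsed
    {"name": "origin", "type": "int"},            # collapsed
]}
first = recover_params(entry, decl)
second = recover_params(entry, decl)  # idempotent re-run
print(first, second, [p["type"] for p in entry["params"]])
```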

Recovered base types: bool, int64, Timestamp(Tz), H3Index, text,
GSERIALIZED, Interval, DateADT, Datum, size_t, GBOX, BOX3D, AFFINE.
Audited against a correctly typed reference IDL: no mismatches remain where
int * stands in for a named pointer type, so every binding that generates code
from the catalog gets the real types (e.g. tcbuffer_convex_hull -> GSERIALIZED *,
temporal_tprecision(..., const Interval *, ...)).
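One way such an audit could be expressed, as a hypothetical sketch. The catalog shape ({function: {"return_type", "params": [type, ...]}}) is assumed, and the entry below is illustrative with its parameters elided. It flags every slot where the parsed IDL has an int pointer while the reference names a non-int pointer type. "Zero mismatches" then means the returned list is empty.

```python
from typing import Dict, List, Tuple


def split_type(t: str) -> Tuple[str, int]:
    """Base name without const, plus pointer depth: 'const int *' -> ('int', 1)."""
    words = [w for w in t.replace("*", " ").split() if w != "const"]
    return " ".join(words), t.count("*")


def audit(parsed: Dict[str, dict],
          reference: Dict[str, dict]) -> List[Tuple[str, str, str, str]]:
    """Slots where parsed has 'int *'-style types but reference names a pointer type."""
    mismatches = []
    for fname, ref in reference.items():
        got = parsed.get(fname)
        if got is None:
            continue  # function missing from one side: not a type mismatch
        slots = [("return", got["return_type"], ref["return_type"])]
        slots += [(f"param {i}", g, r)
                  for i, (g, r) in enumerate(zip(got["params"], ref["params"]))]
        for where, g, r in slots:
            g_base, g_depth = split_type(g)
            r_base, r_depth = split_type(r)
            if g_base == "int" and g_depth > 0 and r_base != "int" and r_depth > 0:
                mismatches.append((fname, where, g, r))
    return mismatches


# Illustrative catalogs (parameters elided).
reference = {"tcbuffer_convex_hull": {"return_type": "GSERIALIZED *", "params": []}}
before = {"tcbuffer_convex_hull": {"return_type": "int *", "params": []}}
after = {"tcbuffer_convex_hull": {"return_type": "GSERIALIZED *", "params": []}}
print(audit(before, reference))  # [('tcbuffer_convex_hull', 'return', 'int *', 'GSERIALIZED *')]
print(audit(after, reference))   # []
```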