
feat: autodetect DATE/DATETIME fields in CSV type inference #154

Merged: vmvarela merged 4 commits into master from issue-142/autodetect-date-datetime-fields, May 22, 2026

@vmvarela
Owner

Summary

Closes #142

Extends type inference to detect DATE and DATETIME columns in CSV input and normalizes values to ISO 8601 on insert, enabling SQLite date functions (date(), strftime(), etc.) to work correctly on imported data.
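A minimal sketch of why the ISO 8601 normalization matters, using Python's standard `sqlite3` module rather than the project's own code: SQLite's date functions accept an ISO string but return NULL for a slash-formatted one. The sample values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# strftime() understands the ISO form but yields NULL for DD/MM/YYYY text.
iso_year, raw_year = conn.execute(
    "SELECT strftime('%Y', '2020-07-19'), strftime('%Y', '19/07/2020')"
).fetchone()

print(iso_year, raw_year)
conn.close()
```
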

What's new

  • 6 new ColumnType sub-variants (DATE, DATE_EU, DATE_US, DATETIME, DATETIME_EU, DATETIME_US) carrying format info to bind time without extra parallel arrays
  • isDate() and isDateTime() detectors for YYYY-MM-DD, DD-MM-YYYY, DD/MM/YYYY, MM/DD/YYYY, YYYY-MM-DD HH:MM:SS, YYYY-MM-DDTHH:MM:SS, DD/MM/YYYY HH:MM, MM/DD/YYYY HH:MM
  • Slash-format disambiguation per column: any row with D1>12 → EU, any row with D2>12 → US; if every row has both≤12 (ambiguous) → TEXT; contradictory rows → TEXT
  • Normalization at bind time: all accepted formats written as YYYY-MM-DD or YYYY-MM-DD HH:MM:SS
  • DDL emits TEXT affinity for all date variants (declaring DATE/DATETIME would give NUMERIC affinity and coerce ISO strings)
  • --no-type-inference backward compatibility maintained — dates stay as raw TEXT
  • 15 integration tests (140–154) covering ISO, EU-dash, EU-slash, US-slash, ambiguous, ORDER BY, --columns, --validate, --no-type-inference
  • 30 unit tests in src/loader.zig for all detection, disambiguation, normalization, and inference paths
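The per-column slash disambiguation rule listed above can be sketched roughly as follows. This is a Python illustration, not the Zig code in src/loader.zig: the enum members, the function names, and the handling of a value where both fields exceed 12 are assumptions made for the example.

```python
from enum import Enum


class SlashOrder(Enum):
    UNKNOWN = "unknown"    # no row has decided yet (both fields <= 12 so far)
    EU = "eu"              # DD/MM/YYYY
    US = "us"              # MM/DD/YYYY
    CONFLICT = "conflict"  # rows disagree, so the column falls back to TEXT


def accum_slash_order(state, value):
    """Fold one A/B/YYYY value into the column's running state."""
    first, second, _year = value.split("/")
    d1, d2 = int(first), int(second)
    if d1 > 12 and d2 > 12:
        # Assumption: neither field can be a month, so treat it as a conflict.
        return SlashOrder.CONFLICT
    if d1 > 12:
        vote = SlashOrder.EU
    elif d2 > 12:
        vote = SlashOrder.US
    else:
        return state  # both <= 12: this row abstains
    if state in (SlashOrder.UNKNOWN, vote):
        return vote
    return SlashOrder.CONFLICT


def column_slash_order(values):
    """UNKNOWN or CONFLICT at the end means the column stays TEXT."""
    state = SlashOrder.UNKNOWN
    for value in values:
        state = accum_slash_order(state, value)
    return state
```

A single decisive row is enough to settle the column, and ambiguous rows never overturn it; only a row that votes the other way forces the TEXT fallback.
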

Definition of Done

  • Code implemented and functional
  • All acceptance criteria from the issue are met
  • Tests written and passing (zig build test -Dbundle-sqlite=true + zig build unit-test -Dbundle-sqlite=true)
  • No lint or compilation errors (ziglint src build.zig clean)
  • Self-reviewed (read your own diff)
  • Documentation updated (if user-facing behavior changed)

vmvarela added 2 commits May 22, 2026 16:38
- Extend ColumnType enum with DATE, DATE_EU, DATE_US, DATETIME,
  DATETIME_EU, DATETIME_US sub-variants; all map to TEXT affinity in DDL
- Add displayName() method: DATE*/DATETIME* display as DATE/DATETIME in
  --columns, --validate, and --sample output instead of internal tag name
- Add isDate() / isDateTime() detectors (length-gated; no overlap)
- Add SlashOrder enum + accumSlashOrder() for DD/MM vs MM/DD disambiguation
  per column: d1>12→EU, d2>12→US, both≤12→abstain, contradictory→TEXT
- Rewrite inferTypes() with 11 tracking arrays; DATETIME>DATE>INTEGER>REAL>TEXT
  priority; mixed ISO+slash format or mixed date+datetime → TEXT fallback
- Add normalizeDateToIso() / normalizeDateTimeToIso() helpers that reformat
  EU/US/dash/T-separator values into YYYY-MM-DD / YYYY-MM-DD HH:MM:SS
- Update insertRowTyped() with 6 new ColumnType cases; stack-buffer bind
  uses sqliteTransient() (SQLITE_TRANSIENT sentinel via @setRuntimeSafety(false))
- Add sqliteTransient() fn in sqlite.zig (replaces unrepresentable const)
- Add loader unit test binary to build.zig (unit-test step)
- Add 15 date/datetime integration tests (140-154) covering ISO, EU dash,
  EU slash, US slash, ambiguous, --columns, --validate, ORDER BY, --no-type-inference
- All 154 integration tests + CSV/XML/loader unit tests pass; ziglint clean
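As a rough illustration of what the normalization helpers in this commit do, here is a Python sketch (not the Zig implementation). It assumes zero-padded, fixed-width fields, that the slash order has already been decided for the column, and that a missing seconds field is padded with `:00`; that last point is a guess based on the stated YYYY-MM-DD HH:MM:SS target. The function and parameter names are hypothetical.

```python
def normalize_date_to_iso(value, slash_order="eu"):
    """Rewrite an accepted date string as YYYY-MM-DD."""
    if "/" in value:
        a, b, year = value.split("/")
        # The column-level decision says which field is the day.
        day, month = (a, b) if slash_order == "eu" else (b, a)
    else:
        parts = value.split("-")
        if len(parts[0]) == 4:
            return value            # already YYYY-MM-DD
        day, month, year = parts    # DD-MM-YYYY
    return f"{year}-{month}-{day}"


def normalize_datetime_to_iso(value, slash_order="eu"):
    """Rewrite an accepted datetime string as YYYY-MM-DD HH:MM:SS."""
    separator = "T" if "T" in value else " "
    date_part, time_part = value.split(separator, 1)
    if len(time_part) == 5:
        time_part += ":00"  # assumption: HH:MM input gets zero seconds
    return f"{normalize_date_to_iso(date_part, slash_order)} {time_part}"
```

Because a dash-separated value reveals its own layout (a four-digit first field means ISO), ISO and EU-dash values can share a column and still be told apart value by value.
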
…e variants

- EU-dash DD-MM-YYYY date → DATE
- Mixed ISO + EU-dash dates → DATE (bind-time distinction)
- Slash datetime with d1>12 → DATETIME_EU
- Slash datetime with d2>12 → DATETIME_US
@github-actions bot added the type:feature (New functionality) label May 22, 2026
vmvarela added 2 commits May 22, 2026 18:42
- README: update type inference description to list DATE/DATETIME
- README: add La Liga season-lengths real-world example (julianday arithmetic
  on auto-detected DATE column, COVID and World Cup anomalies)
- README: update 'Date range filter' recipe and 'How it works' section
- man page: add DATE/DATETIME to DESCRIPTION, --columns, and --sample entries
…nit test

- README: fix julianday('2020-07-19') - julianday('2019-08-16') = 338, not 337
- loader.zig: add unit test for d_has_nonslash && d_has_slash -> TEXT path
  (ISO date + slash date in same column falls back to TEXT)
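The corrected README figure can be checked with a short query. This sketch uses Python's standard `sqlite3` module and the two dates from the commit message; the variable name is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Difference in days between the two ISO dates named in the README fix.
(season_days,) = conn.execute(
    "SELECT julianday('2020-07-19') - julianday('2019-08-16')"
).fetchone()

print(season_days)  # 338.0
conn.close()
```

Counting by hand gives the same result: 137 days remain in 2019 after August 16, and July 19 is day 201 of the leap year 2020, for 338 in total.
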
@vmvarela vmvarela merged commit 099271f into master May 22, 2026
4 checks passed
@vmvarela vmvarela deleted the issue-142/autodetect-date-datetime-fields branch May 22, 2026 17:07

Development

Successfully merging this pull request may close these issues.

Autodetect date and datetime fields in CSV for SQLite operations