
feat(declarative): Add skip_rows_before_header and skip_rows_after_header to CsvDecoder (AI-Triage PR)#929

Draft
devin-ai-integration[bot] wants to merge 3 commits into main from
devin/1772633947-csv-decoder-skip-rows


@devin-ai-integration devin-ai-integration bot commented Mar 4, 2026

feat(declarative): Add skip_rows_before/after_header to CsvDecoder (AI-Triage PR)

Summary

Adds two new optional integer properties to the declarative CsvDecoder component: skip_rows_before_header (default: 0) and skip_rows_after_header (default: 0). This enables Connector Builder users to parse CSV/TSV API responses that contain metadata lines before or after the header row — a capability that already exists in file-based connectors but was missing from the declarative layer.

Files changed (intentionally):

  • declarative_component_schema.yaml: new schema properties on CsvDecoder
  • composite_raw_decoder.py: CsvParser implementation, with a _skip_rows() helper and an updated parse() method
  • model_to_component_factory.py: passes new model attributes to CsvParser constructor
  • declarative_component_schema.py: auto-regenerated via poe assemble (most of this diff is reformatting noise from the code generator, not manual edits)
  • test_composite_decoder.py: new parametrized unit tests covering skip_rows functionality

Resolves: https://github.com/airbytehq/oncall/issues/11524

Context: airbytehq/airbyte#74285 (original community request — Apple API returning TSV with 3 metadata lines before header)
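
For reviewers who want a concrete picture of the behavior described above, here is a minimal sketch using only the standard library. It is not the code in this PR: `parse_csv_with_skips` and the sample TSV payload are hypothetical, and the sketch assumes pre-header rows are dropped as raw lines while post-header rows are dropped from the `DictReader` iterator.

```python
import csv
import io
from typing import Any, Dict, Iterator


def parse_csv_with_skips(
    text: str,
    delimiter: str = ",",
    skip_rows_before_header: int = 0,
    skip_rows_after_header: int = 0,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for the behavior described in the summary."""
    stream = io.StringIO(text)
    # Discard raw lines that precede the header row.
    for _ in range(skip_rows_before_header):
        stream.readline()
    # The next line is now treated as the header.
    reader = csv.DictReader(stream, delimiter=delimiter)
    # Discard parsed rows that directly follow the header row.
    for _ in range(skip_rows_after_header):
        next(reader, None)
    yield from reader


# Made-up TSV body with 3 metadata lines before the header.
sample = (
    "Report: Sales\n"
    "Vendor: 12345\n"
    "Generated: 2026-03-04\n"
    "sku\tunits\n"
    "A1\t3\n"
    "B2\t5\n"
)

records = list(
    parse_csv_with_skips(sample, delimiter="\t", skip_rows_before_header=3)
)
print(records)
```

With both values left at 0 the function reduces to a plain `csv.DictReader` pass, which is the backward-compatible default the summary describes.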

Updates since last revision

  • Added 6 unit tests (parametrized) in test_composite_decoder.py:
    • Default behavior (0/0) unchanged
    • skip_rows_before_header with 1 and 3 metadata lines
    • skip_rows_after_header with 2 post-header lines
    • Both combined (2 before + 1 after)
    • TSV-specific test matching the original reporter's Apple API use case
  • Fixed ruff format issues in generated model and factory files
  • Fixed MyPy type error: changed _skip_rows parameter type from TextIOWrapper to io.TextIOBase (the parent class) to resolve generic type mismatch
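
To illustrate the type change in the last bullet: `io.TextIOBase` is the base class of `io.TextIOWrapper` (and of `io.StringIO`), so a helper annotated with the base class accepts either. The sketch below is an assumption about the general shape of such a helper; `skip_lines` is a hypothetical name, and the exact mypy error is not reproduced here.

```python
import io


def skip_lines(stream: io.TextIOBase, count: int) -> None:
    """Hypothetical helper: consume `count` lines from any text stream."""
    for _ in range(count):
        stream.readline()


# A TextIOWrapper over bytes, similar to decoding a response body.
raw = io.BytesIO(b"meta 1\nmeta 2\nid,name\n1,a\n")
wrapped = io.TextIOWrapper(raw, encoding="utf-8")
skip_lines(wrapped, 2)
header_from_wrapper = wrapped.readline()

# The same helper also accepts an in-memory StringIO.
in_memory = io.StringIO("meta\nid,name\n")
skip_lines(in_memory, 1)
header_from_stringio = in_memory.readline()

print(header_from_wrapper.strip(), header_from_stringio.strip())
```

Annotating with the base class keeps the helper usable from tests that pass `io.StringIO` as well as from code that wraps a byte stream.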

Review & Testing Checklist for Human

  • Verify skip_rows_after_header behavior with malformed post-header rows. The implementation skips rows from the DictReader iterator (not the raw stream), so skipped rows are parsed against the header schema. If post-header metadata rows have a different column count, verify DictReader handles this gracefully (it should — extra values go to restkey, missing values to restval).
  • Copyright header was removed from generated model file. The regeneration via poe assemble stripped # Copyright (c) 2025 Airbyte, Inc., all rights reserved. from declarative_component_schema.py. Confirm whether this is acceptable or whether the generator config needs updating.
  • Confirm Connector Builder UI picks up new fields. Since the UI is schema-driven from declarative_component_schema.yaml, the new properties should auto-appear — but verify they render correctly with appropriate labels/descriptions in the Builder.
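
The first checklist item can be explored locally with the standard library alone. The sketch below shows how `csv.DictReader` treats rows whose column count differs from the header, using made-up data. It also shows a detail worth checking during review: `DictReader` silently skips completely blank lines, so a blank line after the header would not count as one of the skipped rows if skipping happens on the `DictReader` iterator.

```python
import csv
import io

# Header declares two columns; the rows that follow are deliberately uneven.
data = (
    "id,name\n"
    "\n"                # blank line: DictReader skips it on its own
    "TOTAL\n"           # too few values
    "1,alice,extra\n"   # too many values
    "2,bob\n"
)

reader = csv.DictReader(io.StringIO(data))

# Simulates skipping one row after the header via the DictReader iterator.
first = next(reader)
rest = list(reader)

print(first)  # the short row, with the missing value filled by restval (None)
print(rest)   # the long row keeps its surplus values under restkey (None)
```

Neither malformed row raises an exception with the default `restkey` and `restval`, which matches the expectation stated in the checklist.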

Test Plan

Recommended end-to-end verification:

  1. In Connector Builder, create a new low-code source
  2. Configure a stream with CsvDecoder and set skip_rows_before_header: 2
  3. Test against a mock API endpoint returning CSV with 2 metadata lines before the header
  4. Verify records are parsed correctly with the header detected on line 3
  5. Repeat with skip_rows_after_header: 1 to skip a totals row after the header
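
Before going through the Builder, steps 3 to 5 can be approximated offline. This is a rough sketch under stated assumptions: `decode_body` is a hypothetical function that mimics the described behavior on a bytes payload, and both response bodies are invented stand-ins for the mock API endpoint.

```python
import csv
import io
from typing import List


def decode_body(body: bytes, skip_before: int = 0, skip_after: int = 0) -> List[dict]:
    """Hypothetical offline stand-in for decoding a CSV response body."""
    text = io.TextIOWrapper(io.BytesIO(body), encoding="utf-8", newline="")
    # Drop raw metadata lines that precede the header.
    for _ in range(skip_before):
        text.readline()
    reader = csv.DictReader(text)
    # Drop parsed rows that directly follow the header.
    for _ in range(skip_after):
        next(reader, None)
    return list(reader)


# Steps 3 and 4: two metadata lines, header on line 3.
body_before = b"Report A\nGenerated 2026-03-04\nid,amount\n1,10\n2,20\n"
records_before = decode_body(body_before, skip_before=2)

# Step 5: a totals row directly after the header.
body_after = b"id,amount\nTOTAL,30\n1,10\n2,20\n"
records_after = decode_body(body_after, skip_after=1)

print(records_before)
print(records_after)
```

Both calls should yield the same two data records, which gives a baseline to compare against what the Builder's test panel returns.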

Notes

  • This follows the same pattern used when set_values_to_none was added to CsvDecoder in airbytehq/airbyte-python-cdk#581
  • File-based connectors already implement this exact feature at airbyte_cdk/sources/file_based/config/csv_format.py (lines 127-135)
  • Non-breaking: new optional fields with default 0; fully backward compatible

Requested by: bot_apk (apk@cognition.ai)
Devin session

…coder

Add skip_rows_before_header and skip_rows_after_header optional integer
properties (default: 0) to the declarative CsvDecoder schema and parser.
This enables Connector Builder users to parse CSV/TSV responses that
contain metadata lines before or after the header row.

The implementation follows the same pattern already used by file-based
connectors (CsvFormat in csv_format.py) and mirrors the approach used
when set_values_to_none was added to CsvDecoder.

Co-Authored-By: bot_apk <apk@cognition.ai>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.



github-actions bot commented Mar 4, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.


Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1772633947-csv-decoder-skip-rows#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1772633947-csv-decoder-skip-rows

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

devin-ai-integration bot and others added 2 commits March 4, 2026 14:24
Co-Authored-By: bot_apk <apk@cognition.ai>

github-actions bot commented Mar 4, 2026

PyTest Results (Fast)

3 875 tests  +6   3 863 ✅ +6   6m 52s ⏱️ -1s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit d2cd6da. ± Comparison against base commit 7f41401.


github-actions bot commented Mar 4, 2026

PyTest Results (Full)

3 878 tests   3 866 ✅  11m 15s ⏱️
    1 suites     12 💤
    1 files        0 ❌

Results for commit d2cd6da.
