feat(declarative): Add skip_rows_before_header and skip_rows_after_header to CsvDecoder (AI-Triage PR) #929
Draft
devin-ai-integration[bot] wants to merge 3 commits into main
…coder Add skip_rows_before_header and skip_rows_after_header optional integer properties (default: 0) to the declarative CsvDecoder schema and parser. This enables Connector Builder users to parse CSV/TSV responses that contain metadata lines before or after the header row. The implementation follows the same pattern already used by file-based connectors (CsvFormat in csv_format.py) and mirrors the approach used when set_values_to_none was added to CsvDecoder. Co-Authored-By: bot_apk <apk@cognition.ai>
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1772633947-csv-decoder-skip-rows#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1772633947-csv-decoder-skip-rows
PyTest Results (Full): 3 878 tests, 3 866 ✅, 11m 15s ⏱️. Results for commit d2cd6da.
feat(declarative): Add skip_rows_before/after_header to CsvDecoder (AI-Triage PR)
Summary
Adds two new optional integer properties to the declarative `CsvDecoder` component: `skip_rows_before_header` (default: 0) and `skip_rows_after_header` (default: 0). This enables Connector Builder users to parse CSV/TSV API responses that contain metadata lines before or after the header row, a capability that already exists in file-based connectors but was missing from the declarative layer.

Files changed (intentionally):
- `declarative_component_schema.yaml`: new schema properties on `CsvDecoder`
- `composite_raw_decoder.py`: `CsvParser` implementation: `_skip_rows()` helper + updated `parse()` method
- `model_to_component_factory.py`: passes new model attributes to `CsvParser` constructor
- `declarative_component_schema.py`: auto-regenerated via `poe assemble` (most of this diff is reformatting noise from the code generator, not manual edits)
- `test_composite_decoder.py`: new parametrized unit tests covering skip_rows functionality

Resolves: https://github.com/airbytehq/oncall/issues/11524
Context: airbytehq/airbyte#74285 (original community request — Apple API returning TSV with 3 metadata lines before header)
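The two-stage skipping described in this summary can be sketched in isolation. This is a minimal illustration using only the standard library, not the CDK's `CsvParser` code: the function names `skip_rows` and `parse_csv` and the sample payload are hypothetical, and only the parameter names `skip_rows_before_header` and `skip_rows_after_header` come from the PR.

```python
import csv
import io
from typing import Any, Dict, Iterator


def skip_rows(stream: io.TextIOBase, count: int) -> None:
    """Consume `count` raw lines from the stream (hypothetical helper)."""
    for _ in range(count):
        stream.readline()


def parse_csv(
    text: str,
    delimiter: str = ",",
    skip_rows_before_header: int = 0,
    skip_rows_after_header: int = 0,
) -> Iterator[Dict[str, Any]]:
    """Yield records, ignoring rows around the header (illustrative only)."""
    stream = io.StringIO(text)
    # Rows before the header are dropped from the raw stream, so they never
    # reach the CSV reader and cannot be mistaken for the header.
    skip_rows(stream, skip_rows_before_header)
    reader = csv.DictReader(stream, delimiter=delimiter)
    # Rows after the header are dropped from the DictReader iterator, so they
    # are parsed against the header before being discarded.
    for _ in range(skip_rows_after_header):
        next(reader, None)
    yield from reader


# Made-up TSV payload with three metadata lines before the header.
sample = "report: sales\nperiod: 2024\ncurrency: USD\nid\tname\n1\tfoo\n2\tbar\n"
rows = list(parse_csv(sample, delimiter="\t", skip_rows_before_header=3))
```

Here `rows` holds the two data records keyed by `id` and `name`. Skipping with `readline()` assumes each metadata row occupies exactly one physical line; a metadata row containing a quoted embedded newline would need different handling.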
Updates since last revision
- `test_composite_decoder.py`:
  - `skip_rows_before_header` with 1 and 3 metadata lines
  - `skip_rows_after_header` with 2 post-header lines
- `ruff format` issues in generated model and factory files
- `_skip_rows` parameter type from `TextIOWrapper` to `io.TextIOBase` (the parent class) to resolve generic type mismatch

Review & Testing Checklist for Human
- `skip_rows_after_header` behavior with malformed post-header rows. The implementation skips rows from the `DictReader` iterator (not the raw stream), so skipped rows are parsed against the header schema. If post-header metadata rows have a different column count, verify `DictReader` handles this gracefully (it should: extra values go to `restkey`, missing values to `restval`).
- `poe assemble` stripped `# Copyright (c) 2025 Airbyte, Inc., all rights reserved.` from `declarative_component_schema.py`. Confirm this is acceptable or if the generator config needs updating.
- `declarative_component_schema.yaml`, the new properties should auto-appear, but verify they render correctly with appropriate labels/descriptions in the Builder.

Test Plan
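The `DictReader` behavior raised in the review checklist can be checked directly with the standard library. This is a small standalone sketch with made-up sample data, independent of the CDK code:

```python
import csv
import io

# Header with two columns, followed by a row that is too short,
# a row that is too long, and one well-formed data row.
text = "id,name\nTOTAL\nx,y,z\n1,foo\n"
reader = csv.DictReader(io.StringIO(text))

short_row = next(reader)   # missing "name" is filled with restval (default None)
long_row = next(reader)    # extra "z" is collected under restkey (default None)
remaining = list(reader)   # the well-formed row is unaffected
```

Neither malformed row raises an error, so discarding them from the iterator is safe. Note that `csv.DictReader` silently skips completely empty rows, so a blank line after the header would not count toward the number of rows skipped this way.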
Recommended end-to-end verification:
- `CsvDecoder` and set `skip_rows_before_header: 2`
- `skip_rows_after_header: 1` to skip a totals row after the header

Notes
- `set_values_to_none` was added to `CsvDecoder` in airbytehq/airbyte-python-cdk#581
- `airbyte_cdk/sources/file_based/config/csv_format.py` (lines 127-135)

Requested by: bot_apk (apk@cognition.ai)
Devin session