Skip to content

fix: make ResponseToFileExtractor respect configured delimiter instead of hardcoding comma#928

Open
Ilja Herdt (Airbyte) (iherdt-airbyte) wants to merge 2 commits intomainfrom
devin/1772576576-fix-response-to-file-extractor-delimiter
Open

fix: make ResponseToFileExtractor respect configured delimiter instead of hardcoding comma#928
Ilja Herdt (Airbyte) (iherdt-airbyte) wants to merge 2 commits intomainfrom
devin/1772576576-fix-response-to-file-extractor-delimiter

Conversation

@iherdt-airbyte
Copy link

@iherdt-airbyte Ilja Herdt (Airbyte) (iherdt-airbyte) commented Mar 3, 2026

Summary

ResponseToFileExtractor hardcoded dialect="unix" in its pd.read_csv() call, which forces a comma delimiter regardless of the user's configuration. When used as the download_extractor in async streams, the configured download_decoder's CsvDecoder delimiter (e.g. \t for TSV files) was completely ignored, causing all columns to be merged into a single field.

This PR adds a delimiter parameter to ResponseToFileExtractor and replaces dialect="unix" with explicit CSV options that respect the configured delimiter. The factory is updated to pass the delimiter from the declarative model to the component.

Changes:

  • response_to_file_extractor.py — Add delimiter field (default ","), decode escaped delimiters (e.g. \\t\t), replace dialect="unix" with explicit equivalent options + configurable delimiter
  • declarative_component_schema.yaml — Add delimiter property to ResponseToFileExtractor
  • model_to_component_factory.py — Pass delimiter from model to component
  • declarative_component_schema.py — Regenerated via poetry run poe assemble
  • test_response_to_file_extractor.py — Add tests for tab, escaped-tab, default comma, and comma-in-values scenarios

Related: #909 (gzip decompression fix for ResponseToFileExtractor)

Review & Testing Checklist for Human

  • Verify dialect="unix" replacement is behaviorally equivalent for existing comma-delimited connectors. The unix dialect sets quotechar='"', doublequote=True, lineterminator='\n', quoting=QUOTE_ALL. The new code sets all of these except quotechar (which defaults to '"' in pandas). Consider whether quotechar='"' should be set explicitly for safety.
  • Verify the escaped delimiter logic in __post_init__: self.delimiter.startswith("\\") triggers unicode_escape decoding. Confirm this handles all YAML-serialized delimiter values correctly and doesn't misfire on unexpected inputs.
  • Test end-to-end with a Connector Builder async stream that uses ResponseToFileExtractor with a tab-delimited download (e.g. the App Store Connect analytics connector from the original bug report) to confirm the delimiter flows through from the manifest YAML all the way to parsed records.
  • Review the large diff in declarative_component_schema.py — most changes are formatting-only from the code generator re-running. Confirm no unexpected semantic changes slipped in.

Notes

Summary by CodeRabbit

Release Notes

  • New Features

    • CSV delimiter configuration: Users can now specify a custom delimiter for parsing CSV data during file extraction. Supports various delimiters including tab-separated values. Defaults to comma (,) when not configured.
  • Tests

    • Enhanced test coverage for delimiter handling with multiple delimiter types.

…d of hardcoding comma

ResponseToFileExtractor was using pd.read_csv(dialect='unix') which hardcodes
a comma delimiter, ignoring the configured download_decoder's CsvDecoder
delimiter setting. This caused tab-separated (TSV) files to be parsed as a
single column.

Changes:
- Add 'delimiter' parameter to ResponseToFileExtractor (default: ',')
- Replace dialect='unix' with explicit CSV options + configurable delimiter
- Add delimiter property to declarative component schema
- Pass delimiter from model to component in factory
- Add tests for tab-delimited CSV parsing

Related: #909 (gzip decompression fix for ResponseToFileExtractor)
Co-Authored-By: Ilja Herdt <ilja.herdt@airbyte.io>
@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

💡 Show Tips and Tricks

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1772576576-fix-response-to-file-extractor-delimiter#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1772576576-fix-response-to-file-extractor-delimiter

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment
📚 Show Repo Guidance

Helpful Resources

📝 Edit this welcome message.

Co-Authored-By: Ilja Herdt <ilja.herdt@airbyte.io>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

This change introduces configurable CSV delimiter support to ResponseToFileExtractor, allowing users to specify custom delimiters (defaulting to comma) for parsing CSV data. The delimiter parameter is added across the schema definition, model, implementation, factory, and test suite with consistent application throughout.

Changes

Cohort / File(s) Summary
Schema and Model Definitions
airbyte_cdk/sources/declarative/declarative_component_schema.yaml, airbyte_cdk/sources/declarative/models/declarative_component_schema.py
Added delimiter field (string, default ",") to ResponseToFileExtractor model definition and schema, with description and title for CSV value separation.
Core Implementation
airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
Added delimiter parameter to class, imported csv module, normalized escape sequences in __post_init__, and updated _read_with_chunks to pass delimiter to pandas.read_csv with explicit CSV options (quoting, doublequote, lineterminator).
Factory Integration
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Updated create_response_to_file_extractor method to construct ResponseToFileExtractor with delimiter parameter from model (or default ",").
Tests
unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py
Converted constructor calls to keyword-style parameters; added four new tests validating tab-delimited parsing, escaped tab handling, default comma behavior, and preservation of commas within field values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 46.15% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately summarizes the main change: adding configurable delimiter support to ResponseToFileExtractor instead of hardcoding comma, matching the core objective of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch devin/1772576576-fix-response-to-file-extractor-delimiter

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2386-2389: Could we switch to an explicit None check for delimiter fallback, wdyt?

On Line 2388, model.delimiter or "," will also coerce an explicitly provided empty string to comma. Would you prefer preserving non-None values so invalid inputs fail fast instead of silently changing behavior?

Suggested tweak
         return ResponseToFileExtractor(
             parameters=model.parameters or {},
-            delimiter=model.delimiter or ",",
+            delimiter=model.delimiter if model.delimiter is not None else ",",
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` around
lines 2386 - 2389, The return that constructs ResponseToFileExtractor currently
uses the truthy fallback "model.delimiter or ','", which will replace an
explicit empty-string delimiter; change it to an explicit None check so only
None falls back to comma (e.g., use model.delimiter if model.delimiter is not
None else ","), leaving other values (including "") unchanged; update the call
site where ResponseToFileExtractor is instantiated (the code that references
model.parameters and model.delimiter) to use this None-check pattern and keep
the parameters fallback as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py`:
- Around line 2386-2389: The return that constructs ResponseToFileExtractor
currently uses the truthy fallback "model.delimiter or ','", which will replace
an explicit empty-string delimiter; change it to an explicit None check so only
None falls back to comma (e.g., use model.delimiter if model.delimiter is not
None else ","), leaving other values (including "") unchanged; update the call
site where ResponseToFileExtractor is instantiated (the code that references
model.parameters and model.delimiter) to use this None-check pattern and keep
the parameters fallback as-is.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f41401 and f846554.

📒 Files selected for processing (5)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml
  • airbyte_cdk/sources/declarative/extractors/response_to_file_extractor.py
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
  • unit_tests/sources/declarative/extractors/test_response_to_file_extractor.py

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

PyTest Results (Fast)

3 873 tests  +4   3 861 ✅ +4   6m 39s ⏱️ -12s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit f846554. ± Comparison against base commit 7f41401.

@github-actions
Copy link

github-actions bot commented Mar 3, 2026

PyTest Results (Full)

3 876 tests   3 864 ✅  11m 18s ⏱️
    1 suites     12 💤
    1 files        0 ❌

Results for commit f846554.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant