Sync dotnet CsvFileReader with Python multiline-field fix by Copilot · Pull Request #2 · sharpninja/graphrag

Copilot · 2026-02-28T17:14:08Z

Description

Syncs the .NET codebase with Python changes merged since the last dotnet commit (Feb 23). The primary functional gap was that CsvFileReader parsed input line by line and so silently corrupted multiline quoted CSV fields. This change mirrors the Python fix in graphrag-input (microsoft#2248).

Related Issues

Syncs with upstream Python commits:

64c5552: fix csv file reader (microsoft/graphrag#2248)
6f26d0e: vector load_documents in batches (microsoft/graphrag#2251); dotnet already aligned

Proposed Changes

dotnet/src/GraphRag.Input/CsvFileReader.cs

Replaced ReadLineAsync loop + single-line ParseCsvLine with a full-content ParseCsvContent character-by-character parser
Now correctly handles: embedded \n/\r\n in quoted fields, "" escape sequences, commas inside quoted values
Added guards: all-empty header row → return empty; all-empty trailing row → skip
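
For readers who want to see the parsing approach in isolation, here is a minimal Python sketch of a full-content, character-by-character CSV splitter. It is not the C# ParseCsvContent from this PR: the function name parse_csv_content and its exact blank-line and trailing-row handling are assumptions made for illustration, and it covers only row splitting, not the header guard or document construction.

```python
def parse_csv_content(content):
    """Split CSV text into rows of fields, honouring quoted fields.

    Illustrative sketch only; not the implementation from this PR.
    """
    rows, fields, field = [], [], []
    in_quotes = False
    i, n = 0, len(content)
    while i < n:
        c = content[i]
        if in_quotes:
            if c == '"':
                if i + 1 < n and content[i + 1] == '"':
                    # "" inside a quoted field is an escaped double quote.
                    field.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                # Commas and newlines inside quotes are ordinary characters.
                field.append(c)
        elif c == '"':
            in_quotes = True
        elif c == ",":
            fields.append("".join(field))
            field = []
        elif c in "\r\n":
            # End of row: consume \r\n as a single line ending.
            if c == "\r" and i + 1 < n and content[i + 1] == "\n":
                i += 1
            fields.append("".join(field))
            field = []
            if fields != [""]:  # skip simple blank lines
                rows.append(fields)
            fields = []
        else:
            field.append(c)
        i += 1
    # Final flush for content that does not end with a newline;
    # an all-empty trailing row is skipped.
    if field or fields:
        fields.append("".join(field))
        if any(f.strip() for f in fields):
            rows.append(fields)
    return rows
```

Because the quote state is tracked across the whole input rather than per line, a newline only ends a row when the parser is outside a quoted field.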

Before, this CSV would silently produce wrong output:

title,text
"Post 1","Line one.
Line two.
Line three."
"Post 2","Single line."

After, docs[0].Text == "Line one.\nLine two.\nLine three." — matching Python csv.DictReader behaviour.
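
The Python reference behaviour can be checked directly with the standard library. The snippet below is a small sketch that feeds the sample CSV above to csv.DictReader and, for contrast, counts what a naive line split would see. The names CSV_TEXT and read_rows are illustrative and do not come from the PR.

```python
import csv
import io

# The sample CSV from the description, with the embedded newlines spelled out.
CSV_TEXT = (
    'title,text\n'
    '"Post 1","Line one.\n'
    'Line two.\n'
    'Line three."\n'
    '"Post 2","Single line."\n'
)


def read_rows(text):
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines inside quoted fields itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))


rows = read_rows(CSV_TEXT)

# A line-by-line reader sees four "records" after the header instead of two.
naive_record_count = len(CSV_TEXT.splitlines()) - 1
```

Here rows contains two records and rows[0]["text"] keeps both embedded newlines, whereas the naive split treats each physical line as a separate record.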

dotnet/tests/GraphRag.Tests.Unit/Input/CsvFileReaderTests.cs (new)

6 tests: basic CSV, multiline quoted field (mirrors Python test_csv_loader_preserves_multiline_fields), comma-inside-quote, escaped double-quote, empty content, no-match

Checklist

I have tested these changes locally.
I have reviewed the code changes.
I have updated the documentation (if necessary).
I have added appropriate unit tests (if applicable).

Additional Notes

The vector store batching changes from microsoft#2251 (LanceDB, Azure AI Search, CosmosDB) required no dotnet changes — the existing implementations already use equivalent batch semantics. LanceDB remains a NotImplementedException stub pending an official .NET SDK.

Copilot

Pull request overview

Syncs the .NET CsvFileReader behavior with the Python multiline-CSV fix by switching from line-by-line parsing to a full-content parser so quoted fields can safely contain embedded newlines.

Changes:

Replaced the line-based CSV parsing loop with a character-by-character ParseCsvContent implementation to support quoted multiline fields and escaped quotes.
Added guards for empty headers / trailing content and skipped simple blank lines.
Added a new unit test suite for CSV parsing scenarios (multiline fields, commas-in-quotes, escaped quotes, empty inputs, no matches).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
dotnet/src/GraphRag.Input/CsvFileReader.cs	Implements full-content CSV parsing to preserve multiline quoted fields and improve handling of quotes/commas.
dotnet/tests/GraphRag.Tests.Unit/Input/CsvFileReaderTests.cs	Adds unit coverage for CSV parsing, including multiline quoted-field preservation.

Comments suppressed due to low confidence (1)

dotnet/src/GraphRag.Input/CsvFileReader.cs:137

ParseCsvContent adds a row on every \r/\n boundary without checking whether the row is entirely empty. This means a trailing row like ",," (or any all-empty row terminated by a newline) will be kept and later turned into an empty TextDocument, even though the method’s final flush explicitly skips all-empty trailing rows. Consider skipping row creation when all collected fields are empty/whitespace (e.g., check fields.TrueForAll(string.IsNullOrWhiteSpace) before rows.Add), and add a unit test covering a trailing all-empty row with multiple columns.

            else if (c == '\r' || c == '\n')
            {
                // End of row — consume \r\n as a single line ending.
                fields.Add(field.ToString());
                field.Clear();
                rows.Add([.. fields]);
                fields.Clear();
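
The guard suggested in the comment above can be sketched as follows, in Python for parity with the upstream implementation. The helpers is_all_empty and add_row are hypothetical names, not part of the PR. The check plays the role of fields.TrueForAll(string.IsNullOrWhiteSpace) before rows.Add.

```python
def is_all_empty(fields):
    # True when every field is empty or whitespace-only
    # (also True for an empty field list).
    return all(not f.strip() for f in fields)


def add_row(rows, fields):
    # Append a copy of the collected fields unless the row is entirely empty.
    if not is_all_empty(fields):
        rows.append(list(fields))
```

Applying the same check at every row boundary, rather than only in the final flush, would keep a newline-terminated ",," row from becoming an empty document.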

Initial plan

5fae410

Copilot AI assigned Copilot and sharpninja Feb 28, 2026

Copilot started work on behalf of sharpninja February 28, 2026 17:14

Sync dotnet CsvFileReader with Python multiline-field fix (microsoft#…

5d354d5

…2248) Co-authored-by: sharpninja <16146732+sharpninja@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Sync dotnet code with python changes since last commit~~ Sync dotnet CsvFileReader with Python multiline-field fix Feb 28, 2026

Copilot finished work on behalf of sharpninja February 28, 2026 17:24

sharpninja marked this pull request as ready for review February 28, 2026 20:46

Copilot AI review requested due to automatic review settings February 28, 2026 20:46

sharpninja merged commit f4e1001 into main Feb 28, 2026
17 of 24 checks passed

Copilot started reviewing on behalf of sharpninja February 28, 2026 20:47

Copilot AI reviewed Feb 28, 2026

sharpninja merged 2 commits into main from copilot/sync-dotnet-with-python-changes

3 participants