Allow writing StringDType variables to netCDF #11218
kkollsga wants to merge 2 commits into pydata:main
    data = np.array(["ab", "cdef"], dtype=np.dtypes.StringDType())
    expected = Dataset({"x": ("t", data)})
    with self.roundtrip(expected) as actual:
        assert_identical(expected, actual)
This test is great! Can you also add a test where there is a null value in the dataset, and another where you explicitly set StringDType(na_object="") or something similar, to prove that the null handling works as expected?
Recognize numpy.dtypes.StringDType (kind "T") as a unicode string type in is_unicode_dtype, and convert StringDType arrays to object arrays before passing them to the netCDF4/h5netcdf backends, which don't support StringDType natively. Null values from StringDType(na_object=None) are replaced with empty strings on write.

Co-authored-by: Claude <noreply@anthropic.com>
- Handle StringDType null values in encode_string_array (scipy/nc3 path)
- Add roundtrip tests for StringDType with na_object=None and na_object=""
- Add unit test for encode_string_array with StringDType nulls

Co-Authored-By: Claude <noreply@anthropic.com>
4f7280c to 79b4ae1
Thanks for the review, @jsignell! I looked into scipy and the situation is a bit different there. The scipy backend goes through … I added these changes:
All roundtrip tests are in |
Summary
- Recognize numpy.dtypes.StringDType (kind "T") as a unicode string type in is_unicode_dtype, so the encoding pipeline and backend dtype selection handle it correctly.
- Convert StringDType arrays to object arrays in the netCDF4 and h5netcdf backend prepare_variable methods, since neither C library supports StringDType natively.
- Null values from StringDType(na_object=None) are replaced with empty strings on write, matching existing behavior for object-dtype string arrays with missing values.
- On the scipy/nc3 path, EncodedStringCoder(allows_unicode=False) encodes strings to bytes via encode_string_array, which handles StringDType.

Test plan
- test_is_unicode_dtype_stringdtype: unit test for is_unicode_dtype with StringDType
- test_roundtrip_stringdtype_data: roundtrip test in DatasetIOBase, runs across all backends (netCDF4, h5netcdf, scipy, zarr)
- StringDType(na_object=None)

🤖 Generated with Claude Code