While working with data from the International Brain Laboratory (IBL), I have encountered .meta files that do not conform to the SpikeGLX format. The IBL preprocessing pipeline rewrites .meta files and, in doing so, strips the ~ prefix from list-valued fields (snsChanMap, imroTbl, snsShankMap). It also sometimes generates LF metas by copying AP metas without updating the fileName field, so an LF meta can point at the AP binary. Files rewritten in either way carry original_meta=False, which indicates that the tool explicitly flags its own output as non-canonical. I want to raise this with the team because the failures these files produce today are particularly hard to triage: a stripped ~ surfaces as ValueError: signal_channels do not have unique ids, and the LF-pointing-to-AP case surfaces as the terse KeyError collision that has come up repeatedly in unrelated issues.
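To make the first failure concrete, here is a toy parser written only for illustration. It is not python-neo's implementation. It follows the SpikeGLX convention that a leading ~ on the key marks a list value of the form (header)(entry)(entry).... It shows one plausible way a stripped prefix can turn into duplicate channel ids: the value stays a raw string, and code that expects a list ends up iterating over characters. The snsChanMap value is a shortened example, and parse_meta_lines and channel_names are hypothetical helpers.

```python
# Toy parser for illustration only; NOT python-neo's implementation.
# Follows the SpikeGLX convention that a leading "~" on the key marks
# a list value written as "(header)(entry)(entry)...".


def parse_meta_lines(lines):
    meta = {}
    for line in lines:
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        if key.startswith("~"):
            # Drop the outer parentheses and split on ")(" into entries
            meta[key[1:]] = value.strip()[1:-1].split(")(")
        else:
            meta[key] = value  # scalar: kept as a raw string
    return meta


def channel_names(meta):
    # Skip the header entry; keep the name before ";" in each "name;idx:order" entry
    return [entry.split(";")[0] for entry in meta["snsChanMap"][1:]]


canonical = parse_meta_lines(["~snsChanMap=(384,384,1)(AP0;0:0)(AP1;1:1)"])
stripped = parse_meta_lines(["snsChanMap=(384,384,1)(AP0;0:0)(AP1;1:1)"])

print(channel_names(canonical))  # ['AP0', 'AP1']

# Without "~" the value is a string, so the comprehension walks its characters
names = channel_names(stripped)
print(len(names), len(set(names)))  # more names than unique names
```

Whatever the exact code path in the real reader, the underlying problem is the same: without the prefix, nothing tells the parser that the value is a list.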
My position is that python-neo should not silently accept these files. Doing so invites more third-party tools to produce non-conforming files and makes the reader responsible for every pipeline's idiosyncrasies. Instead, I propose that the reader validate a small number of format assumptions at parse time and raise named errors when they are violated, pointing users at the specific field and the specific mismatch. The validations I have in mind are narrowly scoped: they only replace existing unhelpful errors and do not introduce new failure modes for files that currently load. Examples include flagging list-shaped values that are missing the ~ prefix and reporting which required scalar field is absent when the reader cannot proceed.

Before writing anything, I would like to hear the team's stance on this direction. Specifically: is the appetite to validate and reject non-conforming files, to continue being permissive where possible, or something in between? And are there .meta variants in the wild that I should be aware of before drawing a line on what counts as conforming?
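To make the proposal concrete, here is a rough sketch of what such checks could look like. It is not a proposed implementation. It works on raw key=value lines rather than python-neo's reader internals. Every name in it (MetaFormatError, validate_meta, first_problem, REQUIRED_SCALARS) is hypothetical, and the choice of required fields is only an assumption for illustration. The LF/AP check also assumes that the stream type can be read from an .ap. or .lf. token in both the .meta filename and the fileName value.

```python
from pathlib import PureWindowsPath


class MetaFormatError(ValueError):
    """Hypothetical named error: a .meta file breaks a SpikeGLX format assumption."""


# Fields SpikeGLX writes with a "~" key prefix (the ones the IBL rewrite strips)
LIST_FIELDS = ("snsChanMap", "imroTbl", "snsShankMap")
# Illustrative choice of required scalars; the real list would need discussion
REQUIRED_SCALARS = ("fileName", "nSavedChans")


def stream_kind(filename):
    """Return 'ap' or 'lf' from names like 'run_g0_t0.imec0.lf.meta', else None."""
    parts = filename.lower().split(".")
    for kind in ("ap", "lf"):
        if kind in parts:
            return kind
    return None


def validate_meta(meta_path, lines):
    raw = {}
    for line in lines:
        if "=" in line:
            key, value = line.split("=", 1)
            raw[key] = value

    # 1. A list-shaped value stored under a key without the "~" prefix
    for field in LIST_FIELDS:
        if field in raw and raw[field].lstrip().startswith("("):
            raise MetaFormatError(
                f"{meta_path}: '{field}' holds a list value but is missing "
                f"the '~' prefix (expected '~{field}=...')"
            )

    # 2. A required scalar field is absent
    for field in REQUIRED_SCALARS:
        if field not in raw:
            raise MetaFormatError(f"{meta_path}: required field '{field}' is missing")

    # 3. fileName points at a different stream than the .meta itself.
    #    PureWindowsPath splits on both backslash and "/", since fileName
    #    may be a Windows path written on the acquisition machine.
    bin_name = PureWindowsPath(raw["fileName"].strip()).name
    meta_kind = stream_kind(PureWindowsPath(meta_path).name)
    bin_kind = stream_kind(bin_name)
    if meta_kind and bin_kind and meta_kind != bin_kind:
        raise MetaFormatError(
            f"{meta_path}: this is an '{meta_kind}' meta but fileName "
            f"points at an '{bin_kind}' binary ({bin_name})"
        )


def first_problem(meta_path, lines):
    """Return the validation message, or None if the file passes."""
    try:
        validate_meta(meta_path, lines)
    except MetaFormatError as err:
        return str(err)
    return None


lf_meta = "run_g0_t0.imec0.lf.meta"
print(first_problem(lf_meta, ["fileName=D:\\data\\run_g0_t0.imec0.lf.bin", "nSavedChans=385"]))
print(first_problem(lf_meta, ["fileName=D:\\data\\run_g0_t0.imec0.ap.bin", "nSavedChans=385"]))
```

Each check is meant to fire only on a condition that already breaks loading: a list-shaped value under an unprefixed key, a field the reader cannot proceed without, or an LF meta whose fileName names an AP binary. That restriction is what keeps the scope limited to replacing existing errors.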