General API for handling sample gaps on rawio #1773

@h-mayorquin

Description

@zm711, @samuelgarcia, and I discussed two open Blackrock issues today (#1770 and #1755).

The main problem is that the gaps the code detects automatically are much shorter than anything that would reasonably justify a new segment in Neo (for example, the experimenter deliberately pausing the recording). These gaps are more likely artifacts of the acquisition system (in #1770, it looks like a case of dropped samples).
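As a rough illustration of why dropped samples look so different from a deliberate pause, here is a minimal sketch of timestamp-based gap detection. It assumes the acquisition system provides per-sample integer counters that advance by exactly 1 per sample; `find_timestamp_gaps` is a hypothetical helper, not part of Neo.

```python
import numpy as np


def find_timestamp_gaps(timestamps, sampling_rate_hz):
    """Locate gaps in a per-sample timestamp array (hypothetical helper).

    Assumes ``timestamps`` are integer sample counters from the acquisition
    system, so consecutive samples should differ by exactly 1.
    Returns (positions, missing_samples, durations_ms), where ``positions``
    are the indices of the first sample after each gap.
    """
    timestamps = np.asarray(timestamps, dtype=np.int64)
    steps = np.diff(timestamps)
    gap_mask = steps > 1                       # counter jumped forward by more than one sample
    positions = np.flatnonzero(gap_mask) + 1   # first sample after each gap
    missing_samples = steps[gap_mask] - 1      # samples skipped in each gap
    durations_ms = missing_samples / sampling_rate_hz * 1000.0
    return positions, missing_samples, durations_ms


# 30 kHz toy recording: one dropped sample, then a one-second pause.
timestamps = [0, 1, 2, 3, 5, 6, 7, 30008, 30009]
positions, missing, durations_ms = find_timestamp_gaps(timestamps, 30_000.0)
```

Here the single dropped sample shows up as a gap of about 0.03 ms, while the pause shows up as a 1000 ms gap. The sketch does not handle counters that repeat or go backwards, which a real reader would probably also need to report.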

Writing heuristics for each format would be a large maintenance burden. A simpler, more maintainable, and more general solution is to give users (who know their data best) both information and control. We came up with the following proposal:

  1. Default behavior: If gaps are detected, loading should fail. RawIO readers should raise an error when they find timestamp gaps and include a report of the number, size, and characteristics of the gaps, so that users can make an informed decision. Examples of this approach: #1769 (Blackrock: add summary of automatic data segmentation) and #1484 (Improve Intan reader error message for discontinuities).
  2. Opt-in behavior: RawIO readers should provide an argument (e.g. gap_tolerance_ms; naming TBD per reader) that lets users explicitly load data with gaps. Gaps smaller than this value are ignored; larger gaps split the data into separate segments.
  3. Timestamps from the acquisition system: To make data with gaps more useful, we should expose the original timestamps from the acquisition system when they are available (Intan: #1652, "Add utility method to get timestamps on Intan base"; Blackrock: #1816, "BlackrockRawIO: Add _get_blackrock_timestamps for per-sample timestamp retrieval").
  4. Global escape hatch, ignore_integrity_checks: A boolean kwarg at instantiation that bypasses every integrity check the reader performs, including gap detection and any format-specific file-validity checks. The default is False (strict). This serves two audiences: users who know their data is fine and want faster loads (integrity checks can be expensive, as @samuelgarcia pointed out), and users who need to open corrupt files to recover what they can (e.g. when the person who ran the experiment is no longer available). Examples: #1470 (opening Intan files even if corrupted) and #1740 (Blackrock block validation, among other integrity checks).
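A minimal sketch of how items 1, 2 and 4 could fit together, assuming the gap positions and durations have already been computed. `TimestampGapError`, `gap_report` and `segment_boundaries` are hypothetical names; only `gap_tolerance_ms` and `ignore_integrity_checks` come from the proposal. Treating a gap exactly equal to the tolerance as ignorable is an assumption, since the proposal leaves that case open.

```python
import numpy as np


class TimestampGapError(ValueError):
    """Hypothetical error raised when gaps are found and no tolerance was given."""


def gap_report(positions, durations_ms):
    """Summarize gaps: how many, how large, and where they occur."""
    durations_ms = np.asarray(durations_ms, dtype=float)
    lines = [f"Found {len(durations_ms)} gap(s) in the timestamps."]
    if len(durations_ms) > 0:
        lines.append(
            f"Durations (ms): min={durations_ms.min():.3f}, "
            f"median={np.median(durations_ms):.3f}, max={durations_ms.max():.3f}"
        )
        for pos, dur in zip(positions, durations_ms):
            lines.append(f"  gap before sample {pos}: {dur:.3f} ms")
    return "\n".join(lines)


def segment_boundaries(n_samples, positions, durations_ms,
                       gap_tolerance_ms=None, ignore_integrity_checks=False):
    """Return a list of (start, stop) sample ranges, one per Neo segment."""
    if ignore_integrity_checks:
        # Escape hatch: no checks at all, treat the data as one contiguous segment.
        return [(0, n_samples)]
    positions = np.asarray(positions, dtype=np.int64)
    durations_ms = np.asarray(durations_ms, dtype=float)
    if positions.size == 0:
        return [(0, n_samples)]
    if gap_tolerance_ms is None:
        # Default (strict): refuse to load and tell the user what was found.
        raise TimestampGapError(
            gap_report(positions, durations_ms)
            + "\nSet gap_tolerance_ms to load the data anyway."
        )
    # Opt-in: gaps up to the tolerance are ignored, larger ones start a new segment.
    cuts = positions[durations_ms > gap_tolerance_ms].tolist()
    edges = [0, *cuts, n_samples]
    return list(zip(edges[:-1], edges[1:]))


# Usage with a ~0.03 ms dropped-sample gap and a 1000 ms pause:
segments = segment_boundaries(9, [4, 7], [1000 / 30000, 1000.0], gap_tolerance_ms=1.0)
```

In a real reader, `ignore_integrity_checks=True` would presumably skip the (possibly expensive) detection step itself rather than just discarding its result.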

This design would give RawIO a consistent API, so that every reader handles gaps through a common interface. The plan is as follows:

  1. Implement the solution for Blackrock first (there are open issues, and users currently cannot read their data). This will also let us discuss the types and naming of the interface.
  2. Extend the interface to other popular readers (e.g. Intan, OpenEphys, SpikeGLX, Plexon) to uncover potential difficulties.
  3. Once the API is stable for the main readers, integrate it into the abstract/parent classes as a general RawIO API.
  4. If possible, deprecate the current gap-handling mechanisms already in place (e.g. in OpenEphys) to simplify the codebase.
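For step 3, one possible shape for the parent-class integration is a template method: the base class runs whatever checks a subclass registers and applies the `ignore_integrity_checks` escape hatch in one place. This is loosely modelled on the `parse_header`/`_parse_header` split used by Neo's RawIO classes, but `IntegrityCheckedRawIO`, `IntegrityError`, `_integrity_checks` and `ToyReader` are all hypothetical.

```python
from abc import ABC, abstractmethod


class IntegrityError(ValueError):
    """Hypothetical error type for failed integrity checks."""


class IntegrityCheckedRawIO(ABC):
    """Hypothetical parent class; not Neo's actual BaseRawIO."""

    def __init__(self, ignore_integrity_checks=False):
        self.ignore_integrity_checks = ignore_integrity_checks

    def parse_header(self):
        self._parse_header()  # format-specific parsing, provided by the subclass
        if self.ignore_integrity_checks:
            return  # global escape hatch: skip every check
        problems = []
        for check in self._integrity_checks():
            problems.extend(check())
        if problems:
            raise IntegrityError(
                "Integrity checks failed:\n"
                + "\n".join(f"  - {p}" for p in problems)
                + "\nUse ignore_integrity_checks=True to load anyway."
            )

    @abstractmethod
    def _parse_header(self):
        """Read the file header (format-specific)."""

    def _integrity_checks(self):
        # Each check is a callable returning a list of problem descriptions.
        return []


class ToyReader(IntegrityCheckedRawIO):
    """Toy subclass with one format-specific check, for illustration only."""

    def __init__(self, timestamps, ignore_integrity_checks=False):
        super().__init__(ignore_integrity_checks)
        self._timestamps = list(timestamps)
        self.checks_run = 0

    def _parse_header(self):
        self.n_samples = len(self._timestamps)

    def _integrity_checks(self):
        return [self._check_monotonic]

    def _check_monotonic(self):
        self.checks_run += 1
        ts = self._timestamps
        return [
            f"timestamp decreases at sample {i}"
            for i in range(1, len(ts))
            if ts[i] < ts[i - 1]
        ]
```

With this split, a reader only lists its checks; the strict default and the escape hatch behave the same way for every format.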

[EDIT: added a mechanism to override the checks]

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests