ENH: OOC architecture rewrite — new bulk I/O API and infrastructure #1568
Open
joeykleingers wants to merge 2 commits into
This was referenced Mar 24, 2026
Adds the core out-of-core (OOC) architecture to simplnx: a plugin-
driven hook system that lets the SimplnxOoc plugin own OOC storage
policy while simplnx core supplies the mechanism. Core gains bulk I/O
primitives that work identically for in-core and OOC stores, a
DataIOCollection callback API for the plugin to drive format
selection / post-import finalization / recovery-file write
interception, a strided N-dimensional Extent type, an EmptyStringStore
placeholder, a MemoryBudgetManager singleton that cache subsystems
register against, a unified DispatchAlgorithm template, and a
redesigned Dream3dIO public API (LoadDataStructure family plus a
WriteRecoveryFile with optional UserDataFilePath redirect). The
chunk-method skeleton previously exposed on AbstractDataStore is
removed in favor of the bulk I/O primitives. 92 files changed,
+5954 / -2707 lines.
================================================================================
1. AbstractDataStore bulk I/O API
================================================================================
src/simplnx/DataStructure/AbstractDataStore.hpp,
src/simplnx/DataStructure/DataStore.hpp,
src/simplnx/DataStructure/EmptyDataStore.hpp,
src/simplnx/DataStructure/IDataStore.hpp
Two bulk I/O primitives on AbstractDataStore<T>:
Result<> copyIntoBuffer(usize startIndex, std::span<T>) const
Result<> copyFromBuffer(usize startIndex, std::span<const T>)
Plus a strided N-dimensional access pair driven by Extent:
virtual Result<> readExtent(const Extent&, std::span<T>) const = 0
virtual Result<> writeExtent(const Extent&, std::span<const T>) = 0
DataStore<T> implements the 3D fast path via std::memcpy on contiguous
X rows (ZYX tuple order). The memcpy branch is guarded behind
!std::is_same_v<T, bool> because std::vector<bool> has no .data()
accessor; bool falls back to per-tuple copy. 1D extents use a flat
strided walk. EmptyDataStore returns empty / no-ops.
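A minimal sketch of the bool guard described above. ToyStore and copyRow are hypothetical stand-ins, not the simplnx DataStore<T> or its readExtent implementation; the sketch only shows why the memcpy branch has to be compiled out for bool.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy stand-in for an in-memory store (hypothetical, not simplnx DataStore<T>).
template <typename T>
struct ToyStore
{
  static_assert(std::is_trivially_copyable_v<T>, "memcpy needs trivially copyable elements");

  std::vector<T> values;

  // Copies `count` elements starting at `start` into `out`.
  // Returns false when the requested range is out of bounds.
  bool copyRow(std::size_t start, std::size_t count, std::vector<T>& out) const
  {
    if(start > values.size() || count > values.size() - start)
    {
      return false;
    }
    out.resize(count);
    if constexpr(!std::is_same_v<T, bool>)
    {
      // Contiguous storage: one memcpy per row.
      if(count > 0)
      {
        std::memcpy(out.data(), values.data() + start, count * sizeof(T));
      }
    }
    else
    {
      // std::vector<bool> is bit-packed and has no data(); copy per element.
      for(std::size_t i = 0; i < count; ++i)
      {
        out[i] = values[start + i];
      }
    }
    return true;
  }
};
```

Because the branch is selected with if constexpr, the memcpy call is never instantiated for T = bool, so the missing data() accessor does not cause a compile error.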
StoreType is reduced to three values (InMemory, OutOfCore, Empty).
IsOutOfCore() checks StoreType directly. getRecoveryMetadata() is
pure virtual on IDataStore — DataStore returns an empty map (in-
memory stores have no backing file); EmptyDataStore throws, matching
the fail-fast pattern of every other data-access method on this
metadata-only placeholder.
The previously exposed chunk-method skeleton on AbstractDataStore
(loadChunk, getNumberOfChunks, getChunkLowerBounds,
getChunkUpperBounds, getChunkShape, getChunkSize, getChunkTupleShape,
getChunkExtents, convertChunkToDataStore) is removed; the bulk I/O
primitives subsume it. Callers across the codebase are migrated.
================================================================================
2. Plugin hook system (DataIOCollection / IDataIOManager)
================================================================================
src/simplnx/DataStructure/IO/Generic/DataIOCollection.{hpp,cpp},
src/simplnx/DataStructure/IO/Generic/IDataIOManager.{hpp,cpp},
src/simplnx/Core/Application.cpp,
src/simplnx/Utilities/DataStoreUtilities.{hpp,cpp}
DataIOCollection carries three plugin-registered callbacks that let
the SimplnxOoc plugin drive OOC policy without core knowing the
details:
FormatResolverFnc(DataStructure, DataPath, DataType, dataSizeBytes)
Decides storage format per array. The expanded signature lets the
resolver walk parents to determine geometry type, so it can force
in-core for unstructured / poly geometry arrays without caller-
side checks.
DataStoreImportHandlerFnc(DataStructure, hdf5File, importStructure,
EagerLoadFnc)
Post-import callback that finalizes placeholder stores after all
HDF5 objects are read. Handles in-core eager loading, OOC
reference stores, and recovery reattachment in one place. The
EagerLoadFnc callback lets the handler eager-load individual
arrays without knowing Dream3dIO internals.
WriteArrayOverrideFnc
Intercepts HDF5 writes during recovery-file creation so the
plugin can emit lightweight placeholder datasets instead of full
data. Activated via the RAII WriteArrayOverrideGuard, wired into
DataStructureWriter.
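The RAII activation of the write override can be sketched as follows. ToyCollection, ToyOverrideGuard, and the callback signature are assumptions made for illustration; the real WriteArrayOverrideFnc signature is not shown in this message.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the collection that stores the plugin's hook.
struct ToyCollection
{
  using WriteOverride = std::function<bool(const std::string& arrayName)>;

  WriteOverride writeOverride; // registered once by the plugin
  bool overrideActive = false; // toggled only by the guard below

  // Returns true when the override is active and handled the write.
  bool tryOverride(const std::string& arrayName) const
  {
    return overrideActive && writeOverride && writeOverride(arrayName);
  }
};

// Activates the override for the lifetime of the guard, then restores
// whatever state was in effect before.
class ToyOverrideGuard
{
public:
  explicit ToyOverrideGuard(ToyCollection& collection)
  : m_Collection(collection)
  , m_Previous(collection.overrideActive)
  {
    m_Collection.overrideActive = true;
  }

  ~ToyOverrideGuard()
  {
    m_Collection.overrideActive = m_Previous;
  }

  ToyOverrideGuard(const ToyOverrideGuard&) = delete;
  ToyOverrideGuard& operator=(const ToyOverrideGuard&) = delete;

private:
  ToyCollection& m_Collection; // non-owning reference
  bool m_Previous;
};
```

Restoring the previous value in the destructor, rather than unconditionally clearing it, keeps nested guards well behaved and makes the hook inactive again even when the write path exits early.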
IDataIOManager gains factory registration for ListStoreRefCreateFnc,
StringStoreCreateFnc, and FinalizeStoresFnc, with delegating creation
methods on DataIOCollection. "Simplnx-Default-In-Memory" is a
reserved format name: addIOManager() rejects plugin registrations
under it (the core CoreDataIOManager is installed directly into the
map bypassing the guard). addIOManager() returns Result<> rather than
throwing.
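A rough sketch of a registration call that reports failure through its return value instead of throwing. ToyRegistry and addManager are hypothetical, std::optional<std::string> stands in for Result<>, and the duplicate-name check is an added assumption; only the reserved name comes from the text above.

```cpp
#include <map>
#include <optional>
#include <string>

// Reserved name taken from the description above.
inline const std::string k_ReservedName = "Simplnx-Default-In-Memory";

// Hypothetical registry keyed by format name.
struct ToyRegistry
{
  std::map<std::string, int> managers; // format name -> toy manager id

  // Returns std::nullopt on success, or an error message on failure.
  std::optional<std::string> addManager(const std::string& name, int id)
  {
    if(name == k_ReservedName)
    {
      return "format name '" + name + "' is reserved for the core manager";
    }
    // Assumption: duplicate names are also rejected.
    if(!managers.emplace(name, id).second)
    {
      return "format name '" + name + "' is already registered";
    }
    return std::nullopt;
  }
};
```

A caller such as a plugin loader can then early-return on the error message rather than wrapping the call in a try block.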
DataIOCollection::generateManagerListString() produces a multi-line
capability matrix of every registered IO manager and the store types
it supports; the helper is surfaced in CreateArray's "unknown format"
error message so users see which formats are available when a request
fails. A format display-name registry (registerFormatDisplayName /
getFormatDisplayNames) supplies human-readable names for the
DataStoreFormatParameter dropdown — "Automatic", "In Memory", plus
plugin-registered formats with display names.
DataStoreUtilities::GetIOCollection() and
Application::getIOCollection() return DataIOCollection& rather than
std::shared_ptr — the collection is owned by the Application
singleton that outlives every caller, so a reference expresses non-
ownership clearly. WriteArrayOverrideGuard holds a DataIOCollection&
member; the "may be null no-op" path is dropped (no caller used it).
================================================================================
3. Extent type and strided N-D access
================================================================================
src/simplnx/Common/Extent.hpp,
test/ExtentTest.cpp
nx::core::Extent is a strided N-dimensional range struct (lower /
upper bounds, dimension count, per-dimension stride). It is the
parameter type for the new readExtent / writeExtent virtuals on
AbstractDataStore, giving the visualization layer one virtual call
per slab instead of hand-rolled per-row copyIntoBuffer loops.
The header #undef's min and max at the top so MSVC's transitively
included Windows.h macros don't expand against the struct's member
names in the constructor initializer list — without the guard, the
parser failure cascades into fmt/color.h namespace errors on v143
Windows builds.
ExtentTest covers construction, contains / overlaps / intersect,
equality, and strided semantics (82 assertions).
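For readers unfamiliar with strided ranges, the idea can be sketched with a toy 3D struct. ToyExtent, its field names, and the choice of an exclusive upper bound are all assumptions; the layout and bound convention of nx::core::Extent are not spelled out in this message.

```cpp
#include <array>
#include <cstddef>

// Toy strided range over three dimensions (hypothetical layout).
struct ToyExtent
{
  static constexpr std::size_t k_Dims = 3;

  std::array<std::size_t, k_Dims> lower{};
  std::array<std::size_t, k_Dims> upper{}; // exclusive bound (assumption)
  std::array<std::size_t, k_Dims> stride{1, 1, 1};

  // Number of elements visited when stepping by `stride` in each dimension.
  std::size_t count() const
  {
    std::size_t total = 1;
    for(std::size_t d = 0; d < k_Dims; ++d)
    {
      if(stride[d] == 0 || upper[d] <= lower[d])
      {
        return 0;
      }
      total *= (upper[d] - lower[d] + stride[d] - 1) / stride[d];
    }
    return total;
  }

  // True when `point` lies inside the bounds and on the stride lattice.
  bool contains(const std::array<std::size_t, k_Dims>& point) const
  {
    for(std::size_t d = 0; d < k_Dims; ++d)
    {
      if(stride[d] == 0 || point[d] < lower[d] || point[d] >= upper[d])
      {
        return false;
      }
      if((point[d] - lower[d]) % stride[d] != 0)
      {
        return false;
      }
    }
    return true;
  }
};
```

A consumer would size its output buffer from count() before issuing a single read for the whole slab.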
================================================================================
4. EmptyStringStore placeholder
================================================================================
src/simplnx/DataStructure/EmptyStringStore.hpp,
test/EmptyStringStoreTest.cpp,
src/simplnx/DataStructure/IO/HDF5/StringArrayIO.cpp
EmptyStringStore is the string-array analog of EmptyDataStore: a
placeholder that stores only tuple-shape metadata. All data-access
methods throw std::runtime_error; isPlaceholder() returns true (vs
false for StringStore). StringArrayIO creates an EmptyStringStore
in OOC mode instead of allocating numValues empty strings up-front,
deferring full-string materialization to the plugin's handler.
Tests cover metadata, zero tuples, throwing access, deep-copy
placeholder preservation, resize, and isPlaceholder.
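The placeholder contract can be illustrated with a toy type. ToyEmptyStringStore, its method names other than isPlaceholder, and the accessThrows helper are hypothetical and are not the EmptyStringStore interface.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy metadata-only placeholder: it knows how many tuples it represents
// but holds no strings.
struct ToyEmptyStringStore
{
  std::size_t numTuples = 0;

  std::size_t size() const
  {
    return numTuples;
  }

  bool isPlaceholder() const
  {
    return true;
  }

  // Resizing only updates the metadata.
  void resize(std::size_t newTupleCount)
  {
    numTuples = newTupleCount;
  }

  // Any data access fails fast.
  const std::string& at(std::size_t /*index*/) const
  {
    throw std::runtime_error("ToyEmptyStringStore holds no data");
  }
};

// Helper: reports whether data access throws std::runtime_error.
inline bool accessThrows(const ToyEmptyStringStore& store)
{
  try
  {
    (void)store.at(0);
  }
  catch(const std::runtime_error&)
  {
    return true;
  }
  return false;
}
```

The point of the pattern is that a caller that forgets to finalize the store gets an immediate exception instead of silently reading empty strings.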
================================================================================
5. Dream3dIO public API
================================================================================
src/simplnx/Utilities/Parsing/DREAM3D/Dream3dIO.{hpp,cpp},
test/Dream3dLoadingApiTest.cpp,
test/UnitTestCommon/include/simplnx/UnitTest/UnitTestCommon.{hpp,cpp}
The Dream3dIO public API is rebuilt around four purpose-specific
loaders:
LoadDataStructure(path) — full load with OOC handler support
LoadDataStructureArrays(path, dataPaths) — selective load with pruning
LoadDataStructureMetadata(path) — metadata-only skeleton (preflight)
LoadDataStructureArraysMetadata(path, dataPaths) — pruned metadata skeleton
This decouples pipeline loading from DataStructure loading, replaces
a bool-flagged preflight parameter with distinct functions, and
centralizes OOC handler integration in a single internal
LoadDataStructureWithHandler. The internal helpers
(LoadDataObjectFromHDF5, EagerLoadDataFromHDF5, PruneDataStructure,
LoadDataStructureWithHandler) move into an anonymous namespace. The
legacy ReadFile / ImportDataStructureFromFile / FinishImportingObject
entry points are removed.
ReadDREAM3DFilter calls LoadDataStructureMetadata at preflight and no
longer manages HDF5 file handles. ImportH5ObjectPathsAction calls
LoadDataStructureMetadata at preflight and LoadDataStructure at
execute; the action no longer manages file handles or deferred
loading itself and delegates to the registered
DataStoreImportHandlerFnc when present, or falls back to
FinishImportingObject for non-OOC builds.
WriteRecoveryFile() wraps WriteFile with WriteArrayOverrideGuard so
the plugin's placeholder-write hook fires. It also accepts an
optional userDataFilePath parameter: when set, the writer emits a
minimal HDF5 file containing only the file-version tag plus a root-
level "UserDataFilePath" string attribute. In that mode the
DataStructure and Pipeline parameters are ignored — the user's
authoritative .dream3d output (written by a trailing
WriteDREAM3DFilter in the pipeline) is the real data carrier. The
recovery scanner reads the attribute back at relaunch time to
redirect the load. Companion reader:
DREAM3D::ReadUserDataFilePathAttribute. Existing call sites are
source-compatible; the parameter defaults to std::nullopt.
UnitTest::LoadDataStructure and LoadDataStructureMetadata helpers
delegate directly to the new functions. Dream3dLoadingApiTest covers
all four loaders. Test callers (ComputeIPFColorsTest,
RotateSampleRefFrameTest, DREAM3DFileTest, H5Test) are migrated.
================================================================================
6. Format resolution at the array creation layer
================================================================================
src/simplnx/Utilities/DataStoreUtilities.{hpp,cpp},
src/simplnx/Utilities/ArrayCreationUtilities.hpp,
src/simplnx/Filter/Actions/CreateArrayAction.{hpp,cpp},
src/simplnx/Filter/Actions/ImportH5ObjectPathsAction.{hpp,cpp},
src/simplnx/Filter/Actions/Create{Vertex,1D,2D,3D}GeometryAction.{hpp,cpp},
src/simplnx/Filter/Actions/CreateRectGridGeometryAction.{hpp,cpp},
+ 12 filter call-sites across SimplnxCore and OrientationAnalysis
resolveFormat() is called from ArrayCreationUtilities::CreateArray
and ImportH5ObjectPathsAction; DataStoreUtilities::CreateDataStore
and CreateListStore are simple factories that take an already-
resolved format string.
CreateArrayAction carries a dataFormat member for per-filter format
override; when non-empty it bypasses the resolver. The dropdown shows
"Automatic", "In Memory", and any plugin-registered formats with
display names. 12 filter callers are fixed where fillValue was being
passed as dataFormat after a parameter reordering.
Geometry creation actions drop their createdDataFormat parameter and
materialize OOC topology arrays into in-core stores when source
arrays carry StoreType::OutOfCore — unstructured / poly geometry
topology must be in-core for the visualization layer.
ArrayCreationUtilities::CheckMemoryRequirement is a pure RAM check
and recognizes both the empty string and k_InMemoryFormat as in-core
sentinels (avoiding the prior overload where "" meant both "unset"
and "in-memory").
================================================================================
7. HDF5 I/O integration
================================================================================
src/simplnx/DataStructure/IO/HDF5/DataStoreIO.hpp,
src/simplnx/DataStructure/IO/HDF5/DataArrayIO.hpp,
src/simplnx/DataStructure/IO/HDF5/NeighborListIO.hpp,
src/simplnx/DataStructure/IO/HDF5/DataStructureWriter.cpp,
src/simplnx/Utilities/Parsing/HDF5/IO/DatasetIO.{hpp,cpp}
DataStoreIO::ReadDataStoreIntoMemory (the in-memory load path) gains
a placeholder-detection guard that compares physical HDF5 element
count against shape attributes and returns Result<> with a warning
when they mismatch — catching attempts to load placeholder datasets
without the OOC plugin. Return type becomes
Result<shared_ptr<AbstractDataStore<T>>> so callers can accumulate
warnings across arrays.
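The placeholder-detection check amounts to comparing the element count implied by the shape attributes with what is physically stored. The sketch below assumes hypothetical names (product, checkElementCount) and uses std::optional<std::string> in place of a Result<> warning; it does not touch HDF5.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Product of all dimensions; an empty shape yields 1.
inline std::size_t product(const std::vector<std::size_t>& dims)
{
  std::size_t total = 1;
  for(std::size_t d : dims)
  {
    total *= d;
  }
  return total;
}

// Returns a warning message when the stored element count does not match
// the count implied by the tuple and component shapes.
inline std::optional<std::string> checkElementCount(std::size_t physicalCount, const std::vector<std::size_t>& tupleShape,
                                                    const std::vector<std::size_t>& componentShape)
{
  const std::size_t expected = product(tupleShape) * product(componentShape);
  if(physicalCount == expected)
  {
    return std::nullopt;
  }
  return "dataset holds " + std::to_string(physicalCount) + " elements but its shape attributes imply " + std::to_string(expected) +
         "; it may be a placeholder";
}
```

Returning the message instead of throwing lets a loader collect one warning per affected array and keep importing the rest of the file.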
DataArrayIO::writeData always calls WriteDataStore. OOC stores
materialize their data through the plugin's writeHdf5(); recovery
writes flow through WriteArrayOverrideFnc.
NeighborListIO has an OOC interception path: it computes total
neighbor count, calls resolveFormat(), and creates a read-only
reference list-store when an OOC format is available. Legacy
NeighborList reading threads a preflight flag through the chain
(readLegacyNeighborList → createLegacyNeighborList → ReadHdf5Data) so
legacy .dream3d imports create EmptyListStore placeholders rather
than eagerly loading per-element via setList().
DataStructureWriter consults WriteArrayOverrideFnc first, giving the
registered plugin callback first chance to handle each data object.
DatasetIO adds explicit template instantiations of createEmptyDataset
and writeSpanHyperslab for all numeric types plus bool. The plugin's
AbstractOocStore::writeHdf5() cannot use writeSpan() (the full array
isn't in memory); it creates an empty dataset, then fills it region-
by-region via hyperslab writes as it streams from the backing file.
================================================================================
8. MemoryBudgetManager and Preferences
================================================================================
src/simplnx/Utilities/MemoryBudgetManager.{hpp,cpp},
test/MemoryBudgetManagerTest.cpp,
src/simplnx/Core/Preferences.{hpp,cpp}
MemoryBudgetManager is a unified, thread-safe singleton in simplnx
core. Cache subsystems (the plugin's ChunkCache, visualization stride
cache, etc.) register allocations against it, and it evicts the
globally oldest entries via callbacks when memory pressure exceeds the
configured budget. Living in core lets non-OOC builds and
visualization code use it without a plugin dependency. The default of
50% of system RAM is computed in place via
MemoryBudgetManager::defaultBudgetBytes(), so there is no
plugin-startup override race.
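The register-then-evict-oldest behavior can be sketched with a small class. ToyBudget and its methods are hypothetical and much simpler than MemoryBudgetManager: there is no singleton, no unregistration, and the rule that the newest entry is never evicted is an assumption.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Toy budget tracker: evicts the oldest registered allocations when the
// running total exceeds the budget.
class ToyBudget
{
public:
  explicit ToyBudget(std::size_t budgetBytes)
  : m_Budget(budgetBytes)
  {
  }

  // Registers an allocation. Eviction callbacks run while the lock is
  // held, so they must not call back into this object.
  void registerAllocation(std::size_t bytes, std::function<void()> onEvict)
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    m_Entries.push_back({bytes, std::move(onEvict)});
    m_Used += bytes;
    // Assumption: the entry that was just added is never evicted.
    while(m_Used > m_Budget && m_Entries.size() > 1)
    {
      Entry oldest = std::move(m_Entries.front());
      m_Entries.pop_front();
      m_Used -= oldest.bytes;
      if(oldest.onEvict)
      {
        oldest.onEvict();
      }
    }
  }

  std::size_t usedBytes() const
  {
    std::lock_guard<std::mutex> lock(m_Mutex);
    return m_Used;
  }

private:
  struct Entry
  {
    std::size_t bytes;
    std::function<void()> onEvict;
  };

  std::size_t m_Budget;
  std::size_t m_Used = 0;
  std::deque<Entry> m_Entries; // front = oldest
  mutable std::mutex m_Mutex;
};
```

Because every cache registers against the same tracker, the oldest entry is chosen across all subsystems rather than per cache.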
Preferences gains the persisted "memory_budget_bytes" key and
memoryBudgetBytes() / setMemoryBudgetBytes(value) accessors. The
k_InMemoryFormat sentinel constant is added for explicit in-core
format choice; migration logic erases legacy empty-string and
"In-Memory" preference values. checkUseOoc() tests against
k_InMemoryFormat. setLargeDataFormat("") removes the key so plugin
defaults take effect.
forceOocData is no longer gated on m_UseOoc in either the getter or
the setter. The previous gate (returning true only when the selected
format was something other than the in-memory sentinel) silently
dropped force_ooc_data writes whenever the user's large-data format
was the default in-memory sentinel — defeating the toggle's intent.
The OOC format resolver already handles the (forceOoc=true,
userChoseInMemory=true) case correctly; removing the gate makes the
toggle behave as designed.
================================================================================
9. Algorithm infrastructure (AlgorithmDispatch / IParallelAlgorithm)
================================================================================
src/simplnx/Utilities/AlgorithmDispatch.hpp,
src/simplnx/Utilities/IParallelAlgorithm.{hpp,cpp},
test/IParallelAlgorithmTest.cpp,
CMakeLists.txt
AlgorithmDispatch adds ForceInCoreAlgorithm and ForceOocAlgorithm
global flags with RAII guards, plus a DispatchAlgorithm template that
selects between Direct (in-core) and Scanline (OOC) algorithm
variants based on involved stores' types and the force flags.
SIMPLNX_TEST_ALGORITHM_PATH is a new CMake option (0 = both,
1 = OOC-only, 2 = InCore-only) for dual-dispatch test control.
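A simplified sketch of the dispatch idea. All names here (ToyStoreType, g_ForceInCore, g_ForceOoc, ToyForceOocGuard, toyDispatch) are hypothetical; the rule that any out-of-core store selects the scanline variant, and that the in-core flag wins when both flags are set, are assumptions rather than the documented DispatchAlgorithm behavior.

```cpp
#include <initializer_list>

enum class ToyStoreType
{
  InMemory,
  OutOfCore,
  Empty
};

// Global force flags (hypothetical names).
inline bool g_ForceInCore = false;
inline bool g_ForceOoc = false;

// RAII guard that forces the OOC path for its lifetime.
class ToyForceOocGuard
{
public:
  ToyForceOocGuard()
  : m_Previous(g_ForceOoc)
  {
    g_ForceOoc = true;
  }
  ~ToyForceOocGuard()
  {
    g_ForceOoc = m_Previous;
  }
  ToyForceOocGuard(const ToyForceOocGuard&) = delete;
  ToyForceOocGuard& operator=(const ToyForceOocGuard&) = delete;

private:
  bool m_Previous;
};

// Runs `scanline` when any store is out-of-core or the OOC flag is set,
// otherwise runs `direct`. Both callables must return the same type.
template <typename DirectFn, typename ScanlineFn>
auto toyDispatch(std::initializer_list<ToyStoreType> stores, DirectFn&& direct, ScanlineFn&& scanline)
{
  bool anyOoc = false;
  for(ToyStoreType type : stores)
  {
    anyOoc = anyOoc || (type == ToyStoreType::OutOfCore);
  }
  // Assumption: the in-core flag takes precedence over everything else.
  const bool useOoc = !g_ForceInCore && (g_ForceOoc || anyOoc);
  return useOoc ? scanline() : direct();
}
```

A test can wrap a filter invocation in the guard to exercise the scanline variant on data that would otherwise take the direct path.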
IParallelAlgorithm drops the blanket TBB-disable for OOC data — OOC
stores are thread-safe via the plugin's ChunkCache plus the HDF5
global mutex. CheckStoresInMemory and CheckArraysInMemory use
StoreType directly instead of getDataFormat().
IParallelAlgorithmTest covers TBB enablement across in-memory, OOC,
and mixed array / store combinations using a MockOocDataStore that
overrides readExtent / writeExtent and provides a no-op
getRecoveryMetadata.
================================================================================
10. Filter / utility callers updated for the bulk I/O API
================================================================================
src/Plugins/SimplnxCore/src/SimplnxCore/Filters/Algorithms/FillBadData.cpp,
src/Plugins/SimplnxCore/src/SimplnxCore/Filters/Algorithms/QuickSurfaceMesh.cpp,
src/Plugins/SimplnxCore/src/SimplnxCore/utils/VtkUtilities.cpp,
src/Plugins/SimplnxCore/src/SimplnxCore/Filters/WriteVtkRectilinearGridFilter.cpp,
+ ITK and OrientationAnalysis filters / tests
FillBadData::phaseOneCCL and phaseThreeRelabeling use Z-slab buffered
I/O via copyIntoBuffer / copyFromBuffer rather than the removed chunk
methods. operator()() scans feature counts in 64K-element chunks via
copyIntoBuffer.
QuickSurfaceMesh::generateTripleLines() no longer queries
getChunkShape() to set ParallelData3DAlgorithm chunk size.
VtkUtilities binary write reads into 4096-element buffers via
copyIntoBuffer, byte-swaps inside the buffer, and fwrites — replacing
direct DataStore data()-pointer access. WriteVtkRectilinearGrid
propagates Result<> errors from copyIntoBuffer / copyFromBuffer.
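The buffered read, swap, and write loop can be sketched as below. This is a toy: a std::vector stands in for the data store, a free function with a pointer and count stands in for the span-based copyIntoBuffer, bool stands in for Result<>, and appending to a byte vector stands in for fwrite. The names byteSwap and writeSwapped are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bulk read: copies `count` elements starting at `start` into `buffer`.
inline bool copyIntoBuffer(const std::vector<std::uint32_t>& store, std::size_t start, std::uint32_t* buffer, std::size_t count)
{
  if(start > store.size() || count > store.size() - start)
  {
    return false;
  }
  std::copy_n(store.begin() + static_cast<std::ptrdiff_t>(start), count, buffer);
  return true;
}

// Reverses the byte order of a 32-bit value.
inline std::uint32_t byteSwap(std::uint32_t v)
{
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Streams the store through a fixed-size buffer, swapping bytes inside the
// buffer before appending them to `out`.
inline bool writeSwapped(const std::vector<std::uint32_t>& store, std::vector<unsigned char>& out, std::size_t chunk = 4096)
{
  if(chunk == 0)
  {
    return false;
  }
  std::vector<std::uint32_t> buffer(std::min(chunk, store.size()));
  for(std::size_t offset = 0; offset < store.size(); offset += chunk)
  {
    const std::size_t n = std::min(chunk, store.size() - offset);
    if(!copyIntoBuffer(store, offset, buffer.data(), n))
    {
      return false; // propagate the error instead of throwing
    }
    for(std::size_t i = 0; i < n; ++i)
    {
      buffer[i] = byteSwap(buffer[i]);
    }
    const auto* bytes = reinterpret_cast<const unsigned char*>(buffer.data());
    out.insert(out.end(), bytes, bytes + n * sizeof(std::uint32_t));
  }
  return true;
}
```

Only one buffer's worth of data is resident at a time, which is what allows the same loop to serve in-core and out-of-core stores.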
ITKArrayHelper / ITKTestBase OOC checks use getStoreType() instead of
getDataFormat().empty(). IsArrayInMemory collapses from a 40-line
DataType switch to a single StoreType check. ArraySelectionParameter
drops EmptyOutOfCore handling; only StoreType::Empty remains.
The bulk I/O APIs (DataIOCollection::addIOManager and the
AbstractDataStore copyIntoBuffer / copyFromBuffer family) return
Result<> rather than throwing std::runtime_error / std::out_of_range;
all call sites — Application::loadPlugin, the
Create{Vertex,1D,2D,3D}Geometry actions, WriteVtkRectilinearGrid,
FillBadData phases — early-return on error.
================================================================================
11. Tests
================================================================================
test/Dream3dLoadingApiTest.cpp (new)
test/ExtentTest.cpp (new)
test/IParallelAlgorithmTest.cpp (new)
test/MemoryBudgetManagerTest.cpp (new)
test/DataIOCollectionHooksTest.cpp (new)
test/EmptyStringStoreTest.cpp (new)
test/IOFormat.cpp (extended)
DataIOCollectionHooksTest exercises the format-resolver and data-
store-import-handler hooks. IOFormat tests cover the InMemory
sentinel, empty-format handling, and resolveFormat with / without a
registered plugin; the "Target DataStructure Size" tautology test is
removed.
RodriguesConvertorTest exemplar gains the missing expected values for
the 4th tuple (indices 12-15) — the old CompareDataArrays broke on
the first floating-point mismatch and masked the gap; the new
chunked comparison correctly continues past epsilon-close differences
and exposed it.
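The difference between stopping at the first floating-point difference and continuing past epsilon-close ones can be sketched as follows. firstMismatch is a hypothetical helper operating on plain vectors; it is not the CompareDataArrays implementation, and the chunking here only mimics the access pattern.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Returns the index of the first pair that differs by more than `epsilon`,
// or std::nullopt when the arrays agree. A size mismatch is reported at the
// first index past the shorter array.
inline std::optional<std::size_t> firstMismatch(const std::vector<float>& a, const std::vector<float>& b, float epsilon, std::size_t chunkSize)
{
  const std::size_t common = std::min(a.size(), b.size());
  if(chunkSize == 0)
  {
    chunkSize = 1;
  }
  for(std::size_t offset = 0; offset < common; offset += chunkSize)
  {
    const std::size_t end = std::min(offset + chunkSize, common);
    for(std::size_t i = offset; i < end; ++i)
    {
      if(a[i] == b[i])
      {
        continue; // identical values
      }
      if(std::fabs(a[i] - b[i]) <= epsilon)
      {
        continue; // epsilon-close: keep scanning instead of stopping
      }
      return i;
    }
  }
  if(a.size() != b.size())
  {
    return common;
  }
  return std::nullopt;
}
```

With this structure a later tuple whose expected values are missing or wrong is still reached even when an earlier element differs only by rounding.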
================================================================================
12. Style and incidental cleanup
================================================================================
docs/superpowers/{plans,specs}/2026-04-07-arraycalculator-memory-optimization*.md
(deletions),
Dream3dIO.cpp, ImportH5ObjectPathsAction.cpp, DataIOCollection.cpp,
H5Test.cpp, UnitTestCommon.cpp, DREAM3DFileTest.cpp,
ComputeIPFColorsTest.cpp
Two markdown files under docs/superpowers/ that were merged into
develop by mistake are removed. Seven .cpp files that spelled out
std::filesystem repeatedly gain the namespace fs = std::filesystem
alias for consistency with the existing codebase convention.
================================================================================
Verification
================================================================================
The squashed commit was not built or test-run in this session. The
squash introduces no code changes — `git diff <squashed-commit>
<original-branch-tip>` is empty (confirmed in step 8 of the squash
workflow).
Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
… API

Carries the same migration the rest of this file received when the ENH
commit landed: the compression test cases (added in that same squash)
still referenced the removed ArrayCreationUtilities::k_DefaultDataFormat
constant and the removed DREAM3D::ReadFile() entry point, causing the
Linux, macOS, and Windows CI builds to fail with C2039 /
undeclared-identifier errors.

* Replace ArrayCreationUtilities::k_DefaultDataFormat (7 sites) with the
  empty string "" (its prior value), preserving the default-format /
  fillValue semantics of each CreateArray call.
* Replace the HDF5::FileIO::ReadFile + DREAM3D::ReadFile
  structured-binding pattern (3 sites) with
  UnitTest::LoadDataStructure(path), matching the canonical helper
  already used elsewhere in the file and across the test suite.

All 14 DREAM3DFileTest / WriteDREAM3DFilter ctest entries pass in the
simplnx-Rel build.

Signed-off-by: Joey Kleingers <joey.kleingers@bluequartz.net>
Summary
Rewrites the out-of-core (OOC) architecture in simplnx, replacing the old chunk-based API with a new bulk I/O design built around copyIntoBuffer / copyFromBuffer on AbstractDataStore. Introduces the core infrastructure that the OOC-optimized filter algorithms (separate PR #1575) build upon.

Core Architecture Changes
* Removes the chunk-based methods from AbstractDataStore / IDataStore (loadChunk, getNumberOfChunks, getChunkLowerBounds, getChunkUpperBounds, getChunkShape)
* Adds copyIntoBuffer / copyFromBuffer pure virtual bulk I/O methods to AbstractDataStore, with implementations in DataStore, EmptyDataStore, and HDF5ChunkedStore (in the SimplnxOoc plugin)
* Adds a StoreType enum (InMemory, OutOfCore, Empty) to IDataStore; IsOutOfCore() now checks StoreType instead of getChunkShape()
* HDF5ChunkedStore performs I/O via HDF5 hyperslab selections with a Z-slice-aligned default chunk shape {1,Y,X} for 3D data
* copyFromBuffer fast path: skips read-modify-write for tuple-aligned writes
* copyIntoBuffer fast path: direct span-based readTuples for tuple-aligned reads
* DatasetIO gains readTuples / writeTuples for direct hyperslab-based bulk tuple I/O

New Core Utilities
* DispatchAlgorithm: runtime dispatch between in-core (Direct) and OOC (Scanline/CCL) algorithm variants based on data store type
* SliceBufferedTransfer: type-dispatched Z-slice buffered tuple copy utility that eliminates per-element OOC overhead during morphological transfer phases
* UnionFind: vector-based disjoint set data structure with union-by-rank and path-halving compression for chunk-sequential CCL algorithms
* SegmentFeatures OOC path: Z-slice CCL-based connected component labeling with UnionFind equivalence tracking, replacing BFS/DFS flood fill for OOC data
* AlignSections OOC path: bulk slice read/write with AlignSectionsTransferDataOocImpl
* DataArrayUtilities bulk I/O: ImportFromBinaryFile, AppendData, CopyData, and mirror swap_ranges updated with chunked bulk I/O (runtime OOC check preserves original in-core performance)

OOC Store Management
* DataIOCollection / IDataIOManager: updated for OOC store lifecycle management
* ImportH5ObjectPathsAction: OOC-aware file import with recovery metadata
* DataStoreIO: detects OOC recovery attributes in ReadDataStore for safe data restoration

Test Infrastructure
* CompareDataArrays rewritten to use copyIntoBuffer in 40K-element chunks instead of per-element operator[]
* ForceOocAlgorithmGuard for dual-path test coverage
* SIMPLNX_TEST_ALGORITHM_PATH CMake option (0=Both, 1=OOC-only, 2=InCore-only) for build-specific test path control

Related PRs
Test Plan