[WIP][pypaimon] Add FUSE support for REST Catalog #7483

Open
shyjsarah wants to merge 23 commits into apache:master from shyjsarah:dev-fuse-path

@shyjsarah
Contributor

Purpose

Linked issue: close #xxx

Add FUSE (Filesystem in Userspace) support for PyPaimon REST Catalog. When remote object storage paths (OSS, S3, HDFS, etc.) are mounted locally via FUSE, users can now access data directly through local filesystem paths for better performance, bypassing remote storage SDKs.

Key features:

  • Three configuration options: fuse.local-path.enabled, fuse.local-path.root, fuse.local-path.validation-mode
  • Three validation modes: strict (throw exception), warn (fallback), none (skip validation)
  • Validation checks default database location to verify FUSE mount
  • Automatic path conversion: skip catalog/bucket level from remote path
  • Tri-state validation state to avoid repeated validation
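The path conversion rule above can be sketched as follows. This is a minimal illustration, not the PR's `_resolve_fuse_local_path`: the helper name `to_fuse_local_path`, the mount root `/mnt/fuse`, the example bucket, and the assumption that the skipped catalog/bucket level is exactly the URI authority are all hypothetical. Only the three option keys come from the description.

```python
import posixpath
from urllib.parse import urlparse

# Option keys are taken from the PR description; the values are illustrative.
catalog_options = {
    "fuse.local-path.enabled": "true",
    "fuse.local-path.root": "/mnt/fuse",
    "fuse.local-path.validation-mode": "strict",
}


def to_fuse_local_path(remote_path: str, fuse_root: str) -> str:
    """Map a remote URI onto a local FUSE mount root (hypothetical helper)."""
    parsed = urlparse(remote_path)
    if not parsed.scheme:
        raise ValueError(f"not a remote URI: {remote_path!r}")
    # Assumption: the URI authority (e.g. the bucket) is the level to skip,
    # so only the path after it is appended to the mount root.
    relative = parsed.path.lstrip("/")
    root = fuse_root.rstrip("/") or "/"
    return posixpath.join(root, relative) if relative else root


local = to_fuse_local_path(
    "oss://my-bucket/db.db/orders",
    catalog_options["fuse.local-path.root"],
)
print(local)
```

Under these assumptions, a table stored at `oss://my-bucket/db.db/orders` would be read from `/mnt/fuse/db.db/orders`.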

Tests

  • pypaimon/tests/rest/test_fuse_local_path.py - 13 unit test cases covering:
    - Path conversion logic (_resolve_fuse_local_path)
    - Validation modes (strict/warn/none)
    - file_io_for_data behavior with FUSE enabled/disabled
    - Edge cases (missing root config, no location, etc.)

API and Format

No API or storage format changes. This is a new feature that adds optional configuration.

Documentation

  • Added docs/content/pypaimon/fuse-support.md - User documentation covering configuration, validation modes, usage examples, and limitations.

Generative AI tooling

Generated-by: Qwen Code

shyjsarah and others added 23 commits March 13, 2026 14:25
Add design documents for supporting FUSE-mounted OSS paths in RESTCatalog.
This allows users to access data through local file system paths without
needing OSS tokens via getTableToken API.

Configuration includes:
- fuse.local-path.enabled: enable/disable FUSE path mapping
- fuse.local-path.root: root local path for FUSE mount
- fuse.local-path.database: database-level path mapping
- fuse.local-path.table: table-level path mapping

Security validation:
- fuse.local-path.validation-mode: strict/warn/none
- Option 1: Java NIO FileStore API (cross-platform FUSE detection)
- Option 2: OSS data validation (recommended)
  - Uses existing FileIO (RESTTokenFileIO or ResolvingFileIO) to read OSS
  - Compares file size and content hash with local file
  - No REST API extension required
  - Graceful fallback to default FileIO on validation failure

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
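The "compare file size and content hash" check described in this commit message could look roughly like the sketch below. It is written in Python for consistency with the rest of the PR, while the design document itself targets the Java FileIO classes. Both files are opened with the built-in `open` as a stand-in for the remote and local FileIO, and SHA-256 is an assumed hash choice; the commit message does not name an algorithm. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(remote_copy: str, local_copy: str) -> bool:
    """Cheap size check first, then a full content hash comparison."""
    if os.path.getsize(remote_copy) != os.path.getsize(local_copy):
        return False
    return file_digest(remote_copy) == file_digest(local_copy)
```

Checking the size first avoids hashing both files when they obviously differ.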
…napshotManager/SchemaManager

- Remove Java NIO FileStore API option (Option 1)
- Use SnapshotManager.latestSnapshot() to get latest snapshot directly
- Use SchemaManager.latest() as fallback for new tables
- Remove custom file traversal logic, use existing Paimon APIs
- Simplify validation code and improve maintainability

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ation

- Replace java.nio.file.Paths.get() and Files.* with LocalFileIO
- Use unified computeFileHash(FileIO, Path) method for both OSS and local files
- More consistent with Paimon coding style
- Removes dependency on java.nio.file package

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Use storage-agnostic terminology throughout the design documents:
- Replace 'OSS' with 'remote storage' or 'remote' in variable names and comments
- Use 'remoteFileIO', 'remotePath', 'remoteHash' instead of 'ossFileIO', 'ossPath', 'ossHash'
- Update method names: validateByOSSData -> validateByRemoteData
- Update flowchart labels: 'OSS Data Validation' -> 'Remote Data Validation'
- Keep OSS/S3/HDFS as examples of remote storage types

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Update background: FUSE allows bypassing remote storage SDKs, not authentication
- Update goal #3: Use local FileIO for data read/write, but still need getTableToken for validation
- Update behavior matrix: clarify getTableToken is still used for validation
- Remove 'Cost Reduction' from benefits since getTableToken is still called

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…nces

- Background: focus on SDK vs local filesystem access, not authentication
- Goals: remove getTableToken validation mention
- Behavior matrix: simplify description

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove 'Path Consistency Validation' (isFUSEMountPoint, etc.)
- Remove 'Table Identifier File Mechanism'
- Update validation flow diagrams to reflect remote data validation
- Keep only the remote data validation approach

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Format tables and object tables have no schema, this is expected behavior

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Format tables and object tables have no schema, this is expected behavior

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add validateByIdentifierFile() method for UUID comparison
- Add readIdentifierFile() helper for parsing identifier file
- Update validation flow diagrams with two-step validation
- Add .paimon-identifier file format documentation

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove database/table fields (they can change via rename)
- Only keep table-uuid for identification
- Update readIdentifierFile() to return UUID only

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…sing

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…th Paimon internal naming

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add comprehensive error handling section for FUSE local path operations:
- Error categories: permission/auth, network, service, FUSE-specific
- Read/write operation error handling flows with retry logic
- New retry configuration options with exponential backoff
- Implementation example with isRetryableError classification
- Logging guidelines and optional metrics

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Remove network, REST API, and permission error handling since they are
already handled by existing Paimon mechanisms. Focus only on:
- Transport endpoint is not connected (mount disconnection)
- Stale file handle
- Device or resource busy
- Input/output error from FUSE backend

Add best practices section for FUSE users.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add retry configuration parameters: max-attempts, initial-delay-ms, max-delay-ms
- Implement FuseErrorHandler with exponential backoff algorithm
- Add FuseAwareFileIO wrapper to delegate LocalFileIO with error handling
- Support retry for stale file handle and device busy errors
- Fail immediately for FUSE mount disconnection (no retry)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
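The retry policy in this commit message (retry stale file handle and device busy errors with exponential backoff, fail immediately on mount disconnection) can be sketched as below. This is not the `FuseErrorHandler` from the design document: the function name `with_fuse_retry`, the default values, and the mapping of the listed error messages to `errno.ESTALE`, `errno.EBUSY`, and `errno.ENOTCONN` are assumptions. The parameter names mirror the max-attempts, initial-delay-ms, and max-delay-ms options.

```python
import errno
import time

# Assumption: "Stale file handle" and "Device or resource busy" surface as
# these errno values; anything else (including ENOTCONN for a disconnected
# mount) is treated as non-retryable.
RETRYABLE_ERRNOS = {errno.ESTALE, errno.EBUSY}


def with_fuse_retry(op, max_attempts=3, initial_delay_ms=100,
                    max_delay_ms=2000, sleep=time.sleep):
    """Run op(), retrying retryable OSErrors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay_ms = initial_delay_ms
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OSError as e:
            # Re-raise non-retryable errors and the final failed attempt.
            if e.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            sleep(delay_ms / 1000.0)
            # Double the delay each time, capped at max_delay_ms.
            delay_ms = min(delay_ms * 2, max_delay_ms)
```

The `sleep` parameter is injectable only so that the backoff schedule can be inspected without actually waiting.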
Add simplified FUSE local path mapping to allow accessing remote data
through locally mounted FUSE filesystem. This enables direct local file
access when data is synchronized via FUSE mount.

Features:
- Three config options: enabled, root, validation-mode
- Path conversion: skip catalog/bucket level from remote path
- Validation via default database location check
- Three validation modes: strict (error), warn (fallback), none (skip)
- Tri-state variable to avoid repeated validation

Tests:
- 13 test cases covering path conversion and validation logic

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
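The three validation modes and the tri-state variable listed in this commit can be illustrated with the sketch below. The class `FuseMountCheck` is hypothetical, and the check itself (whether the expected directory exists locally) is an assumption about what the "default database location check" does. Only the mode names and their outcomes (error, fallback, skip) come from the commit message.

```python
import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)


class FuseMountCheck:
    """Hypothetical helper: validates a FUSE mount once and caches the result."""

    def __init__(self, mode: str, probe_path: str):
        if mode not in ("strict", "warn", "none"):
            raise ValueError(f"unknown validation mode: {mode!r}")
        self.mode = mode
        self.probe_path = probe_path
        # Tri-state: None = not checked yet, True = passed, False = failed.
        self._valid: Optional[bool] = None

    def should_use_local_path(self) -> bool:
        if self.mode == "none":
            return True  # validation is skipped entirely
        if self._valid is None:
            # Assumption: validation means the probe directory exists locally.
            self._valid = os.path.isdir(self.probe_path)
            if not self._valid and self.mode == "warn":
                logger.warning("FUSE validation failed for %s; falling back",
                               self.probe_path)
        if not self._valid and self.mode == "strict":
            raise RuntimeError(f"FUSE path not found: {self.probe_path}")
        # In warn mode a failed check returns False, i.e. use the default FileIO.
        return self._valid
```

Because the result is cached in `_valid`, the filesystem is probed at most once per instance, which is the point of keeping a third "not checked yet" state.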
Add user-facing documentation for the FUSE local path mapping
feature in PyPaimon REST Catalog. Covers configuration options,
validation modes, path conversion logic, and usage examples.

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>