[WIP][pypaimon] Add FUSE support for REST Catalog#7483
Open
shyjsarah wants to merge 23 commits into apache:master
Add design documents for supporting FUSE-mounted OSS paths in RESTCatalog. This allows users to access data through local file system paths without needing OSS tokens via the getTableToken API.
Configuration includes:
- fuse.local-path.enabled: enable/disable FUSE path mapping
- fuse.local-path.root: root local path for FUSE mount
- fuse.local-path.database: database-level path mapping
- fuse.local-path.table: table-level path mapping
Security validation:
- fuse.local-path.validation-mode: strict/warn/none
- Option 1: Java NIO FileStore API (cross-platform FUSE detection)
- Option 2: OSS data validation (recommended)
  - Uses existing FileIO (RESTTokenFileIO or ResolvingFileIO) to read OSS
  - Compares file size and content hash with local file
  - No REST API extension required
  - Graceful fallback to default FileIO on validation failure
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
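The "compares file size and content hash" step in this commit can be illustrated with a minimal Python sketch. It assumes both copies are readable through ordinary file paths, whereas the design reads the remote side through a FileIO. The helper names and the choice of SHA-256 are assumptions for illustration, not taken from the PR.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(remote_copy, local_copy):
    """Cheap size check first, then compare content hashes."""
    if os.path.getsize(remote_copy) != os.path.getsize(local_copy):
        return False
    return file_digest(remote_copy) == file_digest(local_copy)
```

Checking the size first avoids hashing either file when the two copies obviously differ.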
…napshotManager/SchemaManager
- Remove Java NIO FileStore API option (Option 1)
- Use SnapshotManager.latestSnapshot() to get latest snapshot directly
- Use SchemaManager.latest() as fallback for new tables
- Remove custom file traversal logic, use existing Paimon APIs
- Simplify validation code and improve maintainability
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ation
- Replace java.nio.file.Paths.get() and Files.* with LocalFileIO
- Use unified computeFileHash(FileIO, Path) method for both OSS and local files
- More consistent with Paimon coding style
- Removes dependency on java.nio.file package
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Use storage-agnostic terminology throughout the design documents:
- Replace 'OSS' with 'remote storage' or 'remote' in variable names and comments
- Use 'remoteFileIO', 'remotePath', 'remoteHash' instead of 'ossFileIO', 'ossPath', 'ossHash'
- Update method names: validateByOSSData -> validateByRemoteData
- Update flowchart labels: 'OSS Data Validation' -> 'Remote Data Validation'
- Keep OSS/S3/HDFS as examples of remote storage types
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Update background: FUSE allows bypassing remote storage SDKs, not authentication
- Update goal #3: Use local FileIO for data read/write, but still need getTableToken for validation
- Update behavior matrix: clarify getTableToken is still used for validation
- Remove 'Cost Reduction' from benefits since getTableToken is still called
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…nces
- Background: focus on SDK vs local filesystem access, not authentication
- Goals: remove getTableToken validation mention
- Behavior matrix: simplify description
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove 'Path Consistency Validation' (isFUSEMountPoint, etc.)
- Remove 'Table Identifier File Mechanism'
- Update validation flow diagrams to reflect remote data validation
- Keep only the remote data validation approach
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Format tables and object tables have no schema, this is expected behavior Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Format tables and object tables have no schema, this is expected behavior Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add validateByIdentifierFile() method for UUID comparison
- Add readIdentifierFile() helper for parsing identifier file
- Update validation flow diagrams with two-step validation
- Add .paimon-identifier file format documentation
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove database/table fields (they can change via rename)
- Only keep table-uuid for identification
- Update readIdentifierFile() to return UUID only
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…sing Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…th Paimon internal naming Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add comprehensive error handling section for FUSE local path operations:
- Error categories: permission/auth, network, service, FUSE-specific
- Read/write operation error handling flows with retry logic
- New retry configuration options with exponential backoff
- Implementation example with isRetryableError classification
- Logging guidelines and optional metrics
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Remove network, REST API, and permission error handling since they are already handled by existing Paimon mechanisms. Focus only on:
- Transport endpoint is not connected (mount disconnection)
- Stale file handle
- Device or resource busy
- Input/output error from FUSE backend
Add best practices section for FUSE users.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Add retry configuration parameters: max-attempts, initial-delay-ms, max-delay-ms
- Implement FuseErrorHandler with exponential backoff algorithm
- Add FuseAwareFileIO wrapper to delegate LocalFileIO with error handling
- Support retry for stale file handle and device busy errors
- Fail immediately for FUSE mount disconnection (no retry)
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
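The retry policy described in this commit (exponential backoff for stale file handle and device busy errors, immediate failure when the mount is disconnected) might look roughly like the sketch below. The function name, default values, and the mapping of each error to an `errno` constant are assumptions for illustration; this is not the PR's FuseErrorHandler.

```python
import errno
import time

# Assumed mapping of the errors named in the commit to errno values.
RETRYABLE_ERRNOS = {errno.ESTALE, errno.EBUSY}  # stale file handle, device busy


def with_fuse_retry(op, max_attempts=3, initial_delay_ms=100,
                    max_delay_ms=1000, sleep=time.sleep):
    """Run op(), retrying retryable OSErrors with exponential backoff."""
    delay_ms = initial_delay_ms
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except OSError as e:
            # Anything not listed as retryable (for example ENOTCONN,
            # "Transport endpoint is not connected") is re-raised at once,
            # as is the last failed attempt.
            if e.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            sleep(delay_ms / 1000.0)
            delay_ms = min(delay_ms * 2, max_delay_ms)
```

The `sleep` parameter exists only so the backoff schedule can be observed in tests without waiting.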
Add simplified FUSE local path mapping to allow accessing remote data through locally mounted FUSE filesystem. This enables direct local file access when data is synchronized via FUSE mount.
Features:
- Three config options: enabled, root, validation-mode
- Path conversion: skip catalog/bucket level from remote path
- Validation via default database location check
- Three validation modes: strict (error), warn (fallback), none (skip)
- Tri-state variable to avoid repeated validation
Tests:
- 13 test cases covering path conversion and validation logic
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
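One plausible reading of "skip catalog/bucket level from remote path" is sketched below: the URI authority (the bucket) is dropped and the remaining path is appended to the FUSE root. The function name, the example URI, and the mount point are hypothetical, and the PR's own `_resolve_fuse_local_path` may handle more cases.

```python
import posixpath
from urllib.parse import urlparse


def sketch_resolve_local_path(remote_path, fuse_root):
    """Map a remote URI onto a local path under the FUSE mount root."""
    parsed = urlparse(remote_path)
    # parsed.netloc holds the bucket (or catalog) level, which is skipped;
    # only the path below it is kept.
    relative = parsed.path.lstrip("/")
    root = fuse_root.rstrip("/") or "/"
    return posixpath.join(root, relative)


local = sketch_resolve_local_path(
    "oss://my-bucket/warehouse/db.db/tbl", "/mnt/fuse")
```

Under these assumptions, `local` is `/mnt/fuse/warehouse/db.db/tbl`.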
Add user-facing documentation for the FUSE local path mapping feature in PyPaimon REST Catalog. Covers configuration options, validation modes, path conversion logic, and usage examples. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Purpose
Linked issue: close #xxx
Add FUSE (Filesystem in Userspace) support for PyPaimon REST Catalog. When remote object storage paths (OSS, S3, HDFS, etc.) are mounted locally via FUSE, users can now access data directly through local filesystem paths for better performance, bypassing remote storage SDKs.
Key features:
Tests
- Path conversion logic (_resolve_fuse_local_path)
- Validation modes (strict/warn/none)
- file_io_for_data behavior with FUSE enabled/disabled
- Edge cases (missing root config, no location, etc.)
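The three validation modes and the cached tri-state result exercised by these tests can be pictured with the following sketch. The class name, the `check` callable, and the exception type are assumptions; the sketch only encodes the behavior stated in the commit messages: strict raises an error, warn falls back, none skips validation, and the check runs at most once.

```python
import logging

logger = logging.getLogger(__name__)


class FuseValidationSketch:
    """Decide whether the FUSE local path may be used."""

    def __init__(self, mode, check):
        self.mode = mode      # "strict", "warn", or "none"
        self.check = check    # callable returning True if the mount looks valid
        self._state = None    # tri-state: None = not validated, True, False

    def use_local_path(self):
        if self.mode == "none":
            return True       # validation skipped entirely
        if self._state is None:
            self._state = bool(self.check())  # runs once, result is cached
        if self._state:
            return True
        if self.mode == "strict":
            raise ValueError("FUSE local path validation failed")
        logger.warning("FUSE validation failed, falling back to default FileIO")
        return False
```

A caller would use the local FileIO when `use_local_path()` returns `True` and keep the default FileIO otherwise.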
API and Format
No API or storage format changes. This is a new feature that adds optional configuration.
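As a rough illustration of the optional configuration, the options might be passed as catalog options like this. The key names follow the `fuse.local-path.*` options listed in the first design commit and may differ in the final implementation; the mount point is a placeholder.

```python
# Hypothetical option values; only the key names come from the commit messages.
fuse_options = {
    "fuse.local-path.enabled": "true",
    "fuse.local-path.root": "/mnt/fuse",            # placeholder mount point
    "fuse.local-path.validation-mode": "strict",    # strict / warn / none
}
```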
Documentation
Generative AI tooling
Generated-by: Qwen Code