GH-47435: [Python][Parquet] Add direct key encryption/decryption API#49667
GH-47435: [Python][Parquet] Add direct key encryption/decryption API#49667smaheshwar-pltr wants to merge 6 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
cc @ggershinsky, this is a requirement for PyIceberg encryption support, see apache/iceberg-python#3221 |
|
sgtm, #47435 (comment) |
|
|
1 similar comment
|
|
| This bypasses the KMS-based :class:`CryptoFactory` API and directly | ||
| constructs decryption properties from a plaintext key. This is useful | ||
| when the caller manages key wrapping externally (e.g. via an | ||
| application-level envelope encryption scheme). | ||
|
|
||
| For most use cases, prefer the higher-level :class:`CryptoFactory` | ||
| with :class:`DecryptionConfiguration`, which handles envelope | ||
| encryption and key rotation automatically. |
There was a problem hiding this comment.
I don't think it's true that the CryptoFactory handles key rotation automatically, that part of the comment should probably be removed. It does allow key rotation if you use external key material though.
adamreeve
left a comment
There was a problem hiding this comment.
Thanks for implementing this @smaheshwar-pltr! I've left a few suggestions
| This bypasses the KMS-based :class:`CryptoFactory` API and directly | ||
| constructs decryption properties from a plaintext key. This is useful | ||
| when the caller manages key wrapping externally (e.g. via an | ||
| application-level envelope encryption scheme). | ||
|
|
||
| For most use cases, prefer the higher-level :class:`CryptoFactory` | ||
| with :class:`DecryptionConfiguration`, which handles envelope | ||
| encryption and key rotation automatically. |
There was a problem hiding this comment.
I don't think it's true that the CryptoFactory handles key rotation automatically, that part of the comment should probably be removed. It does allow key rotation if you use external key material though.
|
@adamreeve, thank you for the great review here 🙌 . I've reworked the docs now - they are consistent with my (inexperienced) understanding. Would appreciate your eyes on this again, please let me know what you think! |
|
@github-actions crossbow submit preview-docs |
|
Revision: 3e529d5 Submitted crossbow builds: ursacomputing/crossbow @ actions-4b28d0a71b
|
adamreeve
left a comment
There was a problem hiding this comment.
Thanks for the changes @smaheshwar-pltr, I just noticed one more minor issue that I missed first time around sorry.
|
@smaheshwar-pltr, can you please comment on #47435 so I can assign it to you? Eg. just a "take" comment. Then I'll merge this, the CI failures are unrelated and also failing on main. |
|
I've created #49821 as a follow-up issue to add support for per-column keys. |
|
|
||
| cdef cppclass CFileDecryptionPropertiesBuilder\ | ||
| " parquet::FileDecryptionProperties::Builder": | ||
| CFileDecryptionPropertiesBuilder() except + |
There was a problem hiding this comment.
If all these APIs can raise C++ exceptions, which kind of exceptions will be raised on the Python side?
| b"0123456789abcdef", | ||
| b"0123456789abcdef01234567", | ||
| b"0123456789abcdef0123456789abcdef", |
There was a problem hiding this comment.
Nit: reuse the KEY_ constants above?
| dec_props_no_aad = pe.create_decryption_properties( | ||
| footer_key=self.KEY_128, | ||
| ) | ||
| with pytest.raises(IOError): |
There was a problem hiding this comment.
Can we also test something about the exception message?
| with pytest.raises(IOError): | |
| with pytest.raises(IOError, match='XXX...'): |
| footer_key=self.KEY_128, | ||
| aad_prefix=b"wrong_prefix", | ||
| ) | ||
| with pytest.raises(IOError): |
| CFileDecryptionPropertiesBuilder* builder | ||
| shared_ptr[CFileDecryptionProperties] props | ||
|
|
||
| footer_key_bytes = tobytes(footer_key) |
There was a problem hiding this comment.
We shouldn't call tobytes as it will utf8-encode a str object.
| builder.footer_key(c_footer_key) | ||
|
|
||
| if aad_prefix is not None: | ||
| c_aad_prefix = tobytes(aad_prefix) |
| ) | ||
|
|
||
| c_footer_key = CSecureString(<c_string>footer_key_bytes) | ||
| builder = new CFileDecryptionPropertiesBuilder() |
There was a problem hiding this comment.
Why are we using new? We can create a plain value, I think.
| shared_ptr[CFileEncryptionProperties] props | ||
| ParquetCipher cipher | ||
|
|
||
| footer_key_bytes = tobytes(footer_key) |
There was a problem hiding this comment.
Same here and other instances below.
Rationale for this change
See #47435.
What changes are included in this PR?
Adds direct encryption / decryption Python API
Are these changes tested?
Yes, see PR.
Are there any user-facing changes?
Yes, new Python bindings.