Unhandled exceptions (zlib.error, EOFError, RuntimeError) during malicious/corrupt DOCX parsing


### Summary

When parsing a `.docx` file using `docx.Document(stream)`, the library relies heavily on Python's standard `zipfile` module to extract the internal XML files.

If the supplied file has a malformed, corrupt, or tampered ZIP structure, the underlying `zipfile` and `zlib` modules raise built-in Python exceptions (`zlib.error`, `EOFError`, and `RuntimeError`). Because `python-docx` does not catch these exceptions during `PackageReader.from_file()`, they propagate out of the library, and an application that only handles the documented exceptions crashes.

These issues were discovered via fuzzing.

---

### Details & Tracebacks

#### 1. `zlib.error` (Corrupted Compression Stream)

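If the compressed data of a ZIP member is modified, `zlib` can fail while inflating it. In this case the stream contains a back-reference whose distance points before the start of the output.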

```text
Traceback (most recent call last):
  File "poc.py", line 4, in <module>
    doc = docx.Document("crash_zlib.docx")
  ...
  File "/usr/lib/python3.11/zipfile.py", line 1027, in _read1
    data = self._decompressor.decompress(data, n)
zlib.error: Error -3 while decompressing data: invalid distance too far back
```
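
For readers without the fuzzer-generated file, here is a minimal standard-library sketch of the same failure class. It is an illustration, not the original `crash_zlib.docx`: the helper names are hypothetical, and overwriting the stream with `0xFF` bytes is expected to trigger a different `zlib` message than the "invalid distance too far back" shown above. What it has in common with the report is that `zlib.error` surfaces from `ZipFile.read()`, which is the call `python-docx` uses.

```python
import io
import struct
import zipfile
import zlib

MEMBER = "word/document.xml"


def build_corrupt_deflate_zip() -> bytes:
    """Return a ZIP whose only member carries an invalid DEFLATE stream."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MEMBER, "<w:document/>" * 50)
        info = zf.getinfo(MEMBER)
    data = bytearray(buf.getvalue())

    # Local file header: 30 fixed bytes, then the file name, then the extra
    # field. The two length fields sit at offsets 26 and 28 of the header.
    name_len, extra_len = struct.unpack_from("<HH", data, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len

    # A leading 0xFF byte selects the reserved DEFLATE block type (0b11),
    # which zlib rejects before any CRC check can run.
    data[start:start + info.compress_size] = b"\xff" * info.compress_size
    return bytes(data)


def read_member(data: bytes, name: str) -> bytes:
    """Read one member the same way python-docx does: ZipFile.read()."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)
```

Calling `read_member(build_corrupt_deflate_zip(), MEMBER)` should raise `zlib.error` rather than `zipfile.BadZipFile`, because the central directory and local header are still intact.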

---

#### 2. `EOFError` (Truncated ZIP Data)

If the ZIP headers declare a larger size than the actual file stream contains, `zipfile` hits EOF unexpectedly.

```text
Traceback (most recent call last):
  File "poc.py", line 4, in <module>
    doc = docx.Document("crash_eof.docx")
  ...
  File "/usr/lib/python3.11/zipfile.py", line 1054, in _read2
    raise EOFError
EOFError
```

---

#### 3. `RuntimeError` (Unexpected Encryption Flag)

If the general-purpose flag bits of an internal XML member (e.g. `word/_rels/document.xml.rels`) are modified to mark it as encrypted, `zipfile` raises a `RuntimeError` because no password was provided.

```text
Traceback (most recent call last):
  File "poc.py", in <module>
    doc = docx.Document("crash_encrypted.docx")
  File "/usr/lib/python3.11/zipfile.py", line 1598, in open
    raise RuntimeError("File 'word/_rels/document.xml.rels' is encrypted, password required for extraction")
RuntimeError: File 'word/_rels/document.xml.rels' is encrypted, password required for extraction
```
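
This case can be reproduced with the standard library alone. The sketch below is an illustration, not the original `crash_encrypted.docx`, and the helper names are hypothetical. It assumes that `zipfile` takes the encryption flag from the central directory entry, so setting bit 0 of the flag field there is enough.

```python
import io
import zipfile

MEMBER = "word/_rels/document.xml.rels"


def build_encrypted_flag_zip() -> bytes:
    """Return a ZIP whose only member is falsely marked as encrypted."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(MEMBER, "<Relationships/>")
    data = bytearray(buf.getvalue())

    # The central directory entry starts with the signature PK\x01\x02.
    # Its general-purpose flag field is the 2-byte value at offset 8.
    cd_offset = data.rfind(b"PK\x01\x02")
    data[cd_offset + 8] |= 0x01  # bit 0 means "member is encrypted"
    return bytes(data)


def read_member(data: bytes, name: str) -> bytes:
    """Read one member without a password, as python-docx does."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(name)
```

`read_member(build_encrypted_flag_zip(), MEMBER)` should raise `RuntimeError` even though the member data itself is untouched plain text.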

---

### Suggested Remediation

An application that parses user-uploaded `.docx` files should only need to catch a single documented exception, such as `PackageNotFoundError` or `BadZipFile`, when a document is corrupt. The application developer should not have to catch `zlib.error`, `EOFError`, or `RuntimeError` manually.

I suggest updating `docx.opc.pkgreader.PackageReader.from_file()` (or the underlying `phys_pkg.blob_for` method) to catch these native extraction exceptions and wrap them in a standard `python-docx` exception (such as `PackageNotFoundError` or a new `InvalidPackageError`).

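```python
import zipfile
import zlib

from docx.opc.exceptions import PackageNotFoundError

# Fragment: body of the blob_for() method.
# KeyError is not caught here, on the assumption that rels_xml_for()
# still relies on it to detect a missing .rels part.
try:
    return self._zipf.read(pack_uri.membername)
except (zipfile.BadZipFile, zlib.error, EOFError, RuntimeError) as exc:
    # Chain the original exception so the root cause stays in the traceback.
    raise PackageNotFoundError(
        "Package is corrupt, truncated, or encrypted."
    ) from exc
```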
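
Until something like this lands in the library, an application can approximate the same behaviour on its own side. The following is a minimal sketch under stated assumptions, not part of `python-docx`: `InvalidPackageError` and `check_zip_members` are hypothetical names, and the check reads every member once up front, which costs an extra decompression pass.

```python
import io
import zipfile
import zlib


class InvalidPackageError(Exception):
    """Hypothetical application-level error; not defined by python-docx."""


def check_zip_members(data: bytes) -> None:
    """Read every ZIP member once so extraction errors surface in one place."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                with zf.open(info) as member:
                    # Drain in chunks; the content itself is discarded.
                    while member.read(64 * 1024):
                        pass
    except (zipfile.BadZipFile, zlib.error, EOFError, RuntimeError) as exc:
        raise InvalidPackageError(f"unreadable package: {exc}") from exc
```

An upload handler could call `check_zip_members(data)` first and pass `io.BytesIO(data)` to `docx.Document()` only if no `InvalidPackageError` was raised. This only covers ZIP-level extraction failures; it does not validate the XML inside the package.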
