Skip to content

Harden XML configuration parsing with commons-xml#4162

Draft
ppkarwasz wants to merge 5 commits into
feat/2.x/xinclude-resolverfrom
feat/2.x/xml-factory-hardening
Draft

Harden XML configuration parsing with commons-xml#4162
ppkarwasz wants to merge 5 commits into
feat/2.x/xinclude-resolverfrom
feat/2.x/xml-factory-hardening

Conversation

@ppkarwasz

Copy link
Copy Markdown
Member

Note

Stacked on #4161.

Since Commons XML has not yet been released, this PR is based on copernik-xml-factory. The target dependency will be swapped after the first release.

Replaces Log4j's hand-rolled DocumentBuilderFactory hardening with eu.copernik:copernik-xml-factory, and hardens the schema path, which until #4161 used a default SchemaFactory.

The change to XmlConfiguration is two method calls (XmlFactories.newDocumentBuilderFactory() and XmlFactories.newSchemaFactory()), plus the build, OSGi and JPMS plumbing.

The point of this PR is not the lines it removes; it is the hardening Log4j is missing today, and the cost of writing and owning that hardening ourselves.

This is a probe PR

This PR exists to evaluate the library in a real consumer. It should not be merged before the first release of Apache Commons XML, into which the library's code has migrated and is being reviewed Apache-style. At that point the dependency is no longer "third party": it is an ASF-governed artifact with the same provenance as the rest of the Log4j supply chain. Until then, treat this as a spike that measures whether the integration is clean, and whether the hardening is worth a managed dependency.

What Log4j fails to harden today

The current XmlConfiguration hardening is disableDtdProcessing() + setFeature(), 19 lines, and it is incomplete:

  1. Document parsing has no entity-expansion protection.
    FEATURE_SECURE_PROCESSING is never set, so on an external Xerces, Billion-Laughs-class entity expansion is not bounded. External entity resolution is covered decently (external general and parameter entities and external DTD loading are disabled); entity expansion is not.
  2. XInclude protection is inconsistent.
    In 2.26.0, XInclude is on by design, yet ALLOWED_PROTOCOLS restricts only the main configuration file and not the files it includes. The main document is policed; its includes are not.
  3. SchemaFactory is not hardened at all,
    neither against entity expansion nor against external resource access (external DTD/schema). A strict configuration validating against a schema parses that schema with stock, unhardened JAXP.

The honest line count

Before counting what the library removes, count what it takes to fill these gaps.

  • Closing gap (2), and the resolver part of (3), is exactly the code in Resolve XML configuration resources through ConfigurationSource #4161:
    the ConfigurationSourceResolver and the schema-validation routing, ~190 lines in XmlConfiguration alone.
    The library does not remove this and does not claim to: per its current threat model, installing a resolver makes resolution the caller's responsibility, so this stays Log4j's regardless (the next revision of that model is expected to lift the responsibility).
  • Closing the factory parts of (1) and (3) by hand starts with FEATURE_SECURE_PROCESSING on both factories.
    That is the one feature the JAXP specification requires every implementation to support, and yet not every implementation honors it (Android, for one), so even this single line is not portable on its own, before one even reaches the per-provider entity, DTD and external-access settings.

So the real comparison is not "library vs. 10 lines". It is two hardened-factory calls against a hand-rolled, single-provider, partial hardening that Log4j would have to complete, test, and keep current.

The tests come with it

The hardening Log4j would otherwise write and maintain already exists in the library, tested. Restricting the library's suite to the two JAXP factories Log4j uses:

mvn test -Dgroups="dom,schema"   ->   Tests run: 90, Failures: 0

Those 90 parameterized security cases exercise DocumentBuilderFactory and SchemaFactory directly:Billion Laughs, external general and parameter entities, external DTD, doctype handling, the entity-resolver floor, XInclude, and schema import / include / redefine / schemaLocation, across providers. They are backed by ~2,800 lines of test code and their leaked-resource fixtures, against a library of ~3,000 lines of hardening logic.

Put next to the handful of lines of hardening Log4j would otherwise inline, that is two to three orders of magnitude of security code and tests that adopting the library imports, and that reimplementing the hardening would mean writing and owning. Reimplementing the hardening means reimplementing the tests too.

Lightning rod

GHSA-xm28-xvqc-gxxg
(High, CVSS 8.2, fixed in 0.1.2) is precisely Log4j's path:
newDocumentBuilderFactory() + setXIncludeAware(true) on the stock JDK provider.
It shows that even someone well versed in JAXP can get a detail slightly wrong, here believing that XInclude fallback resolution is governed by ACCESS_EXTERNAL_DTD or ACCESS_EXTERNAL_SCHEMA.

The relevant point for Log4j is not the bug; it is where the report landed. A dedicated, threat-modeled library acts as a lightning rod: it attracts the scanners and the researchers and triages them against a published model. More often than not it can reject a report on every consumer's behalf; when a report is valid, it ships one fix for everybody.

A consuming project can then meet the steady stream of XXE scanner reports with a single response that points at the library's threat model, instead of re-triaging each one in its own parser code.

Scope of this PR

  • XmlFactories.newDocumentBuilderFactory() in newDocumentBuilder (removes disableDtdProcessing / setFeature).
  • XmlFactories.newSchemaFactory() in validateDocument.
  • OSGi (log4j-osgi-test) wiring.
  • No behavior change: a forbidden external fetch fails parsing, which yields the default configuration, as before.
  • Our part of the contract (ALLOWED_PROTOCOLS on includes and schema imports) is already covered by the tests added in Resolve XML configuration resources through ConfigurationSource #4161.

The changelog <issue> will be set to this PR number once assigned.

Adoption modalities

Commons XML does not have to be an external dependency. In apache/commons-xml#5 the DocumentBuilderFactory-relevant code was moved into a single class, so Log4j can shade and relocate just that part. An early attempt (with minimizeJar enabled) pulled about 30% of the library into Log4j, a figure that can certainly be improved.

Shading and relocating part of a library, rather than all of it, is the intended use of the Maven Shade Plugin, one that even I, a fervent opponent of shading, approve of.

Replace the hand-rolled `DocumentBuilderFactory` hardening in `XmlConfiguration`
with the hardened JAXP factories from `eu.copernik:copernik-xml-factory`, and
harden the schema-validation path the same way:

* `newDocumentBuilder` uses `XmlFactories.newDocumentBuilderFactory()`,
* `validateDocument` uses `XmlFactories.newSchemaFactory()`.

By the library's contract these factories, and everything they produce, never
fetch external entities, DTDs or schemas, across the JAXP implementations the
library recognizes. The OSGi integration tests link the `eu.copernik.xml.factory`
bundle.

Assisted-By: Claude Opus 4.8 <noreply@anthropic.com>
Assisted-By: Claude Opus 4.8 <noreply@anthropic.com>
`XmlConfigurationSchemaTest` exercises schema validation whose
`xsd:include`/`import` resources are resolved through an
`LSResourceResolver`. On JDK 8 Xerces enforces the `accessExternalSchema`
restriction on the resources the resolver returns; the
`isCreatedByResolver` exemption that lets resolver-supplied resources
bypass that check was only added in JDK 9. copernik-xml-factory sets
`accessExternalSchema` to "" at API precedence (deliberately not
overridable by a system property), so this resolution path cannot run on
JDK 8.

Skip the whole class on JDK 8 via a `@BeforeEach` assumption. The
previous guard only covered `schemaIncludeRespectsAllowedProtocols`
(which already passed on JDK 8), leaving the actually-failing
`validConfigurationValidates` unguarded.

Assisted-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant