24 changes: 22 additions & 2 deletions CLAUDE.md
@@ -91,8 +91,28 @@ The application runs as a multi-container setup:
1. Requests come through nginx proxy
2. Varnish provides caching layer
3. LinkedDataHub application handles business logic
-4. Data persisted to appropriate Fuseki triplestore
-5. XSLT transforms data for client presentation
+4. RDF data is read/written via the **Graph Store Protocol** — each document in the hierarchy corresponds to a named graph in the triplestore; the document URI is the graph name
+5. Data persisted to appropriate Fuseki triplestore
+6. XSLT transforms data for client presentation

+### Linked Data Proxy and Client-Side Rendering
+
+LDH includes a Linked Data proxy that dereferences external URIs on behalf of the browser. The original design rendered proxied resources identically to local ones — server-side RDF fetch + XSLT. This created a DDoS/resource-exhaustion vector: scraper bots routing arbitrary external URIs through the proxy would trigger a full server-side pipeline (HTTP fetch → XSLT rendering) per request, exhausting HTTP connection pools and CPU.
+
+The current design splits rendering by request origin:
+
+- **Browser requests** (`Accept: text/html`): `ProxyRequestFilter` bypasses the proxy entirely. The server returns the local application shell. Saxon-JS then issues a second, RDF-typed request (`Accept: application/rdf+xml`) from the browser.
+- **RDF requests** (API clients, Saxon-JS second pass): `ProxyRequestFilter` fetches the external RDF, parses it, and returns it to the caller. No XSLT happens server-side.
+- **Client-side rendering**: Saxon-JS receives the raw RDF and applies the same XSLT 3 templates used server-side (shared stylesheet), so proxied resources look almost identical to local ones.
+
+Key implementation files:
+- `ProxyRequestFilter.java` — intercepts `?uri=` and `lapp:Dataset` proxy requests; HTML bypass; forwards external `Link` headers
+- `ApplicationFilter.java` — registers external proxy target URI in request context (`AC.uri` property) as authoritative proxy marker
+- `ResponseHeadersFilter.java` — skips local-only hypermedia links (`sd:endpoint`, `ldt:ontology`, `ac:stylesheet`) for proxy requests; external ones are forwarded by `ProxyRequestFilter`
+- `client.xsl` (`ldh:rdf-document-response`) — receives the RDF proxy response client-side; extracts `sd:endpoint` from `Link` header; stores it in `LinkedDataHub.endpoint`
+- `functions.xsl` (`sd:endpoint()`) — returns `LinkedDataHub.endpoint` when set (external proxy), otherwise falls back to the local SPARQL endpoint
+
+The SPARQL endpoint forwarding chain ensures ContentMode blocks (charts, maps) query the **remote** app's SPARQL endpoint, not the local one. `LinkedDataHub.endpoint` is reset to the local endpoint by `ldh:HTMLDocumentLoaded` on every HTML page navigation, so there is no stale state when navigating back to local documents.

### Key Extension Points
- **Vocabulary definitions** in `com.atomgraph.linkeddatahub.vocabulary`
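
Reviewer note on the "splits rendering by request origin" paragraph in the CLAUDE.md change above: the decision it describes can be sketched in plain Java as below. This is only an illustration, not the code in `ProxyRequestFilter`. The class `ProxyRouting`, the enum `Outcome` and the method `route` are hypothetical names, quality values in `Accept` are ignored, and treating a missing `Accept` header as an RDF request is an assumption.

```java
import java.util.Arrays;

public class ProxyRouting
{
    /** Possible outcomes for a request that carries an external ?uri= target (hypothetical names). */
    public enum Outcome { SERVE_LOCAL_SHELL, PROXY_EXTERNAL_RDF }

    /**
     * Returns SERVE_LOCAL_SHELL when the Accept header lists text/html,
     * otherwise PROXY_EXTERNAL_RDF. Quality values (q=...) are ignored in this sketch.
     */
    public static Outcome route(String acceptHeader)
    {
        // assumption: no Accept header means a non-browser client that wants RDF
        if (acceptHeader == null || acceptHeader.isBlank()) return Outcome.PROXY_EXTERNAL_RDF;

        boolean wantsHtml = Arrays.stream(acceptHeader.split(",")).
            map(part -> part.split(";", 2)[0].trim().toLowerCase()). // drop media type parameters
            anyMatch(type -> type.equals("text/html"));

        return wantsHtml ? Outcome.SERVE_LOCAL_SHELL : Outcome.PROXY_EXTERNAL_RDF;
    }
}
```

Under these assumptions a browser navigation never triggers an outbound fetch; only the follow-up RDF request from Saxon-JS (or an API client) reaches the external server.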
2 changes: 2 additions & 0 deletions Dockerfile
@@ -109,6 +109,8 @@ ENV MAX_TOTAL_CONN=40

ENV MAX_REQUEST_RETRIES=3

+ENV CONNECTION_REQUEST_TIMEOUT=30000

ENV IMPORT_KEEPALIVE=

ENV MAX_IMPORT_THREADS=10
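
Reviewer note on `CONNECTION_REQUEST_TIMEOUT`: in Apache HttpClient 4.x the connection request timeout is the time, in milliseconds, a caller waits to lease a connection from the connection manager, so it only matters once the pool (here capped by `MAX_TOTAL_CONN`) is exhausted. The stdlib-only sketch below models that behaviour with a semaphore. `BoundedPool` and its methods are hypothetical names and this is not how HttpClient is implemented internally.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedPool
{
    private final Semaphore permits;        // one permit per pooled connection
    private final long leaseTimeoutMillis;  // plays the role of the connection request timeout

    public BoundedPool(int maxTotal, long leaseTimeoutMillis)
    {
        this.permits = new Semaphore(maxTotal, true); // fair: waiters are served in arrival order
        this.leaseTimeoutMillis = leaseTimeoutMillis;
    }

    /** Blocks until a connection slot is free, or fails once the lease timeout elapses. */
    public void lease() throws TimeoutException, InterruptedException
    {
        if (!permits.tryAcquire(leaseTimeoutMillis, TimeUnit.MILLISECONDS))
            throw new TimeoutException("Timed out waiting for a pooled connection");
    }

    /** Returns a previously leased slot to the pool. */
    public void release()
    {
        permits.release();
    }

    /** Number of slots that can currently be leased without waiting. */
    public int available()
    {
        return permits.availablePermits();
    }
}
```

Without a lease timeout, a thread that calls `lease()` on an exhausted pool can wait indefinitely; with one, it fails after the configured interval and the request thread is freed.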
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -65,6 +65,7 @@ services:
- SIGN_UP_CERT_VALIDITY=180
- MAX_CONTENT_LENGTH=${MAX_CONTENT_LENGTH:-2097152}
- ALLOW_INTERNAL_URLS=${ALLOW_INTERNAL_URLS:-}
+- CONNECTION_REQUEST_TIMEOUT=${CONNECTION_REQUEST_TIMEOUT:-}
- NOTIFICATION_ADDRESS=LinkedDataHub <notifications@localhost>
- MAIL_SMTP_HOST=email-server
- MAIL_SMTP_PORT=25
4 changes: 4 additions & 0 deletions platform/entrypoint.sh
@@ -1037,6 +1037,10 @@ if [ -n "$ALLOW_INTERNAL_URLS" ]; then
export CATALINA_OPTS="$CATALINA_OPTS -Dcom.atomgraph.linkeddatahub.allowInternalUrls=$ALLOW_INTERNAL_URLS"
fi

+if [ -n "$CONNECTION_REQUEST_TIMEOUT" ]; then
+export CATALINA_OPTS="$CATALINA_OPTS -Dcom.atomgraph.linkeddatahub.connectionRequestTimeout=$CONNECTION_REQUEST_TIMEOUT"
+fi

if [ -n "$MAX_CONTENT_LENGTH" ]; then
MAX_CONTENT_LENGTH_PARAM="--stringparam ldhc:maxContentLength '$MAX_CONTENT_LENGTH' "
fi
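
Reviewer note on the new `-Dcom.atomgraph.linkeddatahub.connectionRequestTimeout` option: on the Java side the PR reads the system property first and falls back to the servlet init parameter. The sketch below shows that precedence in isolation. `TimeoutSetting`, `resolve` and `firstNonBlank` are hypothetical names, and treating a blank value as unset is an assumption of this sketch; the PR code only checks for `null`.

```java
import java.util.function.Function;

public class TimeoutSetting
{
    public static final String PROPERTY = "com.atomgraph.linkeddatahub.connectionRequestTimeout";

    /**
     * Resolves the timeout in milliseconds: the system property wins, the init
     * parameter is the fallback, and null means "leave the client default alone".
     */
    public static Integer resolve(Function<String, String> systemProps, String initParamValue)
    {
        String raw = firstNonBlank(systemProps.apply(PROPERTY), initParamValue);
        return raw == null ? null : Integer.valueOf(raw.trim());
    }

    // returns the first argument that is neither null nor blank, or null if there is none
    private static String firstNonBlank(String... candidates)
    {
        for (String candidate : candidates)
            if (candidate != null && !candidate.isBlank()) return candidate;

        return null;
    }
}
```

In an application this would be called as `TimeoutSetting.resolve(System::getProperty, initParam)`; passing the lookup as a function keeps the sketch testable without touching JVM-wide state.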
31 changes: 21 additions & 10 deletions src/main/java/com/atomgraph/linkeddatahub/Application.java
@@ -214,6 +214,7 @@
import org.apache.http.HttpClientConnection;
import org.apache.http.HttpHost;
import org.apache.http.client.HttpRequestRetryHandler;
+import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
@@ -358,6 +359,8 @@ public Application(@Context ServletConfig servletConfig) throws URISyntaxExcepti
servletConfig.getServletContext().getInitParameter(LDHC.maxConnPerRoute.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxConnPerRoute.getURI())) : null,
servletConfig.getServletContext().getInitParameter(LDHC.maxTotalConn.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxTotalConn.getURI())) : null,
servletConfig.getServletContext().getInitParameter(LDHC.maxRequestRetries.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxRequestRetries.getURI())) : null,
+System.getProperty("com.atomgraph.linkeddatahub.connectionRequestTimeout") != null ? Integer.valueOf(System.getProperty("com.atomgraph.linkeddatahub.connectionRequestTimeout")) :
+servletConfig.getServletContext().getInitParameter(LDHC.connectionRequestTimeout.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.connectionRequestTimeout.getURI())) : null,
servletConfig.getServletContext().getInitParameter(LDHC.maxImportThreads.getURI()) != null ? Integer.valueOf(servletConfig.getServletContext().getInitParameter(LDHC.maxImportThreads.getURI())) : null,
servletConfig.getServletContext().getInitParameter(LDHC.notificationAddress.getURI()) != null ? servletConfig.getServletContext().getInitParameter(LDHC.notificationAddress.getURI()) : null,
servletConfig.getServletContext().getInitParameter(LDHC.supportedLanguages.getURI()) != null ? servletConfig.getServletContext().getInitParameter(LDHC.supportedLanguages.getURI()) : null,
@@ -445,7 +448,7 @@ public Application(final ServletConfig servletConfig, final MediaTypes mediaType
final String baseURIString, final String proxyScheme, final String proxyHostname, final Integer proxyPort,
final String uploadRootString, final boolean invalidateCache,
final Integer cookieMaxAge, final boolean enableLinkedDataProxy, final boolean allowInternalUrls, final Integer maxContentLength,
-final Integer maxConnPerRoute, final Integer maxTotalConn, final Integer maxRequestRetries, final Integer maxImportThreads,
+final Integer maxConnPerRoute, final Integer maxTotalConn, final Integer maxRequestRetries, final Integer connectionRequestTimeout, final Integer maxImportThreads,
final String notificationAddressString, final String supportedLanguageCodes, final boolean enableWebIDSignUp, final String oidcRefreshTokensPropertiesPath,
final String frontendProxyString, final String backendProxyAdminString, final String backendProxyEndUserString,
final String mailUser, final String mailPassword, final String smtpHost, final String smtpPort,
@@ -709,10 +712,10 @@ public Application(final ServletConfig servletConfig, final MediaTypes mediaType
trustStore.load(trustStoreInputStream, clientTrustStorePassword.toCharArray());
}

-client = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false);
-externalClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false);
-importClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, true);
-noCertClient = getNoCertClient(trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries);
+client = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false, connectionRequestTimeout);
+externalClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, null, false, connectionRequestTimeout);
+importClient = getClient(keyStore, clientKeyStorePassword, trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, true, connectionRequestTimeout);
+noCertClient = getNoCertClient(trustStore, maxConnPerRoute, maxTotalConn, maxRequestRetries, connectionRequestTimeout);

if (maxContentLength != null)
{
@@ -1527,7 +1530,7 @@ public void submitImport(RDFImport rdfImport, com.atomgraph.linkeddatahub.apps.m
* @throws UnrecoverableKeyException key loading error
* @throws KeyManagementException key loading error
*/
-public static Client getClient(KeyStore keyStore, String keyStorePassword, KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, boolean buffered) throws NoSuchAlgorithmException, KeyStoreException, UnrecoverableKeyException, KeyManagementException
+public static Client getClient(KeyStore keyStore, String keyStorePassword, KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, boolean buffered, Integer connectionRequestTimeout) throws NoSuchAlgorithmException, KeyStoreException, UnrecoverableKeyException, KeyManagementException
{
if (keyStore == null) throw new IllegalArgumentException("KeyStore cannot be null");
if (keyStorePassword == null) throw new IllegalArgumentException("KeyStore password string cannot be null");
@@ -1592,7 +1595,11 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
config.property(ClientProperties.FOLLOW_REDIRECTS, true);
config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, RequestEntityProcessing.BUFFERED); // https://stackoverflow.com/questions/42139436/jersey-client-throws-cannot-retry-request-with-a-non-repeatable-request-entity
config.property(ApacheClientProperties.CONNECTION_MANAGER, conman);

+if (connectionRequestTimeout != null)
+config.property(ApacheClientProperties.REQUEST_CONFIG, RequestConfig.custom().
+setConnectionRequestTimeout(connectionRequestTimeout).
+build());

if (maxRequestRetries != null)
config.property(ApacheClientProperties.RETRY_HANDLER, (HttpRequestRetryHandler) (IOException ex, int executionCount, HttpContext context) ->
{
@@ -1629,7 +1636,7 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
* @param maxRequestRetries maximum number of times that the HTTP client will retry a request
* @return client instance
*/
-public static Client getNoCertClient(KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries)
+public static Client getNoCertClient(KeyStore trustStore, Integer maxConnPerRoute, Integer maxTotalConn, Integer maxRequestRetries, Integer connectionRequestTimeout)
{
try
{
@@ -1688,7 +1695,11 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
config.property(ClientProperties.FOLLOW_REDIRECTS, true);
config.property(ClientProperties.REQUEST_ENTITY_PROCESSING, RequestEntityProcessing.BUFFERED); // https://stackoverflow.com/questions/42139436/jersey-client-throws-cannot-retry-request-with-a-non-repeatable-request-entity
config.property(ApacheClientProperties.CONNECTION_MANAGER, conman);

+if (connectionRequestTimeout != null)
+config.property(ApacheClientProperties.REQUEST_CONFIG, RequestConfig.custom().
+setConnectionRequestTimeout(connectionRequestTimeout).
+build());

if (maxRequestRetries != null)
config.property(ApacheClientProperties.RETRY_HANDLER, (HttpRequestRetryHandler) (IOException ex, int executionCount, HttpContext context) ->
{
@@ -1708,7 +1719,7 @@ public void releaseConnection(final HttpClientConnection managedConn, final Obje
}
return false;
});

return ClientBuilder.newBuilder().
withConfig(config).
sslContext(ctx).
@@ -107,7 +107,23 @@ public void filter(ContainerRequestContext request) throws IOException

requestURI = builder.build();
}
-else requestURI = request.getUriInfo().getRequestUri();
+else
+{
+request.setProperty(AC.uri.getURI(), graphURI); // authoritative external proxy marker
+
+// strip ?uri= from the effective request URI — server-side sees only the path;
+// the ContainerRequestContext property is the sole indicator of proxy mode
+MultivaluedMap<String, String> externalQueryParams = new MultivaluedHashMap();
+externalQueryParams.putAll(request.getUriInfo().getQueryParameters());
+externalQueryParams.remove(AC.uri.getLocalName());
+
+UriBuilder externalBuilder = UriBuilder.fromUri(request.getUriInfo().getAbsolutePath());
+for (Entry<String, List<String>> params : externalQueryParams.entrySet())
+for (String value : params.getValue())
+externalBuilder.queryParam(params.getKey(), value);
+
+requestURI = externalBuilder.build();
+}
}
catch (URISyntaxException ex)
{
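
Reviewer note on the `else` branch in the hunk above: it rebuilds the request URI without the `uri` query parameter while keeping every other parameter. The same idea can be sketched with only `java.net.URI`, as below. `QueryParams.without` is a hypothetical helper, it assumes an absolute hierarchical URI, compares raw (still percent-encoded) parameter names, and drops any fragment; the PR itself uses the JAX-RS `UriBuilder`.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.stream.Collectors;

public class QueryParams
{
    /**
     * Returns a copy of the URI with every query parameter named paramName removed.
     * All other parameters keep their order and their original encoding.
     */
    public static URI without(URI uri, String paramName)
    {
        String rawQuery = uri.getRawQuery();
        if (rawQuery == null) return uri; // nothing to strip

        String kept = Arrays.stream(rawQuery.split("&")).
            filter(pair -> !pair.isEmpty()).
            filter(pair -> !pair.split("=", 2)[0].equals(paramName)). // compare the name part only
            collect(Collectors.joining("&"));

        // assumption: absolute hierarchical URI, so scheme, authority and path are all present
        String base = uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
        return URI.create(kept.isEmpty() ? base : base + "?" + kept);
    }
}
```

For example, `QueryParams.without(URI.create("https://localhost:4443/docs/?uri=https%3A%2F%2Fexample.org%2Fa&mode=x"), "uri")` yields `https://localhost:4443/docs/?mode=x`, which mirrors the intent stated in the code comment: downstream code sees only the local path, and the request property carries the proxy target.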