This repository currently captures the product and technical direction for a focused Java performance profiling system on Kubernetes.
The target problem is narrower than a general observability platform:
- profile Java services running in Kubernetes
- use node-local collection
- control enablement through Kubernetes metadata
- store results in ClickHouse
- present a small, service-centric diagnosis UI for profiles, target status, and ingestion investigation
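Enablement through Kubernetes metadata can be sketched as a simple annotation/label gate. This is an illustrative sketch only: the annotation key `profiling.example.io/enabled` is a placeholder, not the key defined by the requirements document.

```go
package main

import "fmt"

// profilingAnnotation is a hypothetical opt-in key; the real key is
// defined in docs/brainstorms/java-profiler-requirements.md.
const profilingAnnotation = "profiling.example.io/enabled"

// profilingEnabled reports whether a pod's metadata opts it in to
// profiling, checking annotations first and falling back to labels.
func profilingEnabled(annotations, labels map[string]string) bool {
	if v, ok := annotations[profilingAnnotation]; ok {
		return v == "true"
	}
	return labels[profilingAnnotation] == "true"
}

func main() {
	ann := map[string]string{profilingAnnotation: "true"}
	fmt.Println(profilingEnabled(ann, nil)) // prints true
}
```

A node-local collector would apply a check like this to the pods scheduled on its own node, so profiling stays opt-in per workload.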
The repository is transitioning from documentation-only into implementation. The source of truth remains the documentation under docs/, and the first code scaffolding now lives under:
- cmd/backend
- cmd/collector
- backend/internal
- collector/internal
- contracts/profiling
- java-helper/thread-diagnostics
- examples/jdk17-http-demo
- web
- deploy
- docs/brainstorms/java-profiler-requirements.md - primary requirements draft: problem frame, actors, flows, acceptance examples, scope boundaries
- docs/architecture/java-profiler-architecture.md - software architecture: collector, backend, ClickHouse, query, and UI boundaries
- docs/architecture/performance-ingestion-architecture-review.md - performance architecture review covering OOM, batch upload, ingestion limits, and ClickHouse query pressure
- docs/research/coroot-node-agent-java-agent.md - research notes on Coroot's Java agent and async-profiler-related behavior
- docs/operations/java-profiling-runbook.md - install-time and incident-time operator workflow
- docs/operations/deployment-operations-admin-manual.md - deployment, operations, security, storage, upgrade, and platform troubleshooting manual
- docs/operations/performance-analysis-user-manual.md - Java service owner workflow for the service-diagnosis page, including CPU, memory allocation, lock, deadlock, target status, and ingestion analysis
- docs/operations/real-profiling-acceptance-standard.md - mandatory real-Kubernetes acceptance standard for collector, ingestion, profile storage, query API, and UI changes
The current design assumes:
- Kubernetes DaemonSet collection
- opt-in profiling through annotations or labels
- HotSpot-compatible JVMs in the first version
- async-profiler for CPU, allocation, and lock profiling
- bounded retention with no collected data older than 7 days
- ClickHouse as the primary query and storage layer
- metrics exposed through collector/backend exporters only, with Prometheus-series services owning metric storage and dashboards
- a lightweight, self-owned UI rather than a broad observability workspace
- collector and backend Go container images built from ghcr.io/koolay/library/golang:1.26.0
- Kubernetes deployment artifacts under deploy/helm
The first version does not include:
- Pyroscope, Parca, Grafana, or other incompatible profile backends
- non-Java profiling
- OpenJ9 support
- distributed ClickHouse
- heap dump analysis or retained-heap dominator analysis
- general-purpose tracing, log analysis, or service map features
- Prometheus metrics storage or dashboard replacement
cmd/
backend/
collector/
backend/
internal/
collector/
internal/
contracts/
profiling/
java-helper/
thread-diagnostics/
examples/
jdk17-http-demo/
web/
src/
deploy/
helm/
docs/
architecture/
brainstorms/
operations/
research/
plans/
go test ./...
javac --release 11 java-helper/thread-diagnostics/src/main/java/com/ebpfjava/threads/*.java
cd examples/jdk17-http-demo && mvn test
cd web && npm install && npm test && npm run build

Optional local ClickHouse-compatible smoke check using chDB:

scripts/verify-chdb-local.sh

The script skips cleanly when libchdb is not installed. Set CHDB_REQUIRED=1 to make a missing chDB installation fail automation.
Real Kubernetes acceptance, including screenshots/video and target restart-count evidence, is handled by:
scripts/real-acceptance.sh --help

For profiling or UI changes, passing real acceptance means proving non-empty CPU, allocation, and lock-delay profile data from the current Kubernetes run window, plus browser UI acceptance against that real backend data. See docs/operations/real-profiling-acceptance-standard.md.
When adding implementation or additional docs, keep them aligned with the requirements document. If a new assumption changes the product shape, update the docs first or in the same change.