A PostgreSQL query performance analysis tool that detects slow queries, collects execution plans, identifies common bottlenecks, and generates optimization reports.
This project is designed to demonstrate practical backend engineering, SQL performance tuning, PostgreSQL internals, query plan analysis, and database observability — implemented as a Java/Spring Boot CLI application.
sql-query-performance-analyzer connects to a PostgreSQL database, collects query statistics, runs query plan analysis, and produces actionable performance reports.
The tool can help identify:
- slow queries
- expensive sequential scans
- missing or unused indexes
- inefficient joins
- costly sorts
- high planning or execution time
- large row-estimation errors
- queries with high total database impact
- candidate indexes for optimization
The goal is not to replace professional database monitoring platforms. Instead, this project implements the core ideas behind SQL query observability and performance analysis in a compact, readable backend tool.
SQL performance is a common bottleneck in backend systems. A single inefficient query can increase latency, consume CPU and memory, lock database resources, and degrade the entire service.
This project explores important database and backend concepts:
- SQL query execution plans
- PostgreSQL
EXPLAINandEXPLAIN ANALYZE - index selection
- table statistics
- query latency analysis
- row-estimation accuracy
- backend observability
- report generation
- safe database inspection
- Connect to PostgreSQL
- Collect slow queries from
pg_stat_statements - Sort queries by total execution time
- Filter queries by minimum mean execution time and call count
- Export collected query statistics as JSON
- Run
EXPLAIN - Run
EXPLAIN ANALYZE, optional and configurable - Parse JSON execution plans
- Detect sequential scans
- Detect nested loop joins
- Detect expensive sorts
- Detect hash joins and merge joins
- Detect row-estimation mismatch
- Detect high-cost plan nodes
- Suggest candidate indexes for large sequential scans
- Detect filters that may benefit from indexing
- Detect join columns that may benefit from indexing
- Detect order-by columns that may benefit from indexing
- Avoid duplicate recommendations when an equivalent index already exists
- Generate Markdown reports
- Generate JSON reports
- Include query text, statistics, plan summary, and recommendations
- Include before/after plan comparison (
comparecommand) - Include risk notes for each recommendation
- CLI interface (Spring Boot,
analyze/explain/compare) - Docker Compose demo environment
- Sample PostgreSQL database
- Seed data and intentionally slow queries
- Unit tests (JUnit 5)
- Integration tests (Testcontainers)
This tool can answer questions such as:
- Which queries consume the most total database time?
- Which queries perform sequential scans on large tables?
- Which indexes may improve query performance?
- Which execution plans have poor row-estimation accuracy?
- Did an index improve the query plan after being added?
- Are slow queries caused by scans, joins, sorts, or missing filters?
The codebase is organized as a small pipeline of single-responsibility layers, each its own Java package, wired together with Spring's dependency injection rather than manual new-ing or a service locator:
+----------------------+
| CLI (Spring Boot) |
|----------------------|
| analyze |
| explain |
| compare |
+----------+-----------+
|
v
+----------------------+
| Query Collector |
|----------------------|
| pg_stat_statements |
| query filters |
+----------+-----------+
|
v
+----------------------+
| Plan Analyzer |
|----------------------|
| EXPLAIN JSON parser |
| bottleneck detectors |
| row estimation |
+----------+-----------+
|
v
+----------------------+
| Recommendation Engine|
|----------------------|
| index candidates |
| existing-index check |
| risk notes |
+----------+-----------+
|
v
+----------------------+
| Report Generator |
|----------------------|
| Markdown |
| JSON |
| before/after compare |
+----------------------+
- Pluggable detectors (strategy pattern). Each bottleneck check (
SeqScanDetector,JoinDetector,SortDetector,RowEstimationDetector,HighCostNodeDetector) implements a singlePlanDetectorinterface and is a Spring@Component.DetectorChaincollects every implementation through a constructor-injectedList<PlanDetector>— adding a new detector is a new class with no registry or switch statement to edit. - Immutable domain model.
PlanNode,Finding,IndexRecommendation,QueryStat, andAnalysisResultare all Javarecords. The plan tree (PlanNode) is a recursive, structurally-shared, side-effect-free value object, which makes the detectors and recommender pure functions over data — easy to unit test with hand-built trees and no mocking. - Heuristic recommender, not a SQL parser.
IndexCandidateExtractordeliberately uses targeted regular expressions overEXPLAINfilter/join/sort text instead of building a SQL AST. This keeps the recommendation logic small and inspectable, at the explicit cost of missing multi-predicate composite indexes — a deliberate tradeoff kept explicit rather than hidden. - Config-as-CLI-arguments.
AnalyzerPropertiesis the single source of truth for every tunable (thresholds, output format, database URL), bound fromapplication.yml. Spring adds command-line arguments to theEnvironmentbefore beans are constructed, so--analysis.limit=50overrides the YAML default with zero custom argument-parsing code. - Fail-soft per query, not per run.
pg_stat_statementsnormalizes literals to$1,$2, ... placeholders, whichEXPLAINcannot bind.AnalyzeCommandcatches that specific failure per query and records it as aFindinginstead of aborting the whole batch — one unexplainable query doesn't take down a 50-query report. - Translation boundary for the legacy connection string.
DatabaseUrlParseris the one place that understands thepostgres://user:pass@host:port/dbURL shape, isolating that parsing from both the JDBC/Hikari layer and the rest of the app.
1. Connect to PostgreSQL
2. Read query statistics from pg_stat_statements
3. Select candidate queries based on mean/total execution time and call count
4. Run EXPLAIN or EXPLAIN ANALYZE
5. Parse the execution plan
6. Detect bottlenecks
7. Generate index recommendations
8. Write a report
See examples/sample_report.md for a real report generated by this tool against the demo dataset. Excerpt:
## Query ID: 5129619640771364291
Mean Time: 18.4 ms
Calls: 10
Total Time: 184.0 ms
Query:
SELECT * FROM orders WHERE customer_id = ? ORDER BY created_at DESC
Plan Summary:
- Sort (estimated 125 rows, actual 25130 rows)
- Seq Scan on orders (estimated 125 rows, actual 25130 rows)
Findings:
- Large sequential scan detected on orders (25130 rows)
- Expensive sort detected on created_at DESC (25130 rows)
- Row estimation mismatch on Seq Scan: estimated 125 rows, actual 25130 rows (201.0x)
Recommendations:
1. Filter column "customer_id" used in a sequential scan on orders:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
2. ORDER BY columns [created_at] caused an expensive sort:
CREATE INDEX idx_orders_created_at ON orders (created_at);
Risk Notes:
- Indexes improve reads but add write overhead.
- Validate with EXPLAIN ANALYZE before applying in production.
This is a Spring Boot CLI application, so options use --key=value syntax (not space-separated, unlike a typical Go flag-style CLI). Config-mirroring options map directly onto application.yml's structure (e.g. --database.url=..., --report.output=...); command-specific options (--query=, --before=, --after=) are parsed per-command. See docs/CLI_USAGE.md for the full reference.
java -jar target/sql-analyzer.jar analyze \
--database.url=postgres://user:password@localhost:5432/appdb?sslmode=disable \
--analysis.min-mean-time=50ms \
--analysis.limit=20 \
--report.output=report.mdjava -jar target/sql-analyzer.jar analyze \
--database.url=postgres://user:password@localhost:5432/appdb?sslmode=disable \
--report.format=json \
--report.output=report.jsonjava -jar target/sql-analyzer.jar explain \
--database.url=postgres://user:password@localhost:5432/appdb?sslmode=disable \
--query="SELECT * FROM orders WHERE customer_id = 42"java -jar target/sql-analyzer.jar compare \
--before=examples/sample_plan.json \
--after=examples/sample_plan_after.json \
--report.output=comparison.mdStart the demo Postgres instance:
make docker-upRun migrations, seed data, and the intentionally slow sample queries:
make migrate
make seed
make run-slow-queriesAnalyze the demo database:
make analyze-demoRun unit tests:
mvn testRun unit + Testcontainers integration tests:
mvn verifyapplication.yml mirrors this shape (any field is overridable on the command line, e.g. --analysis.limit=20):
database:
url: "postgres://postgres:postgres@localhost:5432/demo?sslmode=disable"
statement-timeout: "5s"
analysis:
query-source: "pg_stat_statements"
min-mean-time: "50ms"
min-calls: 5
limit: 20
run-explain-analyze: false
detect-row-estimation-error: true
row-estimation-error-threshold: 10.0
recommendations:
suggest-indexes: true
check-existing-indexes: true
include-risk-notes: true
report:
format: "markdown"
output: "report.md"sql-query-performance-analyzer/
├── README.md
├── pom.xml
├── Dockerfile
├── docker-compose.yml
├── Makefile
├── src/
│ ├── main/
│ │ ├── java/com/sqlanalyzer/
│ │ │ ├── SqlAnalyzerApplication.java
│ │ │ ├── cli/ # analyze / explain / compare subcommands
│ │ │ ├── config/ # AnalyzerProperties, DataSourceConfig, JsonConfig
│ │ │ ├── db/ # DatabaseUrlParser (postgres:// -> jdbc:postgresql://)
│ │ │ ├── collector/ # pg_stat_statements collection
│ │ │ ├── explain/ # EXPLAIN runner + JSON plan parser
│ │ │ ├── detector/ # pluggable PlanDetector implementations
│ │ │ ├── index/ # index candidate extraction + recommendation
│ │ │ ├── report/ # Markdown/JSON report generation
│ │ │ └── common/ # SqlRedactor
│ │ └── resources/
│ │ ├── application.yml
│ │ └── db/migration/ # Flyway: demo schema + pg_stat_statements
│ └── test/
│ ├── java/com/sqlanalyzer/ # unit tests + *IT.java integration tests
│ └── resources/
│ ├── fixtures/ # captured EXPLAIN JSON samples
│ └── examples/
├── seed/
│ └── seed.sql
├── examples/
│ ├── slow_queries.sql
│ ├── sample_report.md
│ ├── sample_report.json
│ ├── sample_plan.json
│ └── sample_plan_after.json
└── docs/
└── CLI_USAGE.md
This project uses PostgreSQL with pg_stat_statements. The extension must be loaded at server start (not just CREATE EXTENSION'd), since it requires shared_preload_libraries:
# postgresql.conf, or as a command-line flag:
postgres -c shared_preload_libraries=pg_stat_statementsCREATE EXTENSION IF NOT EXISTS pg_stat_statements;The Docker Compose demo environment and the Testcontainers-based integration tests both configure this automatically.
The analyzer parses PostgreSQL JSON plans (com.sqlanalyzer.explain.PlanJsonParser) and inspects plan nodes via pluggable detectors (com.sqlanalyzer.detector).
Plan node types detected:
Seq Scan
Index Scan
Index Only Scan
Bitmap Heap Scan
Nested Loop
Hash Join
Merge Join
Sort
Aggregate
Limit
Gather
| Finding | Meaning |
|---|---|
| Large sequential scan | Query scans a large table without using an index |
| Expensive sort | Query sorts many rows, possibly needing an index |
| Nested loop over large input | Join may be inefficient for large tables |
| Row-estimation mismatch | Planner estimated far fewer or more rows than actual |
| High-cost plan node | Any plan node whose total cost exceeds a threshold |
The index recommender (com.sqlanalyzer.index.IndexRecommender) uses simple text heuristics over plan filter/join/sort expressions — not a full SQL parser. It may recommend indexes when:
- a large table uses a sequential scan with a filter
- a join column (from a hash/merge join condition) lacks an index
- an
ORDER BYcolumn causes an expensive sort
Recommendations are checked against pg_indexes (com.sqlanalyzer.index.ExistingIndexInspector) and skipped if an equivalent index already exists.
Important: recommendations are candidates, not automatic changes. Indexes should be reviewed and tested before being applied.
- Uses
EXPLAINby default;EXPLAIN ANALYZErequires--analysis.run-explain-analyze=true(or per-command--analyze=true). - Applies a configurable
statement_timeouton every connection. - Does not run destructive SQL or automatically create indexes.
- Best-effort redaction of literal values in reports (
com.sqlanalyzer.common.SqlRedactor). - Queries collected from
pg_stat_statementsthat contain unresolved$Nplaceholders (most do, since Postgres normalizes literals) cannot be re-run throughEXPLAINdirectly — theanalyzecommand reports this per-query as a finding rather than failing the whole run. Useexplain --query="..."with a concrete literal query for full plan analysis of a specific case.
Unit tests (no database, fast): JSON plan parsing, each PlanDetector, index candidate extraction and recommendation deduplication, Markdown/JSON report formatting, SQL redaction, config binding.
Integration tests (Testcontainers, real PostgreSQL, run via mvn verify): query collection, EXPLAIN/EXPLAIN ANALYZE against real plans, existing-index inspection, and the full analyze pipeline end-to-end. All integration test classes share a single Testcontainers Postgres instance (com.sqlanalyzer.testsupport.AbstractIntegrationTest) configured with shared_preload_libraries=pg_stat_statements.
$ mvn verify
Unit tests (mvn test)
Tests run: 28, Failures: 0, Errors: 0, Skipped: 0
Integration tests (Testcontainers PostgreSQL, mvn verify)
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0
Total: 35 tests, 100% passing
BUILD SUCCESS
| Layer | Coverage |
|---|---|
explain |
JSON plan parsing — flat nodes, nested children, sort keys, real EXPLAIN ANALYZE output |
detector |
One test class per detector: seq scan, joins, sort, row-estimation mismatch, high-cost nodes |
index |
Candidate extraction (filter/join/order-by columns), recommendation dedup against existing indexes |
report |
Markdown/JSON rendering, before/after plan comparison deltas |
collector / cli |
Real pg_stat_statements collection and the full analyze pipeline against a live, Testcontainers-managed PostgreSQL 16 instance |
The demo schema (seed/seed.sql, examples/slow_queries.sql) includes intentionally slow query patterns:
- missing index on
orders.customer_idandevents.customer_id - sequential scan on a 100k-row table
- expensive
ORDER BYwithout a supporting index - inefficient hash join across two unindexed foreign keys
- a high-frequency, low-latency query (demonstrates total-time vs mean-time ranking)
- SQL performance tuning
- PostgreSQL query plan analysis
- backend observability
- database metadata inspection
- index design tradeoffs
- Java/Spring Boot CLI development
- structured report generation
- integration testing with Testcontainers and Docker
- performance-focused engineering
- PostgreSQL EXPLAIN documentation
- PostgreSQL
pg_stat_statementsmodule - PostgreSQL query planning overview
- PostgreSQL
pg_indexessystem catalog - Using EXPLAIN — PostgreSQL wiki
- Slow Query Questions — PostgreSQL wiki
- Testcontainers for Java
- Spring Boot CLI documentation
- Flyway database migrations
This project is licensed under the MIT License. See LICENSE for details.