From 12b3aceaf271a9c2d7f0ad0f42da699c74590bf8 Mon Sep 17 00:00:00 2001 From: Dan Lynch Date: Thu, 26 Feb 2026 11:14:14 +0000 Subject: [PATCH] docs(plan): add GraphQL-based pgpm export feature plan --- docs/plan/graphql-export.md | 351 ++++++++++++++++++++++++++++++++++++ 1 file changed, 351 insertions(+) create mode 100644 docs/plan/graphql-export.md diff --git a/docs/plan/graphql-export.md b/docs/plan/graphql-export.md new file mode 100644 index 000000000..5a2cfda14 --- /dev/null +++ b/docs/plan/graphql-export.md @@ -0,0 +1,351 @@ +# Plan: GraphQL-Based `pgpm export` + +## Problem + +The current `pgpm export` command fetches all data via direct SQL queries against the PostgreSQL database (using `pg-cache` / `pg` pool). Customers who use Constructive's hosted platform do **not** have direct SQL access to their databases — they only have access to the **GraphQL API** served by the Constructive server (`graphql/server`). + +This means customers cannot run `pgpm export` today because the command requires a direct database connection. + +## Goal + +Create an alternative data-fetching backend for `pgpm export` that uses **GraphQL queries** instead of raw SQL, so the export can run against a customer's GraphQL endpoint (e.g. `https://api.example.com/graphql`). + +The developer should be able to run something like: + +```bash +pgpm export --graphql-endpoint https://api.example.com/graphql --token +``` + +--- + +## Current Architecture + +### Export Flow Overview + +The export command has three layers: + +1. **CLI layer** — `pgpm/cli/src/commands/export.ts` +2. **Core orchestrator** — `pgpm/core/src/export/export-migrations.ts` +3. **Meta exporter** — `pgpm/core/src/export/export-meta.ts` + +### SQL Queries Used Today + +#### CLI layer (`pgpm/cli/src/commands/export.ts`) + +| # | Query | Purpose | +|---|-------|---------| +| 1 | `SELECT datname FROM pg_catalog.pg_database ...` | List available Postgres databases | +| 2 | `SELECT id, name FROM metaschema_public.database` | List database entries in metaschema | +| 3 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | List schemas for a database | + +#### Core orchestrator (`pgpm/core/src/export/export-migrations.ts`) + +| # | Query | Purpose | +|---|-------|---------| +| 4 | `SELECT * FROM metaschema_public.database WHERE id = $1` | Get database record | +| 5 | `SELECT * FROM metaschema_public.schema WHERE database_id = $1` | Get schemas | +| 6 | `SELECT * FROM db_migrate.sql_actions WHERE database_id = $1 ORDER BY id` | Get migration SQL actions | + +#### Meta exporter (`pgpm/core/src/export/export-meta.ts`) + +~50+ queries of the form: + +```sql +SELECT * FROM . WHERE database_id = $1 +``` + +Across three schemas: +- **`metaschema_public`** — `database`, `schema`, `table`, `field`, `policy`, `index`, `trigger`, `trigger_function`, `rls_function`, `limit_function`, `procedure`, `foreign_key_constraint`, `primary_key_constraint`, `unique_constraint`, `check_constraint`, `full_text_search`, `schema_grant`, `table_grant` +- **`services_public`** — `domains`, `sites`, `apis`, `apps`, `site_modules`, `site_themes`, `site_metadata`, `api_modules`, `api_extensions`, `api_schemas` +- **`metaschema_modules_public`** — `rls_module`, `user_auth_module`, `memberships_module`, `permissions_module`, `limits_module`, `levels_module`, `users_module`, `hierarchy_module`, `membership_types_module`, `invites_module`, `emails_module`, `sessions_module`, `secrets_module`, `profiles_module`, `encrypted_secrets_module`, `connected_accounts_module`, `phone_numbers_module`, `crypto_addresses_module`, `crypto_auth_module`, `field_module`, `table_template_module`, `secure_table_provision`, `uuid_module`, `default_ids_module`, `denormalized_table_field` + +Additionally, `export-meta.ts` queries `information_schema.columns` to dynamically discover table columns for each table (via `getTableColumns()`). + +--- + +## GraphQL Availability Analysis + +The Constructive GraphQL server (PostGraphile v5) auto-generates a GraphQL schema from the exposed Postgres schemas. The schemas exposed are configured per-API via `services_public.api_schemas`. + +### Tables accessible via GraphQL + +All tables in these schemas are exposed through PostGraphile and will have auto-generated GraphQL queries: + +- `metaschema_public.*` — All 18+ tables +- `services_public.*` — All 10+ tables +- `metaschema_modules_public.*` — All 25+ modules tables + +For each table, PostGraphile generates: +- `alls` (connection query with filtering, pagination) +- `ById` (single-row lookup by primary key) + +For example, `metaschema_public.schema` becomes: +```graphql +query { + allSchemas(condition: { databaseId: "..." }) { + nodes { + id + databaseId + name + schemaName + description + isPublic + } + } +} +``` + +### Tables/queries NOT accessible via GraphQL + +| Query | Why | Mitigation | +|-------|-----|------------| +| `pg_catalog.pg_database` | System catalog, never exposed | Not needed — in GraphQL mode the user already knows their database | +| `db_migrate.sql_actions` | Internal migration schema, not in exposed schemas | **Blocker** — see below | +| `information_schema.columns` | System schema, never exposed | Not needed — field metadata already lives in `metaschema_public.field` | + +### Blocker: `db_migrate.sql_actions` + +The `db_migrate.sql_actions` table contains the actual migration SQL that was generated. This is the core data for the database-schema portion of the export. This table lives in the `db_migrate` schema which is **not** exposed via GraphQL. + +**Options:** + +1. **Expose `db_migrate` schema via a dedicated admin API** — Add it as an optional schema in the API configuration (requires server-side change) +2. **Create a custom GraphQL query/mutation** — A Graphile plugin that adds a `exportSqlActions(databaseId)` query returning the migration data +3. **Split the export** — Only export metadata via GraphQL (the meta portion), skip the migration SQL portion. The metadata export is the larger and more complex part anyway. +4. **Provide a REST endpoint** — Add a `/api/export/sql-actions` REST route that returns this data + +**Recommended:** Option 3 initially (metadata-only export via GraphQL), with Option 2 as a follow-up to support full export. + +--- + +## Proposed Architecture + +### Data Source Abstraction + +Create an `ExportDataSource` interface that abstracts how data is fetched. The existing SQL path becomes one implementation; the new GraphQL path becomes another. + +#### Interface + +```typescript +// pgpm/core/src/export/data-source.ts + +export interface ExportDataSource { + /** Fetch all rows from a metaschema/services/modules table filtered by database_id */ + fetchTable(schema: string, table: string, databaseId: string): Promise[]>; + + /** Fetch the database record by id */ + fetchDatabase(databaseId: string): Promise | null>; + + /** Fetch schemas for a database */ + fetchSchemas(databaseId: string): Promise[]>; + + /** Fetch migration SQL actions (may not be available in GraphQL mode) */ + fetchSqlActions?(databaseId: string): Promise[]>; + + /** List available databases (for interactive selection) */ + listDatabases?(): Promise<{ id: string; name: string }[]>; + + /** Clean up connections */ + close(): Promise; +} +``` + +#### SQL Implementation (existing behavior) + +```typescript +// pgpm/core/src/export/data-source-sql.ts + +export class SqlDataSource implements ExportDataSource { + constructor(private pool: Pool) {} + + async fetchTable(schema: string, table: string, databaseId: string) { + const result = await this.pool.query( + `SELECT * FROM ${schema}.${table} WHERE database_id = $1`, + [databaseId] + ); + return result.rows; + } + // ... etc +} +``` + +#### GraphQL Implementation (new) + +```typescript +// pgpm/core/src/export/data-source-graphql.ts + +export class GraphQLDataSource implements ExportDataSource { + constructor( + private endpoint: string, + private token?: string + ) {} + + async fetchTable(schema: string, table: string, databaseId: string) { + // Build GraphQL query using PostGraphile naming conventions: + // schema: metaschema_public, table: field + // => allFields(condition: { databaseId: "..." }) { nodes { ... } } + const queryName = toGraphQLCollectionName(schema, table); + const query = buildAllNodesQuery(queryName, databaseId); + const result = await this.executeQuery(query); + return extractNodes(result, queryName); + } + // ... etc +} +``` + +### Naming Convention Mapping + +PostGraphile transforms Postgres names to GraphQL names using inflection. The mapping follows: + +| Postgres | GraphQL Query | GraphQL Type | +|----------|--------------|--------------| +| `metaschema_public.database` | `allDatabases` | `Database` | +| `metaschema_public.schema` | `allSchemas` | `Schema` | +| `metaschema_public.table` | `allTables` | `Table` | +| `metaschema_public.field` | `allFields` | `Field` | +| `metaschema_public.foreign_key_constraint` | `allForeignKeyConstraints` | `ForeignKeyConstraint` | +| `services_public.domains` | `allDomains` | `Domain` | +| `services_public.apis` | `allApis` | `Api` | +| `metaschema_modules_public.rls_module` | `allRlsModules` | `RlsModule` | + +Column names: `database_id` => `databaseId`, `schema_name` => `schemaName`, etc. + +**Important:** The actual inflection is controlled by `graphile-settings` (specifically the `InflektPreset`). The developer should use **GraphQL introspection** at runtime to discover the exact names rather than hardcoding them. The existing `QueryBuilder` in `graphql/query` already handles this. + +--- + +## Files to Create / Modify + +### New Files + +| File | Purpose | +|------|---------| +| `pgpm/core/src/export/data-source.ts` | `ExportDataSource` interface definition | +| `pgpm/core/src/export/data-source-sql.ts` | SQL implementation (refactor existing code) | +| `pgpm/core/src/export/data-source-graphql.ts` | GraphQL implementation (new) | +| `pgpm/core/src/export/graphql-naming.ts` | PostGraphile naming convention helpers (postgres name => GraphQL query/field names) | +| `pgpm/cli/src/commands/export.ts` | Modify to accept `--graphql-endpoint` and `--token` flags | + +### Files to Modify + +| File | Change | +|------|--------| +| `pgpm/core/src/export/export-migrations.ts` | Refactor `exportMigrationsToDisk` to accept `ExportDataSource` instead of using `pg-cache` directly | +| `pgpm/core/src/export/export-meta.ts` | Refactor `exportMeta` to accept `ExportDataSource` instead of using `pg-cache` directly. The `queryAndParse` helper should use `dataSource.fetchTable()` instead of `pool.query()` | +| `pgpm/cli/src/commands/export.ts` | Add `--graphql-endpoint` / `--token` CLI flags; construct `GraphQLDataSource` when endpoint is provided, `SqlDataSource` otherwise | + +### Existing Packages to Leverage + +| Package | What to Use | +|---------|-------------| +| `graphql/query` (`@constructive-io/graphql-query`) | `QueryBuilder` for building GraphQL AST queries; `MetaObject` types | +| `graphql/query/src/executor.ts` | `QueryExecutor` for local execution (if connecting via connection string rather than HTTP endpoint) | +| `graphql` (v16) | `print()` for serializing DocumentNode to string; `parse()` for parsing responses | + +--- + +## Implementation Phases + +### Phase 1: Data Source Abstraction + +1. Define the `ExportDataSource` interface in `pgpm/core/src/export/data-source.ts` +2. Create `SqlDataSource` that wraps the existing `pg` pool logic +3. Refactor `exportMeta()` to accept a data source instead of pool options +4. Refactor `exportMigrationsToDisk()` similarly +5. Verify existing SQL export still works identically (no behavior change) + +### Phase 2: GraphQL Data Source + +1. Create `GraphQLDataSource` class +2. Implement `fetchTable()` — builds a GraphQL `allXxx(condition: { databaseId: "..." }) { nodes { ... } }` query and executes it via HTTP POST to the endpoint +3. Implement `fetchDatabase()` and `fetchSchemas()` as specific instances of `fetchTable()` +4. Handle authentication (Bearer token in `Authorization` header) +5. Handle pagination — PostGraphile uses cursor-based pagination; the implementation must page through all results if there are more than the default page size + +### Phase 3: CLI Integration + +1. Add `--graphql-endpoint ` and `--token ` flags to the export command +2. When `--graphql-endpoint` is provided: + - Skip the `pg_catalog.pg_database` query (not available) + - Fetch databases from `metaschema_public.database` via GraphQL instead + - Construct `GraphQLDataSource` and pass it through to the core +3. When no endpoint is provided, use `SqlDataSource` (existing behavior) +4. Handle the `db_migrate.sql_actions` gap — in GraphQL mode, only export metadata (skip the database migration portion), or error with a helpful message + +### Phase 4: Field Discovery + +The current `export-meta.ts` uses `information_schema.columns` to dynamically discover which columns exist in each table. In GraphQL mode, this information is not directly available. Options: + +1. **Use GraphQL introspection** — Query the GraphQL schema's `__type` for each type to discover available fields. This is the cleanest approach. +2. **Use the `config` map** — The hardcoded `config` in `export-meta.ts` already lists expected fields per table. In GraphQL mode, these can be cross-referenced with introspection results. + +**Recommended:** Use GraphQL introspection (`__schema` / `__type` queries) to discover fields, then map them back to Postgres column names using reverse inflection. + +--- + +## Key Design Decisions + +### 1. HTTP vs Direct Execution + +Two modes of GraphQL execution should be supported: + +- **HTTP mode** (primary): POST queries to a remote GraphQL endpoint. This is what customers will use. +- **Local mode** (optional): Use `QueryExecutor` from `graphql/query` to execute queries directly against a local database via PostGraphile. This could be useful for development/testing. + +### 2. Pagination Strategy + +PostGraphile uses Relay-style cursor pagination. The `GraphQLDataSource.fetchTable()` must handle tables with many rows by paginating: + +```graphql +query { + allFields(condition: { databaseId: "..." }, first: 100, after: "cursor...") { + nodes { ... } + pageInfo { + hasNextPage + endCursor + } + } +} +``` + +Loop until `hasNextPage` is false. + +### 3. Column Name Translation + +GraphQL field names are camelCase (`databaseId`), but the `csv-to-pg` Parser and the rest of the export pipeline expect snake_case Postgres column names (`database_id`). The GraphQL data source must translate response keys back to snake_case before returning rows. + +### 4. Authentication + +The GraphQL endpoint requires authentication. The export command should accept: +- `--token ` — A JWT token passed as `Authorization: Bearer ` +- The token must have sufficient privileges to read from `metaschema_public`, `services_public`, and `metaschema_modules_public` schemas + +--- + +## Testing Strategy + +1. **Unit tests**: Mock the `ExportDataSource` interface and verify `exportMeta`/`exportMigrations` work with both implementations +2. **Integration tests**: Stand up a local GraphQL server, run the GraphQL export against it, compare output to SQL export output — they should be identical +3. **E2E test**: Run `pgpm export --graphql-endpoint http://localhost:5678/graphql` against a running dev server + +--- + +## Open Questions + +1. **Should `db_migrate.sql_actions` be exposed via GraphQL?** If so, this requires a server-side change (adding `db_migrate` to the exposed schemas or creating a custom Graphile plugin). Without this, GraphQL export can only produce the metadata/service portion, not the database-schema migration SQL. +2. **Should the GraphQL export produce the same output format?** The SQL export produces pgpm plan files + deploy/revert/verify SQL files. The metadata-only export is a subset of this. +3. **Should we support introspection-based field discovery or rely on the hardcoded config map?** Introspection is cleaner but adds complexity; the config map is already maintained. + +--- + +## Summary + +| What | Where | +|------|-------| +| New interface | `pgpm/core/src/export/data-source.ts` | +| SQL adapter (refactor) | `pgpm/core/src/export/data-source-sql.ts` | +| GraphQL adapter (new) | `pgpm/core/src/export/data-source-graphql.ts` | +| Naming helpers | `pgpm/core/src/export/graphql-naming.ts` | +| CLI changes | `pgpm/cli/src/commands/export.ts` | +| Core refactors | `pgpm/core/src/export/export-meta.ts`, `pgpm/core/src/export/export-migrations.ts` | +| Leverage | `graphql/query` (QueryBuilder), `graphql` v16 (print/parse) |