Evaluate, compare, and recommend the optimal AWS document processing approach for your use case.
Upload a sample document, answer a few targeted questions, and watch 16 processing methods run in parallel across 33 capabilities — with real accuracy, cost, and speed comparisons, then generate a production-ready architecture (Terraform or CDK).
| Tier | What it does | AWS services |
|---|---|---|
| 01 Edge | TLS termination, DNS, CDN, SPA delivery | Route 53 · ACM · CloudFront · WAF · S3 (SPA bucket) |
| 02 Web — ECS Fargate | Stateless Express API, SSE streaming, Cognito auth, HPA on CPU & RPS | ALB · ECS Fargate (2–10 tasks, 1 vCPU / 2 GB, awsvpc) · ECR · Secrets Manager · CloudWatch Logs · X-Ray |
| 03 Agent — Strands on AgentCore | Socratic advisor with closure-bound tools `analyze_document()`, `recommend_capabilities()`, `generate_architecture()`; invoked via SigV4 only | Bedrock AgentCore Runtime (arm64, SSE) |
| 04 AI Services | 6 families · 16 methods · language-aware routing · sequential Guardrails composition for PII | Amazon Bedrock (Claude Sonnet / Haiku / Opus 4.6, Nova 2 Lite / Pro, Nova Embeddings, Guardrails) · Bedrock Data Automation (up to 3 000 pages per job) · Amazon Textract (sync + async, tables / forms) |
| 05 Data | Uploads, activity tracking, Terraform state, KMS encryption, async fan-out | S3 Uploads · DynamoDB · SQS · S3 TF State · KMS |
Three architectural principles worth calling out:
- Least-privilege IAM everywhere. Bucket-scoped S3 ARNs, agent-scoped `InvokeAgentRuntime`, per-model foundation-model ARNs. `AUTH_PROVIDER=none` refuses to boot in production unless `ALLOW_UNAUTHENTICATED=true` is set explicitly.
- IaC parity. Terraform (`infrastructure/`) and AWS CDK v2 (`infrastructure-cdk/`) both produce the same five-tier topology. Do not run both against the same account/region.
- Data-driven method routing. The Socratic agent recommends capabilities on the first turn; the comparison dashboard runs every compatible method in parallel and feeds the actual preview metrics back into the pipeline generator. PII capabilities fall through to Bedrock Guardrails and run sequentially after the extraction stage.
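
The PII routing rule can be sketched as a two-stage plan: every compatible method for the non-PII capabilities runs in parallel in one stage, and a Guardrails stage follows it. This is a minimal illustration under stated assumptions, not the project's router. `planStages`, the `pii_detection` capability, and the `nova-lite` and `guardrails` method IDs are hypothetical; only `text_extraction` and `claude-haiku` appear in this README's API example.

```typescript
// Hypothetical IDs: the real capability/method definitions live in @idp/shared.
type CapabilityId = string;
type MethodId = string;

interface Stage {
  name: "extraction" | "pii";
  methods: MethodId[]; // methods inside one stage run in parallel
}

// Stages are returned in execution order: the PII stage runs after extraction.
function planStages(
  capabilities: CapabilityId[],
  compatibleMethods: (cap: CapabilityId) => MethodId[],
  isPiiCapability: (cap: CapabilityId) => boolean,
): Stage[] {
  const extraction = new Set<MethodId>(); // Set dedupes methods shared by several capabilities
  let needsPii = false;
  for (const cap of capabilities) {
    if (isPiiCapability(cap)) {
      needsPii = true; // PII falls through to Guardrails instead of the extraction methods
      continue;
    }
    compatibleMethods(cap).forEach((m) => extraction.add(m));
  }
  const stages: Stage[] = [];
  if (extraction.size > 0) stages.push({ name: "extraction", methods: Array.from(extraction) });
  if (needsPii) stages.push({ name: "pii", methods: ["guardrails"] });
  return stages;
}
```

Keeping PII as a separate trailing stage, rather than one more parallel method, is what lets it operate on the extraction result instead of competing with it.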
- 33 capabilities across 8 categories: Core Extraction, Visual Analysis, Document Intelligence, Compliance & Security, Industry-Specific, Media Processing, Advanced AI, Document Conversion
- 16 processing methods across 6 families: BDA (Standard / + LLM hybrids), Claude (Sonnet 4.6 / Haiku 4.5 / Opus 4.6), Nova (2 Lite GA / 2 Pro Preview), Textract+LLM, Nova Embeddings, Bedrock Guardrails (PII specialist)
- Pipeline builder — ReactFlow node graph for custom processing pipelines; chat interface to modify pipelines conversationally
- Real-time SSE streaming — token-level progress for every method, 15s keepalive
- Architecture recommendations — cost projections at scale, generated IaC in Terraform or CDK
- Recent Runs — save and reload past evaluation sessions with full journey detail (upload, analysis, preview, pipeline, architecture). Runs are scoped per user; admins can view every user's runs.
- Admin dashboard — usage stats, evaluation runs with click-to-detail, activity log with type-specific detail panels
- Pluggable auth — `none` (demo) or Amazon Cognito (real JWT verifier against a user pool)
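
The 15 s keepalive relies on standard Server-Sent Events framing. Each event is an `event:` line plus `data:` lines, terminated by a blank line. Lines starting with `:` are comments that clients ignore, so they work as keepalives that stop proxies from closing idle connections. The sketch below is illustrative only. `encodeEvent`, `startKeepalive`, `parseSse`, and the payload shapes are hypothetical; the event names come from the `/api/preview` example further down. The parser handles LF line endings only and skips `id:`/`retry:` fields.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

const KEEPALIVE_MS = 15_000;
const KEEPALIVE_FRAME = ": keepalive\n\n"; // comment frame; clients discard it

// JSON.stringify escapes newlines, so a single data: line is enough.
function encodeEvent(event: string, payload: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Writes a keepalive comment every intervalMs; call the returned function to stop.
function startKeepalive(write: (chunk: string) => void, intervalMs: number = KEEPALIVE_MS): () => void {
  const timer = setInterval(() => write(KEEPALIVE_FRAME), intervalMs);
  return () => clearInterval(timer);
}

// Minimal parser for a complete SSE body (LF only, no id/retry handling).
function parseSse(body: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of body.split("\n\n")) {
    let event = "message"; // SSE default event type
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

Because keepalive frames carry no `data:` field, the parser drops them and only the real events remain.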
```bash
# 1. Install
npm install

# 2. Configure: copy the template and fill in your AWS values
cp .env.example .env
# Minimum .env for local demo:
# AWS_REGION=us-west-2
# USE_LOCAL_STORAGE=true
# AUTH_PROVIDER=none
# BDA_PROFILE_ARN=arn:aws:bedrock:us-west-2:<account>:data-automation-profile/us.data-automation-v1 (optional)

# 3. Build shared types (required once, and after any skills/capability changes)
npm run build -w packages/shared

# 4. Start dev servers (backend :3001 + frontend :5173)
npm run dev
```

Open http://localhost:5173.
With AWS credentials and Bedrock enabled in your region:
```bash
# Upload a sample and run Claude Haiku 4.5 text extraction
curl -sX POST -F "file=@test-samples/04-tax-receipt-pii.pdf" \
  http://localhost:3001/api/upload
# Response: { "documentId": "...", "s3Uri": "local:///...", "previewUrl": "/api/files/..." }

curl -sX POST -N http://localhost:3001/api/preview \
  -H "Content-Type: application/json" \
  -d '{"documentId":"<id>","s3Uri":"local:///...","capabilities":["text_extraction"],"methods":["claude-haiku"]}'
# → SSE: preview_start → method_result → preview_done
```

```text
one-idp/
├── packages/
│   ├── shared/              # Shared types, capability/skill defs, generated from skills/*.md
│   ├── backend/             # Express API + Strands agent server + adapters
│   │   └── src/middleware/
│   │       ├── auth.ts          # Pluggable auth dispatcher (none|cognito)
│   │       ├── auth-cognito.ts  # Real JWT verifier (jose + JWKS)
│   │       └── upload.ts        # multer: 50MB limit + mimetype allowlist
│   └── frontend/            # React 18 + Vite + Cloudscape + ReactFlow
├── infrastructure/          # Terraform stack (ECS Fargate + AgentCore + CloudFront + S3 + DynamoDB)
├── infrastructure-cdk/      # AWS CDK TypeScript stack (parity with Terraform)
├── test-samples/            # Test documents (gitignored; add your own samples)
└── docs/
    └── architecture.md      # 3-tier topology, auth boundary, deploy lifecycle
```
| Layer | Technology |
|---|---|
| Frontend | React 18, Vite 5, Cloudscape Design, ReactFlow, Lucide Icons |
| Backend | Node.js 20, Express 4, TypeScript 5 |
| AI/ML | Amazon Bedrock (Claude, Nova), BDA, Amazon Textract, Amazon Comprehend |
| Agent runtime | Strands Agents TypeScript SDK on Bedrock AgentCore |
| Auth | Pluggable: none (local dev) / Amazon Cognito (production, real JWT verifier via jose) |
| Storage | Amazon S3 (KMS, versioned, CORS) or local .local-uploads/ |
| Activity | DynamoDB pay-per-request |
| Deploy | ECS Fargate + Bedrock AgentCore Runtime + CloudFront + Route53/ACM |
| IaC | Terraform >= 1.6 and AWS CDK v2 (pick one) |
| Family | Models | Pricing |
|---|---|---|
| BDA | Standard, Custom Blueprint | $0.01 / $0.04 per page |
| BDA + LLM | +Sonnet, +Haiku, +Nova Lite | BDA page + LLM tokens |
| Claude | Sonnet 4.6, Haiku 4.5, Opus 4.6 | $1 – $5 input / $5 – $25 output per 1M tokens |
| Nova | 2 Lite (GA), 2 Pro (Preview) | $0.30 – $1.25 / 1M input tokens |
| Textract + LLM | +Sonnet, +Haiku, +Nova Lite, +Nova Pro | $0.0015/page + LLM tokens |
| Comprehend / Guardrails | PII detection only | pay-per-request |
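
As a worked example of how these rates combine, the sketch below estimates one run of a Textract + LLM method. The per-page token counts (1,500 input, 300 output) are assumptions. The LLM rates use the low end of the Claude row ($1 input / $5 output per 1M tokens); real rates vary by model and region. `estimateCost` and `Rates` are hypothetical names.

```typescript
interface Rates {
  perPage: number;       // USD per page (Textract, BDA)
  inputPerMTok: number;  // USD per 1M input tokens
  outputPerMTok: number; // USD per 1M output tokens
}

// Page fee plus token fees; token counts are per page and assumed, not measured.
function estimateCost(pages: number, inTokPerPage: number, outTokPerPage: number, r: Rates): number {
  const pageCost = pages * r.perPage;
  const inputCost = ((pages * inTokPerPage) / 1_000_000) * r.inputPerMTok;
  const outputCost = ((pages * outTokPerPage) / 1_000_000) * r.outputPerMTok;
  return pageCost + inputCost + outputCost;
}

// Textract ($0.0015/page) + an LLM at the low end of the Claude range.
const textractPlusLlm: Rates = { perPage: 0.0015, inputPerMTok: 1, outputPerMTok: 5 };
const total = estimateCost(1000, 1500, 300, textractPlusLlm);
console.log(total.toFixed(2)); // prints 4.50 ($1.50 pages + $1.50 input + $1.50 output)
```

By the same arithmetic, BDA Standard alone on those 1,000 pages costs 1,000 × $0.01 = $10.00.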
Full list in .env.example. Highlights:
| Var | Default | Notes |
|---|---|---|
| `AWS_REGION` | `us-west-2` | |
| `S3_BUCKET` | (empty) | Required unless `USE_LOCAL_STORAGE=true` |
| `USE_LOCAL_STORAGE` | (unset) | `true` → uses `.local-uploads/` instead of S3 |
| `AUTH_PROVIDER` | `none` | `none` \| `cognito` |
| `ALLOW_UNAUTHENTICATED` | (unset) | Only relevant with `AUTH_PROVIDER=none` + `NODE_ENV=production`, where it must be `true` or boot is refused. |
| `ADMIN_USERS` | `''` | Comma-separated aliases. Ignored when `AUTH_PROVIDER=none` (unless `ALLOW_UNAUTHENTICATED=true`). |
| `DEV_USER_ALIAS` | `local-user` | Overrides the local user alias when `AUTH_PROVIDER=none`. Set it to match an `ADMIN_USERS` entry to test admin locally. |
| `CLOUDFRONT_SECRET` | (unset) | Shared secret for CloudFront → ALB origin validation. When set in production, requests without the matching `X-CloudFront-Secret` header are rejected. |
| `ACTIVITY_TABLE` | (unset) | DynamoDB table name for activity tracking + recent runs. |
| `COGNITO_USER_POOL_ID` | (empty) | Required when `AUTH_PROVIDER=cognito` |
| `COGNITO_CLIENT_ID` | (empty) | Optional allowlist, comma-separated |
| `BDA_PROFILE_ARN` / `BDA_PROJECT_ARN` | (empty) | Optional; BDA methods are unavailable if unset |
| `CLAUDE_MODEL_ID` / `NOVA_MODEL_ID` | GA defaults | Override for regional variants |
| `VITE_APP_TITLE` | `ONE IDP Framework` | Frontend top-nav title |
| `VITE_REPO_URL` / `VITE_CHAT_URL` | (unset) | Source / chat links. Shown only in dev builds by default (`import.meta.env.DEV`). Set `VITE_SHOW_LINKS=true` at build time to force-show them in prod. |
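
Several rows above are conditional requirements. The sketch below shows one way to check them at startup so that misconfiguration fails fast. `validateEnv` and `parseList` are hypothetical names. Treating an empty `COGNITO_CLIENT_ID` as "no client check" is an assumption based on the allowlist being optional.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of configuration errors; an empty list means the rules above are satisfied.
function validateEnv(env: Env): string[] {
  const errors: string[] = [];
  const localStorage = env.USE_LOCAL_STORAGE === "true";
  const auth = env.AUTH_PROVIDER ?? "none";

  if (!localStorage && !env.S3_BUCKET) {
    errors.push("S3_BUCKET is required unless USE_LOCAL_STORAGE=true");
  }
  if (auth !== "none" && auth !== "cognito") {
    errors.push(`Unknown AUTH_PROVIDER: ${auth}`);
  }
  if (auth === "cognito" && !env.COGNITO_USER_POOL_ID) {
    errors.push("COGNITO_USER_POOL_ID is required when AUTH_PROVIDER=cognito");
  }
  return errors;
}

// Parses comma-separated values such as ADMIN_USERS or COGNITO_CLIENT_ID.
// An empty result is treated here as "no allowlist" (assumption).
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```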
Two equivalent IaC stacks. Pick one — do not run both against the same account/region.
| Stack | Path | Tooling |
|---|---|---|
| Terraform | `infrastructure/` | Terraform >= 1.6 |
| CDK (TypeScript) | `infrastructure-cdk/` | AWS CDK v2 |
Both produce the same 3-tier topology (see docs/architecture.md):
- Edge tier — CloudFront + optional Route53 + ACM
- Web tier — ECS Fargate behind an ALB (Express API, pluggable auth, HPA on CPU & RPS)
- Agent tier — Bedrock AgentCore Runtime (Strands agent, IAM SigV4 only)
```bash
# Terraform
cd infrastructure
cp terraform.tfvars.example terraform.tfvars
# For existing deployments preserving state:
terraform init -reconfigure \
  -backend-config="bucket=<your-state-bucket>" \
  -backend-config="key=one-idp/terraform.tfstate" \
  -backend-config="region=us-west-2"
make plan && make apply   # or: terraform plan -out tfplan && terraform apply tfplan
```

```bash
# CDK
cd infrastructure-cdk
npm install
npx cdk deploy \
  -c projectName=one-idp -c environment=dev \
  -c authProvider=cognito \
  -c bdaProfileArn="arn:aws:bedrock:us-west-2:<account>:data-automation-profile/us.data-automation-v1"
```

See `infrastructure/README.md` and `infrastructure-cdk/README.md` for variable references and migration notes.
The backend ships with a pluggable `AUTH_PROVIDER`:
- `none`: demo mode with a synthetic anonymous user.
  - Refuses to boot in `NODE_ENV=production` unless `ALLOW_UNAUTHENTICATED=true` is set explicitly.
  - Admin endpoints (`/api/admin/*`) are always denied when `AUTH_PROVIDER=none`, regardless of `ADMIN_USERS`.
- `cognito`: real JWT verifier using `jose`. Fetches the user pool JWKS, verifies signature, issuer, expiry, and `token_use`, and optionally checks `client_id` against the `COGNITO_CLIENT_ID` allowlist. Accepts both ID and access tokens.
Switch providers without code changes via env vars alone. The dispatcher lives in packages/backend/src/middleware/auth.ts.
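
The rules above can be sketched as three small checks. This is illustrative, not the contents of `auth.ts`. `assertBootAllowed`, `isAdmin`, and `checkCognitoClaims` are hypothetical names, and the admin gate follows the "always denied under `none`" rule. The claim check assumes signature, issuer, and expiry were already verified, for example with `jose`'s `jwtVerify` and a `createRemoteJWKSet` pointed at `https://cognito-idp.<region>.amazonaws.com/<userPoolId>/.well-known/jwks.json`. Cognito ID tokens carry the app client ID in `aud`, while access tokens carry it in `client_id`.

```typescript
type Env = Record<string, string | undefined>;

// Fail-closed boot: no auth in production requires an explicit opt-in.
function assertBootAllowed(env: Env): void {
  const provider = env.AUTH_PROVIDER ?? "none";
  if (provider === "none" && env.NODE_ENV === "production" && env.ALLOW_UNAUTHENTICATED !== "true") {
    throw new Error("Refusing to start: AUTH_PROVIDER=none in production without ALLOW_UNAUTHENTICATED=true");
  }
}

// Admin gate: never admin when auth is disabled; an empty ADMIN_USERS admits nobody.
function isAdmin(env: Env, alias: string): boolean {
  if ((env.AUTH_PROVIDER ?? "none") === "none") return false;
  const admins = (env.ADMIN_USERS ?? "").split(",").map((s) => s.trim()).filter(Boolean);
  return admins.indexOf(alias) !== -1;
}

interface CognitoClaims {
  token_use?: unknown;
  aud?: unknown;
  client_id?: unknown;
}

// Post-verification claim check for ID or access tokens.
function checkCognitoClaims(claims: CognitoClaims, allowedClientIds: string[]): boolean {
  let clientId: unknown;
  if (claims.token_use === "id") clientId = claims.aud; // ID token: client in aud
  else if (claims.token_use === "access") clientId = claims.client_id; // access token: client_id
  else return false; // reject any other token_use
  if (allowedClientIds.length === 0) return true; // allowlist is optional
  return typeof clientId === "string" && allowedClientIds.indexOf(clientId) !== -1;
}
```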
- Path-traversal defense: `/api/files/*` rejects keys containing `..`, a leading `/`, or null bytes before touching the backend. `getLocalFilePath` additionally resolves absolute paths and verifies containment within `.local-uploads/`.
- Filename sanitization: uploaded filenames are NFC-normalized and stripped of path separators and control characters before being used as S3 keys.
- Upload limits: `multer` caps body size at 50 MB and enforces a mimetype allowlist from `@idp/shared`.
- Admin defense-in-depth: admin middleware refuses access when auth is disabled (`AUTH_PROVIDER=none`), even if aliases match. An empty `ADMIN_USERS` also blocks all admins.
- Fail-closed prod boot: `NODE_ENV=production` + an unauthenticated provider → the backend throws on startup unless `ALLOW_UNAUTHENTICATED=true` is explicitly set.
- IAM least-privilege: bucket-scoped S3 ARNs, agent-scoped AgentCore invoke ARNs. The few remaining `Resource: "*"` policies are standard Bedrock/Textract usage.
- JWT verification: the Cognito path uses `jose.jwtVerify` against the live JWKS, not a homegrown parser.
- Rate limiter is per-IP in-memory. With ECS auto-scaling, an attacker hitting N instances gets N× the rate limit. Use Redis or an edge WAF (CloudFront + AWS WAF rate-based rules) for production traffic.
- CloudFront origin validation: the `cloudfront-secret.ts` middleware validates the `X-CloudFront-Secret` header in production. Set the `CLOUDFRONT_SECRET` env var on the ECS task to match the Terraform-managed `random_password.cloudfront_secret`. Health-check paths are exempt.
- `AUTH_PROVIDER=none` uses the `DEV_USER_ALIAS` env var (default: `local-user`) instead of the OS username. This prevents accidental admin privilege escalation when the OS user matches an `ADMIN_USERS` entry.
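
The request-hardening rules above (key validation, filename sanitization, CloudFront origin check) can be sketched as pure functions. These are hypothetical helpers, not the backend's code. `isSafeKey`, `sanitizeFilename`, `isOriginAllowed`, the `/health` exempt path, and the `"upload"` fallback name are all assumptions. The sketch hand-rolls a constant-time comparison to stay dependency-free; Node's `crypto.timingSafeEqual` on `Buffer`s is the usual production choice.

```typescript
// Path-traversal guard for /api/files/* keys: reject "..", a leading "/", or null bytes.
function isSafeKey(key: string): boolean {
  return key.length > 0 && key.indexOf("..") === -1 && !key.startsWith("/") && key.indexOf("\0") === -1;
}

// NFC-normalize, then strip path separators and ASCII control characters.
function sanitizeFilename(name: string): string {
  const cleaned = name
    .normalize("NFC")
    .replace(/[\/\\]/g, "")
    .replace(/[\u0000-\u001f\u007f]/g, "");
  return cleaned.length > 0 ? cleaned : "upload"; // hypothetical fallback name
}

// Constant-time compare for equal-length strings (the length check itself is not hidden).
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

const HEALTH_PATHS = ["/health"]; // hypothetical exempt paths

// Enforced only in production and only when CLOUDFRONT_SECRET is configured.
function isOriginAllowed(
  path: string,
  headers: Record<string, string | undefined>, // Node lowercases incoming header names
  secret: string | undefined,
  nodeEnv: string | undefined,
): boolean {
  if (nodeEnv !== "production" || !secret) return true;
  if (HEALTH_PATHS.indexOf(path) !== -1) return true; // health checks bypass CloudFront
  const presented = headers["x-cloudfront-secret"];
  return presented !== undefined && safeEqual(presented, secret);
}
```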
MIT-0. See LICENSE.
