
use case based cookbooks#461

Open
KarthikAvinashFI wants to merge 52 commits into astro from feature/th-3418-use-case-based-cookbooks

Conversation

@KarthikAvinashFI
Contributor

Pull Request

Description

Describe the changes in this pull request:

  • What feature/bug does this PR address?
  • Provide any relevant links or screenshots.

Checklist

  • Code compiles correctly.
  • Created/updated tests.
  • Linting and formatting applied.
  • Documentation updated.

Related Issues

Closes #<issue_number>

@linear

linear bot commented Mar 11, 2026

@entelligence-ai-pr-reviews

⚠️ Trial Period Expired ⚠️

Your trial period has expired. To continue using this feature, please upgrade to a paid plan here or book a time to chat here.

@KarthikAvinashFI changed the title from "initial commit" to "use case based cookbooks" on Mar 11, 2026
KarthikAvinashFI and others added 25 commits March 23, 2026 20:29
- text_to_sql passes all 5 cases (doesn't catch subtle logic error)
- Updated similarity scores to match real values (0.95, 0.87, 0.58, 0.54)
- Updated narrative: multiple layers needed since intent validation alone misses bugs
- Removed redundant paragraph in execution testing section
- Updated decision matrix to gate on ground_truth_match + execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
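The text-to-sql commit gates passing on ground-truth similarity plus execution, since textual similarity alone misses subtle logic errors. A minimal sketch of that idea, using stdlib `difflib` rather than the cookbook's actual FutureAGI eval (function names and the 0.8 threshold here are illustrative assumptions):

```python
from difflib import SequenceMatcher

def sql_similarity(generated: str, ground_truth: str) -> float:
    """Crude textual similarity between two SQL strings (0.0 to 1.0)."""
    norm = lambda s: " ".join(s.lower().split())
    return SequenceMatcher(None, norm(generated), norm(ground_truth)).ratio()

def passes(generated: str, ground_truth: str, threshold: float = 0.8) -> bool:
    # Gate on ground-truth match; a real pipeline would also execute
    # both queries and compare result sets, since textual similarity
    # alone misses subtle logic errors (e.g. > vs >=).
    return sql_similarity(generated, ground_truth) >= threshold
```

Note how a query that swaps `>` for `>=` still scores near 1.0 textually, which is exactly why the decision matrix adds an execution gate on top.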
- Replace broken completeness eval (SDK class lookup bug) with working scanners
- Replace duplicate answer_relevancy with threshold tuning
- Use SecretsScanner + InvisibleCharScanner (local) instead of PIIScanner + ToxicityScanner (broken EvalDelegate 400 errors)
- All sample outputs match real notebook results
- Explain faithfulness catches pricing bug, answer_relevancy local model limitation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
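The swap to local SecretsScanner + InvisibleCharScanner can be pictured with a small stdlib-only sketch. This is a hypothetical illustration of what such local scanners check for, not the FutureAGI SDK's actual API (the pattern list and function names are assumptions):

```python
import re
import unicodedata

# Illustrative patterns only; a real scanner ships a much larger set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9-]{20,}"),  # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
]

def scan_secrets(text: str) -> bool:
    """Return True if the text appears to contain a hardcoded secret."""
    return any(p.search(text) for p in SECRET_PATTERNS)

def scan_invisible_chars(text: str) -> bool:
    """Return True if the text contains zero-width/format characters (Unicode Cf)."""
    return any(unicodedata.category(ch) == "Cf" for ch in text)
```

Because both checks run locally, they avoid the remote EvalDelegate 400 errors the commit mentions for the hosted scanners.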
- security -> prompt_injection, content_moderation -> toxicity
- Remove stale sample outputs, use generic truncated examples
- Update quality score prose to not reference exact scores
- Fix factual_accuracy -> context_adherence
- All code and prose consistent between MDX and notebook
- Remove hardcoded score ranges from prose
- Fix wrong character count (16 -> 15) and ratio (133% -> generic)
- Remove duplicate legal disclaimer paragraph
- Generic interpretive prose instead of specific run observations
- text-to-sql: "All five cases pass" -> "In our test run, all five cases pass"
- coding-agent-eval: "All six scenarios pass" -> conditional language
coding-agent-eval:
- Fix fact_result/fact_score -> adh_result/adh_score in MDX Step 3
- Remove duplicate paragraph in Step 4

translation-eval:
- Remove exact 125%/250%/30chars from prose
- Remove duplicate paragraphs in Steps 4 and 5
- Fix "130% threshold" -> "per-string-type threshold"
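The move from a single "130% threshold" to a per-string-type threshold could look like the sketch below. The threshold values and type names are assumptions for illustration, not the cookbook's actual configuration:

```python
# Hypothetical per-string-type expansion limits: UI buttons tolerate
# little growth, body text tolerates more.
THRESHOLDS = {"button": 1.3, "paragraph": 2.0}
DEFAULT_THRESHOLD = 1.5

def length_ratio_ok(source: str, translated: str, string_type: str) -> bool:
    """Flag translations whose length expansion exceeds the per-type limit."""
    if not source:
        return True
    ratio = len(translated) / len(source)
    return ratio <= THRESHOLDS.get(string_type, DEFAULT_THRESHOLD)
```

Keying the limit on string type catches a German button label that triples in length without rejecting legitimately longer translated paragraphs.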
- Fix deprecated metric names: security -> prompt_injection, content_moderation -> toxicity
- Red-teaming: naive v1 prompt (role only), is_pass fix, synthetic data narrative, EduBright framing
- Compliance: toxicity instead of content_moderation in code and prose
- All sample outputs and prose references updated
…optimization-loop

- compliance-hipaa-gdpr: fix INPUT/OUTPUT_RULES alignment
- domain-hallucination-detection: real classification results, turing_small fix
- end-to-end-agent-testing: critical analysis, FMA, optimization trials
- red-teaming-llm: real Protect results (7/10 blocked), RT-007 fix narrative
- Remove simulation-optimization-loop (merged into end-to-end)
- Update navigation to remove deleted page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace sk-proj-* pattern with your-openai-api-key placeholder in the
hardcoded_secret test snippet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep both copy-button script and FastNav component.
…e prod-quality-monitoring

- All 10 use-case intros now explicitly name the FutureAGI features used
  (Simulate, Evals, Optimize, Protect, Observe, Agent Compass, Knowledge Base,
  Prompt Management, Experimentation, Datasets, AutoEvalPipeline)
- production-quality-monitoring: split Step 1 into Step 1 (Define agent)
  and Step 2 (Trace every call), renumber subsequent steps. Replace old
  trace-spans video with three screenshots that show the trace detail,
  the eval columns in the trace table, and the populated Evals tab
- end-to-end-agent-testing: bump default to 100 conversations in step
  title and dashboard config, add scale-flex paragraph, polish duplicate
  Chat Details sentence with bolded eval names
- coding-agent-eval: replace sk-proj fake key in test data with placeholder
… cookbooks

secure-ai-evals-guardrails:
- Update intro to name FutureAGI Protect and Evals as bold proper nouns
- Add "before guardrails" demo in Step 1 showing the chatbot failing at
  prompt injection and PII leakage before any enforcement is added
- Add cross-links to end-to-end-agent-testing and production-quality-monitoring
  cookbooks in Steps 5 and 6
- Replace Explore further cards with valid sibling + quickstart links

domain-hallucination-detection:
- Add new Step 1 "Meet the chatbot you are evaluating" introducing the
  MediSafe pharma chatbot agent and the three hallucination patterns
- Reframe KB step intro to explain how Knowledge Base enables grounded
  evaluation (cross-references responses against source documents)
- Reframe test cases as real production interactions (not hypothetical),
  with cross-link to Simulate for generating them at scale
- Add cross-link to end-to-end-agent-testing in intro for readers who
  haven't built their agent yet
…nt fixes

Apply reviewer feedback across all 10 remaining use-case cookbooks:

- Intros now name exact FutureAGI features with bold proper nouns
  (Protect metrics by canonical name, Evals with evaluate() vs Evaluator
  distinction, Knowledge Base with indexing details, Simulate with
  scenario types, Prompt Management with version/label system)
- Cross-links added at natural points in narrative (not as lists):
  links to End-to-End Agent Testing, Production Quality Monitoring,
  Secure AI Evals, Protect Guardrails, Custom Eval Metrics, etc.
- Explore further CardGroups replaced with valid links using
  confirmed sidebar icons (flask, gauge, shield, zap, rocket, etc.)
- No fabricated analysis or data changes
- No em-dashes