chore: sync Arize skills from arize-skills#1690
Merged
…cabff161d8aae6 and phoenix@30ccbe6b38cc83719038bf30041335f29bae45e9
Contributor
🔍 Skill Validator Results
Summary
Full validator output

```text
Found 11 skill(s)
[arize-ai-provider-integration] 📊 arize-ai-provider-integration: 2,684 BPE tokens [chars/4: 2,601] (standard ~), 29 sections, 16 code blocks
[arize-ai-provider-integration] ⚠ Skill is 2,684 BPE tokens (chars/4 estimate: 2,601) — approaching "comprehensive" range where gains diminish.
[arize-ai-provider-integration] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-annotation] 📊 arize-annotation: 2,528 BPE tokens [chars/4: 2,696] (standard ~), 27 sections, 15 code blocks
[arize-annotation] ⚠ Skill is 2,528 BPE tokens (chars/4 estimate: 2,696) — approaching "comprehensive" range where gains diminish.
[arize-annotation] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-dataset] 📊 arize-dataset: 3,861 BPE tokens [chars/4: 3,854] (standard ~), 51 sections, 16 code blocks
[arize-dataset] ⚠ Skill is 3,861 BPE tokens (chars/4 estimate: 3,854) — approaching "comprehensive" range where gains diminish.
[arize-evaluator] 📊 arize-evaluator: 7,825 BPE tokens [chars/4: 8,053] (comprehensive ✗), 59 sections, 28 code blocks
[arize-evaluator] ⚠ Skill is 7,825 BPE tokens (chars/4 estimate: 8,053) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-experiment] 📊 arize-experiment: 4,616 BPE tokens [chars/4: 4,646] (standard ~), 34 sections, 20 code blocks
[arize-experiment] ⚠ Skill is 4,616 BPE tokens (chars/4 estimate: 4,646) — approaching "comprehensive" range where gains diminish.
[arize-instrumentation] 📊 arize-instrumentation: 6,117 BPE tokens [chars/4: 6,210] (comprehensive ✗), 19 sections, 4 code blocks
[arize-instrumentation] ⚠ Skill is 6,117 BPE tokens (chars/4 estimate: 6,210) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-link] 📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks
[arize-prompt-optimization] 📊 arize-prompt-optimization: 4,489 BPE tokens [chars/4: 4,799] (standard ~), 58 sections, 19 code blocks
[arize-prompt-optimization] ⚠ Skill is 4,489 BPE tokens (chars/4 estimate: 4,799) — approaching "comprehensive" range where gains diminish.
[arize-trace] 📊 arize-trace: 5,896 BPE tokens [chars/4: 5,853] (comprehensive ✗), 43 sections, 10 code blocks
[arize-trace] ⚠ Skill is 5,896 BPE tokens (chars/4 estimate: 5,853) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[phoenix-cli] 📊 phoenix-cli: 3,920 BPE tokens [chars/4: 4,050] (standard ~), 20 sections, 17 code blocks
[phoenix-cli] ⚠ Skill is 3,920 BPE tokens (chars/4 estimate: 4,050) — approaching "comprehensive" range where gains diminish.
[phoenix-cli] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[phoenix-evals] 📊 phoenix-evals: 1,089 BPE tokens [chars/4: 1,126] (detailed ✓), 5 sections, 0 code blocks
[phoenix-evals] ⚠ No code blocks — agents perform better with concrete snippets and commands.
[phoenix-evals] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
✅ All checks passed (11 skill(s))
```
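The 📊 summary lines in the validator output follow a regular pattern, so they can be pulled into structured records when triaging which skills to split. The following is a minimal sketch, not part of the validator itself: `SkillStats`, `parseValidatorOutput`, and the regular expression are hypothetical names and assumptions based only on the line format visible in this comment.

```typescript
// Hypothetical helper: extracts the per-skill summary lines from validator output.
// The line format is inferred from the output in this comment, not from the validator source.
interface SkillStats {
  name: string;
  bpeTokens: number;
  charsOver4: number; // the "chars/4" token estimate
  tier: string;       // e.g. "detailed", "standard", "comprehensive"
  sections: number;
  codeBlocks: number;
}

// Parses numbers such as "2,684" by dropping the thousands separators.
function toInt(s: string | undefined): number {
  return Number.parseInt(String(s).replace(/,/g, ""), 10);
}

function parseValidatorOutput(text: string): SkillStats[] {
  // Example match:
  // "📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks"
  const summaryRe =
    /📊 ([\w-]+): ([\d,]+) BPE tokens \[chars\/4: ([\d,]+)\] \((\w+) [^)]*\), (\d+) sections, (\d+) code blocks/g;
  const stats: SkillStats[] = [];
  let m: RegExpExecArray | null;
  while ((m = summaryRe.exec(text)) !== null) {
    stats.push({
      name: String(m[1]),
      bpeTokens: toInt(m[2]),
      charsOver4: toInt(m[3]),
      tier: String(m[4]),
      sections: toInt(m[5]),
      codeBlocks: toInt(m[6]),
    });
  }
  return stats;
}

// Two lines copied from the output above.
const sampleOutput = [
  "[arize-link] 📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks",
  "[arize-trace] 📊 arize-trace: 5,896 BPE tokens [chars/4: 5,853] (comprehensive ✗), 43 sections, 10 code blocks",
].join("\n");

const parsedStats = parseValidatorOutput(sampleOutput);
// Names of skills the validator placed in the "comprehensive" tier.
const flaggedSkills = parsedStats
  .filter((s) => s.tier === "comprehensive")
  .map((s) => s.name);
console.log(flaggedSkills);
```

Run against the full output, the same filter would list the skills the validator suggests splitting; the warning lines (⚠) are deliberately ignored here.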
Contributor
Pull request overview
This PR syncs Arize- and Phoenix-related skills/reference documentation to the latest upstream versions, expanding guidance for dataset upserts, Phoenix CLI workflows (open/axial coding), and refreshing Arize skill metadata/descriptions.
Changes:
- Update Phoenix eval dataset references (Python/TypeScript) to document upsert semantics, stable example IDs, and split handling.
- Expand the Phoenix CLI skill and workflow references (open coding / axial coding) with identifiers, sidecar handoff, profiles, and deletion/cleanup guidance.
- Refresh Arize skill SKILL.md frontmatter descriptions/metadata and update the generated skills index entries accordingly.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| skills/phoenix-evals/references/experiments-datasets-typescript.md | Adds upsert + stable example ID guidance and updates the documented example type shape. |
| skills/phoenix-evals/references/experiments-datasets-python.md | Documents upsert behavior, stable IDs, and split key guidance for Python dataset creation. |
| skills/phoenix-cli/SKILL.md | Expands Phoenix CLI reference commands and introduces profiles + coding identifier workflow framing. |
| skills/phoenix-cli/references/open-coding.md | Substantially expands open-coding workflow (unit of analysis, identifiers, sidecar, UI filter, cleanup). |
| skills/phoenix-cli/references/axial-coding.md | Updates axial-coding workflow to use the open-coding identifier + sidecar-based gather/quantify. |
| skills/arize-trace/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-prompt-optimization/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-link/SKILL.md | Refreshes skill description and adds metadata fields. |
| skills/arize-instrumentation/SKILL.md | Refreshes skill description and expands guidance (including Go) plus metadata/compatibility fields. |
| skills/arize-experiment/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-evaluator/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-dataset/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-annotation/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-ai-provider-integration/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| docs/README.skills.md | Updates the skills index table descriptions for the Arize skills to match the refreshed SKILL.md content. |
| }); | ||
| // With stable example IDs for targeted updates across uploads | ||
| const { datasetId } = await createDataset({ |
Comment on lines +45 to +51
| interface Example { | ||
| input: Record<string, unknown>; // Task input | ||
| output?: Record<string, unknown>; // Expected output | ||
| metadata?: Record<string, unknown>; // Additional context | ||
| output?: Record<string, unknown> | null; // Expected output | ||
| metadata?: Record<string, unknown> | null; // Additional context | ||
| splits?: string | string[] | null; // Split assignment ("train", ["train", "easy"], etc.) | ||
| spanId?: string | null; // OTEL span ID to link back to source trace | ||
| id?: string | null; // Stable user-provided ID; server updates matching row |
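To make the "server updates matching row" comment on `id` concrete, here is a small local model of upsert-by-ID using the updated `Example` shape from this diff. It is a sketch under assumptions: `upsertExamples` and `normalizeSplits` are hypothetical helpers, they are not part of the Phoenix client, and whether the real server replaces a matching row wholesale (as modeled here) or merges fields is not stated in this PR.

```typescript
// Shape copied from the updated interface in the diff above.
interface Example {
  input: Record<string, unknown>;            // Task input
  output?: Record<string, unknown> | null;   // Expected output
  metadata?: Record<string, unknown> | null; // Additional context
  splits?: string | string[] | null;         // Split assignment
  spanId?: string | null;                    // OTEL span ID
  id?: string | null;                        // Stable user-provided ID
}

// Hypothetical in-memory model of upsert semantics (assumption: a matching id
// replaces the whole row; examples without an id are always appended).
function upsertExamples(existing: Example[], incoming: Example[]): Example[] {
  const result = [...existing];
  for (const ex of incoming) {
    const idx = ex.id == null ? -1 : result.findIndex((r) => r.id === ex.id);
    if (idx >= 0) {
      result[idx] = ex; // same stable ID: update in place
    } else {
      result.push(ex);  // new or ID-less example: append
    }
  }
  return result;
}

// Hypothetical helper: normalizes the three allowed `splits` forms to a string array.
function normalizeSplits(splits: Example["splits"]): string[] {
  if (splits == null) return [];
  return Array.isArray(splits) ? splits : [splits];
}

const firstUpload: Example[] = [
  { id: "q1", input: { question: "2 + 2?" }, output: { answer: "5" }, splits: "train" },
  { id: "q2", input: { question: "3 + 3?" }, output: { answer: "6" }, splits: ["train", "easy"] },
];
const secondUpload: Example[] = [
  { id: "q1", input: { question: "2 + 2?" }, output: { answer: "4" }, splits: "train" }, // corrected row
  { input: { question: "4 + 4?" }, output: null },                                       // no stable ID
];

const merged = upsertExamples(firstUpload, secondUpload);
console.log(merged.length, normalizeSplits(merged[1]?.splits));
```

In this model, re-uploading `q1` corrects the existing row instead of duplicating it, which is the motivation for supplying stable IDs; the ID-less example shows why omitting `id` makes targeted updates impossible later.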
| --- | ||
| name: arize-ai-provider-integration | ||
| description: "INVOKE THIS SKILL when creating, reading, updating, or deleting Arize AI integrations. Covers listing integrations, creating integrations for any supported LLM provider (OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM, custom), updating credentials or metadata, and deleting integrations using the ax CLI." | ||
| description: Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize. |
| --- | ||
| name: arize-annotation | ||
| description: "INVOKE THIS SKILL when creating, managing, or using annotation configs or annotation queues on Arize (categorical, continuous, freeform), or applying human annotations to project spans via the Python SDK. Configs are the label schema for human feedback; queues are review workflows that route records to annotators. Triggers: annotation config, annotation queue, label schema, human feedback schema, bulk annotate spans, update_annotations, labeling queue, annotate record." | ||
| description: Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review. |
| --- | ||
| name: arize-dataset | ||
| description: "INVOKE THIS SKILL when creating, managing, or querying Arize datasets and examples. Also use when the user needs test data or evaluation examples for their model. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI." | ||
| description: Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set. |
| --- | ||
| name: arize-experiment | ||
| description: "INVOKE THIS SKILL when creating, running, or analyzing Arize experiments. Also use when the user wants to evaluate or measure model performance, compare models (including GPT-4, Claude, or others), or assess how well their AI is doing. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI." | ||
| description: Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy. |
| --- | ||
| name: arize-instrumentation | ||
| description: "INVOKE THIS SKILL when adding Arize AX tracing or observability to an app for the first time, or when the user wants to instrument their LLM app or get started with LLM observability. Follow the Agent-Assisted Tracing two-phase flow: analyze the codebase (read-only), then implement after user confirmation. When the app uses LLM tool/function calling, add manual CHAIN + TOOL spans. Leverages https://arize.com/docs/ax/alyx/tracing-assistant and https://arize.com/docs/PROMPT.md." | ||
| description: Adds Arize AX tracing to an LLM application for the first time. Follows a two-phase agent-assisted flow to analyze the codebase then implement instrumentation after user confirmation. Use when the user wants to instrument their app, add tracing from scratch, set up LLM observability, integrate OpenTelemetry or openinference, or get started with Arize tracing. |
| --- | ||
| name: arize-link | ||
| description: Generate deep links to the Arize UI. Use when the user wants a clickable URL to open or share a specific trace, span, session, dataset, labeling queue, evaluator, or annotation config, or when sharing Arize resources with team members. | ||
| description: Generates deep links to the Arize UI for traces, spans, sessions, datasets, labeling queues, evaluators, and annotation configs. Produces clickable URLs for sharing Arize resources with team members. Use when the user wants to link to or open a trace, span, session, dataset, evaluator, or annotation config in the Arize UI. |
| --- | ||
| name: arize-prompt-optimization | ||
| description: "INVOKE THIS SKILL when optimizing, improving, or debugging LLM prompts using production trace data, evaluations, and annotations. Also use when the user wants to make their AI respond better or improve AI output quality. Covers extracting prompts from spans, gathering performance signal, and running a data-driven optimization loop using the ax CLI." | ||
| description: Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement. |
| --- | ||
| name: arize-trace | ||
| description: "INVOKE THIS SKILL when downloading, exporting, or inspecting Arize traces and spans, or when a user wants to look at what their LLM app is doing using existing trace data, or when an already-instrumented app has a bug or error to investigate. Use for debugging unknown runtime issues, failures, and behavior regressions. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation with the ax CLI." | ||
| description: Downloads, exports, and inspects existing Arize traces and spans to understand what an LLM app is doing or debug runtime issues. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation using the ax CLI. Use when the user wants to look at existing trace data, see what their LLM app is doing, export traces, download spans, investigate errors, or analyze behavior regressions. |
aaronpowell approved these changes May 13, 2026
Pull Request Checklist
npm start and verified that README.md is up to date.
staged branch for this pull request.
Description
Updating the Arize AX and Phoenix skills to the latest version.
Type of Contribution
Additional Notes
By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.