chore: sync Arize skills from arize-skills by jimbobbennett · Pull Request #1690 · github/awesome-copilot

jimbobbennett · 2026-05-13T00:56:13Z

Pull Request Checklist

I have read and followed the CONTRIBUTING.md guidelines.
I have read and followed the Guidance for submissions involving paid services.
My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
The file follows the required naming convention.
The content is clearly structured and follows the example format.
I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
I have run npm start and verified that README.md is up to date.
I am targeting the staged branch for this pull request.

Description

Updating the Arize AX and Phoenix skills to the latest version.

Type of Contribution

Additional Notes

By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

…cabff161d8aae6 and phoenix@30ccbe6b38cc83719038bf30041335f29bae45e9

github-actions · 2026-05-13T00:56:58Z

🔍 Skill Validator Results

⚠️ Warnings or advisories found

| Scope | Checked |
|---|---:|
| Skills | 11 |
| Agents | 1 |
| Total | 12 |

| Severity | Count |
|---|---:|
| ❌ Errors | 0 |
| ⚠️ Warnings | 14 |
| ℹ️ Advisories | 0 |

Summary

| Level | Finding |
|---|---|
| ℹ️ | Found 11 skill(s) |
| ℹ️ | [arize-ai-provider-integration] 📊 arize-ai-provider-integration: 2,684 BPE tokens [chars/4: 2,601] (standard ~), 29 sections, 16 code blocks |
| ℹ️ | [arize-ai-provider-integration] ⚠ Skill is 2,684 BPE tokens (chars/4 estimate: 2,601) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-ai-provider-integration] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably. |
| ℹ️ | [arize-annotation] 📊 arize-annotation: 2,528 BPE tokens [chars/4: 2,696] (standard ~), 27 sections, 15 code blocks |
| ℹ️ | [arize-annotation] ⚠ Skill is 2,528 BPE tokens (chars/4 estimate: 2,696) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-annotation] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably. |
| ℹ️ | [arize-dataset] 📊 arize-dataset: 3,861 BPE tokens [chars/4: 3,854] (standard ~), 51 sections, 16 code blocks |
| ℹ️ | [arize-dataset] ⚠ Skill is 3,861 BPE tokens (chars/4 estimate: 3,854) — approaching "comprehensive" range where gains diminish. |
| ℹ️ | [arize-evaluator] 📊 arize-evaluator: 7,825 BPE tokens [chars/4: 8,053] (comprehensive ✗), 59 sections, 28 code blocks |

Full validator output

```text
Found 11 skill(s)
[arize-ai-provider-integration] 📊 arize-ai-provider-integration: 2,684 BPE tokens [chars/4: 2,601] (standard ~), 29 sections, 16 code blocks
[arize-ai-provider-integration] ⚠ Skill is 2,684 BPE tokens (chars/4 estimate: 2,601) — approaching "comprehensive" range where gains diminish.
[arize-ai-provider-integration] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-annotation] 📊 arize-annotation: 2,528 BPE tokens [chars/4: 2,696] (standard ~), 27 sections, 15 code blocks
[arize-annotation] ⚠ Skill is 2,528 BPE tokens (chars/4 estimate: 2,696) — approaching "comprehensive" range where gains diminish.
[arize-annotation] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[arize-dataset] 📊 arize-dataset: 3,861 BPE tokens [chars/4: 3,854] (standard ~), 51 sections, 16 code blocks
[arize-dataset] ⚠ Skill is 3,861 BPE tokens (chars/4 estimate: 3,854) — approaching "comprehensive" range where gains diminish.
[arize-evaluator] 📊 arize-evaluator: 7,825 BPE tokens [chars/4: 8,053] (comprehensive ✗), 59 sections, 28 code blocks
[arize-evaluator] ⚠ Skill is 7,825 BPE tokens (chars/4 estimate: 8,053) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-experiment] 📊 arize-experiment: 4,616 BPE tokens [chars/4: 4,646] (standard ~), 34 sections, 20 code blocks
[arize-experiment] ⚠ Skill is 4,616 BPE tokens (chars/4 estimate: 4,646) — approaching "comprehensive" range where gains diminish.
[arize-instrumentation] 📊 arize-instrumentation: 6,117 BPE tokens [chars/4: 6,210] (comprehensive ✗), 19 sections, 4 code blocks
[arize-instrumentation] ⚠ Skill is 6,117 BPE tokens (chars/4 estimate: 6,210) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[arize-link] 📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks
[arize-prompt-optimization] 📊 arize-prompt-optimization: 4,489 BPE tokens [chars/4: 4,799] (standard ~), 58 sections, 19 code blocks
[arize-prompt-optimization] ⚠ Skill is 4,489 BPE tokens (chars/4 estimate: 4,799) — approaching "comprehensive" range where gains diminish.
[arize-trace] 📊 arize-trace: 5,896 BPE tokens [chars/4: 5,853] (comprehensive ✗), 43 sections, 10 code blocks
[arize-trace] ⚠ Skill is 5,896 BPE tokens (chars/4 estimate: 5,853) — "comprehensive" skills hurt performance by 2.9pp on average. Consider splitting into 2–3 focused skills.
[phoenix-cli] 📊 phoenix-cli: 3,920 BPE tokens [chars/4: 4,050] (standard ~), 20 sections, 17 code blocks
[phoenix-cli] ⚠ Skill is 3,920 BPE tokens (chars/4 estimate: 4,050) — approaching "comprehensive" range where gains diminish.
[phoenix-cli] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
[phoenix-evals] 📊 phoenix-evals: 1,089 BPE tokens [chars/4: 1,126] (detailed ✓), 5 sections, 0 code blocks
[phoenix-evals] ⚠ No code blocks — agents perform better with concrete snippets and commands.
[phoenix-evals] ⚠ No numbered workflow steps — agents follow sequenced procedures more reliably.
✅ All checks passed (11 skill(s))
```
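Each 📊 line in the validator output follows the same shape, so it can be read mechanically when comparing skills. The following is a minimal sketch, not part of the validator itself: the `SkillStats` type and the `parseStatsLine` and `estimateGap` helpers are hypothetical names, and the regular expression is inferred only from the lines shown above.

```typescript
// Hypothetical record for one "📊" stats line from the validator output.
interface SkillStats {
  skill: string;
  bpeTokens: number;
  charsEstimate: number; // the "chars/4" figure
  band: string;          // e.g. "detailed", "standard", "comprehensive"
  sections: number;
  codeBlocks: number;
}

// Pattern inferred from the output above; an assumption, not a documented format.
const STATS_LINE =
  /^\[([^\]]+)\]\s+\S+\s+[^:]+:\s+([\d,]+) BPE tokens \[chars\/4: ([\d,]+)\] \((\w+)[^)]*\), (\d+) sections, (\d+) code blocks$/;

// Converts "1,239" to 1239.
function toInt(text: string): number {
  return parseInt(text.replace(/,/g, ""), 10);
}

// Returns parsed stats, or null for warning lines and anything else that does not match.
function parseStatsLine(line: string): SkillStats | null {
  const match = STATS_LINE.exec(line.trim());
  if (match === null) {
    return null;
  }
  return {
    skill: match[1],
    bpeTokens: toInt(match[2]),
    charsEstimate: toInt(match[3]),
    band: match[4],
    sections: toInt(match[5]),
    codeBlocks: toInt(match[6]),
  };
}

// Relative difference between the chars/4 estimate and the BPE count.
function estimateGap(stats: SkillStats): number {
  return (stats.charsEstimate - stats.bpeTokens) / stats.bpeTokens;
}

const sample = parseStatsLine(
  "[arize-link] 📊 arize-link: 1,239 BPE tokens [chars/4: 1,121] (detailed ✓), 9 sections, 6 code blocks",
);
if (sample !== null) {
  console.log(sample.skill, sample.band, estimateGap(sample).toFixed(3));
}
```

Run over every line of the output, a filter on `band === "comprehensive"` would pick out the three skills the validator suggests splitting.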

Copilot

Pull request overview

This PR syncs Arize- and Phoenix-related skills/reference documentation to the latest upstream versions, expanding guidance for dataset upserts, Phoenix CLI workflows (open/axial coding), and refreshing Arize skill metadata/descriptions.

Changes:

Update Phoenix eval dataset references (Python/TypeScript) to document upsert semantics, stable example IDs, and split handling.
Expand the Phoenix CLI skill and workflow references (open coding / axial coding) with identifiers, sidecar handoff, profiles, and deletion/cleanup guidance.
Refresh Arize skill SKILL.md frontmatter descriptions/metadata and update the generated skills index entries accordingly.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.

| File | Description |
|---|---|
| skills/phoenix-evals/references/experiments-datasets-typescript.md | Adds upsert + stable example ID guidance and updates the documented example type shape. |
| skills/phoenix-evals/references/experiments-datasets-python.md | Documents upsert behavior, stable IDs, and split key guidance for Python dataset creation. |
| skills/phoenix-cli/SKILL.md | Expands Phoenix CLI reference commands and introduces profiles + coding identifier workflow framing. |
| skills/phoenix-cli/references/open-coding.md | Substantially expands open-coding workflow (unit of analysis, identifiers, sidecar, UI filter, cleanup). |
| skills/phoenix-cli/references/axial-coding.md | Updates axial-coding workflow to use the open-coding identifier + sidecar-based gather/quantify. |
| skills/arize-trace/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-prompt-optimization/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-link/SKILL.md | Refreshes skill description and adds metadata fields. |
| skills/arize-instrumentation/SKILL.md | Refreshes skill description and expands guidance (including Go) plus metadata/compatibility fields. |
| skills/arize-experiment/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-evaluator/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-dataset/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-annotation/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| skills/arize-ai-provider-integration/SKILL.md | Refreshes skill description and adds metadata/compatibility fields. |
| docs/README.skills.md | Updates the skills index table descriptions for the Arize skills to match the refreshed SKILL.md content. |

 });
+
+// With stable example IDs for targeted updates across uploads
+const { datasetId } = await createDataset({


+interface Example {
  input: Record<string, unknown>;    // Task input
-  output?: Record<string, unknown>;  // Expected output
-  metadata?: Record<string, unknown>; // Additional context
+  output?: Record<string, unknown> | null;  // Expected output
+  metadata?: Record<string, unknown> | null; // Additional context
+  splits?: string | string[] | null; // Split assignment ("train", ["train", "easy"], etc.)
+  spanId?: string | null;            // OTEL span ID to link back to source trace
+  id?: string | null;                // Stable user-provided ID; server updates matching row
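The `id` comment in the diff above says the server updates the matching row when an example carries a stable ID. The sketch below illustrates that upsert behavior with a local stand-in only: `InMemoryDataset` is a hypothetical class, not part of the Phoenix client, and appending examples that have no `id` is an assumption rather than documented behavior.

```typescript
// Mirrors the Example shape from the diff above.
interface Example {
  input: Record<string, unknown>;
  output?: Record<string, unknown> | null;
  metadata?: Record<string, unknown> | null;
  splits?: string | string[] | null;
  spanId?: string | null;
  id?: string | null;
}

// Hypothetical in-memory stand-in for the server-side dataset (not a Phoenix API).
class InMemoryDataset {
  private rows = new Map<string, Example>();
  private anonymousCount = 0;

  // Rows with a known id are replaced; rows with a new id are added.
  // Assumption: examples without an id are always appended as new rows.
  upsert(examples: Example[]): void {
    for (const example of examples) {
      const key = example.id ?? `anonymous-${this.anonymousCount++}`;
      this.rows.set(key, example);
    }
  }

  get(id: string): Example | undefined {
    return this.rows.get(id);
  }

  get size(): number {
    return this.rows.size;
  }
}

const dataset = new InMemoryDataset();

// First upload: two examples with stable IDs and split assignments.
dataset.upsert([
  { id: "q-001", input: { question: "What is 2 + 2?" }, output: { answer: "5" }, splits: "train" },
  { id: "q-002", input: { question: "What is 3 + 3?" }, output: { answer: "6" }, splits: ["train", "easy"] },
]);

// Second upload: the same ID corrects the expected output instead of adding a row.
dataset.upsert([
  { id: "q-001", input: { question: "What is 2 + 2?" }, output: { answer: "4" }, splits: "train" },
]);

console.log(dataset.size, dataset.get("q-001")?.output);
```

Keying on a caller-supplied ID is what makes repeated uploads idempotent: re-running the same upload leaves the row count unchanged, which is the property the stable-ID guidance is aiming for.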


 ---
 name: arize-ai-provider-integration
-description: "INVOKE THIS SKILL when creating, reading, updating, or deleting Arize AI integrations. Covers listing integrations, creating integrations for any supported LLM provider (OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM, custom), updating credentials or metadata, and deleting integrations using the ax CLI."
+description: Creates, reads, updates, and deletes Arize AI integrations that store LLM provider credentials used by evaluators and other Arize features. Supports any LLM provider (e.g. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Vertex AI, Gemini, NVIDIA NIM). Use when the user mentions AI integration, LLM provider credentials, create integration, list integrations, update credentials, delete integration, or connecting an LLM provider to Arize.


 ---
 name: arize-annotation
-description: "INVOKE THIS SKILL when creating, managing, or using annotation configs or annotation queues on Arize (categorical, continuous, freeform), or applying human annotations to project spans via the Python SDK. Configs are the label schema for human feedback; queues are review workflows that route records to annotators. Triggers: annotation config, annotation queue, label schema, human feedback schema, bulk annotate spans, update_annotations, labeling queue, annotate record."
+description: Creates and manages annotation configs (categorical, continuous, freeform label schemas) and annotation queues (human review workflows) on Arize. Applies human annotations to project spans via the Python SDK. Use when the user mentions annotation config, annotation queue, label schema, human feedback, bulk annotate spans, update_annotations, labeling queue, annotate record, or human review.


 ---
 name: arize-dataset
-description: "INVOKE THIS SKILL when creating, managing, or querying Arize datasets and examples. Also use when the user needs test data or evaluation examples for their model. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI."
+description: Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the ax CLI. Use when the user needs test data, evaluation examples, or mentions create dataset, list datasets, export dataset, append examples, dataset version, golden dataset, or test set.


 ---
 name: arize-experiment
-description: "INVOKE THIS SKILL when creating, running, or analyzing Arize experiments. Also use when the user wants to evaluate or measure model performance, compare models (including GPT-4, Claude, or others), or assess how well their AI is doing. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI."
+description: Creates, runs, and analyzes Arize experiments for evaluating and comparing model performance. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI. Use when the user mentions create experiment, run experiment, compare models, model performance, evaluate AI, experiment results, benchmark, A/B test models, or measure accuracy.


 ---
 name: arize-instrumentation
-description: "INVOKE THIS SKILL when adding Arize AX tracing or observability to an app for the first time, or when the user wants to instrument their LLM app or get started with LLM observability. Follow the Agent-Assisted Tracing two-phase flow: analyze the codebase (read-only), then implement after user confirmation. When the app uses LLM tool/function calling, add manual CHAIN + TOOL spans. Leverages https://arize.com/docs/ax/alyx/tracing-assistant and https://arize.com/docs/PROMPT.md."
+description: Adds Arize AX tracing to an LLM application for the first time. Follows a two-phase agent-assisted flow to analyze the codebase then implement instrumentation after user confirmation. Use when the user wants to instrument their app, add tracing from scratch, set up LLM observability, integrate OpenTelemetry or openinference, or get started with Arize tracing.


 ---
 name: arize-link
-description: Generate deep links to the Arize UI. Use when the user wants a clickable URL to open or share a specific trace, span, session, dataset, labeling queue, evaluator, or annotation config, or when sharing Arize resources with team members.
+description: Generates deep links to the Arize UI for traces, spans, sessions, datasets, labeling queues, evaluators, and annotation configs. Produces clickable URLs for sharing Arize resources with team members. Use when the user wants to link to or open a trace, span, session, dataset, evaluator, or annotation config in the Arize UI.


 ---
 name: arize-prompt-optimization
-description: "INVOKE THIS SKILL when optimizing, improving, or debugging LLM prompts using production trace data, evaluations, and annotations. Also use when the user wants to make their AI respond better or improve AI output quality. Covers extracting prompts from spans, gathering performance signal, and running a data-driven optimization loop using the ax CLI."
+description: Optimizes, improves, and debugs LLM prompts using production trace data, evaluations, and annotations. Extracts prompts from spans, gathers performance signal, and runs a data-driven optimization loop using the ax CLI. Use when the user mentions optimize prompt, improve prompt, make AI respond better, improve output quality, prompt engineering, prompt tuning, or system prompt improvement.


 ---
 name: arize-trace
-description: "INVOKE THIS SKILL when downloading, exporting, or inspecting Arize traces and spans, or when a user wants to look at what their LLM app is doing using existing trace data, or when an already-instrumented app has a bug or error to investigate. Use for debugging unknown runtime issues, failures, and behavior regressions. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation with the ax CLI."
+description: Downloads, exports, and inspects existing Arize traces and spans to understand what an LLM app is doing or debug runtime issues. Covers exporting traces by ID, spans by ID, sessions by ID, and root-cause investigation using the ax CLI. Use when the user wants to look at existing trace data, see what their LLM app is doing, export traces, download spans, investigate errors, or analyze behavior regressions.


chore: sync Arize skills from arize-skills@6a622b6c962907f54ca3578cb2…

130836d

…cabff161d8aae6 and phoenix@30ccbe6b38cc83719038bf30041335f29bae45e9

Copilot AI review requested due to automatic review settings May 13, 2026 00:56

jimbobbennett requested a review from aaronpowell as a code owner May 13, 2026 00:56

github-actions Bot added the "skills" label (PR touches skills) May 13, 2026

Copilot started reviewing on behalf of jimbobbennett May 13, 2026 00:56

github-actions Bot added the "skill-check-warning" label (Skill validator reported warnings) May 13, 2026

Copilot AI reviewed May 13, 2026

aaronpowell approved these changes May 13, 2026

aaronpowell merged commit a4d0afc into github:staged May 13, 2026
15 of 16 checks passed

aaronpowell merged 1 commit into github:staged from Arize-ai:sync/arize-skills


3 participants
