
Environment configuration reference

Environments are configured by creating a config.mjs or config.js file that exposes an object that satisfies the EnvironmentConfig interface. This document covers all options in EnvironmentConfig and what they do.

Required properties

All of these properties have to be specified in order for the environment to function.

displayName

Human-readable name that is shown in eval reports about this environment.

id

Unique ID for the environment. If omitted, one is generated from the displayName.

clientSideFramework

ID of the client-side framework that the environment runs, for example angular.

ratings

An array defining the ratings that are executed as a part of the evaluation. The ratings determine the score assigned for the test run. Currently, the tool supports the following built-in ratings:

| Rating Name | Description |
| --- | --- |
| PerBuildRating | Assigns a score based on the build result of the generated code, e.g. "Does it build on the first run?" or "Does it build after X repair attempts?" |
| PerFileRating | Assigns a score based on the content of individual files generated by the LLM. Can be run either against all file types by setting the filter to PerFileRatingContentType.UNKNOWN or against specific files. |
| LLMBasedRating | Rates the generated code by asking an LLM to assign a score to it, e.g. "Does this app match the specified prompts?" |

packageManager

Name of the package manager to use to install dependencies for the evaluated code. Supports npm, pnpm and yarn. Defaults to npm.

generationSystemPrompt

Relative path to the system instructions that should be passed to the LLM when generating code.

repairSystemPrompt

Relative path to the system instructions that should be passed to the LLM when repairing failures.

executablePrompts

Configures the prompts that should be evaluated against the environment. Can contain either strings, which represent glob patterns pointing to text files with the prompt's text (e.g. ./prompts/**/*.md), or MultiStepPrompt objects (see below). Prompts can be shared between environments (e.g. executablePrompts: ['../some-other-env/prompts/**/*.md']).

classifyPrompts

When enabled, the system prompts for this environment won't be included in the final report. This is useful when evaluating confidential code.

skipInstall

Whether to skip installing dependencies during the eval run. This is useful if you've already installed dependencies through something like pnpm workspaces.
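
Putting the required properties together, a minimal config.mjs might look like the sketch below. The property names are the ones documented above; the concrete values (framework ID, paths, prompt globs) and the use of a default export are illustrative assumptions, and the ratings array is left as a placeholder because rating construction is tool-specific.

export default {
  displayName: 'Example Angular environment',
  clientSideFramework: 'angular',
  ratings: [
    // Instances of the built-in ratings from the table above
    // (PerBuildRating, PerFileRating, LLMBasedRating).
  ],
  packageManager: 'npm',
  generationSystemPrompt: './system-instructions.md',
  repairSystemPrompt: './repair-instructions.md',
  executablePrompts: ['./prompts/**/*.md'],
};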

Prompt templating

Prompts are typically stored in .md files. The tool supports the following template syntax inside these files to augment the prompt and reduce boilerplate:

| Helper / Variable | Description |
| --- | --- |
| {{> embed file='../path/to/file.md' }} | Embeds the content of the specified file in the current one. |
| {{> contextFiles '**/*.foo' }} | Specifies files that should be passed to the LLM as context when the prompt is executed. Should be a comma-separated string of glob patterns within the environment's project code. E.g. {{> contextFiles '**/*.ts, **/*.html' }} passes all .ts and .html files as context. |
| {{CLIENT_SIDE_FRAMEWORK_NAME}} | Inserts the name of the client-side framework of the current environment. |
| {{FULL_STACK_FRAMEWORK_NAME}} | Inserts the name of the full-stack framework of the current environment. |
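
For example, a prompt file combining these helpers might look like the following (the embedded file path, glob patterns, and prompt text are illustrative):

my-env/prompts/settings-page.md:

{{> embed file='../shared/style-guide.md' }}
{{> contextFiles '**/*.ts, **/*.html' }}

Create a settings page for the {{CLIENT_SIDE_FRAMEWORK_NAME}} app that lets users change their display name.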

Prompt-specific ratings

If you want to run a set of ratings against a specific prompt, use an object literal in the executablePrompts array instead of a string:

executablePrompts: [
  // Runs only with the environment-level ratings.
  './prompts/foo/*.md',

  // Runs the ratings specific to the `contact-form.md`, as well as the environment-level ones.
  {
    path: './prompts/bar/contact-form.md',
    ratings: contactFormSpecificRatings,
  },
];

Multi-step prompts

Multi-step prompts evaluate workflows composed of one or more steps. Steps execute one after another inside the same directory, but are rated individually. The tool takes snapshots after each step and includes them in the final report. You can create a multi-step prompt by passing an instance of the MultiStepPrompt class into the executablePrompts array, for example:

executablePrompts: [
  new MultiStepPrompt('./prompts/about-page', {
    'step-1': ratingsForFirstStep,
    'step-2': [...ratingsForFirstStep, ratingsForSecondStep],
  }),
];

The first parameter is the directory from which to resolve the individual step prompts. All files in the directory have to be named step-{number}.md, for example:

my-env/prompts/about-page/step-1.md:

Create an "About us" page.

my-env/prompts/about-page/step-2.md:

Add a contact form to the "About us" page.

my-env/prompts/about-page/step-3.md:

Make it so submitting the contact form redirects the user back to the homepage.

The second parameter of MultiStepPrompt defines ratings that should be run only against specific steps. The key is the name of the step (e.g. step-2) while the value is an array of ratings that should run against it.

Optional properties

These properties aren't required for the environment to run, but can be used to configure it further.

sourceDirectory

Directory into which the LLM-generated files are written, built, executed, and evaluated. Can be an entire project or a handful of files to be merged with the projectTemplate (see below).

projectTemplate

Used to reduce boilerplate when setting up an environment, projectTemplate specifies the path of a project template that is merged with the files from sourceDirectory, creating the final project structure against which the evaluation runs.

For example, if the config has projectTemplate: './templates/angular' and sourceDirectory: './project', the eval runner copies the files from ./templates/angular into the output directory and then applies the files from ./project on top of them, merging directories and replacing overlapping files.
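
As a concrete sketch with hypothetical file names: if ./templates/angular contains package.json and src/main.ts, and ./project contains src/main.ts and src/app/app.ts, the output directory ends up with:

package.json     (from the template)
src/main.ts      (from ./project, replacing the template's copy)
src/app/app.ts   (from ./project)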

fullStackFramework

Name of the full-stack framework that is used in the evaluation, in addition to the clientSideFramework. If omitted, the fullStackFramework is set to the same value as the clientSideFramework.

mcpServers

IDs of Model Context Protocol (MCP) servers that are started and exposed to the LLM as a part of the evaluation.

buildCommand

Command used to build the generated code as a part of the evaluation. Defaults to <package manager> run build.

serveCommand

Command used to start a local dev server as a part of the evaluation. Defaults to <package manager> run start --port 0.

testCommand

Command used to run tests against the generated code. If this property is not provided, tests will not be run. The command should exit with code 0 on success and a non-zero exit code on failure. The output from the command (both stdout and stderr) is captured and used for repair attempts if the tests fail. The test command will time out after 4 minutes.
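
As an illustration, the three commands could be overridden in the config like this (the specific scripts are assumptions about the evaluated project, not defaults of the tool):

export default {
  // ...required properties...
  buildCommand: 'npm run build:ci',
  serveCommand: 'npm run start -- --port 0',
  testCommand: 'npm run test:e2e',
};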