Environments are configured by creating a `config.mjs` or `config.js` file that exposes an object
that satisfies the `EnvironmentConfig` interface. This document covers all options in
`EnvironmentConfig` and what they do.
All of the following properties have to be specified in order for the environment to function.
Human-readable name that is shown in eval reports about this environment.
Unique ID for the environment. If omitted, one is generated from the displayName.
ID of the client-side framework that the environment runs, for example `angular`.
An array defining the ratings that are executed as a part of the evaluation. The ratings determine the score assigned for the test run. Currently, the tool supports the following built-in ratings:
| Rating Name | Description |
|---|---|
| `PerBuildRating` | Assigns a score based on the build result of the generated code, e.g. "Does it build on the first run?" or "Does it build after X repair attempts?" |
| `PerFileRating` | Assigns a score based on the content of individual files generated by the LLM. Can be run either against all file types, by setting the filter to `PerFileRatingContentType.UNKNOWN`, or against specific files. |
| `LLMBasedRating` | Rates the generated code by asking an LLM to assign a score to it, e.g. "Does this app match the specified prompts?" |
Name of the package manager to use to install dependencies for the evaluated code.
Supports `npm`, `pnpm` and `yarn`. Defaults to `npm`.
Relative path to the system instructions that should be passed to the LLM when generating code.
Relative path to the system instructions that should be passed to the LLM when repairing failures.
Configures the prompts that should be evaluated against the environment. Can contain either strings
representing glob patterns that point to text files with the prompt's text
(e.g. `./prompts/**/*.md`), or `MultiStepPrompt` objects (see below).
Prompts can be shared between environments
(e.g. `executablePrompts: ['../some-other-env/prompts/**/*.md']`).
When enabled, the system prompts for this environment won't be included in the final report. This is useful when evaluating confidential code.
Whether to skip installing dependencies during the eval run. This is useful if you've already installed dependencies through something like pnpm workspaces.
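Putting the required properties together, a config file might look something like the sketch below. The key names for the package manager and the two system instruction paths are assumptions based on the descriptions above; check the `EnvironmentConfig` interface for the authoritative names:

```js
// config.mjs — minimal sketch of an environment configuration.
export default {
  displayName: 'My Angular environment',
  clientSideFramework: 'angular',
  // Array of PerBuildRating / PerFileRating / LLMBasedRating instances.
  ratings: myRatings,
  // The key names below are assumptions based on the property descriptions:
  packageManager: 'pnpm',
  generationSystemPrompt: './system-instructions.md',
  repairSystemPrompt: './repair-instructions.md',
  executablePrompts: ['./prompts/**/*.md'],
};
```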
Prompts are typically stored in `.md` files. The tool supports the following template syntax inside
these files in order to augment the prompt and reduce boilerplate:

| Helper / Variable | Description |
|---|---|
| `{{> embed file='../path/to/file.md' }}` | Embeds the content of the specified file in the current one. |
| `{{> contextFiles '**/*.foo' }}` | Specifies files that should be passed to the LLM as context when the prompt is executed. Takes a comma-separated string of glob patterns within the environment's project code, e.g. `{{> contextFiles '**/*.ts, **/*.html' }}` passes all `.ts` and `.html` files as context. |
| `{{CLIENT_SIDE_FRAMEWORK_NAME}}` | Inserts the name of the client-side framework of the current environment. |
| `{{FULL_STACK_FRAMEWORK_NAME}}` | Inserts the name of the full-stack framework of the current environment. |
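For example, a prompt file could combine several of these helpers (the file paths here are illustrative):

```md
{{> embed file='../shared/style-guide.md' }}
{{> contextFiles '**/*.ts, **/*.html' }}

Create a login page with {{CLIENT_SIDE_FRAMEWORK_NAME}} that follows the
style guide above.
```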
If you want to run a set of ratings against a specific prompt, add an object literal to the
`executablePrompts` array instead of a string:

```js
executablePrompts: [
  // Runs only with the environment-level ratings.
  './prompts/foo/*.md',

  // Runs the ratings specific to `contact-form.md`, as well as the environment-level ones.
  {
    path: './prompts/bar/contact-form.md',
    ratings: contactFormSpecificRatings,
  },
];
```

Multistep prompts evaluate workflows composed of one or more steps.
Steps execute one after another inside the same directory, but are rated individually. The tool
takes snapshots after each step and includes them in the final report. You can create a multistep
prompt by passing an instance of the `MultiStepPrompt` class into the `executablePrompts` array,
for example:
```js
executablePrompts: [
  new MultiStepPrompt('./prompts/about-page', {
    'step-1': ratingsForFirstStep,
    'step-2': [...ratingsForFirstStep, ...ratingsForSecondStep],
  }),
];
```

The first parameter is the directory from which to resolve the individual step prompts.
All files in the directory have to be named `step-{number}.md`, for example:

`my-env/prompts/about-page/step-1.md`:

```
Create an "About us" page.
```

`my-env/prompts/about-page/step-2.md`:

```
Add a contact form to the "About us" page.
```

`my-env/prompts/about-page/step-3.md`:

```
Make it so submitting the contact form redirects the user back to the homepage.
```
The second parameter of `MultiStepPrompt` defines ratings that should run only against specific
steps. The key is the name of the step (e.g. `step-2`), while the value is the set of ratings that
should run against it.
These properties aren't required for the environment to run, but can be used to configure it further.
Directory into which the LLM-generated files are written, and in which they are built, executed,
and evaluated. Can contain an entire project or a handful of files to be merged with the
`projectTemplate` (see below).
Used to reduce boilerplate when setting up an environment, `projectTemplate` specifies the
path of a project template to be merged with the files from `sourceDirectory`, creating
the final project structure against which the evaluation runs.
For example, if the config has
`projectTemplate: './templates/angular'` and `sourceDirectory: './project'`,
the eval runner copies the files from `./templates/angular` into the output directory
and then applies the files from `./project` on top of them, merging directories and replacing
overlapping files.
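As an illustration of the merge, with hypothetical file names, given these inputs:

```
templates/angular/       project/
├── package.json         └── src/
├── angular.json             ├── app.ts
└── src/                     └── main.ts
    └── main.ts
```

the output directory would contain `package.json` and `angular.json` from the template, `src/app.ts` from the source directory, and the source directory's `src/main.ts` (replacing the template's copy).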
Name of the full-stack framework that is used in the evaluation, in addition to the
`clientSideFramework`. If omitted, the `fullStackFramework` is set to the same value as
the `clientSideFramework`.
IDs of Model Context Protocol (MCP) servers that are started and exposed to the LLM as a part of the evaluation.
Command used to build the generated code as a part of the evaluation.
Defaults to `<package manager> run build`.
Command used to start a local dev server as a part of the evaluation.
Defaults to `<package manager> run start --port 0`.
Command used to run tests against the generated code. If this property is not provided, tests will not be run. The command should exit with code 0 on success and a non-zero exit code on failure. The output from the command (both stdout and stderr) is captured and used for repair attempts if the tests fail. The test command will time out after 4 minutes.