To run the script, go to the root of this repo and use the following command:

```bash
python evaluation/scripts/gencode_json.py [options]
```

You first need to set up your API keys. To do so, create a `keys.cfg` file at the root of the repository and add keys as follows:

```
OPENAI_KEY = 'your_api_key'
ANTHROPIC_KEY = 'your_api_key'
GOOGLE_KEY = 'your_api_key'
```
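For reference, a config file in this `KEY = 'value'` shape can be read with a few lines of Python. This is a hedged sketch of one way to parse it, not the repository's actual loading code; the function name `load_keys` is hypothetical.

```python
def load_keys(path="keys.cfg"):
    """Parse a keys.cfg file of `NAME = 'value'` lines into a dict.

    Hypothetical helper: the real scripts may parse the file differently.
    """
    keys = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and anything without a NAME = value shape
            if not line or "=" not in line:
                continue
            name, _, value = line.partition("=")
            # Strip surrounding whitespace and quote characters
            keys[name.strip()] = value.strip().strip("'\"")
    return keys
```

Whichever parser the scripts actually use, the file must live at the repository root so it is found when you run the commands from there.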
For example, to create model results with gpt-4o and the default settings, run:

```bash
python evaluation/scripts/gencode_json.py --model gpt-4o
```

Available options:

- `--model` - Specifies the model name used for generating responses.
- `--output-dir` - Directory to store the generated code outputs (default: `eval_results/generated_code`).
- `--input-path` - Directory containing the JSON files describing the problems (default: `eval/data/problems_all.jsonl`).
- `--prompt-dir` - Directory where prompt files are saved (default: `eval_results/prompt`).
- `--temperature` - Controls the randomness of the generation (default: `0`).
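The option list above maps naturally onto Python's `argparse`. The following is a minimal sketch of what such a parser could look like, with defaults taken from the list; the function name and all implementation details beyond the flags and defaults are assumptions, not the script's actual code.

```python
import argparse

def build_parser():
    """Hypothetical argparse setup mirroring the documented options."""
    p = argparse.ArgumentParser(description="Generate model code outputs")
    p.add_argument("--model", required=True,
                   help="Model name used for generating responses")
    p.add_argument("--output-dir", default="eval_results/generated_code",
                   help="Directory to store the generated code outputs")
    p.add_argument("--input-path", default="eval/data/problems_all.jsonl",
                   help="JSON file describing the problems")
    p.add_argument("--prompt-dir", default="eval_results/prompt",
                   help="Directory where prompt files are saved")
    p.add_argument("--temperature", type=float, default=0.0,
                   help="Controls the randomness of the generation")
    return p
```

With this sketch, `build_parser().parse_args(["--model", "gpt-4o"])` reproduces the example invocation above, with every other option at its default.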
Download the numeric test results and save them as `./eval/data/test_data.h5`.
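To sanity-check the download, you can inspect the HDF5 file with `h5py`. This is a hedged sketch assuming `h5py` is installed; the file's internal group and dataset layout is not documented here, so the helper only lists whatever names it finds.

```python
import h5py

def list_datasets(path="eval/data/test_data.h5"):
    """List every group/dataset name in an HDF5 file.

    Illustrative helper for checking the downloaded test data;
    the actual contents of test_data.h5 are not assumed.
    """
    names = []
    with h5py.File(path, "r") as f:
        f.visit(names.append)  # visit() walks all groups and datasets
    return names
```

If the call succeeds and returns a non-empty list, the file was downloaded and saved correctly.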
To run the script, go to the root of this repo and use the following command:

```bash
python evaluation/scripts/test_generated_code.py
```

Please edit the `test_generated_code.py` source file to specify your model name, results directory, and problem set (if not `problems_all.jsonl`).