# Conversation

Force-pushed `0b99fa4` to `0c564be`, then `0c564be` to `db8afa9`.
On the diff comment `# Assume create_benchmark_task accepts ipynb content (JSON string)` (at `request.slug = task` in the push flow):

> Yes, can confirm: same format as `kaggle kernels push`.
On `batch_schedule_benchmark_task_runs(request)`:

> Client check of task status: only run when the task status is successful.
**andrewmwang** left a comment:

> Looks great! Didn't look too hard at the download flow yet, but the other ones LGTM.
On `def _make_task_slug(task: str) -> ApiBenchmarkTaskSlug`:

> [nit]: `_make_api_task_slug` or `_make_api_task_slug_object`
On the `@task` decorator parsing (`for decorator in node.decorator_list: ...`):

> Nice! Would it also be possible to get the description field here? Then I could set up a big portion of the TDP before the first session completes running.
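One possible way to answer this in the parser, assuming the description lives in the decorated function's docstring (an assumption; the PR may source it elsewhere, e.g. from a decorator keyword), is `ast.get_docstring` on the same nodes the decorator scan already visits:

```python
import ast

def extract_task_descriptions(source: str) -> dict:
    """Map each @task-decorated function name to its docstring (or None)."""
    descriptions = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for decorator in node.decorator_list:
            # @task(...) parses as ast.Call; bare @task as ast.Name or ast.Attribute
            func = decorator.func if isinstance(decorator, ast.Call) else decorator
            if ((isinstance(func, ast.Name) and func.id == "task")
                    or (isinstance(func, ast.Attribute) and func.attr == "task")):
                descriptions[node.name] = ast.get_docstring(node)
                break
    return descriptions

sample = '''
@task
def math_eval():
    """Evaluate a model on grade-school math problems."""
'''

print(extract_task_descriptions(sample))
# -> {'math_eval': 'Evaluate a model on grade-school math problems.'}
```

Since the file is only parsed, never executed, this works even when the `task` decorator isn't importable in the CLI's environment.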
On `print(f"Task URL: {response.url}")`:

> [super nit]: the response URL could be better formatted to include `https://kaggle.com/...`. I could also just do that server side, so feel free to ignore.
On `response = kaggle.benchmarks.benchmark_tasks_api_client.create_benchmark_task(request)`:

> Need to add error handling here. You could read the `ErrorMessage` field in the response.
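A minimal sketch of that check, assuming the generated client surfaces the proto `ErrorMessage` field as an `error_message` attribute (the attribute name and the exception type here are illustrative, not confirmed by this PR):

```python
class BenchmarkPushError(RuntimeError):
    """Raised when the server reports a failure for a benchmark task push."""

def check_push_response(response):
    # Hypothetical mapping: proto `ErrorMessage` -> `error_message` attribute.
    message = getattr(response, "error_message", None)
    if message:
        raise BenchmarkPushError(f"Task push failed: {message}")
    return response

# Usage with a stand-in response object:
class FakeResponse:
    error_message = ""
    url = "https://www.kaggle.com/benchmarks/tasks/math-eval"

check_push_response(FakeResponse())  # empty message: passes through silently
```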
On `request.regex_filter = regex` in `benchmarks_tasks_list_cli`:

> @nl917 were we still going to allow regex?
# Benchmarks CLI Reference
The benchmarks CLI manages benchmark tasks — registering evaluation code, scheduling runs against models, monitoring progress, and downloading results.
Aliases: `kaggle benchmarks` or `kaggle b`. All task subcommands are under `kaggle benchmarks tasks` (alias: `kaggle b t`).

## Commands
### `push` — Register a task

Upload a Python source file as a benchmark task definition. The file is expected to be a `.py` file with percent delimiters (e.g., `# %%`). The CLI converts it to an `.ipynb` file before uploading. If the task already exists, it creates a new version.

Arguments:

- `task` (positional): task slug (e.g. `math-eval`)
- `file` (`-f`, `--file`): path to the `.py` source file

Behavior:

- Validates that the file exists and has a `.py` extension.
- Uses the `ast` module to extract task names from `@task` decorators (supports both `@task` and `@kbench.task` styles, as well as `@task(name="...")` with explicit names).
- The file must define at least one `@task` decorator; if none are found, raises `ValueError` and stops.
- If the task's `creation_state` is `QUEUED` or `RUNNING` (i.e. a previous version is still being built), the push is rejected with `ValueError`.
- If the latest version is in a `COMPLETED` or `ERRORED` state, the push proceeds and creates a new version.
- Converts the `.py` file content to `.ipynb` (Jupyter Notebook) format using `jupytext` (assuming percent format).
- Uploads the notebook via `create_benchmark_task`.

Errors:

- `ValueError: File <path> does not exist` — the file path is invalid.
- `ValueError: File <path> must be a .py file` — the file is not a Python file.
- `ValueError: No @task decorators found in file <path>. The file must define at least one task.` — the file does not contain any `@task`-decorated functions.
- `ValueError: Task '<name>' not found in file <path>. Found tasks: ...` — the task name doesn't match any `@task`-decorated function in the file.
- `ValueError: Task '<name>' is currently being created (pending). Cannot push now.` — a previous version of this task is still being processed by the server.
- `HTTPError` — server-side error (e.g. authentication failure, permission denied).
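The decorator scan in `push` can be sketched with the standard `ast` module; the helper name and the exact handling of the `name=` keyword are illustrative, not the PR's verbatim code:

```python
import ast

def extract_task_names(source: str) -> list:
    """Collect task names from @task / @kbench.task decorated functions."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for decorator in node.decorator_list:
            func = decorator.func if isinstance(decorator, ast.Call) else decorator
            if not ((isinstance(func, ast.Name) and func.id == "task")
                    or (isinstance(func, ast.Attribute) and func.attr == "task")):
                continue
            # @task(name="explicit") overrides the function name
            explicit = None
            if isinstance(decorator, ast.Call):
                for kw in decorator.keywords:
                    if kw.arg == "name" and isinstance(kw.value, ast.Constant):
                        explicit = kw.value.value
            names.append(explicit or node.name)
            break
    if not names:
        raise ValueError("No @task decorators found")
    return names

src = '''
import kbench

@task
def addition(): ...

@kbench.task(name="long-division")
def division(): ...
'''

print(extract_task_names(src))  # -> ['addition', 'long-division']
```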
### `list` — List tasks

Display all benchmark tasks, optionally filtered by name pattern or creation status.

Arguments:

- `regex` (`--regex`): name pattern to filter by
- `status` (`--status`): one of `queued`, `running`, `completed`, `errored`

Behavior:

- Builds an `ApiListBenchmarkTasksRequest`. If `--regex` is provided, sets `regex_filter`; if `--status` is provided, sets `status_filter`. Both filters can be combined.
- Calls `list_benchmark_tasks` and displays a table with columns including the creation status (`COMPLETED`, `ERRORED`, ...).

Notes:

- The `--status` value is passed directly to the server as a string; the server performs the filtering.
### `status` — Show task details and run status

Display task metadata and per-model run information, including timing and errors.

Arguments:

- `task` (positional): task slug (e.g. `math-eval`)
- `model` (`-m`, `--model`): restrict output to specific model slugs

Behavior:

- Calls `get_benchmark_task` and prints a header with the task metadata.
- Calls `list_benchmark_task_runs`, optionally filtered to specific model slugs.
- If there are no runs, prints `No runs yet. Use 'kaggle b t run <task>' to start one.`
- For each run, shows its state (`RUNNING`, `COMPLETED`, `ERRORED`) and a link of the form `https://www.kaggle.com/benchmarks/runs/<id>`.
- Appends `| Error: <message>` to the row if `error_message` is present.

Errors:

- `HTTPError` (404) — the task does not exist on the server.
- `HTTPError` — authentication or permission errors.
### `run` — Schedule task runs

Schedule benchmark task execution against one or more models.

Arguments:

- `task` (positional): task slug (e.g. `math-eval`)
- `model` (`-m`, `--model`): model slug(s) to run against
- `wait` (`--wait`): block until runs finish, with an optional timeout in seconds
- `poll_interval` (`--poll-interval`): polling interval in seconds when using `--wait` (default: 10)

Behavior:

**Model selection:** if no `-m` is provided, fetches the list of available benchmark models via `list_benchmark_models` and prompts the user interactively:

- Enter comma-separated numbers (e.g. `1,3`) to select specific models.
- Enter `all` to run against every available model.
- Any other input raises `ValueError`.
- If no models are available, raises `ValueError: No benchmark models available. Cannot schedule runs.`
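The interactive selection rules above can be captured in a small parsing helper (the function name and model slugs are illustrative):

```python
def parse_model_selection(raw, models):
    """Turn user input like '1,3' or 'all' into a list of model slugs."""
    if not models:
        raise ValueError("No benchmark models available. Cannot schedule runs.")
    raw = raw.strip().lower()
    if raw == "all":
        return list(models)
    try:
        indices = [int(part) for part in raw.split(",")]
    except ValueError:
        raise ValueError(f"Invalid selection: {raw}")
    if any(i < 1 or i > len(models) for i in indices):
        raise ValueError(f"Invalid selection: {raw}")
    # Selection is 1-based, matching the numbered prompt shown to the user
    return [models[i - 1] for i in indices]

models = ["model-alpha", "model-beta", "model-gamma"]
print(parse_model_selection("1,3", models))  # -> ['model-alpha', 'model-gamma']
```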
**Scheduling:** calls `batch_schedule_benchmark_task_runs` with the task slug and the selected model slugs.

**Waiting (`--wait`):** after scheduling, if `--wait` is specified, polls `list_benchmark_task_runs` at a fixed interval (default 10 seconds, configurable via `--poll-interval`) until all runs reach a terminal state (`COMPLETED` or `ERRORED`) or the timeout is reached. On timeout it prints `Timed out waiting for runs after <timeout> seconds.` If `0` or no value is given for `--wait`, it waits indefinitely.

Errors:

- `ValueError: No benchmark models available. Cannot schedule runs.` — no models exist on the server and none were specified via `-m`.
- `ValueError: Invalid selection: <input>` — the user entered non-numeric or out-of-range input during interactive model selection.
- `HTTPError` — server-side error (task not found, authentication failure, etc.).
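The `--wait` loop can be sketched as a generic polling helper; `list_runs` stands in for the `list_benchmark_task_runs` call and is injected here for testability:

```python
import time

TERMINAL = {"COMPLETED", "ERRORED"}

def wait_for_runs(list_runs, timeout=None, poll_interval=10, sleep=time.sleep):
    """Poll until every run reaches a terminal state or the timeout elapses.

    `list_runs` is any callable returning the current list of run states.
    Returns True if all runs finished, False on timeout.
    """
    start = time.monotonic()
    while True:
        states = list_runs()
        if all(s in TERMINAL for s in states):
            return True
        if timeout is not None and time.monotonic() - start >= timeout:
            print(f"Timed out waiting for runs after {timeout} seconds.")
            return False
        sleep(poll_interval)

# Usage with a stub that finishes on the second poll:
polls = iter([["RUNNING", "QUEUED"], ["COMPLETED", "ERRORED"]])
done = wait_for_runs(lambda: next(polls), poll_interval=0)
print(done)  # -> True
```

Injecting the client call and the sleep function keeps the loop unit-testable without real network traffic or delays.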
### `download` — Download run outputs

Download output files for completed and errored runs of a task.

Arguments:

- `task` (positional): task slug (e.g. `math-eval`)
- `model` (`-m`, `--model`): restrict downloads to specific model slugs
- `output` (`-o`, `--output`): output directory (default: `./<task>/output`)

Behavior:

- Fetches runs via `list_benchmark_task_runs`. If `-m` is specified, filters by those model slugs; otherwise fetches runs for all models.
- For runs in `COMPLETED` state, downloads the result output files.
- For runs in `ERRORED` state, downloads log files for debugging.
- Runs in `QUEUED` or `RUNNING` state are silently skipped (no message printed).
- Calls `download_benchmark_task_run_output`, which returns a streamed HTTP response.
- Writes files to `<output>/<model_slug>_<run_id>` (the output directory is created automatically if it doesn't exist).
- Prints `Downloading output for run <id> (<model_slug>)...` and `Downloaded output for <model_slug> to <path>.`

Notes:

- The `-m` flag is useful when a task has many models but you only need output from specific ones.

Errors:

- `HTTPError` — server-side error (authentication, task not found, download failure).
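The write step can be sketched as a helper that consumes any iterable of byte chunks; treating `<output>/<model_slug>_<run_id>` as a single file is an assumption of this sketch (the real command may unpack multiple files there):

```python
from pathlib import Path

def save_run_output(chunks, output_dir, model_slug, run_id):
    """Write streamed byte chunks to <output>/<model_slug>_<run_id>.

    `chunks` is any iterable of bytes, e.g. a streamed HTTP response body.
    The destination directory is created if it doesn't exist.
    """
    dest = Path(output_dir) / f"{model_slug}_{run_id}"
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(dest, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
    return dest

# Usage with an in-memory "stream":
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = save_run_output([b"score,", b"0.91"], tmp, "some-model", "42")
    print(path.name)          # -> some-model_42
    print(path.read_bytes())  # -> b'score,0.91'
```

Writing chunk by chunk keeps memory flat even for large run outputs.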
### `delete` — Remove a task

Delete a benchmark task.

Arguments:

- `task` (positional): task slug (e.g. `math-eval`)
- `no_confirm` (`-y`, `--yes`): skip the confirmation prompt

Behavior:

- Delete is not supported by the server yet; the command prints `Delete is not supported by the server yet.`
- The `-y` flag is accepted but has no effect, since the delete operation is not implemented.

## Quick Reference