OpenAdapt-ML

Python 3.10+ · MIT License

The ML engine for OpenAdapt -- open-source desktop automation with demo-conditioned AI agents.

OpenAdapt-ML provides the GUI-specific ML layer for training and running vision-language model (VLM) agents that automate desktop tasks. It handles everything between raw screen recordings and a production policy API: canonical schemas for GUI trajectories, VLM adapters, supervised fine-tuning with TRL + Unsloth, grounding, and demo-conditioned inference.

Demos

Synthetic Login -- Qwen3-VL-2B fine-tuned on synthetic UI scenarios:

[Login demo] [Registration demo] (animated GIFs)

Key Features

  • GUI trajectory schemas -- Pydantic models for Episodes, Steps, Actions, and Observations with JSON Schema export and format converters (WAA, WebArena)
  • VLM adapters -- Unified interface for Qwen3-VL, Qwen2.5-VL, Claude, GPT, and Gemini with automatic device selection (CUDA / MPS / CPU; see the sketch after this list)
  • Supervised fine-tuning -- TRL SFTTrainer with Unsloth optimizations for 2x faster training and 50% less VRAM via LoRA adapters
  • Runtime policy API -- AgentPolicy that predicts the next GUI action (CLICK, TYPE, DONE) from a screenshot and goal
  • Demo-conditioned inference -- Retrieval-augmented prompting using recorded demonstrations for trajectory-conditioned disambiguation
  • Grounding module -- Locate UI elements via Gemini vision API, oracle bounding boxes, or Set-of-Marks (SoM) overlays
  • Cloud GPU training -- One-command training pipelines for Lambda Labs and Azure
  • Synthetic data generation -- Configurable UI scenarios (login, registration) with layout jitter for rapid iteration
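
The automatic device selection mentioned in the VLM adapters bullet follows the usual PyTorch pattern. A minimal sketch of that logic (the actual implementation lives inside the adapter classes and may differ):

import torch

def pick_device() -> str:
    # Prefer CUDA, then Apple Silicon (MPS), then fall back to CPU.
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"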

Installation

# Core package
pip install openadapt-ml

# With training dependencies (TRL + datasets)
# (quotes keep shells like zsh from expanding the brackets)
pip install "openadapt-ml[training]"

# With API-backed VLMs (Claude, GPT)
pip install "openadapt-ml[api]"

# Development (from source)
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync

Quick Start

Run a smoke test

# Model-free policy demo (no GPU required)
uv run python -m openadapt_ml.scripts.demo_policy --backend dummy

Train on synthetic data

# Fine-tune Qwen3-VL on synthetic login scenario
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_synthetic.yaml

Train on real recordings

# Record a workflow with openadapt-capture, then train
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_capture.yaml \
  --capture ~/captures/my-workflow \
  --open  # Opens training dashboard in browser

End-to-end benchmark (train + eval + plot)

uv run python -m openadapt_ml.scripts.run_qwen_login_benchmark \
  --config configs/qwen3vl_synthetic_dev.yaml \
  --out-dir experiments/qwen_login/2b_dev

Use the policy API

from openadapt_ml.runtime.policy import AgentPolicy
from openadapt_ml.models.qwen_vl import QwenVLAdapter

adapter = QwenVLAdapter(model_name="Qwen/Qwen3-VL-2B-Instruct")
policy = AgentPolicy(adapter)

# Given an SFT-style sample (screenshot + goal + chat history):
output = policy.predict(sample)
print(output.action)   # Action(type=CLICK, coordinates={"x": 0.45, "y": 0.71})
print(output.thought)  # "Click the Login button"
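
The sample format itself is produced by openadapt_ml.datasets.next_action. As a rough illustration only (these field names are hypothetical, not the library's actual schema), a sample bundles a screenshot, a goal, and the chat history so far:

# Hypothetical shape of an SFT-style sample; the real fields are
# defined by openadapt_ml.datasets.next_action and may differ.
sample = {
    "goal": "Log in to the application",
    "screenshot": "step_3.png",  # current screen
    "history": [                 # prior (thought, action) turns
        {"thought": "Click the username field", "action": "CLICK(0.32, 0.41)"},
        {"thought": "Type the username", "action": "TYPE('alice')"},
    ],
}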

Use the schema

from openadapt_ml.schema import Episode, Step, Action, Observation, ActionType

episode = Episode(
    episode_id="demo_001",
    instruction="Open Notepad and type Hello World",
    steps=[
        Step(
            step_index=0,
            observation=Observation(screenshot_path="step_0.png"),
            action=Action(type=ActionType.CLICK, coordinates={"x": 100, "y": 200}),
        ),
        Step(
            step_index=1,
            observation=Observation(screenshot_path="step_1.png"),
            action=Action(type=ActionType.TYPE, text="Hello World"),
        ),
    ],
    success=True,
)
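
Because these are Pydantic models, the standard Pydantic v2 serialization and JSON Schema APIs should apply directly (a sketch, assuming no custom overrides in the schema classes):

# Round-trip an episode through JSON (standard Pydantic v2 API).
payload = episode.model_dump_json(indent=2)
restored = Episode.model_validate_json(payload)
assert restored == episode

# Export the JSON Schema advertised in the feature list.
schema = Episode.model_json_schema()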

Architecture

openadapt_ml/
├── schema/              # Episode, Step, Action, Observation (Pydantic models)
│   ├── episode.py       #   Core dataclasses + JSON Schema export
│   └── converters.py    #   WAA/WebArena format converters
├── models/              # VLM adapters
│   ├── base_adapter.py  #   BaseVLMAdapter ABC
│   ├── qwen_vl.py       #   Qwen3-VL, Qwen2.5-VL
│   ├── api_adapter.py   #   Claude, GPT (inference-only)
│   └── dummy_adapter.py #   Fake adapter for testing
├── training/            # Fine-tuning pipeline
│   ├── trl_trainer.py   #   TRL SFTTrainer + Unsloth
│   ├── trainer.py       #   Training orchestration
│   └── viewer.py        #   Training dashboard (HTML)
├── runtime/             # Inference
│   ├── policy.py        #   AgentPolicy (screenshot -> action)
│   └── safety_gate.py   #   Action safety checks
├── datasets/            # Data loading
│   └── next_action.py   #   Episodes -> SFT chat samples
├── ingest/              # Data ingestion
│   ├── synthetic.py     #   Synthetic UI generation
│   ├── capture.py       #   openadapt-capture loader
│   └── loader.py        #   Generic episode loader
├── grounding/           # UI element localization
│   ├── base.py          #   OracleGrounder, GroundingModule ABC
│   └── detector.py      #   GeminiGrounder, SoM overlays
├── retrieval/           # Demo-conditioned inference
│   ├── retriever.py     #   Demo retrieval for RAG prompting
│   └── embeddings.py    #   Screenshot/action embeddings
├── benchmarks/          # ML-specific benchmark agents
│   └── agent.py         #   PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent
├── cloud/               # Cloud GPU training
│   ├── lambda_labs.py   #   Lambda Labs integration
│   ├── local.py         #   Local training (CUDA/MPS)
│   └── ssh_tunnel.py    #   SSH tunnel management
├── segmentation/        # Recording segmentation pipeline
├── evals/               # Evaluation metrics (grounding, trajectory matching)
├── config.py            # Settings via pydantic-settings
└── scripts/             # CLI entry points (train, eval, compare, demo)
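
To give a feel for the grounding interface, here is a hypothetical minimal version of what the GroundingModule ABC and OracleGrounder in grounding/base.py might look like; everything beyond those two names is an assumption, not the actual API:

from abc import ABC, abstractmethod

class GroundingModule(ABC):
    """Resolve a natural-language element description to a bounding box."""

    @abstractmethod
    def locate(self, screenshot_path: str, description: str) -> tuple[float, float, float, float]:
        """Return (x0, y0, x1, y1) in normalized [0, 1] coordinates."""

class OracleGrounder(GroundingModule):
    """Test-time grounder that returns ground-truth boxes instead of calling a model."""

    def __init__(self, boxes: dict[str, tuple[float, float, float, float]]):
        self.boxes = boxes

    def locate(self, screenshot_path, description):
        return self.boxes[description]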

Benchmark Results

Synthetic Login (Qwen3-VL-2B with Set-of-Marks)

Metric                 Score
Action Type Accuracy   100%
Element Accuracy       100%
Episode Success Rate   100%

Multi-Model Comparison (Synthetic Login, coordinate mode)

Model               Action Accuracy   Coord Error   Click Hit Rate
Qwen3-VL-2B FT      0.469             0.051         0.850
Qwen3-VL-8B FT      0.286             0.004         1.000
Claude Sonnet 4.5   0.121             0.757         0.000
GPT-5.1             0.183             0.057         0.600

These are results on a controlled synthetic benchmark with ~3 UI elements. They validate that the training pipeline works, not real-world performance. Evaluation on standard benchmarks (WAA, WebArena) is ongoing via openadapt-evals.
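
For reference, coordinate error and click hit rate in tables like the one above are typically computed along these lines (a sketch of common definitions, not necessarily the exact ones in openadapt_ml.evals):

import math

def coord_error(pred: tuple[float, float], gold: tuple[float, float]) -> float:
    # Euclidean distance between predicted and gold click points,
    # both in normalized [0, 1] screen coordinates.
    return math.dist(pred, gold)

def click_hit(pred: tuple[float, float], bbox: tuple[float, float, float, float]) -> bool:
    # A click "hits" if it lands inside the target element's bounding box.
    x0, y0, x1, y1 = bbox
    return x0 <= pred[0] <= x1 and y0 <= pred[1] <= y1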

Cloud GPU Training

Lambda Labs

export LAMBDA_API_KEY=your_key_here

# One-command: launch, train, download, terminate
uv run python -m openadapt_ml.cloud.lambda_labs train \
  --capture ~/captures/my-workflow \
  --goal "Turn off Night Shift in System Settings"

Local (CUDA / Apple Silicon)

uv run python -m openadapt_ml.cloud.local train \
  --capture ~/captures/my-workflow --open

Ecosystem

OpenAdapt-ML is one component in the OpenAdapt stack:

Package             Purpose
openadapt-ml        ML engine: schemas, VLM adapters, training, inference, grounding
openadapt-evals     Evaluation infrastructure: VM management, pool orchestration, benchmark runners, oa-vm CLI
openadapt-capture   Lightweight GUI recording and demo sharing
OpenAdapt           Desktop automation platform (end-user application)

Looking for benchmark evaluation, Azure VM management, or the oa-vm CLI? Those live in openadapt-evals.

Documentation

Contributing

# Clone and install dev dependencies
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync --extra dev --extra training

# Run tests
uv run pytest

# Lint
uv run ruff check .

We use Angular-style commits (feat:, fix:, docs:, etc.) with Python Semantic Release for automated versioning and PyPI publishing.
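
For example (commit messages are illustrative):

git commit -m "feat(grounding): add Set-of-Marks overlay generator"
git commit -m "fix(training): handle empty capture episodes"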

License

MIT
