GitHub - Infini-AI-Lab/astraflow

AstraFlow — Dataflow-Oriented Reinforcement Learning for (Multi-)Agentic LLMs

AstraFlow is a dataflow-oriented reinforcement learning system designed for better flexibility and scalability.

AstraFlow natively supports the following for LLM RL training without any feature-specific system engineering:

Fully async multi-policy collaborative RL
Elastic heterogeneous cross-region rollouts
Substitutable rollout and trainer services
Composable data algorithms

Elastic RaaS pool of mixed-hardware nodes joining and leaving across regions

Elastic Heterogeneous Cross-region Rollouts: RaaS instances on mixed hardware and across regions join and leave the rollout pool on demand, with no scheduler- or region-specific code.

AstraFlow training a multi-policy workflow on an elastic, heterogeneous, cross-region rollout pool

Fully Async Multi-policy Collaborative RL Training: multiple policies train together, each as an independent trainer with its own data and weight stream.
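As a rough illustration of "each policy as an independent trainer with its own data and weight stream", here is a minimal asyncio sketch. It is not AstraFlow's API: `rollout_producer`, `trainer`, `train_multi_policy`, and the per-policy queues are hypothetical names chosen for this example, and a "training step" is reduced to bumping a weight version counter.

```python
import asyncio


async def rollout_producer(queues, num_samples):
    # Hypothetical rollout service: routes each sample to the data stream
    # of the policy it belongs to, then sends a sentinel to close the stream.
    for sample_id in range(num_samples):
        for name, queue in queues.items():
            await queue.put({"policy": name, "sample_id": sample_id})
    for queue in queues.values():
        await queue.put(None)


async def trainer(name, queue, weight_versions):
    # Independent trainer: reads only its own queue and publishes only its
    # own weight version, so it never waits on the other policies.
    while True:
        item = await queue.get()
        if item is None:
            break
        weight_versions[name] += 1  # stand-in for one optimizer step


async def _run(policy_names, num_samples):
    queues = {name: asyncio.Queue() for name in policy_names}
    weight_versions = {name: 0 for name in policy_names}
    await asyncio.gather(
        rollout_producer(queues, num_samples),
        *(trainer(name, queues[name], weight_versions) for name in policy_names),
    )
    return weight_versions


def train_multi_policy(policy_names, num_samples):
    # Runs all trainers concurrently and returns the final weight versions.
    return asyncio.run(_run(policy_names, num_samples))
```

Calling `train_multi_policy(["actor", "verifier"], 3)` in this sketch advances each policy's weight version independently. A real system would replace the counter with weight transfer to the rollout engines.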

News

[2026/05] AstraFlow v0.1.0 released — first public release of the full system. See the project website.
[2026/05] AstraFlow paper is on arXiv.

Getting Started

Recipes

AstraFlow currently supports the following recipes. Check the documentation for more detailed instructions.

Recipe	Description
`math/`	RLVR math reasoning — Qwen3-1.7B / 8B, M2PO, full and delta-weight transfer
`math-multi-agent/`	Actor + verifier collaborative math training
`math-efficient-data/`	Composable data algorithms — GRESO, dynamic sampling, buffer replay
`code/`	Code-generation RL — Qwen3-8B, M2PO
`code-multi-agent/`	Codegen + verifier competitive coding
`search/`	Search-augmented agent training with local retrieval
`alfworld/`	ALFWorld embodied household agent
`webshop/`	WebShop web-navigation shopping agent
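To make "composable data algorithms" concrete, the sketch below chains two data stages as plain functions. It is an assumption-laden illustration, not the recipe's actual code: `dynamic_sampling`, `make_buffer_replay`, `compose`, and the `{"prompt", "rewards"}` group layout are hypothetical, dynamic sampling is modeled as dropping prompt groups whose rollout rewards are all identical, and GRESO is not modeled.

```python
import random


def dynamic_sampling(groups):
    # Keep only prompt groups whose rollout rewards differ, since a group
    # with identical rewards carries no relative learning signal.
    return [g for g in groups if len(set(g["rewards"])) > 1]


def make_buffer_replay(buffer, k, seed=0):
    # Returns a stage that mixes up to k previously seen groups into the
    # batch and then stores the new groups for later replay.
    rng = random.Random(seed)

    def stage(groups):
        replayed = rng.sample(buffer, min(k, len(buffer)))
        buffer.extend(groups)
        return groups + replayed

    return stage


def compose(*stages):
    # Chains stages left to right; each stage maps a list of groups to a
    # list of groups, so stages can be added, removed, or reordered freely.
    def pipeline(groups):
        for stage in stages:
            groups = stage(groups)
        return groups

    return pipeline
```

Under these assumptions, `compose(dynamic_sampling, make_buffer_replay(buffer, k=1))` yields a single callable that filters a batch and then augments it from the buffer. Swapping the order or inserting another stage requires no change to the stages themselves.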

Roadmap

Near-term focus:

Offline cluster training — Support training on offline clusters without internet access.
All-in-one launcher — A launcher helper that streamlines bringing up the AstraFlow, RaaS, and trainer services.
MoE model support — Extend the training backends to Mixture-of-Experts models.
Terminal-Bench training — Add a recipe for training agents on Terminal-Bench.
Megatron backend — Add Megatron-LM as a training backend.
vLLM rollout engine — Support vLLM alongside SGLang as a rollout engine.

Citation

If you find AstraFlow useful in your research, please cite:

@article{zheng2026astraflow,
  title   = {AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs},
  author  = {Zheng, Haizhong and Di, Yizhuo and Wang, Jiahui and Jin, Shuowei and
             Liu, Xueshen and Wu, Yongji and Mao, Z. Morley and Stoica, Ion and
             Zhao, Jiawei and Chen, Beidi},
  journal = {arXiv preprint arXiv:2605.15565},
  year    = {2026}
}

Acknowledgment

We learned from the design of, and reused code from, the following projects: AReaL, verl, AgentBench, ASearcher, and M2PO.
