Infini-AI-Lab/astraflow

AstraFlow — Dataflow-Oriented Reinforcement Learning for (Multi-)Agentic LLMs


AstraFlow is a dataflow-oriented reinforcement learning (RL) system for LLM training, designed for flexibility and scalability.

AstraFlow natively supports the following for LLM RL training without any feature-specific system engineering:

  • Fully async multi-policy collaborative RL
  • Elastic heterogeneous cross-region rollouts
  • Substitutable rollout and trainer services
  • Composable data algorithms

[Figure: Elastic RaaS pool of mixed-hardware nodes joining and leaving across regions]

Elastic Heterogeneous Cross-region Rollouts: RaaS instances on mixed hardware and across regions join and leave the rollout pool on demand, with no scheduler- or region-specific code.
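The join-and-leave behavior described above can be illustrated with a small sketch. This is not AstraFlow's API: `RolloutPool`, `join`, `leave`, and `dispatch` are hypothetical names, and round-robin assignment is an assumed placeholder policy. The point is only that work is routed to whichever workers are registered at dispatch time, regardless of their region or hardware.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class RolloutPool:
    """Hypothetical registry of rollout workers (illustration only)."""
    workers: dict = field(default_factory=dict)  # worker_id -> metadata

    def join(self, worker_id, region, hardware):
        # Region and hardware are recorded as metadata only; dispatch ignores them.
        self.workers[worker_id] = {"region": region, "hardware": hardware}

    def leave(self, worker_id):
        # Leaving is idempotent: removing an unknown worker is a no-op.
        self.workers.pop(worker_id, None)

    def dispatch(self, prompts):
        """Assign prompts round-robin over the workers present right now."""
        if not self.workers:
            raise RuntimeError("no rollout workers available")
        ring = itertools.cycle(sorted(self.workers))
        return [(next(ring), prompt) for prompt in prompts]


pool = RolloutPool()
pool.join("a", region="us-east", hardware="H100")
pool.join("b", region="eu-west", hardware="A100")
first = pool.dispatch(["p0", "p1", "p2"])   # spread over workers a and b
pool.leave("a")
second = pool.dispatch(["p3", "p4"])        # only worker b remains
```

Because the caller never names a region or a scheduler, adding or removing a worker changes only the contents of the registry, not the dispatch code.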


[Figure: AstraFlow training a multi-policy workflow on an elastic, heterogeneous, cross-region rollout pool]

Fully Async Multi-policy Collaborative RL Training: multiple policies train together, each as an independent trainer with its own data and weight stream.
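As a rough illustration of "each policy as an independent trainer with its own data and weight stream", the sketch below uses Python's `asyncio`. It is an assumption-laden toy, not AstraFlow code: the `trainer` and `rollout` coroutines, the policy names `actor` and `verifier`, and the per-policy queues and logs are all hypothetical stand-ins.

```python
import asyncio


async def trainer(name, data_queue, weight_log, num_steps):
    """Hypothetical trainer: reads its own data stream, emits its own weight versions."""
    for version in range(1, num_steps + 1):
        batch = await data_queue.get()             # waits only on this policy's data
        weight_log.append((name, version, batch))  # stand-in for publishing new weights


async def rollout(queues, num_steps):
    """Hypothetical rollout producer that feeds every policy's queue."""
    for step in range(num_steps):
        for name, queue in queues.items():
            await queue.put(f"{name}-batch-{step}")


async def main():
    queues = {"actor": asyncio.Queue(), "verifier": asyncio.Queue()}
    logs = {name: [] for name in queues}
    await asyncio.gather(
        rollout(queues, num_steps=3),
        *(trainer(name, queue, logs[name], num_steps=3) for name, queue in queues.items()),
    )
    return logs


logs = asyncio.run(main())
```

Neither trainer waits on the other: each blocks only on its own queue, so a slow policy does not stall the rest of the workflow in this toy setup.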

News

  • [2026/05] AstraFlow v0.1.0 released — first public release of the full system. See the project website.
  • [2026/05] AstraFlow paper is on arXiv.

Getting Started

Recipes

AstraFlow currently supports the following recipes. Check the documentation for more detailed instructions.

Recipe                  Description
math/                   RLVR math reasoning: Qwen3-1.7B / 8B, M2PO, full and delta-weight transfer
math-multi-agent/       Actor + verifier collaborative math training
math-efficient-data/    Composable data algorithms: GRESO, dynamic sampling, buffer replay
code/                   Code-generation RL: Qwen3-8B, M2PO
code-multi-agent/       Codegen + verifier competitive coding
search/                 Search-augmented agent training with local retrieval
alfworld/               ALFWorld embodied household agent
webshop/                WebShop web-navigation shopping agent
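To give a feel for what "composable data algorithms" could mean in practice, here is a minimal sketch in which each algorithm is a function from a list of rollout groups to a list of rollout groups, so stages can be chained freely. Everything here is hypothetical and not taken from the AstraFlow code base: `compose`, `drop_uniform_reward_groups` (a dynamic-sampling-style filter that discards groups whose rewards are all equal), and `make_replay_stage` (a bounded replay buffer) are illustrative names, and the group layout with `prompt` and `rewards` keys is assumed.

```python
from collections import deque
from functools import reduce


def compose(*stages):
    """Chain data stages left to right; each maps a list of groups to a list of groups."""
    return lambda groups: reduce(lambda acc, stage: stage(acc), stages, groups)


def drop_uniform_reward_groups(groups):
    """Dynamic-sampling-style filter: drop groups whose rewards are all identical."""
    return [g for g in groups if len(set(g["rewards"])) > 1]


def make_replay_stage(capacity):
    """Build a stage that keeps the newest `capacity` groups and returns the buffer contents."""
    buffer = deque(maxlen=capacity)

    def stage(groups):
        buffer.extend(groups)  # oldest entries fall out once capacity is exceeded
        return list(buffer)

    return stage


pipeline = compose(drop_uniform_reward_groups, make_replay_stage(capacity=2))

batch1 = pipeline([
    {"prompt": "q1", "rewards": [1, 1]},  # uniform rewards: filtered out
    {"prompt": "q2", "rewards": [0, 1]},
])
batch2 = pipeline([
    {"prompt": "q3", "rewards": [1, 0]},
])
```

Swapping, removing, or reordering stages only changes the arguments to `compose`; the stages themselves do not need to know about each other.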

Roadmap

Near-term focus:

  • Offline cluster training — Support training on offline clusters without internet access.
  • All-in-one launcher — A launcher helper that streamlines bringing up the AstraFlow, RaaS, and trainer services.
  • MoE model support — Extend the training backends to Mixture-of-Experts models.
  • Terminal-Bench training — Add a recipe for training agents on Terminal-Bench.
  • Megatron backend — Add Megatron-LM as a training backend.
  • vLLM rollout engine — Support vLLM alongside SGLang as a rollout engine.

Citation

If you find AstraFlow useful in your research, please cite:

@article{zheng2026astraflow,
  title   = {AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs},
  author  = {Zheng, Haizhong and Di, Yizhuo and Wang, Jiahui and Jin, Shuowei and
             Liu, Xueshen and Wu, Yongji and Mao, Z. Morley and Stoica, Ion and
             Zhao, Jiawei and Chen, Beidi},
  journal = {arXiv preprint arXiv:2605.15565},
  year    = {2026}
}

Acknowledgment

We drew on the design of, and reused code from, the following projects: AReaL, verl, AgentBench, ASearcher, and M2PO.
