AMAP-ML

Foundation AI for Spatial and Generative Intelligence

Building product-facing foundation AI systems for maps, mobility, generative assets, and interactive worlds.
AMAP-ML is an AI team at Alibaba AMAP. We connect research, engineering, and real-world deployment across spatial intelligence, generative intelligence, reasoning agents, reinforcement learning, multimodal models, world models, and product-scale AIGC systems.

Our work spans production systems, open-source projects, benchmarks, and publications at ICLR, CVPR, ACL, AAAI, SIGGRAPH, ICCV, ICML, EMNLP, KDD, ACM MM, and WWW. We release code and evaluation assets to help the community reproduce, compare, and extend our work.

Join us | Research interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI are welcome to get in touch.


Highlights

30+ Open-source Projects  |  11 ICLR 2026 Papers  |  10 CVPR 2026 Papers  |  7 ACL 2026 Papers
5 AAAI 2026 Papers  |  4 ICML 2026 Papers  |  5 ICCV 2025 Papers  |  2 EMNLP 2025 Oral Papers
Focus: Spatial Intelligence · Generative Intelligence · LLM Agents · World Models · Multimodal AI


What We Build

AMAP-ML builds foundation AI capabilities around two AMAP product anchors:

Product anchor | What it means
Spatial intelligence | AI agents and models that understand, reason, plan, and act in real-world map, mobility, urban, autonomous-driving, and local-service scenarios.
Generative intelligence | AIGC systems that create, edit, evaluate, and control visual assets, videos, 3D scenes, media content, and interactive experiences for map-native and local-service scenarios.

These product anchors are powered by a shared technical stack:

Capability | Role
Reasoning agents and reinforcement learning | Train agents and models to reason, use tools, plan, self-reflect, and improve through feedback.
World models and interactive AI | Simulate future states, dynamic transitions, GUI environments, autonomous-driving scenes, and interactive 4D worlds.
Multimodal understanding | Connect vision, language, maps, videos, GUIs, urban scenes, user intent, and product signals.
Generative modeling | Produce and edit visual assets with controllability, quality, spatial consistency, temporal coherence, and product usability.
Benchmarks and evaluation | Turn real AMAP scenarios into reusable public tasks, metrics, datasets, and reproducible evaluation suites.

Flagship Releases

These projects were selected for both their fit with our research focus and public adoption signals such as GitHub stars. They are organized around the two product anchors and the core AI systems that support them.

High-Signal Releases

Project | What it represents | Signal
SkillClaw | Agentic skill evolution from real interaction traces. | Agent system / high-star repo
FluxText | Scene-text editing for controllable visual asset generation. | Generative AI / high-star repo
Code2World | GUI world model via renderable code generation. | World model / high-star repo
Tree-GRPO | Tree-search rollouts for LLM agent reinforcement learning. | RL method / high-star repo
GPG | Minimalist group policy gradient baseline for model reasoning. | RL method / framework adoption
MobilityBench | Route-planning agent evaluation grounded in real-world mobility scenarios. | Spatial intelligence anchor

Spatial Intelligence

Project | What it represents | Artifact
MobilityBench | Route-planning agent evaluation grounded in real-world mobility scenarios. | Benchmark
Thinking-with-Map | Map-augmented reasoning agent for geolocalization. | Agent / ACL 2026 Findings
AutoDrive-R2 | Reasoning and self-reflection for VLA models in autonomous driving. | VLA / ICLR 2026
SpatialGenEval | Spatial intelligence evaluation for text-to-image models. | Benchmark / ICLR 2026

Generative Intelligence

Project | What it represents | Artifact
FluxText | Scene-text editing for controllable visual asset generation. | Generative AI
FE2E | Editing-prior foundation model for dense geometry estimation. | Generative vision / CVPR 2026
Omni-Effects | Prompt-guided and spatially controllable visual effects generation. | Generative AI / AAAI 2026
RL3DEdit | Geometry-guided RL for multi-view consistent 3D scene editing. | 3D editing / CVPR 2026
MACE-Dance | Motion-appearance cascaded generation for music-driven dance video. | Video generation / SIGGRAPH 2026

Core AI Systems

Project | What it represents | Artifact
Code2World | GUI world model via renderable code generation. | World model
DreamX-World | Interactive world model for controllable world simulation. | World model
Tree-GRPO | Tree-search rollouts for LLM agent reinforcement learning. | Method / ICLR 2026
GPG | Minimalist group policy gradient baseline for model reasoning. | Method / ICLR 2026
SkillClaw | Agentic skill evolution for reusable skill libraries. | Agent system
CoEvolve | Agent-data mutual evolution for training LLM agents. | Agent RL / ACL 2026

Recent Updates

  • 2026.05.18 MobilityBench, a scalable benchmark for evaluating route-planning agents in real-world mobility scenarios, is accepted to KDD 2026.
  • 2026.05.12 CoEvolve trains LLM agents through agent-data mutual evolution, using failure signals to synthesize harder tasks as the agent improves (ACL 2026).
  • 2026.05.12 Thinking-with-Map strengthens geolocalization with a reinforced parallel map-augmented reasoning agent (ACL 2026 Findings).
  • 2026.05.11 DreamX-World released the 5B-Cam model and inference code for interactive world simulation.
  • 2026.04.22 DCW mitigates SNR-t bias and improves diffusion generation quality across model families (CVPR 2026).
  • 2026.04.22 EMF extends efficient one-step generation from class-conditioned synthesis to text-conditioned image generation (CVPR 2026).
  • 2026.04.10 SkillClaw turns real interaction traces into reusable, evolving skill libraries.
  • 2026.04.01 MACE-Dance decouples motion generation and appearance synthesis for high-quality music-driven dance video (SIGGRAPH 2026).
  • 2026.03.23 Omni-WorldBench evaluates world models in dynamic 4D interactive settings.
  • 2026.03.20 AutoDrive-R2 improves VLA models with reasoning and self-reflection for autonomous driving scenarios (ICLR 2026).
  • 2026.03.18 Video-STAR uses tool-augmented reinforcement learning for open-vocabulary action recognition in video (ICLR 2026).
  • 2026.03.11 RL3DEdit uses geometry-guided reinforcement learning to make 3D scene edits more multi-view consistent (CVPR 2026).
Earlier Updates
  • 2026.03.01 FE2E transfers image-editing priors into dense depth and normal estimation (CVPR 2026).
  • 2026.02.28 FASA improves sparse decoding with frequency-aware attention (ICLR 2026).
  • 2026.02.27 Eevee provides high-resolution data and evaluation for video-based virtual try-on (CVPR 2026 Findings).
  • 2026.02.06 MobilityBench evaluates route-planning agents in real-world mobility scenarios (KDD 2026).
  • 2026.02.06 SpatialGenEval benchmarks spatial intelligence in text-to-image models (ICLR 2026).
  • 2026.02.06 Tree-GRPO replaces independent chain rollouts with tree-search rollouts for LLM agent reinforcement learning (ICLR 2026).
  • 2026.02.04 Code2World predicts GUI transitions through renderable code generation.
  • 2026.02.04 GPG provides a simple group policy gradient baseline for model reasoning (ICLR 2026).
  • 2025.10.22 Taming-Hallucinations reduces MLLM video hallucinations with counterfactual video generation.
  • 2025.06.20 FluxText provides a diffusion transformer baseline for scene-text editing.

Project Map

Spatial Intelligence and Mobility

Repository | Contribution | Venue
MobilityBench | Route-planning agent evaluation in real-world mobility scenarios. | KDD 2026
Thinking-with-Map | Map-augmented geolocalization agent trained with reinforcement learning. | ACL 2026 Findings
AutoDrive-R2 | Reasoning and self-reflection for VLA models in autonomous driving. | ICLR 2026
SpatialGenEval | Spatial intelligence evaluation for text-to-image models. | ICLR 2026
SocioReasoner | Vision-language reasoning for urban socio-semantic segmentation. | ICLR 2026
DSFNet | Multi-scenario route ranking with a public industrial driving-route dataset and AMAP deployment. | WWW 2025
IntTravel | Real-world dataset and generative framework for integrated multi-task travel recommendation. | arXiv 2026

Reasoning Agents and Reinforcement Learning

Repository | Contribution | Venue
SkillClaw | Agentic evolver for collective skill library improvement. | -
Tree-GRPO | Tree-search rollouts for LLM agent reinforcement learning. | ICLR 2026
GPG | Simple and strong group policy gradient baseline for model reasoning. | ICLR 2026
CoEvolve | Agent-data mutual evolution for training LLM agents. | ACL 2026
MathForge | Difficulty-aware GRPO and multi-aspect reformulation for math reasoning. | ICLR 2026
Video-STAR | Tool-using reinforcement learning for open-vocabulary action recognition. | ICLR 2026
FASA | Frequency-aware sparse attention for efficient sparse decoding. | ICLR 2026

World Models and Interactive AI

Repository | Contribution | Venue
DreamX-World | General-purpose world model for interactive world simulation. | -
Code2World | GUI world model via renderable code generation. | -
Omni-WorldBench | Benchmark for interactive response capabilities of world models. | arXiv 2026

Multimodal and Vision-Language Models

Repository | Contribution | Venue
UniVG-R1 | Reasoning-guided universal visual grounding with reinforcement learning. | CVPR 2026
RealQA | Realistic image quality and aesthetic scoring with multimodal LLMs. | -
Taming-Hallucinations | Counterfactual video generation for reducing MLLM video hallucinations. | -

Generative Intelligence

Repository | Contribution | Venue
FluxText | Diffusion transformer baseline for scene-text editing. | -
FE2E | Image editing priors for depth and normal estimation. | CVPR 2026
RL3DEdit | Geometry-guided reinforcement learning for multi-view consistent 3D scene editing. | CVPR 2026
MACE-Dance | Motion-appearance cascaded generation for music-driven dance video. | SIGGRAPH 2026
Omni-Effects | Prompt-guided and spatially controllable composite visual effects generation. | AAAI 2026
S2-Guidance | Training-free stochastic self-guidance for diffusion models. | ICLR 2026
EPG | Pixel-space generative modeling via self-supervised pre-training. | ICLR 2026
USP | Unified self-supervised pretraining in VAE space for diffusion models. | ICCV 2025
EMF | Text-conditioned one-step image generation. | CVPR 2026
DCW | Differential correction for SNR-t bias in diffusion probabilistic models. | CVPR 2026
NarrLV | Narrative-centric evaluation for long video generation models. | ICLR 2026
ImagerySearch | Adaptive test-time search for video generation. | AAAI 2026
Eevee | High-resolution benchmark for video-based virtual try-on. | CVPR 2026 Findings
VMBench | Perception-aligned benchmark for video motion generation. | ICCV 2025

For Collaborators and Applicants

We are actively looking for people who enjoy building strong AI systems and who value clean code, reproducible experiments, rigorous evaluation, ambitious problem selection, and real-world product impact.

If you are interested in research internships, full-time roles, or academic collaboration, please email cxxgtxy@gmail.com (homepage) with your CV, representative projects, and research interests.

Pinned

  1. Tree-GRPO (Public)
    [ICLR 2026] Tree Search for LLM Agent Reinforcement Learning
    Python · 357 stars · 34 forks

  2. Code2World (Public)
    Code2World: A GUI World Model via Renderable Code Generation
    Python · 319 stars · 17 forks

  3. GPG (Public)
    [ICLR 2026] GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning
    Python · 182 stars · 5 forks

  4. DreamX-World (Public)
    DreamX-World: A General-Purpose Interactive World Model
    Python · 239 stars · 9 forks

  5. SkillClaw (Public)
    Let Skills Evolve Collectively with Agentic Evolver
    Python · 1.4k stars · 131 forks

  6. FluxText (Public)
    Implementation of "FLUX-Text: A Simple and Advanced Diffusion Transformer Baseline for Scene Text Editing"
    Python · 451 stars · 32 forks
