AMAP-ML

Foundation AI for Spatial and Generative Intelligence

Building product-facing foundation AI systems for maps, mobility, generative assets, and interactive worlds.
AMAP-ML is an AI team at Alibaba AMAP. We connect research, engineering, and real-world deployment across spatial intelligence, generative intelligence, reasoning agents, reinforcement learning, multimodal models, world models, and product-scale AIGC systems.

Our work spans production systems, open-source projects, benchmarks, and publications at ICLR, CVPR, ACL, AAAI, SIGGRAPH, ICCV, ICML, EMNLP, KDD, ACM MM, and WWW. We release code and evaluation assets to help the community reproduce, compare, and extend our work.

Join us | Research interns, full-time researchers, and AI engineers in LLM agents, reinforcement learning, world models, multimodal learning, spatial intelligence, and generative AI are welcome to get in touch.

Highlights

What We Build

AMAP-ML builds foundation AI capabilities around two AMAP product anchors:

Product anchor	What it means
Spatial intelligence	AI agents and models that understand, reason, plan, and act in real-world map, mobility, urban, autonomous-driving, and local-service scenarios.
Generative intelligence	AIGC systems that create, edit, evaluate, and control visual assets, videos, 3D scenes, media content, and interactive experiences for map-native and local-service scenarios.

These product anchors are powered by a shared technical stack:

Capability	Role
Reasoning agents and reinforcement learning	Train agents and models to reason, use tools, plan, self-reflect, and improve through feedback.
World models and interactive AI	Simulate future states, dynamic transitions, GUI environments, autonomous-driving scenes, and interactive 4D worlds.
Multimodal understanding	Connect vision, language, maps, videos, GUIs, urban scenes, user intent, and product signals.
Generative modeling	Produce and edit visual assets with controllability, quality, spatial consistency, temporal coherence, and product usability.
Benchmarks and evaluation	Turn real AMAP scenarios into reusable public tasks, metrics, datasets, and reproducible evaluation suites.

Flagship Releases

These projects were selected both for how well they represent our research direction and for public adoption signals such as GitHub stars. They are organized around the two product anchors and the core AI systems that support them.

High-Signal Releases

Project	What it represents	Signal
SkillClaw	Agentic skill evolution from real interaction traces.	Agent system / high-star repo
FluxText	Scene-text editing for controllable visual asset generation.	Generative AI / high-star repo
Code2World	GUI world model via renderable code generation.	World model / high-star repo
Tree-GRPO	Tree-search rollouts for LLM agent reinforcement learning.	RL method / high-star repo
GPG	Minimalist group policy gradient baseline for model reasoning.	RL method / framework adoption
MobilityBench	Route-planning agent evaluation grounded in real-world mobility scenarios.	Spatial intelligence anchor

Spatial Intelligence

Project	What it represents	Artifact
MobilityBench	Route-planning agent evaluation grounded in real-world mobility scenarios.	Benchmark / KDD 2026
Thinking-with-Map	Map-augmented reasoning agent for geolocalization.	Agent / ACL 2026 Findings
AutoDrive-R2	Reasoning and self-reflection for VLA models in autonomous driving.	VLA / ICLR 2026
SpatialGenEval	Spatial intelligence evaluation for text-to-image models.	Benchmark / ICLR 2026

Generative Intelligence

Project	What it represents	Artifact
FluxText	Scene-text editing for controllable visual asset generation.	Generative AI
FE2E	Editing-prior foundation model for dense geometry estimation.	Generative vision / CVPR 2026
Omni-Effects	Prompt-guided and spatially controllable visual effects generation.	Generative AI / AAAI 2026
RL3DEdit	Geometry-guided RL for multi-view consistent 3D scene editing.	3D editing / CVPR 2026
MACE-Dance	Motion-appearance cascaded generation for music-driven dance video.	Video generation / SIGGRAPH 2026

Core AI Systems

Project	What it represents	Artifact
Code2World	GUI world model via renderable code generation.	World model
DreamX-World	Interactive world model for controllable world simulation.	World model
Tree-GRPO	Tree-search rollouts for LLM agent reinforcement learning.	Method / ICLR 2026
GPG	Minimalist group policy gradient baseline for model reasoning.	Method / ICLR 2026
SkillClaw	Agentic skill evolution for reusable skill libraries.	Agent system
CoEvolve	Agent-data mutual evolution for training LLM agents.	Agent RL / ACL 2026

Recent Updates

2026.05.18 MobilityBench, a scalable benchmark for evaluating route-planning agents in real-world mobility scenarios, has been accepted to KDD 2026.
2026.05.12 CoEvolve trains LLM agents through agent-data mutual evolution, using failure signals to synthesize harder tasks as the agent improves (ACL 2026).
2026.05.12 Thinking-with-Map strengthens geolocalization with a reinforced parallel map-augmented reasoning agent (ACL 2026 Findings).
2026.05.11 DreamX-World released the 5B-Cam model and inference code for interactive world simulation.
2026.04.22 DCW mitigates SNR-t bias and improves diffusion generation quality across model families (CVPR 2026).
2026.04.22 EMF extends efficient one-step generation from class-conditioned synthesis to text-conditioned image generation (CVPR 2026).
2026.04.10 SkillClaw turns real interaction traces into reusable, evolving skill libraries.
2026.04.01 MACE-Dance decouples motion generation and appearance synthesis for high-quality music-driven dance video (SIGGRAPH 2026).
2026.03.23 Omni-WorldBench evaluates world models in dynamic 4D interactive settings.
2026.03.20 AutoDrive-R2 improves VLA models with reasoning and self-reflection for autonomous driving scenarios (ICLR 2026).
2026.03.18 Video-STAR uses tool-augmented reinforcement learning for open-vocabulary action recognition in video (ICLR 2026).
2026.03.11 RL3DEdit uses geometry-guided reinforcement learning to make 3D scene edits more multi-view consistent (CVPR 2026).

Earlier Updates

2026.03.01 FE2E transfers image-editing priors into dense depth and normal estimation (CVPR 2026).
2026.02.28 FASA improves sparse decoding with frequency-aware attention (ICLR 2026).
2026.02.27 Eevee provides high-resolution data and evaluation for video-based virtual try-on (CVPR 2026 Findings).
2026.02.06 MobilityBench evaluates route-planning agents in real-world mobility scenarios (KDD 2026).
2026.02.06 SpatialGenEval benchmarks spatial intelligence in text-to-image models (ICLR 2026).
2026.02.06 Tree-GRPO replaces independent chain rollouts with tree-search rollouts for LLM agent reinforcement learning (ICLR 2026).
2026.02.04 Code2World predicts GUI transitions through renderable code generation.
2026.02.04 GPG provides a simple group policy gradient baseline for model reasoning (ICLR 2026).
2025.10.22 Taming-Hallucinations reduces MLLM video hallucinations with counterfactual video generation.
2025.06.20 FluxText provides a diffusion transformer baseline for scene-text editing.

Project Map

Spatial Intelligence and Mobility

Repository	Contribution	Venue
MobilityBench	Route-planning agent evaluation in real-world mobility scenarios.	KDD 2026
Thinking-with-Map	Map-augmented geolocalization agent trained with reinforcement learning.	ACL 2026 Findings
AutoDrive-R2	Reasoning and self-reflection for VLA models in autonomous driving.	ICLR 2026
SpatialGenEval	Spatial intelligence evaluation for text-to-image models.	ICLR 2026
SocioReasoner	Vision-language reasoning for urban socio-semantic segmentation.	ICLR 2026
DSFNet	Multi-scenario route ranking with a public industrial driving-route dataset and AMAP deployment.	WWW 2025
IntTravel	Real-world dataset and generative framework for integrated multi-task travel recommendation.	arXiv 2026

Reasoning Agents and Reinforcement Learning

Repository	Contribution	Venue
SkillClaw	Agentic evolver for collective skill library improvement.	-
Tree-GRPO	Tree-search rollouts for LLM agent reinforcement learning.	ICLR 2026
GPG	Simple and strong group policy gradient baseline for model reasoning.	ICLR 2026
CoEvolve	Agent-data mutual evolution for training LLM agents.	ACL 2026
MathForge	Difficulty-aware GRPO and multi-aspect reformulation for math reasoning.	ICLR 2026
Video-STAR	Tool-augmented reinforcement learning for open-vocabulary action recognition in video.	ICLR 2026
FASA	Frequency-aware sparse attention for efficient sparse decoding.	ICLR 2026

World Models and Interactive AI

Repository	Contribution	Venue
DreamX-World	General-purpose world model for interactive world simulation.	-
Code2World	GUI world model via renderable code generation.	-
Omni-WorldBench	Benchmark for interactive response capabilities of world models.	arXiv 2026

Multimodal and Vision-Language Models

Repository	Contribution	Venue
UniVG-R1	Reasoning-guided universal visual grounding with reinforcement learning.	CVPR 2026
RealQA	Realistic image quality and aesthetic scoring with multimodal LLMs.	-
Taming-Hallucinations	Counterfactual video generation for reducing MLLM video hallucinations.	-

Generative Intelligence

Repository	Contribution	Venue
FluxText	Diffusion transformer baseline for scene-text editing.	-
FE2E	Image editing priors for depth and normal estimation.	CVPR 2026
RL3DEdit	Geometry-guided reinforcement learning for multi-view consistent 3D scene editing.	CVPR 2026
MACE-Dance	Motion-appearance cascaded generation for music-driven dance video.	SIGGRAPH 2026
Omni-Effects	Prompt-guided and spatially controllable composite visual effects generation.	AAAI 2026
S2-Guidance	Training-free stochastic self-guidance for diffusion models.	ICLR 2026
EPG	Pixel-space generative modeling via self-supervised pre-training.	ICLR 2026
USP	Unified self-supervised pretraining in VAE space for diffusion models.	ICCV 2025
EMF	Text-conditioned one-step image generation.	CVPR 2026
DCW	Differential correction for SNR-t bias in diffusion probabilistic models.	CVPR 2026
NarrLV	Narrative-centric evaluation for long video generation models.	ICLR 2026
ImagerySearch	Adaptive test-time search for video generation.	AAAI 2026
Eevee	High-resolution benchmark for video-based virtual try-on.	CVPR 2026 Findings
VMBench	Perception-aligned benchmark for video motion generation.	ICCV 2025

For Collaborators and Applicants

We are actively looking for people who enjoy building strong AI systems: clean code, reproducible experiments, rigorous evaluation, ambitious problem selection, and real-world product impact.

If you are interested in research internships, full-time roles, or academic collaboration, please email cxxgtxy@gmail.com (homepage) with your CV, representative projects, and research interests.
