- [2026.03.06] 📖 Claw-R1 Documentation Released. The project page and documentation are now available at the Claw-R1 Project Page and the Claw-R1 docs.
- [2026.03.03] 🚧 Claw-R1 Project Init. We are actively developing the framework. Stay tuned for more features and documentation.
The Agentic RL ecosystem is thriving: frameworks such as verl, Agent-R1, and MiniMax Forge have made remarkable progress on RL runtimes and training algorithms. Meanwhile, General Agents (e.g., OpenClaw, Claude Code, Open Code) are producing interaction data that is far richer and more complex than traditional ReAct trajectories.
As agents grow more capable, a critical question emerges: how do we systematically collect, evaluate, and curate high-quality training data from diverse agent interactions? This remains a relatively under-explored yet important direction, especially when human feedback is available as a natural quality signal.
Claw-R1 provides the data foundation for Agentic RL. It introduces a Middleware Layer (Gateway + DataPool) between the Agent Side and the Training Side, focusing on data collection, evaluation, and curation rather than training algorithms themselves.
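To make the middleware idea concrete, here is a minimal in-memory sketch of a Step record and a DataPool that partitions submissions by channel and filters on reward. All names (`Step`, `DataPool`, the field layout) are illustrative assumptions, not the actual Claw-R1 schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Step:
    """One agent interaction step (hypothetical schema; the real
    Claw-R1 Step format may differ)."""
    trajectory_id: str
    observation: str
    action: str
    reward: Optional[float] = None   # filled in later by the evaluation stage
    policy_version: str = "v0"       # enables freshness-aware curation
    channel: str = "train"           # channel-based data partitioning
    meta: dict = field(default_factory=dict)

class DataPool:
    """In-memory sketch of the middleware DataPool: collects Steps
    forwarded by the Gateway and partitions them by channel."""

    def __init__(self) -> None:
        self._pool: dict = {}

    def submit(self, step: Step) -> None:
        self._pool.setdefault(step.channel, []).append(step)

    def fetch(self, channel: str, min_reward: float = 0.0) -> list:
        # Curation: keep only evaluated steps whose reward clears the bar.
        return [s for s in self._pool.get(channel, [])
                if s.reward is not None and s.reward >= min_reward]
```

In a real deployment the Gateway would call `submit` on every proxied agent turn, and the training side would poll `fetch("train")`; here both halves are collapsed into one process for clarity.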
- Universal Data Collection: White-box agents submit Steps via API; black-box agents integrate by simply pointing `base_url` to the Gateway (zero code changes); online services collect data from live user interactions in real time.
- Data Evaluation & Curation: Multi-dimensional reward system (rule-based / discriminative RM / generative RM), human feedback signal integration, policy version tracking for freshness-aware curation, and channel-based data partitioning.
- Flexible Data Serving: Pluggable `TrainingBackend` to convert curated data into any training engine's native format, with GRPO-aware grouping, train/val channel isolation, and real-time monitoring.
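The GRPO-aware grouping mentioned above can be sketched as follows: GRPO computes advantages relative to a group of rollouts for the same prompt, so a backend must batch rollouts per prompt before handing them to the training engine. The class name, record fields, and `convert` signature below are assumptions for illustration, not the actual `TrainingBackend` interface.

```python
from collections import defaultdict
from statistics import mean, pstdev

class GroupedBackend:
    """Hypothetical TrainingBackend sketch: groups rollouts by prompt
    (as GRPO requires) and attaches group-normalized advantages."""

    def __init__(self, group_size: int = 4) -> None:
        self.group_size = group_size

    def convert(self, rollouts: list) -> list:
        """rollouts: [{'prompt_id': str, 'response': str, 'reward': float}, ...]
        Returns a list of groups, each a list of records with an 'advantage'."""
        by_prompt = defaultdict(list)
        for r in rollouts:
            by_prompt[r["prompt_id"]].append(r)

        batches = []
        for rs in by_prompt.values():
            if len(rs) < self.group_size:
                continue  # GRPO needs a full group to compute relative advantages
            rs = rs[: self.group_size]
            mu = mean(r["reward"] for r in rs)
            sigma = pstdev(r["reward"] for r in rs) or 1.0  # avoid div-by-zero
            batches.append([
                {**r, "advantage": (r["reward"] - mu) / sigma} for r in rs
            ])
        return batches
```

A real adapter would also serialize each group into the target engine's native format (e.g., verl's batch layout) and respect train/val channel isolation; those concerns are omitted here.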
- Data Quality Dashboard: Visual monitoring of data quality metrics, reward distributions, and collection statistics.
- Human Feedback Pipeline: Structured pipeline for capturing and integrating explicit and implicit human feedback signals from online agent services.
- Dataset Export & Versioning: Export curated datasets with full provenance tracking for reproducibility and sharing.
- Extended TrainingBackend Support: Native adapters for additional RL frameworks beyond verl.
Team Members: Daoyu Wang, Jie Ouyang, Shuo Yu
Supervisors: Qi Liu, Mingyue Cheng
Affiliation: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
We extend our gratitude to Agent-R1, MiniMax Forge, verl, and rLLM for their pioneering work on Agentic RL training infrastructure. We also thank OpenClaw for their remarkable work on personal AI assistants. We are grateful to the broader Agentic RL community and all contributors for their support.
@misc{clawr1-2026,
  title={Claw-R1: The Data Foundation for Agentic Reinforcement Learning},
  author={Wang, Daoyu and Ouyang, Jie and Yu, Shuo and Cheng, Mingyue and Liu, Qi},
  year={2026},
  howpublished={\url{https://github.com/AgentR1/Claw-R1}},
  note={GitHub repository}
}
