- [2026.03.06] 📖 Claw-R1 Documentation Released. The project page and documentation are now available at the Claw-R1 Project Page and the Claw-R1 docs.
- [2026.03.03] 🚧 Claw-R1 Project Init. We are actively developing the framework. Stay tuned for more features and documentation.
The Agentic RL ecosystem is thriving: frameworks such as verl, Agent-R1, and MiniMax Forge have made remarkable progress on RL runtimes and training algorithms. Meanwhile, General Agents (e.g., OpenClaw, Claude Code, Open Code) are producing interaction data that is far richer and more complex than traditional ReAct trajectories.
As agents grow more capable, a critical question emerges: how do we systematically collect, evaluate, and curate high-quality training data from diverse agent interactions? This remains a relatively under-explored yet important direction, especially when human feedback is available as a natural quality signal.
Claw-R1 provides the data foundation for Agentic RL. It introduces a Middleware Layer (Gateway + DataPool) between the Agent Side and the Training Side, focusing on data collection, evaluation, and curation rather than training algorithms themselves.
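To make the middleware idea concrete, here is a minimal in-memory sketch of a Step record and a DataPool that partitions submissions by channel and filters on reward. All names (`Step`, `DataPool`, the field layout) are illustrative assumptions, not the actual Claw-R1 schema.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Step:
    """One agent interaction step (hypothetical schema; the real
    Claw-R1 Step format may differ)."""
    trajectory_id: str
    observation: str
    action: str
    reward: Optional[float] = None   # filled in later by the evaluation stage
    policy_version: str = "v0"       # enables freshness-aware curation
    channel: str = "train"           # channel-based data partitioning
    meta: dict = field(default_factory=dict)

class DataPool:
    """In-memory sketch of the middleware DataPool: collects Steps
    forwarded by the Gateway and partitions them by channel."""

    def __init__(self) -> None:
        self._pool: dict = {}

    def submit(self, step: Step) -> None:
        self._pool.setdefault(step.channel, []).append(step)

    def fetch(self, channel: str, min_reward: float = 0.0) -> list:
        # Curation: keep only evaluated steps whose reward clears the bar.
        return [s for s in self._pool.get(channel, [])
                if s.reward is not None and s.reward >= min_reward]
```

In a real deployment the Gateway would call `submit` on every proxied agent turn, and the training side would poll `fetch("train")`; here both halves are collapsed into one process for clarity.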
- Universal Data Collection: White-box agents submit Steps via API; black-box agents integrate by simply pointing `base_url` to the Gateway (zero code changes); online services collect data from live user interactions in real time.
- Data Evaluation & Curation: Multi-dimensional reward system (rule-based / discriminative RM / generative RM), human feedback signal integration, policy version tracking for freshness-aware curation, and channel-based data partitioning.
- Flexible Data Serving: Pluggable `TrainingBackend` to convert curated data into any training engine's native format, with GRPO-aware grouping, train/val channel isolation, and real-time monitoring.
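The GRPO-aware grouping mentioned above can be sketched as follows: GRPO computes advantages relative to a group of rollouts for the same prompt, so a backend must batch rollouts per prompt before handing them to the training engine. The class name, record fields, and `convert` signature below are assumptions for illustration, not the actual `TrainingBackend` interface.

```python
from collections import defaultdict
from statistics import mean, pstdev

class GroupedBackend:
    """Hypothetical TrainingBackend sketch: groups rollouts by prompt
    (as GRPO requires) and attaches group-normalized advantages."""

    def __init__(self, group_size: int = 4) -> None:
        self.group_size = group_size

    def convert(self, rollouts: list) -> list:
        """rollouts: [{'prompt_id': str, 'response': str, 'reward': float}, ...]
        Returns a list of groups, each a list of records with an 'advantage'."""
        by_prompt = defaultdict(list)
        for r in rollouts:
            by_prompt[r["prompt_id"]].append(r)

        batches = []
        for rs in by_prompt.values():
            if len(rs) < self.group_size:
                continue  # GRPO needs a full group to compute relative advantages
            rs = rs[: self.group_size]
            mu = mean(r["reward"] for r in rs)
            sigma = pstdev(r["reward"] for r in rs) or 1.0  # avoid div-by-zero
            batches.append([
                {**r, "advantage": (r["reward"] - mu) / sigma} for r in rs
            ])
        return batches
```

A real adapter would also serialize each group into the target engine's native format (e.g., verl's batch layout) and respect train/val channel isolation; those concerns are omitted here.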
- Data Quality Dashboard: Visual monitoring of data quality metrics, reward distributions, and collection statistics.
- Human Feedback Pipeline: Structured pipeline for capturing and integrating explicit and implicit human feedback signals from online agent services.
- Dataset Export & Versioning: Export curated datasets with full provenance tracking for reproducibility and sharing.
- Extended TrainingBackend Support: Native adapters for additional RL frameworks beyond verl.
Team Members: Daoyu Wang, Jie Ouyang, Shuo Yu
Supervisors: Qi Liu, Mingyue Cheng
Affiliation: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China
We extend our gratitude to Agent-R1, MiniMax Forge, verl, and rLLM for their pioneering work on Agentic RL training infrastructure. We also thank OpenClaw for their remarkable work on personal AI assistants. We are grateful to the broader Agentic RL community and all contributors for their support.
@misc{clawr1-2026,
  title={Claw-R1: The Data Foundation for Agentic Reinforcement Learning},
  author={Wang, Daoyu and Ouyang, Jie and Yu, Shuo and Cheng, Mingyue and Liu, Qi},
  year={2026},
  howpublished={\url{https://github.com/AgentR1/Claw-R1}},
  note={GitHub repository}
}
