
Claw-R1: The Data Foundation for
Agentic Reinforcement Learning


News

  • [2026.03.06] 📖 Claw-R1 Documentation Released. Project page and documentation are now available at Claw-R1 Project Page and Claw-R1 docs.

  • [2026.03.03] 🚧 Claw-R1 Project Init. We are actively developing the framework. Stay tuned for more features and documentation.

Overview

The Agentic RL ecosystem is thriving: frameworks like verl, Agent-R1, and MiniMax Forge have made remarkable progress in RL runtimes and training algorithms. Meanwhile, general agents (e.g., OpenClaw, Claude Code, Open Code) are producing interaction data that is far richer and more complex than traditional ReAct trajectories.

As agents grow more capable, a critical question emerges: how do we systematically collect, evaluate, and curate high-quality training data from diverse agent interactions? This direction remains under-explored despite its importance, especially when human feedback is available as a natural quality signal.

Claw-R1 provides the data foundation for Agentic RL. It introduces a Middleware Layer (Gateway + DataPool) between the Agent Side and the Training Side, focusing on data collection, evaluation, and curation rather than training algorithms themselves.
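To make the middleware idea concrete, here is a minimal sketch of a Gateway feeding a DataPool, sitting between the agent side and the training side. All class and field names here are illustrative assumptions for exposition, not the actual Claw-R1 API.

```python
# Hypothetical sketch of the middleware layer: a Gateway receives agent
# interaction Steps and a DataPool stores them by channel for later
# curation. Names are illustrative, not the real Claw-R1 interface.
from dataclasses import dataclass


@dataclass
class Step:
    trajectory_id: str          # which interaction this step belongs to
    role: str                   # e.g. "user", "assistant", "tool"
    content: str                # message text or tool output
    policy_version: str = "v0"  # enables freshness-aware curation


class DataPool:
    """Stores collected steps, partitioned by channel (e.g. train/val)."""

    def __init__(self) -> None:
        self.channels: dict[str, list[Step]] = {}

    def add(self, channel: str, step: Step) -> None:
        self.channels.setdefault(channel, []).append(step)


class Gateway:
    """Front door on the agent side; forwards submitted steps to the pool."""

    def __init__(self, pool: DataPool) -> None:
        self.pool = pool

    def submit(self, channel: str, step: Step) -> None:
        # In a real deployment this would be an HTTP endpoint that both
        # white-box agents (explicit API calls) and black-box agents
        # (proxied LLM traffic) reach.
        self.pool.add(channel, step)


pool = DataPool()
gw = Gateway(pool)
gw.submit("train", Step("traj-1", "assistant", "Called the search tool."))
print(len(pool.channels["train"]))  # → 1
```

The key design point is that the training side never talks to agents directly; it only reads curated data out of the pool.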

(Figure: Claw-R1 framework architecture)

Key Features

  • Universal Data Collection: White-box agents submit Steps via API; black-box agents integrate by simply pointing base_url to the Gateway (zero code changes); online services collect data from live user interactions in real time.

  • Data Evaluation & Curation: Multi-dimensional reward system (rule-based / discriminative RM / generative RM), human feedback signal integration, policy version tracking for freshness-aware curation, and channel-based data partitioning.

  • Flexible Data Serving: Pluggable TrainingBackend to convert curated data into any training engine's native format, with GRPO-aware grouping, train/val channel isolation, and real-time monitoring.
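Of these features, "GRPO-aware grouping" is worth unpacking: GRPO computes advantages within groups of trajectories that share the same prompt, so a serving layer must emit only complete groups. The sketch below illustrates that constraint; the function name and record schema are assumptions for illustration, not the actual TrainingBackend interface.

```python
# Illustrative sketch of GRPO-aware grouping: emit only complete groups of
# `group_size` trajectories per prompt, holding back partial groups until
# enough rollouts arrive. Names/schema are hypothetical, not Claw-R1's API.
from collections import defaultdict


def group_by_prompt(records: list[dict], group_size: int) -> list[list[dict]]:
    """Bucket records by prompt_id and return only complete groups."""
    buckets: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        buckets[rec["prompt_id"]].append(rec)

    groups: list[list[dict]] = []
    for recs in buckets.values():
        # Emit as many full groups as this prompt can fill; leftovers wait
        # in the pool for the next serving round.
        for i in range(0, len(recs) - group_size + 1, group_size):
            groups.append(recs[i:i + group_size])
    return groups


records = [{"prompt_id": "p1", "reward": r} for r in (0.0, 1.0, 0.5, 0.2)]
records += [{"prompt_id": "p2", "reward": 0.9}]  # only 1 rollout: held back
print(len(group_by_prompt(records, group_size=4)))  # → 1
```

Serving complete groups matters because a group with a single member has zero reward variance, which makes the within-group advantage degenerate.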

Get Started

Roadmap

  • Data Quality Dashboard: Visual monitoring of data quality metrics, reward distributions, and collection statistics.
  • Human Feedback Pipeline: Structured pipeline for capturing and integrating explicit and implicit human feedback signals from online agent services.
  • Dataset Export & Versioning: Export curated datasets with full provenance tracking for reproducibility and sharing.
  • Extended TrainingBackend Support: Native adapters for additional RL frameworks beyond verl.

Contributors

Team Members: Daoyu Wang, Jie Ouyang, Shuo Yu

Supervisors: Qi Liu, Mingyue Cheng

Affiliation: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China

Acknowledgements

We extend our gratitude to Agent-R1, MiniMax Forge, verl, and rLLM for their pioneering work on Agentic RL training infrastructure. We also thank OpenClaw for their remarkable work on personal AI assistants. We are grateful to the broader Agentic RL community and all contributors for their support.

Citation

@misc{clawr1-2026,
  title={Claw-R1: The Data Foundation for Agentic Reinforcement Learning},
  author={Wang, Daoyu and Ouyang, Jie and Yu, Shuo and Cheng, Mingyue and Liu, Qi},
  year={2026},
  howpublished={\url{https://github.com/AgentR1/Claw-R1}},
  note={GitHub repository}
}
