Multi-Task Interactive Robot Fleet Learning with Visual World Models

CoRL 2024 Anonymous Submission

Abstract

Recent advances in large multi-task robot agents hold the promise of robot fleets operating in household and industrial settings, capable of performing diverse tasks across varied environments. Despite these advances, such robots often struggle with generalization and robustness in unstructured real-world settings, limiting their practical deployment. To address these issues, we propose Firework, a framework for multi-task interactive robot fleet learning that supervises continual policy updates throughout deployment with efficient multi-task runtime monitoring and humans in the loop. We realize this with a visual world model that predicts future task outcomes and serves as the backbone representation for downstream error-prediction tasks. We train dynamic error predictors that automatically adjust their prediction criteria to the robot's evolving autonomy, reducing human workload over time. Evaluations on large-scale benchmarks demonstrate our framework's effectiveness in improving both multi-task policy performance and monitoring accuracy. We demonstrate the performance of Firework on RoboCasa in simulation and MUTEX in the real world, two large, diverse multi-task environment benchmarks.

Video