OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
OKAMI enables a humanoid robot to imitate manipulation skills from a single human video demonstration.
Tasks
Sprinkle-salt
Plush-toy-in-basket
Place-snacks-on-plate
Close-the-laptop
Close-the-drawer
Bagging
Robot Rollout
Our systematic generalization evaluation covers visual backgrounds, camera angles, spatial layouts, and new object instances. Note that camera-angle generalization is inherently entailed in our pipeline, since the camera extrinsics of the video demonstration differ from those during rollout.
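To make the camera-angle point concrete, here is a minimal, hypothetical sketch (assuming a standard pose pipeline, not OKAMI's actual code) of using the camera extrinsics to express a detected object pose in the robot base frame. Once poses live in a camera-independent frame, the downstream plan does not depend on where the camera was placed. The names `T_base_cam` and `T_cam_obj` and all numeric values are assumptions for illustration.

```python
import numpy as np

def to_base_frame(T_base_cam: np.ndarray, T_cam_obj: np.ndarray) -> np.ndarray:
    """Express an object pose (4x4 homogeneous transform) detected in the
    camera frame in the robot base frame via the camera extrinsics.

    Hypothetical helper: once poses are in the base frame, the same
    manipulation plan works regardless of the camera placement."""
    return T_base_cam @ T_cam_obj

# Example extrinsics (assumed): camera mounted 1 m above the base,
# looking along the base x-axis.
T_base_cam = np.array([
    [ 0.0,  0.0, 1.0, 0.0],
    [-1.0,  0.0, 0.0, 0.0],
    [ 0.0, -1.0, 0.0, 1.0],
    [ 0.0,  0.0, 0.0, 1.0],
])
T_cam_obj = np.eye(4)
T_cam_obj[:3, 3] = [0.1, -0.2, 0.8]  # object 0.8 m in front of the camera

print(to_base_frame(T_base_cam, T_cam_obj)[:3, 3])  # object position in base frame
```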
Generalization: Visual Backgrounds
Human Video
Deploy on a different table with a green tablecloth
Deploy near a cabinet
Human Video
Deploy on a kitchen table
Deploy on a kitchen table
Human Video
Deploy near the water sink in the kitchen
Deploy on a kitchen table
Generalization: Spatial Layouts
Human Video
Human Video
Human Video
Deploy when objects are at different heights
Generalization: New Object Instances
Human Video
Human Video
Robust to Different Demonstrators
OKAMI allows the humanoid robot to imitate videos recorded by different users, even when their ways of completing a task differ. This also shows that our method is robust across demonstrators of different demographics.
Human Video: Close Laptop Using Left Hand
Robot Rollout Video
Human Video: Close Laptop Using Right Hand
Robot Rollout Video
Failure Modes
OKAMI's policies may fail to grasp objects due to inaccuracies in the controllers and the human reconstruction model, or fail to complete tasks because of unwanted collisions, undesired upper-body rotations, or inaccuracies in solving inverse kinematics (see the sketch after the examples below).
Failed to grasp the bottle due to inaccurate reconstruction of the wrist pose
Failed to complete the task due to inaccurate inverse kinematics results
Failed to complete the task due to an unwanted collision between the index finger and the drawer, and undesired body rotation
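As one illustration of the last failure source, below is a minimal, hypothetical sketch (not OKAMI's actual solver) of damped least-squares inverse kinematics on a planar two-link arm. The damping term that keeps the solve stable near singularities also biases the solution, leaving a small residual end-effector error, which is enough to make a grasp miss. Link lengths, the damping value, and the target are assumed.

```python
import numpy as np

def fk(q, l1=0.3, l2=0.3):
    """Forward kinematics of a planar 2-link arm (link lengths assumed)."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=0.3, l2=0.3):
    """End-effector Jacobian of the same arm."""
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def dls_ik(target, q0, damping=0.05, iters=100):
    """Damped least-squares IK: dq = J^T (J J^T + lambda^2 I)^{-1} err.
    Damping stabilizes the solve near singularities but leaves a
    residual error at the end effector."""
    q = q0.copy()
    for _ in range(iters):
        err = target - fk(q)
        J = jacobian(q)
        q += J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q, np.linalg.norm(target - fk(q))

# Target near full extension (max reach 0.6 m), where damping bias is largest.
q, residual = dls_ik(np.array([0.55, 0.1]), np.array([0.3, 0.3]))
print(f"residual end-effector error: {residual:.4f} m")
```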