We've developed Random Network Distillation (RND), a prediction-based method for encouraging reinforcement learning agents to explore their environments through curiosity, which for the first time[1] exceeds average human performance on Montezuma's Revenge. RND achieves state-of-the-art performance, periodically finds all 24 rooms and solves the first level without using demonstrations or having a