Shane Gu

4 years

Can we solve Gym tasks with only 5-10 "trials"? Yes we can. We propose deployment efficiency as a new metric for RL, counting # of distinct data collection policies in learning. Most prior methods use 100s-1Ms. Joint w/ @Matsuo_Lab @ofirnachum 1/
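A minimal sketch of how the metric could be tallied, assuming a "deployment" means one distinct policy used to collect data. The `DeploymentCounter` class, the `run` helper, and the batch sizes below are hypothetical illustrations, not taken from the BREMEN codebase. It contrasts updating the policy after every transition with collecting a few large batches under the same sample budget.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentCounter:
    """Tracks samples and the number of distinct data-collection policies."""
    seen_versions: set = field(default_factory=set)
    samples: int = 0

    def record(self, policy_version: int, num_transitions: int) -> None:
        # A policy version counts once, however many transitions it collects.
        self.seen_versions.add(policy_version)
        self.samples += num_transitions

    @property
    def deployments(self) -> int:
        return len(self.seen_versions)


def run(total_transitions: int, batch_size: int) -> DeploymentCounter:
    """Hypothetical loop: deploy a policy, collect a batch, update offline."""
    counter = DeploymentCounter()
    version = 0
    while counter.samples < total_transitions:
        counter.record(version, batch_size)  # deploy policy `version`
        version += 1                         # offline update yields a new policy
    return counter


# Same sample budget, very different deployment counts (illustrative numbers).
online = run(total_transitions=1000, batch_size=1)
batched = run(total_transitions=1000, batch_size=200)
print(online.samples, online.deployments)
print(batched.samples, batched.deployments)
```

Both runs use the same number of samples, so sample efficiency alone would not tell them apart; only the deployment count does.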

Shane Gu

@shaneguML

4 years

Deployment efficiency matters for real-world RL (e.g. healthcare, robotics, education, dialog systems), where policies likely need to be tested for safety and quality before being deployed for use or further learning. It is therefore an important (but previously ignored) measure for RL. 2/

Shane Gu

@shaneguML

4 years

Our building block, BREMEN, is a simple model-based offline RL method that works even with 10-20x less data (model-free fails). Also check out 2 concurrent model+offline works @svlevine @chelseabfinn @tengyuma and @aravindr93 3/

MOReL : Model-Based Offline Reinforcement Learning

In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment. The ability to train RL policies...

arxiv.org

Shane Gu

@shaneguML

4 years

Congratulations to @__tmats__ @frt03_ for getting this done! And thanks to @ymatsuo for his support! 4/

Shane Gu

@shaneguML

4 years

Open-source code: 5/

GitHub - matsuolab/BREMEN: Codebase of Deployment-Efficient Reinforcement Learning via Model-Based...

Codebase of Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization (ICLR2021) - matsuolab/BREMEN

github.com

Aravind Rajeswaran @CVPR24

@aravindr93

4 years

@shaneguML @Matsuo_Lab @ofirnachum Might be also worth checking the game-MBRL paper, and in particular the MAL algorithm. We mainly went for sample complexity, but MAL still solved gym tasks in ~15 deployments. Better results possible with hyperparam specialized to min. deployments.

MBRL-Game

Motivation Model-Based RL (MBRL) has received considerable interest due to its potential for sample efficient learning. While recent works have proposed new algorithms and heuristics, an algorithmic...

sites.google.com

Shane Gu

@shaneguML

4 years

@aravindr93 @Matsuo_Lab @ofirnachum Nice! We'll check out those results!

Shane Gu

@shaneguML

4 years

@le_roux_nicolas @Matsuo_Lab @ofirnachum @CriteoAILab Thanks for the reference!! I actually remember reading this paper :) We took a model-based approach, but indeed PoWER, RWR, and AWR-like algorithms are also likely well suited to offline and deployment-efficient settings. Hope to see more work in this direction!

Aravind Rajeswaran @CVPR24

@aravindr93

4 years

@shaneguML @Matsuo_Lab @ofirnachum Nice work Shane! Number of policy changes is an important metric. We also argue in our MOReL paper that one main advantage to offline RL is to view standard RL as a sequence of batch problems.

陸より、空海の可能性 🎵三世界幸せの調和成し遂げる❗

@DCTL7X1hPj3xxo7

4 years

@shaneguML @Matsuo_Lab @ofirnachum Wonderful!
