Working in OpenAI Environments & Designing Your Own
Mike Rudd
CS 885 Guest Lecture
May 18, 2018
OpenAI*
• Not-for-profit, funded by private and corporate donations
• Employ small team of high-caliber researchers/advisors
• Promote research towards safe AGI
*https://openai.com/
OpenAI Gym
• Standard set of environments for evaluating RL agents
• Provides a benchmark for most new algorithms
• Extended to more complex problems as solutions improve
*https://openai.com/
Recent Extensions
• Robotics
  • MuJoCo continuous control tasks now “easily solvable”
  • Harder set of continuous control tasks
• Retro contest
  • Agents can overfit to their environment
  • Train an agent that can transfer skills to new environments
Interacting with the Environment
Standardized Code Applicable Across Tasks
Sample Code
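A minimal sketch of the standardized interaction loop, assuming the classic Gym API of this period: `reset()` returns the first observation and `step(action)` returns `(observation, reward, done, info)`. With Gym installed you would create an environment with e.g. `env = gym.make("CartPole-v0")` and sample random actions with `env.action_space.sample()`. So that the sketch runs without Gym, `DiscreteSpace` and `GuessEnv` below are hypothetical stand-ins that only mimic that interface; the loop in `run_episodes` is the part that carries over unchanged across tasks.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for gym.spaces.Discrete(n): actions 0..n-1."""

    def __init__(self, n, seed=0):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)


class GuessEnv:
    """Hypothetical toy environment: guess a random hidden bit each step."""

    def __init__(self, episode_length=10, seed=0):
        self.action_space = DiscreteSpace(2, seed)
        self.episode_length = episode_length
        self._rng = random.Random(seed + 1)
        self._t = 0
        self._target = 0

    def reset(self):
        self._t = 0
        self._target = self._rng.randrange(2)
        return self._t  # observation: the current time step

    def step(self, action):
        reward = 1.0 if action == self._target else 0.0
        self._target = self._rng.randrange(2)
        self._t += 1
        done = self._t >= self.episode_length
        return self._t, reward, done, {}


def run_episodes(env, num_episodes=5):
    """Standard agent-environment loop; works for any env with reset/step."""
    returns = []
    for _ in range(num_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            action = env.action_space.sample()  # random policy
            obs, reward, done, info = env.step(action)
            total += reward
        returns.append(total)
    return returns


if __name__ == "__main__":
    print(run_episodes(GuessEnv()))
```

Swapping in a real Gym environment only changes the line that constructs `env`; a learning agent would replace the `action_space.sample()` call with its own policy.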
Building Your Own Environment
Practically more important than beating Gym benchmarks
Building Your Own Environment
• Not very difficult
• Just define a Python class with methods for:
  • Initialization
  • Step
  • Reset
  • Render
• Existing packages (physics engines) do most of the heavy lifting:
  • Box2D
  • MuJoCo
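A minimal sketch of such a class. Real Gym environments subclass `gym.Env` and also declare `action_space` and `observation_space`; to stay runnable without Gym or a physics engine, the hypothetical `LineGoalEnv` below is a plain class with the four methods listed above and trivial 1D dynamics standing in for Box2D or MuJoCo. Its name, dynamics and reward are illustrative assumptions, not the lecture's environment.

```python
class LineGoalEnv:
    """Hypothetical toy environment: move along a 1D line to a goal cell.

    Provides the four methods listed on the slide (__init__, step, reset,
    render) without depending on gym. Actions: 0 = left, 1 = right.
    """

    def __init__(self, size=7, goal=5, max_steps=50):
        # Initialization: fixed task parameters plus mutable state
        self.size = size
        self.goal = goal
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        # Reset: return to the start state and give the first observation
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        # Step: apply the action, advance the dynamics, compute the reward
        if action not in (0, 1):
            raise ValueError("action must be 0 (left) or 1 (right)")
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)  # walls at both ends
        self.steps += 1
        reached = self.pos == self.goal
        reward = 1.0 if reached else 0.0  # sparse: 1 for success, 0 otherwise
        done = reached or self.steps >= self.max_steps
        return self.pos, reward, done, {}

    def render(self):
        # Render: text output here; a physics-based env would draw a frame
        row = ["."] * self.size
        row[self.goal] = "G"
        row[self.pos] = "A"
        line = "".join(row)
        print(line)
        return line


if __name__ == "__main__":
    env = LineGoalEnv()
    obs = env.reset()
    done = False
    while not done:
        obs, reward, done, info = env.step(1)  # always move right
    env.render()  # prints ".....A." (agent on the goal cell)
```

Keeping the simulation inside `step` and the drawing inside `render` means training can run headless, calling `render` only when you want to watch the agent.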
Example: Teaching a Car to Self-Park
Challenge of Reward Definition
• The major difficulty is in creating the reward function
• Algorithms can learn to exploit gaps in our logic, resulting in undesirable behaviours
• See e.g. Ng et al. (1999) for examples and theoretical analysis
Ng, A. Y., Harada, D., & Russell, S. (1999, June). Policy invariance under reward transformations: Theory and application to reward shaping. In ICML (Vol. 99, pp. 278-287).
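To make "exploiting gaps in our logic" concrete, a hypothetical sketch (not from the lecture): a progress bonus of +1 whenever the agent moves closer to the goal, with no penalty for moving away. Because the episode ends at the goal, an agent that shuffles back and forth collects more reward over a fixed horizon than one that drives straight there, which is the kind of positive-reward cycle Ng et al. (1999) analyse. The 1D line, goal position and horizon are illustrative assumptions.

```python
def naive_progress_reward(pos, next_pos, goal):
    """Hypothetical flawed shaping: +1 for getting closer, 0 for moving away."""
    return 1.0 if abs(next_pos - goal) < abs(pos - goal) else 0.0


def episode_return(policy, start=0, goal=5, horizon=100):
    """Undiscounted return of a deterministic policy; the episode ends at the goal."""
    pos, total = start, 0.0
    for t in range(horizon):
        next_pos = pos + policy(pos, t)
        total += naive_progress_reward(pos, next_pos, goal)
        if next_pos == goal:
            total += 1.0  # task reward for success
            break
        pos = next_pos
    return total


def direct(pos, t):
    return 1  # always step toward the goal


def oscillate(pos, t):
    return 1 if t % 2 == 0 else -1  # step forward, then back, forever


print(episode_return(direct))     # 6.0: five progress bonuses plus success
print(episode_return(oscillate))  # 50.0: a bonus every other step, never finishes
```

An optimizer comparing these two returns prefers `oscillate`, even though it never solves the task.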
Reward Shaping
• Theoretically correct reward is 1 for success and 0 otherwise
• This reward is sparse, however, and in practice is very difficult to learn from
• Reward shaping modifies the reward function to speed up learning (with a denser signal) while leaving the theoretically optimal policy unchanged
• Ng et al. (1999) show that only a shaping function 𝐹 of the following potential-based form, for some potential function Φ over states, is guaranteed to preserve the optimal policy:
$$F(s, a, s') = \gamma\,\Phi(s') - \Phi(s) \quad \forall s \in S \setminus \{s_0\}$$
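Summed along any trajectory, the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), so potential-based shaping cannot create a profitable cycle. A small numerical check, assuming a hypothetical 6-state chain, γ = 0.9, the sparse reward above (1 on reaching the goal, 0 otherwise) and the illustrative potential Φ(s) = −|goal − s|: Q-value iteration with and without shaping yields the same greedy policy, and the shaped Q-values equal the original ones minus Φ(s), the relationship shown in Ng et al.'s proof.

```python
GAMMA = 0.9
N_STATES = 6
GOAL = N_STATES - 1  # absorbing terminal state
ACTIONS = (-1, +1)   # step left or right


def transition(s, a):
    return min(max(s + a, 0), N_STATES - 1)  # deterministic, walls at both ends


def sparse_reward(s, a, s_next):
    return 1.0 if s_next == GOAL else 0.0  # 1 for success, 0 otherwise


def potential(s):
    # Hypothetical potential: negative distance to the goal, so Phi(GOAL) = 0
    return -abs(GOAL - s)


def shaped_reward(s, a, s_next):
    # R'(s, a, s') = R(s, a, s') + F(s, a, s'), with F = gamma * Phi(s') - Phi(s)
    return sparse_reward(s, a, s_next) + GAMMA * potential(s_next) - potential(s)


def q_value_iteration(reward_fn, iters=500):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(iters):
        new_q = {}
        for s in range(N_STATES):
            for a in ACTIONS:
                if s == GOAL:
                    new_q[(s, a)] = 0.0  # terminal state: no further reward
                    continue
                s_next = transition(s, a)
                future = 0.0 if s_next == GOAL else max(q[(s_next, b)] for b in ACTIONS)
                new_q[(s, a)] = reward_fn(s, a, s_next) + GAMMA * future
        q = new_q
    return q


def greedy_policy(q):
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES) if s != GOAL]


q_sparse = q_value_iteration(sparse_reward)
q_shaped = q_value_iteration(shaped_reward)
print(greedy_policy(q_sparse))  # [1, 1, 1, 1, 1]
print(greedy_policy(q_shaped))  # [1, 1, 1, 1, 1]
```

The shaped reward is dense (every step toward the goal earns a positive signal), yet the optimal policy is unchanged because each state's Q-values shift by the same constant −Φ(s).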