
Page 1: Human-level control through deep reinforcement learning (kucg.korea.ac.kr/new/seminar/2018/ppt/ppt-2018-12-27.pdf)

Human-level control through deep reinforcement learning

Byeong-Sun Hong

2018-12-31

Computer Graphics @ Korea University

Copyright of figures and other materials in the paper belongs to original authors.

Volodymyr Mnih, Google DeepMind, Nature 2015

Page 2:

Byeong-sun Hong | 2018-12-31 | Computer Graphics @ Korea University

Index

• Introduction

• Background

▪ Reinforcement Learning

• Related Work

• Key Idea

• Method

• Result

• Conclusion

Page 3:

Introduction

Page 4:

Introduction

• Reinforcement learning describes how an agent can optimize its behavior in an environment.

• Many reinforcement learning algorithms

▪ Succeed only in domains that need human assistance or that have only simple state information

▪ Optimizing behavior in complex real-world environments remains a hard problem

Page 5:

Contribution

• Combining deep learning with reinforcement learning

▪ Makes even high-dimensional state information interpretable

• Experience Replay (Replay Memory)

▪ Solves the data-correlation problem

• Fixing the target-Q network

▪ Solves network instability

Page 6:

Overview

Page 7:

Background

Page 8:

Reinforcement Learning

Categories of Machine Learning

• Machine Learning

Source: Google Images

Page 9:

Reinforcement Learning

Environment / Agent

Source: Google Images

Page 10:

Reinforcement Learning

Applying RL and modeling

• Problems where Reinforcement Learning is applied

▪ Used to solve sequential decision-making problems

▪ To solve a sequential decision problem with RL, it must be modeled mathematically (as an MDP)

• MDP (Markov Decision Process)

▪ Satisfies the Markov property and contains the required components.

• Markov property: the outcome of the future state is determined only by the current state, independently of past states

• Required components: State, Action, Reward, Policy

Page 11:

Reinforcement Learning

State / Action / Reward

• State

• Action

• Reward

• Discount Factor

Source: Google Images

Page 12:

Reinforcement Learning

Policy

• Policy

▪ The action the agent should take in every state

• The ultimate goal of reinforcement learning

▪ To find the optimal policy that yields the highest reward!

Source: Google Images

Page 13:

Reinforcement Learning

Value Function

• Value Function

▪ The expected sum of the rewards to be received in the future

▪ Two kinds: the State-Value Function and the Action-Value Function

• State-Value Function: 𝑉(𝑠) = 𝔼[𝑅_{𝑡+1} + 𝛾𝑅_{𝑡+2} + 𝛾²𝑅_{𝑡+3} + ⋯ | 𝑆_𝑡 = 𝑠]

• Action-Value Function (Q-Function): 𝑄(𝑠, 𝑎) = 𝔼[𝑅_{𝑡+1} + 𝛾𝑅_{𝑡+2} + 𝛾²𝑅_{𝑡+3} + ⋯ | 𝑆_𝑡 = 𝑠, 𝐴_𝑡 = 𝑎]

(𝑅 = Reward, 𝛾 = Discount factor)
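The discounted sum that both value functions estimate can be sketched in a few lines of Python (an illustrative snippet, not from the paper):

```python
# Computes G_t = R_{t+1} + gamma*R_{t+2} + gamma^2*R_{t+3} + ...
# by folding the reward list from the back.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```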

Page 14:

Reinforcement Learning

RL-Road Map

(Figure: RL road map, covering On-Policy and Off-Policy methods up to the Q-Network. Source: Google Images)

Page 15:

Reinforcement Learning

Model-Based vs Model-Free

• Planning / Learning

▪ Model-Based: the model of the environment is known exactly, so the problem is solved by computation → Planning, using the State-Value Function

▪ Model-Free: the model of the environment is unknown, so the problem is solved by learning through interaction → Learning, using the Action-Value Function

Source: Google Images

Page 16:

Reinforcement Learning

Model-Free RL

• Monte-Carlo Control

▪ Update after running one episode to its end

• Temporal-Difference Control

▪ Update after every single step

Source: Google Images

Page 17:

Reinforcement Learning

On-Policy vs Off-Policy

• On-Policy: the policy being followed and the policy being improved are the same

▪ SARSA: 𝑄(𝑠, 𝑎) ← 𝑄(𝑠, 𝑎) + 𝛼[𝑟 + 𝛾𝑄(𝑠′, 𝑎′) − 𝑄(𝑠, 𝑎)]

• Off-Policy: the policy being followed and the policy being improved are separate

▪ Q-Learning: 𝑄(𝑠, 𝑎) ← 𝑄(𝑠, 𝑎) + 𝛼[𝑟 + 𝛾 max_{𝑎′} 𝑄(𝑠′, 𝑎′) − 𝑄(𝑠, 𝑎)]

Source: Google Images
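The contrast between the two update rules can be sketched with a small tabular toy (hypothetical helper names, not the paper's code):

```python
# Q is a dict mapping (state, action) -> value; alpha is the learning rate.
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # On-policy: the target uses the action a2 actually taken in s2.
    target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    # Off-policy: the target uses the greedy (max) action in s2,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * max(Q.get((s2, b), 0.0) for b in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
```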

Page 18:

Reinforcement Learning

Q-Network

• The methods so far are usable only in very simple environments such as Gridworld

▪ As the environment grows more complex, there are too many individual Q-values to update

▪ Instead of updating each Q-value directly, introduce a parameter 𝑤 and update 𝑤

Source: Google Images

Page 19:

Related Work

Page 20:

Related Work

Reinforcement Learning

• Temporal Difference Learning and TD-Gammon

▪ [Gerald Tesauro / ACM 1995]

• Reinforcement learning for robot soccer

▪ [Martin Riedmiller et al. / Autonomous Robots 2009]

• An Object-Oriented Representation for Efficient Reinforcement Learning

▪ [Carlos Diuk et al. / ICML 2008]

Page 21:

Related Work

Atari Game Learning

• The Arcade Learning Environment: An Evaluation Platform for General Agents

▪ [Marc G. Bellemare(Univ. of Alberta) et al. / AIR 2013]

Page 22:

Related Work

Convolutional Neural Network

• ImageNet Classification with Deep Convolutional Neural Networks

▪ [Krizhevsky A. (Univ. of Toronto) / ILSVRC 2012]

Page 23:

Key Idea

Page 24:

Previous Limitation_1

• Correlation between samples

Page 25:

Key Idea_1

• Experience Replay(Replay Memory)

(Diagram: the replay memory stores many (state, action, reward, next_state) tuples; a random minibatch of stored tuples is drawn for each update.)
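A minimal sketch of such a replay memory in Python (illustrative names, not the paper's code):

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random minibatch: consecutive (correlated) frames get broken up.
        return random.sample(self.buffer, batch_size)

memory = ReplayMemory(capacity=5)
for t in range(8):
    memory.push(t, 0, 1.0, t + 1)
print(len(memory.buffer))  # 5: only the most recent transitions are kept
```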

Page 26:

Previous Limitation_2

• Non-stationary Target

▪ During an update the network is its own target, so learning is very unstable

(Diagram: a single main network produces both the Q-value from its weights and the target, the sum of the reward and the best Q-value in the next state; the loss is computed from the two and the weights are updated.)

Loss = 𝔼[(𝑟 + 𝛾 max_{𝑎′} 𝑄(𝑠′, 𝑎′; 𝜃_𝑖) − 𝑄(𝑠, 𝑎; 𝜃_𝑖))²]

(𝑟 = Reward, 𝛾 = Discount factor, 𝑠 = State, 𝑎 = Action, 𝜃 = Weight, 𝑖 = Iteration, 𝑄 = Q-value Function)

Page 27:

Key Idea_2

• Fixed Q-target

(Diagram: previously the main network computed both its own Q-value and the target, the sum of the reward and the best Q-value in the next state. With a fixed Q-target, a separate target network computes the target, and the main network's weights are copied into it at fixed intervals.)
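The periodic weight copy can be sketched as follows, with plain dicts standing in for network weights (the copy interval here is illustrative, not the paper's value):

```python
main_weights = {"w": 0.0}
target_weights = dict(main_weights)  # frozen copy used to compute targets

SYNC_EVERY = 4  # copy interval (illustrative)

for step in range(1, 9):
    main_weights["w"] += 0.1                 # pretend gradient update on the main net
    if step % SYNC_EVERY == 0:
        target_weights = dict(main_weights)  # periodic weight copy

print(round(target_weights["w"], 1))  # 0.8: last synced at step 8
```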

Page 28:

Key Idea_3

• Convolutional Neural Network

▪ With the progress of deep learning, large input data can now be taken in and analyzed

▪ Unlike earlier methods that received low-level state, the entire screen's pixels can now be taken as the state

(Diagram: screen pixels → three convolutional layers → action)

Page 29:

Method

Page 30:

Method

Data Pre-Processing (1/2)

210×160 pixels, 128 colors → 84×84 pixels, grayscale
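A rough Python sketch of this step, assuming a raw (210, 160, 3) frame; nearest-neighbor resampling stands in for the paper's exact crop-and-rescale:

```python
import numpy as np

def preprocess(frame):  # frame: (210, 160, 3) uint8 RGB
    gray = frame.mean(axis=2)                        # crude luminance
    rows = np.linspace(0, frame.shape[0] - 1, 84).astype(int)
    cols = np.linspace(0, frame.shape[1] - 1, 84).astype(int)
    return gray[rows][:, cols].astype(np.uint8)      # (84, 84) grayscale

frame = np.random.randint(0, 256, (210, 160, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # (84, 84)
```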

Page 31:

Method

Data Pre-Processing (2/2)

(Frames s1 … s5: four consecutive preprocessed frames are stacked into one state)

State_1 = {s1, s2, s3, s4}
State_2 = {s2, s3, s4, s5}
State_3 = {s3, s4, s5, s6}
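The frame-stacking scheme above can be sketched with a fixed-length deque:

```python
from collections import deque

frames = deque(maxlen=4)  # holds the 4 most recent preprocessed frames

def push_frame(f):
    frames.append(f)
    return tuple(frames)  # the stacked state the network would see

for name in ["s1", "s2", "s3", "s4", "s5"]:
    state = push_frame(name)

print(state)  # ('s2', 's3', 's4', 's5') -- matches State_2 above
```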

Page 32:

Method

Model architecture

Page 33:

Method

Algorithm

• Model-free

▪ Uses the Action-Value Function

• Off-policy

▪ Uses the Q-Learning algorithm

• Uses the 𝜀-greedy method

▪ 𝜀 is set to a number between 0 and 1

▪ Draw a random number in [0, 1]; if it is greater than 𝜀, take the action with the maximum Q-value, otherwise take a random action

• Loss Function

Loss = 𝔼[(𝑟 + 𝛾 max_{𝑎′} 𝑄(𝑠′, 𝑎′; 𝜃_𝑖⁻) − 𝑄(𝑠, 𝑎; 𝜃_𝑖))²]

(𝑟 = Reward, 𝛾 = Discount factor, 𝑠 = State, 𝑎 = Action, 𝜃 = Weight, 𝑖 = Iteration, 𝑄 = Q-value Function)
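The 𝜀-greedy rule described above, as a short Python sketch (written in the equivalent form "explore with probability 𝜀"):

```python
import random

def epsilon_greedy(q_values, epsilon):
    if random.random() < epsilon:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

random.seed(0)
q = [0.1, 0.9, 0.3]
print(epsilon_greedy(q, epsilon=0.0))  # 1: greedy pick with no exploration
```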

Page 34:

Method

Training Details (1/3)

• 49 Atari 2600 games were trained with the same network architecture.

• Total training: 50 million frames per game (takes about 38 days)

• For games with multiple lives, the episode ends when the last life is lost

• Hyperparameter and optimization-parameter values were searched using only 5 games (Pong, Breakout, Seaquest, Space Invaders, Beam Rider)

▪ Considering every game would require too much computation

▪ For the other games the values are fixed as in Extended Data Table 1

Page 35:

Method

Training Details (2/3)

• Optimization: gradient descent, using RMSProp

• Policy: the 𝜀-greedy method

▪ 𝜀 is annealed from 1.0 to 0.1 over the first 1 million frames, then fixed at 0.1

• Replay Memory: stores the most recent 1 million frames

• Reward Clipping

▪ So that many games can be trained with the same model

• Positive = +1 / Negative = −1 / Else = 0

(𝜃 = Weight, 𝐽(𝜃_𝑡) = Loss, 𝜂 = Learning rate, 𝜀 = term that keeps the denominator from becoming zero)
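The 𝜀 schedule and reward clipping can be sketched as follows (the schedule values come from the slide; the function names are illustrative):

```python
def epsilon_at(frame, start=1.0, end=0.1, anneal_frames=1_000_000):
    # Linear anneal from 1.0 to 0.1 over the first million frames, then fixed.
    if frame >= anneal_frames:
        return end
    return start + (end - start) * frame / anneal_frames

def clip_reward(r):
    # Positive -> +1, negative -> -1, else 0, so one model fits many games.
    return (r > 0) - (r < 0)

print(epsilon_at(0), round(epsilon_at(500_000), 2), epsilon_at(2_000_000))  # 1.0 0.55 0.1
print(clip_reward(37.0), clip_reward(-2.0), clip_reward(0.0))               # 1 -1 0
```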

Page 36:

Method

Training Details (3/3)

Page 37:

Method

Training Algorithm for Deep Q-networks
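The pieces described so far (𝜀-greedy behavior, replay memory, fixed Q-target) can be combined into a schematic training loop. This is a toy sketch, not the paper's algorithm: a dict stands in for the Q-network, and the 4-state environment with its dynamics and all constants are made up for illustration.

```python
import random
from collections import deque

random.seed(1)

N_STATES, N_ACTIONS = 4, 2
GAMMA, ALPHA, EPSILON, SYNC_EVERY, BATCH = 0.99, 0.1, 0.1, 50, 8

q = {(s, a): 0.0 for s in range(N_STATES) for a in range(N_ACTIONS)}
target_q = dict(q)             # fixed Q-target "network"
memory = deque(maxlen=1000)    # replay memory

def step_env(s, a):
    # Toy dynamics: uniformly random next state; action 1 always pays 1.
    return random.randrange(N_STATES), (1.0 if a == 1 else 0.0)

s = 0
for t in range(1, 501):
    # epsilon-greedy behavior policy
    if random.random() < EPSILON:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda b: q[(s, b)])
    s2, r = step_env(s, a)
    memory.append((s, a, r, s2))          # store the transition
    if len(memory) >= BATCH:
        # uniform random minibatch from the replay memory
        for ms, ma, mr, ms2 in random.sample(memory, BATCH):
            # the target comes from the frozen target network
            y = mr + GAMMA * max(target_q[(ms2, b)] for b in range(N_ACTIONS))
            q[(ms, ma)] += ALPHA * (y - q[(ms, ma)])
    if t % SYNC_EVERY == 0:
        target_q = dict(q)                # periodic weight copy
    s = s2

print(len(memory), len(q))  # 500 8
```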

Page 38:

Method

Evaluation Procedure

• Professional DQN Agent vs Human Tester

▪ DQN agent constraints

• 𝜀 = 0.05

• Can take an action at 10 Hz

▪ 10 Hz is about as fast as a human can press a button

• The result is the average over 30 episodes of up to 5 minutes of play

▪ Professional human tester constraints

• No pausing, saving, or loading

• Audio output disabled, so vision is the only sensory input

• After 2 hours of practice, the result is the average over 20 episodes of up to 5 minutes of play

Page 39:

Result

Page 40:

Result

Comparison with Others (1/3)

Page 41:

Result

Comparison with Others (2/3)

Page 42:

Result

Comparison with Others (3/3)

Page 43:

Result

Best / Worst Agent Game

• Worst for the agent: Montezuma’s Revenge / Gravitar / Private Eye

• Best for the agent: Boxing / Breakout / Video Pinball

Page 44:

Result

Value changes during a game

Page 45:

Result

Results as training progresses

Space Invaders / Seaquest

Page 46:

Result

With vs. without Experience Replay and Target Q

Page 47:

Result

DQN vs Linear

Page 48:

Result

t-SNE

Page 49:

Result

t-SNE

Human

DQN

Page 50:

Result

Breakout Video

Page 51:

Conclusion

• Proved that a single architecture can adapt to many environments

▪ Trained with the same algorithm, network architecture, and hyperparameters, receiving only the pixel values and the score

• Experience replay and fixing the target-Q network stabilize learning, producing agents that match or exceed human performance

Page 52:

Q & A

Page 53:

Appendix

RMSProp

• RMSprop

▪ One of the gradient-descent methods

▪ Created to compensate for Adagrad's ever-shrinking step size

𝐸[𝑔²]_𝑡 = 0.9 𝐸[𝑔²]_{𝑡−1} + 0.1 𝑔_𝑡², where 𝑔_𝑡 = ∇𝐽(𝜃_𝑡)

𝜃_{𝑡+1} = 𝜃_𝑡 − (𝜂 / √(𝐸[𝑔²]_𝑡 + 𝜀)) 𝑔_𝑡

(𝜃 = Weight, 𝐽(𝜃_𝑡) = Loss, 𝜂 = Learning rate, 𝜀 = term that keeps the denominator from becoming zero)
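One RMSProp step, following the standard update above (scalar Python sketch with illustrative values):

```python
def rmsprop_step(theta, grad, avg_sq, eta=0.001, rho=0.9, eps=1e-8):
    avg_sq = rho * avg_sq + (1 - rho) * grad ** 2     # running average of g^2
    theta = theta - eta * grad / (avg_sq + eps) ** 0.5  # scaled step
    return theta, avg_sq

theta, avg_sq = 1.0, 0.0
theta, avg_sq = rmsprop_step(theta, grad=2.0, avg_sq=avg_sq, eta=0.1)
print(round(theta, 3))  # 0.684
```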

Page 54:

Appendix

Linear Approximation (1/3)

• BASIC

▪ Uses the NTSC (National Television System Committee) color encoding

▪ Uses 128 colors

• BASS

▪ Uses the SECAM (Séquentiel couleur avec mémoire, sequential color with memory) encoding

▪ Reduces to 8 colors and computes the interactions between the colors

Page 55:

Appendix

Linear Approximation (2/3)

• DISCO

▪ Divides the on-screen objects into classes and checks the interactions between the classes

Page 56:

Appendix

Linear Approximation (3/3)

• LSH(Locality Sensitive Hashing)

▪ The input screen is cut into small patches and hashed down

• RAM

▪ Uses the console's RAM directly

▪ Analyzes the 128-byte = 1024-bit values

Page 57:

Appendix

The 5 games used for parameter search

Beam Rider

Pong

Seaquest

Breakout

Space Invaders