• https://www.youtube.com/watch?v=Qlqe1DXnJKQ#t=206

Survey

• What were the best things about Matt making you lead one class discussion?

• Good for presentation skills: 4

• Good for presenter to learn a topic in depth: 5

• Nice to see a selection of different topics: 1

Survey

• What were the worst things?

• Some presenters were not prepared enough: 2

• Some topics were too advanced/technical for the point in the class or were hard to follow: 4

Survey

• Do you think Matt should do this next year?

• Yes: 7

• Maybe: 1

• Sure, why not?: 1

• No: 0

Survey

• If Matt does it next year, how could he change it?

• Require handout with takeaways and resources: 1

• Require more discussion / engagement: 3

• Better scheduling: avoid jumping around, avoid duplication (or group together): 2

• Be clear about using examples from exercises: 1

• Give a short presentation to Matt a week before. This will improve presentations and get more suggested readings: 1

– http://skylight.wsu.edu/s/11d19b5d-c8a7-4e69-93e3-2f110c2c6f07.srv

• s in S

• a in A

• T(s,a,s’) = Pr(s’ | s,a)

• R(s,a)

• o in Ω

• O(s’,a,o) = Pr(o | s’, a)

• http://users.isr.ist.utl.pt/~mtjspaan/tutorialDMMS/tutorialAAMAS11.pdf
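The symbols above (states S, actions A, transition function T, reward R, observations Ω, observation function O) read as the ingredients of a POMDP. Because the agent cannot see s directly, it typically tracks a belief b(s) and updates it after each action and observation with b'(s') ∝ O(s',a,o) Σ_s T(s,a,s') b(s). The sketch below is one minimal way to write that update; the function name, the nested-list layout, and the two-state numbers are assumptions for illustration and are not taken from the slides or the linked tutorial.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes-filter update: b'(s') is proportional to
    O(s', a, o) * sum_s T(s, a, s') * b(s).

    Assumed layout (hypothetical): T[s][a][s_next] and O[s_next][a][o]
    are nested lists of probabilities; `belief` is a list over states.
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of landing in s_next after `action`.
        predicted = sum(T[s][action][s_next] * belief[s] for s in range(n))
        # Correction step: weight by the likelihood of the observation.
        unnormalized.append(O[s_next][action][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical model: two states, one action, two observations.
T = [[[0.9, 0.1]], [[0.2, 0.8]]]  # T[s][a][s_next]
O = [[[0.8, 0.2]], [[0.3, 0.7]]]  # O[s_next][a][o]
b0 = [0.5, 0.5]
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
print(b1)
```

In this made-up model, observation 0 is more likely in state 0, so the updated belief shifts toward state 0.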

Behaviorist Psychology

• B. F. Skinner’s operant conditioning / behaviorist shaping

• Reward schedule

– Frequency

– Randomized

In topology, two continuous functions from one topological space to another are called homotopic if one can be "continuously deformed" into the other, such a deformation being called a homotopy between the two functions.

What does shaping mean for computational reinforcement learning? T. Erez and W. D. Smart, 2008

According to Erez and Smart, there are (at least) six ways of thinking about shaping from a homotopic standpoint. What can you come up with?

• Modify Reward: successive approximations, subgoaling

• Dynamics: physical properties of environment or of agent: changing length of pole, changing amount of noise

• Internal parameters: simulated annealing, schedule for learning rate, complexification in NEAT

• Initial state: learning from easy missions

• Action space: infants lock their arms, abstracted (simpler) space

• Extending time horizon: POMDPs, decrease discount factor, pole balancing time horizon
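Several items in the list above share one pattern: pick a task parameter, start at an easy value, and move it continuously toward the target value, which is the homotopy view of shaping. A minimal sketch of such a schedule follows, using linear interpolation H(t) = (1 - t)·easy + t·target. The function name, the number of stages, and the pole lengths are hypothetical, and treating the longer pole as the easier one is an assumption made for the example, not a claim from the paper.

```python
def homotopy_schedule(easy, target, num_stages):
    """Yield task parameters that move linearly from `easy` to `target`.

    This is the straight-line homotopy H(t) = (1 - t) * easy + t * target,
    sampled at `num_stages` evenly spaced values of t in [0, 1].
    """
    if num_stages < 2:
        raise ValueError("need at least two stages")
    for k in range(num_stages):
        t = k / (num_stages - 1)  # t runs from 0.0 to 1.0
        yield (1.0 - t) * easy + t * target


# Hypothetical use: shape the dynamics by shrinking the pole step by step.
# Assumption: the 1.0 m pole is the easy task, the 0.5 m pole is the target.
pole_lengths = list(homotopy_schedule(easy=1.0, target=0.5, num_stages=5))
for length in pole_lengths:
    # A real experiment would train the agent on each stage here,
    # carrying the learned policy over to the next stage.
    print(f"train with pole length {length:.3f}")
```

The same schedule could drive other entries in the list, such as the noise level or the learning rate, by swapping in different `easy` and `target` values.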

• The infant development timeline and its application to robot shaping. J. Law et al., 2011

• Bio-inspired vs. Bio-mimicking vs. Computational understanding

• Robot clicker training

Reward Shaping

• Potential-based shaping: A. Ng, ’99

• Shaping for transfer learning: G. Konidaris, 2006

• Multi-agent shaping: S. Devlin, 201

• Plan-based reward shaping: M. Grześ, 2008
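Potential-based shaping adds a term F(s,a,s') = γΦ(s') − Φ(s) to the environment reward, where Φ is a potential function over states. A shaping term of this form leaves the optimal policy unchanged. The sketch below shows the computation on a made-up 1-D corridor; the function names, the goal position, and the distance-based potential are assumptions chosen for illustration.

```python
def shaped_reward(reward, s, s_next, potential, gamma):
    """Return r + F, with F(s, a, s') = gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(s_next) - potential(s)


# Hypothetical task: integer positions on a line, goal at position 10.
GOAL = 10


def phi(s):
    """Assumed potential: negative distance to the goal (higher is closer)."""
    return -abs(GOAL - s)


# A step toward the goal earns a positive bonus even when the
# environment reward is zero.
step_bonus = shaped_reward(0.0, s=3, s_next=4, potential=phi, gamma=0.9)

# With gamma = 1, the shaping terms telescope, so walking in a loop
# (3 -> 4 -> 5 -> 4 -> 3) collects no net shaping reward.
loop = [3, 4, 5, 4, 3]
loop_total = sum(
    shaped_reward(0.0, a, b, phi, gamma=1.0) for a, b in zip(loop, loop[1:])
)
print(step_bonus, loop_total)
```

The loop check illustrates why the potential form matters: an agent cannot collect shaping reward indefinitely by cycling through states.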
