reinforcement learning through the optimization lens

reinforcement learning through the optimization lens Benjamin Recht University of California, Berkeley

Post on 10-Jun-2022

TRANSCRIPT

Page 1: reinforcement learning through the optimization lens

reinforcement learning through the optimization lens

Benjamin Recht
University of California, Berkeley

Page 2: reinforcement learning through the optimization lens

trustable, scalable, predictable

Page 3: reinforcement learning through the optimization lens

Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Control Theory !

Page 4: reinforcement learning through the optimization lens

Disciplinary Biases

Control Theory        Reinforcement Learning
AE/CE/EE/ME           CS
continuous            discrete
model → action        data → action
IEEE Transactions     Science Magazine

Page 5: reinforcement learning through the optimization lens

Today’s talk will try to unify these camps and point out how to merge their perspectives.

Page 6: reinforcement learning through the optimization lens

Main research challenge: What are the fundamental limits of learning systems that interact with the physical environment?

theoretical foundations

• statistical learning theory
• robust control theory
• core optimization

How well must we understand a system in order to control it?

Page 7: reinforcement learning through the optimization lens

Control theory is the study of dynamical systems with inputs

[Block diagram: system G with input u, output y, and state $x_t$]

The simplest such systems are linear systems:

$x_{t+1} = A x_t + B u_t$
$y_t = C x_t + D u_t$

$x_t$ is called the state, and the dimension of the state is called the degree, d.
$u_t$ is called the input, and its dimension is p.
$y_t$ is called the output, and its dimension is q.

For today, we will only consider C = I, D = 0 ($x_t$ observed).
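The linear state update can be sketched in a few lines of plain Python. This is only an illustration: the helper names `step` and `simulate` and the double-integrator matrices are assumptions chosen for the example, not taken from the talk.

```python
def step(A, B, x, u):
    """One step of x_{t+1} = A x_t + B u_t, with matrices as lists of rows."""
    d = len(A)        # state dimension (the degree)
    p = len(B[0])     # input dimension
    return [
        sum(A[i][j] * x[j] for j in range(d)) + sum(B[i][k] * u[k] for k in range(p))
        for i in range(d)
    ]


def simulate(A, B, x0, inputs):
    """Return the state sequence [x_0, x_1, ..., x_T] for the given inputs."""
    states = [list(x0)]
    for u in inputs:
        states.append(step(A, B, states[-1], u))
    return states


# Hypothetical example: a discretized double integrator (position, velocity),
# so d = 2 and p = 1. With C = I and D = 0 the output y_t equals the state x_t.
A = [[1.0, 1.0],
     [0.0, 1.0]]
B = [[0.0],
     [1.0]]
states = simulate(A, B, x0=[0.0, 0.0], inputs=[[1.0], [1.0], [0.0]])
print(states)
```

Each entry of `states` is a full state vector, which is exactly what the controller gets to observe under the C = I, D = 0 assumption.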

Page 8: reinforcement learning through the optimization lens

Control theory is the study of dynamical systems with inputs

Reinforcement Learning is the study of discrete dynamical systems with inputs

[Block diagram: system G with input u, output y, and state $x_t$]

Simplest example: Partially Observed Markov Decision Process (POMDP)

$p(x_{t+1} \mid \text{past}) = p(x_{t+1} \mid x_t, u_t)$
$p(y_t \mid \text{past}) = p(y_t \mid x_t, u_t)$

$x_t$ is the state, and it takes values in [d].
$u_t$ is called the input, and takes values in [p].
$y_t$ is called the output, and takes values in [q].

For today, we will only consider the case where $x_t$ is observed (MDP).
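In the fully observed (MDP) case, the transition law p(x_{t+1} | x_t, u_t) is just a table of probabilities. The sketch below samples from such a table; the table `P`, the helper `sample_next_state`, and the 0-based indexing of states and inputs are all assumptions made for illustration.

```python
import random


def sample_next_state(P, x, u, rng):
    """Draw x_{t+1} from p(. | x_t = x, u_t = u); P[x][u] lists d probabilities."""
    return rng.choices(range(len(P[x][u])), weights=P[x][u], k=1)[0]


# Hypothetical MDP with d = 2 states and p = 2 inputs (indexed from 0 here).
# P[x][u][x_next] = p(x_{t+1} = x_next | x_t = x, u_t = u)
P = [
    [[0.9, 0.1], [0.2, 0.8]],   # from state 0 under input 0, input 1
    [[1.0, 0.0], [0.0, 1.0]],   # from state 1 under input 0, input 1
]

rng = random.Random(0)
x = 0
trajectory = [x]
for u in [1, 1, 0, 1]:
    # The next state depends only on the current state and input (Markov property).
    x = sample_next_state(P, x, u, rng)
    trajectory.append(x)
print(trajectory)
```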

Page 9: reinforcement learning through the optimization lens

[Block diagram: controller K in feedback with system G; state $x_t$, input u]

Controller Design

• A dynamical system is connected in feedback with a controller that tries to get the closed loop to behave.

• Actions decided based on observed trajectories, $\tau_t = (u_1, \ldots, u_{t-1}, x_0, \ldots, x_t)$

• A mapping from trajectory to action is called a policy, $\pi_t(\tau_t)$

• Optimal control: find policy that minimizes some objective.

Dynamics: $x_{t+1} = A x_t + B u_t$
Policy: $u_t = \pi_t(\tau_t)$
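A minimal sketch of this feedback loop, assuming a scalar system for brevity: the policy is any function that receives the trajectory observed so far and returns the next input. The names `rollout` and `static_feedback`, the gain -0.5, and the system parameters are hypothetical.

```python
def rollout(A, B, x0, policy, T):
    """Run x_{t+1} = A x_t + B u_t in feedback with u_t = policy(tau_t).

    The trajectory tau_t is passed as two lists: the inputs applied so far
    and the states observed so far (ending with the current state).
    """
    states, inputs = [x0], []
    for _ in range(T):
        u = policy(inputs, states)            # action chosen from the trajectory
        inputs.append(u)
        states.append(A * states[-1] + B * u)
    return states, inputs


def static_feedback(inputs, states, K=-0.5):
    """Hypothetical policy that uses only the latest state: u_t = K x_t."""
    return K * states[-1]


# Hypothetical unstable scalar system: x_{t+1} = 1.2 x_t + u_t.
states, inputs = rollout(A=1.2, B=1.0, x0=1.0, policy=static_feedback, T=20)
print(abs(states[-1]))
```

With this gain the closed loop evolves as x_{t+1} = 0.7 x_t, so the state decays even though the open-loop system grows.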

Page 10: reinforcement learning through the optimization lens

Optimal control

[Block diagram: system G with input $u_t$, noise $e_t$, and state $x_t$]

minimize $\mathbb{E}_e\left[\sum_{t=1}^{T} C_t(x_t, u_t)\right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$
     $u_t = \pi_t(\tau_t)$

$C_t$ is the cost. If you maximize, it’s called a reward.
$e_t$ is a noise process.
$f_t$ is the state-transition function.
$\tau_t = (u_1, \ldots, u_{t-1}, x_0, \ldots, x_t)$ is an observed trajectory.
$\pi_t(\tau_t)$ is the policy. This is the optimization decision variable.
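For a fixed policy, the objective of this problem can be estimated by simulation. The sketch below averages the summed cost over sampled noise sequences; the standard-normal noise, the scalar dynamics, the quadratic cost, and all function names are assumptions for illustration, not part of the talk.

```python
import random


def expected_cost(policy, f, cost, x0, T, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of E_e[sum_{t=1}^T C_t(x_t, u_t)] with u_t = policy(tau_t)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        states, inputs = [x0], []
        for t in range(1, T + 1):
            u = policy(inputs, states)
            total += cost(t, states[-1], u)
            e = rng.gauss(0.0, 1.0)           # assumed noise model: standard normal
            inputs.append(u)
            states.append(f(t, states[-1], u, e))
    return total / n_rollouts


# Hypothetical instance: scalar linear dynamics with additive noise, quadratic cost.
f = lambda t, x, u, e: 1.2 * x + u + 0.1 * e
cost = lambda t, x, u: x * x + u * u

do_nothing = lambda inputs, states: 0.0
feedback = lambda inputs, states: -0.7 * states[-1]

print(expected_cost(do_nothing, f, cost, x0=1.0, T=20))
print(expected_cost(feedback, f, cost, x0=1.0, T=20))
```

Optimal control searches over policies for the one with the smallest such expected cost; here only two hand-picked policies are compared.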

Page 11: reinforcement learning through the optimization lens

[Block diagram: controller K in feedback with system G; signals u, $x_t$, and $\xi_t$]

Optimal control

• A dynamical system is connected in feedback with a controller that tries to get the closed loop to behave.

• Optimal control: find policy that minimizes some objective.

Policy: $u_t = \pi_t(\tau_t)$
Dynamics: $x_{t+1} = F(u_t, u_{t-1}, u_{t-2}, \ldots)$

Major challenge: how to perform optimal control when the system is unknown?

Today: Reinvent RL attempting to answer this question
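One way to see why the state can be written as a function F of the input history is to unroll the linear recursion $x_{t+1} = A x_t + B u_t$. The derivation below assumes the noise-free linear case with a fixed initial state $x_0$ and inputs indexed from $u_0$; the general F on the slide need not be linear.

```latex
\begin{align*}
x_{t+1} &= A x_t + B u_t \\
        &= A\,(A x_{t-1} + B u_{t-1}) + B u_t \\
        &= A^{t+1} x_0 + \sum_{k=0}^{t} A^{k} B\, u_{t-k}
\end{align*}
```

Under these assumptions, once $x_0$ is fixed the next state depends only on $u_t, u_{t-1}, \ldots, u_0$, and the unknown system enters through the matrices $A^{k} B$.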

Page 12: reinforcement learning through the optimization lens

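$\partial_t(\rho u) + \nabla \cdot (\rho u \otimes u + p I) = \nabla \cdot \tau + \rho g$

$M \dot{T} = Q + \dot{m}_s c_p (T_s - T)$

[Diagram: HVAC and ROOM blocks, with sensor, state, and action labels]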

Page 13: reinforcement learning through the optimization lens

Identify everything

• PDE control

• High performance aerodynamics

Identify a coarse model

• model predictive control

We don’t need no stinking models!

• reinforcement learning

• PID control?

We need robust fundamentals to distinguish these approaches

Page 14: reinforcement learning through the optimization lens

But PID control works…

[Figure: Bode diagram, magnitude (dB) versus frequency (rad/sec), from 10^-2 to 10^2 rad/sec. Annotations: one decade; gain crossover point; log-log slope = -1.5; 2 ≈ 6 dB; 0.5 ≈ -6 dB]
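The decibel annotations on the Bode plot follow from the usual conversion of a magnitude ratio to 20 log10 of that ratio. A short check, where the helper name `to_db` is an assumption for this sketch:

```python
import math


def to_db(gain):
    """Convert a magnitude ratio to decibels: 20 * log10(gain)."""
    return 20.0 * math.log10(gain)


print(round(to_db(2.0), 2))    # doubling the gain: about 6.02 dB
print(round(to_db(0.5), 2))    # halving the gain: about -6.02 dB

# A log-log slope of -1.5 means the magnitude scales like w ** -1.5, so over
# one decade (w -> 10 w) it changes by 20 * (-1.5) = -30 dB.
print(round(to_db(10.0 ** -1.5), 6))
```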

2 parameters suffice for 95% of all control applications.

How much needs to be modeled for more advanced control?

Can we learn to compensate for poor models, changing conditions?
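As a rough illustration of how far two tuning parameters can go, the sketch below runs a discrete-time PI controller on a first-order plant. Reading the "2 parameters" as a proportional gain and an integral gain is an assumption, as are the plant coefficients, the gains, and the function name `run_pi`.

```python
def run_pi(kp, ki, setpoint, steps, a=0.9, b=0.1):
    """Simulate a PI controller on an assumed plant x_{t+1} = a x_t + b u_t."""
    x, integral = 0.0, 0.0
    history = []
    for _ in range(steps):
        error = setpoint - x
        integral += error                    # accumulated error (integral term)
        u = kp * error + ki * integral       # the two tunable parameters: kp, ki
        x = a * x + b * u
        history.append(x)
    return history


history = run_pi(kp=2.0, ki=0.5, setpoint=1.0, steps=200)
print(history[-1])

# With the integral gain switched off, the same plant settles short of the setpoint.
print(run_pi(kp=2.0, ki=0.0, setpoint=1.0, steps=200)[-1])
```

The controller uses no model of the plant beyond the sign of its response, which is the sense in which PID sits at the model-free end of the spectrum.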

Page 15: reinforcement learning through the optimization lens

Optimal control

Gxt

ux

e

Ct is the cost. If you maximize, it’s called a reward.

et is a noise processft is the state-transition function

is an observed trajectory⌧t = (u1, . . . , ut�1, x0, . . . , xt)<latexit sha1_base64="oTGOPnlC3lpbuJxkZHAlqk3gehs=">AAACoXicbVHbahsxEJW3tzS9Oe1jX0RNwQHX7IZC8xIIbaEt5MG9OAnYyzKrHdsiWmmRRsVm8U/0a/ra/kX/plrHgdrugOBwzlw0Z/JKSUdx/KcV3bp95+69vfv7Dx4+evykffD03BlvBQ6FUcZe5uBQSY1DkqTwsrIIZa7wIr961+gX39E6afQ3WlSYljDVciIFUKCydm9M4DPiJ7zrs6Q3VoUh1/NZTa+SZW+exTfUPKPDrN2J+/Eq+C5I1qDD1jHIDlrpuDDCl6hJKHBulMQVpTVYkkLhcn/sHVYgrmCKowA1lOjSerXWkr8MTMEnxoania/YfytqKJ1blHnILIFmbltryP9pI0+T47SWuvKEWlwPmnjFyfDGI15Ii4LUIgAQVoa/cjEDC4KCkxtTVr0rFBub1HOvpTAFbrGK5mQhkA6pBKmbreoPUin+FbTjZ3I6oxs1tG3k7ns5lcH9s3AufbiTHA6SbNu/C86P+kncTz6/7py+XZ9mjz1nL1iXJewNO2Uf2YANmWA/2E/2i/2OOtGnaBB9uU6NWuuaZ2wjotFfdanQ+g==</latexit><latexit sha1_base64="oTGOPnlC3lpbuJxkZHAlqk3gehs=">AAACoXicbVHbahsxEJW3tzS9Oe1jX0RNwQHX7IZC8xIIbaEt5MG9OAnYyzKrHdsiWmmRRsVm8U/0a/ra/kX/plrHgdrugOBwzlw0Z/JKSUdx/KcV3bp95+69vfv7Dx4+evykffD03BlvBQ6FUcZe5uBQSY1DkqTwsrIIZa7wIr961+gX39E6afQ3WlSYljDVciIFUKCydm9M4DPiJ7zrs6Q3VoUh1/NZTa+SZW+exTfUPKPDrN2J+/Eq+C5I1qDD1jHIDlrpuDDCl6hJKHBulMQVpTVYkkLhcn/sHVYgrmCKowA1lOjSerXWkr8MTMEnxoania/YfytqKJ1blHnILIFmbltryP9pI0+T47SWuvKEWlwPmnjFyfDGI15Ii4LUIgAQVoa/cjEDC4KCkxtTVr0rFBub1HOvpTAFbrGK5mQhkA6pBKmbreoPUin+FbTjZ3I6oxs1tG3k7ns5lcH9s3AufbiTHA6SbNu/C86P+kncTz6/7py+XZ9mjz1nL1iXJewNO2Uf2YANmWA/2E/2i/2OOtGnaBB9uU6NWuuaZ2wjotFfdanQ+g==</latexit><latexit sha1_base64="oTGOPnlC3lpbuJxkZHAlqk3gehs=">AAACoXicbVHbahsxEJW3tzS9Oe1jX0RNwQHX7IZC8xIIbaEt5MG9OAnYyzKrHdsiWmmRRsVm8U/0a/ra/kX/plrHgdrugOBwzlw0Z/JKSUdx/KcV3bp95+69vfv7Dx4+evykffD03BlvBQ6FUcZe5uBQSY1DkqTwsrIIZa7wIr961+gX39E6afQ3WlSYljDVciIFUKCydm9M4DPiJ7zrs6Q3VoUh1/NZTa+SZW+exTfUPKPDrN2J+/Eq+C5I1qDD1jHIDlrpuDDCl6hJKHBulMQVpTVYkkLhcn/sHVYgrmCKowA1lOjSerXWkr8MTMEnxoania/YfytqKJ1blHnILIFmbltryP9pI0+T47SWuvKEWlwPmnjFyfDGI15Ii4LUIgAQVoa/cjEDC4KCkxtTVr0rFBub1HOvpTAFbrGK5mQhkA6pBKmbreoPUin+FbTjZ3I6oxs1tG3k7ns5lcH9s3AufbiTHA6SbNu/C86P+kncTz6/7py+XZ9mjz1nL1iXJewNO2Uf2YANmWA/2E/2i/2OOtGnaBB9uU6NWuuaZ2wjotFfdanQ+g==</latexit><latexit 
sha1_base64="oTGOPnlC3lpbuJxkZHAlqk3gehs=">AAACoXicbVHbahsxEJW3tzS9Oe1jX0RNwQHX7IZC8xIIbaEt5MG9OAnYyzKrHdsiWmmRRsVm8U/0a/ra/kX/plrHgdrugOBwzlw0Z/JKSUdx/KcV3bp95+69vfv7Dx4+evykffD03BlvBQ6FUcZe5uBQSY1DkqTwsrIIZa7wIr961+gX39E6afQ3WlSYljDVciIFUKCydm9M4DPiJ7zrs6Q3VoUh1/NZTa+SZW+exTfUPKPDrN2J+/Eq+C5I1qDD1jHIDlrpuDDCl6hJKHBulMQVpTVYkkLhcn/sHVYgrmCKowA1lOjSerXWkr8MTMEnxoania/YfytqKJ1blHnILIFmbltryP9pI0+T47SWuvKEWlwPmnjFyfDGI15Ii4LUIgAQVoa/cjEDC4KCkxtTVr0rFBub1HOvpTAFbrGK5mQhkA6pBKmbreoPUin+FbTjZ3I6oxs1tG3k7ns5lcH9s3AufbiTHA6SbNu/C86P+kncTz6/7py+XZ9mjz1nL1iXJewNO2Uf2YANmWA/2E/2i/2OOtGnaBB9uU6NWuuaZ2wjotFfdanQ+g==</latexit>

minimize Ee

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = ft(xt, ut, et)ut = ⇡t(⌧t)

<latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit 
sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit><latexit sha1_base64="Vs+14vGXEYCWQa4/aBIirWhHyZg=">AAADGnicbVJNb9NAELXNV0n5SOHIZUVElYooshESCFSpoiA49FBE01bKGmu9GSer7q6t3TFKsPxPuPJHuCGuXPg3rFMjSMJIlmbfe/N2Z8ZpIYXFMPzlB1euXrt+Y+tmZ/vW7Tt3uzv3Tm1eGg4jnsvcnKfMghQaRihQwnlhgKlUwll6cdjwZ5/AWJHrE1wUECs21SITnKGDku5XmsJU6IoZwxZ1JWXdoSrN55USWijxGWqyS6hiOEvT6k2dAJWQ4ZjaUiUV7kf1xxNymGB/nuCgTHCPGjGdYUxpa2OHOGws5k79OKrJPsn+qgfQVNDOLnEHR9FCOJIia5w6FPSkfVfS7YXDcBlkM4napOe1cZzs+DGd5LxUoJFLZu04CguMnR0KLsE1WVooGL9gUxi7VDMFNq6W86zJI4dMSJYb92kkS/TfioopaxcqdcpmMnada8D/ceMSs+dxJXRRImh+eVFWSoI5aZZDJsIAR7lwCeNGuLcSPmOGcXQrXLll6V0AX+mkmpda8HwCa6jEORrmQAuomNBNV9VbISX5wLQlR83K/rDOtqH7r8VUoB0cuf9E722I3UKi9fFvJqdPhlE4jN4/7R28alez5T3wHnp9L/KeeQfeO+/YG3nc3/Yj/4X/MvgSfAu+Bz8upYHf1tz3ViL4+Rs0RP43</latexit>

$\pi_t(\tau_t)$ is the policy. This is the optimization decision variable.

Page 16: reinforcement learning through the optimization lens

Newton’s Laws

$z_{t+1} = z_t + v_t$
$v_{t+1} = v_t + a_t$
$m a_t = u_t$

minimize $\sum_{t=0}^{T} \mathbb{1}\{ |(x_t)_1| > \epsilon \}$

subject to $x_{t+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_t$, where $x_t = \begin{bmatrix} z_t \\ v_t \end{bmatrix}$
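As a quick check on the state-space form above, here is a minimal Python sketch (NumPy assumed) that rolls out the matrix recursion and compares it with the scalar Newton updates. The function names `simulate` and `simulate_scalar`, the mass `m=2.0`, and the input sequence are illustrative choices, not taken from the slides.

```python
import numpy as np

def simulate(m, u, z0=0.0, v0=0.0):
    """Roll out x_{t+1} = A x_t + B u_t for a point mass with unit time step."""
    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
    B = np.array([0.0, 1.0 / m])
    x = np.array([z0, v0])          # state x_t = (position z_t, velocity v_t)
    states = [x]
    for u_t in u:
        x = A @ x + B * u_t
        states.append(x)
    return np.array(states)

def simulate_scalar(m, u, z0=0.0, v0=0.0):
    """Same rollout written as the scalar updates z, v, a from Newton's laws."""
    z, v = z0, v0
    out = [(z, v)]
    for u_t in u:
        a = u_t / m                 # m a_t = u_t
        z, v = z + v, v + a         # both updates use the old (z, v)
        out.append((z, v))
    return np.array(out)

u = np.array([1.0, 1.0, 0.0, -1.0, -1.0])   # hypothetical force sequence
states = simulate(m=2.0, u=u)
assert np.allclose(states, simulate_scalar(m=2.0, u=u))
print(states)
```

The two rollouts agree, which is all the matrix form claims: stacking position and velocity into $x_t$ turns the three scalar equations into one linear update.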

Page 17: reinforcement learning through the optimization lens

Newton’s Laws

$z_{t+1} = z_t + v_t$
$v_{t+1} = v_t + a_t$
$m a_t = u_t$

minimize $\sum_{t=0}^{T} (x_t)_1^2 + r u_t^2$

subject to $x_{t+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_t$, where $x_t = \begin{bmatrix} z_t \\ v_t \end{bmatrix}$
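One standard way to solve a quadratic-cost problem of this shape is a backward Riccati recursion (dynamic programming). The slide itself does not state a solution method, so the sketch below is an assumption about how one might compute it, not the talk's own procedure. It writes the cost as $x_t^\top Q x_t + r u_t^2$ with $Q = \mathrm{diag}(1, 0)$; the names `lqr_gains` and `rollout_cost` and the values `m`, `r`, `T`, `x0` are illustrative.

```python
import numpy as np

def lqr_gains(A, B, Q, R, T):
    """Backward Riccati recursion for sum_{t=0}^{T} x'Qx + u'Ru.

    Returns gains K_0..K_{T-1} (u_t = -K_t x_t) and the cost-to-go matrix P_0.
    The last input u_T cannot affect any cost term, so it is taken to be zero.
    """
    P = Q.copy()                    # P_T = Q
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()                 # computed backward, applied forward
    return gains, P

def rollout_cost(A, B, Q, R, gains, x0):
    """Simulate u_t = -K_t x_t and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for K in gains:
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost + float(x @ Q @ x)  # terminal state cost, with u_T = 0

m, r, T = 1.0, 0.1, 20              # hypothetical mass, input penalty, horizon
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0 / m]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])   # penalize position (x_t)_1 only
R = np.array([[r]])
x0 = np.array([1.0, 0.0])           # start one unit from the origin, at rest

gains, P0 = lqr_gains(A, B, Q, R, T)
lqr_cost = rollout_cost(A, B, Q, R, gains, x0)
zero_cost = rollout_cost(A, B, Q, R, [np.zeros((1, 2))] * T, x0)
print(lqr_cost, zero_cost)
```

With no control the mass sits at position 1 for all 21 time steps, so the comparison shows how much the feedback policy reduces the cost. The simulated cost should also match $x_0^\top P_0 x_0$, the value predicted by the recursion.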

Page 18: reinforcement learning through the optimization lens

“Simplest” Example: LQR

+ru2t

subject to xt+1 =1 10 1

�xt +

01/m

�ut

<latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit 
sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit>

xt =

ztvt

<latexit sha1_base64="+ojsm2yurvuosb1Y5f0dyh5w9EA=">AAACo3icbVHbbtNAEN2YWym3FB55WRGBioSCjZBaHkCVQIKHPBRoaKXYisbriTPqem3tjqsEK5/B1/AKH8HfsE4DIgkjrfbonLlPWmlyHIa/OsGVq9eu39i5uXvr9p2797p797+4srYKh6rUpT1LwaEmg0Mm1nhWWYQi1Xianr9t9dMLtI5Kc8LzCpMCckMTUsCeGnefz8YsX8s4xZxMkxbAlmYL+dWzcSwv/CdjNNlfZdzthf1waXIbRCvQEys7Hu91kjgrVV2gYaXBuVEUVpw0YJmUxsVuXDusQJ1DjiMPDRTokmY52UI+9kwmJ6X1z7Bcsv9GNFA4Ny9S7+kbnLpNrSX/p41qnhwmDZmqZjTqstCk1pJL2a5JZmRRsZ57AMqS71WqKVhQ7Je5VmWZu0K1Nkkzqw2pMsMNVvOMLXjSIRdApp2qeU9ay89gnBxQPuU/qk/byvvvKCd2zwb+YubplrM/SLS5/m0wfNF/1Y8+vuwdvVldZkc8FI/EvojEgTgSH8SxGAolvonv4of4GTwJBsGn4OTSNeisYh6INQuS31Rr01Q=</latexit><latexit sha1_base64="+ojsm2yurvuosb1Y5f0dyh5w9EA=">AAACo3icbVHbbtNAEN2YWym3FB55WRGBioSCjZBaHkCVQIKHPBRoaKXYisbriTPqem3tjqsEK5/B1/AKH8HfsE4DIgkjrfbonLlPWmlyHIa/OsGVq9eu39i5uXvr9p2797p797+4srYKh6rUpT1LwaEmg0Mm1nhWWYQi1Xianr9t9dMLtI5Kc8LzCpMCckMTUsCeGnefz8YsX8s4xZxMkxbAlmYL+dWzcSwv/CdjNNlfZdzthf1waXIbRCvQEys7Hu91kjgrVV2gYaXBuVEUVpw0YJmUxsVuXDusQJ1DjiMPDRTokmY52UI+9kwmJ6X1z7Bcsv9GNFA4Ny9S7+kbnLpNrSX/p41qnhwmDZmqZjTqstCk1pJL2a5JZmRRsZ57AMqS71WqKVhQ7Je5VmWZu0K1Nkkzqw2pMsMNVvOMLXjSIRdApp2qeU9ay89gnBxQPuU/qk/byvvvKCd2zwb+YubplrM/SLS5/m0wfNF/1Y8+vuwdvVldZkc8FI/EvojEgTgSH8SxGAolvonv4of4GTwJBsGn4OTSNeisYh6INQuS31Rr01Q=</latexit><latexit sha1_base64="+ojsm2yurvuosb1Y5f0dyh5w9EA=">AAACo3icbVHbbtNAEN2YWym3FB55WRGBioSCjZBaHkCVQIKHPBRoaKXYisbriTPqem3tjqsEK5/B1/AKH8HfsE4DIgkjrfbonLlPWmlyHIa/OsGVq9eu39i5uXvr9p2797p797+4srYKh6rUpT1LwaEmg0Mm1nhWWYQi1Xianr9t9dMLtI5Kc8LzCpMCckMTUsCeGnefz8YsX8s4xZxMkxbAlmYL+dWzcSwv/CdjNNlfZdzthf1waXIbRCvQEys7Hu91kjgrVV2gYaXBuVEUVpw0YJmUxsVuXDusQJ1DjiMPDRTokmY52UI+9kwmJ6X1z7Bcsv9GNFA4Ny9S7+kbnLpNrSX/p41qnhwmDZmqZjTqstCk1pJL2a5JZmRRsZ57AMqS71WqKVhQ7Je5VmWZu0K1Nkkzqw2pMsMNVvOMLXjSIRdApp2qeU9ay89gnBxQPuU/qk/byvvvKCd2zwb+YubplrM/SLS5/m0wfNF/1Y8+vuwdvVldZkc8FI/EvojEgTgSH8SxGAolvonv4of4GTwJBsGn4OTSNeisYh6INQuS31Rr01Q=</latexit><latexit 
sha1_base64="+ojsm2yurvuosb1Y5f0dyh5w9EA=">AAACo3icbVHbbtNAEN2YWym3FB55WRGBioSCjZBaHkCVQIKHPBRoaKXYisbriTPqem3tjqsEK5/B1/AKH8HfsE4DIgkjrfbonLlPWmlyHIa/OsGVq9eu39i5uXvr9p2797p797+4srYKh6rUpT1LwaEmg0Mm1nhWWYQi1Xianr9t9dMLtI5Kc8LzCpMCckMTUsCeGnefz8YsX8s4xZxMkxbAlmYL+dWzcSwv/CdjNNlfZdzthf1waXIbRCvQEys7Hu91kjgrVV2gYaXBuVEUVpw0YJmUxsVuXDusQJ1DjiMPDRTokmY52UI+9kwmJ6X1z7Bcsv9GNFA4Ny9S7+kbnLpNrSX/p41qnhwmDZmqZjTqstCk1pJL2a5JZmRRsZ57AMqS71WqKVhQ7Je5VmWZu0K1Nkkzqw2pMsMNVvOMLXjSIRdApp2qeU9ay89gnBxQPuU/qk/byvvvKCd2zwb+YubplrM/SLS5/m0wfNF/1Y8+vuwdvVldZkc8FI/EvojEgTgSH8SxGAolvonv4of4GTwJBsGn4OTSNeisYh6INQuS31Rr01Q=</latexit>

minimize<latexit sha1_base64="mr94ezQtH17vzwJopx3THjSdtck=">AAACg3icbVFbaxNRED5ZtdZ6aaqPvhwMQoUSdktBfSgUFfShDxWNLSRLmD2ZJEPPZTlntiQu+SW+6o/y33g2jWASBwY+vm/uU5SaAqfp71Zy5+69nfu7D/YePnr8ZL998PRbcJVX2FNOO39VQEBNFntMrPGq9Aim0HhZXL9v9Msb9IGc/crzEnMDE0tjUsCRGrb3B6Zws9qQJUPfcTFsd9JuujS5DbIV6IiVXQwPWvlg5FRl0LLSEEI/S0vOa/BMSuNib1AFLEFdwwT7EVowGPJ6OflCvozMSI6dj25ZLtl/M2owIcxNESMN8DRsag35P61f8fhNXpMtK0arbhuNKy3ZyeYMckQeFet5BKA8xVmlmoIHxfFYa12WtUtUa5vUs8qSciPcYDXP2EMkA7IBss1W9UfSWn4BG+Q5Tab8V41lG/nwA02Iw9F5/Ih9tRUcH5Jtnn8b9I67b7vZ55PO2bvVZ3bFc/FCHIpMvBZn4pO4ED2hRCV+iJ/iV7KTHCXHycltaNJa5TwTa5ac/gFfIcb8</latexit><latexit sha1_base64="mr94ezQtH17vzwJopx3THjSdtck=">AAACg3icbVFbaxNRED5ZtdZ6aaqPvhwMQoUSdktBfSgUFfShDxWNLSRLmD2ZJEPPZTlntiQu+SW+6o/y33g2jWASBwY+vm/uU5SaAqfp71Zy5+69nfu7D/YePnr8ZL998PRbcJVX2FNOO39VQEBNFntMrPGq9Aim0HhZXL9v9Msb9IGc/crzEnMDE0tjUsCRGrb3B6Zws9qQJUPfcTFsd9JuujS5DbIV6IiVXQwPWvlg5FRl0LLSEEI/S0vOa/BMSuNib1AFLEFdwwT7EVowGPJ6OflCvozMSI6dj25ZLtl/M2owIcxNESMN8DRsag35P61f8fhNXpMtK0arbhuNKy3ZyeYMckQeFet5BKA8xVmlmoIHxfFYa12WtUtUa5vUs8qSciPcYDXP2EMkA7IBss1W9UfSWn4BG+Q5Tab8V41lG/nwA02Iw9F5/Ih9tRUcH5Jtnn8b9I67b7vZ55PO2bvVZ3bFc/FCHIpMvBZn4pO4ED2hRCV+iJ/iV7KTHCXHycltaNJa5TwTa5ac/gFfIcb8</latexit><latexit sha1_base64="mr94ezQtH17vzwJopx3THjSdtck=">AAACg3icbVFbaxNRED5ZtdZ6aaqPvhwMQoUSdktBfSgUFfShDxWNLSRLmD2ZJEPPZTlntiQu+SW+6o/y33g2jWASBwY+vm/uU5SaAqfp71Zy5+69nfu7D/YePnr8ZL998PRbcJVX2FNOO39VQEBNFntMrPGq9Aim0HhZXL9v9Msb9IGc/crzEnMDE0tjUsCRGrb3B6Zws9qQJUPfcTFsd9JuujS5DbIV6IiVXQwPWvlg5FRl0LLSEEI/S0vOa/BMSuNib1AFLEFdwwT7EVowGPJ6OflCvozMSI6dj25ZLtl/M2owIcxNESMN8DRsag35P61f8fhNXpMtK0arbhuNKy3ZyeYMckQeFet5BKA8xVmlmoIHxfFYa12WtUtUa5vUs8qSciPcYDXP2EMkA7IBss1W9UfSWn4BG+Q5Tab8V41lG/nwA02Iw9F5/Ih9tRUcH5Jtnn8b9I67b7vZ55PO2bvVZ3bFc/FCHIpMvBZn4pO4ED2hRCV+iJ/iV7KTHCXHycltaNJa5TwTa5ac/gFfIcb8</latexit><latexit 
sha1_base64="mr94ezQtH17vzwJopx3THjSdtck=">AAACg3icbVFbaxNRED5ZtdZ6aaqPvhwMQoUSdktBfSgUFfShDxWNLSRLmD2ZJEPPZTlntiQu+SW+6o/y33g2jWASBwY+vm/uU5SaAqfp71Zy5+69nfu7D/YePnr8ZL998PRbcJVX2FNOO39VQEBNFntMrPGq9Aim0HhZXL9v9Msb9IGc/crzEnMDE0tjUsCRGrb3B6Zws9qQJUPfcTFsd9JuujS5DbIV6IiVXQwPWvlg5FRl0LLSEEI/S0vOa/BMSuNib1AFLEFdwwT7EVowGPJ6OflCvozMSI6dj25ZLtl/M2owIcxNESMN8DRsag35P61f8fhNXpMtK0arbhuNKy3ZyeYMckQeFet5BKA8xVmlmoIHxfFYa12WtUtUa5vUs8qSciPcYDXP2EMkA7IBss1W9UfSWn4BG+Q5Tab8V41lG/nwA02Iw9F5/Ih9tRUcH5Jtnn8b9I67b7vZ55PO2bvVZ3bFc/FCHIpMvBZn4pO4ED2hRCV+iJ/iV7KTHCXHycltaNJa5TwTa5ac/gFfIcb8</latexit>

$$\sum_{t=0}^{T} (x_t)_1^2$$

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$

Page 19: reinforcement learning through the optimization lens

The Linearization Principle

If a machine learning algorithm does crazy things when restricted to linear models, it’s going to do crazy things on complex nonlinear models too.

Would you believe someone had a good SAT solver if it couldn’t solve 2-SAT?

This has been a fruitful research direction:

• Recurrent neural networks (Hardt, Ma, R. 2016)
• Generalization and Margin in Neural Nets (Zhang et al 2017)
• Residual Networks (Hardt and Ma 2017)
• Bayesian Optimization (Jamieson et al 2017)
• Adaptive gradient methods (Wilson et al 2017)

Page 20: reinforcement learning through the optimization lens

“Simplest” Example: LQR

$$
\begin{aligned}
\text{minimize} \quad & \sum_{t=0}^{T} (x_t)_1^2 + r u_t^2 \\
\text{subject to} \quad & x_{t+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_t, \qquad x_t = \begin{bmatrix} z_t \\ v_t \end{bmatrix}
\end{aligned}
$$

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$
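To make the double-integrator example on this slide concrete, here is a minimal Python sketch that rolls the system forward under a fixed linear feedback and adds up the quadratic cost. It reads the sum as covering both the position term and the control term. The function name `rollout_cost`, the gains, the horizon, and the initial state are illustrative assumptions, not values from the talk.

```python
# Sketch: simulate x_{t+1} = [[1, 1], [0, 1]] x_t + [0, 1/m] u_t with
# x_t = (z_t, v_t) and accumulate sum_{t=0}^{T} (z_t^2 + r * u_t^2).
# All numeric values below are illustrative assumptions.

def rollout_cost(k1, k2, m=1.0, r=1.0, horizon=10, z0=1.0, v0=0.0):
    """Cost of the static feedback u_t = -(k1 * z_t + k2 * v_t)."""
    z, v = z0, v0
    cost = 0.0
    for _ in range(horizon + 1):      # t = 0, 1, ..., T
        u = -(k1 * z + k2 * v)        # linear state feedback
        cost += z * z + r * u * u     # (x_t)_1^2 + r u_t^2
        z, v = z + v, v + u / m       # position integrates velocity, velocity integrates force
    return cost

cost_no_control = rollout_cost(0.0, 0.0)   # u_t = 0: the position never moves
cost_feedback = rollout_cost(0.5, 1.0)     # hand-picked stabilizing gain
print(cost_no_control, cost_feedback)
```

With zero control the state stays at its initial position, so the cost grows linearly with the horizon; a stabilizing gain drives the position toward zero and pays less overall.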

Page 21: reinforcement learning through the optimization lens

“Simplest” Example: LQR

What is the optimal estimation/design scheme?

How many samples are needed for near optimal control?

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$

Suppose (A,B) unknown

Page 22: reinforcement learning through the optimization lens

Optimal control

[Block diagram: system G with state x_t, input u, and disturbance e]

generic solutions with known dynamics:

Batch Optimization

Dynamic Programming

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f_t(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$
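As an illustration of the dynamic programming route when the dynamics are known, the sketch below runs the backward Riccati recursion for a scalar linear system with quadratic cost. The scalar setting, the choice to drop the noise term (e_t = 0), and the name `riccati_recursion` are simplifying assumptions made for this example only.

```python
# Sketch: finite-horizon dynamic programming for the scalar problem
#   x_{t+1} = a * x_t + b * u_t,   stage cost q * x_t^2 + r * u_t^2.
# The cost-to-go at time t is P[t] * x_t^2 and the optimal input is u_t = -K[t] * x_t.

def riccati_recursion(a, b, q, r, horizon):
    """Return (P, K) computed backward from the terminal time."""
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q                    # at the final step only the state is penalized
    for t in range(horizon - 1, -1, -1):
        p_next = P[t + 1]
        K[t] = a * b * p_next / (r + b * b * p_next)
        P[t] = q + a * a * p_next - a * b * p_next * K[t]
    return P, K

P, K = riccati_recursion(a=1.0, b=1.0, q=1.0, r=1.0, horizon=50)
print(P[0], K[0])
```

For this particular choice of parameters the recursion settles to a fixed point satisfying P^2 = P + 1, so the early gains are essentially constant; that is the usual motivation for using a single static gain on long horizons.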

Page 23: reinforcement learning through the optimization lens

RL Methods

[Block diagram: system G with state x_t, input u, and disturbance e, annotated with the three approaches: model-based, approximate dynamic programming, direct policy search]

• Model-based: fit model from data
• Model-free
  - Approximate dynamic programming: estimate cost from data
  - Direct policy search: search for actions from data

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f_t(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$

How to solve optimal control when the model f is unknown?
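One very simple instance of direct policy search is plain random search over the parameters of a linear policy, using only rollout costs. The sketch below applies it to the double integrator from the LQR example. It is not claimed to be the method the talk has in mind; the step size, iteration count, seed, and function names are all illustrative assumptions.

```python
import random

# Sketch: random search over a gain (k1, k2). The search only queries
# rollout_cost; it never inspects the dynamics directly.

def rollout_cost(gain, horizon=20, m=1.0, r=1.0):
    """Cost of u_t = -(k1 * z_t + k2 * v_t) on the double integrator from z0 = 1, v0 = 0."""
    k1, k2 = gain
    z, v, cost = 1.0, 0.0, 0.0
    for _ in range(horizon + 1):
        u = -(k1 * z + k2 * v)
        cost += z * z + r * u * u
        z, v = z + v, v + u / m
    return cost

def random_search(num_iters=200, step=0.1, seed=0):
    """Perturb the current best gain and keep the perturbation only if the cost drops."""
    rng = random.Random(seed)
    best_gain = (0.0, 0.0)
    best_cost = rollout_cost(best_gain)
    for _ in range(num_iters):
        candidate = (best_gain[0] + step * rng.gauss(0.0, 1.0),
                     best_gain[1] + step * rng.gauss(0.0, 1.0))
        cost = rollout_cost(candidate)
        if cost < best_cost:
            best_gain, best_cost = candidate, cost
    return best_gain, best_cost

initial_cost = rollout_cost((0.0, 0.0))
gain, cost = random_search()
print(initial_cost, gain, cost)
```

Because a candidate is accepted only when it lowers the rollout cost, the returned cost can never exceed the starting one; how close it gets to the optimum depends on the step size and the number of rollouts spent.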

Page 24: reinforcement learning through the optimization lens

Model-based RL

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$

Collect some simulation data. Should have

$$x_{t+1} \approx \varphi(x_t, u_t) + \nu_t$$

$$\hat{\varphi} = \arg\min_{\varphi} \sum_{t=0}^{N-1} \| x_{t+1} - \varphi(x_t, u_t) \|^2$$

<latexit sha1_base64="z65z1FewzNivdrNX0nJfMyZk9Iw=">AAACzHicbVFNb9NAEN2YrxK+0nLksiICpaKN7AqJXooqgQSHqiqCNJVi15psNvaq67W1O64SbXzlX/FDOHOF/8A6NYIkjLTS2/fezOzOjAspDPr+95Z36/adu/e27rcfPHz0+Elne+fc5KVmfMBymeuLMRguheIDFCj5RaE5ZGPJh+Ord7U+vObaiFx9wXnBowwSJaaCAToq7gzDFNCG16CLVFT0iIagkzATKv5LvqShKbPY4pFfXdrT/aCii8XM3V85tE8bX28W414Z4+5icXnQjjtdv+8vg26CoAFd0sRZvN2KwknOyowrZBKMGQV+gZEFjYJJXrXD0vAC2BUkfOSggoybyC4nUNEXjpnQaa7dUUiX7L8ZFjJj5tnYOTPA1KxrNfk/bVTi9DCyQhUlcsVuGk1LSTGn9TjpRGjOUM4dAKaFeytlKWhg6Ia+0mVZu+Bs5Sd2VirB8glfYyXOUIMjDccMhKp/ZT8IKelnUIaeiCTFP6orW8u99yIRaPZO3GbV7obZLSRYH/8mOD/oB34/+PS6e/y2Wc0WeUaekx4JyBtyTD6SMzIgjHwjP8hP8ss79dCzXnVj9VpNzlOyEt7X35/24uw=</latexit><latexit sha1_base64="z65z1FewzNivdrNX0nJfMyZk9Iw=">AAACzHicbVFNb9NAEN2YrxK+0nLksiICpaKN7AqJXooqgQSHqiqCNJVi15psNvaq67W1O64SbXzlX/FDOHOF/8A6NYIkjLTS2/fezOzOjAspDPr+95Z36/adu/e27rcfPHz0+Elne+fc5KVmfMBymeuLMRguheIDFCj5RaE5ZGPJh+Ord7U+vObaiFx9wXnBowwSJaaCAToq7gzDFNCG16CLVFT0iIagkzATKv5LvqShKbPY4pFfXdrT/aCii8XM3V85tE8bX28W414Z4+5icXnQjjtdv+8vg26CoAFd0sRZvN2KwknOyowrZBKMGQV+gZEFjYJJXrXD0vAC2BUkfOSggoybyC4nUNEXjpnQaa7dUUiX7L8ZFjJj5tnYOTPA1KxrNfk/bVTi9DCyQhUlcsVuGk1LSTGn9TjpRGjOUM4dAKaFeytlKWhg6Ia+0mVZu+Bs5Sd2VirB8glfYyXOUIMjDccMhKp/ZT8IKelnUIaeiCTFP6orW8u99yIRaPZO3GbV7obZLSRYH/8mOD/oB34/+PS6e/y2Wc0WeUaekx4JyBtyTD6SMzIgjHwjP8hP8ss79dCzXnVj9VpNzlOyEt7X35/24uw=</latexit><latexit 
sha1_base64="z65z1FewzNivdrNX0nJfMyZk9Iw=">AAACzHicbVFNb9NAEN2YrxK+0nLksiICpaKN7AqJXooqgQSHqiqCNJVi15psNvaq67W1O64SbXzlX/FDOHOF/8A6NYIkjLTS2/fezOzOjAspDPr+95Z36/adu/e27rcfPHz0+Elne+fc5KVmfMBymeuLMRguheIDFCj5RaE5ZGPJh+Ord7U+vObaiFx9wXnBowwSJaaCAToq7gzDFNCG16CLVFT0iIagkzATKv5LvqShKbPY4pFfXdrT/aCii8XM3V85tE8bX28W414Z4+5icXnQjjtdv+8vg26CoAFd0sRZvN2KwknOyowrZBKMGQV+gZEFjYJJXrXD0vAC2BUkfOSggoybyC4nUNEXjpnQaa7dUUiX7L8ZFjJj5tnYOTPA1KxrNfk/bVTi9DCyQhUlcsVuGk1LSTGn9TjpRGjOUM4dAKaFeytlKWhg6Ia+0mVZu+Bs5Sd2VirB8glfYyXOUIMjDccMhKp/ZT8IKelnUIaeiCTFP6orW8u99yIRaPZO3GbV7obZLSRYH/8mOD/oB34/+PS6e/y2Wc0WeUaekx4JyBtyTD6SMzIgjHwjP8hP8ss79dCzXnVj9VpNzlOyEt7X35/24uw=</latexit><latexit sha1_base64="z65z1FewzNivdrNX0nJfMyZk9Iw=">AAACzHicbVFNb9NAEN2YrxK+0nLksiICpaKN7AqJXooqgQSHqiqCNJVi15psNvaq67W1O64SbXzlX/FDOHOF/8A6NYIkjLTS2/fezOzOjAspDPr+95Z36/adu/e27rcfPHz0+Elne+fc5KVmfMBymeuLMRguheIDFCj5RaE5ZGPJh+Ord7U+vObaiFx9wXnBowwSJaaCAToq7gzDFNCG16CLVFT0iIagkzATKv5LvqShKbPY4pFfXdrT/aCii8XM3V85tE8bX28W414Z4+5icXnQjjtdv+8vg26CoAFd0sRZvN2KwknOyowrZBKMGQV+gZEFjYJJXrXD0vAC2BUkfOSggoybyC4nUNEXjpnQaa7dUUiX7L8ZFjJj5tnYOTPA1KxrNfk/bVTi9DCyQhUlcsVuGk1LSTGn9TjpRGjOUM4dAKaFeytlKWhg6Ia+0mVZu+Bs5Sd2VirB8glfYyXOUIMjDccMhKp/ZT8IKelnUIaeiCTFP6orW8u99yIRaPZO3GbV7obZLSRYH/8mOD/oB34/+PS6e/y2Wc0WeUaekx4JyBtyTD6SMzIgjHwjP8hP8ss79dCzXnVj9VpNzlOyEt7X35/24uw=</latexit>

Fit dynamics with supervised learning:

minimize E!

hPTt=1 Ct(xt, ut)

i

s.t. xt+1 = '(xt, ut) + !t

ut = ⇡(⌧t)<latexit sha1_base64="7WwS5p4/hlK3102Z185pb4izZt8=">AAADJnicbVJNbxMxEPUuXyV8pXDgwMUiokrVarWLkOASqaJAOfRQRNNWipeV13E2Vm3vyp6tElb7f7jyR7ghxI2fgjdZEEkYydLovZlnzzynhRQWwvCn51+7fuPmra3bnTt3791/0N1+eGbz0jA+ZLnMzUVKLZdC8yEIkPyiMJyqVPLz9PKw4c+vuLEi16cwL3isaKbFRDAKDkq6X0nKM6Eragyd15WUdYeoNJ9VSmihxGde4x1MFIVpmlZv64TkimeUSD6BEbGlSioYRPWnU3yYQH+WwH6ZwC4xIptCTEirZQMIGp2Zq96LajzA5IqaYir+duA9vFROwHXtYIcNSCH6BGhDdwjX4/aNSbcXBuEi8GYStUkPtXGSbHsxGeesVFwDk9TaURQWEDs5EExyN3BpeUHZJc34yKWaKm7jarHbGj9zyBhPcuOOBrxA/+2oqLJ2rlJX2WzJrnMN+D9uVMLkVVwJXZTANVteNCklhhw3RuGxMJyBnLuEMiPcWzGbUkMZODtXblloF5ytTFLNSi1YPuZrqIQZGOpAy0FRoZupqiMhJf5ItcXHjXN/WCfb0P03IhNg94/dn9G7G8XOkGh9/ZvJ2fMgCoPow4vewevWmi30BD1FfRShl+gAvUcnaIiY99gbeO+8I/+L/83/7v9Ylvpe2/MIrYT/6zcbuwOX</latexit><latexit sha1_base64="7WwS5p4/hlK3102Z185pb4izZt8=">AAADJnicbVJNbxMxEPUuXyV8pXDgwMUiokrVarWLkOASqaJAOfRQRNNWipeV13E2Vm3vyp6tElb7f7jyR7ghxI2fgjdZEEkYydLovZlnzzynhRQWwvCn51+7fuPmra3bnTt3791/0N1+eGbz0jA+ZLnMzUVKLZdC8yEIkPyiMJyqVPLz9PKw4c+vuLEi16cwL3isaKbFRDAKDkq6X0nKM6Eragyd15WUdYeoNJ9VSmihxGde4x1MFIVpmlZv64TkimeUSD6BEbGlSioYRPWnU3yYQH+WwH6ZwC4xIptCTEirZQMIGp2Zq96LajzA5IqaYir+duA9vFROwHXtYIcNSCH6BGhDdwjX4/aNSbcXBuEi8GYStUkPtXGSbHsxGeesVFwDk9TaURQWEDs5EExyN3BpeUHZJc34yKWaKm7jarHbGj9zyBhPcuOOBrxA/+2oqLJ2rlJX2WzJrnMN+D9uVMLkVVwJXZTANVteNCklhhw3RuGxMJyBnLuEMiPcWzGbUkMZODtXblloF5ytTFLNSi1YPuZrqIQZGOpAy0FRoZupqiMhJf5ItcXHjXN/WCfb0P03IhNg94/dn9G7G8XOkGh9/ZvJ2fMgCoPow4vewevWmi30BD1FfRShl+gAvUcnaIiY99gbeO+8I/+L/83/7v9Ylvpe2/MIrYT/6zcbuwOX</latexit><latexit 
sha1_base64="7WwS5p4/hlK3102Z185pb4izZt8=">AAADJnicbVJNbxMxEPUuXyV8pXDgwMUiokrVarWLkOASqaJAOfRQRNNWipeV13E2Vm3vyp6tElb7f7jyR7ghxI2fgjdZEEkYydLovZlnzzynhRQWwvCn51+7fuPmra3bnTt3791/0N1+eGbz0jA+ZLnMzUVKLZdC8yEIkPyiMJyqVPLz9PKw4c+vuLEi16cwL3isaKbFRDAKDkq6X0nKM6Eragyd15WUdYeoNJ9VSmihxGde4x1MFIVpmlZv64TkimeUSD6BEbGlSioYRPWnU3yYQH+WwH6ZwC4xIptCTEirZQMIGp2Zq96LajzA5IqaYir+duA9vFROwHXtYIcNSCH6BGhDdwjX4/aNSbcXBuEi8GYStUkPtXGSbHsxGeesVFwDk9TaURQWEDs5EExyN3BpeUHZJc34yKWaKm7jarHbGj9zyBhPcuOOBrxA/+2oqLJ2rlJX2WzJrnMN+D9uVMLkVVwJXZTANVteNCklhhw3RuGxMJyBnLuEMiPcWzGbUkMZODtXblloF5ytTFLNSi1YPuZrqIQZGOpAy0FRoZupqiMhJf5ItcXHjXN/WCfb0P03IhNg94/dn9G7G8XOkGh9/ZvJ2fMgCoPow4vewevWmi30BD1FfRShl+gAvUcnaIiY99gbeO+8I/+L/83/7v9Ylvpe2/MIrYT/6zcbuwOX</latexit><latexit sha1_base64="7WwS5p4/hlK3102Z185pb4izZt8=">AAADJnicbVJNbxMxEPUuXyV8pXDgwMUiokrVarWLkOASqaJAOfRQRNNWipeV13E2Vm3vyp6tElb7f7jyR7ghxI2fgjdZEEkYydLovZlnzzynhRQWwvCn51+7fuPmra3bnTt3791/0N1+eGbz0jA+ZLnMzUVKLZdC8yEIkPyiMJyqVPLz9PKw4c+vuLEi16cwL3isaKbFRDAKDkq6X0nKM6Eragyd15WUdYeoNJ9VSmihxGde4x1MFIVpmlZv64TkimeUSD6BEbGlSioYRPWnU3yYQH+WwH6ZwC4xIptCTEirZQMIGp2Zq96LajzA5IqaYir+duA9vFROwHXtYIcNSCH6BGhDdwjX4/aNSbcXBuEi8GYStUkPtXGSbHsxGeesVFwDk9TaURQWEDs5EExyN3BpeUHZJc34yKWaKm7jarHbGj9zyBhPcuOOBrxA/+2oqLJ2rlJX2WzJrnMN+D9uVMLkVVwJXZTANVteNCklhhw3RuGxMJyBnLuEMiPcWzGbUkMZODtXblloF5ytTFLNSi1YPuZrqIQZGOpAy0FRoZupqiMhJf5ItcXHjXN/WCfb0P03IhNg94/dn9G7G8XOkGh9/ZvJ2fMgCoPow4vewevWmi30BD1FfRShl+gAvUcnaIiY99gbeO+8I/+L/83/7v9Ylvpe2/MIrYT/6zcbuwOX</latexit>

Solve approximate problem:
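The fitting step can be sketched in a few lines. This is only an illustration under stated assumptions: the scalar "simulator" `true_step`, the feature map `features`, and the noise level are hypothetical, and the model class is assumed to be linear in those features so that the fit reduces to ordinary least squares.

```python
import numpy as np

def true_step(x, u):
    """Hypothetical simulator (an assumption for this sketch), linear in [x, u, x**3]."""
    return 0.9 * x + 0.5 * u - 0.1 * x**3

def features(x, u):
    """Assumed feature map: the model class is phi(x, u) = theta @ features(x, u)."""
    return np.array([x, u, x**3])

def fit_dynamics(xs, us):
    """Least squares: minimize sum_t (x_{t+1} - theta @ features(x_t, u_t))**2."""
    Phi = np.array([features(x, u) for x, u in zip(xs[:-1], us)])
    theta, *_ = np.linalg.lstsq(Phi, xs[1:], rcond=None)
    return theta

# Collect simulation data with random inputs and small process noise.
rng = np.random.default_rng(0)
N = 200
us = rng.uniform(-1.0, 1.0, size=N)
xs = np.zeros(N + 1)
for t in range(N):
    xs[t + 1] = true_step(xs[t], us[t]) + 0.01 * rng.standard_normal()

theta_hat = fit_dynamics(xs, us)
phi_hat = lambda x, u: theta_hat @ features(x, u)   # fitted model of the dynamics
```

The fitted `phi_hat` would then stand in for the true dynamics in the approximate control problem.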

Page 25: reinforcement learning through the optimization lens

“Simple” Example: LQR

$$\begin{array}{ll}
\text{minimize} & \lim_{T \to \infty} E\left[\frac{1}{T}\sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t\right] \\
\text{s.t.} & x_{t+1} = A x_t + B u_t + e_t \quad (e_t\ \text{Gaussian noise})
\end{array}$$

“Obvious strategy”: Estimate (A,B), build control.

Run an experiment for T steps with random input. Then

$$\text{minimize}_{(A,B)} \ \sum_{i=1}^{T} \|x_{i+1} - A x_i - B u_i\|^2$$

If $T \geq \tilde{O}\left(\frac{\sigma^2 (d+p)}{\lambda_{\min}(\Lambda_c)\,\epsilon^2}\right)$, where $\Lambda_c = A \Lambda_c A^* + B B^*$ is the controllability Gramian, then $\|A - \hat{A}\| \leq \epsilon$ and $\|B - \hat{B}\| \leq \epsilon$ w.h.p.

[Dean, Mania, Matni, R., Tu, 2017]

[Mania, R., Simchowitz, Tu, 2018]
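A minimal sketch of the estimation step, assuming a small stable system: the matrices `A`, `B`, the noise level `sigma`, and the helper names are hypothetical choices for illustration, not values from the talk. The least-squares problem is solved with `numpy.linalg.lstsq`, and the Gramian is computed by simple fixed-point iteration, which assumes a real, stable `A` (so `*` becomes a transpose).

```python
import numpy as np

def estimate_AB(xs, us):
    """Solve min_{A,B} sum_i ||x_{i+1} - A x_i - B u_i||^2 by ordinary least squares."""
    Z = np.hstack([xs[:-1], us])                        # rows [x_i, u_i], shape (T, d + p)
    Theta, *_ = np.linalg.lstsq(Z, xs[1:], rcond=None)  # Theta stacks A^T on top of B^T
    d = xs.shape[1]
    return Theta[:d].T, Theta[d:].T                     # A_hat (d x d), B_hat (d x p)

def controllability_gramian(A, B, iters=500):
    """Iterate Lambda <- A Lambda A^T + B B^T; converges when A is stable."""
    L = B @ B.T
    for _ in range(iters):
        L = A @ L @ A.T + B @ B.T
    return L

# Hypothetical system, used only to generate data for the sketch.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
sigma, T = 0.1, 2000

rng = np.random.default_rng(0)
xs = np.zeros((T + 1, 2))
us = rng.standard_normal((T, 1))                        # random input
for i in range(T):
    xs[i + 1] = A @ xs[i] + B @ us[i] + sigma * rng.standard_normal(2)

A_hat, B_hat = estimate_AB(xs, us)
err_A = np.linalg.norm(A - A_hat, 2)                    # spectral-norm errors
err_B = np.linalg.norm(B - B_hat, 2)
lam_min = np.linalg.eigvalsh(controllability_gramian(A, B)).min()
```

Rerunning with a larger `T` gives a rough feel for how the estimation errors shrink with experiment length.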

Page 26: reinforcement learning through the optimization lens

Approximate Dynamic Programming

$$\begin{array}{ll}
\text{minimize} & E_e\left[\sum_{t=1}^{T} C_t(x_t, u_t)\right] \\
\text{s.t.} & x_{t+1} = f_t(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{array}$$
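The objective is an expectation over noise, so for any fixed policy it can be estimated by simulation. Below is a hedged sketch of such a Monte Carlo estimate. It assumes $\tau_t$ is the history of states and inputs up to time $t$ and that $e_t$ is standard Gaussian; the scalar system, cost, and two policies are hypothetical.

```python
import numpy as np

def rollout_cost(policy, step, cost, x1, T, rng):
    """Simulate one trajectory and return sum_{t=1}^T C_t(x_t, u_t)."""
    x, history, total = x1, [], 0.0
    for t in range(1, T + 1):
        u = policy(t, history + [x])              # u_t = pi_t(tau_t), tau_t ends with x_t
        total += cost(t, x, u)
        history += [x, u]
        x = step(t, x, u, rng.standard_normal())  # x_{t+1} = f_t(x_t, u_t, e_t)
    return total

def expected_cost(policy, step, cost, x1, T, n_rollouts=2000, seed=0):
    """Monte Carlo estimate of E_e[sum_t C_t(x_t, u_t)] under a fixed policy."""
    rng = np.random.default_rng(seed)
    return float(np.mean([rollout_cost(policy, step, cost, x1, T, rng)
                          for _ in range(n_rollouts)]))

# Hypothetical scalar example: compare "do nothing" with proportional feedback.
step = lambda t, x, u, e: 0.9 * x + u + 0.1 * e
cost = lambda t, x, u: x**2 + u**2
zero_policy = lambda t, tau: 0.0
feedback_policy = lambda t, tau: -0.5 * tau[-1]   # uses only the current state

J_zero = expected_cost(zero_policy, step, cost, x1=1.0, T=20)
J_feedback = expected_cost(feedback_policy, step, cost, x1=1.0, T=20)
```

Both estimates use the same seed, so the two policies are compared on the same noise sequences.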

Page 27: reinforcement learning through the optimization lens

Dynamic Programming

$$\begin{array}{ll}
\text{minimize} & E_e\left[\sum_{t=1}^{T} C_t(x_t, u_t) + C_f(x_{T+1})\right] \\
\text{s.t.} & x_{t+1} = f_t(x_t, u_t, e_t),\ x_1 = x \\
& u_t = \pi_t(\tau_t)
\end{array} \;=:\; V_1(x)$$

$V_1(x)$ is the “value function”, also called the “cost-to-go”.

Terminal value:

$$V_{T+1}(x) = C_f(x)$$

Recursive formula (recurse backwards):

$$V_k(x) = \min_u \; C_k(x, u) + E_e\left[V_{k+1}(f_k(x, u, e))\right]$$

Optimal Policy:

$$\pi_k(\tau_k) = \arg\min_u \; C_k(x_k, u) + E_e\left[V_{k+1}(f_k(x_k, u, e))\right]$$
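When states, inputs, and noise each take finitely many values, the backward recursion can be carried out exactly. The sketch below assumes time-invariant cost and dynamics for brevity; the five-state chain, the noise distribution, and all function names are hypothetical.

```python
def backward_dp(states, actions, noise, cost, terminal_cost, step, T):
    """Finite-horizon dynamic programming on a finite problem.

    noise is a list of (value, probability) pairs.
    Returns V[k][x] for k = 1..T+1 and the greedy policy pi[k][x] for k = 1..T.
    """
    V = {T + 1: {x: terminal_cost(x) for x in states}}   # terminal value V_{T+1} = C_f
    pi = {}
    for k in range(T, 0, -1):                            # recurse backwards in time
        V[k], pi[k] = {}, {}
        for x in states:
            def q(u):
                # C(x, u) + E_e[V_{k+1}(f(x, u, e))]
                return cost(x, u) + sum(p * V[k + 1][step(x, u, e)] for e, p in noise)
            best_u = min(actions, key=q)
            V[k][x], pi[k][x] = q(best_u), best_u
    return V, pi

# Hypothetical chain: states 0..4, move left/stay/right, noisy transitions.
states = range(5)
actions = (-1, 0, 1)
noise = [(-1, 0.1), (0, 0.8), (1, 0.1)]
step = lambda x, u, e: min(max(x + u + e, 0), 4)         # clip to the state space
cost = lambda x, u: x**2 + abs(u)
terminal_cost = lambda x: x**2

V, pi = backward_dp(states, actions, noise, cost, terminal_cost, step, T=3)
```

The work grows with the product of the numbers of states, inputs, and noise values at every stage, which is one reason approximation becomes necessary for larger problems.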

Page 28: reinforcement learning through the optimization lens

VT�1(xT�1) = minuT�1

E⇥x⇤T�1QxT�1 + u⇤T�1RuT�1 + VT(xT)

= minuT�1

xT�1uT�1

�⇤ ⇢Q 00 R

�+⇥A B

⇤⇤PT

⇥A B

⇤�xT�1uT�1

�+ �2 Tr(PT)

<latexit sha1_base64="dy8OI6n3BN2D1DKwLOyPCTkty8U=">AAAEA3iclVPLbtNAFHUSHiW8WtjB5opAlLQlsisk2BSVl2DRRVKStFLGicaTiTPqeGzNjFGikZcs+RJ2iC0fwn/wAYwTB5GEDSPZPj733JfvdZBwprTr/iyVK1euXru+c6N689btO3d39+71VZxKQnsk5rG8CLCinAna00xzepFIiqOA0/Pg8k1uP/9EpWKx6Op5Qv0Ih4JNGMHaUqO90pf+yHSfelljtnw2oX4MKGJiZNIlk9lXrKdBYN5ZyOlED6AQD/c7BYIDSFccnK1w9QCgP+ra2N0mkiycah+hav14M35AQyZMYNNINstW0REqFIiK8R/rcL+6KAKZLb8O1MEFhOytDmdrXra+TfUrK3oNG7GhPepuK7eEsOwGZf9Vel6EYmGEh0eA4oRKrGMpcERNV2YNm7k52q25LXdxYBt4Bag5xWnbAfpoHJM0okITjpUaeG6ifYOlZoTTrIpSRRNMLnFIBxbm2ZRvFouTwRPLjGESS3sJDQv2bw+DI6XmUWCV+QqoTVtO/ss2SPXkhW+YSFJNBVkmmqQcdAz5FsKYSUo0n1uAiWS2ViBTLDHRdlfXsixiJ5SsdWJmqWAkHtMNluuZltiSiuoIM5F3Zd4zzuEjFgpO85mtrDZsbm68ZSHT6vDU/hCiuSW2A/E2P/826B+1PLfldZ7VTl4Wo9lxHjqPnIbjOc+dE+eD03Z6Din9Kj8o18qPK58rXyvfKt+X0nKp8LnvrJ3Kj9+u9U7g</latexit><latexit sha1_base64="dy8OI6n3BN2D1DKwLOyPCTkty8U=">AAAEA3iclVPLbtNAFHUSHiW8WtjB5opAlLQlsisk2BSVl2DRRVKStFLGicaTiTPqeGzNjFGikZcs+RJ2iC0fwn/wAYwTB5GEDSPZPj733JfvdZBwprTr/iyVK1euXru+c6N689btO3d39+71VZxKQnsk5rG8CLCinAna00xzepFIiqOA0/Pg8k1uP/9EpWKx6Op5Qv0Ih4JNGMHaUqO90pf+yHSfelljtnw2oX4MKGJiZNIlk9lXrKdBYN5ZyOlED6AQD/c7BYIDSFccnK1w9QCgP+ra2N0mkiycah+hav14M35AQyZMYNNINstW0REqFIiK8R/rcL+6KAKZLb8O1MEFhOytDmdrXra+TfUrK3oNG7GhPepuK7eEsOwGZf9Vel6EYmGEh0eA4oRKrGMpcERNV2YNm7k52q25LXdxYBt4Bag5xWnbAfpoHJM0okITjpUaeG6ifYOlZoTTrIpSRRNMLnFIBxbm2ZRvFouTwRPLjGESS3sJDQv2bw+DI6XmUWCV+QqoTVtO/ss2SPXkhW+YSFJNBVkmmqQcdAz5FsKYSUo0n1uAiWS2ViBTLDHRdlfXsixiJ5SsdWJmqWAkHtMNluuZltiSiuoIM5F3Zd4zzuEjFgpO85mtrDZsbm68ZSHT6vDU/hCiuSW2A/E2P/826B+1PLfldZ7VTl4Wo9lxHjqPnIbjOc+dE+eD03Z6Din9Kj8o18qPK58rXyvfKt+X0nKp8LnvrJ3Kj9+u9U7g</latexit><latexit 
sha1_base64="dy8OI6n3BN2D1DKwLOyPCTkty8U=">AAAEA3iclVPLbtNAFHUSHiW8WtjB5opAlLQlsisk2BSVl2DRRVKStFLGicaTiTPqeGzNjFGikZcs+RJ2iC0fwn/wAYwTB5GEDSPZPj733JfvdZBwprTr/iyVK1euXru+c6N689btO3d39+71VZxKQnsk5rG8CLCinAna00xzepFIiqOA0/Pg8k1uP/9EpWKx6Op5Qv0Ih4JNGMHaUqO90pf+yHSfelljtnw2oX4MKGJiZNIlk9lXrKdBYN5ZyOlED6AQD/c7BYIDSFccnK1w9QCgP+ra2N0mkiycah+hav14M35AQyZMYNNINstW0REqFIiK8R/rcL+6KAKZLb8O1MEFhOytDmdrXra+TfUrK3oNG7GhPepuK7eEsOwGZf9Vel6EYmGEh0eA4oRKrGMpcERNV2YNm7k52q25LXdxYBt4Bag5xWnbAfpoHJM0okITjpUaeG6ifYOlZoTTrIpSRRNMLnFIBxbm2ZRvFouTwRPLjGESS3sJDQv2bw+DI6XmUWCV+QqoTVtO/ss2SPXkhW+YSFJNBVkmmqQcdAz5FsKYSUo0n1uAiWS2ViBTLDHRdlfXsixiJ5SsdWJmqWAkHtMNluuZltiSiuoIM5F3Zd4zzuEjFgpO85mtrDZsbm68ZSHT6vDU/hCiuSW2A/E2P/826B+1PLfldZ7VTl4Wo9lxHjqPnIbjOc+dE+eD03Z6Din9Kj8o18qPK58rXyvfKt+X0nKp8LnvrJ3Kj9+u9U7g</latexit><latexit sha1_base64="dy8OI6n3BN2D1DKwLOyPCTkty8U=">AAAEA3iclVPLbtNAFHUSHiW8WtjB5opAlLQlsisk2BSVl2DRRVKStFLGicaTiTPqeGzNjFGikZcs+RJ2iC0fwn/wAYwTB5GEDSPZPj733JfvdZBwprTr/iyVK1euXru+c6N689btO3d39+71VZxKQnsk5rG8CLCinAna00xzepFIiqOA0/Pg8k1uP/9EpWKx6Op5Qv0Ih4JNGMHaUqO90pf+yHSfelljtnw2oX4MKGJiZNIlk9lXrKdBYN5ZyOlED6AQD/c7BYIDSFccnK1w9QCgP+ra2N0mkiycah+hav14M35AQyZMYNNINstW0REqFIiK8R/rcL+6KAKZLb8O1MEFhOytDmdrXra+TfUrK3oNG7GhPepuK7eEsOwGZf9Vel6EYmGEh0eA4oRKrGMpcERNV2YNm7k52q25LXdxYBt4Bag5xWnbAfpoHJM0okITjpUaeG6ifYOlZoTTrIpSRRNMLnFIBxbm2ZRvFouTwRPLjGESS3sJDQv2bw+DI6XmUWCV+QqoTVtO/ss2SPXkhW+YSFJNBVkmmqQcdAz5FsKYSUo0n1uAiWS2ViBTLDHRdlfXsixiJ5SsdWJmqWAkHtMNluuZltiSiuoIM5F3Zd4zzuEjFgpO85mtrDZsbm68ZSHT6vDU/hCiuSW2A/E2P/826B+1PLfldZ7VTl4Wo9lxHjqPnIbjOc+dE+eD03Z6Din9Kj8o18qPK58rXyvfKt+X0nKp8LnvrJ3Kj9+u9U7g</latexit>

“Simplest” Example: LQR

Dynamic Programming:

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}\left[ \sum_{t=1}^{T-1} \left( x_t^* Q x_t + u_t^* R u_t \right) + x_T^* P_T x_T \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$

$$V_T(x_T) = x_T^* P_T x_T$$

$$P_{T-1} = Q + A^* P_T A - A^* P_T B \left( R + B^* P_T B \right)^{-1} B^* P_T A$$

$$u_{T-1} = -\left( B^* P_T B + R \right)^{-1} B^* P_T A \, x_{T-1} =: K_{T-1} x_{T-1}$$

$$V_{T-1}(x_{T-1}) = x_{T-1}^* P_{T-1} x_{T-1}$$

(for zero-mean noise, up to an additive constant that does not depend on $x_{T-1}$)
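The recursion above can be run backward from $P_T$ to obtain every gain $K_t$. The following is a minimal NumPy sketch, assuming real matrices (so $A^*$ becomes `A.T`); the function name `lqr_backward_pass` and the double-integrator example are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def lqr_backward_pass(A, B, Q, R, P_T, T):
    """Backward Riccati recursion for finite-horizon LQR (illustrative sketch).

    Returns two dicts: P[t] for t = 1..T (cost-to-go matrices) and
    K[t] for t = 1..T-1 (feedback gains, u_t = K[t] @ x_t).
    """
    P = {T: P_T}
    K = {}
    for t in range(T - 1, 0, -1):
        BtP = B.T @ P[t + 1]
        # K_t = -(R + B' P_{t+1} B)^{-1} B' P_{t+1} A
        K[t] = -np.linalg.solve(R + BtP @ B, BtP @ A)
        # P_t = Q + A' P_{t+1} A - A' P_{t+1} B (R + B' P_{t+1} B)^{-1} B' P_{t+1} A
        P[t] = Q + A.T @ P[t + 1] @ A + A.T @ P[t + 1] @ B @ K[t]
    return P, K


# Hypothetical example: a double integrator with identity costs.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P, K = lqr_backward_pass(A, B, Q, R, P_T=Q, T=20)
print("first-step gain K_1 =", K[1])
```

Note that the noise $e_t$ never enters the recursion, which is why the gains do not depend on the noise variance.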

Page 29: reinforcement learning through the optimization lens

“Simplest” Example: LQR

When (A, B) is known, it is optimal to use the static linear feedback control $u_t = K x_t$.

$$P = Q + A^* P A - A^* P B \left( R + B^* P B \right)^{-1} B^* P A$$

Discrete Algebraic Riccati Equation

• Dynamic programming has a simple form because quadratics are miraculous.
• The optimal controller (the gain K) is independent of the noise variance.
• For finite time horizons, we could solve this with a variety of batch solvers.
• Note that the solution is time invariant only on the infinite time horizon.

$$u_t = -\left( B^* P B + R \right)^{-1} B^* P A \, x_t =: K x_t$$

$$
\begin{aligned}
\text{minimize} \quad & \lim_{T \to \infty} \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} \left( x_t^* Q x_t + u_t^* R u_t \right) \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$
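One simple way to solve the Discrete Algebraic Riccati Equation numerically is to iterate the finite-horizon recursion until it stops changing. The sketch below does this with NumPy, assuming real matrices and a stabilizable system; the names `solve_dare_by_iteration` and `lqr_gain` and the example system are assumptions made for illustration. Dedicated solvers (for example `scipy.linalg.solve_discrete_are`) are usually preferable in practice.

```python
import numpy as np


def solve_dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration on P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA."""
    P = Q.copy()
    for _ in range(max_iter):
        BtP = B.T @ P
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + BtP @ B, BtP @ A)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")


def lqr_gain(A, B, P, R):
    """Static gain K such that u_t = K x_t, given a DARE solution P."""
    BtP = B.T @ P
    return -np.linalg.solve(R + BtP @ B, BtP @ A)


# Hypothetical example: a double integrator with identity costs.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
P = solve_dare_by_iteration(A, B, Q, R)
K = lqr_gain(A, B, P, R)
print("closed-loop spectral radius:", max(abs(np.linalg.eigvals(A + B @ K))))
```

A spectral radius below one for $A + BK$ indicates that the closed loop is stable.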

Page 30: reinforcement learning through the optimization lens

Approximate Dynamic Programming

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f_t(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$

Recursive formula:

$$V_k(x) = \min_u \left\{ C_k(x, u) + \mathbb{E}_e\left[ V_{k+1}(f_k(x, u, e)) \right] \right\}$$

Optimal Policy:

$$\pi_k(\tau_k) = \arg\min_u \left\{ C_k(x_k, u) + \mathbb{E}_e\left[ V_{k+1}(f_k(x_k, u, e)) \right] \right\}$$
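For a problem with finitely many states and actions, the finite-horizon recursion can be evaluated directly by sweeping backward in time. The sketch below is illustrative only: it assumes a terminal condition $V_{T+1} = 0$ (not stated on the slide), a noise distribution given as a list of `(value, probability)` pairs, and hypothetical callables `cost(k, x, u)` and `step(k, x, u, e)` standing in for $C_k$ and $f_k$.

```python
def backward_induction(states, actions, cost, step, noise, T):
    """Tabular finite-horizon dynamic programming (illustrative sketch).

    Returns V[k][x] and pi[k][x] for k = 1..T, assuming V[T + 1] = 0.
    """
    V = {T + 1: {x: 0.0 for x in states}}
    pi = {}
    for k in range(T, 0, -1):
        V[k], pi[k] = {}, {}
        for x in states:
            def q(u):
                # C_k(x, u) + E_e[ V_{k+1}(f_k(x, u, e)) ]
                expected = sum(p * V[k + 1][step(k, x, u, e)] for e, p in noise)
                return cost(k, x, u) + expected

            best = min(actions, key=q)
            pi[k][x] = best
            V[k][x] = q(best)
    return V, pi


# Hypothetical toy problem: the next state equals the chosen action.
V, pi = backward_induction(
    states=[0, 1],
    actions=[0, 1],
    cost=lambda k, x, u: x + u,
    step=lambda k, x, u, e: u,
    noise=[(0, 1.0)],
    T=2,
)
print(V[1], pi[1])
```

In this sketch the minimizing action depends only on the current state $x_k$, not on the rest of the history $\tau_k$.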

Page 31: reinforcement learning through the optimization lens

Approximate Dynamic Programming

Bellman Equation:

$$V_\gamma(x) = \min_u \left\{ C(x, u) + \gamma \, \mathbb{E}_e\left[ V_\gamma(f(x, u, e)) \right] \right\}$$

Optimal Policy:

$$\pi(x) = \arg\min_u \left\{ C(x, u) + \gamma \, \mathbb{E}_e\left[ V_\gamma(f(x, u, e)) \right] \right\}$$

($\gamma$ is the discount factor.)
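When states and actions are finite and the model is known, the Bellman equation can be solved by repeatedly applying its right-hand side (value iteration). This is a minimal sketch under those assumptions; the callables `cost(x, u)` and `step(x, u, e)`, the `noise` list of `(value, probability)` pairs, and the chain example are hypothetical stand-ins for $C$, $f$, and the distribution of $e$.

```python
def value_iteration(states, actions, cost, step, noise, gamma, tol=1e-9, max_iter=100_000):
    """Tabular value iteration for the discounted Bellman equation (sketch)."""
    V = {x: 0.0 for x in states}

    def q(x, u, values):
        # C(x, u) + gamma * E_e[ V(f(x, u, e)) ]
        return cost(x, u) + gamma * sum(p * values[step(x, u, e)] for e, p in noise)

    for _ in range(max_iter):
        V_new = {x: min(q(x, u, V) for u in actions) for x in states}
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < tol:
            break

    # Greedy policy with respect to the (approximately) converged values.
    policy = {x: min(actions, key=lambda u: q(x, u, V)) for x in states}
    return V, policy


# Hypothetical chain: states 0..4, noise pushes right with probability 0.2.
chain_states = list(range(5))
chain_actions = (-1, 0, 1)
chain_noise = [(0, 0.8), (1, 0.2)]


def chain_step(x, u, e):
    return min(max(x + u + e, 0), 4)


def chain_cost(x, u):
    return x + 0.1 * abs(u)


V, policy = value_iteration(chain_states, chain_actions, chain_cost, chain_step, chain_noise, gamma=0.9)
print(V, policy)
```

Because the update is a contraction with modulus $\gamma$, the iterates converge for any $\gamma < 1$ when costs are bounded.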

Generate algorithms using the insight:

$$V_\gamma(x_k) \approx C(x_k, u_k) + \gamma V_\gamma(x_{k+1}) + \nu_k$$
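One familiar algorithm built on this kind of relation is temporal-difference learning. The sketch below is tabular TD(0), which uses observed transitions to nudge $V(x_k)$ toward $C(x_k, u_k) + \gamma V(x_{k+1})$. It evaluates a fixed policy rather than computing the optimal $V_\gamma$, and it is offered as an illustration, not necessarily the algorithm the talk goes on to develop. All function and parameter names are assumptions.

```python
import random


def td0_evaluate(x0, policy, cost, step, sample_noise, gamma, alpha, num_steps, seed=0):
    """Tabular TD(0) evaluation of a fixed policy from one trajectory (sketch)."""
    rng = random.Random(seed)
    V = {}
    x = x0
    for _ in range(num_steps):
        u = policy(x)
        x_next = step(x, u, sample_noise(rng))
        # One-sample target: C(x_k, u_k) + gamma * V(x_{k+1})
        target = cost(x, u) + gamma * V.get(x_next, 0.0)
        current = V.get(x, 0.0)
        V[x] = current + alpha * (target - current)
        x = x_next
    return V


# Hypothetical chain: always push left; noise pushes right with probability 0.2.
V = td0_evaluate(
    x0=4,
    policy=lambda x: -1,
    cost=lambda x, u: x + 0.1 * abs(u),
    step=lambda x, u, e: min(max(x + u + e, 0), 4),
    sample_noise=lambda rng: 1 if rng.random() < 0.2 else 0,
    gamma=0.9,
    alpha=0.05,
    num_steps=5_000,
)
print(V)
```

The step size `alpha` averages out the error term $\nu_k$ over many visits to the same state.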

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{\infty} \gamma^t C(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f(x_t, u_t, e_t) \\
& u_t = \pi(\tau_t)
\end{aligned}
$$

Page 32: reinforcement learning through the optimization lens

Approximate Dynamic Programming

Bellman Equation:

$$V_\gamma(x) = \min_u \; C(x, u) + \gamma\, \mathbb{E}_e\left[ V_\gamma(f(x, u, e)) \right]$$

Optimal Policy:

$$\pi(x) = \arg\min_u \; C(x, u) + \gamma\, \mathbb{E}_e\left[ V_\gamma(f(x, u, e)) \right]$$
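When states, inputs, and disturbances are all finite, the Bellman equation on this slide can be solved by repeated substitution (value iteration), and the optimal policy read off as the minimizing input. The sketch below assumes a made-up three-state chain; the dynamics `f`, the stage cost `cost`, the disturbance distribution `NOISE`, and the discount 0.9 are illustrative assumptions, not taken from the talk.

```python
# Value iteration on a tiny, hypothetical problem (all numbers are illustrative).
GAMMA = 0.9
STATES = [0, 1, 2]            # state 0 is the cheapest place to be
INPUTS = [-1, 0, 1]           # move left, stay, move right
NOISE = [(0, 0.8), (1, 0.2)]  # (disturbance e, probability); e = 1 pushes right


def f(x, u, e):
    """Hypothetical dynamics: an integer walk clipped to the state set."""
    return min(max(x + u + e, 0), 2)


def cost(x, u):
    """Hypothetical stage cost C(x, u): state penalty plus a small input penalty."""
    return x + 0.1 * abs(u)


def q_from_v(V, x, u):
    # C(x, u) + gamma * E_e[ V(f(x, u, e)) ]
    return cost(x, u) + GAMMA * sum(p * V[f(x, u, e)] for e, p in NOISE)


def value_iteration(tol=1e-10, max_iters=10_000):
    V = {x: 0.0 for x in STATES}
    for _ in range(max_iters):
        # Apply the right-hand side of the Bellman equation to the current guess.
        V_new = {x: min(q_from_v(V, x, u) for u in INPUTS) for x in STATES}
        if max(abs(V_new[x] - V[x]) for x in STATES) < tol:
            return V_new
        V = V_new
    return V


def greedy_policy(V):
    # pi(x) = argmin_u C(x, u) + gamma * E_e[ V(f(x, u, e)) ]
    return {x: min(INPUTS, key=lambda u: q_from_v(V, x, u)) for x in STATES}


V = value_iteration()
pi = greedy_policy(V)
```

Because the discounted Bellman operator is a contraction, the loop converges geometrically at rate gamma; the approach stops being practical once the state space is too large to enumerate, which is what motivates the approximate methods that follow.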

$$Q(x, u) = C(x, u) + \mathbb{E}_e\left[ \gamma\, V_\gamma(f(x, u, e)) \right]$$

$$\pi(x) = \arg\min_u \; Q(x, u)$$

$$Q(x, u) = C(x, u) + \gamma\, \mathbb{E}_e\left[ \min_{u'} Q(f(x, u, e), u') \right]$$

$$\text{minimize} \;\; \mathbb{E}_e\left[ \sum_{t=1}^{\infty} \gamma^t C(x_t, u_t) \right] \quad \text{s.t.} \;\; x_{t+1} = f(x_t, u_t, e_t), \;\; u_t = \pi(\tau_t)$$
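The objective in this problem is an expectation over disturbance sequences, so for any fixed policy it can be estimated by simulating rollouts and averaging. The sketch below is a minimal illustration under assumptions: the infinite sum is truncated at a finite horizon, the history tau_t is taken to be the list of states seen so far, and the scalar system, cost, and two policies are hypothetical.

```python
import random


def rollout_cost(policy, f, cost, x1, gamma, horizon, rng):
    """One truncated sample of sum_{t=1}^{horizon} gamma^t C(x_t, u_t)."""
    x, history, total = x1, [], 0.0
    for t in range(1, horizon + 1):
        history.append(x)            # tau_t: here, the states observed so far
        u = policy(history)          # u_t = pi(tau_t)
        total += gamma ** t * cost(x, u)
        e = rng.gauss(0.0, 1.0)      # disturbance e_t
        x = f(x, u, e)               # x_{t+1} = f(x_t, u_t, e_t)
    return total


def expected_cost(policy, f, cost, x1, gamma=0.9, horizon=200, samples=500, seed=0):
    """Average of independent rollouts; a Monte Carlo estimate of the objective."""
    rng = random.Random(seed)
    return sum(rollout_cost(policy, f, cost, x1, gamma, horizon, rng)
               for _ in range(samples)) / samples


# Hypothetical scalar example.
def f(x, u, e, noise=0.1):
    return x + u + noise * e


def cost(x, u):
    return x * x + 0.1 * u * u


def feedback(history):
    return -0.5 * history[-1]        # proportional feedback on the latest state


def do_nothing(history):
    return 0.0


J_feedback = expected_cost(feedback, f, cost, x1=1.0)
J_nothing = expected_cost(do_nothing, f, cost, x1=1.0)
```

Comparing `J_feedback` with `J_nothing` ranks the two policies using only simulated data, without ever writing down a value function.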

Page 33: reinforcement learning through the optimization lens

Approximate Dynamic Programming

$$Q(x, u) = C(x, u) + \mathbb{E}_e\left[ \gamma\, V_\gamma(f(x, u, e)) \right]$$

Bellman Equation:

$$Q(x, u) = C(x, u) + \gamma\, \mathbb{E}_e\left[ \min_{u'} Q(f(x, u, e), u') \right]$$

Optimal Policy:

$$\pi(x) = \arg\min_u \; Q(x, u)$$

$$Q(x_k, u_k) \approx C(x_k, u_k) + \gamma \min_{u'} Q(x_{k+1}, u') + \nu_k$$

Q-learning:

$$Q_{\text{new}}(x_k, u_k) = (1 - \eta)\, Q_{\text{old}}(x_k, u_k) + \eta \left( C(x_k, u_k) + \gamma \min_{u'} Q_{\text{old}}(x_{k+1}, u') \right)$$
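Q-learning can be read as stochastic approximation applied to the Bellman equation for Q: each observed transition moves the entry Q(x_k, u_k) a fraction eta of the way toward the sampled target C(x_k, u_k) + gamma min over u' of Q(x_{k+1}, u'). The sketch below is a tabular illustration under assumptions: the three-state simulator `step`, the uniform sampling of state-input pairs, and the decaying step size are all hypothetical choices, not the talk's setup.

```python
import random

GAMMA = 0.9
STATES = [0, 1, 2]
INPUTS = [-1, 0, 1]


def step(x, u, rng):
    """Hypothetical simulator returning (cost, next state); the learner only sees samples."""
    e = 1 if rng.random() < 0.2 else 0
    return x + 0.1 * abs(u), min(max(x + u + e, 0), 2)


def q_learning(num_updates=200_000, seed=0):
    rng = random.Random(seed)
    Q = {(x, u): 0.0 for x in STATES for u in INPUTS}
    visits = {key: 0 for key in Q}
    for _ in range(num_updates):
        # Sample a state-input pair uniformly (an assumed exploration scheme).
        x, u = rng.choice(STATES), rng.choice(INPUTS)
        c, x_next = step(x, u, rng)
        visits[(x, u)] += 1
        eta = visits[(x, u)] ** -0.7          # decaying step size (assumed schedule)
        target = c + GAMMA * min(Q[(x_next, v)] for v in INPUTS)
        # Q_new = (1 - eta) * Q_old + eta * (C + gamma * min_u' Q_old(x_next, u'))
        Q[(x, u)] = (1 - eta) * Q[(x, u)] + eta * target
    return Q


Q = q_learning()
policy = {x: min(INPUTS, key=lambda u: Q[(x, u)]) for x in STATES}
```

The update never calls the dynamics or the disturbance distribution directly; it only needs sampled costs and next states, which is what makes the method model-free.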

$$\text{minimize} \;\; \mathbb{E}_e\left[ \sum_{t=1}^{\infty} \gamma^t C(x_t, u_t) \right] \quad \text{s.t.} \;\; x_{t+1} = f(x_t, u_t, e_t), \;\; u_t = \pi(\tau_t)$$

Page 34: reinforcement learning through the optimization lens

Direct Policy Search

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_{e}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right] \\
\text{s.t.} \quad & x_{t+1} = f_t(x_t, u_t, e_t) \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$
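The objective above can be estimated by simulation. The following is a minimal sketch, not taken from the slides: it assumes a scalar linear system, a quadratic cost, and a static policy u_t = -k x_t. The names `rollout_cost` and `expected_cost` and all constants (`a`, `b`, `r`, `noise_std`, the initial state) are hypothetical choices made for illustration.

```python
import random

def rollout_cost(k, rng, horizon=20, a=1.1, b=1.0, r=0.1, noise_std=0.1):
    """Total cost sum_t C_t(x_t, u_t) of one rollout under u_t = -k * x_t."""
    x = 1.0  # hypothetical initial state
    total = 0.0
    for _ in range(horizon):
        u = -k * x                     # policy: here it only looks at the current state
        total += x * x + r * u * u     # illustrative quadratic cost C_t(x_t, u_t)
        e = rng.gauss(0.0, noise_std)  # disturbance e_t
        x = a * x + b * u + e          # dynamics x_{t+1} = f(x_t, u_t, e_t)
    return total

def expected_cost(k, n_rollouts=500, seed=0):
    """Monte Carlo estimate of E_e[sum_t C_t(x_t, u_t)] for policy gain k."""
    rng = random.Random(seed)
    return sum(rollout_cost(k, rng) for _ in range(n_rollouts)) / n_rollouts

if __name__ == "__main__":
    # Compare an uncontrolled system (k = 0) with a stabilizing gain (k = 1).
    print(expected_cost(0.0), expected_cost(1.0))
```

Direct policy search treats `expected_cost` as a black-box function of the policy parameter `k` and tries to minimize it using only such sampled evaluations.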

Page 35: reinforcement learning through the optimization lens

Sampling to Search

•Search over probability distributions
•Use function approximations that might not capture the optimal distribution

$$
\min_{z \in \mathbb{R}^d} \Phi(z) \;=\; \min_{p(z)} \mathbb{E}_{p}[\Phi(z)] \;\le\; \min_{\vartheta} \underbrace{\mathbb{E}_{p(z;\vartheta)}[\Phi(z)]}_{=:\, J(\vartheta)}
$$

•Can build (incredibly high variance) stochastic gradient estimates by sampling:

$$
\nabla J(\vartheta) = \mathbb{E}_{p(z;\vartheta)}\left[ \Phi(z)\, \nabla_{\vartheta} \log p(z;\vartheta) \right]
$$
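As a rough illustration of the sampled gradient estimate, the sketch below assumes a one-dimensional Gaussian family p(z; ϑ) = N(ϑ, σ²), for which ∇_ϑ log p(z; ϑ) = (z − ϑ)/σ², and a hypothetical objective Φ(z) = (z − 3)². Neither choice comes from the slides. Under these assumptions J(ϑ) = (ϑ − 3)² + σ², so the exact gradient 2(ϑ − 3) is available for comparison.

```python
import random

def phi(z):
    """Hypothetical objective Phi(z); chosen so the exact gradient is known."""
    return (z - 3.0) ** 2

def score_gradient_estimate(theta, sigma=1.0, n_samples=20000, seed=0):
    """Average of Phi(z) * grad_theta log p(z; theta) over samples z ~ N(theta, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(theta, sigma)
        score = (z - theta) / sigma ** 2  # grad_theta log p(z; theta) for a Gaussian mean
        total += phi(z) * score
    return total / n_samples

if __name__ == "__main__":
    theta = 0.0
    exact = 2.0 * (theta - 3.0)  # gradient of J(theta) = (theta - 3)^2 + sigma^2
    print(score_gradient_estimate(theta), exact)
```

Only evaluations of Φ are used, never its derivative. The price is variance: in this toy setting many thousands of samples are needed before the average settles near the exact value.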

Page 36: reinforcement learning through the optimization lens

Reinforce Algorithm

$$
J(\vartheta) := \mathbb{E}_{p(z;\vartheta)}[\Phi(z)]
$$

$$
\begin{aligned}
\nabla_{\vartheta} J(\vartheta) &= \int \Phi(z)\, \nabla_{\vartheta} p(z;\vartheta)\, dz \\
&= \int \Phi(z) \left( \frac{\nabla_{\vartheta} p(z;\vartheta)}{p(z;\vartheta)} \right) p(z;\vartheta)\, dz \\
&= \int \left( \Phi(z)\, \nabla_{\vartheta} \log p(z;\vartheta) \right) p(z;\vartheta)\, dz \\
&= \mathbb{E}_{p(z;\vartheta)}\left[ \Phi(z)\, \nabla_{\vartheta} \log p(z;\vartheta) \right]
\end{aligned}
$$

Page 37: reinforcement learning through the optimization lens

Reinforce Algorithm

$J(\vartheta) := \mathbb{E}_{p(z;\vartheta)}[\Phi(z)]$

$\nabla J(\vartheta) = \mathbb{E}_{p(z;\vartheta)}\left[\Phi(z)\,\nabla_\vartheta \log p(z;\vartheta)\right]$

Sample $z_k \sim p(z;\vartheta_k)$

Compute $G(z_k,\vartheta_k) = \Phi(z_k)\,\nabla_{\vartheta_k} \log p(z_k;\vartheta_k)$

Update $\vartheta_{k+1} = \vartheta_k - \alpha_k G(z_k,\vartheta_k)$
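The gradient formula on this slide follows from the log-derivative identity $\nabla_\vartheta p = p\,\nabla_\vartheta \log p$. A short derivation, written for a density and assuming that $p(z;\vartheta)$ is differentiable in $\vartheta$ and that differentiation and integration can be exchanged (for discrete $z$ the integral becomes a sum):

```latex
\begin{align*}
\nabla J(\vartheta)
  &= \nabla_\vartheta \int \Phi(z)\, p(z;\vartheta)\, dz \\
  &= \int \Phi(z)\, \nabla_\vartheta p(z;\vartheta)\, dz \\
  &= \int \Phi(z)\, p(z;\vartheta)\, \nabla_\vartheta \log p(z;\vartheta)\, dz \\
  &= \mathbb{E}_{p(z;\vartheta)}\left[\Phi(z)\, \nabla_\vartheta \log p(z;\vartheta)\right].
\end{align*}
```

Under these assumptions, $G(z_k,\vartheta_k)$ computed from a single sample $z_k \sim p(z;\vartheta_k)$ is an unbiased estimate of $\nabla J(\vartheta_k)$, so the update is a stochastic gradient step that only evaluates $\Phi$ and never differentiates it.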

Page 38: reinforcement learning through the optimization lens

Reinforce Algorithm

$J(\vartheta) := \mathbb{E}_{p(z;\vartheta)}[\Phi(z)]$

Generic algorithm for solving discrete optimization:

$z \in \{-1,1\}^d$

$p(z;\vartheta) = \prod_{i=1}^{d} \frac{\exp(z_i\vartheta_i)}{\exp(-\vartheta_i) + \exp(\vartheta_i)}$

Sample $z_k \sim p(z;\vartheta_k)$

Compute $G(z_k,\vartheta_k) = \Phi(z_k)\,\nabla_{\vartheta_k} \log p(z_k;\vartheta_k)$

Update $\vartheta_{k+1} = \vartheta_k - \alpha_k G(z_k,\vartheta_k)$

For this product distribution, $\nabla_\vartheta \log p(z;\vartheta) = z - \tanh(\vartheta)$, so the update becomes

$\vartheta_{k+1} = \vartheta_k - \alpha_k \Phi(z_k)\,(z_k - \tanh(\vartheta_k))$
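The sample / compute / update loop for the product distribution over $\{-1,1\}^d$ can be sketched in a few lines of standard-library Python. The sketch uses $\nabla_\vartheta \log p(z;\vartheta) = z - \tanh(\vartheta)$, which comes from differentiating $\log p(z;\vartheta) = \sum_i \left[ z_i\vartheta_i - \log(e^{-\vartheta_i} + e^{\vartheta_i}) \right]$. The cost `hamming_cost`, the `target` vector, the step size, and the iteration count are illustrative assumptions, not taken from the talk.

```python
import math
import random


def sample_z(theta, rng):
    """Draw z in {-1, 1}^d from the product distribution p(z; theta)."""
    # P(z_i = 1) = exp(theta_i) / (exp(-theta_i) + exp(theta_i)) = (1 + tanh(theta_i)) / 2
    return [1 if rng.random() < 0.5 * (1.0 + math.tanh(t)) else -1 for t in theta]


def score(z, theta):
    """Gradient of log p(z; theta) with respect to theta: z - tanh(theta)."""
    return [zi - math.tanh(t) for zi, t in zip(z, theta)]


def reinforce(phi, d, steps=2000, step_size=0.1, seed=0):
    """Run the sample / compute / update loop on the cost function phi."""
    rng = random.Random(seed)
    theta = [0.0] * d  # theta = 0 is the uniform distribution over {-1, 1}^d
    for _ in range(steps):
        z = sample_z(theta, rng)                                  # Sample
        cost = phi(z)
        g = [cost * s for s in score(z, theta)]                   # Compute G
        theta = [t - step_size * gi for t, gi in zip(theta, g)]   # Update
    return theta


# Hypothetical toy cost: number of coordinates that disagree with a fixed target.
target = [1, -1, 1, -1, 1]


def hamming_cost(z):
    return sum(1 for zi, ti in zip(z, target) if zi != ti)


theta = reinforce(hamming_cost, d=len(target))
z_hat = [1 if t > 0 else -1 for t in theta]  # round the learned distribution to a point
```

Nothing in the loop guarantees that `z_hat` matches `target`: the outcome depends on the seed, the step size, and the scale of the cost, and the iterate can stall once some $\vartheta_i$ drifts far in the wrong direction and the score becomes tiny.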

Does this “solve” any discrete problem?

Page 39: reinforcement learning through the optimization lens

Direct Policy Search

Both are derivative-free algorithms!

minimize $\mathbb{E}_{e_t, u_t}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$, $u_t \sim p(u \mid x_t; \vartheta)$

Policy Gradient

probabilistic policy

minimize $\mathbb{E}_{e_t, \omega}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$, $u_t = \pi(\tau_t; \vartheta + \omega)$

Random Search

parameter perturbation

REINFORCE applied to either problem does not depend on the dynamics.

minimize $\mathbb{E}_{e}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$, $u_t = \pi_t(\tau_t)$

Page 40: reinforcement learning through the optimization lens

Direct Policy Search

minimize $\mathbb{E}_{e_t, u_t}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$, $u_t \sim p(u \mid x_t; \vartheta)$

Policy Gradient

probabilistic policy

$G(\tau, \vartheta) = \left( \sum_{t=1}^{T} C(x_t, u_t) \right) \left( \sum_{t=0}^{T-1} \nabla_\vartheta \log p_\vartheta(u_t \mid x_t; \vartheta) \right)$
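One way to see why this estimator needs no model of the dynamics is the following sketch. It assumes (this is not stated on the slide) that the initial-state density $p(x_0)$ and the transition density $p(x_{t+1} \mid x_t, u_t)$ induced by $f_t$ and $e_t$ do not depend on $\vartheta$, and that gradient and expectation can be exchanged. The indexing follows the slide's conventions only loosely.

```latex
\begin{align*}
p(\tau; \vartheta)
  &= p(x_0) \prod_{t=0}^{T-1} p_\vartheta(u_t \mid x_t)\, p(x_{t+1} \mid x_t, u_t) \\
\nabla_\vartheta \log p(\tau; \vartheta)
  &= \sum_{t=0}^{T-1} \nabla_\vartheta \log p_\vartheta(u_t \mid x_t)
  && \text{(dynamics terms have zero gradient)} \\
\nabla_\vartheta\, \mathbb{E}_{\tau}\!\left[ C(\tau) \right]
  &= \mathbb{E}_{\tau}\!\left[ C(\tau)\, \nabla_\vartheta \log p(\tau; \vartheta) \right]
   = \mathbb{E}_{\tau}\!\left[ G(\tau, \vartheta) \right],
  && C(\tau) = \sum_{t=1}^{T} C(x_t, u_t)
\end{align*}
```

Only the policy's log-density is differentiated, so a rollout of the system is enough to form the estimate.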

Page 41: reinforcement learning through the optimization lens

“Greedy strategy”: Build control $u_t = K x_t$

• Sample a bunch of random vectors: $\nu_t \sim \mathcal{N}(0, \sigma^2 I)$

• Collect samples from control $u_t = K x_t + \nu_t$: $\tau = \{x_1, \ldots, x_T\}$

minimize $\mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right]$
s.t. $x_{t+1} = A x_t + B u_t + e_t$

Policy Gradient for LQR

policy gradient only has access to 0-th order information!!!

• Compute cost: $C(\tau) = \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t$

• Update: $K_{\mathrm{new}} \leftarrow K_{\mathrm{old}} - \alpha_t\, C(\tau) \sum_{t=0}^{T-1} \nu_t x_t^*$
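The sample / collect / compute cost / update steps above can be sketched in NumPy as follows. This is a minimal illustration, not the speaker's code: the function names `rollout` and `policy_gradient_step`, the random initial state, the process-noise scale `noise_std`, and the use of the same T time steps for both sums are all assumptions made for the example.

```python
import numpy as np

def rollout(A, B, Q, R, K, sigma, T, rng, noise_std=0.01):
    """Run u_t = K x_t + nu_t on x_{t+1} = A x_t + B u_t + e_t for T steps.

    Returns the trajectory cost C(tau) and the sum over t of nu_t x_t^T.
    """
    p, d = K.shape
    x = rng.standard_normal(d)                  # assumed random initial state
    cost = 0.0
    nu_x_sum = np.zeros((p, d))
    for _ in range(T):
        nu = sigma * rng.standard_normal(p)     # nu_t ~ N(0, sigma^2 I)
        u = K @ x + nu                          # perturbed linear control
        cost += x @ Q @ x + u @ R @ u           # quadratic stage cost
        nu_x_sum += np.outer(nu, x)             # accumulate nu_t x_t^T
        e = noise_std * rng.standard_normal(d)  # assumed process noise e_t
        x = A @ x + B @ u + e
    return cost, nu_x_sum

def policy_gradient_step(A, B, Q, R, K, sigma, T, alpha, rng):
    """One update K_new = K_old - alpha * C(tau) * sum_t nu_t x_t^T."""
    cost, nu_x_sum = rollout(A, B, Q, R, K, sigma, T, rng)
    return K - alpha * cost * nu_x_sum, cost
```

The update uses A and B only to simulate the system; the step itself touches nothing but the observed cost and the injected noise, which is the sense in which the method sees only 0-th order information. In practice the estimate is very noisy, so a small `alpha` and many rollouts are usually needed.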

Page 42: reinforcement learning through the optimization lens

Direct Policy Search

minimize $\mathbb{E}_{e_t, \omega}\left[ \sum_{t=1}^{T} C_t(x_t, u_t) \right]$
s.t. $x_{t+1} = f_t(x_t, u_t, e_t)$, $u_t = \pi(\tau_t; \vartheta + \omega)$

Random Search

parameter perturbation

aka…

• (μ,λ)-Evolution Strategies
• SPSA
• Bandit Convex Opt

$G(\omega, \vartheta) = \left( \sum_{t=1}^{T} C(x_t, u_t) \right) \nabla \log p(\omega)$

$\omega_i \sim \mathcal{N}(0, I)$

random finite difference approximation to the gradient

$C(\vartheta) = \sum_{t=1}^{T} C(x_t, u_t)$

$G^{(m)}(\omega, \vartheta) = \frac{1}{m} \sum_{i=1}^{m} \frac{C(\vartheta + \sigma \omega_i) - C(\vartheta - \sigma \omega_i)}{2\sigma}\, \omega_i$
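A minimal Python sketch of the random finite-difference estimator above. It assumes the perturbation scale is written as sigma; the function name `random_fd_gradient` and the quadratic test cost are hypothetical and not from the talk.

```python
import random


def random_fd_gradient(cost, theta, sigma=0.1, m=64, seed=0):
    """Two-point random finite-difference estimate of the gradient of `cost`.

    Averages (C(theta + sigma*w) - C(theta - sigma*w)) / (2*sigma) * w
    over m Gaussian directions w ~ N(0, I).
    """
    rng = random.Random(seed)
    d = len(theta)
    grad = [0.0] * d
    for _ in range(m):
        omega = [rng.gauss(0.0, 1.0) for _ in range(d)]
        plus = [t + sigma * w for t, w in zip(theta, omega)]
        minus = [t - sigma * w for t, w in zip(theta, omega)]
        scale = (cost(plus) - cost(minus)) / (2.0 * sigma)
        for j in range(d):
            grad[j] += scale * omega[j] / m
    return grad


def quadratic(theta):
    # Illustrative cost only; its exact gradient is 2 * theta.
    return sum(t * t for t in theta)


estimate = random_fd_gradient(quadratic, [1.0, -2.0], sigma=0.1, m=4000)
```

With enough directions the estimate should land near the exact gradient (2, -4) of this test cost, but each individual sample is noisy, which is the variance question the later slides raise.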


Random Search for LQR

minimize $\mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right]$

s.t. $x_{t+1} = A x_t + B u_t + e_t$

“Greedy strategy”: Build control $u_t = K x_t$

• Sample a random perturbation: $\nu \sim \mathcal{N}(0, \sigma^2 I)$

• Collect samples from control $u_t = (K + \nu) x_t$: $\tau = \{x_1, \ldots, x_T\}$

• Compute cost: $J(\tau) = \sum_{t=1}^{T} x_t^\top Q x_t + u_t^\top R u_t$

• Update: $K \leftarrow K - \alpha_t J(\tau) \nu$
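The random search steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a scalar system (A, B, Q, R, and K are all numbers), a constant step size, and made-up parameter values; `rollout_cost` and `random_search` are hypothetical names.

```python
import random


def rollout_cost(k, a=1.0, b=1.0, q=1.0, r=1.0, x0=1.0, T=10,
                 noise_std=0.0, rng=None):
    """Cost J(tau) = sum over T steps of q*x_t^2 + r*u_t^2 with u_t = k*x_t."""
    rng = rng or random.Random(0)
    x, cost = x0, 0.0
    for _ in range(T):
        u = k * x
        cost += q * x * x + r * u * u
        # Scalar dynamics x_{t+1} = a*x_t + b*u_t + e_t
        x = a * x + b * u + rng.gauss(0.0, noise_std)
    return cost


def random_search(k0, iters=200, alpha=1e-3, sigma=0.1, noise_std=0.1, seed=0):
    """Repeat: sample nu, roll out with gain k + nu, update k <- k - alpha*J*nu."""
    rng = random.Random(seed)
    k = k0
    for _ in range(iters):
        nu = rng.gauss(0.0, sigma)            # random perturbation
        cost = rollout_cost(k + nu, noise_std=noise_std, rng=rng)
        k = k - alpha * cost * nu             # update rule from the slide
    return k


k_final = random_search(-0.5)
```

The step size is kept deliberately small here because the raw update scales with the full cost J, so a single bad rollout can move the gain a long way.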


• Reinforce is NOT Magic

• What is the variance?

• Necessarily becomes derivative-free, as you are accessing the decision variable by sampling

• Approximation can be far off

• But it’s certainly super easy!


Deep Reinforcement Learning

• Simply parameterize Q-function or policy as a deep net

• Note, ADP is tricky to analyze with function approximation

• Policy search is considerably more straightforward: make the log-prob a deep net.


“Simplest” Example: LQR

minimize $\sum_{t=0}^{T} (x_t)_1^2 + r u_t^2$

subject to $x_{t+1} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} x_t + \begin{bmatrix} 0 \\ 1/m \end{bmatrix} u_t$, $x_t = \begin{bmatrix} z_t \\ v_t \end{bmatrix}$

[Figure: nominal control and ADP with 10 samples; axis label: samples]
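To make the example concrete, here is a small Python sketch that simulates the dynamics above under a fixed linear feedback and adds up the cost. It assumes the control penalty sits inside the sum, and the values of m and r, the initial state, and the gain are illustrative guesses rather than numbers from the talk; `double_integrator_cost` is a hypothetical helper.

```python
def double_integrator_cost(gain, x0, T=50, m=1.0, r=1.0):
    """Sum over t = 0..T of (x_t)_1^2 + r*u_t^2 with u_t = gain . x_t.

    State x_t = (z_t, v_t); dynamics z_{t+1} = z_t + v_t,
    v_{t+1} = v_t + u_t / m.
    """
    z, v = x0
    total = 0.0
    for _ in range(T + 1):
        u = gain[0] * z + gain[1] * v
        total += z * z + r * u * u
        z, v = z + v, v + u / m
    return total


# No control versus a hand-picked stabilizing gain (not an optimal one).
no_control = double_integrator_cost((0.0, 0.0), (1.0, 1.0))
with_feedback = double_integrator_cost((-0.25, -1.0), (1.0, 1.0))
```

Without control the first state coordinate drifts away linearly, so the cost grows with the horizon, while the hand-picked gain keeps it bounded.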


“Simplest” Example: LQR

+ru2t

subject to xt+1 =1 10 1

�xt +

01/m

�ut

<latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit 
sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit><latexit sha1_base64="oI5Ov9KcOeHyn9bWcwwJat8Txu4=">AAAC4HicbVHLihNBFK1uX2P7yujSTWFQRkYyXSLMuFAGFHQxixGNM5BuQnX1TaeYquqm6rYkNP0BrsStn+XOP3FpdRJlknih4HDOuY+6N6uUdBjHv4LwytVr12/s3Ixu3b5z915v9/5nV9ZWwFCUqrTnGXegpIEhSlRwXlngOlNwll286fSzL2CdLM0nnFeQal4YOZGCo6fGPTUbN7jPWvqKJhkU0jSZ5mjlrI0YfUIZTZIoXgIw+T+xy2rpPo02k+IkYQfa85fdtO7sUTTu9eNBvAi6DdgK9MkqTse7QZrkpag1GBSKOzdicYVpwy1KoaCNktpBxcUFL2DkoeEaXNos1tLSx57J6aS0/hmkC/ZyRsO1c3OdeacfdOo2tY78nzaqcXKUNtJUNYIRy0aTWlEsabdjmksLAtXcAy6s9LNSMeWWC/SXWOuyqF2BWPtJM6uNFGUOG6zCGVruSQeouTTdr5p3Uin6kRtHT2Qxxb+qL9vJe29lIdE9O/HnNk+3zP4gbHP922D4fPBywD686B+/Xl1mhzwkj8geYeSQHJP35JQMiSA/ye8gCMJQhF/Db+H3pTUMVjkPyFqEP/4A9obmoA==</latexit>

$x_t = \begin{bmatrix} z_t \\ v_t \end{bmatrix}$

minimize $\sum_{t=0}^{T} (x_t)_1^2$

samples

nominal control and ADP with 10 samples

Page 48: reinforcement learning through the optimization lens

Extraordinary Claims Require Extraordinary Evidence*

“How can we dismiss an entire field which claims such success?”

* only if your prior is correct

“Reinforcement learning results are tricky to reproduce: performance is very noisy, algorithms have many moving parts which allow for subtle bugs, and many papers don’t report all the required tricks.”

“RL algorithms are challenging to implement correctly; good results typically only come after fixing many seemingly-trivial bugs.”

[Figure: HalfCheetah-v1 (TRPO, Different Random Seeds). Average Return vs. Timesteps (×10^6). Two curves, each labeled "Random Average (5 runs)".]

[Figure: HalfCheetah-v1 (TRPO, Codebase Comparison). Average Return vs. Timesteps (×10^6). Curves: Schulman 2015, Schulman 2017, Duan 2016.]

blog.openai.com/openai-baselines-dqn/

There has to be a better way!

arxiv:1709.06560

Page 49: reinforcement learning through the optimization lens

Coarse-ID control

[Block diagram: system G in a feedback loop with controller K; signals u and y]

Page 50: reinforcement learning through the optimization lens

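Coarse-ID control

[Block diagram: estimated model Ĝ with uncertainty block Δ (signals w, v), in a feedback loop with controller K (signals u, y)]

Coarse-grained model is trivial to fit

High-dimensional statistics bounds the error

Design robust control for the feedback loop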

Page 51: reinforcement learning through the optimization lens

Coarse-ID control (static case)

$\operatorname{minimize}_u \; x^* Q x \quad \text{subject to} \quad x = Bu + x_0$

B unknown!

Collect data: $\{(x_i, u_i)\}$, $x_i = B u_i + x_0 + e_i$

Estimate B: $\hat{B} = \arg\min_B \sum_{i=1}^{N} \|B u_i + x_0 - x_i\|^2$

Guarantee: $\|B - \hat{B}\| \le \epsilon$ with high probability

Note: $x = \hat{B}u + x_0 + \Delta_B u$

Robust optimization problem:

$\operatorname{minimize}_u \; \sup_{\|\Delta_B\| \le \epsilon} \|Q^{1/2}(x - \Delta_B u)\| \quad \text{subject to} \quad x = \hat{B}u + x_0$

(The set $\{\Delta_B : \|\Delta_B\| \le \epsilon\}$ is symmetric, so the sign in front of $\Delta_B u$ does not change the supremum.)
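The estimation step on this slide is an ordinary least-squares problem. A minimal sketch follows, assuming the offset $x_0$ is known and the noise $e_i$ is Gaussian. The function name `estimate_B`, the dimensions, and the noise level are illustrative choices, not taken from the talk.

```python
import numpy as np

def estimate_B(U, X, x0):
    """Least-squares estimate of B from samples x_i = B u_i + x0 + e_i.

    U: (N, p) array of inputs, X: (N, n) array of observations,
    x0: (n,) offset, assumed known here.
    """
    # min_B sum_i ||B u_i + x0 - x_i||^2 is the linear system U @ B.T ~ X - x0.
    B_T, *_ = np.linalg.lstsq(U, X - x0, rcond=None)
    return B_T.T

# Synthetic experiment (all sizes and the noise level are illustrative).
rng = np.random.default_rng(0)
n, p, N = 3, 2, 200
B_true = rng.standard_normal((n, p))
x0 = rng.standard_normal(n)
U = rng.standard_normal((N, p))
X = U @ B_true.T + x0 + 0.1 * rng.standard_normal((N, n))

B_hat = estimate_B(U, X, x0)
err = np.linalg.norm(B_hat - B_true, 2)  # operator-norm error, the epsilon of the guarantee
print(err)
```

The printed operator-norm error plays the role of $\epsilon$ in the guarantee. In practice it is not observable and has to be bounded statistically.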

Page 52: reinforcement learning through the optimization lens

Coarse-ID control (static case)

$\operatorname{minimize}_u \; x^* Q x \quad \text{subject to} \quad x = Bu + x_0$

B unknown!

Collect data: $\{(x_i, u_i)\}$, $x_i = B u_i + x_0 + e_i$

Estimate B: $\hat{B} = \arg\min_B \sum_{i=1}^{N} \|B u_i + x_0 - x_i\|^2$

Guarantee: $\|B - \hat{B}\| \le \epsilon$ with high probability

Solve robust optimization problem:

$\operatorname{minimize}_u \; \sup_{\|\Delta_B\| \le \epsilon} \|Q^{1/2}(x - \Delta_B u)\| \quad \text{subject to} \quad x = \hat{B}u + x_0$

Relaxation: (Triangle inequality!)

$\operatorname{minimize}_u \; \|Q^{1/2}x\| + \epsilon\lambda\|u\| \quad \text{subject to} \quad x = \hat{B}u + x_0$
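The triangle-inequality step behind the relaxation can be written out as follows. This is a sketch under the assumption, not stated on the slide, that $\lambda$ denotes the operator norm $\|Q^{1/2}\|$.

```latex
\begin{aligned}
\sup_{\|\Delta_B\| \le \epsilon} \|Q^{1/2}(x - \Delta_B u)\|
  &\le \|Q^{1/2}x\| + \sup_{\|\Delta_B\| \le \epsilon} \|Q^{1/2}\Delta_B u\| \\
  &\le \|Q^{1/2}x\| + \|Q^{1/2}\|\,\epsilon\,\|u\| \\
  &= \|Q^{1/2}x\| + \epsilon\lambda\|u\| .
\end{aligned}
```

The right-hand side is convex in $u$ once $x = \hat{B}u + x_0$ is substituted, which is what makes the relaxed problem tractable.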

Page 53: reinforcement learning through the optimization lens

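Coarse-ID control (static case)

$\operatorname{minimize}_u \; x^* Q x \quad \text{subject to} \quad x = Bu + x_0$

B unknown!

Collect data: $\{(x_i, u_i)\}$, $x_i = B u_i + x_0 + e_i$

Estimate B: $\hat{B} = \arg\min_B \sum_{i=1}^{N} \|B u_i + x_0 - x_i\|^2$

Guarantee: $\|B - \hat{B}\| \le \epsilon$ with high probability

Relaxation: (Triangle inequality!)

$\operatorname{minimize}_u \; \|Q^{1/2}x\| + \epsilon\lambda\|u\| \quad \text{subject to} \quad x = \hat{B}u + x_0$

Generalization bound

$\operatorname{cost}(\hat{u}) \le \operatorname{cost}(u_\star) + 4\epsilon\lambda\|u_\star\|\,\|Q^{1/2}x_\star\| + 4\epsilon^2\lambda^2\|u_\star\|^2$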
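One way to arrive at the generalization bound is sketched below. The assumptions, which are mine rather than the slide's, are: $\operatorname{cost}(u) = \|Q^{1/2}(Bu + x_0)\|^2$, $\Delta_B = B - \hat{B}$ with $\|\Delta_B\| \le \epsilon$, $\lambda = \|Q^{1/2}\|$, $\hat{u}$ minimizes the relaxed problem, and $x_\star = Bu_\star + x_0$ is the state under the true optimum $u_\star$.

```latex
\begin{aligned}
\sqrt{\operatorname{cost}(\hat{u})}
  &= \|Q^{1/2}(\hat{B}\hat{u} + x_0 + \Delta_B \hat{u})\|
   \le \|Q^{1/2}(\hat{B}\hat{u} + x_0)\| + \epsilon\lambda\|\hat{u}\| \\
  &\le \|Q^{1/2}(\hat{B}u_\star + x_0)\| + \epsilon\lambda\|u_\star\|
   \qquad \text{(optimality of } \hat{u} \text{ for the relaxation)} \\
  &= \|Q^{1/2}(x_\star - \Delta_B u_\star)\| + \epsilon\lambda\|u_\star\|
   \le \|Q^{1/2}x_\star\| + 2\epsilon\lambda\|u_\star\| .
\end{aligned}
```

Squaring both sides gives $\operatorname{cost}(\hat{u}) \le \operatorname{cost}(u_\star) + 4\epsilon\lambda\|u_\star\|\,\|Q^{1/2}x_\star\| + 4\epsilon^2\lambda^2\|u_\star\|^2$, so the excess cost is $O(\epsilon)$ for small $\epsilon$.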

Page 54: reinforcement learning through the optimization lens

Coarse-ID control (static case)

$\operatorname{minimize}_u \; x^* Q x \quad \text{subject to} \quad x = Bu + x_0$

B unknown!

Collect data: $\{(x_i, u_i)\}$, $x_i = B u_i + x_0 + e_i$

Estimate B: $\hat{B} = \arg\min_B \sum_{i=1}^{N} \|B u_i + x_0 - x_i\|^2$

Guarantee: $\|B - \hat{B}\| \le \epsilon$ with high probability

Relaxation: (Triangle inequality!)

$\operatorname{minimize}_u \; \|Q^{1/2}x\| + \epsilon\lambda\|u\| \quad \text{subject to} \quad x = \hat{B}u + x_0$

Generalization bound

$\operatorname{cost}(\hat{u}) = \operatorname{cost}(u_\star) + O(\epsilon)$

Page 55: reinforcement learning through the optimization lens

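Coarse-ID optimization

$\operatorname{minimize}_x \; f(x) \quad \text{subject to} \quad g(x; \vartheta) \le 0$

$\vartheta$ unknown!

Collect data: $\{(x_i, g(x_i; \vartheta))\}$

Estimate $\vartheta$: $\hat{\vartheta}$

Guarantee: $\operatorname{dist}(\vartheta, \hat{\vartheta}) \le \epsilon$ with high probability

Relaxation: $\operatorname{minimize}_x \; f_{\text{robust}}(x) \quad \text{subject to} \quad g(x; \hat{\vartheta}) \le 0$

Generalization bound: $f(\hat{x}) \le f(x_\star) + \operatorname{err}\left(\epsilon, \|f - f_{\text{robust}}\|, x_\star, \vartheta\right)$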

Page 56: reinforcement learning through the optimization lens

“Simple” Example: LQR

minimize $\lim_{T\to\infty} \mathbb{E}\left[\frac{1}{T}\sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t\right]$

s.t. $x_{t+1} = A x_t + B u_t + e_t$ (Gaussian noise)

“Obvious strategy”: Estimate (A,B), build control $u_t = K x_t$

Run an experiment for T steps with random input. Then

$\operatorname{minimize}_{(A,B)} \; \sum_{i=1}^{T} \|x_{i+1} - A x_i - B u_i\|^2$

If $T \ge \tilde{O}\left(\frac{\sigma^2 (d+p)}{\lambda_{\min}(\Lambda_c)\,\epsilon^2}\right)$, where $\Lambda_c = A\Lambda_c A^* + BB^*$ (controllability Gramian),

then $\|A - \hat{A}\| \le \epsilon$ and $\|B - \hat{B}\| \le \epsilon$ w.h.p.

[Dean, Mania, Matni, R., Tu, 2017]

[Mania, R., Simchowitz, Tu, 2018]
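The "obvious strategy" estimation step can be sketched as a single least-squares solve over one trajectory. This is an illustrative sketch, not the authors' code. The function name `estimate_AB`, the horizon, the noise level, and the choice of test system are assumptions made for the example.

```python
import numpy as np

def estimate_AB(X, U):
    """Least-squares fit of x_{t+1} = A x_t + B u_t + e_t from one rollout.

    X: (T+1, d) array of states, U: (T, p) array of inputs.
    Returns (A_hat, B_hat).
    """
    d = X.shape[1]
    Z = np.hstack([X[:-1], U])                          # regressors [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)   # shape (d + p, d)
    return Theta[:d].T, Theta[d:].T

# Roll out a system with random Gaussian inputs (illustrative parameters).
rng = np.random.default_rng(0)
A = np.array([[1.01, 0.01, 0.00],
              [0.01, 1.01, 0.01],
              [0.00, 0.01, 1.01]])
B = np.eye(3)
T, sigma = 200, 0.1

X = np.zeros((T + 1, 3))
U = rng.standard_normal((T, 3))
for t in range(T):
    X[t + 1] = A @ X[t] + B @ U[t] + sigma * rng.standard_normal(3)

A_hat, B_hat = estimate_AB(X, U)
err_A = np.linalg.norm(A_hat - A, 2)
err_B = np.linalg.norm(B_hat - B, 2)
print(err_A, err_B)
```

The two printed operator-norm errors correspond to the quantities bounded by $\epsilon$ in the sample-complexity statement.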

Page 57: reinforcement learning through the optimization lens

“Simple” Example: LQR

“Obvious strategy”: Estimate $(\hat{A}, \hat{B})$, build control $u_t = K x_t$

$\operatorname{minimize}_u \; \sup_{\|\Delta_A\|_2 \le \epsilon_A,\ \|\Delta_B\|_2 \le \epsilon_B} \; \lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t$

s.t. $x_{t+1} = (\hat{A} + \Delta_A) x_t + (\hat{B} + \Delta_B) u_t$

Solving an SDP relaxation of this robust control problem yields

$\frac{J(\hat{K}) - J_\star}{J_\star} \le C\, \Gamma_{cl} \left(\lambda_{\min}(\Lambda_c)^{-1/2} + \|K_\star\|_2\right) \sqrt{\frac{\sigma^2 (d+p)}{T}}$ w.h.p.

$\Lambda_c = A\Lambda_c A^* + BB^*$ (controllability Gramian)

$\Gamma_{cl} := \|(zI - A - BK_\star)^{-1}\|_{H_\infty}$ (closed loop gain)

This also tells you when your cost is finite!

[Dean, Mania, Matni, R., Tu 2017]
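To make the quantities in the bound concrete, the sketch below computes a nominal LQR gain by Riccati recursion and approximates the closed-loop gain $\Gamma_{cl}$ on a frequency grid. This is explicitly not the robust SDP synthesis described on the slide. It is the plain certainty-equivalent controller for a known $(A, B)$, and the function names, cost matrices, iteration count, and grid size are all assumptions for illustration.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Nominal infinite-horizon LQR gain K (with u_t = K x_t) via Riccati recursion."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)     # one step of the discrete Riccati map
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def closed_loop_gain(A, B, K, n_grid=720):
    """Grid approximation of ||(zI - A - BK)^{-1}||_{H_inf} over |z| = 1."""
    M = A + B @ K
    I = np.eye(M.shape[0])
    zs = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False))
    return max(np.linalg.norm(np.linalg.inv(z * I - M), 2) for z in zs)

# Illustrative system and costs (Q = R = I is an assumption).
A = np.array([[1.01, 0.01, 0.00],
              [0.01, 1.01, 0.01],
              [0.00, 0.01, 1.01]])
B = np.eye(3)
Q = np.eye(3)
R = np.eye(3)

K = lqr_gain(A, B, Q, R)
rho = max(abs(np.linalg.eigvals(A + B @ K)))   # closed-loop spectral radius
gamma = closed_loop_gain(A, B, K)              # approximate closed-loop gain
print(rho, gamma)
```

A closed-loop spectral radius below 1 means the cost is finite, and a larger `gamma` makes the right-hand side of the bound looser.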

Page 58: reinforcement learning through the optimization lens

Why robust?

Slightly unstable system: system ID tends to think some nodes are stable

$x_{t+1} = \begin{bmatrix} 1.01 & 0.01 & 0 \\ 0.01 & 1.01 & 0.01 \\ 0 & 0.01 & 1.01 \end{bmatrix} x_t + \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} u_t + e_t$
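A quick check of why this system is "slightly unstable": the state matrix is symmetric tridiagonal, so its eigenvalues are $1.01 + 0.02\cos(k\pi/4)$ for $k = 1, 2, 3$. The sketch below computes them numerically. Only the matrix comes from the slide, and the variable names are illustrative.

```python
import numpy as np

# State matrix from the example system.
A = np.array([[1.01, 0.01, 0.00],
              [0.01, 1.01, 0.01],
              [0.00, 0.01, 1.01]])

eigs = np.linalg.eigvalsh(A)      # A is symmetric, so eigvalsh applies
rho = float(np.max(np.abs(eigs))) # spectral radius
print(eigs, rho)
```

The largest eigenvalue is $1.01 + 0.01\sqrt{2}$, just above 1, so the open-loop state grows slowly. The smallest is $1.01 - 0.01\sqrt{2}$, just below 1, so small estimation errors can change which modes look stable.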

Page 59: reinforcement learning through the optimization lens

Least-squares estimate may yield unstable controller

Robust synthesis yields stable controller

Page 60: reinforcement learning through the optimization lens

Model-free performs worse than model-based

Page 61: reinforcement learning through the optimization lens

Why has no one done this before?

• Our guarantees for least-squares estimation required some heavy machinery.

• Indeed, the best bounds build on papers from the last few years.

• Our SDP relaxation uses brand new techniques in controller parameterization (Matni et al.).

• Naive robust synthesis is nonconvex and requires solving very large SDPs.

• The Singularity has arrived!

Page 62: reinforcement learning through the optimization lens

Even LQR is not simple!!!

minimize $J := \sum_{t=1}^{\infty} x_t^\top Q x_t + u_t^\top R u_t$

s.t. $x_{t+1} = A x_t + B u_t + e_t$ (Gaussian noise)

$\frac{J(\hat{K}) - J_\star}{J_\star} \le C\, \Gamma_{cl} \left(\lambda_{\min}(\Lambda_c)^{-1/2} + \|K_\star\|_2\right) \sqrt{\frac{\sigma^2 (d+p) \log(1/\delta)}{n}}$

where

$\Lambda_c = A\Lambda_c A^\top + BB^\top$ (controllability Gramian)

$\Gamma_{cl} := \|(zI - A - BK_\star)^{-1}\|_{H_\infty}$ (closed loop gain)

Hard to estimate; control insensitive to mismatch

Easy to estimate; control very sensitive to mismatch

50 papers on Cosma Shalizi’s blog say otherwise!

Need to fix learning theory for time series.

Page 63: reinforcement learning through the optimization lens

“Simplest” Example: LQR

How many samples are needed for near-optimal control?

• Lots of asymptotic work in the 80s (adaptive control)

• Fiechter, 1997: PAC, discounted costs, many assumptions on contractivity, bugs in proof.

• Abbasi-Yadkori & Szepesvári, 2011: Regret, exponential in dimension, no guarantee on parameter convergence, NP-hard subroutine.

• Ibrahimi et al., 2012: require sparseness in state transitions

• Ouyang et al., 2017: Bayesian setting, unrealistic assumptions, not implementable

• Abeille and Lazaric, 2015: Suboptimal bounds

$$
\begin{aligned}
\text{minimize} \quad & \lim_{T \to \infty} \mathbb{E}\left[ \frac{1}{T} \sum_{t=1}^{T} x_t^* Q x_t + u_t^* R u_t \right] \\
\text{s.t.} \quad & x_{t+1} = A x_t + B u_t + e_t
\end{aligned}
$$
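One way to make the sample-complexity question concrete is a small certainty-equivalence experiment: fit (A, B) by least squares from observed transitions, then design the LQR gain as if the estimates were exact. The sketch below is an illustration under stated assumptions and is not the method of any paper cited on this slide. The double-integrator system, the noise level, the sample count `N`, and the helper names `lqr_gain` and `estimate_dynamics` are all hypothetical, and the transitions are sampled independently rather than along one trajectory to keep the example short.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=1000):
    """Iterate the discrete-time Riccati recursion; returns K with u_t = -K x_t."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def estimate_dynamics(X, U, X_next):
    """Least-squares fit of x_{t+1} ~ A x_t + B u_t from sampled transitions."""
    Z = np.hstack([X, U])                          # each row is [x_t, u_t]
    Theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)
    d = X.shape[1]
    return Theta[:d].T, Theta[d:].T                # A_hat, B_hat

# Hypothetical system (double integrator) and cost matrices.
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

# Sample N independent transitions x_next = A x + B u + e.
N = 500
X = rng.normal(size=(N, 2))
U = rng.normal(size=(N, 1))
E = 0.1 * rng.normal(size=(N, 2))
X_next = X @ A.T + U @ B.T + E

A_hat, B_hat = estimate_dynamics(X, U, X_next)
K_star = lqr_gain(A, B, Q, R)          # gain designed with the true model
K_hat = lqr_gain(A_hat, B_hat, Q, R)   # gain designed with the estimated model

print("estimation error in A:", np.linalg.norm(A_hat - A))
print("gap between gains:", np.linalg.norm(K_hat - K_star))
```

Rerunning with different values of `N` gives a rough empirical feel for how the gap between the two gains shrinks as more samples are collected.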


The Linearization Principle

If a machine learning algorithm does crazy things when restricted to linear models, it’s going to do crazy things on complex nonlinear models too.

What happens when we return to nonlinear models?


  365    366    365    131
 3909   3651   3810   3668
 6722   4149   6620   4800
11389   5234   5867   5594
 5146   4607   4816   5007
11600   6440   6849   6482

Larger is better

Random search of linear policies outperforms Deep Reinforcement Learning
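To illustrate what random search over linear policies means, here is a minimal sketch of a basic two-point random search applied to a static linear policy u_t = -K x_t on a toy linear system. It is a simplified illustration and not the exact algorithm or benchmark setup behind the numbers above. The system, the horizon, the step size, the exploration radius, and the function names `rollout_cost` and `random_search` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical double-integrator system with quadratic cost.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
x0 = np.array([1.0, 0.0])

def rollout_cost(K, horizon=20):
    """Finite-horizon quadratic cost of the linear policy u_t = -K x_t."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def random_search(K, iters=300, n_dirs=8, step=0.05, radius=0.05, seed=0):
    """Perturb K along random directions and step toward lower cost."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        deltas = rng.normal(size=(n_dirs,) + K.shape)
        c_plus = np.array([rollout_cost(K + radius * d) for d in deltas])
        c_minus = np.array([rollout_cost(K - radius * d) for d in deltas])
        # Normalize by the spread of observed costs so the step stays bounded.
        scale = np.concatenate([c_plus, c_minus]).std() + 1e-8
        # Average of (cost difference) * direction over all sampled directions.
        direction = np.tensordot(c_plus - c_minus, deltas, axes=1) / n_dirs
        K = K - step * direction / scale
    return K

K0 = np.zeros((1, 2))
K = random_search(K0)
print("cost before search:", rollout_cost(K0))
print("cost after search: ", rollout_cost(K))
```

The only information the search uses is the total cost of each rollout, which is why the same loop can be pointed at a nonlinear simulator without modification.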


[Figure: average reward versus number of episodes on Ant-v1, HalfCheetah-v1, Hopper-v1, Humanoid-v1, Swimmer-v1, and Walker2d-v1. Legend bands: Ant-v1 and Humanoid-v1: 0 - 30, 30 - 70, 70 - 100; HalfCheetah-v1: 0 - 5, 5 - 20, 20 - 100; Hopper-v1: 0 - 20, 20 - 30, 30 - 100; Swimmer-v1: 0 - 10, 10 - 20, 20 - 100; Walker2d-v1: 0 - 80, 80 - 90, 90 - 100.]

Larger is better


Model Predictive Control

$$
\begin{aligned}
\text{minimize} \quad & \mathbb{E}_e\left[ \sum_{t=1}^{T} C_t(x_t, u_t) + C_f(x_{T+1}) \right] \\
\text{s.t.} \quad & x_{t+1} = f_t(x_t, u_t, e_t), \quad x_1 = x \\
& u_t = \pi_t(\tau_t)
\end{aligned}
$$

Optimal Policy:

$$
\pi_k(\tau_k) = \arg\min_u \; C_k(x_k, u) + \mathbb{E}_e\left[ V_{k+1}(f_k(x_k, u, e)) \right]
$$

MPC: use the policy at every time step

$$
\pi_1(x) = \arg\min_u \; C_k(x, u) + \mathbb{E}_e\left[ V_1(f_1(x, u, e)) \right]
$$

MPC ethos: plan on short time horizons, use feedback to correct modeling error and disturbance.

MPC trades improved computation for control cleverness, requiring significant planning for each action.
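The receding-horizon idea can be sketched for the linear-quadratic case, where the short-horizon plan has a closed form via a backward Riccati recursion. This is a minimal illustration under stated assumptions rather than a general MPC solver. The true system, the deliberately mismatched planning model, the horizon length, the noise level, and the function name `plan_first_action` are all hypothetical.

```python
import numpy as np

def plan_first_action(A, B, Q, R, Qf, x, horizon):
    """Solve the finite-horizon LQ plan backward in time; return the first input."""
    P = Qf
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    # After the last backward step, K is the gain for the first time step.
    return -K @ x

rng = np.random.default_rng(0)

# True system (hypothetical double integrator) with a small disturbance.
A_true = np.array([[1.0, 1.0], [0.0, 1.0]])
B_true = np.array([[0.0], [1.0]])

# The planner only has an imperfect model of the system.
A_model = np.array([[1.02, 0.98], [0.0, 1.01]])
B_model = np.array([[0.0], [0.95]])

Q, R, Qf = np.eye(2), np.eye(1), np.eye(2)
x0 = np.array([5.0, 0.0])
x = x0.copy()

for t in range(50):
    # Replan from the measured state, apply only the first planned input.
    u = plan_first_action(A_model, B_model, Q, R, Qf, x, horizon=10)
    e = 0.01 * rng.normal(size=2)
    x = A_true @ x + B_true @ u + e

print("initial state norm:", np.linalg.norm(x0))
print("final state norm:  ", np.linalg.norm(x))
```

Because the plan is recomputed from the measured state at every step, the model mismatch and the disturbance are corrected by feedback instead of accumulating. In this linear time-invariant setting the replanned gain is the same at every step and could be cached; the per-step replanning is kept to mirror the nonlinear case, where it does the real work.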


Model Predictive Control

Videos from Todorov Lab: https://homes.cs.washington.edu/~todorov/


Reinforcement Learning is the study of how to use past data to enhance the future manipulation of a dynamical system

Actionable Intelligence

Control Theory


Actionable Intelligence is the study of how to use past data to enhance the future manipulation of a dynamical system

As soon as a machine learning system is unleashed in feedback with humans, that system is an actionable intelligence system, not a machine learning system.


Actionable Intelligence trustable, scalable, predictable


Recommended Texts

• D. Bertsekas. Dynamic Programming and Optimal Control. 4th edition, volumes 1 (2017) and 2 (2012). Athena Scientific.

• D. Bertsekas and J. Tsitsiklis. Neuro-dynamic Programming. Athena Scientific, 1996.

• F. Borrelli, A. Bemporad, and M. Morari. Predictive Control for Linear and Hybrid Systems. Cambridge, 2017.


References from the Actionable Intelligence Lab

• argmin.net

• “On the Sample Complexity of the Linear Quadratic Regulator.” S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. arXiv:1710.01688

• “Non-asymptotic Analysis of Robust Control from Coarse-grained Identification.” S. Tu, R. Boczar, A. Packard, and B. Recht. arXiv:1707.04791

• “Least-squares Temporal Differencing for the Linear Quadratic Regulator.” S. Tu and B. Recht. In submission to ICML 2018. arXiv:1712.08642

• “Learning without Mixing.” H. Mania, B. Recht, M. Simchowitz, and S. Tu. In submission to COLT 2018. arXiv:1802.08334

• “Simple random search provides a competitive approach to reinforcement learning.” H. Mania, A. Guy, and B. Recht. arXiv:1803.07055

• “Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator.” S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu. arXiv:1805.09388

• “A short and biased tour of reinforcement learning.” B. Recht. Coming soon…

https://people.eecs.berkeley.edu/~brecht/publications.html