Control and Learning of Uncertain Dynamical Systems: Optimization, Sampling, and Regret
Technical Report, 13 Apr 2018, 13 Oct 2019
University of Washington, Seattle, United States
This report shows that first-order methods can provide an effective bridge between optimal control theory and sample-based reinforcement learning. The work focuses on the linear quadratic regulator (LQR) problem and Markov decision processes. Results include a proof that gradient descent, started from a stabilizing policy, converges to the globally optimal policy, and an algorithm that achieves nearly tight regret bounds for the control of a linear dynamical system with adversarial disturbances.
- Electrical and Electronic Equipment
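
The gradient descent result mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the report's algorithm: it assumes a scalar discrete-time LQR problem with dynamics x' = a*x + b*u, linear policy u = -k*x, and stage cost q*x^2 + r*u^2, and it uses the exact gradient of the cost rather than a sample-based estimate. The system values (a = 1.2, b = q = r = 1), the starting gain k0 = 0.5, the step size, and all function names are hypothetical choices made for illustration.

```python
def lqr_cost(k, a, b, q, r, sigma0=1.0):
    """Infinite-horizon cost of the policy u = -k*x, with E[x0^2] = sigma0."""
    c = a - b * k  # closed-loop multiplier: x' = c*x
    if abs(c) >= 1.0:
        raise ValueError("policy is not stabilizing")
    p = (q + r * k * k) / (1.0 - c * c)  # scalar Lyapunov equation solution
    return p * sigma0


def lqr_gradient(k, a, b, q, r, sigma0=1.0):
    """Exact derivative of lqr_cost with respect to the gain k."""
    c = a - b * k
    if abs(c) >= 1.0:
        raise ValueError("policy is not stabilizing")
    p = (q + r * k * k) / (1.0 - c * c)
    sigma = sigma0 / (1.0 - c * c)  # sum of E[x_t^2] over all t
    return 2.0 * ((r + b * b * p) * k - a * b * p) * sigma


def riccati_gain(a, b, q, r, iters=1000):
    """Reference optimal gain from fixed-point iteration of the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def gradient_descent(k0, a, b, q, r, step=0.01, iters=2000):
    """Plain gradient descent on the gain, started from a stabilizing k0."""
    k = k0
    for _ in range(iters):
        k -= step * lqr_gradient(k, a, b, q, r)
    return k


# Hypothetical example system; the open loop is unstable because |a| > 1.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
k0 = 0.5  # stabilizing, since |a - b*k0| = 0.7 < 1
k_gd = gradient_descent(k0, a, b, q, r)
k_star = riccati_gain(a, b, q, r)
print("gradient descent gain:", k_gd)
print("Riccati gain:         ", k_star)
```

In this sketch the cost is not convex in k, yet the iterates started from a stabilizing gain approach the Riccati solution. The fixed step size was chosen by hand for this particular system and would need to be retuned for other parameters.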