Control And Learning Of Uncertain Dynamical Systems: Optimization, Sampling, And Regret

Corporate Author:

University of Washington, Seattle, United States

This report shows that first-order methods provide an effective bridge between optimal control theory and sample-based reinforcement learning. The work focuses on the linear quadratic regulator (LQR) problem and Markov decision processes. Key results include a proof that gradient descent, initialized at a stabilizing policy, converges to the globally optimal policy, and an algorithm achieving nearly tight regret bounds for the control of a linear dynamical system under adversarial disturbances.
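To illustrate the kind of result the abstract describes, the following is a minimal sketch (not the report's own algorithm) of policy gradient descent on a scalar LQR instance. All names (`a`, `b`, `q`, `r`, `k`) are illustrative choices: the system is x_{t+1} = a*x_t + b*u_t with static feedback u_t = -k*x_t, and for a stabilizing gain the infinite-horizon cost has a closed form, so gradient descent on k can be compared against the Riccati-optimal gain.

```python
# Illustrative sketch only: scalar LQR, x_{t+1} = a*x_t + b*u_t, u_t = -k*x_t,
# stage cost q*x^2 + r*u^2, initial state x0 = 1. For |a - b*k| < 1 the
# infinite-horizon cost is J(k) = (q + r*k^2) / (1 - (a - b*k)^2).
a, b, q, r = 0.9, 1.0, 1.0, 1.0

def cost(k):
    closed = a - b * k
    assert abs(closed) < 1.0, "policy must be stabilizing"
    return (q + r * k ** 2) / (1 - closed ** 2)

def grad(k, eps=1e-6):
    # central finite difference; an analytic gradient works equally well
    return (cost(k + eps) - cost(k - eps)) / (2 * eps)

# gradient descent from a stabilizing (but suboptimal) policy
k = 0.5
for _ in range(2000):
    k -= 0.01 * grad(k)

# optimal gain via the scalar discrete-time Riccati fixed point
p = q
for _ in range(1000):
    p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
k_star = a * b * p / (r + b * b * p)

print(k, k_star)  # gradient descent recovers the Riccati-optimal gain
```

The rollout-free closed-form cost is what makes this toy case easy; the report's setting replaces it with sampled trajectories, but the convergence phenomenon (descent from any stabilizing policy reaches the global optimum despite nonconvexity) is the same.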

Descriptive Note:

Technical Report, 13 Apr 2018 to 13 Oct 2019

Supplementary Note:

This report is the result of contracted fundamental research, which is deemed exempt from Public Affairs Office security and policy review in accordance with Deputy Assistant Secretary of the Air Force (Science, Technology, Engineering) (SAF/AQR) memorandum dated 10 Dec 08 and Air Force Research Laboratory Executive Director (AFRL/CA) policy clarification memorandum dated 16 Jan 09.



Distribution Statement:

Approved For Public Release
