# Accession Number:

## ADA280844

# Title:

## Reinforcement Learning With High-Dimensional, Continuous Actions

# Descriptive Note:

## Final technical rept.

# Corporate Author:

## WRIGHT LAB WRIGHT-PATTERSON AFB OH

# Personal Author(s):

# Report Date:

## 1993-11-04

# Pagination or Media Count:

## 20

# Abstract:

Many reinforcement learning systems, such as Q-learning (Watkins, 1989) or advantage updating (Baird, 1993), require that a function f(x,u) be learned and that the value of argmax_u f(x,u) be calculated quickly for any given x. The function f could be learned by a function approximation system such as a multilayer perceptron, but the maximum of f for a given x cannot be found analytically and is difficult to approximate numerically for high-dimensional u vectors. A new method is proposed, wire fitting, in which a function approximation system is used to learn a set of functions called control wires, and the function f is found by fitting a surface to the control wires. Wire fitting has the following four properties: (1) any continuous function f can be represented to any desired accuracy given sufficient parameters; (2) the function f(x,u) can be evaluated quickly; (3) argmax_u f(x,u) can be found exactly in constant time after evaluating f(x,u); (4) wire fitting can incorporate any general function approximation system. These four properties are discussed, and it is shown how wire fitting can be combined with a memory-based learning system and Q-learning to control an inverted-pendulum system.
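The interpolation step the abstract describes can be illustrated with a short sketch. This is a minimal, hypothetical rendering in NumPy, assuming a common wire-fitting form (inverse-squared-distance weighting with a smoothing term that pins the surface's maximum to the best wire); the function names, the smoothing constant `c`, and the choice of weighting are illustrative, not the report's exact formulation.

```python
import numpy as np

def wire_fit(u, wires_u, wires_q, c=1.0, eps=1e-6):
    """Interpolate f(x, u) from a set of control wires for one state x.

    wires_u : (n, d) array of action points u_i(x) output by the approximator
    wires_q : (n,)   array of values q_i(x), one per wire
    The term c * (q_max - q_i) in each weight's denominator ensures the
    interpolated surface peaks exactly at the highest-valued wire.
    """
    q_max = wires_q.max()
    dist2 = np.sum((wires_u - u) ** 2, axis=1)        # squared distance to each wire
    w = 1.0 / (dist2 + c * (q_max - wires_q) + eps)   # inverse-distance weights
    return np.sum(w * wires_q) / np.sum(w)

def argmax_f(wires_u, wires_q):
    """argmax_u f(x, u) is just the action of the best wire.

    This is the constant-time property (3): no numerical search over u
    is needed once the wires for x have been evaluated.
    """
    i = int(np.argmax(wires_q))
    return wires_u[i], wires_q[i]
```

For example, with three wires at actions 0, 1, 2 valued 0.5, 2.0, 1.0, `argmax_f` returns the action of the middle wire, and evaluating `wire_fit` at that action recovers (approximately) its value 2.0, illustrating that the fitted surface's maximum coincides with the best wire.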

# Descriptors:

# Subject Categories:

- Psychology
- Computer Programming and Software