Reinforcement Methods for Autonomous Online Learning of Optimal Robot Behaviors

Frank L. Lewis, The University of Texas at Arlington

Abstract:

Reinforcement learning (RL) is a method of learning better control actions by observing the responses to our current actions. RL is based on the way living organisms learn in response to their environment. Ivan Pavlov used precepts of RL in training his dogs in the 19th century. In RL, a desired performance measure is specified, and techniques are given for updating control actions so as to improve that prescribed performance measure. Performance measures can capture minimum energy, minimum-time motion, minimum fuel, maximum cost benefit, and so on.

In this talk we present methods for learning optimal robot behaviors online using reinforcement methods. Novel methods of using RL for updating the control inputs in dynamical system models are presented. These are online learning methods based on RL actor-critic structures that update control actions so as to learn optimal motion solutions in real time using data measured along the system trajectories. Thus, perception is used to learn skills autonomously during run-time operation.

Modern robotic systems have complex nonlinear dynamics that may not be fully known. Moreover, performance objectives may change with time as the robot progresses through several tasks or as environments change. The methods presented in this talk learn optimal feedback control solutions online, forward in time, for systems whose dynamical description is only partially known. The algorithms given allow performance metrics to vary with time. These learning methods are based on reinforcement learning techniques that capture multi-timescale cognitive phenomena in the human brain.

An application of Q-learning to the learning of optimal motions by a humanoid robot is given.
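
For readers unfamiliar with Q-learning, the following is a minimal tabular sketch of the standard update rule, not the method used in the talk. The corridor task, the function name q_learning, the reward of -1 per step, and all parameter values are assumptions chosen only for illustration; a humanoid robot would need a far richer state and action description.

```python
import random


def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in cell 0 and must reach the rightmost cell.
    Every step that does not reach the goal costs -1 (an assumed
    performance measure, loosely analogous to minimum-time motion).
    """
    rng = random.Random(seed)
    moves = (-1, +1)                 # action 0: step left, action 1: step right
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (ties go to "right").
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1

            # Deterministic corridor dynamics, clipped at the walls.
            s_next = min(max(s + moves[a], 0), goal)
            r = 0.0 if s_next == goal else -1.0

            # Q-learning update: move Q(s, a) toward the one-step target.
            future = 0.0 if s_next == goal else gamma * max(Q[s_next])
            Q[s][a] += alpha * (r + future - Q[s][a])
            s = s_next
    return Q


Q = q_learning()
# Greedy policy for every non-goal cell (0 = left, 1 = right).
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]
```

The update uses only transitions measured along the trajectory, so no model of the dynamics is needed, which is the property the abstract emphasizes for systems that are only partially known.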
