Satisficing Q-Learning: Efficient Learning in Problems With Dichotomous Attributes
Abstract:
In some environments, a learning agent must balance competing objectives. For example, a Q-learning agent may need to learn which choices expose it to risk and which choices lead to a goal. This paper presents a variant of Q-learning that learns a pair of utilities for worlds with dichotomous attributes, and shows that this algorithm properly balances the competing objectives and, as a result, efficiently identifies satisficing solutions. This occurs because exploration of the environment is restricted to those options that, according to current knowledge, are likely to avoid exposure to risk. We empirically validate the algorithm by (a) showing that it quickly converges to good policies in several simulated worlds of varying complexity and (b) applying it to learning a force-feedback profile for a gas pedal that helps drivers avoid risky situations.
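The abstract describes an agent that maintains a pair of utilities and restricts exploration to options currently believed to avoid risk. A minimal sketch of that idea in Python follows; the class and parameter names (`q_safe`, `q_goal`, `aspiration`) are illustrative assumptions, not identifiers from the paper, and the update rule is plain tabular Q-learning applied to each of the two utilities.

```python
import random
from collections import defaultdict


class SatisficingQLearner:
    """Sketch of a dual-utility Q-learner for dichotomous attributes:
    q_safe estimates risk avoidance, q_goal estimates goal progress.
    Action choice is restricted to actions judged safe enough so far."""

    def __init__(self, actions, alpha=0.1, gamma=0.95,
                 aspiration=0.5, epsilon=0.1):
        self.actions = actions
        self.alpha, self.gamma = alpha, gamma
        self.aspiration = aspiration    # safety aspiration level (assumed)
        self.epsilon = epsilon          # exploration rate within the safe set
        self.q_safe = defaultdict(float)  # (state, action) -> risk-avoidance utility
        self.q_goal = defaultdict(float)  # (state, action) -> goal utility

    def act(self, state):
        # Restrict exploration to actions believed to meet the aspiration;
        # fall back to all actions if nothing yet looks safe.
        safe = [a for a in self.actions
                if self.q_safe[(state, a)] >= self.aspiration]
        pool = safe or self.actions
        if random.random() < self.epsilon:
            return random.choice(pool)
        return max(pool, key=lambda a: self.q_goal[(state, a)])

    def update(self, state, action, safe_reward, goal_reward, next_state):
        # Standard one-step Q-update, applied to each utility separately.
        for q, r in ((self.q_safe, safe_reward), (self.q_goal, goal_reward)):
            best_next = max(q[(next_state, a)] for a in self.actions)
            q[(state, action)] += self.alpha * (
                r + self.gamma * best_next - q[(state, action)])
```

Here the satisficing behavior comes from the aspiration threshold: once enough options pass it, the agent stops comparing against unsafe alternatives and optimizes goal utility only within the safe set.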