Rich Representations with Exposed Semantics for Deep Visual Reasoning
Carnegie Mellon University, Pittsburgh, United States
The objective of this MURI is to develop techniques that can explain complex images and videos in commonsense terms. The emphasis is on acquiring flexible visual representations that can be shared across tasks and interpreted by humans. Our approach to representations addresses the challenges of describing unfamiliar objects, scenes, and activities by exploiting shared properties and by including their complex interactions. Our reasoning tools emphasize efficient approaches for dealing with the complex structure of the world, focused reasoning that attends to the relevant parts of the visual input, and temporal reasoning to deal with events. We aim to develop approaches to visual reasoning that can incorporate the constraints of the specific task at hand and the need to present useful and relevant information to a human. In addition, we explore in detail the design of the datasets used for training visual reasoning elements and the limitations of these datasets, and we explore connections between the computer vision aspects of the work and human vision studies in cognitive neuroscience.