Topic 5.2.1: Action Co-Discovery as a Cross-Reconstruction Problem
Technical Report, 01 Aug 2015 to 30 Apr 2019
University of Michigan - Ann Arbor, Ann Arbor, United States
This project seeks a new method for automatically discovering actions that are common to a given set of videos. These common actions, called coactions, are represented either as a set of frames or, in more detail, as a set of space-time segmentations (Figure 1). The project seeks a formulation that requires no strong assumptions about the content or quality of the video signals and yet is computationally efficient.

The proposed formulation is based on the idea that each video's role in a given coaction is measured by how well that video can be used to reconstruct the other videos participating in the coaction. The methodology explicitly excludes the features of a given video from the basis used to represent that video, so that the basis is not overwhelmed by the background rather than by the action itself. For this reason, the investigators call it a cross-reconstruction problem. Nor does the formulation require a common, joint representation of the coaction over all videos. Such a joint model is the de facto approach for co-detection and co-segmentation methods, but in video in particular it is not clear that the necessary underlying action invariants are descriptive enough to specify one. The new formulation makes no such assumption. Ultimately, the main objective is to formulate, solve, and study the new cross-reconstruction problem.

Motivating CONOP (Concept of Operations)

Consider a forward operating base (FOB) in a highly dangerous location that sees dozens or hundreds of vehicles and people in its vicinity on any given day. The FOB is equipped with dozens of security cameras that are always on and acquiring video. FOB security team members are constantly on watch for suspicious behavior. Some such behaviors have been noticed, such as peculiar walking paths in the neighboring hills or vehicles making U-turns. However, there are hours upon hours of video, far too much to watch manually.
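The cross-reconstruction idea can be illustrated with a toy sketch. It assumes each video has already been reduced to a list of per-frame feature vectors (the feature extractor, the vector format, and every function name below are hypothetical), and it swaps in a plain least-squares projection: the frames of video j span a subspace, each other video i is projected onto it, and the fraction of i's energy captured is j's fit to i. A video's own frames never enter the basis used to reconstruct it, mirroring the exclusion described above. For simplicity the score averages over all other videos rather than over a selected coaction subset, and it works with frames rather than space-time segmentations; it is not the investigators' formulation or solver.

```python
"""Toy cross-reconstruction scoring (hypothetical sketch, not the report's method)."""
from math import sqrt
from typing import Dict, List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: Sequence[Vector], tol: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt; drops vectors that are numerically dependent."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = _dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sqrt(_dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def relative_residual(target: Sequence[Vector], basis: List[Vector]) -> float:
    """||X - PX||^2 / ||X||^2, where P projects onto span(basis)."""
    total = sum(_dot(x, x) for x in target)
    if total == 0.0:
        return 0.0
    # With an orthonormal basis, ||Px||^2 is the sum of squared coefficients.
    captured = sum(_dot(q, x) ** 2 for x in target for q in basis)
    return max(0.0, (total - captured) / total)


def cross_reconstruction_scores(videos: Dict[str, List[Vector]]) -> Dict[str, float]:
    """Score each video by how well it alone reconstructs every *other* video.

    The basis built from video j is only ever applied to videos i != j.
    Score = mean over other videos of (1 - relative residual); higher is better.
    """
    scores: Dict[str, float] = {}
    for j, frames in videos.items():
        basis = orthonormal_basis(frames)
        fits = [1.0 - relative_residual(videos[i], basis) for i in videos if i != j]
        scores[j] = sum(fits) / len(fits) if fits else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical 3-D frame features: "a" and "b" share a subspace, "c" does not.
    toy = {
        "a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "b": [[1.0, 0.0, 0.0], [1.1, -0.1, 0.0]],
        "c": [[0.0, 0.0, 1.0]],
    }
    scores = cross_reconstruction_scores(toy)
    print({k: round(v, 3) for k, v in scores.items()})  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```

If a video's own frames were included in its basis, every self-reconstruction error would be zero regardless of content, which is one way to see why the formulation excludes them.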