Data, Algorithms, and Framework for Automated Analytics of Surveillance Camera Networks
Kitware Inc., Clifton Park, United States
Recent advances in areas such as video classification, captioning, and activity detection have significantly expanded the scope of automated analytics that can be performed on videos. These advances are supported by large datasets designed to facilitate training algorithms that can leverage gigabytes and terabytes of data. However, almost all of these datasets either have a limited number of hours and annotations or rely on web-based services such as Flickr or YouTube to obtain more data, which results in data biased towards consumer expectations in terms of distance to the object of interest, object resolution, ratio of interesting to uninteresting video, and so forth. Such videos differ from visual surveillance and public safety data, in which activities of interest are rare, may not be centered in the field of view, and may occur across multiple video streams. These differences limit how well models trained on consumer datasets transfer to public safety data. In this paper we address these challenges by presenting new datasets developed for the IARPA Deep Intermodal Video Analytics (DIVA) program. The first dataset, DIVA-V1, extends the annotations for existing videos collected for the VIRAT Video Data project. The second dataset, DIVA-M1, was designed expressly for the DIVA program and comprises approximately 9300 hours of video, collected over approximately two weeks using a 38-camera network to image over 100 actors executing scripted and unscripted activities. Annotation of the DIVA-M1 data is ongoing. We additionally discuss results of baseline activity and object detection algorithms on the DIVA-V1 data. Portions of this data are available for public research via NIST's ActEV (Activities in Extended Video) challenge.