Multi-Frame Convolutional Neural Networks for Object Detection in Temporal Data
Technical Report, 06 Jul 2015, 31 Mar 2017
Naval Postgraduate School Monterey United States
Given the problem of detecting objects in video, existing neural-network solutions rely on a post-processing step to combine information across frames and strengthen conclusions. This technique has been successful for videos with simple, dominant objects, but it cannot detect objects if a single frame does not contain enough information to distinguish the object from its background. This problem is especially relevant in the maritime environment, where a whitecap and a human survivor may look identical except for their movement through the scene. To evaluate a neural network's ability to combine information across multiple frames, we developed two versions of a convolutional neural network: one version was given multiple frames as input, while the other was provided only a single frame. We measured the performance of both versions on the benchmark 3DPeS Dataset and observed a significant increase in both recall and precision when the network was given 10 frames instead of just one. We also developed our own noisy dataset consisting of small birds flying across the Monterey Bay. This dataset contained many instances where, in a single frame, the objects to be detected were indistinguishable from the surrounding waves and debris. For this dataset, multiple frames were essential for reliable detections. We also observed a greater improvement in the false negative rate than in the false positive rate on this noisier dataset, suggesting that the additional frames were especially useful for improving the detection of hard-to-detect objects.
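The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the frames were combined at the network input, so the approach below (collecting the most recent frames into one stack, one frame per input channel, and padding early positions by repeating the first frame) is an assumption made only for illustration. The names `stack_frames`, `precision_recall`, and `window` are hypothetical and do not come from the report.

```python
from typing import List, Sequence, Tuple

# A grayscale frame represented as rows of pixel intensities (assumed layout).
Frame = List[List[float]]


def stack_frames(frames: Sequence[Frame], index: int, window: int) -> List[Frame]:
    """Return the `window` frames ending at `index`, oldest first.

    With window=1 this is the single-frame input; with window=10 it is a
    ten-frame input. Positions before the start of the video are filled by
    repeating the first frame (an assumed padding policy).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not 0 <= index < len(frames):
        raise IndexError("index is outside the video")
    return [frames[max(i, 0)] for i in range(index - window + 1, index + 1)]


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Compute precision and recall from detection counts.

    precision = tp / (tp + fp), recall = tp / (tp + fn).
    A zero denominator yields 0.0 by convention here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Under these assumptions, the two network versions differ only in the `window` passed when building the input. Fewer missed detections (false negatives) raise recall, while fewer false alarms (false positives) raise precision, which is why the abstract reports both measures.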
- Computer Systems