Enhanced Annotation for Semantic Segmentation in Unstructured Video Sequences for Robotic Navigation
Abstract:
Visual perception methods identify landmarks and terrain types that can improve a robot's intelligence, allowing it to plan efficient paths that avoid obstacles and rough terrain. Semantic segmentation is a perception task that assigns a label to every pixel of an image; the models that perform it must be rigorously trained on large, accurately annotated datasets. Because annotation relies on human effort, some images inevitably contain mislabeled or unlabeled pixels, which can distort learning and degrade a robot's visual perception. To detect and correct these errors, we propose automated relabeling algorithms. Because object locations change only minutely between consecutive frames of a video sequence, we compare the pixels at corresponding locations in adjacent frames: if their label values match, we infer that label for the unlabeled pixel at the same location in the frame of interest. To collect more evidence, we extend this approach to include peripheral pixels within a radius threshold in the neighboring frames. These pixel-wise labeling solutions, together with analyses of the resulting annotated images, will enable faster annotation and error correction by eliminating human labeling effort. We present initial results of our automatic annotation inference and discuss the implications for machine learning models that provide perception information to autonomous robots.
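To make the inference rule concrete, the sketch below shows one minimal way the pixel-wise relabeling described above could be implemented; it is an illustrative assumption, not the exact algorithm evaluated in this work. It assumes NumPy label maps for the previous, current, and next frames, a hypothetical UNLABELED sentinel value, and a hypothetical relabel_from_neighbors function that fills an unlabeled pixel only when the candidate labels drawn from the neighboring frames (optionally including peripheral pixels within a given radius) agree.

```python
import numpy as np

UNLABELED = 255  # hypothetical sentinel for "no label"; dataset-specific in practice


def relabel_from_neighbors(label_prev, label_curr, label_next, radius=0):
    """Fill unlabeled pixels in label_curr using annotations from the adjacent
    frames. A pixel is relabeled only when every candidate label drawn from the
    previous and next frames agrees. With radius > 0, peripheral pixels within
    that radius in the neighboring frames are also consulted."""
    out = label_curr.copy()
    h, w = label_curr.shape

    for y, x in np.argwhere(label_curr == UNLABELED):
        # Window of corresponding (and, optionally, peripheral) pixels
        # in the two neighboring frames.
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        candidates = np.concatenate([
            label_prev[y0:y1, x0:x1].ravel(),
            label_next[y0:y1, x0:x1].ravel(),
        ])
        candidates = candidates[candidates != UNLABELED]

        # Relabel only when the neighboring evidence is unanimous.
        values = np.unique(candidates)
        if values.size == 1:
            out[y, x] = values[0]
    return out
```

In this sketch, requiring unanimity among the candidate labels is a conservative design choice: it favors precision over recall, leaving genuinely ambiguous pixels unlabeled rather than risking the introduction of new annotation errors.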