Naval Research Lab, Washington, DC, United States
Research into person attribute recognition has focused on describing a person in terms of their appearance. Typically, this covers a wide range of traits, including age, gender, clothing, and footwear. Although such descriptions could be used in a wide variety of scenarios, they are generally applied to video surveillance, where attribute recognition is hampered by low resolution and by other issues such as variable pose, occlusion, and shadow. Recent approaches have used deep convolutional neural networks (CNNs) to improve the accuracy of person attribute recognition. However, many of these networks are relatively shallow, and it is unclear to what extent they use contextual cues to improve classification accuracy. This paper builds upon prior research by proposing a modified ResNet architecture with calibrations that permit us to train networks deeper than those in previously published approaches. Interpretation suggests that this deeper architecture allows the network to take more contextual information into consideration, which helps to improve classification accuracy and generalizability. We present experimental analysis and results for whole-body attributes using the PA-100K and PETA datasets and for facial attributes using the CelebA dataset.
Journal Article - Open Access
2019 14th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2019)