Bottle-Neck Feature Extraction Structures for Multilingual Training and Porting (Pub Version, Open Access)
Journal Article - Open Access
Brno University of Technology, Brno, Czech Republic
Stacked-Bottle-Neck (SBN) feature extraction is a crucial part of modern automatic speech recognition (ASR) systems. The SBN network traditionally contains a hidden layer between the bottle-neck (BN) layer and the output layer. Recently, we have observed that an SBN architecture without this hidden layer (i.e., a direct connection from the BN layer to the output layer) performs better for a single language but fails in scenarios where a network pre-trained in a multilingual fashion is ported to a target language. In this paper, we describe two strategies that allow the direct-connection SBN network to benefit from pre-training with a multilingual net: (1) pre-training the multilingual net with the hidden layer, which is discarded before porting to the target language, and (2) using only the direct-connection SBN with triphone targets both in multilingual pre-training and in porting to the target language. The results are reported on IARPA-BABEL limited language pack (LLP) data.
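The first strategy can be illustrated with a minimal sketch, which is not the authors' implementation. It models a single bottle-neck network as an ordered list of named weight matrices, builds a multilingual net with a hidden layer after the BN layer, and then ports it by keeping the layers up to the BN layer and attaching a freshly initialized target-language output layer directly to it. All layer names, layer sizes, and the helper functions `build_bn_net` and `port_to_target` are hypothetical, and the stacking of two such networks in a full SBN system is omitted.

```python
import random


def init_weights(n_in, n_out, rng):
    """Return an n_in x n_out matrix of small random weights (nested lists)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_out)] for _ in range(n_in)]


def build_bn_net(n_in, n_hid, n_bn, n_out, hidden_after_bn, seed=0):
    """Build a bottle-neck net as an ordered list of (name, weights) pairs.

    hidden_after_bn=True gives the traditional layout (BN -> hidden -> output);
    hidden_after_bn=False gives the direct BN -> output connection.
    """
    rng = random.Random(seed)
    dims = [("hidden1", n_in, n_hid), ("bn", n_hid, n_bn)]
    if hidden_after_bn:
        dims += [("post_bn_hidden", n_bn, n_hid), ("output", n_hid, n_out)]
    else:
        dims += [("output", n_bn, n_out)]
    return [(name, init_weights(i, o, rng)) for name, i, o in dims]


def port_to_target(net, n_target_out, seed=1):
    """Keep layers up to and including the BN layer, drop everything after it,
    and attach a new randomly initialized output layer for the target language."""
    rng = random.Random(seed)
    kept = []
    for name, weights in net:
        kept.append((name, [row[:] for row in weights]))  # copy pre-trained weights
        if name == "bn":
            break
    n_bn = len(kept[-1][1][0])  # width of the BN layer
    kept.append(("output", init_weights(n_bn, n_target_out, rng)))
    return kept


# Hypothetical sizes, chosen only to keep the example small.
multilingual = build_bn_net(n_in=40, n_hid=64, n_bn=16, n_out=120,
                            hidden_after_bn=True)
target = port_to_target(multilingual, n_target_out=50)
layer_names = [name for name, _ in target]
```

In this sketch the second strategy would correspond to calling `build_bn_net` with `hidden_after_bn=False` and an output size equal to the number of triphone targets, for both the multilingual and the target-language network; the actual training procedure is not shown.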
- Voice Communications