Learning to Extract Gene-Protein Names from Weakly-Labeled Text
Abstract:
Training a named entity recognizer NER has always been a difficult task due to the effort required to generate a significant amount of annotated training data. In this paper, we reduce or eliminate the effort required to create training data by automatically converting other sources of data into annotated training data. The performance of this approach is tested on a gene-protein name extractor by using the mouse and fly data obtained from the BioCreAtIvE challenge. Results show that our methods are effective and that our trained NER system outperforms all of our baseline results.
Security Markings
DOCUMENT & CONTEXTUAL SUMMARY
Distribution:
Approved For Public Release
Distribution Statement:
Approved For Public Release; Distribution Is Unlimited.
RECORD
Collection: TR