Relation Extraction with Weak Supervision and Distributional Semantics
New York University New York United States
Pagination or Media Count:
Relation Extraction aims at detecting and categorizing semantic relations between pairs of entities in unstructured text. It benefits an enormous number of applications such as Web search and Question Answering. Traditional approaches for relation extraction either rely on learning from a large number of accurate human-labeled examples or pattern matching with hand-crafted rules. These resources are very laborious to obtain and can only be applied to a narrow set of target types of interest. This dissertation focuses on learning relations with little or no human supervision. First, we examine the approach that treats relation extraction as a supervised learning problem. We develop an algorithm that is able to train a model with approximately 13 of the human-annotation cost and that matches the performance of models trained with high-quality annotation. Second, we investigate distant supervision, a weakly supervised algorithm that automatically generates its own labeled training data. We develop a latent Bayesian framework for this purpose. By using a model which provides a better approximation of the weak source of supervision, it outperforms the state-of-the-art methods. Finally, we investigate the possibility of building all relational tables beforehand with an unsupervised relation extraction algorithm. We develop an effective yet efficient algorithm that combines the power of various semantic resources that are automatically mined from a corpus based on distributional semantics. The algorithm is able to extract a very large set of relations from the web at high precision.
- Information Science