Large-Scale Paraphrasing for Natural Language Understanding

Callison-Burch, Chris; Van Durme, Benjamin

Active / Technical Report | Accession Number: AD1050977

Abstract:

In this project, we researched and developed technologies to automatically extract large volumes of paraphrases to aid natural language understanding (NLU) tasks. We developed three core algorithms to (1) generate extremely large paraphrase databases, (2) adapt paraphrase databases to new domains, and (3) augment paraphrase rules with fine-grained semantic entailment relations. Our work introduced the Paraphrase Database (PPDB), the largest paraphrase resource developed to date, which contains over 100 million English paraphrases. We also generated paraphrase databases for 23 foreign languages.

Author(s):

Callison-Burch, Chris; Van Durme, Benjamin

Author Organization(s):

Johns Hopkins University, Baltimore, United States

Descriptive Note:

Technical Report

Pagination:

71 pages

Security Markings

Distribution:

Approved For Public Release

Distribution Statement:

Approved For Public Release;

RECORD

Collection: TR

Identifying Numbers

Report Number(s):

AFRL/RI-AFRL-RI-RS-TR-2018-098

Task Number(s):

12

Project Number(s):

DEFT

Monitor Series:

AFRL-RI-RS-TR-2018-098

Subject Terms

Joint Capability Areas:

JCA_5_Command and Control; JCA_5.2_Understand; JCA_5.2.3_Share Knowledge and Situational Awareness; JCA_5.6.1_Assess Compliance with Guidance; JCA_5.6_Monitor; JCA_5.5.2_Task; JCA_5.5_Direct; JCA_1.2.1_Training; JCA_5.3_Planning; JCA_1.2.7_Experimentation; JCA_5.2.2_Develop Knowledge and Situational Awareness; JCA_1.3.2_Personnel Management; JCA_1.2.3_Educating; JCA_1.2.5_Lessons Learned; JCA_1.2.6_Concepts; JCA_5.4_Decide; JCA_5.4.5_Intuit

Modernization Areas:

AI and Machine Learning

Communities of Interest:

Energy and Power Technologies

Descriptor(s):

natural language computing, computational linguistics, natural language understanding, supervised machine learning, ontologies, artificial neural networks, artificial intelligence software, automated text summarization, algorithms, data set, machine translation, graphs, clustering

Field(s)/Group(s):

Linguistics, Cybernetics

Keyword(s):

Paraphrase, knowledge base, FrameNet, semantic entailment, graph clustering

Report Date:

2018 Apr 01

Creation Date:

2018 May 01