Entity Resolution Workflow Installation Process and User Guide
Abstract:
Entity resolution, in the context of text processing and information extraction, is the process of uniquely disambiguating a specific person or object that appears in a text. For instance, if John Smith appears in a document, entity resolution seeks to identify which John Smith is meant from the available choices in a database. This report describes the setup and configuration of the U.S. Army Research Laboratory's (ARL) software implementation of an entity resolution algorithm called Relationship-based Data Cleaning (RelDC), which systematically exploits not only features but also relationships among entities for the purpose of disambiguation. The main concept is that RelDC views the database as a graph of entities that are linked to each other via relationships. It first uses a feature-based method to identify a set of candidate entities (choices) for a reference to be disambiguated. Graph-theoretic techniques are then used to discover and analyze relationships that exist between the entity containing the reference and the set of candidates. To demonstrate the RelDC entity resolution algorithm in an intuitive and seamless way, ARL developed an Entity Resolution Workflow (ERW).
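The two-stage idea described above (feature-based candidate selection followed by graph-based relationship analysis) can be illustrated with a minimal Python sketch. This is not ARL's implementation and not the RelDC connection-strength model itself: the function names, the toy data, and the scoring rule (summing 1/length over short simple paths) are all assumptions chosen only to make the concept concrete.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected entity-relationship graph from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def candidates_by_name(reference, names):
    """Feature-based step: entities whose name equals the reference string."""
    ref = reference.strip().lower()
    return sorted(e for e, name in names.items() if name.lower() == ref)


def connection_strength(graph, source, target, max_len=3):
    """Simplified stand-in score (an assumption, not RelDC's measure):
    sum of 1/length over simple paths with at most max_len edges."""
    if source == target:
        return 0.0
    total = 0.0
    stack = [(source, (source,))]
    while stack:
        node, path = stack.pop()
        if node == target:
            total += 1.0 / (len(path) - 1)
            continue
        if len(path) - 1 >= max_len:
            continue  # path already uses max_len edges; do not extend it
        for nxt in graph.get(node, ()):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + (nxt,)))
    return total


def resolve(reference, context_entity, names, graph):
    """Pick the candidate most strongly connected to the context entity."""
    cands = candidates_by_name(reference, names)
    scores = {c: connection_strength(graph, context_entity, c) for c in cands}
    best = max(scores, key=scores.get) if scores else None
    return best, scores


# Hypothetical toy database: two people named John Smith.
names = {"p1": "John Smith", "p2": "John Smith", "p3": "Mary Jones"}
graph = build_graph([
    ("doc1", "p3"),   # the document also mentions Mary Jones
    ("p3", "org1"),   # Mary Jones is affiliated with org1
    ("p1", "org1"),   # the first John Smith is affiliated with org1
    ("p2", "org2"),   # the second John Smith is affiliated with org2
])

best, scores = resolve("John Smith", "doc1", names, graph)
print(best, scores)
```

In this toy graph, the document reaches the first John Smith through a shared affiliation with Mary Jones, while no path connects it to the second, so the sketch selects `p1`. A faithful implementation would replace the scoring rule with the connection-strength measure defined by RelDC.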