Accession Number:



Domain-Specific Insight Graphs

Descriptive Note:

[Technical Report, Final Report]

Corporate Author:

University of Southern California

Report Date:


Pagination or Media Count:



Developing scalable, semi-automatic approaches to derive insights from a domain-specific Web corpus is a longstanding research problem in the knowledge discovery and Web communities. The problem is particularly challenging in illicit domains, such as human trafficking, where traditional assumptions concerning information representation are frequently violated. In the Domain-Specific Insight Graphs (DIG) project, we developed technology to build end-to-end investigative knowledge discovery and search systems, focused primarily on illicit Web domains. The technologies include components for information extraction, semantic modeling, and query execution, and were tested on a variety of real-world domains, including a human trafficking Web corpus containing over 100 million pages. The prototype includes a GUI that was used by US law enforcement agencies to combat illicit activity. The research results were widely disseminated in multiple journal and conference publications, and the software produced is publicly available on GitHub under the MIT license.

Subject Categories:

  • Statistics and Probability
  • Computer Programming and Software

Distribution Statement:

[A, Approved For Public Release]