Towards Locating and Exploring Hard-to-Find Information on the Web
Technical Report, 01 Sep 2014, 31 Mar 2018
New York University, New York, United States
This work developed new methods and tools that empower subject matter experts to discover and track information on the Web that is relevant to a given task or domain. Our approach consists of two main components: (1) domain discovery and (2) crawling and information gathering. For each component, we designed new methods and developed open-source tools that implement them. Notably, we designed a new framework that facilitates domain discovery, organization, and presentation. We also developed a general and extensible crawling infrastructure that substantially extends the open-source ACHE focused crawler to support complex crawling tasks and multiple crawling strategies for discovering new content in a timely manner.
- Information Science