Accession Number:

ADA581303

Title:

SAWUS: Siena's Automatic Wikipedia Update System

Descriptive Note:

Conference paper

Corporate Author:

SIENA COLLEGE LOUDONVILLE NY

Report Date:

2012-11-01

Pagination or Media Count:

6

Abstract:

The National Institute of Standards and Technology (NIST) has been running an annual Text Retrieval Competition and Conference (TREC) since 1992. This is a premier conference that offers researchers in the field of Computational Linguistics the opportunity to showcase their work and compare their results against those of other leading researchers. Our Siena research team participated in the TREC Knowledge Based Acquisition (KBA) Track, which was offered for the first time in 2012. The objective of this track is to drive research into automatic acquisition of knowledge, such as automatically updating Wikipedia by utilizing online news. Specifically, our team of researchers developed a system that filters a stream of content for information that should be included on a given Wikipedia page. It was not yet clear how traditional Information Retrieval (IR) techniques perform for this task; therefore, we began with a baseline test using current state-of-the-art IR techniques. We then went on to experiment with query expansion, building a module that utilized Wikipedia Infoboxes to add terms to our query. This module was incorporated with our IR component to create SAWUS. Four submissions were sent to NIST to undergo a formal evaluation.

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE