Accession Number:

ADA581499

Title:

PRIS at 2012 Microblog Track

Descriptive Note:

Conference paper

Corporate Author:

BEIJING UNIV OF POSTS AND TELECOMMUNICATIONS (CHINA)

Report Date:

2012-11-01

Pagination or Media Count:

4

Abstract:

Most tags are keyword-rich and directly indicate the topic of a tweet, but they contain no spaces between words. We therefore used word segmentation to split each tag into space-separated words, this time using the forward maximum matching algorithm. The difficulty is that no single dictionary is adequate: the common-words dictionary is incomplete, and the Oxford Dictionary does not distinguish plural forms. We therefore combined the common-words dictionary, the Oxford Dictionary, and the Dictionary of American Biography (D.A.B.). Because of abbreviations and unknown words, some segmentation mistakes remain. To limit the undesirable influence of these mistakes, we retained both the original tags and the segmented tags.
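
The segmentation step described above can be illustrated with a minimal sketch of forward maximum matching. This is not the authors' implementation: the function names, the tiny DEMO_DICT word list, and the choice to emit unknown characters as single-character tokens are all assumptions made for illustration.

```python
# Hypothetical stand-in for the combined dictionary described in the abstract.
DEMO_DICT = {"world", "cup", "final"}


def forward_max_match(tag, dictionary):
    """Split a tag into words by greedily taking the longest dictionary match."""
    text = tag.lstrip("#").lower()
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max(1, max((len(w) for w in dictionary), default=1))
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No dictionary word starts here (assumed fallback):
            # emit a single character and move on.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


def expand_tag(tag, dictionary):
    """Keep the original tag alongside its segmented form."""
    return [tag, " ".join(forward_max_match(tag, dictionary))]
```

With this sketch, expand_tag("#WorldCup", DEMO_DICT) keeps "#WorldCup" and adds "world cup". An abbreviation that is absent from the dictionary, such as "nba", falls apart into single characters, which is the kind of mistake that motivates retaining the original tag as well.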

Subject Categories:

  • Information Science

Distribution Statement:

APPROVED FOR PUBLIC RELEASE