Accession Number:
ADA581499
Title:
PRIS at 2012 Microblog Track
Descriptive Note:
Conference paper
Corporate Author:
BEIJING UNIV OF POSTS AND TELECOMMUNICATIONS (CHINA)
Report Date:
2012-11-01
Pagination or Media Count:
4.0
Abstract:
Most tags are keyword rich and indicate the topic of a tweet directly, but the words within a tag are not separated by spaces, so word segmentation was used to split each tag into space-separated words. This time we used the former max matching algorithm. The problem is that no single dictionary is appropriate: the common words dictionary is incomplete, and the Oxford Dictionary does not distinguish plural forms. We therefore combined the common words dictionary, the Oxford Dictionary, and the D.A.B. (Dictionary of American Biography). Because of abbreviations and unknown words, some segmentation mistakes remain. To avoid undesirable influence from these mistakes, we retained both the original tags and the separated tags.
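The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes that "max matching" means greedy forward maximum matching against a word set; the function names, the `max_word_len` limit, and the tiny dictionary in the usage note are hypothetical and do not come from the paper.

```python
def segment_tag(tag, dictionary, max_word_len=20):
    """Split a hashtag into words by greedy forward maximum matching.

    `dictionary` is any set of lowercase words (an assumption; the paper
    combines several dictionaries but does not publish the merged list).
    """
    text = tag.lstrip("#").lower()
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(text), i + max_word_len), i, -1):
            if text[i:j] in dictionary:
                break
        else:
            # No dictionary word starts here (abbreviation or unknown word):
            # emit a single character and move on.
            j = i + 1
        words.append(text[i:j])
        i = j
    return words


def index_terms(tag, dictionary):
    """Keep both the original tag and its separated form, as the abstract describes."""
    original = tag.lstrip("#").lower()
    separated = " ".join(segment_tag(tag, dictionary))
    return original, separated
```

With a hypothetical dictionary `{"world", "cup", "final"}`, `index_terms("#WorldCupFinal", ...)` yields the original tag `worldcupfinal` together with the separated form `world cup final`. Greedy matching can still split incorrectly when a longer dictionary word overlaps the intended boundary, which is one reason for keeping the original tag alongside the separated one.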
Distribution Statement:
APPROVED FOR PUBLIC RELEASE