PRIS at 2012 Microblog Track
BEIJING UNIV OF POSTS AND TELECOMMUNICATIONS (CHINA)
Most hashtags are keyword-rich and directly indicate the topic of a tweet, but they contain no spaces between words. We therefore applied word segmentation to split each tag into space-separated words, using the forward maximum matching algorithm. The main difficulty was that no single dictionary was adequate: the common-words dictionary is incomplete, and the Oxford Dictionary does not distinguish plural forms. We therefore combined the common-words dictionary, the Oxford Dictionary, and the D.A.B. (Dictionary of American Biography). Because of abbreviations and unknown words, some segmentation errors remained. To limit the influence of these errors, we kept both the original tags and the segmented tags.
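The segmentation step can be illustrated with a minimal Python sketch of forward maximum matching. It is not the authors' implementation: the function names `forward_max_match` and `expand_tag`, the tiny example dictionary, and the fallback of emitting a single character when no dictionary word matches are all assumptions made for illustration.

```python
def forward_max_match(tag, dictionary, max_len=None):
    """Split a hashtag into words by greedy longest-prefix matching.

    `dictionary` is a set of lowercase words (hypothetical stand-in for the
    combined dictionaries described in the text).
    """
    text = tag.lstrip("#").lower()
    if max_len is None:
        # Longest dictionary entry bounds the candidate window.
        max_len = max((len(w) for w in dictionary), default=1)

    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Assumed fallback: a single character is always accepted,
            # so unknown words degrade into single letters.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def expand_tag(tag, dictionary):
    """Return both the original tag text and its segmented form."""
    original = tag.lstrip("#")
    segmented = " ".join(forward_max_match(tag, dictionary))
    return [original, segmented]


if __name__ == "__main__":
    words = {"world", "cup", "final", "wor"}
    print(expand_tag("#worldcupfinal", words))
    print(forward_max_match("#nbafinal", words))
```

In this sketch, an abbreviation that is missing from the dictionary (such as "nba" above) is broken into single letters, which is the kind of error that motivates keeping the original tag alongside the segmented one.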
- Information Science