Automated text categorization with collaboratively tagged data
Publisher
University of Ottawa (Canada)
Abstract
The recent popularity of collaborative tagging as a component of retrieval systems has led us to study such a system. Like text categorization, albeit in a less centralized fashion, collaborative tagging relies on humans to annotate documents with metadata descriptions, i.e., tags. For that reason, this thesis attempts to extend the tagging process with more consistent, non-human annotations in the form of automatic text categorization.
In applying automatic text categorization to collaboratively tagged data, we conducted two sets of experiments. The first compares two classification methods, Naive Bayes and Support Vector Machines (SVM), in a straightforward one-vs.-all classification setting. The results establish a baseline and allow us to make important observations, such as the benefit of using a maximum-margin classifier (SVM) when annotating concepts with skewed document distributions.
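As a rough illustration of the one-vs.-all setting, the sketch below turns a set of tagged documents into one binary labeling task per tag and reports how skewed each task is. The toy data and the helper names `one_vs_all_labels` and `positive_rate` are hypothetical and are not taken from the thesis; any binary classifier, such as Naive Bayes or an SVM, could then be trained on each task.

```python
def one_vs_all_labels(doc_tags, tag):
    """Binary labels for one tag: 1 if the document carries it, else 0."""
    return [1 if tag in tags else 0 for tags in doc_tags]


def positive_rate(labels):
    """Fraction of positive examples; a low value means a skewed task."""
    return sum(labels) / len(labels)


# Hypothetical toy collection: each document is represented by its tag set.
doc_tags = [
    {"python", "programming"},
    {"programming"},
    {"recipes"},
    {"programming", "java"},
]

# One binary task per tag, in a fixed (sorted) tag order.
all_tags = sorted(set().union(*doc_tags))
binary_tasks = {tag: one_vs_all_labels(doc_tags, tag) for tag in all_tags}

for tag, labels in binary_tasks.items():
    print(tag, labels, positive_rate(labels))
```

In this toy collection the rarely used tags yield tasks in which only a quarter of the documents are positive, which is the kind of skewed distribution the abstract refers to.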
The second experiment is motivated by our finding that the lack of structure in tagging limits our learning approach to the simple one-vs.-all setting. Inspired by the application of hierarchical categorization to web directories [15], we introduce a categorization approach that automatically builds a hierarchy from the tag space and incorporates it into the training and classification process. Unlike previous hierarchical categorization methods, which rely on human-generated hierarchies, our approach relies on an artificial hierarchy created from tag usage analysis. After applying the method to the dataset, we compared its results with the baseline results from the first experiment. Based on that comparison, we observed that our hierarchical approach improves not only the quality of predictions but also the efficiency (total training and classification time) of our automatic text categorization system.
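The abstract does not say how the hierarchy is derived from tag usage, so the following is only a minimal sketch under an assumed rule: a co-occurrence (subsumption) heuristic in which a more frequent tag becomes the parent of a tag if it appears on at least a given fraction of that tag's documents. The function name `build_tag_hierarchy`, the `threshold` value, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_tag_hierarchy(doc_tags, threshold=0.8):
    """Map each tag to a parent tag (or None) using an assumed subsumption rule."""
    count = Counter(t for tags in doc_tags for t in tags)
    pair = Counter()  # pair[(a, b)] = number of documents carrying both a and b
    for tags in doc_tags:
        for a, b in permutations(sorted(tags), 2):
            pair[(a, b)] += 1

    tags_sorted = sorted(count)  # fixed order keeps tie-breaking deterministic
    parent = {}
    for child in tags_sorted:
        best = None
        for cand in tags_sorted:
            # A parent must be strictly more frequent than its child.
            if cand == child or count[cand] <= count[child]:
                continue
            p_cand_given_child = pair[(cand, child)] / count[child]
            # Among qualifying candidates, keep the most specific (least frequent).
            if p_cand_given_child >= threshold and (
                best is None or count[cand] < count[best]
            ):
                best = cand
        parent[child] = best
    return parent


# Hypothetical toy collection of tagged documents.
docs = [
    {"programming", "python"},
    {"programming", "python", "django"},
    {"programming", "java"},
    {"programming"},
    {"cooking"},
]
hierarchy = build_tag_hierarchy(docs)
print(hierarchy)
```

With a hierarchy of this kind, a classifier for a child tag would only need to be trained and applied on documents that fall under its parent, which is one plausible way a hierarchy could reduce total training and classification time.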
Citation
Source: Masters Abstracts International, Volume: 48-01, page: 0465.
