Representing text documents in training document spaces: A novel model for document representation

Title: Representing text documents in training document spaces: A novel model for document representation
Publication Type: Journal Article
Year of Publication: 2013
Authors: Mountassir, A, Benbrahim, H, Berrada, I
Journal: Journal of Theoretical and Applied Information Technology

In this paper, we propose a novel model for Document Representation that addresses the high dimensionality and vector sparseness commonly faced in Text Classification tasks. In a first stage, the proposed model represents text documents in the space of training documents. The generated vectors are then projected into a new space in which the number of dimensions corresponds to the number of categories. To evaluate the effectiveness of our model, we focus on a binary classification problem. We conduct our experiments on Arabic and English Opinion Mining data sets. As classifiers, we use Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN), which are known for their effectiveness in classical Text Classification tasks. We compare the performance of our model with that of the classical Vector Space Model (VSM) on three evaluation criteria: the dimensionality of the generated vectors, the time (learning and testing) taken by the classifiers, and the classification results in terms of accuracy. Our experiments show that the effectiveness of our model, in comparison with the classical VSM, depends on the classifier used. With k-NN, the results obtained with our model are better than or equal to those obtained with the classical VSM. With SVM, the results obtained with our model are, in general, slightly lower than those obtained with VSM. However, the gains in time and dimensionality are promising, since both are dramatically reduced by the application of our model. © 2005 - 2013 JATIT & LLS. All rights reserved.
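The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not state how documents are compared with training documents or how the projection onto categories is computed, so the choices below are assumptions made for illustration only: raw term-frequency vectors, cosine similarity for the first stage, and a per-category mean of similarities for the second stage. All function names and the toy documents are hypothetical and do not come from the paper.

```python
import numpy as np

def build_vocab(docs):
    # Map each distinct token of the training documents to a column index.
    vocab = {}
    for d in docs:
        for tok in d.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def tf_vectors(docs, vocab):
    # Classical VSM: one term-frequency row per document (out-of-vocabulary tokens are ignored).
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for tok in d.lower().split():
            j = vocab.get(tok)
            if j is not None:
                X[i, j] += 1.0
    return X

def l2_normalize(X):
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    return X / norms

def to_training_space(X, X_train):
    # Stage 1 (assumed form): one dimension per training document,
    # valued by cosine similarity to that training document.
    return l2_normalize(X) @ l2_normalize(X_train).T

def to_category_space(S, train_labels, categories):
    # Stage 2 (assumed form): one dimension per category,
    # valued by the mean similarity to the training documents of that category.
    labels = np.asarray(train_labels)
    return np.column_stack([S[:, labels == c].mean(axis=1) for c in categories])

# Toy binary opinion data (hypothetical).
train_docs = [
    "great film loved it",
    "wonderful acting great story",
    "boring plot terrible acting",
    "terrible film hated it",
]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["loved the wonderful story", "boring and terrible"]
categories = ["pos", "neg"]

vocab = build_vocab(train_docs)
X_train = tf_vectors(train_docs, vocab)
X_test = tf_vectors(test_docs, vocab)

S_test = to_training_space(X_test, X_train)                   # shape: (n_test, n_train)
C_test = to_category_space(S_test, train_labels, categories)  # shape: (n_test, n_categories)

print(X_test.shape, S_test.shape, C_test.shape)
```

In this sketch the vectors shrink from one dimension per vocabulary term to one per training document, and then to one per category, which is the kind of dimensionality reduction the abstract refers to. The resulting low-dimensional vectors could then be passed to a classifier such as k-NN or SVM.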



