The nearest centroid based on vector norms: A new classification algorithm for a new document representation model
Title | The nearest centroid based on vector norms: A new classification algorithm for a new document representation model |
Publication Type | Journal Article |
Year of Publication | 2014 |
Authors | Mountassir, A, Benbrahim, H, Berrada, I |
Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Volume | 8556 LNAI |
Pagination | 442-456 |
Abstract | In this paper, we present a novel model that we propose for document representation. In contrast with the classical Vector Space Model, which represents each document by a unique vector in the feature space, our model represents each document by a vector in the space of training documents of each category. We develop, for this novel model, a discriminative classifier based on the norms of the vectors generated by our model. We call this algorithm the Nearest Centroid based on Vector Norms. Our major goal in proposing this new classification framework is to overcome the problems of huge dimensionality and vector sparsity that are commonly faced in Text Classification. We evaluate the performance of the proposed framework by comparing its effectiveness and efficiency with those of some standard classifiers used with the classical document representation. The studied classifiers are Naïve Bayes (NB), Support Vector Machines (SVM) and k-Nearest Neighbors (kNN). We conduct our experiments on multi-lingual balanced and unbalanced binary data sets. Our results show that our algorithm typically performs well, since it is competitive with the classical methods and, at the same time, dramatically faster, especially in comparison with NB and kNN. We also apply our model to the Reuters21578 corpus so as to evaluate its performance in a multi-class environment. We can say that the obtained result (85.4% in terms of micro-F1) is promising and that it can be improved in future work. © 2014 Springer International Publishing Switzerland. |
URL | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84958526350&doi=10.1007%2f978-3-319-08979-9_34&partnerID=40&md5=a5d3d5cd03b740999c7e4bcf28e33e1a |
DOI | 10.1007/978-3-319-08979-9_34 |
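
The abstract describes the representation only at a high level, so the following is a rough sketch of the general idea rather than the authors' algorithm. It assumes that a document's coordinate along each training document is a cosine similarity over raw term counts, and that the class score is the root-mean-square norm of the resulting per-category vector. Both choices, and all function names (`tf_vector`, `category_vector`, `rms_norm`, `classify`), are illustrative assumptions not taken from the paper.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term counts for a whitespace-tokenized text (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def category_vector(doc, training_docs):
    """Represent doc in the space of one category's training documents:
    one coordinate per training document (assumed here to be a cosine)."""
    return [cosine(doc, t) for t in training_docs]


def rms_norm(vec):
    """Euclidean norm divided by sqrt(len), so that categories with more
    training documents are not favored (an assumption of this sketch)."""
    return math.sqrt(sum(x * x for x in vec) / len(vec)) if vec else 0.0


def classify(text, training):
    """Return the label whose category vector has the largest norm,
    together with the per-label scores."""
    doc = tf_vector(text)
    scores = {
        label: rms_norm(category_vector(doc, docs))
        for label, docs in training.items()
    }
    return max(scores, key=scores.get), scores


# Toy training set: label -> list of term-count vectors (hypothetical data).
training = {
    "sports": [
        tf_vector("the team won the match"),
        tf_vector("goal scored in the final match"),
    ],
    "finance": [
        tf_vector("stocks fell on the market"),
        tf_vector("the bank raised interest rates"),
    ],
}

label, scores = classify("the match ended with a late goal", training)
print(label, scores)
```

In this sketch each per-category vector has as many dimensions as the category has training documents, rather than one per vocabulary term, which is the kind of dimensionality reduction the abstract says the model aims for.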