The nearest centroid based on vector norms: A new classification algorithm for a new document representation model

Title: The nearest centroid based on vector norms: A new classification algorithm for a new document representation model
Publication Type: Journal Article
Year of Publication: 2014
Authors: Mountassir, A.; Benbrahim, H.; Berrada, I.
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8556 LNAI

In this paper, we present a novel model for document representation. In contrast with the classical Vector Space Model, which represents each document by a single vector in the feature space, our model represents each document by a vector in the space of the training documents of each category. For this model, we develop a discriminative classifier based on the norms of the vectors that the model generates. We call this algorithm the Nearest Centroid based on Vector Norms. Our main goal in proposing this classification framework is to overcome the problems of high dimensionality and vector sparsity that commonly arise in Text Classification. We evaluate the proposed framework by comparing its effectiveness and efficiency with those of standard classifiers used with the classical document representation: Naïve Bayes (NB), Support Vector Machines (SVM) and k-Nearest Neighbors (kNN). We conduct our experiments on multilingual, balanced and unbalanced binary data sets. Our results show that our algorithm is competitive with the classical methods while being dramatically faster, especially in comparison with NB and kNN. We also apply our model to the Reuters-21578 corpus to evaluate its performance in a multi-class setting. The obtained result (a micro-F1 of 85.4%) is promising and leaves room for improvement in future work. © 2014 Springer International Publishing Switzerland.
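
The abstract does not spell out how the category-space vectors or their norms are computed, so the following is only a minimal sketch of one possible reading, not the authors' published procedure. It assumes that a document's coordinates in a category's space are its dot products with that category's training documents, and that the predicted category is the one with the largest norm after dividing by the square root of the category size. The function names, the normalisation and the toy data are all hypothetical.

```python
import numpy as np


def category_space_vector(doc, train_docs):
    """Map a term-space vector to one coordinate per training document.

    Assumption: each coordinate is the dot product between `doc` and one
    training document of the category (rows of `train_docs`).
    """
    return train_docs @ doc


def classify_by_norm(doc, train_by_category):
    """Pick the category whose category-space vector has the largest norm."""
    scores = {}
    for label, train_docs in train_by_category.items():
        v = category_space_vector(doc, train_docs)
        # Assumption: divide by sqrt(n) so that a category is not favoured
        # merely because it has more training documents.
        scores[label] = np.linalg.norm(v) / np.sqrt(len(train_docs))
    return max(scores, key=scores.get), scores


# Made-up term-frequency vectors over a 4-term vocabulary.
train_by_category = {
    "sports": np.array([[2.0, 1.0, 0.0, 0.0],
                        [1.0, 2.0, 0.0, 0.0]]),
    "finance": np.array([[0.0, 0.0, 2.0, 1.0],
                         [0.0, 0.0, 1.0, 2.0],
                         [0.0, 1.0, 1.0, 1.0]]),
}

label, scores = classify_by_norm(np.array([1.0, 1.0, 0.0, 0.0]), train_by_category)
print(label, scores)
```

Under these assumptions, each category-space vector has as many entries as the category has training documents rather than as many as the vocabulary has terms, which is in the spirit of the dimensionality and sparsity concerns the abstract raises.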



