Some methods to address the problem of Unbalanced Sentiment Classification in an Arabic context

Title: Some methods to address the problem of Unbalanced Sentiment Classification in an Arabic context
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Mountassir, A, Benbrahim, H, Berrada, I
Editor: ElMohajir, M, Begdouri, A, ElMohajir, BE, Zarghili, A
Conference Name: 2012 COLLOQUIUM ON INFORMATION SCIENCE AND TECHNOLOGY (CIST'12)
Publisher: IEEE; IEEE Morocco Sect; IEEE Morocco Comp & Commun Joint Chapter; USMBA IEEE Student Branch; Faculty of Sciences Dhar Mahraz; Faculty of Technical Sciences of Fez; IEEE Comp Soc; IEEE Commun Soc
ISBN Number: 978-1-4673-2725-1
Abstract

The rise of social media (such as online web forums and social networking sites) has attracted interest in mining and analyzing the opinions available on the web. Online opinion has become an object of study in many research areas, especially the one called "Opinion Mining and Sentiment Analysis". Several interesting and advanced works have been carried out on a few languages (in particular English). However, there have been very few studies on other languages such as Arabic. This paper presents the study we have carried out to address the problem of unbalanced data sets in supervised sentiment classification in an Arabic context. We propose three different methods to under-sample the majority class documents. Our goal is to compare the effectiveness of the proposed methods with that of the common random under-sampling. We also aim to evaluate the behavior of the classifier at different under-sampling rates. We use two common classifiers, namely Naive Bayes and Support Vector Machines. The experiments are carried out on an Arabic data set that we built from Aljazeera's web site and labeled manually. The results show that Naive Bayes is sensitive to data set size: the more we reduce the data, the more the results degrade. However, it is not sensitive to unbalanced data sets, unlike Support Vector Machines, which are highly sensitive to them. The results also show that we can rely on the proposed techniques and that they are typically competitive with random under-sampling.
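
As an illustration of the random under-sampling baseline mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' code and does not reproduce their three proposed methods, which the abstract does not describe. The function name `random_under_sample`, the `seed` parameter, the toy documents, and the definition of `rate` as the fraction of majority-class documents removed are all assumptions made for this example.

```python
import random
from collections import Counter


def random_under_sample(docs, labels, rate, seed=0):
    """Randomly remove a fraction `rate` of the majority-class documents.

    Assumption: `rate` is the share of majority-class documents dropped
    (0.0 keeps the data set unchanged). The abstract does not define
    its under-sampling rate, so this definition is hypothetical.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")

    # Identify the majority class and the positions of its documents.
    majority = Counter(labels).most_common(1)[0][0]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]

    # Pick, uniformly at random, which majority documents to keep.
    n_keep = round(len(majority_idx) * (1.0 - rate))
    rng = random.Random(seed)
    kept_majority = set(rng.sample(majority_idx, n_keep))

    # Keep every minority document plus the sampled majority documents,
    # preserving the original order.
    keep = [
        i for i, y in enumerate(labels)
        if y != majority or i in kept_majority
    ]
    return [docs[i] for i in keep], [labels[i] for i in keep]


# Toy data set: 8 majority ("neg") and 2 minority ("pos") documents.
docs = [f"doc{i}" for i in range(10)]
labels = ["neg"] * 8 + ["pos"] * 2

sampled_docs, sampled_labels = random_under_sample(docs, labels, rate=0.5)
print(Counter(sampled_labels))
```

With `rate=0.5`, half of the majority-class documents are discarded while all minority-class documents are kept. Sweeping `rate` over several values and retraining a classifier on each sample would be one way to study the sensitivity to under-sampling rate that the abstract refers to.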
