The Large Annotated Corpus for the Arabic Language (LACAL)

Title: The Large Annotated Corpus for the Arabic Language (LACAL)
Publication Type: Journal Article
Year of Publication: 2022
Authors: Yousfi, A; Boumehdi, A; Laaroussi, S; Makoudi, R; Aouragh, SL; Gueddah, H; Habibi, B; Nejja, M; Said, I
Journal: Studies in Computational Intelligence

Annotated corpora play an important role in the NLP field. They are used in almost all NLP applications: automatic dictionary construction, text analysis, information retrieval, machine translation, etc. Annotated corpora are the basis for training NLP systems. Without them, it is difficult to build an efficient system that takes into account all linguistic variations and phenomena. In this paper, we present the annotated corpus we developed. This corpus contains more than 12 million different words annotated with several types of labels: syntactic, morphological, and semantic. This large corpus adds value to the Arabic NLP field and will certainly improve the quality of the training phase of Arabic NLP systems. Moreover, it can serve as a suitable corpus for testing and evaluating the quality of these systems. © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.



