Data preprocessing for heart disease classification: A systematic literature review

TitreData preprocessing for heart disease classification: A systematic literature review
Publication TypeJournal Article
Year of Publication2020
AuthorsBenhar, H, Idri, A, Fernández-Alemán, JL
JournalComputer Methods and Programs in Biomedicine
Mots-clésCardiac datasets, Cardiology, Data preprocessing, Datamining, Literature review

Context Early detection of heart disease is an important challenge since 17.3 million people yearly lose their lives due to heart diseases. Besides, any error in diagnosis of cardiac disease can be dangerous and risks an individual's life. Accurate diagnosis is therefore critical in cardiology. Data Mining (DM) classification techniques have been used to diagnosis heart diseases but still limited by some challenges of data quality such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques were therefore used to prepare data with the goal of improving the performance of heart disease DM based prediction systems. Objective The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification as regards: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when using DP techniques, and (4) comparisons of different combinations classifier-preprocessing in terms of accuracy rate. Method A systematic literature review is carried out, by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published in the period between January 2000 and June 2019. A total of 49 studies were therefore selected and analyzed according to the aforementioned criteria. Results The review results show that data reduction is the most used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations such as (ANN + PCA), (ANN + CHI) and (SVM + PCA) are promising terms of accuracy. However the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of interpretation.




Suivez-nous sur





Avenue Mohammed Ben Abdallah Regragui, Madinat Al Irfane, BP 713, Agdal Rabat, Maroc

  Télécopie : (+212) 5 37 68 60 78

  Secrétariat de direction : 06 61 48 10 97

        Secrétariat général : 06 61 34 09 27

        Service des affaires financières : 06 61 44 76 79

        Service des affaires estudiantines : 06 62 77 10 17 /

        Résidences : 06 61 82 89 77



Education - This is a contributing Drupal Theme
Design by WeebPal.