Tree-based algorithms for classification purposes in Health insurance


  • Fatima EL KASSIMI University Hassan 1st, Faculty of Economics and Management, LM2CE, Settat, Morocco
  • Ghita HAJRAOUI University Hassan 1st, Faculty of Economics and Management, LM2CE, Settat, Morocco
  • Jamal ZAHI University Hassan 1st, Faculty of Economics and Management, LM2CE, Settat, Morocco


Classification, Health insurance, Machine Learning, Decision trees, Random forests.


Within a heterogeneous insurance portfolio, policyholders do not all carry the same risk; some have a riskier profile than others, so charging everyone the same premium may seem unfair. This heterogeneity can be reduced by grouping policyholders into risk classes based on risk factors such as gender or age. Given this classification, the pure premium for each risk class is estimated using a priori techniques, which makes risk classification central to establishing a fair and reasonable rate structure. This paper aims to classify the insured by risk with respect to claim severity (the cost of claims). To do so, we use two Machine Learning algorithms: decision trees and random forests.
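As a rough illustration of the two algorithms named above, the following is a minimal sketch in plain Python, not the implementation used in the paper. It grows a small Gini-based classification tree and a bagged forest of such trees with a random feature subset at each node. The data are synthetic and purely hypothetical: the features (age, a 0/1 code), the labels "low" and "high", the age-50 cutoff, and all function names and parameters (`max_depth`, `mtry`, `n_trees`) are assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, max_depth=3, mtry=None):
    """Grow a tree; a leaf is a class label, an internal node a 4-tuple."""
    feats = list(range(len(rows[0])))
    if mtry is not None:
        feats = rng.sample(feats, mtry)  # random feature subset at this node
    split = best_split(rows, labels, feats) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        grow_tree([r for r, _ in left], [y for _, y in left], rng, max_depth - 1, mtry),
        grow_tree([r for r, _ in right], [y for _, y in right], rng, max_depth - 1, mtry),
    )

def predict_tree(node, row):
    """Walk down the tree until a leaf (a class label) is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def grow_forest(rows, labels, rng, n_trees=51, mtry=1):
    """Grow each tree on a bootstrap sample of the policyholders."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(
            grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, mtry=mtry)
        )
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical portfolio: (age, uninformative 0/1 code); "high" severity from age 50.
rows = [(age, age % 2) for age in range(20, 80, 3)]
labels = ["high" if age >= 50 else "low" for age, _ in rows]

rng = random.Random(0)
tree = grow_tree(rows, labels, rng)
forest = grow_forest(rows, labels, rng)
print(predict_tree(tree, (30, 0)), predict_forest(forest, (77, 1)))
```

The single tree splits once on age, which makes the resulting risk classes easy to read. The forest trades that readability for stability by averaging many trees grown on resampled data. In practice a maintained library would replace this hand-written sketch.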







Published 2023-06-27; updated on 2023-07-17


How to Cite

EL KASSIMI, F., HAJRAOUI, G., & ZAHI, J. (2023). Tree-based algorithms for classification purposes in Health insurance. International Journal of Computer Engineering and Data Science (IJCEDS), 3(1), 1–7. Retrieved from (Original work published June 27, 2023)