Tree-based algorithms for classification purposes in Health insurance
Keywords:
Classification, Health insurance, Machine Learning, Decision trees, Random forests
Abstract
Within a heterogeneous insurance portfolio, policyholders do not all carry the same risk; some have riskier profiles than others. Charging every policyholder the same premium may therefore seem unfair. This heterogeneity can be reduced by grouping policyholders into risk classes defined by risk factors such as gender or age. Given this risk classification, the pure premium for each risk class is estimated using a priori techniques, which makes risk classification central to establishing a fair and reasonable rate structure. This paper aims to classify the insured by risk with respect to claim severity, that is, the cost of their claims. To do so, we use Machine Learning algorithms, namely decision trees and random forests.
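As a rough illustration of how a decision tree separates policyholders into risk classes, the sketch below searches for the single age threshold that best splits a toy portfolio into "low" and "high" severity groups. It is a minimal sketch under stated assumptions, not the authors' procedure: the abstract does not say which splitting criterion or features were used, so the Gini impurity criterion, the `ages` and `severity_class` data, and the function names `gini` and `best_split` are all hypothetical choices made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the threshold t minimising the weighted Gini impurity of the
    two groups {value <= t} and {value > t}.

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the impurity of the unsplit data.
    """
    n = len(values)
    distinct = sorted(set(values))
    best = (None, gini(labels))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy portfolio (invented for illustration, not from the paper):
# policyholder age and an assumed claim-severity class.
ages = [23, 31, 38, 45, 52, 58, 64, 70]
severity_class = ["low", "low", "low", "low", "high", "low", "high", "high"]

threshold, impurity = best_split(ages, severity_class)
print(f"split at age <= {threshold}, weighted Gini = {impurity}")
```

A full tree would apply the same search recursively within each group and across all risk factors, and a random forest would average many such trees grown on bootstrap samples with random subsets of the factors considered at each split.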
Published
Versions
- 2023-07-17 (2)
- 2023-06-27 (1)
License
Copyright (c) 2023 Fatima EL KASSIMI, Ghita HAJRAOUI, Jamal ZAHI
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright on any article in the International Journal of Computer Engineering and Data Science (IJCEDS) is retained by the author(s) under the Creative Commons license, which permits unrestricted use, distribution, and reproduction provided the original work is properly cited.
License agreement
Authors grant IJCEDS a license to publish the article and identify IJCEDS as the original publisher.
Authors also grant any third party the right to use, distribute and reproduce the article in any medium, provided the original work is properly cited.