Remittances Review

ISSN:2059-6588 | e-ISSN: 2059-6596

Predictive Analysis of Thalassemia Risk using Statistical and Machine Learning Approaches

Authors:
Muhammad Azhar Mushtaq, Faisal Bukhari, Sadaqat Ali Ramay, Muddassar Ali Zaidi, Tahir Abbas Khan, Sayyid Kamran Hussain, Faheem Ahmed Khalid, Ahmad Makki, Abu Huraira
Keywords

Abstract

Thalassemia is a hereditary condition in which the body is unable to produce enough hemoglobin. Hemoglobin is made up of alpha- and beta-globin proteins. It is the most important component of red blood cells (RBCs) and carries oxygen throughout the body. When the alpha- or beta-globin chains are produced in reduced amounts or not at all, the result is alpha- or beta-thalassemia, respectively. Beta-thalassemia is considered more dangerous than alpha-thalassemia because of the higher probability of conceiving a child with thalassemia. Most forms of thalassemia cause chronic, lifelong anemia that appears in early childhood. Because the red blood cells are deformed, this anemia requires frequent blood transfusions throughout the patient's life. Red blood cells deliver the oxygen that the body's cells need to convert glucose into energy, which enables normal body function. Thalassemia therefore impairs the body's ability to deliver oxygen to its cells, which can severely damage organs and even cause death. According to research, anemia affects 42% of women worldwide, including 52% of pregnant women in developing nations, compared with 23% in developed economies. In this study, machine learning and statistical analysis are used to predict and assess thalassemia. A person with thalassemia should also be referred for appropriate genetic counseling. People with the alpha-thalassemia trait have a normal life expectancy, whereas people with beta-thalassemia often die by the age of 30. The statistical analyses applied in our research are the independent-samples t-test for age, the paired-samples t-test for hemoglobin (HGB) levels, analysis of variance (ANOVA) for mean corpuscular volume (MCV) levels across age groups, and a comparison of two hypotheses with different means. We also investigate the correlation between red blood cell count and hemoglobin.
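The statistical procedures named above can be illustrated with Python's standard library alone. This is a minimal sketch under stated assumptions, not the authors' analysis code. It computes only the test statistics and degrees of freedom. P-values need the t or F distribution, which SciPy provides through `scipy.stats.ttest_ind`, `ttest_rel`, `f_oneway`, and `pearsonr`. The sketch also assumes equal variances for the independent-samples test (Student's pooled form, not Welch's). The function names (`independent_t`, `paired_t`, `one_way_anova_f`, `pearson_r`) and all data values are hypothetical placeholders, not the study's dataset.

```python
import math
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's two-sample t statistic (pooled variance, equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # (statistic, degrees of freedom)


def paired_t(first, second):
    """Paired-samples t statistic computed on per-subject differences."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def one_way_anova_f(*groups):
    """One-way ANOVA F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical illustrative values only; these are NOT data from this study.
age_group_a = [24, 31, 28, 35, 22]
age_group_b = [30, 41, 38, 29, 45]
hgb_first = [9.8, 10.2, 8.9, 11.0]    # HGB, g/dL, first measurement per subject
hgb_second = [10.4, 10.9, 9.5, 11.2]  # HGB, g/dL, second measurement per subject
mcv_by_age_group = ([62, 65, 60], [68, 70, 66], [74, 72, 75])  # MCV, fL
rbc = [4.1, 5.2, 5.8, 6.0, 4.9]       # RBC count, 10^12/L
hgb = [9.5, 10.8, 11.3, 11.9, 10.1]   # HGB, g/dL

print("independent t:", independent_t(age_group_a, age_group_b))
print("paired t:", paired_t(hgb_first, hgb_second))
print("ANOVA F:", one_way_anova_f(*mcv_by_age_group))
print("Pearson r:", pearson_r(rbc, hgb))
```

The paired test assumes two HGB measurements per subject, which is an illustrative guess at the pairing. In a full analysis, each statistic would be compared against its reference distribution to obtain a p-value.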
For the machine learning approaches, we applied three supervised models: Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN).
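Of the three supervised models, K-Nearest Neighbors is simple enough to sketch without a library. The version below is an illustrative assumption, not the authors' pipeline. It labels a sample by majority vote among its k nearest training points in Euclidean distance. Features are first min-max scaled so that quantities in different units (here HGB in g/dL and MCV in fL) contribute comparably. The feature choice, the "trait"/"normal" labels, k = 3, the helper names, and all data values are hypothetical. Random Forest and SVM are not shown. In practice, all three models are available in scikit-learn as `RandomForestClassifier`, `SVC`, and `KNeighborsClassifier`.

```python
import math
from collections import Counter


def fit_min_max(rows):
    """Per-feature (min, max) bounds learned from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, bounds):
    """Min-max scale a feature vector to [0, 1]; zero-range features map to 0.0."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points (Euclidean)."""
    bounds = fit_min_max(train_x)
    q = scale(query, bounds)
    neighbors = sorted(
        (math.dist(scale(x, bounds), q), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical illustrative data only: [HGB (g/dL), MCV (fL)] per person.
train_x = [[9.5, 62], [10.1, 65], [9.8, 60], [13.5, 88], [14.2, 90], [13.9, 86]]
train_y = ["trait", "trait", "trait", "normal", "normal", "normal"]

print(knn_predict(train_x, train_y, [10.0, 63]))  # → trait
print(knn_predict(train_x, train_y, [14.0, 89]))  # → normal
```

An odd k avoids tied votes in a two-class problem. Because KNN is distance-based, the scaling step matters: without it, MCV values (tens of fL) would dominate HGB values (single-digit to low-teens g/dL).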