Survival Factors Analysis with application in causal inference and machine learning for lung cancer
Location
CELA & Mary Church Terrell Library, First Floor
Document Type
Poster - Open Access
Start Date
4-25-2025 12:00 PM
End Date
4-25-2025 2:00 PM
Abstract
According to the National Cancer Institute (NIH), lung cancer is the third most common cancer in the United States across all races and ethnicities, after breast and prostate cancer. However, according to the Lung Cancer Research Institution, lung cancer research has received less attention and investment than research on those cancers. Our objective is to investigate why some lung cancer patients live longer than others and to better understand lung and bronchus cancer survival using causal inference and predictive modeling techniques. The analysis uses data from the SEER registry in Texas from 2011 to 2021. We identify candidate causal factors through three main approaches. First, we perform classical survival analysis: we apply the Kaplan-Meier method and then analyze the risk associated with each explanatory variable using the Cox proportional hazards regression model. Second, we use machine learning models, including Random Forest and SVM, two commonly used classification algorithms, to better understand the relationship between the variables and lung cancer survival time. Finally, we use causal inference to confirm which factors most plausibly explain differences in survival time.
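As an illustration of the Kaplan-Meier step named in the abstract, the following is a minimal sketch in Python and not the authors' implementation. It uses the standard product-limit estimate, in which the survival probability at each observed death time is the previous estimate multiplied by (1 - deaths / number at risk). The function name `kaplan_meier`, the variable names, and the toy data are assumptions made up for this example; none of the values come from the SEER registry.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per patient (hypothetical units, e.g. months)
    events:    1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)  # each patient leaves the risk set at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients at time t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Toy data, invented for illustration only (not SEER data).
toy_durations = [2, 3, 3, 5, 8, 8, 12]
toy_events = [1, 1, 0, 1, 0, 1, 0]

km_curve = kaplan_meier(toy_durations, toy_events)
for t, s in km_curve:
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Censored patients (event flag 0) still count toward the number at risk up to their last follow-up time, which is what distinguishes this estimate from a simple proportion of survivors.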
Keywords:
Causal inference, Neural network, Lung cancer, Classical survival analysis
Recommended Citation
Bui, Tobi and Mohamed, Zeinab, "Survival Factors Analysis with application in causal inference and machine learning for lung cancer" (2025). Research Symposium. 5.
https://digitalcommons.oberlin.edu/researchsymp/2025/posters/5
Major
Computer Science
Economics
Project Mentor(s)
Zeinab Mohamed, Statistics and Data Science
Notes
Presenter: Tobi Bui