Survival Factors Analysis with application in causal inference and machine learning for lung cancer

Location

CELA & Mary Church Terrell Library, First Floor

Document Type

Poster - Open Access

Start Date

4-25-2025 12:00 PM

End Date

4-25-2025 2:00 PM

Abstract

According to the National Cancer Institute (NCI), lung cancer is the third most common cancer in the United States across all races and ethnicities, after breast and prostate cancer. However, according to the Lung Cancer Research Institution, lung cancer research has received less attention and investment than research on those cancers. Our objective is to investigate why some lung cancer patients live longer than others, and to better understand lung and bronchus cancer survival by using causal inference and predictive modeling techniques. We analyzed data from the SEER registry in Texas from 2011 to 2021. We identify candidate causal factors through three main approaches. First, we apply classical survival analysis: we estimate survival curves with the Kaplan-Meier method and then use the Cox proportional hazards regression model to assess how the explanatory variables relate to the hazard of death among patients. Second, we apply machine learning models, including Random Forest and SVM, two commonly used classification algorithms, to better understand how the variables relate to lung cancer survival time. Finally, we use causal inference to confirm which factors most plausibly explain differences in survival time.
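
The Kaplan-Meier step named in the abstract can be illustrated with a minimal sketch. This is not the presenter's code and does not use SEER data: the function name `kaplan_meier`, the toy follow-up times, and the event flags are all hypothetical, chosen only to show how the product-limit estimate handles censored patients.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient (e.g. months since diagnosis)
    events:    1 if death was observed at that time, 0 if the patient was censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every patient leaves the risk set at their own time

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit update: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both drop out of the risk set after t.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy data (illustration only, not SEER records).
months = [2, 3, 3, 5, 8, 8, 12]
died = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(months, died)
for t, s in curve:
    print(f"t={t:>2} months  S(t)={s:.3f}")
```

Censored patients never lower the curve directly; they only shrink the risk set for later event times, which is why the estimate can use patients whose follow-up ended before death was observed. In practice a maintained survival-analysis library would normally be used for this step and for fitting the Cox model.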

Keywords:

Causal inference, Neural network, Lung cancer, Classical survival analysis

Notes

Presenter: Tobi Bui

Major

Computer Science
Economics

Project Mentor(s)

Zeinab Mohamed, Statistics and Data Science

2025

