Authors (including presenting author) :
Mok HM(1), Ng HL(1), Ho PC(1), Fung V(1)
Affiliation :
(1) Information Technology and Health Informatics Division
Introduction :
With advances in data science, clinical applications that use machine learning to predict disease have become popular. This proof of concept (POC) study on a Hong Kong adult population applied machine learning algorithms to predict colorectal cancer from a patient's age, sex and Complete Blood Count (CBC) laboratory results.
Objectives :
To conduct a local POC study of supervised machine learning that uses a patient's age, sex and CBC results to predict the likelihood of colorectal cancer.
Methodology :
A cohort of de-identified patients was selected for this study. One year of CBC and related pathology data was extracted from the Laboratory Information System of a general acute hospital. If a patient had no pathology investigation requested before the CBC reporting date, the related CBC data were classified as the negative dataset. If a patient had any colorectal cancer pathology result reported within one year after the CBC results, the CBC data were classified as the positive dataset.
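The labelling rule above can be sketched in code. The following is a minimal illustration only, not the extraction logic used in the study: the `PathologyRecord` fields, the `label_cbc` function, the 365-day window, the precedence of the positive rule over the negative rule, and the exclusion of records that match neither rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass
class PathologyRecord:
    """Hypothetical shape of one pathology investigation for a patient."""
    request_date: date
    report_date: date
    is_colorectal_cancer: bool


def label_cbc(
    cbc_report_date: date,
    pathology: Sequence[PathologyRecord],
    window_days: int = 365,  # assumed reading of "within one year"
) -> Optional[int]:
    """Return 1 (positive), 0 (negative) or None (assumed excluded)."""
    window_end = cbc_report_date + timedelta(days=window_days)

    # Positive: a colorectal cancer pathology result reported within
    # one year after the CBC result. Checked first (assumed precedence).
    for rec in pathology:
        if rec.is_colorectal_cancer and cbc_report_date < rec.report_date <= window_end:
            return 1

    # Negative: no pathology investigation requested before the CBC
    # reporting date.
    if not any(rec.request_date < cbc_report_date for rec in pathology):
        return 0

    # Neither rule applies; the abstract does not say how such records
    # were handled, so this sketch leaves them unlabelled.
    return None
```

For example, a CBC reported on 1 January 2020 for a patient with no pathology history would be labelled 0, while the same CBC followed by a colorectal cancer pathology report on 10 March 2020 would be labelled 1.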
Result & Outcome :
After cohort selection, the training and testing data were curated and saved as a comma-separated values (CSV) file for supervised machine learning. The machine learning software Weka (Waikato Environment for Knowledge Analysis) was used to test several machine learning algorithms. Using 13 features (CBC results, age and sex), Random Forest combined with a Cost Sensitive Classifier gave the best predictive performance for colorectal cancer, with an Area Under the Curve (AUC) of 0.814. This suggests that a machine learning algorithm can predict the likelihood of colorectal cancer from these inputs.
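To make the two evaluation ideas above concrete, the sketch below shows, in plain Python rather than Weka, (a) one common way a cost-sensitive wrapper can turn a predicted probability into a class by choosing the label with the lowest expected misclassification cost, and (b) how an AUC can be computed, assuming the reported figure is the area under the ROC curve. The function names, the cost values and the toy scores are hypothetical and are not taken from the study data or its Weka configuration.

```python
from typing import Sequence


def min_expected_cost_class(
    p_positive: float,
    cost_false_negative: float,
    cost_false_positive: float,
) -> int:
    """Pick the class (1 or 0) with the lower expected misclassification cost."""
    # Predicting negative is wrong with probability p_positive.
    expected_cost_if_negative = p_positive * cost_false_negative
    # Predicting positive is wrong with probability 1 - p_positive.
    expected_cost_if_positive = (1.0 - p_positive) * cost_false_positive
    return 1 if expected_cost_if_positive < expected_cost_if_negative else 0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only; not results from the study.
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_scores))
    # With a missed cancer assumed ten times as costly as a false alarm,
    # a case with a 20% predicted probability is flagged as positive.
    print(min_expected_cost_class(0.2, cost_false_negative=10.0, cost_false_positive=1.0))
```

Under this reading, an AUC of 0.814 would mean that a randomly chosen positive case receives a higher score than a randomly chosen negative case about 81% of the time. Weighting false negatives more heavily than false positives is one plausible motivation for a cost-sensitive classifier when positive cases are rare, although the abstract does not state the cost matrix used.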