Big Data in Healthcare Literature Review

Introduction

Machine learning (ML) allows computers to learn from datasets through algorithms and statistical techniques rather than through explicit programming (An et al., 2023). ML has been highlighted as vital to the progression of the healthcare field (Habehh & Gohel, 2021). Researchers have even suggested that medical ML systems may improve access to healthcare because they are more economical and scalable than human expertise (Yu & Vizcaychipi, 2022). However, the application of ML does not come without challenges and risks (Habehh & Gohel, 2021).

This paper aims to conduct a thorough literature review of various ML techniques being applied to the healthcare industry. The review will cover tools, methods, and frameworks used for predictive diagnostics alongside various challenges, risks, and opportunities associated with implementing these ML approaches. A brief background of the topic will be discussed, followed by a specific analysis of ML related to the following topics: Predicting Mental Disorders, Wearable Sensors, Medical Images, Prenatal Care, and difficulties in applying ML to the healthcare industry.

Background

Researchers have identified several areas where ML is currently being applied to the healthcare industry; some of those areas include predictive analytics and diagnosis (An et al., 2023). Specific examples of predictive analytics leveraging ML in the healthcare industry include predicting health outcomes, predicting hospital readmissions, or predicting when conditions become chronic (An et al., 2023). Specific examples of ML being applied to diagnostics include analyzing medical images to aid the diagnostic process or to suggest treatment plans (An et al., 2023).

Sharma and Chariar (2024) performed a literature review and bibliometric mapping of papers that used ML to diagnose mental disorders and found that the topic experienced an annual research growth rate of 170.5% between 2012 and 2023. They also identified co-occurrence in the literature between the topic of ML and the following mental disorders: “depression, schizophrenia, autism, anxiety, ADHD, obsessive-compulsive disorder, and PTSD” (Sharma & Chariar, 2024, Section 8). Popular algorithms identified by Sharma and Chariar (2024) for diagnosing mental disorders included random forests, decision trees, and support vector machines (SVMs).

ML techniques can generally be placed into one of the following two categories: supervised learning or unsupervised learning (An et al., 2023). An ML method is considered to be supervised when the training data has associated expected outputs (labels) (Hernandez & Ang, 2023). In contrast, unsupervised ML techniques use only the raw data without explicitly knowing what the expected results should be (Hernandez & Ang, 2023).

Unsupervised ML algorithms generally use some form of clustering technique, such as K-means, K-Medoids, Hierarchical Clustering, Fuzzy c-Means, the Gaussian Mixture Model (GMM), or the Hidden Markov Model (An et al., 2023). On the other hand, supervised ML techniques use either a regression or a classification algorithm (An et al., 2023). Common regression algorithms include Linear Regression, Logistic Regression, Ensemble methods, and Support Vector Regression (SVR) (An et al., 2023). Common classification algorithms include Decision Trees, Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbours (K-NN) (An et al., 2023).
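
As a minimal sketch of the distinction between the two categories (not drawn from any of the cited studies), the Python example below applies a K-NN classifier to labeled data and then a two-cluster K-means to the same values with the labels removed. The heart-rate-style numbers, the function names knn_predict and two_means, and the initialisation choice are all illustrative assumptions.

```python
# Toy, made-up measurements (hypothetical values, not from any cited study).
# Each sample is one numeric feature, e.g. a resting heart rate, plus a label.
labeled = [(58.0, "low"), (61.0, "low"), (64.0, "low"),
           (88.0, "high"), (92.0, "high"), (95.0, "high")]


def knn_predict(train, x, k=3):
    """Supervised: classify x by majority vote among its k nearest labeled samples."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups with 1-D K-means (K=2)."""
    centers = [min(values), max(values)]  # simple deterministic initialisation
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the closer of the two centers.
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[idx].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


prediction = knn_predict(labeled, 90.0)      # uses the labels
unlabeled = [value for value, _ in labeled]  # same data with the labels removed
centers, groups = two_means(unlabeled)       # never sees a label
```

With these toy values, the supervised call returns one of the labels it was trained on, whereas the unsupervised call only returns two groups whose clinical meaning would still have to be interpreted by a practitioner.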

Predicting Mental Disorders

ML has been successfully implemented to predict mental disorders ranging from ADHD to Alzheimer’s Disease (Habehh & Gohel, 2021). Several data types can be used to train ML models to be effective at diagnosing and monitoring degenerative brain disorders such as Alzheimer’s disease (Yu & Vizcaychipi, 2022). These data types include MRI scans, CT scans, and EEG waveforms (Yu & Vizcaychipi, 2022). Researchers have demonstrated that EEG signals (or waveforms), for example, can be effective at both identifying and diagnosing ADHD (Nash et al., 2022).

Parkinson’s disease is one of the most prevalent degenerative brain disorders across the globe (Castelli et al., 2022). Researchers have suggested that ML may be an effective candidate for the early detection of Parkinson’s disease due to the high levels of analytical effort required for such detection (Castelli et al., 2022).

Wearable Sensors

The progress made in wearable sensors has enabled new types of health-related data to be collected across various contexts (Douthwaite & Georgiou, 2022). ML is being applied to wearable sensors to tackle problems ranging from personalized healthcare to population-level disease control and detection (Xiao et al., 2024).

Some researchers are even using wearable sweat sensors as an alternative to blood tests to monitor the body chemistry of patients (Douthwaite & Georgiou, 2022). Douthwaite and Georgiou (2022) suggest that body chemistry can reveal health issues before symptoms become noticeable and that it can be influenced by both physical and mental factors. The quantity of data that these sensors can produce, alongside the complexity of potential interpretations, makes wearable sweat sensors strong candidates for ML applications (Douthwaite & Georgiou, 2022).

As discussed in the previous section, researchers are also working towards the effective early diagnosis of Parkinson’s disease using ML (Castelli et al., 2022). These efforts are being enabled by wearable sensors (Castelli et al., 2022). Wearable sensors allow researchers to collect data on a patient’s postural instability, one of the most debilitating motor symptoms of Parkinson’s disease (Castelli et al., 2022).

Wearable sensors combined with ML increase the availability of personalized healthcare options (Xiao et al., 2024). Since an individual's specific health condition can be influenced by a wide array of factors, it is necessary to collect data from multiple sources (Xiao et al., 2024). These sources could contain data such as genetic data, general lifestyle data, and even environmental data (Xiao et al., 2024). ML techniques allow for the discovery of complex relationships between these disparate data sources that may not have been previously identifiable by human practitioners alone (Xiao et al., 2024).

Medical Images

Use cases for applying ML to the analysis of medical images include using image texture to predict the “MGMT methylation status of brain tumors” or “distinguish[ing] frontal from lateral chest radiographs” (Kohli et al., 2017, Use Cases Section). Convolutional Neural Networks (CNNs) are being used in radiology for various tasks, including classification, segmentation, and detection (Yamashita et al., 2018). An example of CNNs being applied to classification problems is using them to determine whether lesions are benign or if they are malignant (Yamashita et al., 2018). CNNs can also be deployed for segmentation tasks, such as segmenting specific organs in an image (Yamashita et al., 2018). CNNs could also be used to detect abnormalities in medical images, such as indications of breast cancer (Yamashita et al., 2018).
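
The core operation behind a CNN layer can be sketched in a few lines of plain Python. The example below is an illustrative assumption rather than code from Yamashita et al. (2018): it slides a small hand-picked kernel over a made-up 4x4 image (computing a cross-correlation, as deep learning libraries conventionally do) and shows that the response is largest where the image changes from dark to bright. In a real CNN the kernel weights would be learned from training data rather than chosen by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 "image": dark left half (0), bright right half (1).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Hand-picked vertical-edge kernel; a CNN would learn such weights from data.
kernel = [[-1, 1],
          [-1, 1]]

response = conv2d_valid(image, kernel)
```

Each row of the response is zero over the uniform regions and non-zero only in the column where the dark-to-bright boundary falls, which is the kind of local feature that later layers can combine for classification, segmentation, or detection.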

While classification, segmentation, and detection are some of the most common applications of ML to medical images, there are other potential use cases as well (Yamashita et al., 2018). One potential use case is using ML to denoise medical images (Yamashita et al., 2018). Chen et al. (2017) demonstrated this by using a CNN to remove noise from low-dose CT images. Low-dose CT scans allow patients to reduce their radiation exposure; however, the resulting images would conventionally be low quality without applying these ML techniques (Chen et al., 2017). Similar results were achieved by Nishio et al. (2017), who used neural networks with a convolutional auto-encoder to accomplish the denoising of low-dose CT scan images. The neural network used by Nishio et al. (2017) was trained on pairs of low-dose CT scan images with their corresponding standard-dose CT scan image.
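
The paired training setup described above can be illustrated with a deliberately simplified sketch. Instead of a CNN or a convolutional auto-encoder, the example below fits a single three-weight linear filter to synthetic one-dimensional signals by gradient descent on the mean squared error. The noisy and clean signals are hypothetical stand-ins for low-dose and standard-dose images, and every name and parameter value is an assumption made for illustration.

```python
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible


def apply_filter(signal, weights):
    """Apply a 3-tap filter to the interior points of a 1-D signal."""
    return [
        weights[0] * signal[i - 1] + weights[1] * signal[i] + weights[2] * signal[i + 1]
        for i in range(1, len(signal) - 1)
    ]


def mse(pairs, weights):
    """Mean squared error between filtered noisy signals and their clean targets."""
    total, count = 0.0, 0
    for noisy, clean in pairs:
        for p, target in zip(apply_filter(noisy, weights), clean[1:-1]):
            total += (p - target) ** 2
            count += 1
    return total / count


def make_pair(length=32):
    """Synthetic stand-in for a (low-dose, standard-dose) pair: a ramp plus noise."""
    clean = [i / length for i in range(length)]
    noisy = [v + random.gauss(0.0, 0.1) for v in clean]
    return noisy, clean


pairs = [make_pair() for _ in range(20)]
weights = [0.0, 1.0, 0.0]  # start from the identity filter (output = noisy input)
loss_before = mse(pairs, weights)

learning_rate = 0.3
for _ in range(300):
    grads, count = [0.0, 0.0, 0.0], 0
    for noisy, clean in pairs:
        predicted = apply_filter(noisy, weights)
        for i, p in enumerate(predicted, start=1):  # i indexes the original signal
            error = p - clean[i]
            # Gradient of the squared error with respect to each filter weight.
            grads[0] += 2 * error * noisy[i - 1]
            grads[1] += 2 * error * noisy[i]
            grads[2] += 2 * error * noisy[i + 1]
            count += 1
    weights = [w - learning_rate * g / count for w, g in zip(weights, grads)]

loss_after = mse(pairs, weights)
```

Comparing loss_before with loss_after shows whether the learned filter brings the noisy inputs closer to their clean targets than leaving them unfiltered. The networks used in the cited studies follow the same principle of learning from input and target pairs, but with many more weights and nonlinear layers.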

Prenatal Care

Early identification of various risk factors associated with pregnancy can significantly reduce the likelihood of pregnancy complications (Kopanitsa et al., 2023). Researchers have suggested that ML models can be applied to data that is already being collected throughout many pregnancies to help practitioners identify these risk factors in a timely manner (Kopanitsa et al., 2023).

In their article titled “Predicting risk of stillbirth and preterm pregnancies with machine learning,” Koivu and Sairanen (2020) demonstrated that ML techniques can effectively predict stillbirth risk and the risk of preterm pregnancies. To accomplish this, Koivu and Sairanen (2020) leveraged open-source datasets from the Centers for Disease Control and Prevention (CDC) and the New York City Department of Health and Mental Hygiene. The ML techniques used included logistic regression, gradient-boosting decision trees, and artificial neural network models (Koivu & Sairanen, 2020).
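
As a rough illustration of the simplest of these techniques, the sketch below fits a logistic regression model by gradient descent in plain Python. It does not reproduce Koivu and Sairanen's (2020) pipeline or data: the two standardised features, the synthetic records, and the coefficients used to generate them are hypothetical.

```python
import math
import random

random.seed(1)  # fixed seed so the synthetic records are reproducible


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def make_record():
    """Synthetic record: two standardised features and a binary outcome flag."""
    age, bmi = random.gauss(0, 1), random.gauss(0, 1)
    # Hypothetical generating model: risk rises with both features.
    true_prob = sigmoid(1.5 * age + 1.0 * bmi - 0.5)
    outcome = 1 if random.random() < true_prob else 0
    return (age, bmi), outcome


records = [make_record() for _ in range(500)]

weights, bias = [0.0, 0.0], 0.0
learning_rate = 0.5
for _ in range(300):
    grad_w, grad_b = [0.0, 0.0], 0.0
    for (age, bmi), outcome in records:
        predicted = sigmoid(weights[0] * age + weights[1] * bmi + bias)
        error = predicted - outcome  # gradient of the log loss w.r.t. the score
        grad_w[0] += error * age
        grad_w[1] += error * bmi
        grad_b += error
    n = len(records)
    weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b / n


def risk(age, bmi):
    """Predicted probability of the outcome for one (standardised) record."""
    return sigmoid(weights[0] * age + weights[1] * bmi + bias)
```

Calling risk(2.0, 2.0) and risk(-2.0, -2.0) compares the predicted probability for a record with high values of both features against one with low values. The output is a probability rather than a diagnosis, so any decision threshold would need to be chosen and validated separately.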

ML techniques have the potential to deliver real-time insights; however, predictive modeling may also allow researchers to draw retrospective insights from existing data (Koivu & Sairanen, 2020). For example, researchers using prediction models to study pregnancy care found that “variables increasing the risk for late stillbirth included increased age and BMI, previous pregnancies with adverse effect, various comorbidities, and having an ART pregnancy” (Koivu & Sairanen, 2020, Conclusions). Koivu and Sairanen (2020) also used their prediction model to find that increased education level was inversely correlated with various pregnancy risks.

Difficulties with Machine Learning and Healthcare

Practitioners attempting to apply ML techniques to healthcare problems may encounter difficulties such as data privacy issues, ethical concerns, and rigorous testing thresholds (An et al., 2023). Many difficulties are also associated with acquiring quality datasets to train effective ML models (Kohli et al., 2017). Specific ML methods may also present unique challenges. Unsupervised ML, for example, may produce results that are difficult to interpret and not always meaningful in a clinical setting (An et al., 2023). Supervised ML techniques may not have these specific drawbacks; however, they are more dependent on quality labeled datasets (An et al., 2023).

Habehh and Gohel (2021) claim that one of the largest risks of applying ML techniques to healthcare is the dependence on probability. This probabilistic approach to medical issues has the potential to increase skepticism among stakeholders (Habehh & Gohel, 2021). Habehh and Gohel (2021) suggest that this risk could be remedied through strict regulatory oversight and subject matter expert approval.

Certain applications of ML to medical issues may already be subject to various regulations. Organizations offering ML-based medical imaging products, for example, may be subject to approval by the FDA (Kohli et al., 2017). Beyond the healthcare industry, many jurisdictions are working to implement general regulatory frameworks to govern how artificial intelligence (AI) and ML solutions are used in industry environments (Habehh & Gohel, 2021). Singapore proposed ethical guidelines for developing and implementing AI in 2019 (Habehh & Gohel, 2021). The United States has also issued executive orders concerning the regulation of AI (Habehh & Gohel, 2021).

Conclusion

ML has many potential applications within the field of healthcare, particularly for predictive diagnostics. Some of these applications include leveraging ML to pull meaningful data out of medical images, improving prenatal care, analyzing data pulled from wearable sensors, and even predicting mental disorders. Wearable sensors may improve access to personalized healthcare (Xiao et al., 2024) and help with the early detection of various disorders (Castelli et al., 2022). Various data sources, ranging from images to EEG signals, can be fed into ML models to aid in the detection of mental disorders (Yu & Vizcaychipi, 2022). Medical images can also be used with ML models for classification, segmentation, and detection problems (Yamashita et al., 2018).

However, there are still several risks and challenges that may need to be dealt with when applying ML techniques. Practitioners may face issues collecting quality training data due to general availability or perhaps even privacy concerns. There are also ethical and regulatory questions that may need to be addressed prior to implementing these technologies.

References

An, Q., Rahman, S., Zhou, J., & Kang, J. J. (2023). A Comprehensive Review on Machine Learning in Healthcare Industry: Classification, Restrictions, Opportunities and Challenges. Sensors (Basel, Switzerland), 23(9), 4178. https://doi.org/10.3390/s23094178

Castelli Gattinara Di Zubiena, F., Menna, G., Mileti, I., Zampogna, A., Asci, F., Paoloni, M., Suppa, A., Del Prete, Z., & Palermo, E. (2022). Machine Learning and Wearable Sensors for the Early Detection of Balance Disorders in Parkinson’s Disease. Sensors (Basel, Switzerland), 22(24), 9903. https://doi.org/10.3390/s22249903

Chen, H., Zhang, Y., Zhang, W., Liao, P., Li, K., Zhou, J., & Wang, G. (2017). Low-dose CT via convolutional neural network. Biomedical Optics Express, 8(2), 679. https://doi.org/10.1364/boe.8.000679

Douthwaite, M., & Georgiou, P. (2022). Wearable electrochemical sensors and machine learning for real-time sweat analysis. In Institution of Engineering and Technology eBooks (pp. 317–351). https://doi.org/10.1049/pbhe036e_ch10

Habehh, H., & Gohel, S. (2021). Machine learning in healthcare. Current Genomics, 22(4), 291–300. https://doi.org/10.2174/1389202922666210705124359

Hernandez Silveira, M., & Ang, S.-S. (2023). Applications of Machine Learning in Digital Healthcare (1st ed.). Institution of Engineering & Technology.

Kohli, M. D., Summers, R. M., & Geis, J. R. (2017). Medical Image Data and Datasets in the Era of Machine Learning—Whitepaper from the 2016 C-MIMI Meeting Dataset Session. Journal of Digital Imaging, 30(4), 392–399. https://doi.org/10.1007/s10278-017-9976-3

Koivu, A., & Sairanen, M. (2020). Predicting risk of stillbirth and preterm pregnancies with machine learning. Health Information Science and Systems, 8(1), 14. https://doi.org/10.1007/s13755-020-00105-9

Kopanitsa, G., Metsker, O., & Kovalchuk, S. (2023). Machine Learning Methods for Pregnancy and Childbirth Risk Management. Journal of Personalized Medicine, 13(6), 975. https://doi.org/10.3390/jpm13060975

Nash, C., Nair, R., & Naqvi, S. M. (2022). Machine Learning and ADHD Mental Health Detection - A Short Survey. 2022 25th International Conference on Information Fusion (FUSION), 1–8. https://doi.org/10.23919/FUSION49751.2022.9841277

Nishio, M., Nagashima, C., Hirabayashi, S., Ohnishi, A., Sasaki, K., Sagawa, T., Hamada, M., & Yamashita, T. (2017). Convolutional auto-encoder for image denoising of ultra-low-dose CT. Heliyon, 3(8), e00393. https://doi.org/10.1016/j.heliyon.2017.e00393

Sharma, C. M., & Chariar, V. M. (2024). Diagnosis of mental disorders using machine learning: Literature review and bibliometric mapping from 2012 to 2023. Heliyon, 10(12), e32548. https://doi.org/10.1016/j.heliyon.2024.e32548

Xiao, X., Yin, J., Xu, J., Tat, T., & Chen, J. (2024). Advances in Machine Learning for Wearable Sensors. ACS Nano, 18(34), 22734. https://doi.org/10.1021/acsnano.4c05851

Yamashita, R., Nishio, M., Do, R. K. G., & Togashi, K. (2018). Convolutional neural networks: an overview and application in radiology. Insights Into Imaging, 9(4), 611–629. https://doi.org/10.1007/s13244-018-0639-9

Yu, J., & Vizcaychipi, M. (2022). Brain networking and early diagnosis of Alzheimer’s disease with machine learning. In Institution of Engineering and Technology eBooks (pp. 197–227). https://doi.org/10.1049/pbhe036e_ch6

© Trevor French.