Publications

A Catalogue of Machine Learning Algorithms for Healthcare Risk Predictions

08/11/2022

Abstract

Extracting useful knowledge through proper data analysis is a very challenging task for efficient and timely decision-making. A plethora of machine learning (ML) algorithms exist for this purpose, and in healthcare the complexity increases further due to the domain’s requirements for analytics-based risk predictions. This manuscript proposes a data analysis mechanism, evaluated in diverse healthcare scenarios, towards constructing a catalogue of the most efficient ML algorithms for predicting the onset of a disease, depending on each healthcare scenario’s requirements and datasets. To this end, seven (7) different ML algorithms (Naïve Bayes, K-Nearest Neighbors, Decision Tree, Logistic Regression, Random Forest, Neural Networks, Stochastic Gradient Descent) have been executed on diverse healthcare scenarios (stroke, COVID-19, diabetes, breast cancer, kidney disease, heart failure). Based on a variety of performance metrics (accuracy, recall, precision, F1-score, specificity, confusion matrix), it has been identified that a subset of ML algorithms is more efficient for timely predictions under specific healthcare scenarios; hence, the envisioned ML catalogue prioritizes the ML algorithms to be used, depending on each scenario’s nature and the required metrics. Further evaluation must be performed considering additional scenarios and involving state-of-the-art techniques (e.g., cloud deployment, federated ML) to improve the mechanism’s efficiency.
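As a hedged illustration of the metrics named above, the sketch below derives them from a binary confusion matrix. It assumes a binary outcome where 1 means disease onset; the function name `binary_metrics` is hypothetical and is not taken from the described mechanism.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Derive common classification metrics from a binary confusion matrix (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix cells
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Report 0.0 instead of failing when a denominator is zero
        return num / den if den else 0.0

    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),            # sensitivity / true positive rate
        "specificity": ratio(tn, tn + fp),       # true negative rate
        "f1": ratio(2 * tp, 2 * tp + fp + fn),   # harmonic mean of precision and recall
    }
```

For example, with `y_true = [1, 0, 1, 1, 0, 0, 1, 0]` and `y_pred = [1, 0, 0, 1, 1, 1, 1, 0]` the counts are TP = 3, TN = 2, FP = 2 and FN = 1, giving accuracy 0.625, precision 0.6, recall 0.75, specificity 0.5 and F1 of about 0.667. Comparing such values across algorithms is what allows a catalogue to rank them per scenario.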

Authors

Argyro Mavrogiorgou, Athanasios Kiourtis, Spyridon Kleftakis, Konstantinos Mavrogiorgos, Nikolaos Zafeiropoulos, Dimosthenis Kyriazis

A Comparative Study of Monolithic and Microservices Architectures in Machine Learning Scenarios

01/10/2022

Abstract

Choosing the most suitable architecture for applications is not an easy decision. While almost all software giants have adopted the microservices architecture, for smaller platforms the decision is not so obvious. In the healthcare domain, and specifically when performing Machine Learning (ML) tasks in this domain, the decision should be based on specific metrics, considering the domain’s special characteristics. This research gap was investigated in the context of the beHEALTHIER platform, a platform that handles heterogeneous healthcare data towards their successful management and analysis by applying various ML tasks. An experiment was conducted by installing the platform in three (3) different architectural ways: the monolithic architecture, the clustered microservices architecture exploiting Docker Compose, and the microservices architecture exploiting a Kubernetes cluster. For these three (3) environments, time-based measurements were made for each Application Programming Interface (API) of the platform’s diverse functionalities (i.e., components), and useful conclusions were drawn towards the adoption of the most suitable software architecture.

Authors

Spyridon Kleftakis, Argyro Mavrogiorgou, Nikolaos Zafeiropoulos, Konstantinos Mavrogiorgos, Athanasios Kiourtis, Dimosthenis Kyriazis

Automated Rule-Based Data Cleaning Using NLP

01/10/2022

Abstract

Data Cleaning is a subfield of Data Mining that has been thriving in recent years. Ensuring the reliability of data, whether generated or received, is of vital importance to provide the best services possible to users. Accomplishing this task is easier said than done, since data are complex, generated at an extremely high rate, and of enormous size. A variety of techniques and methods from other subfields of Computer Science have been invoked to make Data Cleaning as efficient and effective as possible. Those subfields include, among others, Natural Language Processing (NLP), which in essence refers to the interaction between computers and human language, seeking ways to program computers to process and analyze huge volumes of human language data. NLP has existed for a long time, but over time it has been proposed for a variety of problems that are not solely NLP-related. In this paper, a rule-based data cleaning mechanism is proposed, which utilizes NLP to ensure data reliability. Making use of NLP enabled the mechanism not only to be extremely effective but also to be considerably more efficient than corresponding mechanisms that do not utilize NLP. The mechanism was evaluated on diverse healthcare datasets; however, it is not limited to the healthcare domain, supporting a generalized data cleaning concept.
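The abstract does not detail the rules or how NLP is applied to them, so the following is only a minimal sketch of the general rule-based idea, with no NLP component: each field is checked by a rule that returns a cleaned value, or `None` when the value is invalid. The helpers `range_rule`, `category_rule` and `clean_records`, as well as the example fields, are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

# A rule maps a raw field value to a cleaned value, or to None when the value is invalid.
Rule = Callable[[Any], Optional[Any]]


def range_rule(low: float, high: float) -> Rule:
    """Accept numeric values (or numeric strings) inside [low, high]; reject the rest."""
    def check(value: Any) -> Optional[Any]:
        try:
            number = float(value)
        except (TypeError, ValueError):
            return None
        return number if low <= number <= high else None
    return check


def category_rule(mapping: Dict[str, str]) -> Rule:
    """Normalise free-text categories through a lookup table; reject unknown ones."""
    def check(value: Any) -> Optional[Any]:
        if not isinstance(value, str):
            return None
        return mapping.get(value.strip().lower())
    return check


def clean_records(records: List[Dict[str, Any]], rules: Dict[str, Rule]) -> List[Dict[str, Any]]:
    """Apply per-field rules; invalid values become None for later reporting or imputation."""
    cleaned = []
    for record in records:
        row = dict(record)  # leave the input untouched
        for field, rule in rules.items():
            if field in row:
                row[field] = rule(row[field])
        cleaned.append(row)
    return cleaned
```

For instance, with the rules `{"age": range_rule(0, 120), "sex": category_rule({"m": "male", "male": "male", "f": "female", "female": "female"})}`, the records `{"age": "45", "sex": " M "}` and `{"age": -3, "sex": "unknown"}` become `{"age": 45.0, "sex": "male"}` and `{"age": None, "sex": None}`. Marking invalid values rather than dropping whole records leaves the choice between imputation and removal to a later step.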

Authors

Konstantinos Mavrogiorgos, Argyro Mavrogiorgou, Athanasios Kiourtis, Nikolaos Zafeiropoulos, Spyridon Kleftakis, Dimosthenis Kyriazis

A Comparative Study of Collaborative Filtering in Product Recommendation

01/09/2022

Abstract

Product recommendation is a well-known technique for bringing customers and products together. With applications in music, e-shops, or almost any platform users deal with daily, a recommendation system’s sole scope is to help existing customers, and attract new ones, to discover new products. Through product recommendation, transaction costs can also be decreased, improving overall decision-making and quality. To perform recommendations, a recommendation system must utilize customer feedback, such as habits, interests, and prior transactions, as well as information used in customer profiling, and finally deliver suggestions. Hence, data is the key factor in choosing the appropriate recommendation method and drawing specific suggestions. This research investigates the data challenges of recommendation systems, specifying collaborative-based, content-based, and hybrid-based recommendations. In this context, collaborative filtering is explored, with the Surprise library and LightFM embeddings being analysed and compared on foodservice transactional data. The involved algorithms’ metrics are identified and parameterized, while hyperparameters are tuned on this transactional data, concluding that LightFM provides more efficient recommendation results according to the evaluation’s precision and recall outcomes. Nevertheless, even though it is outperformed by LightFM, the Surprise library should be used when constructing user-friendly models that require low code and few technicalities.
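Since the comparison above rests on precision and recall, the sketch below shows one common way to compute them for top-k recommendations: precision@k and recall@k per user, averaged over users. The cut-off `k`, the averaging scheme and the function names are assumptions, as the abstract does not state them, and the code is independent of both the Surprise and LightFM APIs.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(recommended: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision@k and recall@k for one user's ranked recommendation list."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k  # divides by k even if fewer than k items were recommended
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_precision_recall_at_k(
    recs: Dict[str, List[str]], truth: Dict[str, Set[str]], k: int
) -> Tuple[float, float]:
    """Average per-user scores over users that have held-out (relevant) items."""
    users = [u for u in recs if truth.get(u)]
    if not users:
        return 0.0, 0.0
    scores = [precision_recall_at_k(recs[u], truth[u], k) for u in users]
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r
```

With `recs = {"u1": ["coffee", "bagel", "juice"], "u2": ["salad", "soup", "tea"]}`, `truth = {"u1": {"bagel", "muffin"}, "u2": {"tea"}}` and `k = 2`, user `u1` scores (0.5, 0.5), user `u2` scores (0.0, 0.0), and the averages are (0.25, 0.25). Raising `k` to 3 catches the `tea` hit for `u2` and lifts mean recall to 0.75, which is why the chosen cut-off should be reported alongside the scores.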

Authors

Agori Argyro Patoulia, Athanasios Kiourtis, Argyro Mavrogiorgou, Dimosthenis Kyriazis

Interpretable Stroke Risk Prediction Using Machine Learning Algorithms

01/08/2022

Abstract

Stroke is the second most common cause of death globally, according to the World Health Organization (WHO). Information Technology (IT), and especially Machine Learning (ML), may be beneficial in many aspects of stroke management. However, the majority of existing studies focus on developing ML models for such cases without checking the degree of confidence and reliability of the constructed models. To strengthen the models’ performance, diverse metric functions have to be estimated, and the most important features of the underlying datasets have to be identified. Thus, this paper studies whether the results of diverse ML models are true and realistic, based on diverse metric functions, to verify that they yield efficient and reliable results. With this in mind, a plethora of models are built to predict the likelihood of stroke, namely Support Vector Classifier, K-Nearest Neighbors, Logistic Regression, Random Forest, XGB Classifier, and LGBM Classifier. All the captured results are compared based on the chosen metric functions, concluding with the most suitable and accurate model for stroke prediction.

Authors

Nikolaos Zafeiropoulos, Argyro Mavrogiorgou, Spyridon Kleftakis, Konstantinos Mavrogiorgos, Athanasios Kiourtis, Dimosthenis Kyriazis

A Comparative Study of ML Algorithms for Scenario-agnostic Predictions in Healthcare

01/07/2022

Abstract

The extraction of useful knowledge from collected data has always been the holy grail for enterprises and researchers, supporting efficient decision making, optimization of the provided services, and profit maximization. However, this task is easier said than done, since it presupposes the application of complex mathematical models/algorithms. Data Analysis has prospered due to the continuous demand to simplify and optimize the knowledge extraction process. Several mechanisms have been developed in different domains, consisting of various techniques to analyze specific data. The need for such mechanisms is even greater in healthcare, where data of varying complexity may provide highly valuable knowledge if properly analyzed. Considering these challenges, this paper proposes a mechanism for performing Data Analysis on healthcare data from diverse scenarios to extract valuable insights. The mechanism can collect data and apply several Machine Learning algorithms to obtain the best result for predicting certain features of the provided data.

Authors

Argyro Mavrogiorgou, Spyridon Kleftakis, Nikolaos Zafeiropoulos, Konstantinos Mavrogiorgos, Athanasios Kiourtis, Dimosthenis Kyriazis

Digital Twin in Healthcare Through the Eyes of the Vitruvian Man

20/06/2022

Abstract

In recent years, with the development of technology, a huge amount of data has been collected worldwide in Electronic Health Records (EHRs). Although vast progress has been made with the use of artificial intelligence in various areas of the health domain and for specific problems, to date there is no holistic approach to a patient’s state of health using these technologies. Digital Twin refers to a complete physical and functional description of an item, product, or system, which includes practically all the information that could be useful in all its current and future life cycle phases. This paper presents the Digital Twin of Patient platform, which uses state-of-the-art technologies such as Microservice Architecture (MSA), containerization (Docker), orchestration (Kubernetes), and Machine Learning Operations (MLOps), and is inspired by Leonardo da Vinci’s Vitruvian Man. To achieve that, the platform’s architecture is designed with multiple clusters of Docker containers under Kubernetes orchestration. Specific parts or organs of the human body are represented by clusters called “digital_twin_components” (DTCs). The set of those DTCs structures the “patient_digital_twin” cluster, in which appropriate pipelines define and monitor in real time the “best” possible construction of the patient’s digital twin.

Authors

Spyridon Kleftakis, Argyro Mavrogiorgou, Konstantinos Mavrogiorgos, Athanasios Kiourtis, Dimosthenis Kyriazis

A Multi-layer Approach for Data Cleaning in the Healthcare Domain

11/03/2022

Abstract

It is an undeniable fact that nowadays there exists a plethora of sources that can generate data of a complex and, most of the time, error-prone nature, as well as of multiple origins. Those sources may be of different complexity, but most of them share a common characteristic: the lack of quality checks on the collected data. This implies that every platform that utilizes data originating from those sources should include a mechanism responsible for assuring the reliability of the collected data, thus providing the rest of the platform's mechanisms (e.g., risk analysis and prediction mechanisms) with high-quality data that could lead to the best possible knowledge extraction for decision making.

Authors

Konstantinos Mavrogiorgos, Athanasios Kiourtis, Argyro Mavrogiorgou, Spyridon Kleftakis, Dimosthenis Kyriazis

A Comparative Study of MongoDB, ArangoDB and CouchDB for Big Data Storage

15/08/2021

Abstract

A distinctive aspect of the current era is the enormous amount of data that is generated and processed on a daily basis. It is no wonder that this epoch is generally characterized as the “Era of Big Data”. Thus, many enterprises and research initiatives strive to find ways to effectively and efficiently collect, store, and analyze Big Data in order to improve their services and make efficient decisions. Those approaches span several domains, such as healthcare, transportation, governance, and insurance. Towards this direction, this paper contributes to the selection of the most appropriate database for efficiently storing and retrieving Big Data. More specifically, taking into account the nature of Big Data and the main categories of databases that currently exist, three (3) NoSQL document-based databases were considered for this comparative study, namely ArangoDB, MongoDB, and CouchDB. The performance of these databases was measured based on specific metrics and criteria, including the total execution time for the same CRUD operations and their corresponding resource demands, concluding with the most suitable database for storing Big Data.
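The comparison above measures the total execution time of the same CRUD operations on each database. A hedged sketch of such a timing harness follows. `InMemoryStore` is a hypothetical stand-in exposing `create`/`read`/`update`/`delete` methods, not the API of ArangoDB, MongoDB or CouchDB; a real benchmark would wrap each database's own driver behind the same four methods. Resource demands (CPU, memory) are not measured here.

```python
import time
from typing import Any, Callable, Dict, List


class InMemoryStore:
    """Hypothetical stand-in for a document store; swap in a wrapper around a real driver."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def create(self, key: str, doc: Dict[str, Any]) -> None:
        self.docs[key] = dict(doc)

    def read(self, key: str) -> Dict[str, Any]:
        return self.docs[key]

    def update(self, key: str, changes: Dict[str, Any]) -> None:
        self.docs[key].update(changes)

    def delete(self, key: str) -> None:
        del self.docs[key]


def time_crud(store: Any, docs: List[Dict[str, Any]]) -> Dict[str, float]:
    """Run an identical create/read/update/delete workload; return wall-clock seconds per phase."""
    keys = [f"doc-{i}" for i in range(len(docs))]

    def run(op: Callable[[str, Dict[str, Any]], Any], args: List[Dict[str, Any]]) -> float:
        start = time.perf_counter()
        for key, arg in zip(keys, args):
            op(key, arg)
        return time.perf_counter() - start

    # Phases run in this order, so each one operates on the state left by the previous one
    timings = {
        "create": run(store.create, docs),
        "read": run(lambda key, _doc: store.read(key), docs),
        "update": run(store.update, [{"checked": True}] * len(docs)),
        "delete": run(lambda key, _doc: store.delete(key), docs),
    }
    timings["total"] = sum(timings.values())
    return timings
```

Calling `time_crud(InMemoryStore(), [{"value": i} for i in range(100)])` returns the seconds spent in each phase plus their total and leaves the store empty, so the same workload can be replayed against every candidate database for a like-for-like comparison.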

Authors

Konstantinos Mavrogiorgos, Athanasios Kiourtis, Argyro Mavrogiorgou, Dimosthenis Kyriazis

beHEALTHIER: A Microservices Platform for Analyzing and Exploiting Healthcare Data

14/07/2021

Abstract

The era of big data is surrounded by plenty of challenges concerning data quality, data management, and data analysis. Many of these challenges are met in several domains, such as healthcare, where healthcare platforms not only have to manage and/or analyze a tremendous quantity of health data, but also have to accomplish these actions in the most efficient and secure way possible. Towards this direction, medical institutions are paying attention to replacing traditional approaches, such as the Monolithic and the Service Oriented Architecture (SOA), which face many difficulties in handling the increasing amount of healthcare data. This paper presents a platform that overcomes these issues by adopting the Microservice Architecture (MSA), being able to efficiently manage and analyze these vast amounts of data. More specifically, the proposed platform, namely beHEALTHIER, offers the ability to construct health policies out of collective knowledge, by utilizing a newly proposed kind of electronic health records (i.e., eXtended Health Records (XHRs)) and their corresponding networks, through the efficient analysis and management of ingested healthcare data. To achieve that, beHEALTHIER is architected upon four (4) discrete and interacting pillars, namely the Data, the Information, the Knowledge, and the Actions pillars. Since the proposed platform is based on MSA, it fully utilizes MSA's benefits, achieving fast response times and efficient mechanisms for healthcare data collection, processing, and analysis.

Authors

Argyro Mavrogiorgou, Spyridon Kleftakis, Konstantinos Mavrogiorgos, Nikolaos Zafeiropoulos, Andreas Menychtas, Athanasios Kiourtis, Ilias Maglogiannis, Dimosthenis Kyriazis

Analyzing Collective Knowledge Towards Public Health Policy Making

07/07/2021

Abstract

Nowadays there exists a plethora of diverse data sources producing tons of healthcare data, increasing the volume of data that is finally stored both in Electronic Health Records (EHRs) and in Personal Health Records (PHRs). Thus, the great challenge that emerges is not only to gather all this data in an efficient and effective manner, but also to extract knowledge out of it. The latter is the key factor that enables healthcare professionals to take serious clinical decisions both at the individual and at the collective level, finally forming representative public health policies. Towards this direction, the current paper proposes a system that supports a new paradigm of EHRs, the eXtended Health Records (XHRs), which include the majority of the health determinants. XHRs are then transformed into XHR Networks that capture the clinical, social, and human context of diverse population segments, producing the corresponding collective knowledge. By exploiting this knowledge, the proposed system is finally able to create multi-modal policies, addressing various facts and evolving risks that arise from diverse population segments.

Authors

Spyridon Kleftakis, Konstantinos Mavrogiorgos, Nikolaos Zafeiropoulos, Argyro Mavrogiorgou, Athanasios Kiourtis, Ilias Maglogiannis, Dimosthenis Kyriazis

An Optimized KDD Process for Collecting and Processing Ingested and Streaming Healthcare Data

01/07/2021

Abstract

Nowadays organizations are surrounded by enormous amounts of data, losing the important information that resides in it. Knowledge Discovery in Databases (KDD) can help organizations transform this data into valuable information by extracting complex patterns and relationships from it. To achieve that, various KDD techniques and tools have been proposed, with impressive outcomes in various domains, especially in healthcare. Due to the huge amount of data available within healthcare systems, data mining is extremely important for the healthcare sector. However, of equal importance is the way in which the data is collected, preprocessed, and integrated, considering its heterogeneous and diverse nature and format. To address all these challenges, this paper proposes a generalized KDD approach, which in essence supplements the existing approaches that study and analyse the data mining part of the KDD process. This approach primarily concentrates on the selection, preprocessing, and transformation phases of the collected healthcare data, which are considered to be of great importance for its successful mining, analysis, and interpretation. The prototype of the proposed approach provides an example of the developed mechanism, explaining its phases in detail and verifying its potential for wide applicability and adoption in various healthcare scenarios.
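To make the three phases concrete, the following is a minimal sketch of selection, preprocessing and transformation over plain Python records. The specific operations (dropping incomplete records, min-max scaling) and all function and field names are illustrative assumptions, not the operations of the proposed approach.

```python
from typing import Any, Dict, List, Sequence

Record = Dict[str, Any]


def select(records: List[Record], attributes: Sequence[str]) -> List[Record]:
    """Selection: keep only the attributes relevant to the mining goal."""
    return [{a: r.get(a) for a in attributes} for r in records]


def preprocess(records: List[Record]) -> List[Dict[str, float]]:
    """Preprocessing: drop incomplete or non-numeric records and cast values to float."""
    clean = []
    for r in records:
        if any(v is None or v == "" for v in r.values()):
            continue  # incomplete record
        try:
            clean.append({k: float(v) for k, v in r.items()})
        except (TypeError, ValueError):
            continue  # a value cannot be interpreted as a number
    return clean


def transform(records: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Transformation: min-max scale every attribute to the range [0, 1]."""
    if not records:
        return []
    keys = list(records[0])
    lo = {k: min(r[k] for r in records) for k in keys}
    hi = {k: max(r[k] for r in records) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in keys}
        for r in records
    ]
```

Chained as `transform(preprocess(select(raw, ["age", "glucose"])))` on records with ages 40, 60, missing and 50, the incomplete record is discarded and the remaining ages become 0.0, 1.0 and 0.5. Keeping each phase a separate function mirrors the idea that selection, preprocessing and transformation can be tuned independently before mining.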

Authors

Argyro Mavrogiorgou, Athanasios Kiourtis, George Manias, Dimosthenis Kyriazis
