A Multi-layer Approach for Data Cleaning in the Healthcare Domain
Nowadays, a plethora of sources generate data that are complex, frequently error-prone, and of multiple origins.
These sources differ in complexity, but most of them share a common
characteristic: they perform no quality checks on the data they collect. This implies that, in every platform that utilizes data
originating from those sources, there should be a mechanism responsible for assuring the reliability of the collected data, so that
the rest of the platform's mechanisms (e.g., risk analysis and prediction mechanisms) receive high-quality data that can lead to the best possible knowledge
extraction for decision making. The need for this kind of mechanism is even greater in the healthcare domain because the clean
data, which a data cleaning mechanism produces, are essential for bringing consistency to healthcare data that might be inaccurate, outdated, redundant or
incomplete. Considering these challenges, this paper proposes a data cleaning mechanism for assuring the quality and reliability of
data regardless of their origin. The mechanism consists of three (3) sub-components that are responsible for ingesting and storing the data and for
applying a set of cleaning actions. These actions, namely “Validation”, “Cleaning”, “Verification” and “Logging”, combine multiple well-established
data cleaning techniques to ensure the effectiveness and efficiency of the whole data cleaning procedure. The mechanism is evaluated on
three (3) separate datasets from the healthcare domain that contain different types of data and different errors in their records.
The results of the mechanism (i.e., the cleaned data) are compared with the ground truth of these datasets, showing that the data
cleaning was performed successfully and efficiently, and providing extensive insight into the mechanism's capabilities.
Konstantinos Mavrogiorgos, Athanasios Kiourtis, Argyro Mavrogiorgou, Spyridon Kleftakis, Dimosthenis Kyriazis