With the introduction of electronic medical record (EMR) systems and the increase in their use, many researchers have conducted a variety of studies based on the secondary use of clinical data.
In addition, multi-institutional studies using the Common Data Model (CDM) are being actively conducted through Distributed Research Networks (DRNs).
Because the reliability of such studies depends on the quality of the underlying data, many studies on data quality assessment have been conducted, and the DRN provides Data Quality Assessment (DQA) tools to help manage data quality. However, the terminology, evaluation criteria, and methods differ between the data quality studies and the DQA tools.
In this study, we developed a DQA model that assesses data quality with a standardized assessment method and verified its performance by assessing both source data and data converted to the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM).
To select appropriate criteria for assessing clinical data quality, documents from five data quality frameworks, one data quality guideline, and three DQA tools were reviewed, and seven data quality concepts were selected as the subjects of standardized quality assessment. The data quality rules provided by Achilles, DQe-c, and the DQ tool kit, together with rules compiled from the relevant literature, were classified by data quality concept and integrated; two researchers performed this work independently and cross-validated their results. After excluding 302 redundant data quality rules, a total of 1,255 quality rules were integrated; 600 of these were applied to the source data and the remaining 655 to the OMOP-CDM.
The model's evaluation process was designed in three levels. Rule-based data quality assessment and statistical analysis were built into the model so that it can visualize both the measured data quality information and the analysis results. To evaluate the performance of the data quality assessment model, collaborating researchers independently created test data sets by injecting error data that violate the data quality concepts into the source data and the OMOP-CDM. When the model's error detection performance was verified on these test data, the number of detected errors matched the number of errors injected for each data quality concept.
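As a purely illustrative sketch (with hypothetical table and column names, not the rules actually used in the study), a rule-based check of the kind performed at this stage can flag completeness and plausibility violations and report an error rate per rule:

    import pandas as pd

    # Hypothetical OMOP-CDM PERSON-like table, used only for illustration.
    person = pd.DataFrame({
        "person_id":         [1, 2, 3, 4],
        "gender_concept_id": [8507, None, 8532, 8507],   # missing value violates completeness
        "year_of_birth":     [1980, 1955, 2090, 1993],   # 2090 violates a plausibility range
    })

    def check_completeness(df, column):
        """Rule: the column must not contain missing values."""
        return df[df[column].isna()]

    def check_plausibility(df, column, low, high):
        """Rule: values must lie within a plausible range [low, high]."""
        return df[~df[column].between(low, high)]

    violations = {
        "completeness: gender_concept_id": check_completeness(person, "gender_concept_id"),
        "plausibility: year_of_birth":     check_plausibility(person, "year_of_birth", 1900, 2024),
    }

    for rule, rows in violations.items():
        # Error rate per rule = violating rows / total rows; such rates feed the quality index below.
        print(rule, "error rate:", len(rows) / len(person))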
To compare error detection performance between the three DQA tools and the data quality assessment model, four data quality rules that all of them evaluate in common were applied to the test data sets. The proposed model achieved an accuracy and AUROC of 1, confirming its advantage over the existing DQA tools in detecting error data. In addition, a quality index per data quality concept and a comprehensive quality index were proposed as objective indicators of data quality. The quality index is derived from the weighted average error rate, obtained by multiplying the error rate within the data by the weight assigned to each data concept.
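As a hedged formalization (illustrative notation only; the exact definitions, normalization, and weighting granularity are those given in the study), the per-concept quality index can be read as a weighted average error rate, and the comprehensive index described next as a weighted combination of the per-concept indices:

    \mathrm{QI}_c = \frac{\sum_i w_i e_i}{\sum_i w_i}, \qquad \mathrm{CQI} = \sum_c W_c \, \mathrm{QI}_c

where e_i is an error rate measured under concept c, w_i the corresponding weight, and W_c the survey-derived weight of concept c.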
To derive the comprehensive quality index, each calculated quality index is multiplied by the weight of its data quality concept; these weights were determined through a survey that took data usage groups into account. The data quality assessment model developed in this study detects errors with higher performance than the existing evaluation tools and provides an objective data quality index, enabling data quality to be verified and compared before and after CDM conversion. It is expected to help users performing CDM conversion identify the causes of errors more easily and to help researchers judge whether the data are suitable for their use.