Reducing defects in the datasets of clinical research studies: conformance with data quality metrics
BMC Med Res Methodol.
Shaheen NA (1,2,3), Manezhi B (4), Thomas A (5,6,7), AlKelya M (8,6,7,9).
1 Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center, P.O. Box 22490, Mail Code 1515, Riyadh, 11426, Kingdom of Saudi Arabia.
2 King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia.
3 Ministry of National Guard-Health Affairs, Riyadh, Kingdom of Saudi Arabia.
4 Public Health Division, Central Australian Aboriginal Congress, Alice Springs, Australia.
5 Department of Biostatistics and Bioinformatics, King Abdullah International Medical Research Center, P.O. Box 22490, Mail Code 1515, Riyadh, 11426, Kingdom of Saudi Arabia.
6 King Saud bin Abdulaziz University for Health Sciences, Riyadh, Kingdom of Saudi Arabia.
7 Ministry of National Guard-Health Affairs, Riyadh, Kingdom of Saudi Arabia.
8 Research Quality Management Section, King Abdullah International Medical Research Center, Riyadh, Kingdom of Saudi Arabia.
9 Center for Health Research Studies, Saudi Health Council, Riyadh, Kingdom of Saudi Arabia.
A dataset is indispensable for answering the research questions of a clinical research study. Inaccurate data lead to ambiguous results, and correcting errors increases cost. The aim of this Quality Improvement Project (QIP) was to improve data quality (DQ) by enhancing conformance and minimizing data entry errors.
This QIP was conducted in the Department of Biostatistics using historical datasets that had been submitted for statistical data analysis, retrieved from the department's knowledge base system. Forty-five datasets received for statistical data analysis were included at baseline. A 12-item checklist based on six DQ domains was developed to assess DQ: (i) completeness, (ii) uniqueness, (iii) timeliness, (iv) accuracy, (v) validity, and (vi) consistency. The checklist items were missing values, un-coded values, miscoded values, embedded values, implausible values, unformatted values, missing codebook, inconsistencies with the codebook, inaccurate format, unanalyzable data structure, missing outcome variables, and missing analytic variables. The outcome was the number of defects per dataset. The DMAIC (Define, Measure, Analyze, Improve, Control) quality improvement framework and Six Sigma improvement tools were used. A pre-post design was implemented to evaluate the interventions, and the pre-post change in defect category (zero, one, or two or more defects) was compared using a chi-square test.
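As an illustration of how the outcome could be derived, the sketch below scores a dataset against the 12 checklist items and collapses the defect count into the three categories used in the pre-post comparison. It assumes, because the abstract does not say, that each failed checklist item counts as exactly one defect. The function names (`count_defects`, `defect_category`, `tabulate`) and the example reviews are hypothetical and are not the authors' tooling.

```python
from collections import Counter

# The 12 checklist items as listed in the abstract.
CHECKLIST_ITEMS = (
    "missing values",
    "un-coded values",
    "miscoded values",
    "embedded values",
    "implausible values",
    "unformatted values",
    "missing codebook",
    "inconsistencies with the codebook",
    "inaccurate format",
    "unanalyzable data structure",
    "missing outcome variables",
    "missing analytic variables",
)


def count_defects(failed_items: set[str]) -> int:
    """Count defects, ASSUMING each failed checklist item is one defect."""
    unknown = failed_items - set(CHECKLIST_ITEMS)
    if unknown:
        raise ValueError(f"not a checklist item: {sorted(unknown)}")
    return len(failed_items)


def defect_category(n_defects: int) -> str:
    """Collapse a defect count into the categories zero / one / two or more."""
    if n_defects < 0:
        raise ValueError("defect count cannot be negative")
    if n_defects == 0:
        return "zero"
    if n_defects == 1:
        return "one"
    return "two or more"


def tabulate(reviews: list[set[str]]) -> Counter:
    """Tally datasets by defect category from (hypothetical) reviewer findings."""
    return Counter(defect_category(count_defects(r)) for r in reviews)


if __name__ == "__main__":
    # Hypothetical reviews of three datasets; these are not study data.
    reviews = [
        set(),
        {"missing codebook"},
        {"missing values", "implausible values", "inaccurate format"},
    ]
    print(tabulate(reviews))
```

Rejecting unknown item names keeps reviewer input tied to the fixed checklist, so a typo cannot silently inflate or deflate a dataset's defect count.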
At baseline, of the forty-five datasets, six (13.3%) had zero defects, eight (17.8%) had one defect, and 31 (69%) had two or more defects. The association between the nature of data capture (single vs. multiple data points) and defective data was statistically significant (p = 0.008). Twenty-one datasets were received for statistical data analysis during the post-intervention period: seventeen (81%) had zero defects, two (9.5%) had one defect, and two (9.5%) had two or more defects. The proportion of datasets with zero defects increased from 13.3% to 81%, whereas the proportion with two or more defects decreased from 69% to 9.5% (p < 0.001).
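The reported pre-post comparison can be roughly reproduced from the counts in the abstract. The sketch below assumes a Pearson chi-square test without continuity correction on the 2 x 3 table of dataset counts (baseline vs. post-intervention, by zero, one, or two or more defects); the abstract does not state which variant was used. With 2 degrees of freedom the upper-tail p-value has the closed form exp(-stat/2), so no statistics library is needed. The function names are illustrative only.

```python
import math

# Dataset counts reported in the abstract: zero, one, two or more defects.
BASELINE = (6, 8, 31)  # n = 45
POST = (17, 2, 2)      # n = 21


def pearson_chi_square(table: list[tuple[int, ...]]) -> tuple[float, int]:
    """Return Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of period and defect category.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value; valid ONLY for exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


if __name__ == "__main__":
    for label, row in (("baseline", BASELINE), ("post", POST)):
        shares = [round(100 * c / sum(row), 1) for c in row]
        print(label, shares)
    stat, dof = pearson_chi_square([BASELINE, POST])
    print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value_df2(stat):.1e}")
```

On these counts the statistic is about 29.5 with 2 degrees of freedom, giving p far below 0.001, which is consistent with the abstract. The computed baseline share for 31 of 45 datasets is 68.9%, which the abstract rounds to 69%. One expected count (about 3.2, for post-intervention datasets with one defect) falls below 5, so an exact test could serve as a cross-check.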
Clinical research study teams often have limited knowledge of data structuring. Given the need for good-quality data, we recommend training programs, consultation with data experts before data structuring, and the use of electronic data capture methods.