Statistical Design of a Sampling Method in Quality Control of Research Data

Document Type: Original Article

Author

Associate Professor of Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran

10.22034/aimj.2022.163827

Abstract

In the scientific literature, indexing and quality control are key processes: when they are done correctly, researchers can reliably retrieve records through search engines. At the same time, mechanisms such as mistake-proofing and user empowerment have relieved research organizations of the need for 100% inspection. Limits on the availability of specialized staff have further increased the importance of sampling methods. Although sampling methods for physical, tangible products are well covered in the literature, little work has been done on data, and on research data in particular. This research provides a framework for sampling in data quality control processes and develops a statistical design algorithm that minimizes type I and type II errors. As a case study, the method was implemented on Ganj, the database that disseminates information on the theses and dissertations (Parsa) of graduates nationwide. The results showed that, because many thesis and dissertation information items are of acceptable quality after registration, sampling is vital for improving the efficiency of the information organization and analysis unit. A further result is the classification of information items into three categories (critical, major, and minor), with the sample size and sampling method determined for each category. The framework presented here can be localized for various data-driven organizations, especially businesses based on research data. Because any revision of the AQL and LTPD values affects type I and type II errors, the algorithms developed in this research must be re-applied whenever these values change; the resulting sample size, acceptance number, and rejection number are then updated accordingly.
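The dependence of the sampling plan on AQL and LTPD can be illustrated with a minimal sketch. It is not the algorithm developed in this research, which the abstract does not spell out; it is the textbook single-sampling design under a binomial model, with type I error (producer's risk) bounded by `alpha` at AQL and type II error (consumer's risk) bounded by `beta` at LTPD. The function names, the default risk levels, and the AQL and LTPD values in the usage lines are all assumptions chosen for illustration.

```python
from math import comb


def accept_prob(n, c, p):
    """Probability of accepting a batch: P(X <= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def design_single_plan(aql, ltpd, alpha=0.05, beta=0.10, max_n=5000):
    """Return the smallest-n plan (n, c), or None if none exists up to max_n.

    Constraints (assumed, standard two-point design):
      type I error:  1 - accept_prob(n, c, aql)  <= alpha
      type II error:     accept_prob(n, c, ltpd) <= beta
    """
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            # Acceptance probability at LTPD grows with c, so once the
            # consumer's risk is exceeded no larger c can work for this n.
            if accept_prob(n, c, ltpd) > beta:
                break
            if accept_prob(n, c, aql) >= 1 - alpha:
                return n, c
    return None


# Hypothetical quality levels; revising them changes the resulting plan.
plan = design_single_plan(aql=0.01, ltpd=0.05)
if plan is not None:
    n, c = plan
    print(f"sample size={n}, acceptance number={c}, rejection number={c + 1}")
```

Running the design again with different AQL or LTPD values yields a different sample size and acceptance number, which is why the plan has to be recomputed whenever those values are revised. In a single-sampling plan the rejection number is simply the acceptance number plus one.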

  • Receive Date: 06 February 2022
  • Revise Date: 28 June 2022
  • Accept Date: 08 December 2022