Statistical Design of a Sampling Method in Quality Control of Research Data

Ershadi, Mohammad Javad

doi:10.22034/aimj.2022.163827

Article type: Research article

Author

Mohammad Javad Ershadi

Associate Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran

Abstract

For scientific documents, indexing and quality control are key processes that, when carried out correctly, enable proper retrieval through search engines. The scientific literature has dealt adequately with sampling methods for physical products, but little work has been done in the field of data, especially research data. This study provides a framework for sampling in data quality control processes. As a case study, the research data of the database for the dissemination of information on theses/dissertations (PARSA) of graduates nationwide (Ganj) were selected. According to the results, given the acceptable quality of many PARSA information items after registration, sampling is vital for improving the efficiency of the information organization and analysis unit. The operating characteristic (OC) curves for the various plans show that the proposed plans perform well in assessing the quality level of research data. The framework presented in this study can be localized for various data-driven organizations, especially data-based businesses.
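As a hedged illustration of the OC curve mentioned above: assume a single sampling plan with sample size $n$ and acceptance number $c$, and assume the number of defective records found in the sample follows a binomial distribution. These symbols and the binomial model are assumptions made for illustration, not details taken from the study. Under those assumptions, the probability of accepting a batch whose true defective fraction is $p$ is

\[
P_a(p) = \sum_{d=0}^{c} \binom{n}{d}\, p^{d}\, (1-p)^{n-d}
\]

Plotting $P_a(p)$ against $p$ gives the OC curve of the plan. A curve that stays near 1 for low $p$ and drops quickly as $p$ grows discriminates well between good and poor batches.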

Keywords

Data quality

Sampling

Quality control

OC curve

Organization

Information analysis

Article title (English)

Statistical Design of a Sampling Method in Quality Control of Research Data

Author (English)

Mohammad Javad Ershadi

Associate Professor, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran

Abstract (English)

For scientific documents, indexing and quality control are key processes that, when carried out correctly, allow researchers to retrieve those documents properly through search engines. At the same time, mechanisms such as error-proofing and user empowerment have freed research organizations from the need for 100% inspection. Limits on the availability of specialized staff have further increased the importance of sampling methods. Although the scientific literature has dealt adequately with sampling methods for physical, tangible products, little work has been done in the field of data, especially research data. This study provides a framework for sampling in data quality control processes and develops a statistical design algorithm that minimizes type I and type II errors. As a case study, the research method was applied to the database for the dissemination of information on theses/dissertations (PARSA) of graduates nationwide (Ganj). The results show that, given the acceptable quality of many PARSA information items after registration, sampling is vital for improving the efficiency of the information organization and analysis unit. A further result is the classification of information items into three categories (critical, major, and minor), with the sample size and sampling method determined for each category. The framework can be localized for various data-driven organizations, especially businesses based on research data. Because any revision of the acceptable quality level (AQL) and lot tolerance percent defective (LTPD) values affects the type I and type II errors, the algorithms developed in this study must be rerun for the new values. Doing so updates outputs such as the sample size, acceptance number, and rejection number.
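The dependence of the plan on AQL and LTPD can be made concrete with a minimal sketch. It is not the algorithm developed in the study. It assumes a single sampling plan, a binomial model for the number of defective records in the sample, and a simple exhaustive search. The function names, the default risk levels `alpha` and `beta`, and the AQL/LTPD values in the usage lines are all hypothetical choices made for illustration.

```python
from math import comb


def acceptance_probability(n, c, p):
    """Probability of finding at most c defectives in a sample of n
    when the true defective fraction is p (binomial model, assumed)."""
    return sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))


def design_single_plan(aql, ltpd, alpha=0.05, beta=0.10, max_n=2000):
    """Return the smallest sample size n, with an acceptance number c, such that
    the type I error at AQL is at most alpha and the type II error at LTPD
    is at most beta. Returns None if no plan is found up to max_n."""
    for n in range(1, max_n + 1):
        for c in range(n + 1):
            # Type II error: accepting a batch whose quality is as bad as LTPD.
            if acceptance_probability(n, c, ltpd) > beta:
                break  # a larger c would only accept bad batches more often
            # Type I error: rejecting a batch whose quality is as good as AQL.
            if 1 - acceptance_probability(n, c, aql) <= alpha:
                return n, c
    return None


# Hypothetical inputs; the study's actual AQL and LTPD values are not given here.
plan = design_single_plan(aql=0.01, ltpd=0.06)
print(plan)
```

Rerunning `design_single_plan` with revised `aql` or `ltpd` arguments yields a new sample size and acceptance number, which mirrors the point made above that the plan has to be recomputed whenever those values change.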

Keywords (English)

Data quality

Sampling

Quality Control

OC curve

Organization

Information analysis
