Classification of Figures in Scientific Documents Based on a Deep Learning Method

Document Type : Original Article

Authors
Assistant Professor, Department of Information Technology Research, Iranian Research Institute for Information Science and Technology (IranDoc), Tehran, Iran
10.22034/aimj.2023.211855
Abstract
Information can be retrieved from figures in two ways: context-oriented and content-oriented. Content-oriented methods use the visual content of the figures for retrieval. However, scientific figures are complex, so they must first be classified before content-oriented methods can extract information from them. This paper presents a classification method for scientific figures. The training data were drawn from Ganj, a rich repository of Persian scientific documents: 5892 figures randomly selected from Ganj dissertations and theses in seven different fields. Experts labeled the figures into six classes: natural photos, maps, x-y diagrams, tables, structured diagrams or flowcharts, and statistical diagrams. Because the training data were unbalanced, augmentation methods were used to increase the number of figures in underrepresented classes. Scientific images from different classes can look very similar, so hand-crafting features that distinguish them is difficult; we therefore applied deep learning methods that learn features directly from the images. Due to the scarcity of data, we used neural networks with fewer layers and parameters, and we found that networks pre-trained on a large image database performed better. Our experiments show that a pre-trained sixteen-layer VGG16 network can classify scientific images with 97% accuracy.
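The pipeline the abstract describes (a pre-trained VGG16 backbone with a small six-class head, plus augmentation for underrepresented classes) can be sketched as below. This is a minimal illustration, not the paper's exact configuration: the input size, augmentation transforms, head width, and optimizer are assumptions, and `weights=None` is used so the sketch runs without downloading weights (in practice `weights="imagenet"` supplies the pre-training the paper relies on).

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Six figure classes from the paper: natural photos, maps, x-y diagrams,
# tables, structured diagrams/flowcharts, and statistical diagrams.
NUM_CLASSES = 6

# Illustrative augmentation stack for boosting underrepresented classes.
augment = models.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.05),
    layers.RandomZoom(0.1),
])

# Pre-trained sixteen-layer VGG16 backbone; frozen so only the new head trains.
# Use weights="imagenet" in practice; None here keeps the sketch offline-runnable.
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    layers.Input((224, 224, 3)),
    augment,                          # active only during training
    base,                             # frozen convolutional feature extractor
    layers.GlobalAveragePooling2D(),  # collapse feature maps to a vector
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With the backbone frozen, only the small head is fit, which suits the limited labeled data the abstract mentions; unfreezing the top convolutional block afterward is a common fine-tuning refinement.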

Volume 9, Issue 1 - Serial Number 16
September 2024
Pages 58-76

  • Received: 08 January 2024
  • Revised: 05 March 2024
  • Accepted: 31 August 2024