Enhanced Feature Fusion and Transfer Learning for Multi-Format Government Document Classification
References
1. T. Hong, D. Kim, M. Ji, W. Hwang, D. Nam, and S. Park, "BROS: A pre-trained language model focusing on text and layout for better key information extraction from documents," In Proceedings of the AAAI Conference on Artificial Intelligence, June 2022, pp. 10767-10775. doi: 10.1609/aaai.v36i10.21322
2. S. Appalaraju, B. Jasani, B. U. Kota, Y. Xie, and R. Manmatha, "DocFormer: End-to-end transformer for document understanding," In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 993-1003.
3. Y. Xu, M. Li, L. Cui, S. Huang, F. Wei, and M. Zhou, "LayoutLM: Pre-training of text and layout for document image understanding," In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2020, pp. 1192-1200. doi: 10.1145/3394486.3403172
4. Y. Liu, S. Yan, L. Leal-Taixé, J. Hays, and D. Ramanan, "Soft augmentation for image classification," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16241-16250. doi: 10.1109/cvpr52729.2023.01558
5. G. Jaume, H. K. Ekenel, and J. P. Thiran, "FUNSD: A dataset for form understanding in noisy scanned documents," In 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), September 2019, pp. 1-6.
6. J. Li, Y. Xu, T. Lv, L. Cui, C. Zhang, and F. Wei, "DiT: Self-supervised pre-training for document image transformer," In Proceedings of the 30th ACM International Conference on Multimedia, October 2022, pp. 3530-3539. doi: 10.1145/3503161.3547911
7. Y. Huang, T. Lv, L. Cui, Y. Lu, and F. Wei, "LayoutLMv3: Pre-training for document AI with unified text and image masking," In Proceedings of the 30th ACM International Conference on Multimedia, October 2022, pp. 4083-4091. doi: 10.1145/3503161.3548112
8. X. Zhang, J. Yoon, M. Bansal, and H. Yao, "Multimodal representation learning by alternating unimodal adaptation," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 27456-27466.
9. C. Da, C. Luo, Q. Zheng, and C. Yao, "Vision grid transformer for document layout analysis," In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 19462-19472.
10. Y. Xu, Y. Xu, T. Lv, L. Cui, F. Wei, G. Wang, and L. Zhou, "LayoutLMv2: Multi-modal pre-training for visually-rich document understanding," In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), August 2021, pp. 2579-2591. doi: 10.18653/v1/2021.acl-long.201
11. L. Yefan, L. Yijing, Y. Yina, H. Miaowan, and T. Da, "Multimodal Document Classification Based on Two-Stream Adaptive Feature Fusion," In 2025 IEEE 5th International Conference on Electronic Technology, Communication and Information (ICETCI), May 2025, pp. 693-698. doi: 10.1109/icetci64844.2025.11084186
12. Z. Gu, C. Meng, K. Wang, J. Lan, W. Wang, M. Gu, and L. Zhang, "XYLayoutLM: Towards layout-aware multimodal networks for visually-rich document understanding," In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4583-4592.
13. M. Li, T. Lv, J. Chen, L. Cui, Y. Lu, D. Florencio, and F. Wei, "TrOCR: Transformer-based optical character recognition with pre-trained models," In Proceedings of the AAAI Conference on Artificial Intelligence, June 2023, pp. 13094-13102. doi: 10.1609/aaai.v37i11.26538
14. C. Auer, A. Nassar, M. Lysak, M. Dolfi, N. Livathinos, and P. Staar, "ICDAR 2023 competition on robust layout segmentation in corporate documents," In International Conference on Document Analysis and Recognition, August 2023, pp. 471-482. doi: 10.1007/978-3-031-41679-8_27
15. A. G. AV, "Efficient Document Classification Using Fused CNN-SVM Model," In 2024 International Conference on Communication, Computing and Energy Efficient Technologies (I3CEET), September, 2024, pp. 879-885.