Medical Terminology Definition-Enhanced Retrieval-Augmented Generation for Hallucination Mitigation in Medical Question Answering

Haoyang Guan

Published: 2025-11-10

Keywords:

medical question answering, hallucination mitigation, retrieval-augmented generation, medical terminology

Haoyang Guan

Data Science, Columbia University, NY, USA

Abstract

The rapid adoption of large language models (LLMs) in healthcare applications raises critical challenges of factual accuracy and hallucination control. This paper proposes an approach that combines enhanced medical terminology definitions with retrieval-augmented generation (RAG) to mitigate hallucinations in medical question-answering systems. The primary technical contributions are: (1) a Medical-Adaptive Confidence Calibration (MACC) algorithm that, unlike traditional RAG methods, dynamically adjusts its thresholds according to clinical risk; (2) a multi-source medical knowledge fusion framework that incorporates hierarchical relationships from SNOMED-CT, UMLS, and ICD-10; and (3) a robustness validation procedure with real-time monitoring. Compared with baseline systems, the proposed approach reduces hallucinations by 23.7% (p < 0.001; 95% CI: 19.4% to 28.0%). Experimental evaluations on medical consultation datasets also show an 18.4% increase in precision and a 15.2% increase in recall. The framework addresses major limitations of existing automated medical consultation systems while maintaining computational efficiency and scalability for practical deployment.
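To make the idea of risk-dependent thresholds concrete, the following is a minimal Python sketch, not the MACC algorithm itself: the abstract does not give the functional form, the risk tiers, or any threshold values. The names `BASE_THRESHOLD`, `RISK_PENALTY`, `Candidate`, `required_confidence`, and `decide`, along with every number shown, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical values: the abstract does not specify the thresholds,
# the risk tiers, or how MACC combines them.
BASE_THRESHOLD = 0.60
RISK_PENALTY = {"low": 0.00, "moderate": 0.15, "high": 0.30}


@dataclass
class Candidate:
    answer: str
    confidence: float  # confidence score in [0, 1], assumed already calibrated
    risk: str          # assumed clinical risk tier: "low", "moderate", or "high"


def required_confidence(risk: str) -> float:
    """Raise the acceptance threshold as the clinical risk tier increases."""
    return min(BASE_THRESHOLD + RISK_PENALTY[risk], 0.99)


def decide(candidate: Candidate) -> str:
    """Return the answer only if it clears the risk-adjusted threshold."""
    if candidate.confidence >= required_confidence(candidate.risk):
        return candidate.answer
    return "ABSTAIN: confidence below the threshold for this risk tier"


if __name__ == "__main__":
    # The same confidence score passes for a low-risk question
    # but is withheld for a high-risk one.
    print(decide(Candidate("Drink fluids and rest.", 0.80, "low")))
    print(decide(Candidate("Double the dose.", 0.80, "high")))
```

Under these assumed values, the same confidence score of 0.80 is accepted for a low-risk question and withheld for a high-risk one, which is the behavior a risk-sensitive threshold is meant to produce.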

Issue

Vol. 1 No. 1 (2025): 2025 International Conference on Intelligent Computing and Automated Systems (ICAS 2025)

Section

Articles

How to Cite

Guan, H. (2025). Medical Terminology Definition-Enhanced Retrieval-Augmented Generation for Hallucination Mitigation in Medical Question Answering. Journal of Science, Innovation & Social Impact, 1(1), 222-240. https://sagespress.com/index.php/JSISI/article/view/29

