EVALUATION OF THE QUALITY OF LARGE LANGUAGE MODELS' ANSWERS IN ANSWERING RELIGIOUS QUESTIONS FOR DIGITAL DA'WAH
DOI: https://doi.org/10.36769/asy.v27i1.1822
Keywords: Large Language Models, Digital Da'wah, AI Evaluation, Hallucination, AI Ethics
Abstract
The development of Large Language Models (LLMs) has opened new opportunities for using artificial intelligence in digital da'wah. However, applying LLMs in the religious domain raises challenges related to accuracy, potential errors (hallucinations), and conformity with Islamic values. This study evaluates the quality of LLM responses to religious questions as part of digital da'wah. The method is a descriptive evaluation of 125 religious questions covering worship, faith, morals, contemporary issues, and ambiguous questions. LLM responses were analyzed against three main indicators: accuracy, potential hallucination, and conformity with Islamic values. The results show that LLMs achieve fairly good accuracy on basic religious questions, but hallucinations still occur, especially on ambiguous and contemporary questions. These findings indicate that LLMs have potential as a digital da'wah tool, but their use requires a verification mechanism to maintain accuracy and conformity with Islamic values.
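One way to picture a descriptive evaluation of this kind is as a per-category tally of the three indicators. The sketch below assumes that each LLM response has already been judged by a reviewer with three yes/no labels (accurate, contains a hallucination, conforms to Islamic values). The record fields, the `summarize` and `needs_verification` helpers, and the 10% threshold are hypothetical illustrations, not the authors' instrument or data.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical annotation record for one judged LLM response.
# Field names are illustrative assumptions, not the study's schema.
@dataclass
class JudgedResponse:
    category: str           # e.g. "worship", "faith", "morals", "contemporary", "ambiguous"
    accurate: bool          # reviewer judged the answer correct
    hallucination: bool     # reviewer found fabricated or unsupported claims
    value_conformant: bool  # reviewer judged the answer consistent with Islamic values


def summarize(responses):
    """Return per-category counts and percentages for the three indicators."""
    counts = defaultdict(
        lambda: {"n": 0, "accurate": 0, "hallucination": 0, "value_conformant": 0}
    )
    for r in responses:
        c = counts[r.category]
        c["n"] += 1
        # bool values add as 0 or 1
        c["accurate"] += r.accurate
        c["hallucination"] += r.hallucination
        c["value_conformant"] += r.value_conformant

    summary = {}
    for category, c in counts.items():
        n = c["n"]
        summary[category] = {
            "n": n,
            "accuracy_pct": 100.0 * c["accurate"] / n,
            "hallucination_pct": 100.0 * c["hallucination"] / n,
            "conformity_pct": 100.0 * c["value_conformant"] / n,
        }
    return summary


def needs_verification(summary, max_hallucination_pct=10.0):
    """List categories whose hallucination rate exceeds a chosen (hypothetical) threshold."""
    return sorted(
        cat for cat, s in summary.items() if s["hallucination_pct"] > max_hallucination_pct
    )


if __name__ == "__main__":
    # Toy labels for illustration only; these are NOT the study's data.
    sample = [
        JudgedResponse("worship", True, False, True),
        JudgedResponse("worship", True, False, True),
        JudgedResponse("ambiguous", False, True, True),
        JudgedResponse("ambiguous", True, False, True),
    ]
    stats = summarize(sample)
    print(stats["ambiguous"]["hallucination_pct"])  # 50.0
    print(needs_verification(stats))                # ['ambiguous']
```

Grouping by question category mirrors the way the abstract reports results (basic versus ambiguous and contemporary questions), and the threshold check is one simple, assumed form a verification step could take: categories that exceed it would be routed to human review before answers are published.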
License
Copyright (c) 2026 Munirah Munirah, Aslan Alwi, Abdul Karim

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.