EVALUATION OF THE QUALITY OF LARGE LANGUAGE MODELS' ANSWERS IN ANSWERING RELIGIOUS QUESTIONS FOR DIGITAL DA'WAH

Munirah Munirah; Aslan Alwi; Abdul Karim

doi:10.36769/asy.v27i1.1822

Authors

Munirah Munirah, Faculty of Engineering, Universitas Muhammadiyah Ponorogo
Aslan Alwi, Faculty of Engineering, Universitas Muhammadiyah Ponorogo
Abdul Karim, Hallym University, Chuncheon, Gangwon, South Korea


Keywords:

Large Language Models, Digital Da'wah, AI Evaluation, Hallucination, AI Ethics

Abstract

The development of Large Language Models (LLMs) has opened new opportunities for using artificial intelligence in digital da'wah. However, applying LLMs in the religious domain raises challenges related to accuracy, errors in the form of hallucinations, and conformity with Islamic values. This study evaluates the quality of LLM responses to religious questions as part of digital da'wah. It uses a descriptive evaluation of responses to 125 religious questions covering worship, faith, morals, contemporary issues, and ambiguous questions. The LLM responses were analyzed against three main indicators: accuracy, potential hallucination, and conformity with Islamic values. The results show that LLMs achieve fairly good accuracy on basic religious questions, but potential hallucinations are still found, especially in ambiguous and contemporary questions. These findings indicate that LLMs have potential as a digital da'wah tool, but their use requires a verification mechanism to maintain both accuracy and conformity with Islamic values.
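The abstract describes judging each answer on three indicators and comparing the results across question categories. The following is a minimal Python sketch of how such descriptive percentages could be tabulated, assuming each answer receives a simple yes/no judgement per indicator. The `Judgement` record, the `summarize` function, and the toy records are hypothetical illustrations; they are not the authors' instrument, scoring scale, or data.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical record of one reviewer judgement of one LLM answer.
# Assumption: each indicator is scored as a yes/no flag (the study's actual scale is not stated here).
@dataclass
class Judgement:
    category: str        # e.g. "worship", "faith", "morals", "contemporary", "ambiguous"
    accurate: bool       # answer judged correct
    hallucination: bool  # answer judged to contain fabricated or unsupported claims
    conforms: bool       # answer judged consistent with Islamic values

INDICATORS = ("accurate", "hallucination", "conforms")

def summarize(judgements):
    """Return {category: {indicator: percentage of answers in that category flagged True}}."""
    flagged = defaultdict(lambda: {name: 0 for name in INDICATORS})
    totals = defaultdict(int)
    for j in judgements:
        totals[j.category] += 1
        for name in INDICATORS:
            if getattr(j, name):
                flagged[j.category][name] += 1
    return {
        cat: {name: 100.0 * flagged[cat][name] / totals[cat] for name in INDICATORS}
        for cat in totals
    }

if __name__ == "__main__":
    # Toy records for illustration only; these are NOT the study's data.
    toy = [
        Judgement("worship", True, False, True),
        Judgement("worship", True, False, True),
        Judgement("ambiguous", False, True, True),
        Judgement("ambiguous", True, False, True),
    ]
    for category, rates in summarize(toy).items():
        print(category, rates)
```

Reporting a percentage per category, rather than one pooled score, keeps differences between basic and ambiguous or contemporary questions visible, which is the kind of contrast the abstract reports.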

