A Comparative Study of Arabic Text Classification using k-NN, SVM, and Naive Bayes

Salih Saad Garash

doi:10.65405/.v10i37.326

Authors

Salih Saad Garash, Libyan Academy, Tripoli, Libya

DOI:

https://doi.org/10.65405/.v10i37.326

Keywords:

Text Classification, KNN, SVM, Naive Bayes, TF-IDF, Machine Learning

Abstract

Arabic text classification is a critical task in natural language processing, yet it remains challenging because of the language's morphological complexity and the scarcity of annotated datasets. This study presents a comparative evaluation of three classical machine learning algorithms, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Naive Bayes, for multi-category Arabic text classification. We use a curated dataset of 700 articles from Al-Hayat newspaper, evenly distributed across seven categories: Technology, Economy, Sports, General News, Science, Culture, and Politics. The texts, written in Modern Standard Arabic, undergo standard preprocessing, including normalization, tokenization, stopword removal, and light stemming, and the models are evaluated on accuracy, precision, recall, and F1-score. Experimental results show that SVM achieves the highest performance, with 89.3% accuracy and an 88.8% F1-score, followed by Naive Bayes (86.4% accuracy) and k-NN (79.3% accuracy). The findings confirm SVM as the most effective classical model for this task, while Naive Bayes offers a computationally efficient alternative. k-NN underperforms, particularly in high-dimensional spaces. This work provides a reproducible benchmark for Arabic text classification and highlights the importance of preprocessing and feature representation. The results serve as a foundation for future research, including the integration of deep learning models and expansion to dialectal Arabic content.
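The four preprocessing steps named in the abstract (normalization, tokenization, stopword removal, light stemming) can be sketched in a few lines of standard-library Python. This is only an illustration of the general technique, not the study's actual pipeline: the stopword list, the prefix and suffix lists, the minimum stem length of three letters, and all function names are assumptions made for this example.

```python
import re

# Assumption: a tiny illustrative stopword list, stored in normalized form.
# A real system would use a far larger list.
STOPWORDS = {"في", "من", "عن", "علي", "الي", "هذا", "هذه", "التي", "الذي"}

# Assumption: hypothetical affix lists in the spirit of light stemming,
# ordered so that longer affixes are tried first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")

# Short vowels and related marks (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Runs of Arabic letters, hamza (U+0621) through yeh (U+064A).
ARABIC_WORD = re.compile("[\u0621-\u064A]+")


def normalize(text: str) -> str:
    """Strip diacritics and tatweel, then unify common letter variants."""
    text = DIACRITICS.sub("", text)
    # Alef with madda / hamza above / hamza below -> bare alef.
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)
    # Alef maksura -> yeh.
    return text.replace("\u0649", "\u064A")


def tokenize(text: str) -> list[str]:
    """Split text into runs of Arabic letters, dropping digits and punctuation."""
    return ARABIC_WORD.findall(text)


def light_stem(token: str) -> str:
    """Remove at most one prefix and one suffix, keeping at least 3 letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, drop stopwords, then light-stem each token."""
    tokens = tokenize(normalize(text))
    return [light_stem(t) for t in tokens if t not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("الاقتصادُ في المدرسة إلى"))
```

The stopword check runs on normalized tokens, which is why the list above is stored in normalized form. The stems returned by `preprocess` would then feed whatever feature representation the classifiers use.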
