Evaluation of Support Vector Machine, Naive Bayes, Decision Tree, and Gradient Boosting Algorithms for Sentiment Analysis on ChatGPT Twitter Dataset
DOI:
https://doi.org/10.24014/ijaidm.v7i1.24662
Keywords:
Sentiment Analysis, Support Vector Machine, Naïve Bayes, Decision Tree, Gradient Boosting
Abstract
ChatGPT is a language model that generates text and interacts with users conversationally. It is designed to give relevant and useful responses based on the context of the ongoing conversation. As ChatGPT grows in popularity, it becomes difficult to classify user responses about its use, so this study performs sentiment classification of those responses. The dataset is sourced from Kaggle and contains 20,000 entries. The classification methods compared are Support Vector Machine (SVM), Naïve Bayes, Decision Tree, and Gradient Boosting. The Support Vector Machine algorithm achieved the highest accuracy of the four methods, 80%, when the data were split in a 90:10 ratio. This research is expected to help developers and service providers improve ChatGPT and better understand user responses.
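The abstract gives no implementation details. As a minimal sketch of the evaluation protocol it describes (a 90:10 hold-out split scored by accuracy), the Python below trains a small multinomial Naïve Bayes classifier, one of the four method families named above, on made-up toy tweets. The function names, the toy sentences, the whitespace tokenizer, the add-one smoothing, and the reading of 90:10 as training:test are all assumptions made for illustration; this is not the authors' pipeline and does not use the Kaggle dataset.

```python
import math
import random
from collections import Counter, defaultdict


def split_90_10(samples, seed=0):
    """Shuffle a copy of the samples and hold out the last 10% for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


def train_naive_bayes(train):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in train:
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, test):
    """Fraction of held-out samples whose predicted label matches the true one."""
    correct = sum(predict(model, text) == label for text, label in test)
    return correct / len(test)


# Hypothetical toy tweets; the study itself uses a 20,000-entry Kaggle dataset.
POSITIVE = ["chatgpt is great", "love this helpful tool", "amazing useful answers"]
NEGATIVE = ["chatgpt is terrible", "hate this useless tool", "awful wrong answers"]
samples = [(t, "positive") for t in POSITIVE * 10]
samples += [(t, "negative") for t in NEGATIVE * 10]

train, test = split_90_10(samples)
model = train_naive_bayes(train)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

To compare several classifiers as the study does, the same `split_90_10` and `accuracy` steps would be reused with each model's own training and prediction functions, so that every method is scored on the identical held-out 10%.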