Named Entity Recognition Using Conditional Random Fields for Flood Detection In Gerbang Kertosusila Based Twitter Data

Authors

  • Ikrimatul Ulumiyyah Universitas Islam Negeri Sunan Ampel Surabaya
  • Dwi Rolliawati Universitas Islam Negeri Sunan Ampel Surabaya
  • Andik Izzuddin Universitas Islam Negeri Sunan Ampel Surabaya
  • Khalid Khalid Universitas Islam Negeri Sunan Ampel Surabaya
  • Anang Khunaefi Universitas Islam Negeri Sunan Ampel Surabaya
  • Mujib Ridwan Universitas Islam Negeri Sunan Ampel Surabaya

DOI:

https://doi.org/10.24014/ijaidm.v7i2.27062

Keywords:

Conditional Random Fields, Flood Detection, Gerbang Kertosusila, Natural Language Processing, Named Entity Recognition

Abstract

The national strategic area of Gerbang Kertosusila in East Java needs to stay alert to floods. One existing effort is to place flood sensors at several flood-prone points, but this approach is constrained by the amount of equipment needed to cover the many areas at risk. Technology for disseminating flood information is therefore needed. Flood information can be obtained quickly from the social media platform Twitter. One approach is to use Twitter text data as the source for a Named Entity Recognition (NER) model that helps detect flood events and their locations. In this research, the NER model was built using the Conditional Random Fields (CRFs) method. To improve model performance, the research adds slang word handling at the preprocessing stage, uses the BIO format in the labeling process, and uses POS tagging in the feature extraction process. Evaluation with five-fold cross-validation, 80% training data, and 20% test data shows that the CRF-based NER model performs very well, with a precision of 0.981, a recall of 0.926, and an F-measure of 0.950. These results can help the community and government obtain information on the distribution of floods.
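
The preprocessing, labeling, and feature extraction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slang lexicon, the entity types (EVENT, LOC), the hand-assigned POS tags, and the helper names `normalize`, `bio_labels`, and `token_features` are all assumptions made for illustration.

```python
# Hypothetical slang lexicon; the lexicon actually used in the paper is not given here.
SLANG = {"bnjr": "banjir", "sby": "surabaya", "dgn": "dengan"}


def normalize(tokens):
    """Lowercase each token and replace known slang forms."""
    return [SLANG.get(t.lower(), t.lower()) for t in tokens]


def bio_labels(tokens, entities):
    """Assign BIO tags; `entities` maps a tuple of tokens to an entity type."""
    labels = ["O"] * len(tokens)
    for phrase, etype in entities.items():
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                labels[i] = f"B-{etype}"          # first token of the entity
                for j in range(i + 1, i + n):
                    labels[j] = f"I-{etype}"      # remaining tokens of the entity
    return labels


def token_features(tokens, pos_tags, i):
    """Build a feature dict for token i from the word, its POS tag, and its neighbors."""
    feats = {
        "word": tokens[i],
        "pos": pos_tags[i],
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
    return feats


# Illustrative tweet fragment ("flood in West Surabaya"), written with slang forms.
tokens = normalize(["Bnjr", "di", "Sby", "Barat"])
labels = bio_labels(tokens, {("banjir",): "EVENT", ("surabaya", "barat"): "LOC"})
pos_tags = ["NN", "IN", "NNP", "NNP"]  # assigned by hand for this example
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

Per-token feature dictionaries paired with a BIO label sequence, as produced here, are the kind of input that CRF toolkits such as CRFsuite typically accept for training.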

Author Biographies

Ikrimatul Ulumiyyah, Universitas Islam Negeri Sunan Ampel Surabaya

Ikrimatul Ulumiyyah received the S.Kom. degree in information systems from UIN Sunan Ampel, Surabaya, Indonesia, in 2022. Her recent research interests are data analytics, data processing, machine learning modeling, and data science. She is also interested in text mining and natural language processing. She can be contacted at email: ikrimatul@gmail.com

Dwi Rolliawati, Universitas Islam Negeri Sunan Ampel Surabaya

Dwi Rolliawati holds an M.T. in Electrical Engineering from ITS, Surabaya, Indonesia. She has been a lecturer in the Information Systems Study Program at UIN Sunan Ampel since 2014. She also serves as head of the Information Systems Study Program, where her research concentration is computer science, modeling and simulation, and machine learning. She can be contacted at email: dwi_roll@uinsby.ac.id

Andik Izzuddin, Universitas Islam Negeri Sunan Ampel Surabaya

Andik Izzuddin is a lecturer at UIN Sunan Ampel Surabaya in Indonesia. His research interests include computer networks, information systems, information technology, and community-based research. He can be contacted at email: andik@uinsby.ac.id

Khalid Khalid, Universitas Islam Negeri Sunan Ampel Surabaya

Khalid has been a lecturer in the Information Systems Study Program at UIN Sunan Ampel since 2014, where his research concentration is data mining, natural language processing, data science, text mining, and machine learning. He can be contacted at email: khalid@uinsby.ac.id

Anang Khunaefi, Universitas Islam Negeri Sunan Ampel Surabaya

Anang Khunaefi received the B.C. and M.C. degrees in informatics engineering from Sepuluh Nopember Institute of Technology, Surabaya, Indonesia, in 2004 and 2013, respectively. He received the Ph.D. degree in computer science and electrical engineering from Kumamoto University, Kumamoto, Japan, in 2021. From 2003 to 2010, he was a software engineer at an IT company based in Jakarta, Indonesia, focusing on the development of web-based application systems using Java, PHP, and JavaScript. He wrote two books about the analysis and implementation of information systems for mapping students' interests in Indonesia's educational institutions. His research interests include software engineering, data-driven requirements, business process automation, semantic web services, and the implementation of information systems for educational institutions. He can be contacted at email: kunaefi@uinsby.ac.id

Mujib Ridwan, Universitas Islam Negeri Sunan Ampel Surabaya

Mujib Ridwan is a lecturer in the Information Systems Study Program at UIN Sunan Ampel, where his research concentration is information technology, machine learning, and intelligent systems. He can be contacted at email: mujibrw@uinsby.ac.id

References

K. Aruna, M. V. Subramanian, and B. Jaya Sudha, “Studies on Seasonal variations of rainfall in java island at Indonesia,” J Algebr Stat, vol. 13, no. 3, pp. 1481–1489, 2022.

D. B. Baranowski et al., “Social-media and newspaper reports reveal large-scale meteorological drivers of floods on Sumatra,” Nat Commun, vol. 11, no. 1, pp. 1–10, 2020, doi: 10.1038/s41467-020-16171-2.

I. Utami and M. Marzuki, “Analisis sistem informasi banjir berbasis media twitter,” Jurnal Fisika Unand, vol. 9, no. 1, pp. 67–72, 2020.

M. H. Awalludin, “EVENT DETECTION PADA MICROBLOGGING TWITTER DENGAN METODE DENCLUE UNTUK PEMETAAN LOKASI BENCANA LONGSOR,” JBPTUNIKOMPP, 2018, [Online]. Available: https://repository.unikom.ac.id/id/eprint/58405

I. Utami and M. Marzuki, “Analisis sistem informasi banjir berbasis media twitter,” Jurnal Fisika Unand, vol. 9, no. 1, pp. 67–72, 2020, [Online]. Available: http://jfu.fmipa.unand.ac.id/index.php/jfu/article/view/454

E. Kapetanios, D. Tatar, and C. Sacarea, “Named Entity Recognition,” Natural Language Processing, vol. 8, no. 2, pp. 309–322, 2013, doi: 10.1201/b15472-19.

F. Béchet and B. Mohit, “Named Entity Recognition,” Spoken Language Understanding: Systems for Extracting Semantic Information from Speech, pp. 257–290, 2011, doi: 10.1002/9781119992691.ch10.

F. Muhammad and M. L. Khodra, “Event information extraction from Indonesian tweets using conditional random field,” ICAICTA 2015 - 2015 International Conference on Advanced Informatics: Concepts, Theory and Applications, pp. 0–5, 2015, doi: 10.1109/ICAICTA.2015.7335383.

M. Ermawati and J. L. Buliali, “Text Based Approach For Similar Traffic Incident Detection from Twitter,” Lontar Komputer : Jurnal Ilmiah Teknologi Informasi, vol. 9, no. 2, p. 63, 2018, doi: 10.24843/lkjiti.2018.v09.i02.p01.

Y. Munarko, “Ekstraksi Nama Lokasi Dari Tweets Informasi,” Seminar Teknologi dan Rekayasa (SENTRA), pp. 978–979, 2015.

N. Jaariyah and E. Rainarli, “Conditional Random Fields Untuk Pengenalan Entitas Bernama Pada Teks Bahasa Indonesia,” Komputa : Jurnal Ilmiah Komputer dan Informatika, vol. 6, no. 1, pp. 29–34, 2017, doi: 10.34010/komputa.v6i1.2474.

Y. Munarko, M. S. Sutrisno, W. A. I. Mahardika, I. Nuryasin, and Y. Azhar, “Named entity recognition model for Indonesian tweet using CRF classifier,” IOP Conference Series: Materials Science and Engineering, 2018, doi: 10.1088/1757-899X/403/1/012067.

W. Ahmed, P. A. Bath, and G. Demartini, “USING TWITTER AS A DATA SOURCE: AN OVERVIEW OF ETHICAL, LEGAL, AND METHODOLOGICAL CHALLENGES,” Emerald Publishing Limited, vol. 2, pp. 79–107, 2017, doi: 10.1108/S2398-601820180000002004.

N. Patil, A. Patil, and B. V. Pawar, “Named Entity Recognition using Conditional Random Fields,” Procedia Comput Sci, vol. 167, no. 2019, pp. 1181–1188, 2020, doi: 10.1016/j.procs.2020.03.431.

L. Owen, “Indonesian Stopword Combined.” [Online]. Available: https://github.com/louisowen6/NLP_bahasa_resources/blob/master/combined_stop_words.txt

N. A. Salsabila, Y. Ardhito, W. Ali, A. Septiandri, and A. Jamal, “Colloquial Indonesian Lexicon,” 2018 International Conference on Asian Language Processing (IALP), pp. 226–229, 2018.

D. T. Wijaya, “IndoCollex : A Testbed for Morphological Transformation of Indonesian Colloquial Words,” no. 2017, pp. 3170–3183, 2021.

Sastrawi · GitHub. Accessed: Jun. 22, 2022. [Online]. Available: https://github.com/sastrawi

A. Dinakaramani, F. Rashel, A. Luthfi, and R. Manurung, “Designing an Indonesian part of speech tagset and manually tagged Indonesian corpus,” Proceedings of the International Conference on Asian Language Processing 2014, IALP 2014, pp. 66–69, 2014, doi: 10.1109/IALP.2014.6973519.

Yudi Wibisono, “POS Tagger Bahasa Indonesia dengan Python – Blog Yudi Wibisono.” Accessed: Jun. 22, 2022. [Online]. Available: https://yudiwbs.wordpress.com/2018/02/20/pos-tagger-bahasa-indonesia-dengan-pytho/

L. Mardiana, D. Kusnandar, and N. Satyahadewi, “Analisis Diskriminan Dengan K Fold Cross Validation Untuk Klasifikasi Kualitas Air Di Kota Pontianak,” Bimaster : Buletin Ilmiah Matematika, Statistika dan Terapannya, vol. 11, no. 1, pp. 97–102, 2022.

R. Klinger, “Classical Probabilistic Models and Conditional Random Fields,” Entropy, vol. 51, no. December, pp. 282–289, 2007.

C. Sutton and A. McCallum, “An introduction to conditional random fields,” Foundations and Trends in Machine Learning, vol. 4, no. 4, pp. 267–373, 2011, doi: 10.1561/2200000013.

H. M. Wallach, “Conditional Random Fields: An Introduction,” no. February, 2004.

J. Suzuki, E. McDermott, and H. Isozaki, Training Conditional Random Fields with Multivariate Evaluation Measures. 2006. doi: 10.3115/1220175.1220203.

N. Okazaki, “CRFsuite: A fast implementation of Conditional Random Fields (CRFs).” 2007.

D. J. Hand, P. Christen, and N. Kirielle, “F*: an interpretable transformation of the F-measure,” Mach Learn, vol. 110, no. 3, pp. 451–456, 2021, doi: 10.1007/s10994-021-05964-1.

Published

2024-05-09