NORMALIZATION OF NOISY TEXTS IN MALAYSIAN ONLINE REVIEWS

Authors

  • Norlela Samsudin, Faculty of Computer and Mathematical Science, Universiti Teknologi MARA Terengganu, 23000 Dungun, Terengganu, Malaysia
  • Mazidah Puteh, Faculty of Computer and Mathematical Science, Universiti Teknologi MARA Terengganu, 23000 Dungun, Terengganu, Malaysia
  • Abdul Razak Hamdan, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
  • Mohd Zakree Ahmad Nazri, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia

Keywords:

Noisy texts, normalization of noisy texts, artificial abbreviation

Abstract

Gathering useful information from online messages has become more common as more people use the Internet and online applications such as Facebook and Twitter to communicate with each other. One of the problems in processing online messages is the large amount of noisy text that they contain. A few studies have shown that noisy texts degrade the results of text mining activities. However, very few works have investigated the patterns of noisy texts created by Malaysians. In this study, a list of common noisy terms and a list of artificial abbreviations were created using specific rules and were used to select candidate correct words for each noisy term. The correct term was then selected from these candidates using a bi-gram word index. The experiments used online messages written by Malaysians. The results show that normalization of noisy texts using the artificial abbreviations list complements the use of the common noisy terms list.
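
The two-stage approach described in the abstract (candidate selection from two lists, then disambiguation with a bi-gram index) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the vowel-dropping rule in `abbreviate`, the tiny vocabulary, the entries in `common_noisy`, the bi-gram counts, and all function names are hypothetical. The abstract does not state the actual rules used to build either list.

```python
from collections import defaultdict

VOWELS = set("aeiou")


def abbreviate(word):
    """Hypothetical rule: keep the first letter and drop all later vowels."""
    return word[0] + "".join(c for c in word[1:] if c not in VOWELS)


def build_abbreviation_index(vocabulary):
    """Map each artificial abbreviation to the set of words that produce it."""
    index = defaultdict(set)
    for word in vocabulary:
        index[abbreviate(word)].add(word)
    return index


def normalize(tokens, common_noisy, abbrev_index, vocabulary, bigram_counts):
    """Replace each noisy token with its best-scoring candidate, if any."""
    result = []
    for token in tokens:
        if token in vocabulary:
            result.append(token)  # already a known word
            continue
        # Stage 1: collect candidates from both lists.
        candidates = set(common_noisy.get(token, ()))
        candidates |= abbrev_index.get(token, set())
        if not candidates:
            result.append(token)  # no candidate found, leave unchanged
            continue
        # Stage 2: choose the candidate that most often follows the
        # previous (already normalized) word; ties break alphabetically.
        prev = result[-1] if result else "<s>"
        best = max(sorted(candidates),
                   key=lambda w: bigram_counts.get((prev, w), 0))
        result.append(best)
    return result


# Illustrative data only (assumed, not from the paper).
vocabulary = {"saya", "tidak", "suka", "makan", "makin"}
common_noisy = {"x": ["tidak"]}
bigram_counts = {("suka", "makan"): 5, ("tidak", "suka"): 3}
abbrev_index = build_abbreviation_index(vocabulary)

normalized = normalize(["sy", "x", "suka", "mkn"],
                       common_noisy, abbrev_index, vocabulary, bigram_counts)
print(normalized)  # ['saya', 'tidak', 'suka', 'makan']
```

In this toy example the token "mkn" matches the artificial abbreviation of both "makan" and "makin", and the bi-gram count with the preceding word "suka" decides between them. The token "x" is resolved only by the common noisy terms list and "sy" only by the abbreviation list, which shows how the two lists can complement each other.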

Published

23-04-2013

How to Cite

Samsudin, N., Puteh, M., Hamdan, A. R., & Ahmad Nazri, M. Z. (2013). NORMALIZATION OF NOISY TEXTS IN MALAYSIAN ONLINE REVIEWS. Journal of Information and Communication Technology, 12, 147–159. Retrieved from https://e-journal.uum.edu.my/index.php/jict/article/view/8141
