ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA

Authors

  • Shamini Raja Kumaran, Universiti Teknologi Malaysia
  • Mohd Shahizan Othman, Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor Bahru, Malaysia
  • Lizawati Mi Yusuf, Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor Bahru, Malaysia

DOI:

https://doi.org/10.32890/jict2020.19.4.1

Keywords:

Fuzzy C-means, majority vote, missing values, microarray data, data optimisation

Abstract

Missing values are a major constraint in microarray technologies and hinder the identification of disease-causing genes. Estimating missing values is a task that field experts cannot avoid. Imputation is an effective way to supply suitable values so that the subsequent steps of microarray analysis can proceed. Missing value imputation methods may also increase classification accuracy, and the classification accuracy rate indicates how well a method estimates the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. Using the Majority Vote (MV) and optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values to form more informative and reliable data. To verify the effectiveness of opt-FCMMV, several experiments were carried out on two publicly available microarray datasets from the biomedical domain (i.e. Ovary and Lung Cancer) under three missing value mechanisms and five different percentages of missing values, using a Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% (no imputation), 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). Such an outcome indicates that opt-FCMMV may also be applied in other domains to prepare datasets for various data mining tasks.
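As a rough illustration of the FCM imputation baseline mentioned in the abstract, the following Python sketch fills missing entries with a membership-weighted average of fuzzy cluster centres. It is not the authors' opt-FCMMV method: the Majority Vote and PSO steps are omitted, and the function name `fcm_impute`, its parameters, the deterministic centre initialisation, and the toy data are all assumptions made for this example.

```python
import numpy as np


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100):
    """Illustrative fuzzy c-means imputation (hypothetical helper).

    Assumes missing entries are encoded as NaN and that every row and
    every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Start by filling each missing entry with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(observed, X, col_means)

    # Assumed deterministic initialisation: pick rows spread evenly
    # across the ordering of row means, so the sketch is reproducible.
    order = np.argsort(filled.mean(axis=1))
    picks = order[np.linspace(0, len(order) - 1, n_clusters).astype(int)]
    centers = filled[picks].copy()

    for _ in range(n_iter):
        # Squared distances computed on observed features only.
        diff = (filled[:, None, :] - centers[None, :, :]) * observed[:, None, :]
        d2 = (diff ** 2).sum(axis=2) + 1e-12  # small constant avoids 1/0

        # Standard FCM membership update with fuzzifier m.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)

        # Centre update weighted by memberships raised to the power m.
        W = U ** m
        centers = (W.T @ filled) / W.sum(axis=0)[:, None]

        # Re-estimate only the missing entries; observed values stay fixed.
        filled = np.where(observed, X, U @ centers)

    return filled, U


# Toy data (made up for illustration): two well-separated groups of
# samples, with one missing value in the third row.
X_demo = np.array([
    [1.0, 1.1, 0.9],
    [1.1, 1.0, 1.0],
    [0.9, np.nan, 1.1],
    [5.0, 5.1, 4.9],
    [5.1, 4.9, 5.0],
    [4.9, 5.0, 5.1],
])
filled_demo, U_demo = fcm_impute(X_demo, n_clusters=2)
print(filled_demo[2, 1])
```

In this toy case the imputed entry is drawn towards the values of the group its row most resembles. In an evaluation like the one the abstract describes, the imputed matrix would then be passed to a classifier such as an SVM and compared on accuracy.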

 

Author Biographies

Mohd Shahizan Othman, Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor Bahru, Malaysia

Mohd Shahizan Othman is a Senior Lecturer in the Department of Information Systems, Faculty of Computing, UTM. In nearly 15 years of academic service, he has been actively involved in teaching, research, publications, professional services, consulting, and administration. He has also held various administrative positions at the faculty and university levels. Since October 1, 2015, he has served as Deputy Director (Computing) at the Centre for Information & Communication Technology (CICT), UTM. He obtained a Diploma in Computer Science and a Bachelor of Computer Science from Universiti Teknologi Malaysia in 1995 and 1998, respectively. In 2000, he continued his studies at Universiti Kebangsaan Malaysia and obtained a Master of Information Technology (Computer Science) in 2001. He received his Ph.D. in Information Science from Universiti Kebangsaan Malaysia in 2008. During his tenure as academic staff, he has taught more than 20 different courses at the undergraduate and postgraduate levels. His experience comes not only from academic work but also from maintenance, software development, and information management. One of his main contributions to effective management at UTM is the development of the Graduate Studies Management System (GSMS), which aims to sustain the use of technology in the management of UTM postgraduate students. Prior to his positions as Senior Lecturer, IT Manager at the School of Graduate Studies (SPS, UTM), and Deputy Director (Computing) at CICT, UTM, he produced ten original books, written either alone or jointly with other authors, for use in teaching. These books are his effort to add to the proceeds of Universiti Teknologi Malaysia and to further enrich the reference materials available in the Malay language. All of the published books are also suitable for teaching, learning, and reference purposes for lecturers, students, and general readers.

Lizawati Mi Yusuf, Faculty of Engineering, School of Computing, Universiti Teknologi Malaysia, 81310, Skudai, Johor Bahru, Malaysia

Lizawati Mi Yusuf received her BSc in Computer Science with a major in Industrial Computing from Universiti Teknologi Malaysia (UTM), Malaysia, in 2000. She then earned an MSc in Information Technology from Universiti Kebangsaan Malaysia (UKM), Malaysia. She is currently a lecturer at the School of Computing, Faculty of Engineering, UTM. Her research interests are in optimization, web information extraction and retrieval, web data mining, machine learning, social learning, business intelligence, high performance computing, and numerical analysis.

Published

20-08-2020

How to Cite

Raja Kumaran, S., Othman, M. S., & Mi Yusuf, L. (2020). ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA. Journal of Information and Communication Technology, 19(4), 459–482. https://doi.org/10.32890/jict2020.19.4.1