Online Machine Learning from Non-stationary Data Streams in the Presence of Concept Drift and Class Imbalance: A Systematic Review

Authors

  • Abdul Sattar Palli, Computer and Information Science Department, University Technology Petronas, Malaysia and Anti-Narcotics Force, Ministry of Narcotics Control, Pakistan
  • Jafreezal Jaafar, Computer and Information Science Department, University Technology Petronas, Malaysia and Center for Research in Data Science (CERDAS), University Technology Petronas, Malaysia
  • Abdul Rehman Gilal, Knight Foundation School of Computing and Information Sciences, Florida International University, United States
  • Aeshah Alsughayyir, College of Computer Science and Engineering, Taibah University, Saudi Arabia
  • Heitor Murilo Gomes, School of Engineering and Computer Science, Victoria University of Wellington and AI Institute, University of Waikato, Wellington, New Zealand
  • Abdullah Alshanqiti, Faculty of Computer and Information Systems, Islamic University of Madinah, Saudi Arabia
  • Mazni Omar, School of Computing, Universiti Utara Malaysia, Malaysia

DOI:

https://doi.org/10.32890/jict2024.23.1.5

Keywords:

Concept Adaptation, Concept Drift, Class Imbalance, Data Streams, Non-stationary

Abstract

Applications in IoT environments generate continuous non-stationary data streams with inherent problems of concept drift and class imbalance, both of which degrade classifier performance. Imbalanced data affects the classifier during both concept drift detection and concept adaptation. In general, drift is detected by a separate mechanism, called a drift detector, that runs in parallel with the classifier. For concept adaptation, either the classifier updates itself or a new classifier is trained to replace the older one. If the data stream is class-imbalanced, the classifier may not properly adapt to the latest concept. In this survey, we study how existing work addresses the issues of class imbalance and concept drift while learning from non-stationary data streams. We further highlight the limitations of existing work and the challenges caused by other factors of class imbalance along with concept drift in data stream classification. Starting from 1110 studies, we applied our inclusion and exclusion criteria to narrow the pool down to 35 articles that directly addressed our study objectives. The study found that issues such as multiple concept drift types, dynamic class imbalance ratio, and multi-class imbalance in the presence of concept drift are still open for further research. We also observed that, while major research efforts have been dedicated to resolving concept drift and class imbalance, not much attention has been given to within-class imbalance, rare examples, and borderline instances when they coexist with concept drift in multi-class data. This paper concludes with some suggested future directions.
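
The detector-plus-classifier arrangement described in the abstract can be illustrated with a minimal sketch. It is not taken from the reviewed studies: the class names (`ErrorRateDriftDetector`, `MajorityClassClassifier`), the synthetic stream, and the choice of a DDM-style error-rate rule are all assumptions made for illustration. The detector watches the classifier's prediction errors, and when the error rate rises well above its lowest observed level, the classifier is replaced by a fresh one.

```python
import math


class ErrorRateDriftDetector:
    """Hypothetical DDM-style detector: signals drift when the running error
    rate rises well above the lowest level observed so far."""

    def __init__(self, min_samples=30, drift_sigmas=3.0):
        self.min_samples = min_samples
        self.drift_sigmas = drift_sigmas
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf  # error rate at the best point so far
        self.s_min = math.inf  # its standard deviation

    def update(self, error):
        """Feed one prediction outcome; return True if drift is signalled."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_samples:
            return False
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        return p + s > self.p_min + self.drift_sigmas * self.s_min


class MajorityClassClassifier:
    """Toy binary classifier that predicts the most frequent label seen."""

    def __init__(self):
        self.counts = [0, 0]

    def predict(self):
        return 1 if self.counts[1] > self.counts[0] else 0

    def learn(self, y):
        self.counts[y] += 1


def make_stream(n=1000, drift_at=500):
    """Synthetic imbalanced stream (assumed): class 1 is the 10% minority
    before drift_at and the 90% majority afterwards."""
    for i in range(n):
        minority = i % 10 == 0
        if i < drift_at:
            yield 1 if minority else 0
        else:
            yield 0 if minority else 1


def run_stream():
    model = MajorityClassClassifier()
    detector = ErrorRateDriftDetector()
    drift_points = []
    for i, y in enumerate(make_stream()):
        error = model.predict() != y   # test first ...
        model.learn(y)                 # ... then train (prequential)
        if detector.update(error):     # detector runs alongside the model
            drift_points.append(i)
            model = MajorityClassClassifier()  # adapt: replace the old model
            detector.reset()
    return drift_points, model


drift_points, final_model = run_stream()
print("drift signalled at stream positions:", drift_points)
```

In this toy setting the detector reacts shortly after the class distribution flips at position 500, and the replacement classifier then learns the new majority class. Without the replacement step, the old label counts would keep outweighing the new ones for hundreds of samples, which is the slow-adaptation problem the abstract attributes to imbalanced streams.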

Published

30-01-2024

How to Cite

Palli, A. S., Jaafar, J., Gilal, A. R., Alsughayyir, A., Gomes, H. M., Alshanqiti, A., & Omar, M. (2024). Online Machine Learning from Non-stationary Data Streams in the Presence of Concept Drift and Class Imbalance: A Systematic Review. Journal of Information and Communication Technology, 23(1), 105–139. https://doi.org/10.32890/jict2024.23.1.5