PARAMETRIC FLATTEN-T SWISH: AN ADAPTIVE NONLINEAR ACTIVATION FUNCTION FOR DEEP LEARNING

Authors

  • Hock Hung Chieng Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia, Malaysia
  • Noorhaniza Wahid Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia, Malaysia
  • Pauline Ong Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia, Malaysia

DOI:

https://doi.org/10.32890/jict.20.1.2021.9267

Keywords:

Activation function, deep learning, Flatten-T Swish, non-linearity, ReLU

Abstract

The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can lead to inefficient training of deep neural networks: 1) its negative cancellation property treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) its inherently predefined nature offers the network little additional flexibility, expressivity, or robustness; 3) its mean activation is highly positive, leading to a bias shift effect in the network layers; and 4) its multilinear structure restricts the non-linear approximation power of the network. To address these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline, experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieved the highest mean rank among the compared methods. The proposed PFTS exhibited higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
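
A note on the function itself: the abstract does not state the PFTS formula. The minimal PyTorch sketch below is an illustration only; it assumes the Flatten-T Swish form from the authors' earlier work, FTS(x) = x * sigmoid(x) + T for x >= 0 and T otherwise, and makes the threshold T trainable (the "parametric" part). The initial value T = -0.20 follows the original FTS paper and may differ from the parameterisation used in this article.

    import torch
    import torch.nn as nn

    class PFTS(nn.Module):
        """Parametric Flatten-T Swish (illustrative sketch, not the
        authors' reference implementation)."""

        def __init__(self, init_t: float = -0.20):
            super().__init__()
            # Learnable threshold T, updated by backpropagation along with
            # the rest of the network (initial value assumed from FTS).
            self.t = nn.Parameter(torch.tensor(init_t))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Positive inputs: Swish, x * sigmoid(x), shifted by T.
            # Negative inputs: the flat value T instead of ReLU's hard zero,
            # so negative information is retained rather than cancelled.
            return torch.where(x >= 0, x * torch.sigmoid(x) + self.t, self.t)

Under these assumptions, PFTS would act as a drop-in replacement for nn.ReLU, e.g. nn.Sequential(nn.Linear(784, 256), PFTS(), nn.Linear(256, 10)).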

Author Biographies

Noorhaniza Wahid, Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia, Malaysia

NOORHANIZA BINTI WAHID received her Ph.D. degree in Information Technology from the University of Sydney. She is currently an associate professor at the Faculty of Information Technology and Computer Science, Universiti Tun Hussein Onn Malaysia. Her research interests include multimedia, metaheuristic optimization algorithms, and machine learning.

Pauline Ong, Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia, Malaysia

Ong Pauline received her Ph.D. degree in Applied Mathematics from Universiti Sains Malaysia. She is currently an associate professor at the Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia. Her research interests include artificial intelligence, artificial neural networks, evolutionary computing, and mathematical modeling.

Published

04-11-2020

How to Cite

Chieng, H. H., Wahid, N., & Ong, P. (2020). Parametric Flatten-T Swish: An adaptive non-linear activation function for deep learning. Journal of Information and Communication Technology, 20(1), 21–39. https://doi.org/10.32890/jict.20.1.2021.9267