Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition

Yalamanchili Bhanusree; Samayamantula Srinivas Kumar; Anne Koteswara Rao

doi:10.32890/jict2023.22.1.3


Abstract

Speech Emotion Recognition (SER) is the task of identifying human emotions from speech utterances. Human speech utterances combine linguistic and non-linguistic information. Non-linguistic SER provides a generalized solution for human–computer interaction applications because it overcomes the language barrier. Machine learning and deep learning techniques have previously been proposed for classifying emotions using hand-crafted features. To achieve effective and generalized SER, features can instead be extracted with deep neural networks and classified with ensemble learning. The proposed model uses a time-distributed attention-layered convolution neural network (TDACNN) to extract spatiotemporal features in the first stage and a random forest (RF) classifier, an ensemble classifier chosen for efficient and generalized emotion classification, in the second stage. The proposed model was implemented on the RAVDESS and IEMOCAP data corpora and compared with the CNN-SVM and CNN-RF models for SER. The TDACNN-RF model achieved test classification accuracies of 92.19 percent and 90.27 percent on the RAVDESS and IEMOCAP data corpora, respectively. The experimental results show that the proposed model efficiently extracts spatiotemporal features from time-series speech signals and classifies emotions with good accuracy. Class confusion among the emotions was reduced for both data corpora, indicating that the model generalizes well.


Published

18-01-2023

Issue

Vol. 22 No. 1 (2023): Journal of Information and Communication Technology (JICT) Vol.22, No.1, January 2023

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

Time-Distributed Attention-Layered Convolution Neural Network with Ensemble Learning using Random Forest Classifier for Speech Emotion Recognition. (2023). Journal of Information and Communication Technology, 22(1), 49-76. https://doi.org/10.32890/jict2023.22.1.3


Publisher	UUM PRESS
ISSN	1675-414X
eISSN	2180-3862
Established	2001
DOI	10.32890/jict
Publishing Frequency	Quarterly (January, April, July and October)
