NOISY IMAGE CLASSIFICATION USING HYBRID DEEP LEARNING METHODS

Journal of ICT, 17, No. 2 (April) 2018, pp: 233–269

In real-world scenarios, image classification models degrade in performance when the images are corrupted with noise, because these models are trained with preprocessed data. Although deep neural networks (DNNs) are efficient for image classification due to their deep layer-wise design, which emulates latent features from data, they suffer from the same noise issue. Noise in images is a common phenomenon in real-life scenarios, and a number of studies have been conducted over the previous couple of decades with the intention of overcoming the effect of noise in image data. The aim of this study was to investigate a better DNN-based noisy image classification system. First, autoencoder (AE)-based denoising techniques were considered to reconstruct the native image from the noisy input image. Then, a convolutional neural network (CNN) was employed to classify the reconstructed image, as the CNN is a prominent DNN method with the ability to preserve a better representation of the internal structure of image data. In the denoising step, three existing AEs, namely the denoising autoencoder (DAE), the convolutional denoising autoencoder (CDAE) and the denoising variational autoencoder (DVAE), as well as two hybrid AEs (DAE-CDAE and DVAE-CDAE), were used. Therefore, this study considered five hybrid models for noisy image classification, termed DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN. The proposed hybrid classifiers were validated by experimenting over two benchmark datasets (i.e. MNIST and CIFAR-10) after corrupting them with noise of various proportions. These methods outperformed some of the existing eminent methods, attaining satisfactory recognition accuracy even when the images were corrupted with 50% noise, although the models were trained with 20% noise in the images. Among the proposed methods, DVAE-CDAE-CNN was found to be better than the others at classifying massively noisy images, and DVAE-CNN was the most appropriate for regular noise. The main significance of this work is the employment of hybrid models with the complementary strengths of AEs and CNN in noisy image classification. The AEs in the hybrid models enhanced the proficiency of the CNN in classifying highly noisy data even though they were trained with low-level noise.


INTRODUCTION
In recent years, deep learning approaches have been extensively studied for image classification and image processing tasks such as perceiving the underlying knowledge from images. Deep neural networks (DNNs) utilize their deep layer-wise design to emulate latent features from data and thus gain the ability to classify patterns appropriately. Arigbabu et al. (2017) combined Laplacian filters over images with the Pyramid Histogram of Gradient (PHOG) shape descriptor (Bosch et al., 2007) to extract face shape descriptions, and then used the Support Vector Machine (SVM) (Cortes & Vapnik, 1995) for face recognition tasks. One progressive feature-extracting variant of DNNs, the convolutional neural network (CNN) (LeCun et al., 1998; Krizhevsky et al., 2012; Schmidhuber, 2015), has surpassed the vast majority of image classification methods. Different research outcomes strongly indicate that feature selection from deep learning with CNN should be the primary candidate in most image recognition tasks (Sharif et al., 2014). The convolution and the following pooling (Scherer et al., 2010) layers preserve the relative locations of features and thereby enable the CNN to preserve a better representation of the input data. Current CNN works concentrate on computer vision problems, for example 3D object recognition, traffic sign and natural image classification (Huang and LeCun, 2006; Cireşan et al., 2011a; Cireşan et al., 2011b), image segmentation (Turaga et al., 2010), face detection (Matsugu et al., 2003), chest pathology identification (Bar et al., 2015), Magnetic Resonance Image (MRI) segmentation (Bezdek et al., 1993) and so on. However, the performance of a deep CNN depends heavily on a tremendous amount of pre-processed labeled data. Simonyan (2013) proposed an improved variant of the Fisher vector image encoding method and combined it with a CNN to develop a hybrid architecture that can classify images at a comparatively smaller computational cost than the traditional models, and also assessed the performance of the image classification pipeline with increased depth in layers. Some variants of deep models, named unsupervised deep networks, learn the underlying representation from input images, overcoming the necessity for these input data to be labeled. One traditional model of this type is the stacked autoencoder (SAE) (Bourlard and Kamp, 1988; Bengio, 2009; Rumelhart, 1985), whose basic architecture holds a stack of shallow encoders that learn features from the data by encoding the input into a vector and then decoding this vector back to its native representation. Shin et al. (2013) applied stacked sparse autoencoders (SSAEs) to a medical image classification task and achieved a notable improvement in classification accuracy. Norouzi et al. (2009) introduced the stacked convolutional restricted Boltzmann machine (SCRBM), which incorporates dimensional locality and weight sharing by maintaining a stack of convolutional restricted Boltzmann machines (CRBMs) to build deep models. Lee et al. (2009) introduced the convolutional deep belief network (CDBN), which, unlike the deep belief network (DBN), places a CRBM in each layer instead of an RBM and utilizes the convolution structure to join the layers and thus build hierarchical models. Contrasted with the conventional DBN, it preserves spatial locality and enhances the performance of feature representation (Hinton et al., 2006). With comparable ideas, Zeiler et al. (2010, 2011) proposed a deconvolutional deep model based on the conventional sparse coding technique (Olshausen and Field, 1997). The deconvolution operation depends on the convolutional decomposition of information under a sparsity constraint. It is a modification of the traditional sparse coding methods and, contrasted with sparse coding, it can learn better feature representations.
Data subjected to noise is a hindrance to the success of deep network-based image recognition systems in real-world applications. In most real-life cases, digital images are adulterated with noise during transmission and acquisition, which degrades the performance of image classification, medical image diagnosis, etc. One major issue, originating from an intrinsic attribute of a DNN, is its sensitivity to the input data. Because they are sensitive to small perturbations, DNNs may be misled into misclassifying an image that carries a certain amount of imperceptible perturbation (Szegedy et al., 2013). As a result, when noise is present in the input data, the features learned by the DNN may not be robust. Medical imaging techniques that are vulnerable to noise, such as MRI, X-ray and Computed Tomography (CT), can be considered as examples (Sanches et al., 2008). The reasons range from the use of various image acquisition systems to attempts at reducing patients' exposure to radiation: as the amount of radiation is reduced, the adulteration of the images with noise increases (Gondara, 2016; Agostinelli et al., 2013). A survey conducted by Lu and Weng (2007) investigated image classification methods and suggested that image denoising prior to classification is efficient for remotely sensed data in a thematic map such as the geographical information system (GIS). Even if the classifier is trained with noisy data, it does not show much better image classification performance. So, image denoising has become a compulsory requirement prior to feeding the image to the classifier in order to achieve a better classification result.
A notable amount of research has been directed at image denoising over the previous couple of years to make deep learning-based image classification systems more compatible with practical applications. In the past, research in this field accomplished denoising on the basis of the wavelet transformation technique (Coifman and Donoho, 1995), partial differential equation-based methods (Perona and Malik, 1990; Rudin and Osher, 1994; Subakan et al., 2007), and also sparse coding approaches (Elad and Aharon, 2006; Olshausen and Field, 1997; Mairal et al., 2009). Singh et al. (2014) proposed an efficient classification model for multi-class object images subject to Gaussian noise. They applied wavelet transform-based image denoising by employing NeighShrink thresholding over the wavelet coefficients to eliminate the coefficients causing noise in the image and retain only the useful ones. Deep networks have also been employed with the intention of accomplishing image denoising (Krizhevsky et al., 2012; Bengio et al., 2007; Glorot et al., 2011). Burger et al. (2012) demonstrated that performance similar to the previously described strategies can be achieved by applying a plain multi-layer perceptron (MLP). Jain et al. (2009) employed a CNN to denoise images, which performed better than wavelets despite using a smaller set of training images. An assortment of autoencoders (AEs) has been employed to denoise images, and these techniques have surpassed the conventional denoising methods as they are less restrictive about the details of noise-generating mechanisms (Cho, 2013; Vincent et al., 2008; Vincent et al., 2010). Vincent et al. (2008) introduced the denoising autoencoder (DAE), which learns to recreate native images from corrupted versions by injecting arbitrary noise into the images of the training set during the learning period. These DAEs are stacked to develop a deep unsupervised learning network called the stacked DAE (SDAE) for learning deep representations (Vincent et al., 2010). Xie et al. (2012) deployed a combination of sparse coding and DAE for image denoising and blind inpainting tasks. It was designed to work with images subject to white Gaussian noise and superimposed text. Cho (2013) employed Boltzmann machines as well as SDAEs for image denoising when a high level of noise is injected into the images. He employed three distinct depth settings (one, two and four layers) for both the SDAEs and the Boltzmann machines to evaluate noise removal performance. Agostinelli et al. (2013) introduced the adaptive multi-column DNN, a combination of multiple stacked sparse DAEs (SSDAEs) that can denoise various types of noise in images in a standalone manner. They computed optimal column weights using a nonlinear optimization program and later trained the individual networks to predict the optimal weights. One common disadvantage of these DAE-based models is that they learn the underlying hierarchical features from the image by reshaping the high-dimensional data into vectors and thus discard the intrinsic structure of the images.
With the intention to solve this problem, Masci et al. (2011) proposed another variant of the autoencoder called convolutional AE (CAE) which trains itself for reconstructing images from the input image data in a convolutional manner.

Convolutional Neural Network (CNN) as Image Classifier
CNNs (LeCun et al., 1998), which are multi-layered variants of the artificial neural network (ANN), are well suited to classifying images and perceiving visual patterns directly from pixel images. In a CNN architecture, information propagating through its multiple layers allows it to extract features from the perceived data at each layer by applying digital filtering techniques. CNNs operate on the basis of two main processes: convolution and subsampling. During the convolution process, a small-sized kernel is applied over an input feature map (IFM) to produce a convolved feature map (CFM). The first set of CFMs is produced by applying the convolution operation over the original input image. Here, a kernel is simply a set of weights and a bias. Every point in the CFM is obtained by applying the same kernel over a small portion of the IFM, called a local receptive field (LRF). In this way, weights are shared among all positions throughout the convolution process and spatial locality is preserved. The CFM computed from an IFM would be:

$$\mathrm{CFM}(x, y) = f\Big(b + \sum_{i}\sum_{j} K(i, j) * \mathrm{IFM}(x+i,\, y+j)\Big) \tag{1}$$

where $b$ and $f$ represent the bias and the kernel activation function respectively, $K$ is the kernel, and the 2-D convolution is symbolized by $*$. Throughout all the experiments here, the scaled sigmoid activation function as well as a single bias is used for every latent map. While distinct kernels may create distinct CFMs from the same IFM, the operations of several kernels are combined to deliver CFMs from different IFMs.
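The convolution process described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' MATLAB implementation: the function names, the square kernel, the stride of one and the absence of padding are assumptions, and a plain logistic function stands in for the scaled sigmoid, whose scaling is not given in the text.

```python
import math

def sigmoid(v):
    """Plain logistic function (assumed stand-in for the scaled sigmoid)."""
    return 1.0 / (1.0 + math.exp(-v))

def convolve_valid(ifm, kernel, bias=0.0, activation=sigmoid):
    """Apply one shared kernel to every local receptive field of the IFM."""
    k = len(kernel)                      # square k x k kernel assumed
    rows = len(ifm) - k + 1              # no padding, stride 1 (assumed)
    cols = len(ifm[0]) - k + 1
    cfm = []
    for x in range(rows):
        row = []
        for y in range(cols):
            total = bias                 # a single bias per kernel
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * ifm[x + i][y + j]
            row.append(activation(total))
        cfm.append(row)
    return cfm

# A 28x28 all-zero image and a 5x5 kernel give a 24x24 CFM.
ifm = [[0.0] * 28 for _ in range(28)]
kernel = [[0.1] * 5 for _ in range(5)]
cfm = convolve_valid(ifm, kernel)
print(len(cfm), len(cfm[0]))  # 24 24
```

Because the same `kernel` and `bias` are reused at every position, the sketch also shows the weight sharing that preserves spatial locality.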
In a CNN, each convolutional layer is followed by a subsampling layer that simplifies the feature map obtained from the convolution operation into a subsampled feature map (SFM). This simplification is done by selecting the significant features from a region and discarding the rest (Du et al., 2017). Among the various subsampling methods, max-pooling (Scherer et al., 2010) was used throughout our experiments. It takes the maximum value over non-overlapping sub-regions and can be defined as:

$$\mathrm{SFM}(x, y) = d\big(\mathrm{CFM}(xR + r,\; yC + c)\big), \quad 0 \le r < R,\; 0 \le c < C \tag{2}$$

where $R$ and $C$ denote the size of the pooling area as an $R \times C$ matrix and $d$ denotes the subsampling operation on the pooling area. Each side of the SFM becomes half that of the CFM if $R \times C$ is $2 \times 2$. In max-pooling, each point in the SFM is the maximum value computed from a particular $2 \times 2$ region of the CFM (Akhand et al., 2016, 2017).
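As a small illustration of the max-pooling step, the sketch below keeps the largest value of each non-overlapping R × C region. The function name and the toy 4 × 4 map are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def max_pool(cfm, r=2, c=2):
    """Keep the maximum of every non-overlapping r x c region of the CFM."""
    rows = len(cfm) // r
    cols = len(cfm[0]) // c
    sfm = []
    for x in range(rows):
        row = []
        for y in range(cols):
            region = [cfm[x * r + i][y * c + j]
                      for i in range(r) for j in range(c)]
            row.append(max(region))
        sfm.append(row)
    return sfm

cfm = [[1, 3, 2, 0],
       [4, 2, 1, 5],
       [0, 1, 7, 2],
       [3, 2, 1, 6]]
print(max_pool(cfm))  # [[4, 5], [3, 7]]
```

With a 2 × 2 pooling area the 4 × 4 map shrinks to 2 × 2, so each side is halved.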
In the CNN, the series of convolution-subsampling operations is followed sequentially by a hidden layer and then an output layer. The nodes of the hidden layer and the output layer are fully connected, and the hidden layer is a linear representation of the terminal SFM values. The error in the classification task is measured (Eq. 3) from the desired output and the obtained output for every particular pattern, with n being the product of the total number of patterns and the total number of output nodes in that particular classification task. The learning parameters are updated during backpropagation. Throughout our experiments, back-propagation (BP) (Liu et al., 2015; Bouvrie, 2006) was used for training the CNN. The CNN applied in our experiments is shown in Fig. 1. It consists of two convolutional layers (conv1 and conv2) and two subsampling layers (sub1 and sub2), each following a single convolutional layer. Throughout the experiments, the CNN was trained with noiseless raw images.

Denoising Autoencoder (DAE)
The DAE extends the conventional autoencoder with some stochastic augmentation in order to attain the ability to reproduce the native image from its noisy form (Vincent et al., 2008). This noise is usually added manually using a predefined distribution. The architecture of the DAE is shown in Fig. 2.

For a given input $x$, the DAE corrupts $x$ into $\tilde{x}$ with some random noise, which is added with a certain probability using a stochastic mapping $\tilde{x} \sim q_D(\tilde{x} \mid x)$.

The type of distribution $q_D$ is regulated by the distribution of the original input $x$ and the kind of random noise added to it. In practical cases, binomial noise is used for black and white images, whereas for color images uncorrelated Gaussian noise is better suited. However, both zero-masking (binomial) noise and Gaussian noise were applied throughout the experiments here. Then, $\tilde{x}$ was mapped to an underlying hidden representation $y$ by means of a nonlinear deterministic function $f_{\theta}$:

$$y = f_{\theta}(\tilde{x}) \tag{4}$$

In the very same way as in the traditional autoencoder, this hidden representation was then mapped to the reconstructed feature $z \in [0, 1]^{d}$, of the same dimension $d$ as the original input $x$, by applying another nonlinear deterministic function $g_{\theta'}$:

$$z = g_{\theta'}(y) \tag{5}$$

The reconstruction error was assessed by computing the mean squared error $\Delta$ between the input $x$ and the reconstructed feature representation $z$. This is defined as:

$$\Delta(x, z) = \frac{1}{d}\sum_{j=1}^{d}\left(x_j - z_j\right)^{2} \tag{6}$$

The main aim of this reconstruction process is to minimize the reconstruction error, and this is done by optimizing the model parameters in such a way that:

$$\theta^{*}, \theta'^{*} = \arg\min_{\theta,\, \theta'} \Delta(x, z) \tag{7}$$

For our experiment, the DAE was trained with images corrupted by 20% noise.
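The two corruption schemes and the mean squared reconstruction error mentioned in this section can be sketched as follows. This is a hedged illustration only: the function names, the clipping of Gaussian-corrupted pixels to [0, 1], the value of `sigma` and the fixed random seed are all assumptions that do not come from the paper.

```python
import random

def zero_mask(image, p, rng):
    """Zero-masking (binomial) noise: switch each pixel OFF with probability p."""
    return [0.0 if rng.random() < p else px for px in image]

def gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise; clipping to [0, 1] is an assumption."""
    return [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in image]

def mse(x, z):
    """Mean squared error between an input x and a reconstruction z."""
    return sum((a - b) ** 2 for a, b in zip(x, z)) / len(x)

rng = random.Random(0)                 # fixed seed, for repeatability only
x = [1.0] * 784                        # a flattened 28x28 all-white image
x_masked = zero_mask(x, 0.20, rng)     # 20% noise, as used for training
x_gauss = gaussian_noise(x, 0.1, rng)  # sigma = 0.1 is a hypothetical value

masked_fraction = sum(1 for px in x_masked if px == 0.0) / len(x)
print(masked_fraction)                 # close to 0.2
print(mse(x, x_masked))                # equals masked_fraction for this image
```

For an all-ones image every masked pixel contributes a squared error of one, which is why the error of the untouched noisy image equals the masked fraction; a trained DAE would be expected to bring that error down.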

Convolutional Denoising Autoencoder (CDAE)
The fundamental difference between the CDAE (Masci et al., 2011) and conventional autoencoders is that, unlike the others, the CDAE shares weights among all positions in the input and consequently conserves spatial locality. The subsequent reconstruction is a linear combination of all-important image patches based on the latent code. For a single-channel input $x$, the latent representation of the $k$-th feature map would be:

$$h^{k} = \sigma\left(x * W^{k} + b^{k}\right) \tag{8}$$

where $W^{k}$ denotes the kernel weights, $b^{k}$ denotes the bias, $\sigma$ represents the activation function and the 2-D convolution is symbolized by $*$. The scaled hyperbolic tangent activation function and a single bias were used for every latent map during the experiments. The reconstruction was achieved by applying:

$$z = \sigma\Big(\sum_{k \in H} h^{k} * \tilde{W}^{k} + c\Big) \tag{9}$$

As in the previous step, one bias $c$ was used for every input channel. $H$ denotes the group of underlying feature maps, and the flip operation over both dimensions of the weights is identified by $\tilde{W}$. The error function used here is defined as:

$$E(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(x_i - z_i\right)^{2} \tag{10}$$

The gradient of this error function is computed during the backpropagation (Liu et al., 2015; Bouvrie, 2006) step. The overall architecture is illustrated in Fig. 3. The convolution operation employed here was the same as the convolution operation described in the CNN section. During the training period, the native image was used as the output label in order to update the kernel weights and other parameters, so that at test time the CDAE could reproduce a noise-free picture given a noise-injected one. In this experiment, the CDAE was trained with 20% noisy images.
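The flip operation over both dimensions of the weights, which the reconstruction step relies on, is easy to show on a toy kernel. The sketch below also includes an error function of the halved mean-squared form used by Masci et al. (2011); treating that as the error form here is an assumption, and both function names are hypothetical.

```python
def flip_both(kernel):
    """Flip a 2-D kernel over both of its dimensions (rows and columns)."""
    return [row[::-1] for row in kernel[::-1]]

def halved_mse(x, z):
    """Assumed error form: squared differences summed and divided by 2n."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * n)

kernel = [[1, 2],
          [3, 4]]
print(flip_both(kernel))                   # [[4, 3], [2, 1]]
print(flip_both(flip_both(kernel)))        # [[1, 2], [3, 4]]
print(halved_mse([1.0, 0.0], [0.0, 0.0]))  # 0.25
```

Flipping twice returns the original kernel, which is a quick sanity check when the encoder and decoder share weights.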

Denoising Variational Autoencoder
The denoising variational autoencoder (DVAE) (Ciresan et al., 2011c; Kingma and Welling, 2013), a modern variant of the AE, is a deep directed graphical model that interprets the output of the encoder by means of variational inference. A DVAE has three basic building blocks: an encoder, the following decoder and finally a loss function. The structure of the DVAE used throughout this experiment is shown in Fig. 4. Both the encoder and the decoder can be any variant of the neural network. It computes the joint probability distribution $p_{\theta}(x, y)$ and thus finds the probability distribution of data $x$ by employing the following equation:

$$p_{\theta}(x) = \int p_{\theta}(x, y)\,dy = \int p_{\theta}(x \mid y)\,p(y)\,dy \tag{11}$$

where $\theta$ denotes the weights and biases of the decoder, $p(y)$ is the probability distribution of the latent variable $y$, which is often the standard normal distribution $\mathcal{N}(0, I)$, and $p_{\theta}(x \mid y)$ is the decoder's output in the presence of noise, expressed as the probability distribution of the reconstructed data given the latent features.
The encoder neural network takes a data point $x$ as input and translates it to a hidden representation $y$, which has significantly lower dimension than $x$. As the encoder learns to compress the data into this lower-dimensional stochastic space, it outputs the parameters of a Gaussian probability density $q_{\phi}(y \mid x)$, where $\phi$ represents the weights and biases of the encoder. This posterior is the uncorrelated multivariate normal determined by the encoder:

$$q_{\phi}(y \mid x) = \mathcal{N}\big(y;\ \mu,\ \operatorname{diag}(\sigma^{2})\big) \tag{12}$$

where $\mathcal{N}$ represents the normal distribution, and $\mu$ and $\sigma$ denote the mean and the standard deviation respectively. The decoder neural network takes the latent feature representation $y$ as input, and its outputs are the parameters of the probability distribution of the data, $p_{\theta}(x \mid y)$. As the decoder tries to reconstruct the higher-dimensional real-valued numbers in $x$ from the lower-dimensional real-valued numbers in $y$, some information may be lost. This reconstruction loss is calculated using the log-likelihood $\log p_{\theta}(x \mid y)$.

Unlike other conventional autoencoders, the loss function used in the DVAE is the negative log-likelihood with an additional regularizer. As the data points do not all share a global representation, the loss function decomposes into terms that each rely on a single data point. The loss function for a single data point $x_i$ is computed by:

$$l_i(\theta, \phi) = -\mathbb{E}_{y \sim q_{\phi}(y \mid x_i)}\big[\log p_{\theta}(x_i \mid y)\big] + \mathrm{KL}\big(q_{\phi}(y \mid x_i)\,\|\,p(y)\big) \tag{13}$$

Thus, for a total of $N$ data points the overall loss would be:

$$L(\theta, \phi) = \sum_{i=1}^{N} l_i(\theta, \phi) \tag{14}$$

This DVAE is trained to reconstruct native images from their 20% noisy form.
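To make the "negative log-likelihood with an additional regularizer" concrete, the sketch below evaluates such a loss for one data point. It assumes, following Kingma and Welling (2013), that the regularizer is the KL divergence between a diagonal Gaussian posterior and a standard normal prior, for which a closed form exists. The Bernoulli (cross-entropy) form of the reconstruction term, the single-sample estimate of the expectation and all function names are further assumptions and not details taken from the paper.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def bernoulli_nll(x, x_hat, eps=1e-12):
    """Negative log-likelihood of x under per-pixel Bernoulli outputs x_hat."""
    return -sum(a * math.log(b + eps) + (1.0 - a) * math.log(1.0 - b + eps)
                for a, b in zip(x, x_hat))

def dvae_loss(x, x_hat, mu, sigma):
    """Per-data-point loss: reconstruction term plus KL regularizer."""
    return bernoulli_nll(x, x_hat) + kl_to_standard_normal(mu, sigma)

# A two-dimensional latent code; all numbers here are made up.
x = [1.0, 0.0, 1.0]
x_hat = [0.9, 0.2, 0.8]
print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))  # 0.0
print(kl_to_standard_normal([1.0, 0.0], [1.0, 1.0]))  # 0.5
print(dvae_loss(x, x_hat, mu=[1.0, 0.0], sigma=[1.0, 1.0]))
```

The regularizer vanishes only when the posterior equals the prior, so it pulls the latent code toward the standard normal while the reconstruction term rewards faithful output.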

Proposed Hybrid Models for Noisy Image Classification
This section explains the proposed hybrid models DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN for noisy image classification. The common feature of all these models is that a CNN is used as the classifier, which takes the denoised (i.e. reconstructed) image from the preceding AE of a particular model. The conventional AE(s) of a model are trained individually with regular noise, and the CNN is trained with noise-free images. Finally, the AE(s) and the CNN are cascaded to form a particular hybrid model, and no further training is performed. The following subsections explain the architecture as well as the working procedure of each individual model.

Hybrid Model 1: DAE-CNN Architecture
The proposed DAE-CNN is a supervised deep network designed to perform image classification even when the images may be noisy. The whole architecture of the DAE-CNN is optimized with layer-wise training. Fig. 5 shows the complete architecture of the proposed DAE-CNN model. This model is a fusion of a DAE and a two-layered CNN. First, the noisy image is refined by the DAE, and afterward the reconstructed image is fed to the accompanying CNN. The DAE filters the noise from the input images via the reconstruction process. All the encoder and decoder parameters (the input-hidden and the hidden-output weights) are initialized with the weights of the previously trained DAE (discussed in the DAE section). The following CNN is designed with two convolution-subsampling layers, followed by a dense layer and finally an output layer. All the parameters of the CNN (the hidden-output weights, local averaging parameters, and kernels) are set to the corresponding parameters of the pre-trained CNN discussed in the CNN section. In the end, this architecture performs the noisy image classification task via a forward pass only.

In the DVAE-CNN model, all the weights between the input and hidden layers as well as between the hidden and output layers of the DVAE are initialized with the corresponding weights of the pre-trained DVAE (discussed in the DVAE section). After this, the classifier CNN is initialized, containing two convolution-subsampling layers, a dense layer and, in the end, an output layer. All the parameters of this CNN are initialized with the same parameters used in the pre-trained CNN (discussed in the CNN section). A simple forward pass then employs DVAE-CNN in the classification task.
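Since every hybrid model is a cascade of separately pre-trained parts that is used with a forward pass only, its structure can be sketched as simple function composition. The helper `make_hybrid` and the toy stand-in functions below are hypothetical; in practice the pre-trained AE(s) and CNN would take their place.

```python
def make_hybrid(denoisers, classifier):
    """Cascade pre-trained denoisers and a classifier; no further training."""
    def classify(noisy_image):
        image = noisy_image
        for denoise in denoisers:      # e.g. [dae] or [dvae, cdae]
            image = denoise(image)
        return classifier(image)       # forward pass through the classifier
    return classify

# Toy stand-ins (hypothetical), used only to show the data flow.
def clip(img):
    return [min(1.0, max(0.0, p)) for p in img]

def binarize(img):
    return [1.0 if p >= 0.5 else 0.0 for p in img]

def bright_or_dark(img):
    return "bright" if sum(img) / len(img) >= 0.5 else "dark"

two_stage = make_hybrid([clip, binarize], bright_or_dark)
print(two_stage([1.3, 0.7, -0.2, 0.9]))  # bright
```

A single-AE model such as DAE-CNN corresponds to a one-element list of denoisers, while DAE-CDAE-CNN and DVAE-CDAE-CNN correspond to two-element lists.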

PERFORMANCE EVALUATION
This section investigates the performance of the proposed hybrid models on benchmark datasets of two different categories: MNIST numeral images and CIFAR-10 object images. It first describes the datasets and the experimental setups used to work with them. Experiments were conducted at different noise levels, and the proficiency of the models was compared against existing models. The models were implemented in MATLAB R2015a. The performance analysis was conducted on a MacBook Pro laptop (CPU: Intel Core i5 @ 2.70 GHz; RAM: 8.00 GB) in an OS X Yosemite environment.

Data Description
Image data corrupted with noise tend to occur in real-life practical applications. Even a well-established system employed on real-life data might fail only because of the inappropriateness of the data. Therefore, it is highly necessary to preprocess image data before applying them in a practical application. To cope with this type of scenario, and at the same time to show the significance of the proposed models, we considered two benchmark datasets in this study: MNIST (LeCun et al., 2010) and CIFAR-10 (Coates et al., 2011). A large number of recent studies have utilized these two datasets as a source of image data (LeCun et al., 1998; Vincent et al., 2008; Vincent et al., 2010; Masci et al., 2011).

MNIST Dataset:
The dataset contains 70,000 sample images of size 28×28, with a large variety of distinct numeral images from various individuals with distinctive individual writing patterns.

Experimental Setup
We evaluated the proposed hybrid models on the MNIST and grey-scaled CIFAR-10 datasets. Images in the two datasets differ in size: MNIST images are 28×28, whereas CIFAR-10 images are 32×32, which forced us to use a different architecture for each dataset. This section describes the architectural setup used for the MNIST and CIFAR-10 datasets.
A uniform experimental environment was set up for a fair comparison between the proposed and the existing methods. As the images were of size 28×28 (MNIST) and 32×32 (CIFAR-10), each classifier had 784 (and 1024) input units so as to take the linearized version of the data. As the data were divided into 10 classes, each classifier had 10 units in the output layer. The intermediate portion of each network varied with its architecture. DAE and DVAE had a hidden layer of size 500 (and 700). DVAE additionally had a latent representation layer of size two. CDAE, on the other hand, had a kernel size of 5×5 and a 2×2 subsampling window (local averaging area). Throughout the experiments, a CNN with two convolution-subsampling layers was used with all of the AEs (conventional and hybrid). In both convolutional layers the kernel size was fixed at 5×5, and in both subsampling layers the pooling area was 2×2.
Due to the large training set, batch-wise training was performed, and all experiments were conducted with a fixed batch size of 50. The weights of each network were updated once per batch of image patterns, and the batch size (BS), i.e. the number of patterns in a batch, was treated as a user-defined parameter chosen so that the total number of training patterns was exactly divisible by BS. For the experiments, the learning rate (eta) was varied in the range of 0.1 to 1.0.
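The layer sizes and the batch constraint described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes unpadded ("valid") convolutions with stride 1 and non-overlapping 2×2 subsampling, neither of which is stated explicitly in the text, and it assumes the usual 60,000-image MNIST training split. The function names are hypothetical.

```python
def conv_pool_sizes(side, kernel=5, pool=2, layers=2):
    """Side length of the feature maps after each convolution-subsampling layer."""
    sizes = []
    for _ in range(layers):
        side = side - kernel + 1  # assumed: 'valid' convolution, stride 1
        side = side // pool       # assumed: non-overlapping pooling window
        sizes.append(side)
    return sizes


def batches_per_epoch(n_patterns, batch_size=50):
    """Number of weight updates per pass; BS must divide the training set exactly."""
    if n_patterns % batch_size != 0:
        raise ValueError("training set size must be divisible by the batch size")
    return n_patterns // batch_size


# (dataset name, image side, training set size); the MNIST split is an assumption.
for name, side, n_train in (("MNIST", 28, 60000), ("CIFAR-10", 32, 50000)):
    print(name,
          "input units:", side * side,
          "feature-map sides:", conv_pool_sizes(side),
          "batches per epoch:", batches_per_epoch(n_train))
```

Under these assumptions the two convolution-subsampling layers reduce a 28×28 input to 12×12 and then 4×4 feature maps, and a 32×32 input to 14×14 and then 5×5, which is one reason the two datasets need different dense-layer sizes.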

EXPERIMENTAL RESULTS AND ANALYSIS
As these models were validated against two datasets, the experimental results and analysis are presented in two different subsections.

Result on MNIST Dataset
This section illustrates the performance of the proposed models on the MNIST data set with noise of different proportions injected into it. Fig. 11 shows the result of the noise removal step using DAE, CDAE, DVAE, DAE-CDAE and DVAE-CDAE separately on 50% noisy image data; these reconstructed images were fed to the subsequent CNN classifier. The images in the dataset were originally pre-processed and contained no additional noise. To assess the performance of the proposed hybrid classifiers on noisy images, noise was added manually to the images in the dataset. Zero-masking noise was used for the experiments: a random matrix of the same size as the image data was initialized in which some pixels were randomly switched OFF, with a probability of 20% for both the training and test cases and with a probability of 50% for the test case only.
It can be clearly seen that the images reconstructed by DAE-CDAE and DVAE-CDAE were much better than those reconstructed by the standalone AEs. Because DVAE uses a variational bound on the log-likelihood of the data as its loss function, rather than the plain reconstruction error used by DAE, it produces a better representation of the reconstructed images than DAE. DVAE produced slightly blurry images, but it kept the shapes of the objects more accurate than DAE. When a CDAE was cascaded after the DVAE, this blurriness was also removed, so that the DVAE-CDAE architecture output better reconstructed images than DAE-CDAE for 50% noisy input images. The test set classification performance of all five models proposed in this study, along with a simple CNN, is shown in Fig. 12 for images corrupted with 20% noise as well as 50% noise.
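The zero-masking corruption described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used in the study: the random array merely stands in for flattened MNIST images scaled to [0, 1], and the function name `zero_mask` is hypothetical.

```python
import numpy as np


def zero_mask(images, fraction, rng):
    """Switch each pixel OFF (set it to 0) independently with probability `fraction`."""
    keep = rng.random(images.shape) >= fraction  # True where the pixel survives
    return images * keep


rng = np.random.default_rng(0)
clean = rng.random((100, 28 * 28))       # stand-in for 100 flattened 28x28 images

noisy_20 = zero_mask(clean, 0.20, rng)   # level used for training and testing
noisy_50 = zero_mask(clean, 0.50, rng)   # level used for testing only

print("fraction of pixels switched OFF:",
      float((noisy_20 == 0).mean()), float((noisy_50 == 0).mean()))
```

Pixels that survive the mask keep their original intensity, so the denoiser only has to restore the masked positions.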
The classification accuracy was recorded for up to 400 iterations (Fig. 12). In the data set, a larger number of pixels are forcibly turned ON/OFF in the 50% noisy images. Consequently, when the front DVAE, trained on 20% noisy images, is fed with 50% noisy images, it turns ON a portion of the pixels that were turned OFF by the zero-masking noise. As the following CDAE works on this intermediate image, it reconstructs the remaining affected pixels, making the classification task easier for the CNN.
For the same reasons, the DAE-CDAE-CNN architecture performs better than the DAE-CNN, CDAE-CNN and DVAE-CNN architectures and achieves a classification accuracy of 96.34%. The 50% noisy image recognition accuracies obtained by DAE-CNN, CDAE-CNN and DVAE-CNN are 95.01%, 94.22% and 95.63% respectively. When the 50% noisy data are fed to the CNN classifier without any denoising and reconstruction process, the simple CNN performs worst and attains the lowest classification accuracy (85.15%) of all the models. Fig. 12(b) shows that, while each of the hybrid supervised classifiers reached more than 95% accuracy within just 50 iterations, the simple CNN's accuracy was below 85% at that point. During the first few iterations, the test set accuracy was lower than in later iterations. This was not unexpected, as the test samples had not been seen by the hybrid networks during training. Still, the test set classification accuracy improved significantly within a small number of iterations (e.g. up to 100).
For DAE-CNN and CDAE-CNN, the table shows that they performed worst for the numeral "5": out of 1000 test cases, they classified 957 and 976 cases correctly, respectively. The DVAE-CNN classifier also performed worst on "5", although it still performed better than all the other models. It showed its best classification result for the numeral "0", classifying it correctly in 999 out of 1000 cases. The DAE-CDAE-CNN and DVAE-CDAE-CNN architectures also misclassified the same digit 62 and 60 times respectively. These confusions among some handwritten numeral images result from the varied handwriting styles of individuals; furthermore, the random noise injected into the images makes some patterns resemble one another.
... of the cases, the occurrences of other numerals misclassified as "0" were few. For all models, most errors occurred for the numerals "8", "9", "5" and "3", in descending order. As the test set images were this time corrupted with 50% noise, the shapes of the handwritten numerals were almost totally distorted. The hybrid models therefore classified them less accurately than the 20% noisy images, since images corrupted with 50% noise are quite difficult to recognize even with the naked eye.
Table 2 shows some handwritten numeral images and their corresponding class labels in the original as well as in the reconstructed form. The first image was classified correctly as "2" when reconstructed with DAE, CDAE and DVAE, but the reconstruction using DAE-CDAE as well as DVAE-CDAE distorted the pattern, causing the CNN to classify it as "8". Numeral "4" was classified correctly when reconstructed using CDAE and DVAE, but was misclassified as "6" by DAE-CNN and as "8" by DAE-CDAE-CNN as well as DVAE-CDAE-CNN. The third pattern was classified correctly by all the models except CDAE-CNN, which misclassified it as "5". The fourth pattern from the table, on the other hand, was misclassified by all of the networks. It is important to note that all of these patterns are difficult to identify even for humans because of the diverse writing styles of different persons, and adding noise to these ambiguous patterns makes their classification even more difficult.

Classification Result on CIFAR-10 Dataset
In this section, the proposed methods are tested on the CIFAR-10 data set. From the grey-scaled CIFAR-10 dataset, 50,000 sample images were used for training and 10,000 for testing. The sample images were uniformly distributed over the 10 classes. The noiseless initial data were corrupted with 20% Gaussian noise for training the autoencoders, whereas 50% noise was injected for testing only, in order to check the performance of the proposed models in a noisy environment. Fig. 13 displays some reconstructed images after the noise removal step using DAE, CDAE, DVAE, DAE-CDAE and DVAE-CDAE respectively, for images corrupted with 50% noise. DVAE-CDAE provides the best reconstruction of the 50% noisy images, as the reconstructed images displayed in the figure show. Figure 14 shows the noisy-image classification accuracy of the different proposed models, along with a simple CNN, for images corrupted with 20% noise as well as 50% noise. The reported values were captured up to 400 iterations.
Table 5 shows some sample images from the CIFAR-10 dataset and their corresponding class labels in the original as well as in the reconstructed form. The first image was of an "Airplane"; all the models misclassified it as "Bird". All the models except DAE-CNN classified the second image correctly as "Horse", whereas DAE-CNN classified it as "Dog". The third image, of a "Ship", was classified accurately by all the models. The last image was of a "Truck"; only DAE-CDAE-CNN and DVAE-CDAE-CNN classified it correctly.

Significance of the Proposed Hybrid Models
There are several significant differences between the proposed hybrid methods and the existing ones in terms of noisy-image classification. Conventional models train AEs for denoising in a stacked or standalone way, whereas in the proposed hybrid models the AEs are trained independently and then cascaded for denoising. The proposed methods use a CNN as the classifier rather than an MLP or other classifiers, as CNN performs well for image classification.
The experimental results on the benchmark datasets revealed the effectiveness of the proposed hybrid models for both regular and massive noise.
Deep learning-based models depend on their training data; existing models therefore perform well only on images corrupted with the same proportion of noise as the training data, and their performance degrades when the noise level increases. Our proposed hybrid models DAE-CDAE-CNN and DVAE-CDAE-CNN overcome this problem. Both architectures are very good at classifying images injected with massive noise even though they are trained with images corrupted with regular noise. The underlying cascaded structure of these two models makes this possible. Each model uses two AEs as image denoisers, and both AEs are trained to reconstruct native images from images subject to the same level of regular noise. In both cases, the front AE removes a proportion of the noise from the input image, and the reconstructed image is passed to the following AE for further filtering. As a result, whenever the percentage of noise in the input images is massive, these two noisy-image classifiers perform better than the other models.
On the other hand, the proposed DAE-CNN, CDAE-CNN and DVAE-CNN models performed well for regular noise. As the single AEs in these three models are trained with images at the regular noise level, their standalone structure is sufficient to denoise such images, which are then easy to classify with the CNN.
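The cascading idea can be illustrated with a toy composition of denoising stages. The sketch below is not the trained DAE, DVAE or CDAE of this study. Each stage is a hypothetical stand-in that removes a fixed share of the remaining corruption (it is even given the clean image, which a real autoencoder never sees), so the example only shows how chaining two independently defined stages filters the input twice.

```python
import numpy as np


def cascade(stages):
    """Chain denoising stages: the output of one stage feeds the next."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


def make_toy_denoiser(clean, strength=0.5):
    """Hypothetical stand-in for a trained AE: removes `strength` of the residual noise."""
    def denoise(x):
        return x + strength * (clean - x)
    return denoise


def mse(a, b):
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
clean = rng.random((4, 28, 28))                    # stand-in images in [0, 1)
noisy = clean * (rng.random(clean.shape) >= 0.5)   # 50% zero-masking noise

single = cascade([make_toy_denoiser(clean)])
double = cascade([make_toy_denoiser(clean), make_toy_denoiser(clean)])

err_noisy = mse(noisy, clean)
err_single = mse(single(noisy), clean)
err_double = mse(double(noisy), clean)
print(err_noisy, err_single, err_double)
```

In this toy setting the second stage halves the residual left by the first, which mirrors the qualitative argument above; how much a real second autoencoder helps depends on its training.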

CONCLUSION
Conventional image classifiers perform very well on preprocessed data generated in the laboratory. Real-world images, however, are often corrupted with noise during acquisition and transmission, so there is a high chance that such classifiers will fail drastically when applied to real-life tasks. The solution to this problem is to denoise the images before feeding them to the classifier. This research work proposed five supervised deep architectures named DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN, of which the first three perform well when the images are subject to a small amount of noise, whereas the last two are intended for classifying massively noisy data. These models combine the ideas of various autoencoders with a CNN to construct classifiers for noisy image data. They are able to filter noise from the image data and classify the images by learning latent feature representations from them. The classification accuracy of these models on the MNIST and CIFAR-10 datasets (corrupted with noise of different proportions) gives evidence that they are capable of learning hierarchical representations of the images. There is still scope for further development. The different hybrid models proposed here are each suited to a different level of noise. Our future research will focus on building a standalone model using these techniques that is able to classify images corrupted with any proportion of noise.
Figure 2. DAE architecture for image denoising.

Figure 6. CDAE-CNN architecture for noisy-image classification.
The DVAE-CNN architecture incorporates an image reconstructor followed by a classifier, like the DAE-CNN and CDAE-CNN architectures. In this model, DVAE serves as the noise filter as well as the image reconstructor. The noisy image is first fed to the DVAE. Like DAE and CDAE, it reconstructs noise-free native images from the noisy input images, but in a variational inference manner. Moreover, it uses an additional regularizer along with the negative log-likelihood that is common to all other traditional autoencoders. The following two-layered CNN takes this reconstructed, less noisy image as input and classifies it. The overall architecture is optimized via layer-wise training. Fig. 7 illustrates this model. All the weights between the input and hidden layers, as well as between the hidden and output layers, of the DVAE are initialized with the corresponding weights of the pre-trained DVAE (discussed in the DVAE section). After this, the CNN classifier is initialized with two convolution-subsampling layers, a dense layer and, finally, an output layer. All the parameters of this CNN are initialized with the corresponding parameters of the pre-trained CNN (discussed in the CNN section). A simple forward pass then employs DVAE-CNN in the classification task.
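The "negative log-likelihood plus an additional regularizer" can be written in a generic form. The expression below is the standard VAE objective with the encoder fed a corrupted input; it is a sketch under assumed notation, and the exact training criterion of Im et al. (2017) may differ in detail. The symbols are assumptions introduced here: x is the clean image, \tilde{x} its noisy version, z the latent code, q_\phi the encoder, p_\theta the decoder and p(z) a standard normal prior.

```latex
\mathcal{L}(\theta, \phi; x, \tilde{x})
  = \underbrace{-\,\mathbb{E}_{q_\phi(z \mid \tilde{x})}\!\left[\log p_\theta(x \mid z)\right]}_{\text{negative log-likelihood (reconstruction)}}
  \; + \;
  \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid \tilde{x}) \,\|\, p(z)\right)}_{\text{regularizer}}
```

The first term plays the role of the reconstruction error used by DAE and CDAE, while the Kullback-Leibler term keeps the latent code close to the prior.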

Figure 9. DVAE-CDAE-CNN architecture for recreating native images from their noise-degraded versions.
Figure 10. Samples of some images from MNIST and CIFAR-10 datasets.

Figure 11. Sample of original images from MNIST dataset with and without noise and their reconstruction using different AEs.

Figure 12. Test set recognition accuracy over MNIST dataset with batch size 50 and learning rate 1.0 for different networks.

Figure 13. Sample of original images from CIFAR-10 dataset with and without noise and their reconstruction using AEs.

Figure 14 shows the noisy-image classification accuracy of the different proposed models, along with a simple CNN, for images corrupted with 20% noise as well as 50% noise. The reported values were captured up to 400 iterations. Figure 14(a) shows that DVAE-CNN performs best with 20% noisy images, achieving an accuracy of 62.8%. The classification accuracies of DAE-CNN, CDAE-CNN, DAE-CDAE-CNN, DVAE-CDAE-CNN and the simple CNN are 62.37%, 62.69%, 61.88%, 61.93% and 62.04% respectively. Fig. 14(b) shows the classification accuracy of the proposed models when classifying 50% noisy image data up to 400 epochs. This time, the DVAE-CDAE-CNN architecture took first place with 53.91% accuracy. DAE-CNN, CDAE-CNN, DVAE-CNN and DAE-CDAE-CNN showed reasonable classification accuracy (52.95%, 52.5%, 52.64% and 53.63% respectively). The classification accuracy of these models is clearly lower on the CIFAR-10 dataset than on the MNIST data set. The main reason is that the tiny pictures in the CIFAR-10 data set (32×32) do not give a clear representation of the objects within such a small region.
Figure 14. Test set recognition accuracy over CIFAR-10 dataset with batch size 50 and learning rate 1.0 for different networks.
Xu et al. (2014) developed a deep CNN that can figure out the characteristics of blur degradation from an image. Gondara (2016) employed DAEs constructed with convolutional layers for denoising medical images. Du et al. (2017) proposed stacked convolutional denoising autoencoders (SCDAE) by stacking DAEs in a convolutional way, where each layer produces high-dimensional feature maps by convolving the features of the previous layer trained by a DAE. Recently, Kingma and Welling (2014) introduced the variational autoencoder (VAE), a hybrid of a deep learning model and variational inference that has prompted remarkable advances in generative modelling. The loss function used for training the VAE is calculated from a variational bound on the log-likelihood of the data. It can figure out and preserve shape variability beyond the image set, as well as reconstruct images given the manifold coordinates.
Real-world image classification tasks suffer from noise and other imperfections in the image data, so denoising the images prior to classification is necessary. Noisy image classification tasks incorporate two steps, i.e. image denoising and image classification. This section first explains some conventional models for image denoising based on AEs, as well as image classification with CNN. It then presents the proposed hybrid methods, which consist of different cascaded AEs plus a CNN.
... forces the adjacent CAEs to learn the innate structure of the input image throughout the series of convolution and pooling operations. The kernels and other learning parameters of each layer are updated by backpropagation to convolve the feature maps of the input images into more abstract features at each layer. Compared to the previously specified AEs, it has proved its capability. Unlike deterministic models, the VAE is a probabilistic generative model which is trained throughout with stochastic gradient descent. Unlike DAE, which corrupts the input images by adding noise at the input level and then learns to reconstruct the clean image, VAE learns with noise added in its stochastic hidden layer. Im et al. (2017) proposed that adding noise not only in the stochastic hidden layer but also in the input layer is beneficial and enables the VAE to perform image denoising tasks. They proposed a modified training criterion for denoising variational autoencoders (DVAE) that corresponds to a tractable bound when the input image is corrupted with noise.
The intention of this work was to build a few supervised image classifiers that can demonstrate better classification results on a noisy image set, by considering DAE, CDAE and DVAE and proposing some hybrid models that use a CNN along with these AEs. Initially, a DAE, a CDAE and a DVAE were trained with image data subject to a lower, regular noise level so that they could remove noise from the input images and reconstruct their native form. To counter massively noisy images, two hybrid structures (i.e. DAE-CDAE and DVAE-CDAE) were further investigated, in each of which two AEs were deployed in a cascaded manner. The reconstructed images from these AEs were fed to a following CNN for classification, where the CNN is trained with raw images having zero percent noise injected into them. The classification performance of this CNN depends solely on the quality of the images reconstructed by the conventional as well as the hybrid AE structures. The DAE-CDAE-CNN and DVAE-CDAE-CNN models can work better with massively noisy images because of their cascaded architectures, and thus remove the need for training with images corrupted by noise of different levels.

HYBRID DEEP LEARNING-BASED NOISY IMAGE CLASSIFICATION

Table 1. Classification Performance of the Hybrid Models in Case of Individual Objects from MNIST Dataset
Table 1 details the classification of each class individually by all the proposed hybrid models for the test set samples after a fixed 400 epochs with 20% noise. The proposed models achieved their best classification for "0", classifying it correctly in 995, 997, 999, 997 and 997 cases out of 1000 for DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN respectively. Overall, DVAE-CNN performed the noisy image classification task well, misclassifying just 116 cases, whereas DAE-CNN, CDAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN misclassified 199, 131, 260 and 257 cases respectively. In the case of 50% noisy images, the worst case for all the models occurred while classifying the numeral "5".
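The misclassification counts quoted for the 20% noise case can be converted into overall accuracies with simple arithmetic. The sketch below assumes that the counts refer to a test set of 10 classes × 1000 samples = 10,000 images, as the per-class figures suggest; the dictionary and function names are hypothetical.

```python
# Misclassified test cases per model with 20% noise, as quoted in the text.
misclassified = {
    "DAE-CNN": 199,
    "CDAE-CNN": 131,
    "DVAE-CNN": 116,
    "DAE-CDAE-CNN": 260,
    "DVAE-CDAE-CNN": 257,
}

TEST_SET_SIZE = 10 * 1000  # assumed: 1000 test samples for each of the 10 digits


def accuracy_percent(errors, total=TEST_SET_SIZE):
    """Overall accuracy in percent given the number of misclassified samples."""
    return 100.0 * (total - errors) / total


for model, errors in misclassified.items():
    print(f"{model}: {accuracy_percent(errors):.2f}%")
```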

Table 2. Sample Handwritten Numeral Images along with their Original and Predicted Class Labels

Table 3 compares the results of the proposed techniques with those of other prominent works. It also lists the specific feature(s) of the individual methods. Notably, the proposed models did not use any feature extraction procedure, while the vast majority of the existing techniques use one or more feature extraction techniques. Without any extra feature extraction, the proposed DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN models appear to outperform the existing methods. According to the table, the test set classification accuracies with 50% noise are 95.01%, 94.22%, 95.63%, 96.34% and 96.74%, and with 20% noisy test set images the accuracies are 98.01%, 98.69%, 98.84%, 97.40% and 96.74% for the DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN architectures respectively. The table shows that DAE-CNN, CDAE-CNN and DVAE-CNN outperform the other two models, as well as the existing models, when classifying images subject to regular noise. When it comes to classifying massively noisy images, however, DAE-CDAE-CNN and DVAE-CDAE-CNN take the lead.

Table 3. A Comparative Description of the Proposed DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN with Some Contemporary Methods

In addition to the small image size, the objects in the CIFAR-10 images were captured with different orientations. Table 4 details the classification accuracy for each individual object for the test set images after 400 epochs with 20% as well as 50% noise. All the models showed their best classification accuracy for the object "Frog". DAE-CNN, CDAE-CNN and DVAE-CNN recognized it correctly 704, 702 and 707 times respectively. Both DAE-CDAE-CNN and DVAE-CDAE-CNN classified it accurately in 699 cases. The worst classification occurred for the object "Deer". Even the DVAE-CNN architecture, which showed the best performance on 20% noisy images, misclassified it in 49% of cases. When classifying 50% noisy images, all the models also performed worst for the object "Deer".

Table 4. Classification Performance of the Hybrid Models in Case of Individual Objects from CIFAR-10 Dataset (accurate classifications out of 1000 test samples of each class).

Table 5. Sample Objects from CIFAR-10 Dataset along with Their Original and Predicted Class Labels

Table 6 compares the results of the proposed hybrid noisy image classifiers with other prominent works on the CIFAR-10 data set, along with the particular feature(s) of those models. As per the table, the test set accuracies with 50% noise were 52.95%, 52.5%, 52.64%, 53.68% and 53.91%, while with the 20% noise test set the accuracies were 62.37%, 62.69%, 62.8%, 61.88% and 61.93% for the DAE-CNN, CDAE-CNN, DVAE-CNN, DAE-CDAE-CNN and DVAE-CDAE-CNN models respectively. The table shows that our proposed models outperform some of the existing prominent models in classifying noisy images, especially when the images are subject to massive noise. Moreover, the classifier was the CNN; all the autoencoders and their hybrid models served only for the image denoising task. The table also shows that, without prior image denoising by the autoencoders, the performance of the classifier would be disastrous. It is also notable that our models do not need to be trained with images corrupted with noise of different proportions. The DAE, CDAE and DVAE were trained with 20% noisy images only, and the CNN was trained with noise-free raw images. Still, the DAE-CDAE-CNN and DVAE-CDAE-CNN models classified 50% noisy images with very good accuracy, removing the need for the noisy-image classifiers to be trained with 50% noisy images. The cascaded structures of DAE-CDAE-CNN and DVAE-CDAE-CNN enabled them to perform so well on the massively noisy data.