AN IMPROVED ARTIFICIAL DENDRITE CELL ALGORITHM FOR ABNORMAL SIGNAL DETECTION

In the dendrite cell algorithm (DCA), the abnormality of a data point is determined by comparing its mature context antigen value (MCAV) with an anomaly threshold. The existing thresholds have two limitations: the value must be determined before mining on the basis of previous information, and the existing MCAV-based threshold is inefficient when exposed to extreme values. As a result, DCA fails to detect new data points whose pattern differs from the previous information, which reduces detection accuracy. This paper proposes an improved anomaly threshold for DCA that uses the statistical cumulative sum (CUSUM) to improve its detection capability. In the proposed approach, the MCAV is normalized with the upper CUSUM and the new anomaly threshold is calculated at run time from the acceptance value and the mean MCAV. In experiments on 12 benchmark datasets and two outbreak datasets, the improved DCA achieved better detection results than its previous version in terms of sensitivity, specificity, false detection rate, and accuracy.


INTRODUCTION
The dendritic cell algorithm (DCA) is a class of computational intelligence inspired by the principles of the human immune system. Classified as one of the artificial immune system (AIS) algorithms, DCA is modeled on the natural behavior of the human defense system against intruders such as bacteria, viruses, and parasites, based on the concept of the danger theory, for use in problem solving. DCA assumes that the human immune system is triggered only when a dendritic cell recognizes a danger signal released by an unexpected cell death due to pathogenic infection. The dendritic cell plays an important role as an inspector that recognizes pathogens penetrating the body. By analogy with that task, DCA is modeled to detect anomalies, mainly in time series applications. The preliminary DCA prototype was proposed by Greensmith, Aickelin, & Cayzer (2005) for a computer network security system to identify suspicious network intruders, and it was fully implemented as a real-time network intrusion detection system in the following year (Greensmith, Twycross, & Aickelin, 2006). Since then, DCA has been applied in various areas, mainly to time series anomaly detection problems including fault detection (Lee, Lau, Wong, Tam, & Chan, 2016; Ran, Timmis, & Tyrrell, 2010), outbreak detection (Mohamad Mohsin, Hamdan, & Abu Bakar, 2013), and intrusion detection (Anandita, Rosmansyah, Dabarsyah, & Choi, 2015; Bukola & A.O., 2016; El-Alfy & AlHasan, 2016; Ou, 2012). Recently, DCA has also been used as a tool to classify structured and unstructured information (Zainal & Jali, 2017). The published results show that DCA discovers hidden anomalies well in comparison with other detection systems.
DCA uses the danger posed to an antigen as the criterion for determining the abnormality of a data point, and this strategy distinguishes it from other detection algorithms that rely on pattern matching. In DCA, each data point is viewed as an antigen that is vulnerable to pathogen attacks. During monitoring, DCA tracks the health condition of the antigen through its life span and accumulates the final score in a variable called the mature context antigen value (MCAV). Acting as a medical profile, the MCAV represents the experience of the antigen over its lifetime as the frequency with which it was presented as a mature antigen over its total presentations. At the end, the antigen is classified as an anomaly if its MCAV is greater than the predefined anomaly threshold (Chelly & Elouedi, 2016).
In recent practice, there are three techniques for determining the anomaly threshold. The first is a try and test experiment based on expert recommendation. The second is the class distribution between the abnormal and normal groups (Greensmith, 2007), and the last is based on the mean MCAV (Song & Qijuan, 2012). The issue with the first two implementations is that the value needs to be determined before mining on the basis of historical information, which causes a new data point to be unrecognizable if its pattern is distinct from the original setting. Besides that, the try and test approach is time consuming and highly dependent on expert guidance. One solution is to calculate the value in real time during mining. Although the mean MCAV approach avoids a predetermined anomaly threshold, it has a drawback when facing extreme values among the MCAV. In this paper, we propose an adaptive anomaly threshold based on the cumulative sum (CUSUM), which involves two steps: determining the new mean MCAV as a threshold and normalizing the MCAV with CUSUM. The improvements are intended to allow DCA to determine the threshold value during mining and to be robust against extreme values so that it can produce better detection accuracy. The proposed algorithm was compared with the previous DCA with mean MCAV using four evaluation criteria: sensitivity, specificity, false detection rate, and accuracy. In this study, 12 benchmark datasets from several data providers were chosen as experiment data and two outbreak datasets as a case study.
The remainder of this paper is organized as follows. It starts with the background of the dendrite cell algorithm and a discussion of previous work related to the MCAV and the anomaly threshold. This is followed by the presentation of the proposed work and the experiment setup. After that, the results and discussion are presented, followed by the concluding remarks.

DENDRITE CELL ALGORITHM
DCA is derived from an abstraction of the danger theory, which holds that the immune system is activated when a body cell releases a danger signal in response to infection (Matzinger, 2012). Biologically, the main elements of the theory, the dendritic cells (DCs), recognize the released signals by collecting body cell protein paired with three signals, PAMP, DS and SS, and then monitor the cell's life progress. The monitoring task continues until the cell dies either a 'healthy death' (normal) or an 'unhealthy death' (abnormal).
By analogy with the mechanism of the danger theory, DCA is formalized into three phases: initialization, updating, and aggregation. In the initialization stage, the algorithm parameters are configured and initialized, and all DCs are set to the immature state. During this stage, each item in the dataset is marked as an antigen that may be attacked by pathogens. In the updating phase, the data structures are continuously updated from the input signals and the antigens. The immature DCs collect the input signals (PAMP, DS, and SS) together with multiple sampled antigens, calculate the changes, and determine which antigen is causing the changes using the cumulative function in Equation (1), where W is the weight matrix, IS is the input signal, OS is the output signal, i indexes the input signals (PAMP, SS, and DS), and j indexes the output signal categories (CSM, Mature, and Semi-Mature).
All input signals are transformed into three cumulative output signals: CSM, Mature, and Semi-Mature. Over several samplings, the output signals change the state of an immature DC to either semi-mature (normal) or mature (abnormal). The change occurs only when the CSM value is greater than the migration threshold. Once the CSM value exceeds the threshold, the type of maturity is determined: 'mature' if Mature > Semi-Mature, or 'semi-mature' if Mature < Semi-Mature.
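The updating phase described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: Equation (1) is taken here to be a plain weighted sum of the input signals, which is an assumption, and the weight values, the function names, and the handling of a tie between Mature and Semi-Mature are all hypothetical.

```python
# Hypothetical weights (placeholders, not taken from the paper).
# Each row gives the weights for one output signal over (PAMP, DS, SS).
WEIGHTS = {
    "csm": (2.0, 1.0, 2.0),
    "semi_mature": (0.0, 0.0, 3.0),
    "mature": (2.0, 1.0, -3.0),
}


def fuse_signals(pamp, ds, ss, weights=WEIGHTS):
    """Assumed form of Equation (1): OS_j = sum over i of W_ij * IS_i."""
    inputs = (pamp, ds, ss)
    return {
        name: sum(w * s for w, s in zip(row, inputs))
        for name, row in weights.items()
    }


def assign_context(samples, migration_threshold, weights=WEIGHTS):
    """Accumulate output signals over samples until the CSM total
    exceeds the migration threshold, then decide the DC context."""
    totals = {name: 0.0 for name in weights}
    for pamp, ds, ss in samples:
        outputs = fuse_signals(pamp, ds, ss, weights)
        for name in totals:
            totals[name] += outputs[name]
        if totals["csm"] > migration_threshold:
            # The text leaves Mature == Semi-Mature undefined;
            # this sketch treats a tie as semi-mature (an assumption).
            if totals["mature"] > totals["semi_mature"]:
                return "mature"
            return "semi-mature"
    return "immature"  # the DC never migrated
```

With these placeholder weights, a DC that repeatedly sees PAMP and danger signals migrates in the mature context, while one that sees only safe signals migrates as semi-mature.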
The aggregation phase occurs when the learning ends. At this final stage, antigens that were presented in the Mature and Semi-Mature contexts are assessed to determine their abnormality. Termed the mature context antigen value (MCAV), the abnormality of an antigen is calculated as

$$\mathrm{MCAV} = \frac{\mathrm{Mature}}{\mathrm{Semi\text{-}Mature} + \mathrm{Mature}} \tag{2}$$

If the MCAV is above a predetermined value (the anomaly threshold), the antigen is labeled as abnormal (anomalous); otherwise it is labeled as normal.
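As a small illustration of the aggregation phase, the sketch below computes the MCAV of each antigen from a list of (antigen, context) presentations and labels it against a given threshold. The function names and the input layout are assumptions made for this example only.

```python
from collections import Counter


def mcav_per_antigen(presentations):
    """Compute MCAV = Mature / (Semi-Mature + Mature) per antigen.

    `presentations` is an iterable of (antigen_id, context) pairs,
    where context is 'mature' or 'semi-mature'.
    """
    mature = Counter()
    total = Counter()
    for antigen, context in presentations:
        total[antigen] += 1
        if context == "mature":
            mature[antigen] += 1
    return {antigen: mature[antigen] / total[antigen] for antigen in total}


def label_antigens(mcav, anomaly_threshold):
    """Label an antigen abnormal when its MCAV exceeds the threshold."""
    return {
        antigen: "abnormal" if value > anomaly_threshold else "normal"
        for antigen, value in mcav.items()
    }
```

For instance, an antigen presented twice as mature and once as semi-mature has an MCAV of 2/3 and would be labeled abnormal under a threshold of 0.5.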

THE ANOMALY THRESHOLD (AT) AND MATURE CONTEXT ANTIGEN VALUE (MCAV)
The anomaly threshold (AT) is a default value that separates normal and abnormal antigens. The MCAV of an antigen is compared against it, and the antigen is abnormal (an anomaly) if its MCAV exceeds the threshold value. Currently, there are three strategies for determining the AT for DCA: a try and test experiment, the class distribution between the abnormal and normal groups (Greensmith, 2007), and the average MCAV (Song & Qijuan, 2012). Table 1 summarizes the AT implementation in the existing work.

Table 1
AT Implementation in Existing Work

Domain | AT determination | Reference | Dataset | AT
E-mail classification | Determined by try and test; represents the number of spam e-mails | Secker, Freitas, & Timmis (2003) | Spam e-mail records |
Fraud detection | Ratio of fraudulent online video rentals to all transactions | Huang, Tawfik, & Nagar (2010) | Online video rental | 0.28%
Image classification | Ratio of mature leaf images to all leaf samples, used to model the leaf type | Bendiab & Kholladi (2011) | Leaf images | -
General classification | Mean MCAV introduced as the AT | Song & Qijuan (2012) | Breast cancer data | -

Note. AT strategies: 1 = try and test, 2 = class distribution, 3 = mean MCAV.

The class distribution approach refers to the proportion between the normal and abnormal classes. Both classes need to be balanced in number in order to produce a rational threshold value, as depicted in Equation (3). This requirement is not easy to fulfill. Because anomalies are isolated cases, they tend to create a large gap between the two classes.

$$AT_{\text{class distribution}} = \frac{\sum \text{number of anomalies}}{\sum \text{total data points}} \tag{3}$$

In the outbreak detection problem, for example, an outbreak is a rare event. The threshold value therefore becomes too small because of the large gap between the number of outbreak and non-outbreak cases. This can affect detection accuracy, as simulated in Table 2, which shows the result of DCA when the AT is determined from different class distribution ratios for the breast cancer data (WBC). In the first row, the dataset was balanced between normal and abnormal patients, while in the following row 90% of the abnormal patients were removed. The result showed that performance declined, mainly in the ability to detect normal cases, with a loss of sensitivity (SNS).
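The effect of class imbalance on Equation (3) can be seen with a short calculation. The counts used in the usage note below are hypothetical and are not the values behind Table 2.

```python
def class_distribution_threshold(n_anomalies, n_total):
    """AT from Equation (3): share of anomalies among all data points."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_anomalies / n_total


# Hypothetical counts, for illustration only.
balanced_at = class_distribution_threshold(50, 100)  # balanced classes
rare_at = class_distribution_threshold(5, 1000)      # rare outbreak weeks
```

With balanced classes the threshold is 0.5, whereas with rare anomalies it falls to 0.005, so almost any non-zero MCAV would exceed it.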

SNS-sensitivity, SPS-specificity, ACC-accuracy
Another issue with the existing implementations is that the value needs to be determined before mining on the basis of historical information. The problem with this solution is that new data points tend to be unrecognizable if their pattern is distinct from the original setting. Besides that, the try and test approach is time consuming and highly dependent on expert guidance. One solution is to calculate the value in real time during mining, such as the mean MCAV shown in Equation (4) (Song & Qijuan, 2012). Although the mean MCAV approach avoids a predetermined AT, it has a drawback when facing extreme values among the MCAV. Figure 1 shows the process of calculating the AT and comparing the value with the MCAV using class distribution and mean MCAV.
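The drawback of the mean MCAV threshold under extreme values can be illustrated with a few made-up numbers. The MCAV lists below are hypothetical and chosen only to show the mechanism.

```python
from statistics import mean


def mean_mcav_threshold(mcav_values):
    """AT as the mean of all MCAV values (the mean MCAV strategy)."""
    return mean(mcav_values)


# Hypothetical MCAV values: three normal antigens and two moderate anomalies.
typical = [0.10, 0.12, 0.11, 0.55, 0.60]
# The same values plus three extreme MCAVs of 1.0.
with_extremes = typical + [1.0, 1.0, 1.0]

at_typical = mean_mcav_threshold(typical)          # about 0.296
at_extremes = mean_mcav_threshold(with_extremes)   # about 0.56
```

The antigen with MCAV 0.55 lies above the first threshold but below the second, so a handful of extreme values is enough to hide a moderate anomaly.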
$$AT_{\text{mean MCAV}} = \frac{\sum \mathrm{MCAV}}{\sum \text{total data points}} \tag{4}$$

In the proposed approach, the MCAV is normalized with the upper CUSUM and the new AT is calculated at run time by considering the acceptance value. Figure 2 shows the AT calculation steps in DCA hybridized with CUSUM. The steps are calculating the average MCAV, determining the acceptance value K, normalizing the MCAV with the upper CUSUM, and then comparing the normalized MCAV with the AT. This process starts after DCA has calculated the MCAV of its antigens. The improved algorithm is named NMZ_MCAV. Based on Figure 2, the input of this process is the MCAV generated from DCA learning. After the mean MCAV is calculated, the acceptance value K is determined. K represents the allowable magnitude of change. It is expressed by Equation (5), where δ is the shift size in units of the standard deviation σ. In this study, δ was set between 0 and 2.

Figure 2. The proposed NMZ_MCAV method.
$$K = \frac{\delta}{2}\,\sigma = \frac{|\mu_1 - \mu_0|}{2} \tag{5}$$

Then, the upper-side CUSUM is used to normalize the MCAV. CUSUM is a statistical approach primarily used to monitor a planned process in manufacturing operations. It monitors the process mean and assumes that the process remains under control while the cumulative mean stays within the acceptance value K (Demsar, 2006). The process is considered out of control when a large shift away from the target value occurs. In this study, the cumulative mean shift was used to normalize the MCAV. The upper-side CUSUM, $C^{+}$, was applied to normalize the MCAV of each antigen such that

$$C_i^{+} = \max\left[0,\; x_i - (\mu_0 + K) + C_{i-1}^{+}\right] \tag{6}$$

where $C_i^{+}$ is the upper cumulative value at the $i$-th observation, $x_i$ is the process value at the $i$-th observation, $\mu_0$ is the initial mean, and K is the allowance value, which is chosen between the target and the out-of-control value $\mu_1$. $C_i^{+}$ accumulates the deviations from $\mu_0$ that are greater than K and is reset to zero on becoming negative. The starting value is $C_0^{+} = 0$.
The next step is to obtain the new AT. In this step, the acceptance value is added to the existing mean MCAV such that

$$AT_{\text{new}} = \frac{\sum \mathrm{MCAV}}{\sum \text{total data points}} + K \tag{7}$$

The function of K is to eliminate the effect of extreme values in the MCAV.
Then, the final step is to compare the normalized MCAV with the AT. Figure 3 depicts the proposed DCA enhancement algorithm.
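The steps just described (mean MCAV, acceptance value K, upper CUSUM normalization, new AT, comparison) can be put together in a short sketch. It is one reading of the text, not the authors' code: it assumes that $\mu_0$ is the mean MCAV, that σ is the population standard deviation of the MCAV, that the antigens are processed in their original order, and that an antigen is abnormal when its normalized MCAV is strictly greater than the new AT. The function names and the default `delta` are hypothetical.

```python
from statistics import mean, pstdev


def acceptance_value(mcav, delta=1.0):
    """K = (delta / 2) * sigma; sigma is assumed to be the population SD."""
    return (delta / 2.0) * pstdev(mcav)


def upper_cusum(values, mu0, k):
    """Upper-side CUSUM: C_i = max(0, x_i - (mu0 + k) + C_{i-1}), C_0 = 0."""
    c_prev = 0.0
    normalized = []
    for x in values:
        c_prev = max(0.0, x - (mu0 + k) + c_prev)
        normalized.append(c_prev)
    return normalized


def nmz_mcav(mcav, delta=1.0):
    """Normalize MCAV with the upper CUSUM and classify against mean + K."""
    mu0 = mean(mcav)                  # assumed target mean
    k = acceptance_value(mcav, delta)
    normalized = upper_cusum(mcav, mu0, k)
    threshold = mu0 + k               # new AT: mean MCAV plus K
    labels = [
        "abnormal" if c > threshold else "normal" for c in normalized
    ]
    return normalized, threshold, labels
```

Calling `upper_cusum([0.2, 0.9, 0.1, 0.8], mu0=0.5, k=0.1)` shows the reset behavior: the cumulative value rises on the second observation, drops back to zero on the third, and starts accumulating again on the fourth.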

Figure 3. The proposed DCA Enhancement Algorithm

THE EXPERIMENT SETUP
This section discusses the experiment setup used to evaluate the enhanced DCA algorithm. The proposed algorithm, called NMZ_MCAV, was compared with the existing DCA (M_MCAV), which uses the mean MCAV as its AT strategy. Four evaluation metrics were applied: sensitivity (SNS), specificity (SPS), false detection rate (FDR), and accuracy (ACC). SNS measured the ability of the model to detect an abnormal class as an abnormal class; SPS measured the ability of the model to detect a normal class as a normal class; FDR measured the amount of false detections of an abnormal class as a normal class; and ACC measured the ability of the model to classify both classes correctly. For SNS, SPS and ACC, the highest value indicated the best result, while the lowest value was the best result for FDR.
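For reference, the four metrics can be computed from confusion-matrix counts as sketched below. The formulas for SNS, SPS, and ACC are the standard ones. The text does not give a formula for FDR, so it is taken here as the false discovery rate FP / (FP + TP), which is an assumption and may differ from the definition used in the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return SNS, SPS, FDR, and ACC from confusion-matrix counts.

    tp: abnormal detected as abnormal    fn: abnormal detected as normal
    tn: normal detected as normal        fp: normal detected as abnormal
    """
    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    return {
        "SNS": ratio(tp, tp + fn),
        "SPS": ratio(tn, tn + fp),
        "FDR": ratio(fp, fp + tp),  # assumed definition (false discovery rate)
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
    }
```

For example, with 8 true positives, 2 false positives, 18 true negatives, and 2 false negatives, SNS is 0.8, SPS is 0.9, FDR is 0.2, and ACC is 26/30.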
In this study, 14 experiment datasets were used, as described in Table 3. The first 12 datasets were benchmark or universal data from various domains that were downloaded from online data repositories. The last two datasets were outbreak datasets (dengue and respiratory), which were originally obtained from a hospital and from previous researchers. Both datasets served as the case study in this work. Respiratory was a synthetic dataset for influenza outbreaks. Known as WSARE, this dataset was created by Wong (2004) for an outbreak detection model using association rules and statistics. The dataset contained 100 sets of data with different outbreak patterns and virus release dates, and WSARE7 was chosen for this study. The dataset covers 2002 and 2003, with 12 categorical features and 23,647 daily data points.

RESULT AND FINDING
The performance of the proposed algorithm (NMZ_MCAV) is presented in this section. The enhanced algorithm NMZ_MCAV was compared with the existing DCA (M_MCAV) that used the mean MCAV as the AT. To present the results, this section is divided into two parts based on the benchmark datasets and the outbreak data.

Benchmark dataset
The benchmark datasets are universal data from various domains that were downloaded from shared online data repositories. The evaluation results are shown in Table 4. In Table 4, each row represents the result for one dataset. The last two rows summarize the average value of each performance metric and the results across the 12 datasets in terms of wins, ties, and losses (indicated by W/T/L). The W/T/L count is considered in addition to the average because the average is susceptible to outliers. The p value (pval) reports the significance test (Wilcoxon test or t-test); the p value for NMZ_MCAV must be less than 0.05 for its difference from M_MCAV to be statistically significant (Demsar, 2006).
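A W/T/L summary of the kind described above can be counted as in the following sketch. The function name, the tie tolerance, and the `higher_is_better` flag are assumptions for illustration; the scores in the usage note are placeholders, not values from Table 4.

```python
def win_tie_loss(proposed, baseline, higher_is_better=True, tol=1e-9):
    """Count datasets where the proposed method wins, ties, or loses.

    `proposed` and `baseline` hold one score per dataset, in the same order.
    Set higher_is_better=False for error-type metrics such as FDR.
    """
    wins = ties = losses = 0
    for p, b in zip(proposed, baseline):
        diff = (p - b) if higher_is_better else (b - p)
        if abs(diff) <= tol:
            ties += 1
        elif diff > 0:
            wins += 1
        else:
            losses += 1
    return wins, ties, losses
```

With placeholder scores `[0.9, 0.8, 0.7]` for the proposed method and `[0.8, 0.8, 0.9]` for the baseline, the function returns one win, one tie, and one loss.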
The results in Table 4 indicate an improvement: NMZ_MCAV generates a better result than M_MCAV on most datasets. The AVG score of each performance metric shows that the proposed approach improves on its competitor. The W/T/L statistics summarize the capability of NMZ_MCAV to detect anomalies better than M_MCAV on most datasets. Although M_MCAV outperformed NMZ_MCAV on certain datasets, the results were comparable and not significantly different.
Besides that, NMZ_MCAV with the new AT is better able to detect anomalies as anomalies while reducing the error of misclassifying normal records as anomalies, which is an indicator of a good detection algorithm. Figure 4 summarizes the results in terms of SNS and FDR. A larger gap between the two measures indicates that the model discriminates between the normal and abnormal groups effectively.
Through the proposed approach, each antigen has a new normalized MCAV whose value falls in a similar range to that of its neighbors if their characteristics are identical. Besides the normalization of the MCAV with CUSUM, the acceptance value K in the AT also eliminates the effect of extreme MCAV values, which improves detection accuracy. Figure 5 shows the MCAV before and after normalization using the proposed approach for the IRIS dataset. The MCAV before normalization does not behave consistently, and the pattern becomes uniform after normalization.

Outbreak dataset
The proposed approach was then tested on the outbreak datasets (dengue outbreak and respiratory outbreak). Firstly, the enhanced algorithm NMZ_MCAV produced a better result than the previous model, M_MCAV, in terms of SPS on both datasets, as displayed in Figure 6. The accuracy of NMZ_MCAV in classifying normal data as non-outbreak data increased by 0.07 and 0.18 for dengue and respiratory, respectively, giving SPS scores of 0.814 (dengue) and 0.996 (respiratory).

In terms of the ability to detect the epidemic week, or SNS, NMZ_MCAV showed an improvement (1.00) in comparison with M_MCAV (0.995) on the dengue data. For the respiratory data, the ability of NMZ_MCAV declined by 0.02 compared to M_MCAV. However, the differences were small and not significant. Although the specificity result was slightly lower than that of the previous version, the proposed method improved the ability of DCA in terms of sensitivity to detect the true outbreak week. Figure 7 shows a comparison between NMZ_MCAV and M_MCAV in terms of SNS on the dengue and respiratory data.
The analysis continued with the relationship between SNS and FPR for DCA after the MCAV was normalized with CUSUM. The comparison is shown in Table 5. Based on the table, NMZ_MCAV showed a better result in balancing the SNS (the ability to detect an outbreak week as an outbreak) and reducing the FPR (the error rate of detecting a normal week as an outbreak). The difference between the two measurements shows that NMZ_MCAV performed more consistently, with higher SNS and lower FPR than M_MCAV. In addition, the average SNS and FPR improved for both datasets.
As in the benchmark data section, the proposed normalization using CUSUM transforms the MCAV from an inconsistent pattern into smaller and more uniform values based on the similarity of the antigen characteristics. Figure 7 shows the MCAV before and after normalization for the respiratory dataset and Figure 8 shows the dengue dataset. Based on both figures, the MCAV after normalization tends to have more uniform values than in the previous model. For example, in the respiratory dataset, the outbreak started on day 350 and lasted for 14 days. The normalized MCAV remained low before the outbreak and spiked suddenly on day 350. In comparison, the MCAV before normalization showed an inconsistent pattern. For the dengue dataset in Figure 8, the displayed MCAV covers week 140 to week 203. In comparison with the respiratory dataset, the MCAV of the dengue dataset after normalization was not much different from that before normalization, since its input signals were formalized according to the dengue definition given by the health ministry.
From the experiments, it can be concluded that the performance of DCA increased in terms of SNS, SPS and ACC, with a lower error rate, when the MCAV was normalized with CUSUM and the acceptance value K was considered in the threshold value. Experiments on the benchmark and outbreak datasets showed an improvement after its implementation. Table 7 summarizes the differences between the DCA with normalized MCAV (NMZ_MCAV) and without normalization (M_MCAV).

CONCLUSION
An adaptive anomaly threshold for DCA called NMZ_MCAV was proposed in this paper. In the new approach, the upper CUSUM formula was used to normalize the MCAV, and the new anomaly threshold was then calculated during mining by considering the acceptance value K and the mean MCAV. With the proposed solution, the performance of DCA improved in terms of sensitivity, specificity, false detection rate, and accuracy when tested on 12 benchmark datasets and two outbreak datasets. In future, NMZ_MCAV will be tested on real-time network intrusion data and business fraud data in order to further evaluate its effectiveness and robustness.

Figure 1. The steps of calculating AT and comparing it with MCAV based on class distribution and mean MCAV.

Figure 4. The range between SNS and FDR for NMZ_MCAV and M_MCAV in benchmark datasets.

Figure 5. The MCAV of antigen before and after normalization with CUSUM in IRIS dataset.

Figure 6. The SPS between NMZ_MCAV and M_MCAV on dengue and respiratory dataset.

Figure 7. The SNS between NMZ_MCAV and M_MCAV on dengue and respiratory dataset.

Figure 8. MCAV of dengue dataset before and after normalization with CUSUM.

Table 2
Anomaly Detection Problem Based on Class Distribution

Table 3
Description of the Datasets

The dengue dataset was provided by two departments: the emergency visit dataset came from the Vector Control Unit, Seremban District Hospital, Negeri Sembilan, Malaysia, and the climate dataset was provided by the Meteorological Centre, Malaysia. The dataset covers 2003 to 2009. The emergency visit dataset had 15 features representing the demographic and clinical data of dengue patients. The climate dataset consisted of eight continuous attributes representing information related to temperature, humidity and rain. Both datasets were then merged into one dengue profile dataset.

Table 4
Comparative Results between NMZ_MCAV and M_MCAV for 12 Benchmark Datasets

Table 5
The Difference between SNS and FPR of NMZ_MCAV and M_MCAV for Dengue and Respiratory Dataset

Table 6
The ACC of NMZ_MCAV and M_MCAV for Dengue and Respiratory Dataset

Table 7
A Comparison between NMZ_MCAV and M_MCAV