SIMILARITY DISTANCE MEASURE AND PRIORITIZATION ALGORITHM FOR TEST CASE PRIORITIZATION IN SOFTWARE PRODUCT LINE TESTING

Software Product Line (SPL) engineering creates products for a specific market segment and fulfills specific customer needs by managing a set of common features and exploiting the variabilities between the products. Testing product-by-product is not feasible in SPL due to the combinatorial explosion in the number of products; thus, Test Case Prioritization (TCP) is needed to select a few test cases that could reveal a high number of faults. Among the most promising TCP techniques is the similarity-based TCP technique, which consists of a similarity distance measure and a prioritization algorithm. The goal of this paper is to propose an enhanced string distance and an enhanced prioritization algorithm that reorder the test cases to achieve a higher rate of fault detection. A comparative study was carried out between different string distance measures and prioritization algorithms to select the best techniques for similarity-based test case prioritization. The identified enhancements were implemented in both techniques for better adoption in prioritizing SPL test cases. An experiment was conducted to evaluate the effectiveness of the enhancements for the combination of both techniques. The results show the effectiveness of the combination, which achieved the highest average fault detection rate, attained the fastest execution time for the highest number of test cases and accomplished a 41.25% average rate of fault detection. The results indicate that the combination of both techniques improves SPL testing effectiveness compared to other existing techniques.

Received: 15 July 2018 Accepted: 3 September 2018 Published: 11 December 2018
Journal of ICT, 18, No. 1 (January) 2019, pp: 57–75


INTRODUCTION
Software Product Line (SPL) engineering is based on systematically managing and exploiting the commonalities and variabilities between features to achieve specific goals of customers (Al-Hajjaji, Lity, Lachmann, Thüm, Schaefer, & Saake, 2016). Many software organizations adopt SPL to reduce development cost, time and effort while preserving the quality of products. A Feature Model (FM) is used in SPL to provide detailed information on the features and the relationships between them. Furthermore, the commonalities and variabilities of all products can be identified from the FM. The features undergo a configuration process that selects a valid combination of features, known as a configuration.

Quality assurance such as testing is much harder in SPL than in a single system due to the complexity of the features in the FM; thus, new techniques have been suggested for this challenge, such as reusing test assets to improve efficiency and handle the complexity of SPL (Johansen et al., 2012). Although product-by-product testing can be done in SPL, it is highly infeasible in terms of cost and time, so an incremental testing strategy is preferable (Al-Hajjaji et al., 2016). However, combinatorial explosion further complicates SPL testing because the number of products increases exponentially as the number of features grows. To overcome the combinatorial explosion of products, a regression testing strategy such as Test Case Prioritization (TCP) is preferred in SPL, since TCP is able to reduce the testing resources allocated while preserving the number of test cases and maintaining efficient fault detection. TCP is one of the regression testing strategies, along with minimization and selection, identified in the survey by Machado et al. (2014).

In SPL, various researchers have applied TCP in their work (Devroey, Perrouin, Cordy, Samih, Legay, Schobbens & Heymans, 2017; Henard, Papadakis, Perrouin, Klein, Heymans & Le Traon, 2014; Al-Hajjaji et al., 2017; Johansen, Haugen, Fleurey, Eldegard, & Syversen, 2012) to overcome issues such as combinatorial explosion. Among the most promising approaches for TCP is the similarity-based prioritization technique used by Henard et al. (2014) and Al-Hajjaji et al. (2016) to test SPL products effectively. The aim of similarity-based prioritization is to reorder the test cases in the prioritized test suite so that faults are detected earlier in the suite, which reduces the testing resources allocated and shortens the time to market. Similarity-based prioritization acts upon the assumption that the most dissimilar test cases are able to detect a higher number of faults than similar ones (Henard et al., 2014). Typically, similarity-based prioritization techniques consist of two key elements: a similarity measure to calculate the similarity between test cases, and a prioritization algorithm to reorder the test cases in the test suite according to their similarity values.

Thus, we are motivated to compare different similarity distance measures and prioritization algorithms in order to find the best combination of both techniques, which can be further enhanced to increase the probability of finding faults in test cases. An experiment is conducted to evaluate the effectiveness of the proposed combination. This study therefore makes three contributions. The first is a comparison of different similarity distance measures and prioritization algorithms to identify the best techniques for similarity-based prioritization. The second is the enhancement of both types of techniques to produce a better method for adoption in SPL testing. The third is an experiment that evaluates the effectiveness of the integration of both enhanced techniques and identifies further improvement (if any) compared to the existing techniques.

RELATED WORKS
Combinatorial Interaction Testing (CIT) is commonly used in SPL testing to produce a set of products based on the relationships between the features in the FM. CIT is able to reduce the number of products generated compared to the very high number of products to be tested using product-by-product testing (Al-Hajjaji et al., 2017). Several CIT sampling algorithms have been proposed to overcome combinatorial explosion, such as AETG, CASA, ICPL, Chvatal and MoSo-PoLiTe (Cohen, Ravikumar, & Fienberg, 2008; Garvin, Cohen & Dwyer, 2011; Johansen, Haugen, Fleurey, Eldegard & Syversen, 2012; Chvatal, 1979; Oster, Zorcic, Markert & Lochau, 2011). Among the highly regarded sampling algorithms is ICPL by Johansen et al. (2012), which was proposed to tackle the scalability problem by producing covering arrays of acceptable size. It also has a faster execution time, which is important in SPL testing. However, a large number of test cases still needs to be tested; thus, prioritization is needed to rank the test cases in the test suite so that faults are more likely to be detected earlier (Henard et al., 2014).

Several TCP criteria have been proposed by Sanchez et al. (2014), such as Cross-Tree Constraint Ratio (CTCR), Coefficient of Connectivity-Density (CoC), Variability Coverage and Cyclomatic Complexity (VC & CC), as well as Commonality and Dissimilarity, based on the features and their relationships in the FM. These criteria are used to prioritize the test cases to maximize fault detection, since TCP needs only a subset of the test cases in the test suite to detect faults. Moreover, several TCP works also consider other issues, such as the behavior of the features of the products (Devroey, Perrouin, Cordy, Samih, Legay, Schobbens & Heymans, 2017; Zamli, Klaib, Younis & Yeh, 2010) and the prioritization of test cases at the integration level of testing for SPL (Al-Hajjaji et al., 2017; Devroey et al., 2014). The dissimilarity criterion used by Henard et al. (2014) and Al-Hajjaji et al. (2016) is highly preferable. Dissimilarity prioritization consists of two elements, namely a similarity measure to calculate the similarity or dissimilarity value between test cases and a prioritization algorithm to rank the order of test cases. These two elements, which have also been used by several other researchers (Henard et al., 2014; Al-Hajjaji et al., 2017; Sanchez, Seguira & Ruiz-Cortis, 2014), are the main focus of this work.
Recently, various types of TCP techniques have been proposed in SPL to overcome the issue of combinatorial explosion caused by the variability of features. Statistical prioritization (Devroey et al., 2017) was proposed for SPL because most existing TCP techniques do not consider the behavior of products and depend solely on the FM. This technique uses usage models with Markov chains to prioritize behaviors based on the usage of the products. The work suggested a new way of reusing test assets based on the behavior or scenarios of the product. Another work (Al-Hajjaji et al., 2017) incorporated delta modeling into similarity-based prioritization as a solution-space approach, and found that doing so can improve the effectiveness of SPL testing. Meanwhile, for similarity-based prioritization, the work by Henard et al. (2014) addressed the scalability issue by using a similarity heuristic and a search-based approach on large feature models. The work found that the most dissimilar test cases in the test suite increase the rate of fault detection compared to similar ones, which significantly increases the effectiveness of the technique. Another work by Al-Hajjaji et al. (2016) investigated random order, an interaction-based approach and the default order of test cases, and proposed a new approach that incrementally selects the most dissimilar test cases for the prioritized test suite to increase the effectiveness of the technique.

Various researchers have applied similarity measures in SPL, and among the techniques frequently used to calculate the similarity value between test cases is the Jaccard distance (Henard et al., 2014; Al-Hajjaji et al., 2016; Sanchez, Seguira & Ruiz-Cortis, 2014). The Jaccard distance is regarded as the most efficient similarity measure for SPL, along with the Hamming distance, based on the work by Devroey, Perrouin, Legay, Schobbens and Heymans (2016), which compared similarity measures such as Hamming distance, Jaccard distance, Dice, Anti-dice and Levenshtein. Meanwhile, the work by Al-Hajjaji et al. (2017) proposed an enhancement to the Hamming distance by including deselected features in the formula in order to calculate the similarity value between test cases more accurately. Their work showed promising results in increasing the effectiveness of similarity-based prioritization techniques.
The use of similarity measures to calculate the differences between two test cases can be complemented by prioritization algorithms that reorder the test cases and increase the capability of fault detection. Several prioritization algorithms have been proposed in SPL, such as Local Maximum, Global Maximum and All-yes config (Henard et al., 2014; Al-Hajjaji et al., 2017; Sanchez, Seguira & Ruiz-Cortis, 2014). Each algorithm ranks the test cases differently according to its aim. For example, Local Maximum focuses on selecting the two test cases with the highest distance between each other, while Global Maximum uses the Local Maximum distance only to rank the first two test cases. After that, Global Maximum selects as the next test case the one with the highest distance to all test cases already in the prioritized test suite. Meanwhile, All-yes config selects the test case with the most features as the first test case in the prioritized test suite.
Based on the works we analysed, and to the best of our knowledge, no extensive comparison has been made of the existing similarity distance measures and prioritization algorithms.

SIMILARITY-BASED PRIORITIZATION TECHNIQUE
This section describes the concepts related to similarity-based prioritization, beginning with the FM, which is the main input model for generating test cases. Next, the comparison of similarity measures and the enhancements made to the best similarity distance measure are explained. Lastly, the comparison of and the enhancements made to the existing prioritization algorithms are explained.

Feature Model
The FM is typically used in SPL to describe features and the relationships between them, and was introduced by Kang et al. (1990) in Feature-Oriented Domain Analysis (FODA). Figure 1 is an example of an Electronic Shopping FM, which consists of features and their relationships. The relationships that exist in this FM are: i) mandatory, where a feature is mandatory to its parent node; ii) optional, where a feature is optional to its parent node; iii) or, where at least one of the child features must be selected; iv) alternative, where exactly one of the child features must be selected; v) require, where both features must exist in the same product; and vi) exclude, where both features cannot exist in the same product. ICPL is used as the sampling algorithm, and the sample configurations generated by this tool are shown in Table 1.

The work by Sahak, Jawawi and Halim (2017) discussed the benefits of the Jaro-Winkler similarity measure in calculating the similarity value of test cases, such as its use of a weight on the prefix of the strings, which corresponds to the common features of the products in SPL. For similarity-based prioritization, most existing works (Henard et al., 2014; Al-Hajjaji et al., 2017; Sanchez, Seguira & Ruiz-Cortis, 2014) used the Jaccard distance and Hamming distance similarity measures, since both are regarded as the most efficient string distances for SPL (Devroey et al., 2016). This study proposes an enhanced string distance based on the hybridization of the Jaro-Winkler and Hamming distance equations. The purpose of this enhancement is to increase the effectiveness of the similarity-based prioritization technique. The enhanced similarity measure should calculate the similarity value between test cases accurately and produce more diverse similarity values, since the more diverse the values, the easier it is for the prioritization algorithm to determine the ranking of test cases. This is important in order to avoid a 50-50 situation in the ranking process, in which two or more test cases have the same similarity value and the prioritization algorithm must choose among them for the next position in the prioritized test suite. Typically, prioritization algorithms select the first test case found with that value, which is sometimes not the best test case to choose.
The hybridization of the Jaro-Winkler and Hamming distance equations incorporates the deselected features from the Hamming distance into the existing Jaro distance equation, Dj. This combination is based on the assumption that faults are often found in unexpected places, especially in practice (Al-Hajjaji et al., 2016). After the deselected features are incorporated, this study proposes a new usage of the Degree of Difference, Df, which calculates the difference between two test cases from the lengths of their strings. Tumeng (2017) used Df to replace the transposed-characters term, since the original equation was designed for record linkage, a different domain from SPL in which a test case never repeats a feature. For example, in a first-name string such as "ahmad", the character "a" appears twice, but in SPL a feature can be used only once per test case. This modification is also motivated by the suggestion of Choi, Szakal, Chen, Branzei, and Zhao (2010) to modify existing string distances to suit different domains such as SPL. The enhanced Jaro-Winkler string distance is shown in Equation (1).
In Equation (1), the new enhanced Jaro-Winkler measure is based on Jaro-Winkler with the addition of deselected features and the Degree of Difference, Df, where Df is the "degree of difference" between two test cases measured from the lengths of their strings.

For example, for T1 and T4, the number of common features of both test cases is m = 6 and the lengths of T1 and T4 are 6 and 7 respectively, giving a degree of difference of 0.0769. Since the total number of features in Electronic Shopping is 10 and only three features (Standard, Search and Public Report) are absent from both T1 and T4, n = 3.

After calculation, every result is a similarity value between two test cases; thus, the dissimilarity value is calculated as 1 minus the enhanced Jaro-Winkler similarity.
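The quantities m and n used in the worked example can be checked with a short sketch. This is not the enhanced Jaro-Winkler measure itself; it only counts the common selected features (m) and the features deselected in both test cases (n), and, for comparison, computes the Jaccard distance and a Hamming-style similarity that counts agreement on deselected features. The feature list is assembled from the ten feature names mentioned in the text, and all function names are hypothetical.

```python
# Illustrative sketch only: feature-set quantities for two configurations.
# ALL_FEATURES is assembled from the feature names mentioned in the text.
ALL_FEATURES = frozenset({
    "E-Shop", "Catalogue", "Payment", "Bank Transfer", "Credit Card",
    "Security", "High", "Standard", "Search", "Public Report",
})

T1 = frozenset({"E-Shop", "Catalogue", "Payment", "Bank Transfer",
                "Security", "High"})
T4 = T1 | {"Credit Card"}


def common_selected(a, b):
    """Number of features selected in both configurations (m in the text)."""
    return len(a & b)


def common_deselected(a, b, all_features=ALL_FEATURES):
    """Number of features deselected in both configurations (n in the text)."""
    return len(all_features - (a | b))


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def hamming_similarity(a, b, all_features=ALL_FEATURES):
    """Share of features on which both configurations agree,
    counting features selected in both and deselected in both."""
    agree = common_selected(a, b) + common_deselected(a, b, all_features)
    return agree / len(all_features)


print(common_selected(T1, T4), common_deselected(T1, T4))
print(round(jaccard_distance(T1, T4), 4), hamming_similarity(T1, T4))
```

For T1 and T4 the sketch gives m = 6 and n = 3, in agreement with the worked example.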

Prioritization Algorithm
A prioritization algorithm helps determine the appropriate ranking of test cases in a prioritized test suite. In SPL, several works have implemented prioritization algorithms such as the Local Maximum, Global Maximum and All-yes config algorithms (Henard et al., 2014; Al-Hajjaji et al., 2016; Sanchez, Seguira & Ruiz-Cortis, 2014). These works proposed various considerations to determine the ranking of test cases accurately, such as the total distance of one test case to all test cases in the prioritized test suite, and choosing the two test cases with the maximum distance between them as the first two test cases in the prioritized test suite. The aim of prioritization algorithms, which is also the aim of similarity-based prioritization techniques, is to rearrange test cases so that a high number of faults is detected within the first few test cases. In this paper, we provide a comparative evaluation of three prioritization algorithms used in similarity-based prioritization, with the enhanced Jaro-Winkler as the similarity measure.

Local Maximum Distance
The Local Maximum distance algorithm has been used by several existing works (Henard et al., 2014; Sanchez et al., 2014) in their TCP approach. The algorithm starts by finding the two unordered test cases with the maximum distance between them and placing them as the first two test cases in the prioritized test suite. The same process is then repeated on the remaining unordered test cases until every test case is placed in the prioritized test suite.
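The Local Maximum procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited works: the four test cases and their pairwise distances are hypothetical, and ties are resolved by keeping the first pair found.

```python
from itertools import combinations

# Hypothetical pairwise distances between four test cases (not from the paper).
DISTANCES = {
    frozenset({"A", "B"}): 0.2, frozenset({"A", "C"}): 0.5,
    frozenset({"A", "D"}): 0.9, frozenset({"B", "C"}): 0.4,
    frozenset({"B", "D"}): 0.3, frozenset({"C", "D"}): 0.6,
}


def distance(a, b):
    """Look up the symmetric distance between two test cases."""
    return DISTANCES[frozenset({a, b})]


def local_maximum(test_cases, dist=distance):
    """Repeatedly move the farthest-apart remaining pair into the
    prioritized suite until no pair is left."""
    remaining = list(test_cases)
    prioritized = []
    while len(remaining) > 1:
        # max() keeps the first pair found when several pairs tie.
        a, b = max(combinations(remaining, 2), key=lambda pair: dist(*pair))
        prioritized += [a, b]
        remaining.remove(a)
        remaining.remove(b)
    prioritized += remaining  # a single leftover test case goes last
    return prioritized


print(local_maximum(["A", "B", "C", "D"]))
```

With the hypothetical distances above, the pair A and D (distance 0.9) is placed first, followed by the remaining pair B and C.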

Global Maximum Distance
The Global Maximum distance algorithm proposed by Henard et al. (2014) starts by finding the two unordered test cases with the maximum distance between them and placing them as the first two test cases in the prioritized test suite. Next, the algorithm calculates, for each unprioritized test case, the sum of its distances to the test cases already in the prioritized test suite, and selects the test case with the highest total as the next one. This process continues until all test cases are placed in the prioritized test suite.
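A minimal sketch of the Global Maximum procedure is given below, assuming the same kind of hypothetical distance table as input. The names and values are illustrative and are not taken from Henard et al. (2014).

```python
from itertools import combinations

# Hypothetical pairwise distances between four test cases (not from the paper).
DISTANCES = {
    frozenset({"A", "B"}): 0.2, frozenset({"A", "C"}): 0.5,
    frozenset({"A", "D"}): 0.9, frozenset({"B", "C"}): 0.4,
    frozenset({"B", "D"}): 0.3, frozenset({"C", "D"}): 0.6,
}


def distance(a, b):
    """Look up the symmetric distance between two test cases."""
    return DISTANCES[frozenset({a, b})]


def global_maximum(test_cases, dist=distance):
    """Start with the farthest-apart pair, then repeatedly add the test case
    whose summed distance to the prioritized test cases is largest."""
    remaining = list(test_cases)
    if len(remaining) < 2:
        return remaining
    a, b = max(combinations(remaining, 2), key=lambda pair: dist(*pair))
    prioritized = [a, b]
    remaining.remove(a)
    remaining.remove(b)
    while remaining:
        nxt = max(remaining,
                  key=lambda t: sum(dist(t, p) for p in prioritized))
        prioritized.append(nxt)
        remaining.remove(nxt)
    return prioritized


print(global_maximum(["A", "B", "C", "D"]))
```

Here A and D are again placed first, but C follows because its summed distance to A and D (0.5 + 0.6) exceeds that of B (0.2 + 0.3), which shows how the ordering can differ from Local Maximum on the same input.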

All-yes Config
The All-yes config algorithm was proposed by Al-Hajjaji et al. (2017) to rank each test case in the prioritized test suite. First, the algorithm selects the test case with the most features as the first test case in the prioritized test suite. This choice is based on the assumption that most faults will be discovered first in the test cases with the most features (Al-Hajjaji et al., 2017). Next, the algorithm selects as the second test case the one with the maximum distance to the first test case. Lastly, the algorithm repeatedly selects the test case with the maximum distance to the test cases already in the prioritized test suite, taking the minimum distance into consideration.
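The steps above can be sketched as follows. The sketch reads "minimum distance consideration" as choosing the candidate whose smallest distance to the already prioritized test cases is largest, which is an assumption. The configurations are hypothetical (not those of Table 1), and the Jaccard distance stands in for the enhanced Jaro-Winkler measure.

```python
# Hypothetical configurations (sets of selected features), not from Table 1.
CONFIGS = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c", "d"},
    "C3": {"a", "e", "f"},
    "C4": {"a", "c", "d"},
}


def jaccard_distance(x, y):
    """Stand-in distance measure: 1 - |x & y| / |x | y|."""
    union = x | y
    return 1.0 - len(x & y) / len(union) if union else 0.0


def all_yes_config(configs, dist=jaccard_distance):
    """Start with the configuration that has the most features, then keep
    adding the candidate whose minimum distance to the prioritized
    configurations is largest."""
    remaining = list(configs)
    if not remaining:
        return []
    first = max(remaining, key=lambda name: len(configs[name]))
    prioritized = [first]
    remaining.remove(first)
    while remaining:
        nxt = max(
            remaining,
            key=lambda name: min(dist(configs[name], configs[p])
                                 for p in prioritized),
        )
        prioritized.append(nxt)
        remaining.remove(nxt)
    return prioritized


print(all_yes_config(CONFIGS))
```

With only one configuration prioritized, the max-min rule reduces to picking the configuration farthest from the first one, which matches the second step described in the text.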
Eight benchmark case studies were chosen for the comparison between the three prioritization algorithms. The selection of the eight case studies is based on their usage in existing SPL work (Sanchez, Seguira & Ruiz-Cortis, 2014; Henard et al., 2014; Al-Hajjaji et al., 2016). These case studies are available in the Software Product Lines Online Tools (SPLOT) repository. Table 2 shows the details of the benchmark case studies. The APFD results from the comparison of the three prioritization algorithms described earlier are shown in Table 3. All-yes config gained the highest average APFD score with 80.81%, followed closely by Local Maximum with 80.64% and, lastly, Global Maximum with 78.67%. All-yes config gained the highest APFD score for four out of eight FMs, Local Maximum for three FMs and Global Maximum for only one FM. These results show that All-yes config is the best prioritization algorithm for the enhanced Jaro-Winkler similarity measure. The results also suggest that the All-yes config prioritization algorithm can be improved in its calculation of the maximum distance between test cases, which is elaborated in the following section.

Enhanced All-Yes Config Prioritization Algorithm
Two changes are proposed to the existing All-yes config algorithm. The first modification removes part of the code in line 10 of the algorithm to eliminate the extra process of finding the maximum distance to the first test case. The second, added in line 11, finds the two test cases with the furthest distance between them in T (the set of test cases). The enhanced algorithm is shown in Table 4. Algorithm 2 in line 16 refers to the same algorithm by Al-Hajjaji et al. (2016) without any modifications; therefore, it is not included in this section.

To demonstrate how the enhanced All-yes config algorithm (EA) works, the similarity distances of the test cases in Table 1 are used as input to the prioritization algorithm. In the table, T4, T5 and T6 are the test cases in the unprioritized test suite with the highest number of features, each having seven selected features. Of the three, T4 is selected as the first test case in the prioritized list since it comes first in the test suite. Next, the algorithm iterates over the test cases in the unprioritized test suite to find the two test cases with the highest distance between them, in this case T3 and T6 with 0.258. The process continues to select test cases, taking the minimum distance into consideration, until all test cases are placed in the prioritized test suite. The resulting order of test cases using the EA algorithm is T4, T3, T6, T5, T2 and T1.
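One possible reading of the EA procedure, following the worked example (most-featured test case first, then the farthest-apart pair among the remaining test cases, then max-min selection), is sketched below. It is an interpretation rather than the algorithm of Table 4: the configurations are hypothetical, the Jaccard distance stands in for EJW, and all names are illustrative.

```python
from itertools import combinations

# Hypothetical configurations (sets of selected features), not from Table 1.
CONFIGS = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c", "d"},
    "C3": {"a", "e", "f"},
    "C4": {"a", "c", "d"},
}


def jaccard_distance(x, y):
    """Stand-in distance measure: 1 - |x & y| / |x | y|."""
    union = x | y
    return 1.0 - len(x & y) / len(union) if union else 0.0


def enhanced_all_yes_config(configs, dist=jaccard_distance):
    """Interpretation of EA: most features first, then the farthest-apart
    pair of remaining configurations, then max-min selection."""
    remaining = list(configs)
    if not remaining:
        return []
    first = max(remaining, key=lambda name: len(configs[name]))
    prioritized = [first]
    remaining.remove(first)
    if len(remaining) >= 2:
        a, b = max(combinations(remaining, 2),
                   key=lambda pair: dist(configs[pair[0]], configs[pair[1]]))
        prioritized += [a, b]
        remaining.remove(a)
        remaining.remove(b)
    while remaining:
        nxt = max(
            remaining,
            key=lambda name: min(dist(configs[name], configs[p])
                                 for p in prioritized),
        )
        prioritized.append(nxt)
        remaining.remove(nxt)
    return prioritized


print(enhanced_all_yes_config(CONFIGS))
```

In this hypothetical input, C2 has the most features and is placed first, and C3 and C4 follow because they form the farthest-apart pair among the remaining configurations, mirroring how T3 and T6 follow T4 in the worked example.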

Table 4
Enhanced All-yes config prioritization algorithm

ENHANCED SIMILARITY-BASED PRIORITIZATION EXPERIMENTATION
The main goal of this experiment is to investigate which similarity-based prioritization technique is most effective in prioritizing test cases. The technique is based on the combination of a similarity measure and a prioritization algorithm for test case prioritization. The experiment consists of five phases, starting with the experimental setup. Then, test cases are generated using the ICPL sampling algorithm, since it is regarded as the fastest sampling algorithm and is able to produce a test suite of acceptable size (Henard et al., 2014). Next, faults are generated. The test cases are then prioritized using the combination of the enhanced Jaro-Winkler similarity measure and the enhanced All-yes config prioritization algorithm. Lastly, the prioritized test suite is evaluated against the effectiveness criteria.

Experimental Setup
The main aim of the experiment is to evaluate the effectiveness of the similarity-based prioritization technique obtained by combining the enhanced string distance measure (enhanced Jaro-Winkler or EJW) and the enhanced prioritization algorithm (enhanced All-yes config or EA) explained in the previous sections. To investigate the effectiveness, EJW is used consistently in combination with different prioritization algorithms, namely EA, All-yes config (A), Local Maximum (LM) distance and Global Maximum (GM) distance.
The experiment is performed in a controlled testing environment on a single Windows-based laptop with 6 GB of RAM and an Intel Core i5-3337U 1.8 GHz processor. Every phase of the experiment is carried out on this machine, from the generation of products using the sampling algorithm on the case studies to the testing phase, and is implemented in Java using Eclipse Neon 3. Our work uses the existing tool provided by Sanchez et al. (2014), which extends the SPLCAT tool for the generation of test cases. We have also added several functionalities to the tool for a complete TCP process in our research.

Generation of Test Cases using ICPL
Test cases for this experiment are generated using the ICPL sampling algorithm based on the FMs shown in Table 2. ICPL is a specialized sampling algorithm designed to tackle the scalability issue caused by the complexity and size of industrial product lines (Johansen et al., 2012). ICPL generates a covering array from the FM, which makes it possible to apply combinatorial interaction testing to the configurations generated from the FM. ICPL is chosen because of its improvement in the time needed to produce covering arrays of acceptable size and because it has a low standard deviation due to its non-determinism (Johansen et al., 2012).

Fault Generation
Our work used the fault simulator by Ensan, Bagueri and Galsevic (2012) to generate faults involving interactions of one to four features. The fault simulator has been used by many SPL researchers to investigate the effectiveness of TCP techniques (Henard et al., 2014; Al-Hajjaji et al., 2017; Sanchez, Seguira & Ruiz-Cortis, 2014). The assumption that faults are spread equally across the test cases has also been used by other SPL researchers to evaluate the effectiveness of TCP techniques.

Prioritization of Test Suite
Test cases generated by the sampling algorithm form the unprioritized test suite. The similarity value between the test cases in the unprioritized test suite is calculated using the enhanced string distance measure, and the results are stored in a hashtable data structure. The enhanced prioritization algorithm then iterates over the stored similarity values to rank the test cases in the prioritized test suite.
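As an illustration of the storage step, the sketch below computes each pairwise value once and keeps it in a dictionary, the Python counterpart of a hashtable. The configurations, the Jaccard distance used as a stand-in for EJW and the function names are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical configurations (sets of selected features), not from Table 1.
CONFIGS = {
    "C1": {"a", "b"},
    "C2": {"a", "b", "c", "d"},
    "C3": {"a", "e", "f"},
    "C4": {"a", "c", "d"},
}


def jaccard_distance(x, y):
    """Stand-in distance measure: 1 - |x & y| / |x | y|."""
    union = x | y
    return 1.0 - len(x & y) / len(union) if union else 0.0


def build_distance_table(configs, dist=jaccard_distance):
    """Compute each unordered pair once and store it under a sorted key."""
    return {
        (a, b): dist(configs[a], configs[b])
        for a, b in combinations(sorted(configs), 2)
    }


def lookup(table, a, b):
    """Distances are symmetric, so normalise the key order before lookup."""
    if a == b:
        return 0.0
    return table[(a, b) if a < b else (b, a)]


table = build_distance_table(CONFIGS)
print(len(table), lookup(table, "C2", "C1"))
```

Storing each unordered pair once halves the number of distance computations, and the prioritization algorithm can then query the table instead of recomputing distances during ranking.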

Evaluation of Effectiveness Criteria
The Average Percentage of Faults Detected (APFD) metric evaluates the effectiveness of prioritization by calculating the average number of faults exposed based on their index position in a prioritized test suite. The rate of fault detection refers to the number of faults detected at a certain level of the test suite. A good technique is one that is capable of detecting 100% of the faults using a small portion of the test suite. For example, if technique A is capable of detecting 100% of the faults using only 10% of the test suite in the Web Portal case study, it is considered a very effective technique in terms of rate of fault detection. This study reports the rate of fault detection as a percentage; the number of faults detected is calculated at every 10% level of the test suite (10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and 100%) to provide a detailed observation of the results at each level.
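Both criteria can be illustrated with a short sketch. It uses the standard APFD definition, 1 - (TF_1 + ... + TF_m) / (n x m) + 1 / (2n), and assumes that every fault of interest is exposed by at least one test case and that the number of test cases at each 10% level is rounded up. The test-case names, the fault data and the function names are hypothetical.

```python
def first_detection_positions(order, detects):
    """Map each fault to the 1-based position of the first test case
    in the prioritized order that exposes it."""
    positions = {}
    for index, test in enumerate(order, start=1):
        for fault in detects.get(test, ()):
            positions.setdefault(fault, index)
    return positions


def apfd(order, detects):
    """Standard APFD; assumes every fault is exposed by some test case."""
    positions = first_detection_positions(order, detects)
    n, m = len(order), len(positions)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test case and one fault")
    return 1.0 - sum(positions.values()) / (n * m) + 1.0 / (2 * n)


def detection_rate_by_level(order, detects, step=10):
    """Percentage of faults exposed after each step% of the suite,
    with the number of executed test cases rounded up (an assumption)."""
    positions = first_detection_positions(order, detects)
    n, m = len(order), len(positions)
    rates = {}
    for level in range(step, 101, step):
        executed = -(-level * n // 100)  # integer ceiling of level% of n
        found = sum(1 for p in positions.values() if p <= executed)
        rates[level] = 100.0 * found / m if m else 0.0
    return rates


# Hypothetical prioritized suite and the faults each test case exposes.
order = ["t1", "t2", "t3", "t4", "t5"]
detects = {"t1": {"f1"}, "t3": {"f2", "f3"}, "t5": {"f1"}}

print(round(apfd(order, detects), 4))
print(detection_rate_by_level(order, detects))
```

In this example the three faults are first exposed at positions 1, 3 and 3, so all faults are detected once 50% of the suite (three of five test cases) has been executed.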

RESULTS AND DISCUSSION
In terms of APFD scores, EA outperformed the other prioritization algorithms for the Web Portal, Video Player, Printers and Electronic Shopping case studies and performed second best in the Go Phone case study. This indicates that the combination of EJW and EA is suitable for similarity-based prioritization in most case studies. The full comparison between the prioritization algorithms based on their APFD results is shown in Table 5, where the average rank obtained by EA + EJW for the benchmark case studies is the highest compared to the other prioritization algorithms.
Next, in terms of execution time, the combination of EJW and EA was the fastest compared to the three other prioritization algorithms in most of the case studies, except Printers. Generally, execution time depends on the number of test cases generated for each case study. Battle of Tanks and Printers, which consist of 484 and 129 test cases respectively, have the highest number of test cases among the case studies. Theoretically, the test cases from both FMs require a much longer completion time. However, EA was the fastest prioritization algorithm in Battle of Tanks, with an execution time of 1568 milliseconds, and the second fastest in Printers, with an execution time of 69 milliseconds. This shows that the EA + EJW combination is efficient in terms of the time taken to process a large number of test cases.

Lastly, for the rate of fault detection, the combination of EJW and EA obtained the best results for two case studies, namely Web Portal and Video Player, using only 30% and 10% of the test suite respectively to detect 100% of the faults. This indicates the capability of EA and EJW to increase the rate of fault detection for case studies of different scales. Therefore, the proposed enhancement to the EA algorithm, which considers the maximum distance between two other test cases, proved to be successful and shows that omitting the first process of the A algorithm, as done in EA, improves the effectiveness of the algorithm.

CONCLUSION
The enhanced string distance technique, EJW takes consideration of the features inside SPL for the modification of its similarity distance calculation.Modification was done on original Jaro distance and Winkler equation by incorporating Degree of Difference, D f The implementation of D f is important to provide a much more accurate and consistent similarity value since the original Jaro-Winkler equation uses prefix in their equation which is not suitable for configuration of test cases such as in SPL.
Additionally, for the enhanced prioritization algorithm, EA improves the existing algorithm by eliminating the first process in A of selected test case which has the highest features.Instead, a new process is added by selecting two test cases with maximum distance between them in the unprioritized test suite.The proposed enhancement also supports the aim of similarity-based prioritization technique by placing test cases with higher distance between each other first in the test suite.
Lastly, better results were obtained from the combination of EJW and EA in terms of APFD scores, execution time and rate of fault detection. Thus, the combination of the EJW string distance and the EA prioritization algorithm produced a complete, synergized similarity-based prioritization technique which improves the effectiveness of the SPL testing process. It is hoped that the proposed technique will facilitate and ease the daily work of SPL testers.

Figure 1. Feature Model for Electronic Shopping.
The two test cases T1 and T4 are as follows:
T1 = {E-Shop, Catalogue, Payment, Bank Transfer, Security, High}
T4 = {E-Shop, Catalogue, Payment, Bank Transfer, Credit Card, Security, High}
The number of common features of both test cases is m = 6, and the lengths of T1 and T4 are 6 and 7 respectively. The total number of features in Electronic Shopping is 10, and only three features are absent from both T1 and T4, namely Standard, Search and Public Report, so n = 3. These values are used to compute the degree of difference, Df(T1, T4), and the Enhanced Jaro-Winkler distance, EJW(T1, T4).

A higher APFD indicates a faster fault detection rate. The APFD metric equation is as follows:

$$APFD = 1 - \frac{TF_1 + TF_2 + \dots + TF_m}{n \times m} + \frac{1}{2n}$$

where
T = the test suite
n = the number of test cases in T
TF_i = the position of the first test case exposing fault i
m = the number of faults exposed by the test suite
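As an illustration of the APFD metric described above, the following Python sketch evaluates the standard formula, APFD = 1 - (TF_1 + ... + TF_m) / (n × m) + 1 / (2n). The function name `apfd` and the sample positions are hypothetical and are not taken from the experiment.

```python
def apfd(first_detect_positions, num_tests):
    """Average Percentage of Faults Detected for one prioritized suite.

    first_detect_positions: for each fault, the 1-based position of the
        first test case in the prioritized order that exposes it (TF_i).
    num_tests: number of test cases in the suite (n).
    """
    m = len(first_detect_positions)
    if m == 0 or num_tests <= 0:
        raise ValueError("need at least one fault and one test case")
    if any(not 1 <= tf <= num_tests for tf in first_detect_positions):
        raise ValueError("positions must lie between 1 and num_tests")
    return 1.0 - sum(first_detect_positions) / (num_tests * m) + 1.0 / (2 * num_tests)


# Hypothetical orderings of a five-test suite that exposes three faults.
early = apfd([1, 1, 2], num_tests=5)   # faults exposed at the front
late = apfd([4, 5, 5], num_tests=5)    # faults exposed at the back
print(early > late)
```

An ordering that exposes faults earlier has smaller TF_i values and therefore a higher APFD, which is why the metric rewards prioritization techniques that move fault-revealing test cases to the front.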

Table 1
Sample of Test Cases Generated from Sampling Algorithm

Test Case    Test Case Content
…            {…, Catalogue, Payment, Bank Transfer, Security, High, Search}
T6           {E-Shop, Catalogue, Payment, Bank Transfer, Security, Standard, Search}

FM is typically used in SPL to describe features and the relationships between them. It was introduced by Kang et al. (1990) in the Feature-Oriented Domain Analysis (FODA). Figure 1 is an example of an Electronic Shopping FM, which consists of features and their relationships.

Based on our previous work, we compared six similarity measures, namely Jaccard distance, Hamming distance, Cosine similarity, Counting function, Sorensen similarity and Jaro-Winkler. The result shows that Jaro-Winkler is the best string distance for the eight feature models used.

EJW is the new enhanced Jaro-Winkler, based on Jaro-Winkler with the addition of deselected features and the Degree of Difference, Df:

$$D_f = \frac{|T_i - T_{i+1}|}{|T_i + T_{i+1}|}$$

where |T_i - T_{i+1}| is the absolute value of the length of test case i minus the length of the subsequent test case i+1, and |T_i + T_{i+1}| is the total length of test case i and test case i+1. The length of a test case (for example, test case 2) is the number of features it contains.

Distances among test cases are calculated using the configuration of Table 1. We use two test cases, T1 and T4, as follows:
T1 = {E-Shop, Catalogue, Payment, Bank Transfer, Security, High}
T4 = {E-Shop, Catalogue, Payment, Bank Transfer, Credit Card, Security, High}
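The quantities that feed the distance calculation for T1 and T4 can be checked with a short Python sketch. It assumes that the Degree of Difference is the absolute difference of the two test case lengths divided by their total length, which is one reading of the definitions above. The ten-feature list is assembled from the feature names mentioned in the text, and the helper names are hypothetical. The full EJW equation is not reproduced here.

```python
# Ten features of the Electronic Shopping FM, as named in the text.
ALL_FEATURES = {
    "E-Shop", "Catalogue", "Payment", "Bank Transfer", "Credit Card",
    "Security", "High", "Standard", "Search", "Public Report",
}

T1 = {"E-Shop", "Catalogue", "Payment", "Bank Transfer", "Security", "High"}
T4 = {"E-Shop", "Catalogue", "Payment", "Bank Transfer", "Credit Card",
      "Security", "High"}


def common_features(a, b):
    """m: number of features selected in both test cases."""
    return len(a & b)


def absent_features(a, b, all_features):
    """n: number of features deselected in both test cases."""
    return len(all_features - (a | b))


def degree_of_difference(a, b):
    """Assumed form of Df: |len(a) - len(b)| / (len(a) + len(b))."""
    return abs(len(a) - len(b)) / (len(a) + len(b))


print("m  =", common_features(T1, T4))
print("n  =", absent_features(T1, T4, ALL_FEATURES))
print("Df =", degree_of_difference(T1, T4))
```

Under these assumptions the sketch gives m = 6 and n = 3, matching the values stated for T1 and T4, and Df = 1/13 because the two lengths differ by one feature out of thirteen in total.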

Table 2
Benchmark Case Studies

Table 4
Enhanced All-yes config prioritization algorithm

Table 5
Ranks of prioritization algorithms based on APFD results

Table 6
Execution time based on different prioritization algorithms

Table 7
Rate of fault detection based on different prioritization algorithms