IDENTIFYING SKYLINES IN CLOUD DATABASES WITH INCOMPLETE DATA

Skyline queries are a rich area of research in the database community. Owing to their benefits, they have been integrated into many database applications, including but not limited to personalized recommendation, multi-objective, decision support, and decision-making systems. Many variations of the skyline technique have been proposed in the literature to handle skyline queries in incomplete databases. Nevertheless, these solutions are designed for a centralized incomplete database (single access). In many real-world database systems this is not the case, particularly for a database with a large amount of incomplete data distributed over various remote locations, such as a cloud database. Directly applying skyline solutions designed for a centralized incomplete database to the cloud is inadequate because of the prohibitive cost. Thus, this paper introduces a new approach called Incomplete-data Cloud Skylines (ICS) for processing skyline queries in cloud databases with incomplete data. The approach emphasizes reducing the amount of data transfer and the number of domination tests during the skyline process. It incorporates a sorting technique that arranges the data items so that dominating data items are placed at the top of the list, which helps in eliminating dominated data items. ICS also employs a filtering technique to prune dominated data items before the skyline technique is applied. In addition, it comprises a technique named local skyline joiner that reduces the amount of data transferred between datacenters when deriving the final skylines by limiting the transferred data items to the local skylines of each relation. Comprehensive experiments have been performed on both synthetic and real-life datasets, and they demonstrate the effectiveness and versatility of our approach in comparison with existing approaches. We argue that our approach is practical and can be adopted in many contemporary cloud database systems with incomplete data to process skyline queries.
Received: 5 January 2018 Accepted: 3 October 2018 Published: 11 December 2018
Journal of ICT, 18, No. 1 (January) 2019, pp: 19–34

INTRODUCTION
Skyline queries are among the predominant preference queries and have received significant attention in the database literature. They have been a rich area of research in the database community for the past decade. The skyline process attempts to identify the superior data items, which are those not dominated by any other data item in the database (Bharuka & Kumar, 2013; Borzsony, Kossmann, & Stocker, 2001; Gulzar, Alwan, Salleh, & Shaikhli, 2017a; Khalefa, Mokbel, & Levandoski, 2008; Swidan, Alwan, Turaev, & Gulzar, 2018). The skyline set comprises the non-dominated data items (the skylines) of a given database. Given two data items p and q, p dominates q if and only if p is not worse than q in all dimensions and p is better than q in at least one dimension.
A running database example (restaurant) is used throughout this paper to illustrate the skyline query process. Assume a database relation named restaurant, which consists of two attributes (dimensions) and contains the details of 10 different restaurants. The first dimension indicates the rating given to the restaurant by its customers, while the second dimension represents the food price in each restaurant, as demonstrated in Figure 1. It can be observed that restaurants r8 and r5 have the cheapest price among all restaurants. Nevertheless, r8 has the lowest rating of all restaurants. Therefore, based on the skyline concept, restaurant r8 is dominated by restaurant r5. Similarly, restaurant r7 has a higher price than restaurants r1 and r9. However, r7 has the highest rating of all restaurants. Therefore, r7 is one of the potential skyline results, as illustrated in Figure 1(a). Applying the skyline technique to the restaurant database returns only those restaurants for which no other restaurant is both at least as cheap and at least as highly rated. Hence, the skyline result includes restaurants r5, r2, and r7.
Various strategies based on the skyline concept have been suggested for different types of databases, such as complete, incomplete, and uncertain databases. Some of the previous techniques focus on processing skyline queries in complete databases. These techniques aim at reducing the search space and minimizing the number of pairwise comparisons by avoiding unnecessary comparisons between data items when identifying the skylines. Other techniques concentrate on the new challenges that incomplete data introduces when processing skyline queries. These include losing the transitivity property of the skyline technique because of missing values, which in turn leads to the issue of cyclic dominance. Hence, applying a skyline technique designed for complete databases to an incomplete database is impractical and can incur a high cost due to unnecessary exhaustive pairwise comparisons between data items (Alwan et al., 2016; Bharuka & Kumar, 2013; Khalefa et al., 2008). However, these solutions are designed for a centralized database, in which the database relation is located at one site and only local access is needed to identify the skylines. In a cloud database, merging the data before applying the skyline technique results in transferring a tremendous amount of unnecessary data from different remote datacenters. This solution is extremely undesirable, as transferring a large amount of data incurs a high processing cost. Moreover, it also leads to a large number of unnecessary pairwise comparisons between data items, which could be avoided before the skyline process is applied. Processing skyline queries for a database with incomplete data is therefore not as easy in a cloud context as it is in a centralized context.
This paper is an extension of the work in Gulzar et al. (2017b). It presents a new approach, Incomplete-data Cloud Skylines (ICS), for processing skyline queries in cloud databases with incomplete data. In this context, database relations are distributed over various locations, and remote access is needed in order to retrieve the skylines of the database. In this work, we assume that the database relations are divided horizontally and spread over different sites. The proposed approach comprises three phases, namely: (i) identifying the skylines of each relation in all datacenters, (ii) joining the skylines of all relations, and (iii) identifying global skylines.

RELATED WORK
Considerable research effort has been devoted to the problem of skyline queries in database systems. In this section, we review the relevant work on skyline queries in both complete and incomplete databases. Skyline queries were first introduced into the database community by Borzsony et al. (2001), who proposed two algorithms for processing skyline queries in complete databases, namely Block Nested Loop (BNL) and Divide and Conquer (D&C). Many later algorithms were inspired by BNL and D&C, either by adopting the partitioning idea of D&C or by applying sorting to improve BNL. These algorithms include Linear Elimination Sort Skyline (Godfrey et al., 2005), Branch and Bound Skyline (Papadias et al., 2005), and SkyTree (Lee & Hwang, 2014). All of these techniques process skyline queries in a centralized complete database.
Many other approaches have been proposed to derive skylines in incomplete databases. The review in Gulzar et al. (2017a) summarizes the skyline query approaches proposed for incomplete databases, which include BUCKET and Iskyline (Khalefa et al., 2008), Replacement Based Sets Skyline Queries (Arefin & Morimoto, 2012), Baseline, the Virtual Point based algorithm, and the k-iskyband algorithm (Miao et al., 2013). The work in Alwan et al. (2016) proposed a framework inspired by the work of Khalefa et al. (2008). In addition, Lee et al. (2016) proposed two algorithms for incomplete data, namely a baseline algorithm called BUCKET and the sorting-based bucket skyline algorithm (SOBA). SOBA uses two optimization techniques, bucket-level orders and point-level orders, to reduce the domination tests between data items, minimize the size of the skyline set, and increase the overall efficiency of skyline processing over incomplete data. Lastly, Wang et al. (2017) introduced an approach for processing skyline queries over massive incomplete data. The main idea of that approach is to divide the initial database into two clusters (restrict and loose) according to the importance of the dimensions. The skyline technique is then applied to the dimensions of higher importance. Similarly, the skyline technique is applied to the loose dimensions, which have lower importance. Lastly, the skylines of both clusters are compared with each other to return the final skylines.
However, most of the related works mentioned above assume that the database is centralized and that the data are stored in a single database relation. Nevertheless, several skyline techniques have been proposed for distributed complete databases, such as Sort First Skyline Join (Vlachou et al., 2011), Iterative (Sun et al., 2008), and Skyline Join Algorithm (Zhang et al., 2016). These techniques assume that the data are partitioned either vertically or horizontally and might exist in more than one database relation. To evaluate skyline queries, a join operator must be performed to combine the data of the relations before the skyline technique is applied. However, it is impractical to apply these techniques directly to incomplete distributed databases because of the prohibitive cost, the issue of cyclic dominance, and the loss of the transitivity property of the skyline technique.
To the best of our knowledge, the most recent work addressing the processing of skyline queries in incomplete distributed databases is that of Alwan et al. (2017). They suggested that the database is distributed over more than two relations and that these relations are divided horizontally. The proposed technique encompasses three phases, namely identifying the skylines of each relation, joining the skylines of the relations, and determining the final skylines. Several optimization techniques are employed to eliminate dominated data items before the global skyline of all relations is found. However, this work is limited to the case of database relations which are vertically partitioned, where the dimensions (attributes) of the relations are located on different sites and remote access needs to be performed during the skyline process. Besides, the architecture of the cloud is quite different from that of a distributed environment.

DEFINITIONS
This section gives the definitions and notations related to skyline queries in a cloud database with partially complete data. These definitions and notations are needed to explain the details of our proposed approach. Our technique has been developed in the context of relational databases with partially complete data, D. A relation of the database D is denoted by R(d1, d2, ..., dm), where R is the name of the relation with m-arity and d = (d1, d2, ..., dm) is the set of dimensions.
Definition 1 Incomplete Database: Given a database D(R1, R2, ..., Rn), where Ri is a relation denoted by Ri(d1, d2, ..., dm), D is said to be incomplete if and only if it contains at least one data item pj with missing values in one or more dimensions dk (attributes); otherwise, it is complete.
Definition 2 Dominance: Given two data items pi and pj ∈ D with d dimensions, pi dominates pj (the greater is better), denoted by pi ≻ pj, if and only if the following condition holds: $\forall d_k \in d,\ p_i.d_k \geq p_j.d_k \ \wedge\ \exists d_l \in d,\ p_i.d_l > p_j.d_l$.
Definition 3 Skyline Queries: Select a data item pi from the database D if and only if pi is as good as pj (where i ≠ j) in all dimensions (attributes) and strictly better than pj in at least one dimension (attribute). We use Sskyline to denote the set of skyline data items, $S_{skyline} = \{p_i \mid \forall p_i, p_j \in D,\ p_i \succ p_j\}$.
Definition 4 Comparable: Two data items ai and aj are comparable if and only if they have no missing values in at least one identical dimension; otherwise, ai is incomparable to aj.
Definition 5 Cloud Database: Given a set of databases D(DB1, DB2, ..., DBn), where DBi is a database denoted by DBi(R1, R2, ..., Rn) and each R represents a database relation belonging to DBi, D is a cloud database if the databases are deployed over different datacenters located on different sites.
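Definitions 2 and 4 can be illustrated with a minimal Python sketch. It assumes that a missing value is represented by None and that dominance is evaluated only over the dimensions in which both items have values, which is the usual convention for incomplete data; the function names and the three sample items are hypothetical and chosen only to show how missing values can produce cyclic dominance.

```python
from typing import Optional, Sequence

Item = Sequence[Optional[float]]  # None marks a missing value (assumption)


def comparable(p: Item, q: Item) -> bool:
    """True if p and q both have values in at least one common dimension."""
    return any(a is not None and b is not None for a, b in zip(p, q))


def dominates(p: Item, q: Item) -> bool:
    """Greater is better; only dimensions present in both items are compared."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not common:
        return False  # incomparable items never dominate each other
    return all(a >= b for a, b in common) and any(a > b for a, b in common)


# Hypothetical items: each pair shares exactly one non-missing dimension.
p1 = (5, None, 1)
p2 = (3, 4, None)
p3 = (None, 2, 3)

print(dominates(p1, p2))  # compared on the first dimension only
print(dominates(p2, p3))  # compared on the second dimension only
print(dominates(p3, p1))  # compared on the third dimension only
```

All three calls return True, so p1 dominates p2, p2 dominates p3, and p3 dominates p1. This is the cyclic dominance that arises once transitivity is lost.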

METHODOLOGY

The Proposed Incomplete-data Cloud Skylines Approach
In this section, the detailed steps of the proposed approach, Incomplete-data Cloud Skylines (ICS), for processing skyline queries in incomplete cloud databases are presented and explained. The proposed approach focuses on processing skyline queries with the intention of decreasing the number of pairwise comparisons and the amount of data transferred during skyline evaluation. To achieve this aim, we attempt to ensure that the dominated data items residing in different datacenters are eliminated before the skyline technique is applied. This helps to avoid many unnecessary pairwise comparisons between data items while holding the transitivity property and avoiding the issue of cyclic dominance. The phases of the proposed approach for processing skyline queries in an incomplete database in the cloud are shown in Figure 2.

Figure 2. The phases of the ICS approach
The steps for determining the skylines of each relation are sorting the data and constructing arrays, filtering the data, identifying local skylines, and retrieving local skylines. The skylines of each relation are then combined to identify the candidate skylines. Finally, further comparisons are performed on the combined data items to derive the final global skylines. These phases are explained in detail below.

Identifying the Skylines of Each Relation in All Datacenters
The first phase identifies the skylines of each relation separately, where the relations are located at different datacenters, with the aim of discarding all dominated data items from the join operation. As a result, only the most promising candidate data items are propagated to the next phases. Filtering in this way avoids joining dominated data items, which eliminates unnecessary pairwise comparisons between data items and significantly reduces the amount of data transfer. The detailed processes of this phase are elaborated in the following subsections.

Sorting data and constructing array
This step analyzes the initial incomplete database relation and sorts the data items on their non-missing dimensions in non-ascending order. A set of arrays is then constructed, and the ids of the sorted data items are stored in these connected arrays. The number of arrays constructed depends mainly on the number of dimensions with non-missing values. This step helps to reduce the search space, which in turn decreases the number of pairwise comparisons between data items in the subsequent phases.
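The sorting and array construction step might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a relation is modelled as a dictionary from item id to a tuple of values, None marks a missing value, one array is built per dimension, and items whose value is missing in a dimension are simply left out of that dimension's array. The name build_sorted_arrays and the sample relation are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Relation = Dict[str, Tuple[Optional[float], ...]]  # id -> values, None = missing


def build_sorted_arrays(relation: Relation) -> List[List[str]]:
    """Return one array of item ids per dimension, sorted non-ascending by value."""
    if not relation:
        return []
    num_dims = len(next(iter(relation.values())))
    arrays: List[List[str]] = []
    for k in range(num_dims):
        # Only items with a value in dimension k take part in this array.
        present = [i for i, values in relation.items() if values[k] is not None]
        if not present:
            continue  # assumption: a fully missing dimension yields no array
        present.sort(key=lambda i: relation[i][k], reverse=True)
        arrays.append(present)
    return arrays


# Hypothetical relation with three dimensions (greater is better).
relation = {
    "a": (5, None, 3),
    "b": (4, 4, None),
    "c": (None, 2, 1),
    "d": (1, 1, 2),
}
arrays = build_sorted_arrays(relation)
print(arrays)
```

Storing only ids keeps the arrays small, and the item values can still be looked up in the relation when a domination test is needed.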

Filtering data
This step is one of the most significant in deriving the local skylines of each relation involved. It is responsible for eliminating dominated data items before the skyline technique is applied. This is achieved by scanning the data items of each array sequentially in a round-robin fashion. The scanning process ends when all data items have been read at least once. Some data items may be read more than once, so a counter is kept to record the number of times each data item is read. The counter is used to sort the data items by their count values in decreasing order. The data items with the highest counts have a higher potential to be in the skyline set. The counter also helps to eliminate a large number of dominated data items. The output of this process is a list of data items with their corresponding count values.
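A possible reading of the round-robin scan is sketched below. The text does not specify whether the stopping condition is checked after every read or after every full round, so the sketch assumes it is checked after every read; the function name round_robin_counts and the input arrays are hypothetical.

```python
from collections import Counter
from typing import List


def round_robin_counts(arrays: List[List[str]]) -> Counter:
    """Scan the arrays in round-robin order and count how often each id is read."""
    all_ids = {item_id for arr in arrays for item_id in arr}
    counts: Counter = Counter()
    longest = max((len(arr) for arr in arrays), default=0)
    for depth in range(longest):
        for arr in arrays:
            if depth >= len(arr):
                continue  # this array is already exhausted
            counts[arr[depth]] += 1
            # Assumption: stop as soon as every id has been read at least once.
            if len(counts) == len(all_ids):
                return counts
    return counts


# Hypothetical per-dimension arrays of ids, each sorted in non-ascending order.
arrays = [["a", "b", "d"], ["b", "c", "d"], ["a", "d", "c"]]
counts = round_robin_counts(arrays)
print(counts.most_common())  # ids ordered by decreasing count
```

Items that rank near the top of several arrays are read repeatedly before the scan stops, which is why a high count suggests a strong skyline candidate.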

Identifying local skyline
In this step, the data items that have no potential to be part of the skyline are eliminated before the skyline technique is applied. This reduces unnecessary pairwise comparisons and makes the proposed approach more efficient. The elimination is performed by removing from the list all data items with a count of less than two. The remaining data items are stored in the candidate set for further processing.
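The pruning rule can be written in a few lines. The sketch assumes the counts are available as a dictionary from item id to count; select_candidates and the sample counts are hypothetical names and values used only for illustration.

```python
from typing import Dict, List


def select_candidates(counts: Dict[str, int], min_count: int = 2) -> List[str]:
    """Keep ids read at least min_count times, ordered by decreasing count."""
    kept = [item_id for item_id, c in counts.items() if c >= min_count]
    return sorted(kept, key=lambda item_id: counts[item_id], reverse=True)


# Hypothetical counts produced by the filtering step.
counts = {"a": 2, "b": 2, "c": 1, "d": 1}
candidates = select_candidates(counts)
print(candidates)
```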

Retrieving local skyline
This step applies the skyline technique to the data items in the candidate set. The aim is to find the local skylines of each relation separately, where the relations are stored in different datacenters at distant locations. This process is conducted in parallel on all datacenters, which reduces the amount of data to be transferred from one datacenter to another for the evaluation of the final skylines. The process starts by reading the first data item in the candidate set and comparing it with the remaining data items. The data item that has been read is called the processing data item p, while the data item compared with p is called the candidate data item q. During the comparison, if p dominates q, then q is immediately eliminated from the candidate set. If neither p dominates q nor q dominates p, then q remains in the candidate set for further processing. However, if q dominates p, then p is not removed immediately; rather, it remains until the end of the iteration. This is because p may still eliminate other data items, which helps to sustain the transitivity property and resolves the issue of cyclic dominance. The process continues until all remaining data items have been processed. It should be noted that no two data items are compared more than once. We argue that this process is effective in avoiding many unnecessary pairwise comparisons between data items. The output of this step is the set of local skylines of each relation, which are joined to form the final skylines.
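The comparison loop described above could be sketched as follows. The sketch assumes that dominance is tested only on dimensions where both items have values (None marks a missing value) and that a dominated processing item p is dropped only after the whole scan has finished; retrieve_local_skyline and the sample candidate set are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Item = Tuple[Optional[float], ...]  # None marks a missing value (assumption)


def dominates(p: Item, q: Item) -> bool:
    """Greater is better; compare only dimensions present in both items."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return bool(common) and all(a >= b for a, b in common) and any(a > b for a, b in common)


def retrieve_local_skyline(candidates: Dict[str, Item]) -> List[str]:
    """Return the ids that survive the pairwise scan of the candidate set."""
    alive = list(candidates)      # processing order = insertion order
    deferred = set()              # dominated p items, removed only at the end
    idx = 0
    while idx < len(alive):
        p = alive[idx]
        kept = alive[: idx + 1]
        for q in alive[idx + 1:]:
            if dominates(candidates[p], candidates[q]):
                continue          # q is eliminated immediately
            if dominates(candidates[q], candidates[p]):
                deferred.add(p)   # p stays so it can still eliminate others
            kept.append(q)
        alive = kept
        idx += 1
    return [item_id for item_id in alive if item_id not in deferred]


# Hypothetical candidate set; b is read first and is dominated by a and c.
candidates = {"b": (4, 4, None), "a": (5, None, 3), "c": (6, None, 2)}
print(retrieve_local_skyline(candidates))
```

Each processing item is compared only with the items that follow it in the list, so every pair is tested at most once, in line with the description above.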

Joining Skylines of all Relations
This phase combines the identified local skylines of all relations into one relation at one datacenter. The output of this phase is a set of data items with a high potential to be in the skyline set. Thus, many unnecessary pairwise comparisons are avoided, and only a limited number of data items are propagated to the next phase.

Identifying Global Skylines
This is the last phase of our proposed approach for processing skyline queries in a database with incomplete data in a cloud environment. It determines the final skyline set, which contains those data items that are not dominated by any other data item in all the relations involved. The sub-phases of the first phase (identifying the skylines of each relation) are performed on the joined local skylines. If a joined data item is not dominated by the other data items in the candidate skyline set, it is retrieved as part of the final skyline. Otherwise, it is removed from the candidate skyline set. In this process, we guarantee that the final skylines are the skylines of the relations in all cloud datacenters and that no other data item dominates the identified final skylines.
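The three phases can be put together in a compact sketch. To stay short, it replaces the sorting, filtering, and deferred-removal steps with a naive skyline function that keeps every item not dominated by another item, so it shows the data flow between datacenters rather than the optimizations of ICS, and it does not attempt to verify the guarantee stated above. The datacenter names, the items, and the horizontal partitioning are hypothetical assumptions.

```python
from typing import Dict, Optional, Tuple

Item = Tuple[Optional[float], ...]  # None marks a missing value (assumption)
Relation = Dict[str, Item]


def dominates(p: Item, q: Item) -> bool:
    """Greater is better; compare only dimensions present in both items."""
    common = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return bool(common) and all(a >= b for a, b in common) and any(a > b for a, b in common)


def skyline(items: Relation) -> Relation:
    """Naive stand-in for the skyline step: keep items no other item dominates."""
    return {
        i: p
        for i, p in items.items()
        if not any(dominates(q, p) for j, q in items.items() if j != i)
    }


# Hypothetical horizontally partitioned relation spread over three datacenters.
datacenters: Dict[str, Relation] = {
    "dc1": {"a": (5, None, 3), "b": (4, 4, None), "d": (1, 1, 2)},
    "dc2": {"e": (6, None, 2), "f": (2, 3, None)},
    "dc3": {"g": (None, 5, 1), "h": (3, None, 1)},
}

# Phase 1: each datacenter computes its local skyline independently.
local = {dc: skyline(rel) for dc, rel in datacenters.items()}

# Phase 2: only the local skylines are sent to the query datacenter and joined.
joined: Relation = {}
for rel in local.values():
    joined.update(rel)

# Phase 3: the skyline step is applied again to the joined candidates.
final = skyline(joined)

total = sum(len(rel) for rel in datacenters.values())
print(sorted(final), f"transferred {len(joined)} of {total} items")
```

In this small example only four of the seven items leave their datacenters, which illustrates how shipping local skylines instead of whole relations reduces the data transfer.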

EXPERIMENT SETTINGS
Various experiments have been performed on synthetic and real datasets to evaluate the performance of the proposed approach, ICS. The ICS approach has been compared with the most recent works: Incoskyline (Alwan et al., 2016) and Sort-based Incomplete Data Skyline (SIDS) (Bharuka & Kumar, 2013). Since the skyline technique is CPU intensive and requires exhaustive pairwise comparisons between data items, this work concentrates on measuring the efficiency of the proposed approach with respect to the number of pairwise comparisons and the amount of data transferred between datacenters. These are considered the most influential parameters in processing skyline queries (Alwan et al., 2016; Bharuka & Kumar, 2013; Khalefa et al., 2008; Soliman et al., 2010). The number of pairwise comparisons has been computed with respect to the number of dimensions and the database size. These two metrics are measured by varying the number of dimensions, the number of dimensions with missing values, and the database size. In our experiments, we assumed that the database is fragmented vertically into three database relations situated on three different datacenters and that the user has submitted the query to datacenter 1. Two kinds of datasets have been used in the experiments, namely a synthetic dataset (correlated) and real datasets (NBA and MovieLens).
Table 1 summarizes the parameter settings for the synthetic and real datasets. From the table, we notice that the first real dataset, MovieLens, has a total of 4 dimensions, of which 3 have missing values. Its size ranges between 400 KB and 2000 KB, and its database tables are spread over three different remote datacenters. Similarly, for the NBA real dataset, the total number of dimensions varies between 6 and 18, and the number of dimensions with incomplete data varies between 5 and 17. Its size varies between 40 KB and 200 KB, and the number of datacenters is 3. Lastly, for the correlated synthetic dataset, the number of dimensions varies between 4 and 12, and the number of dimensions with missing values varies between 3 and 11. Its size is in the range of 100 KB to 500 KB, and the total number of datacenters is 3.

RESULTS AND DISCUSSION
This section presents and discusses the results of the experiments conducted on the synthetic and real datasets. The experiments have been designed to study the impact of the dataset size and the number of dimensions on the number of pairwise comparisons and the processing time of each approach. Because of limited resources and the high cost of constructing a real cloud environment with physical datacenters and other necessary tools, the proposed approach has been tested in a simulated cloud environment. We represent the cloud environment using database tables distributed over several remote locations, and the proposed approach is run simultaneously on these tables to identify the local skylines of each database table. The local skylines of each datacenter are then propagated to the next phase using the join operator. Lastly, the final skylines of all the database tables involved are retrieved. Extensive experiments have been conducted, and the results are discussed below.

Dataset Size
The experiments reported in this section examine the impact of the database size on processing skyline queries. For this set of experiments, the database size is varied while the number of dimensions is fixed. Figure 3 illustrates the results obtained from the real and synthetic datasets. Figure 3(a) shows the number of pairwise comparisons obtained for the correlated dataset, where the number of dimensions is 8 and the size of the database varies from 100 KB to 500 KB. Figure 3(b) depicts the experiment results for the NBA dataset, where the number of dimensions is fixed at 18 and the dataset size varies from 40 KB to 200 KB. Figure 3(c) presents the experiment results for the MovieLens dataset, where the database size varies from 400 KB to 2000 KB and the number of dimensions is 8. From the results, it is observed that our strategy outperforms Incoskyline and SIDS and that the database size has no significant impact on the performance of our proposed approach. This is due to the data filtration process and the local skyline identifier, which help to reduce the number of pairwise comparisons.

Number of Dimensions
In this set of experiments, we investigate the impact of the number of dimensions of the database on the performance of the skyline process. The size of the dataset is fixed while the number of dimensions is varied. Figure 4 depicts the results obtained for both the real and the synthetic datasets. Figure 4(a) illustrates the experiment results for the real dataset, NBA, where the number of dimensions varies from 4 to 18 while the dataset size is set to 120 KB. Figure 4(b) shows the experiment results for the synthetic dataset (correlated), in which the number of dimensions varies from 4 to 12 and the dataset size is fixed at 200 KB. It can be concluded that our approach requires a lower number of pairwise comparisons and consistently outperforms the SIDS and Incoskyline techniques. It is also noticed that increasing the number of dimensions affects the skyline process in general, leading to a larger number of pairwise comparisons when identifying the skylines. Nevertheless, this increase in the number of dimensions has no significant impact on our proposed approach, for which the number of pairwise comparisons increases only marginally.

Data Transfer
This set of experiments examines the impact of the dataset size on the amount of data transferred among datacenters during the skyline operation. The amount of data transfer is the total amount of data items that need to be transferred across the cloud datacenters to evaluate the skylines, and it influences the performance of the skyline query process in a cloud environment. Figures 5(a), 5(b), and 5(c) depict the results of the proposed approach on the synthetic (correlated), MovieLens, and NBA datasets, respectively. From the results, we can observe that applying the skyline technique at each datacenter separately before transferring the data items is beneficial and greatly reduces the amount of data transferred. The amount of data transfer is considered a critical factor for query processing in distributed and cloud environments (Alwan et al., 2017), because the less data there is to transfer, the faster the skyline process is. Hence, transferring only the local skylines to the datacenter where the query was submitted is far better than transferring all the data from each datacenter. The experiment results show that between 95% and 98% of the data was saved from being transferred, which in turn reduces the network cost.

CONCLUSIONS
In this paper, the issue of processing skyline queries in incomplete cloud databases has been discussed. A new skyline approach called ICS has been proposed to process skyline queries in incomplete cloud databases. The detailed steps of each phase of ICS have been explained. We also described how the proposed approach derives the final skylines of the database in a cloud environment, and showed the significance of the sorting and filtering techniques and how they boost the skyline process.
Experiments over different types of datasets have been conducted to measure the performance of the proposed approach. The results showed that our approach significantly outperformed the previous techniques (SIDS and Incoskyline) in processing skyline queries in incomplete cloud databases, requiring fewer pairwise comparisons and a smaller amount of data to be transferred from one datacenter to another. The results also show that dataset size and number of dimensions have an insignificant impact on our proposed approach, and that the local skyline joiner has a great impact on reducing the amount of data transferred from one datacenter to another.

Figure 1. Example of Skyline Query

Figure 2. The phases of the ICS approach

Figure 3. Database Size Effect

Figure 4(b) describes the experiment results of the synthetic dataset (correlated), in which the number of dimensions varies from 4 to 12 and the dataset size is fixed at 200KB. It can be concluded that our approach requires a lower number of pairwise comparisons and steadily outperforms the SIDS and Incoskyline techniques. It is also noticed that increasing the number of dimensions has a reasonable impact on the skyline process, leading to a larger number of pairwise comparisons in identifying the skylines. Nevertheless, this increment in the number of dimensions has no significant impact on our proposed approach, and the number of pairwise comparisons is only marginally increased.
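To make the pairwise-comparison metric concrete, the following Python sketch counts dominance tests for a naive skyline scan and for a scan that first sorts the items so that likely dominators come first. It is only an illustration under assumptions: the data are complete, the sort key (sum of attribute values, in the style of sort-filter-skyline) is a stand-in and not the sorting technique of ICS, and all names and data are hypothetical.

```python
from typing import List, Sequence, Tuple

Item = Tuple[float, ...]


def dominates(p: Item, q: Item) -> bool:
    """p dominates q (greater is better): >= in every dimension, > in at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def naive_skyline(items: Sequence[Item]) -> Tuple[List[Item], int]:
    """Compare each item against all others until a dominator is found; count the tests."""
    comparisons = 0
    result = []
    for i, p in enumerate(items):
        dominated = False
        for j, q in enumerate(items):
            if i == j:
                continue
            comparisons += 1
            if dominates(q, p):
                dominated = True
                break
        if not dominated:
            result.append(p)
    return result, comparisons


def presorted_skyline(items: Sequence[Item]) -> Tuple[List[Item], int]:
    """Sort by descending attribute sum, then test each item only against the skyline so far."""
    comparisons = 0
    result: List[Item] = []
    # A dominator always has a strictly larger sum, so it is visited before the item it dominates.
    for p in sorted(items, key=sum, reverse=True):
        dominated = False
        for s in result:
            comparisons += 1
            if dominates(s, p):
                dominated = True
                break
        if not dominated:
            result.append(p)
    return result, comparisons


# Hypothetical two-dimensional data set.
items = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (1, 6), (6, 1)]
naive_result, naive_count = naive_skyline(items)
sorted_result, sorted_count = presorted_skyline(items)
print(naive_count, sorted_count)
```

Both functions return the same skyline; the presorted variant needs fewer dominance tests on this data because dominated items meet a dominator early.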

Figure 4. Number of Dimensions Effect

Figure 5. Amount of Data Transfer

A relation of the database D is denoted by $R(d_1, d_2, \ldots, d_m)$, where R has m-arity and $d = (d_1, d_2, \ldots, d_m)$ is the set of dimensions.

Definition 1 Incomplete Database: Given a database $D(R_1, R_2, \ldots, R_n)$, where $R_i$ is a relation denoted by $R_i(d_1, d_2, \ldots, d_m)$, D is said to be incomplete if and only if it contains at least one data item with missing values in one or more dimensions $d_k$.

Definition 2 Dominance: Given two data items $p_i$ and $p_j \in D$, a database with d dimensions, $p_i$ dominates $p_j$ (the greater is better), denoted by $p_i \succ p_j$, if and only if the following condition holds: $\forall d_k \in d,\ p_i.d_k \geq p_j.d_k \ \wedge\ \exists d_l \in d,\ p_i.d_l > p_j.d_l$.

Definition 3 Skyline Queries: Select a data item $p_i$ from the database D if and only if $p_i$ is as good as $p_j$ (where $i \neq j$) in all dimensions (attributes) and strictly better than $p_j$ in at least one dimension (attribute). We use $S_{skyline}$ to denote the set of skyline data items, $S_{skyline} = \{p_i \mid \forall p_i, p_j \in D,\ p_i \succ p_j\}$.

Definition 4 Comparable: Let the data items $a_i$ and $a_j \in R$. $a_i$ and $a_j$ are comparable (denoted by $a_i \mathrel{\varepsilon} a_j$) if and only if they have no missing values in at least one identical dimension; otherwise $a_i$ is incomparable to $a_j$ (denoted by $a_i \mathrel{\not\varepsilon} a_j$).

Definition 5 Cloud Database: Given a set of databases $D(DB_1, DB_2, \ldots, DB_n)$, where $DB_i$ is a database denoted by $DB(R_1, R_2, \ldots, R_n)$ and R represents a database relation belonging to $DB_i$, D is a cloud database if the databases are deployed over different datacenters located on different sites.

METHODOLOGY
The Proposed Approach (ICS)
In this section, the detailed steps of the proposed approach, ICS, for processing skyline queries in cloud incomplete databases are presented and explained. The proposed approach focuses on processing skyline queries with the intention of decreasing the number of pairwise comparisons and the amount of data transferred during skyline evaluation. To achieve this aim, we attempt to ensure that the dominated data items residing in different datacenters are eliminated before applying the skyline technique. This will help to avoid many unnecessary pairwise comparisons between data items while holding the transitivity property.
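The comparability and dominance tests from the definitions can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: missing values are represented by None, and dominance is evaluated only over the dimensions where both items have values, which is an assumption borrowed from common practice in incomplete-data skyline work. The names shared_dims, comparable, and dominates are hypothetical.

```python
from typing import List, Optional, Sequence

Item = Sequence[Optional[float]]


def shared_dims(a: Item, b: Item) -> List[int]:
    """Indices of the dimensions in which neither item has a missing value."""
    return [k for k, (x, y) in enumerate(zip(a, b)) if x is not None and y is not None]


def comparable(a: Item, b: Item) -> bool:
    """Two items are comparable if they share at least one dimension with known values."""
    return len(shared_dims(a, b)) > 0


def dominates(p: Item, q: Item) -> bool:
    """Assumed incomplete-data dominance (greater is better), restricted to shared dimensions."""
    dims = shared_dims(p, q)
    if not dims:
        return False  # incomparable items never dominate each other
    return all(p[k] >= q[k] for k in dims) and any(p[k] > q[k] for k in dims)


# Hypothetical three-dimensional items; None marks a missing value.
a = (5, None, 3)
b = (4, 2, None)
c = (None, 1, None)
print(dominates(a, b), dominates(b, c), comparable(a, c))
```

In this example a dominates b and b dominates c, yet a and c share no dimension and are incomparable, which shows why transitivity cannot be taken for granted with incomplete data.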