TRAFFIC FLOW PREDICTION MODEL BASED ON NEIGHBOURING ROADS USING NEURAL NETWORK AND MULTIPLE REGRESSION

Monitoring and understanding traffic congestion seems difficult due to its complex nature. This is because the occurrence of traffic congestion is dynamic and interrelated and it depends on many factors. Traffic congestion can also propagate from one road to neighbouring roads. Recent research shows that there is a spatial correlation between neighbouring roads with different traffic flow pattern on weekdays and on weekends. Previously, prediction of traffic flow propagation was based on day and time during weekdays and on weekends. Results obtained from past studies show that further investigation is needed to reduce errors using a more efficient method. We observed from previous research that similarity of traffic condition on weekdays and weekends was not taken into account in predicting traffic flow propagation. Hence, our study is to create and evaluate a new prediction model for traffic flow propagation at neighbouring roads using similarity of traffic flow pattern on weekdays and weekends to achieve more accurate results. We exploit similarity of traffic flow pattern on weekdays and weekends by adding


INTRODUCTION
Due to the increase in population and in the number of private cars, traffic congestion has become significantly worse, leading not only to economic losses but also to environmental damage (Yinan, Rui, Daqing, Shengmin, & Havlin, 2017), human stress and pollution (Petrovska & Stevanovic, 2015). Drivers need traffic information, which can affect their driving to a certain extent (including changing their driving habits, driving paths, etc.) (Zhu, Song, Zhang, & Wang, 2016) and thus lead to chain changes in the traffic flow state upstream and downstream of the relevant road section and on other road segments in the network.
In an urban traffic network, congestion occurs and spreads through the road network as traffic flow increases. Studies of traffic flow have made enormous progress based on available traffic data. Unfortunately, the spatial-temporal propagation of traffic congestion in cities is still not well understood (Yinan et al., 2017). Monitoring and understanding traffic congestion is difficult due to its complex nature. One of the complexities of traffic congestion is unpredictability: congestion sometimes occurs and sometimes does not, depending on many factors. Another complexity is that traffic congestion is dynamic and interrelated. Traffic congestion can propagate from one road to neighbouring roads (Wang, Lu, Yuan, Zhang, & Wetering, 2013; Yinan et al., 2017). Some studies model and predict traffic flow transitions. One predicts the missing state of traffic flow using a Hidden Markov Model (HMM) with Floating Car Data (Wang, Peng, Chi, Li, Yao, & Shao, 2015); another uses an HMM to model the traffic flow state transition on an urban road network with virtual machines (Zhu et al., 2016). Both studies learn the pattern of transition from one road to another in a neighbouring area. Two or more roads in a neighbouring area may have a congestion correlation if all of them were congested at the same time (Wang, Cao, Li, & Gu, 2016). In other words, two or more roads may cause congestion on downstream roads. This shows that neighbouring roads can influence traffic flow: a traffic jam or congestion on one road will affect the traffic flow on other roads in the neighbouring area.
Other studies show that a road always has a similar traffic state on the same weekday or weekend at the same time interval (Lee, Hong, Jeong, & Lee, 2014). Its adjacent roads also have a similar history of traffic conditions during weekdays or weekends (Wang et al., 2016). Lee et al. (2014) used a neural network to predict traffic flow based on the speed on neighbouring roads using all days' data. Zhou and Huang (2009) used a neural network to predict traffic flow at road intersections, also using all days' data. Investigating similar traffic conditions on adjacent roads can reveal a pattern between a road and its neighbouring roads. Our experiment shows that predicting traffic flow from neighbouring roads using data from similar traffic conditions gives better results than prediction using all days' data.

RELATED WORKS
Time series and neural network models are widely used to predict traffic flow and traffic congestion. Time series predictions are based on historical data from the same road location. The autoregressive integrated moving average (ARIMA) method (Abadi, Rajabioun, & Ioannou, 2015; Dong, Jia, Sun, Li, & Qin, 2009) and the generalized autoregressive conditional heteroskedasticity (GARCH) method (Shbier, Ku-Mahamud & Othman, 2017) are often used for time series prediction. The seasonal ARIMA model shows high performance in traffic flow forecasting (Abadi et al., 2015). The performance of the ARIMA method can be improved by adding day classification (Dong et al., 2009). Instead of ARIMA, regression can also be used for time series prediction (Dai & Yang, 2006; Ku-Mahamud, Zakaria, Katuk & Shbier, 2009). Multiple regression has also been used to predict average speed (Bagus & Azlina, 2017). Other research added fuzzy methods (Adriansyah, Gunardi, Badaruddin, & Ihsanto, 2015) to estimate real-time traffic volume (Dai & Yang, 2006), and to forecast rainfall (Othman & Azahari, 2016). Another study used feature extraction in prediction (Fitrianah, 2015). Neural networks are also commonly used for various types of prediction (Kumar, Parida, & Katiyar, 2013; Lee et al., 2014; Ow, Ngo, & Lee, 2016). Others added a neural network to a linear fuzzy method for short-term prediction on toll roads in order to design the number of sensors on the highway (Chan & Dillon, 2014). Naive Bayes has also been used to predict traffic congestion (Kim & Wang, 2016), and it can be combined with support vector regression (Ahn, 2016). Because many factors can affect the flow of traffic, multivariate prediction is used to predict traffic flow instead of time series data alone. There are also external factors that can affect traffic flow. These factors affect traffic fatalities, which in turn affect traffic flow. They include parked vehicles, the total number of licensed drivers, the number of traffic fines and the number of serious traffic accidents. Cai, Zhu and Yan (2015) used these factors to predict traffic fatalities with a multiple regression model. Weather is also an external factor in predicting traffic flow (Lee, Hong, Lee, & Jang, 2015). Other influential factors are traffic density and day of the week (Kumar et al., 2013).
Besides prediction using linear and nonlinear methods, other methods are also used to predict traffic flow. Hu, Yan and Wang (2014) predicted road congestion based on slices of crossroads using the BML (Biham, Middleton and Levine) model. Lee et al. (2014) used the similarity of traffic congestion patterns to predict short-term traffic decongestion times.
In predicting traffic flow and traffic congestion on a road, common factors such as historical correlation or time series, vehicle speed, weather, and accidents have been used. Besides historical correlation, there is a spatial correlation factor that affects traffic congestion on neighbouring roads. Previous research has shown this spatial correlation. Examples include the inter-road relationship extracted using a 3D Markov model by Ahn (2016), the visualization and highlighting of the impact of traffic on adjacent roads by Anwar, Nagel and Ratti (2014), the visualization of traffic jams showing the spread of traffic flow by Wang et al. (2013), the detection of traffic jams based on slices between intersections, indicating an inter-path relationship, by Hu et al. (2014), and the mining of congestion between road segments by Wang, Cao, Li and Gu (2016).

PROBLEMS
Unpredictability is one of the complexities of predicting traffic congestion. Many factors can affect traffic flow. Factors which can be used to predict traffic flow include vehicle speed, weather, accidents, and special days or events. In addition to these factors, the congestion level on neighbouring roads can greatly affect traffic flow. Another complexity is that traffic congestion is dynamic and interrelated. Traffic congestion can propagate from one road to neighbouring roads: a traffic jam or congestion on one road will affect the traffic flow on other roads in the neighbouring area.
The strong relationship between a road and the roads around it makes the surrounding roads strong candidates as input factors for predicting traffic flow. This type of prediction can help in predicting speed on roads that have damaged sensors or missing data. The surrounding road traffic can be used as a tool to predict traffic flow propagation. As we can see in Figure 1, road 158324 is influenced by the roads that surround it. If there is a traffic jam or congestion on the neighbouring roads, it will affect the average speed on road 158324.
Other research shows there is a difference in congestion pattern between weekdays and weekends (Lee et al., 2014) and also between regular and irregular days (Rempe, Huber, & Bogenberger, 2016), as we can see in Figure 2. Lee et al. (2014) analysed the similarity of traffic congestion patterns to predict traffic decongestion. They used historical patterns similar to the current congestion pattern to predict when congestion would end, and compared the predictions with the actual end-of-congestion times. The experimental results show that their approach of using similarity patterns to predict decongestion is reliable. However, they did not use the similarity pattern to predict traffic flow propagation.
Neural networks were used to predict traffic flow in studies by Lee et al. (2007) and Zhou and Huang (2009). Lee et al. (2007) created a neural network model to predict traffic flow which consists of 41 neurons for the input layer and 44 neurons each for the two hidden layers. The input layer is composed of day of the week (7 days), time and speed of neighbouring roads. This method reduced prediction error by up to 41.8%. On the other hand, Zhou and Huang (2009) used a neural network to predict traffic flow, but for road intersections only. They established the relationship of upstream and downstream traffic flow at two adjacent signal intersections. The effect of congestion on neighbouring roads and the similarity of traffic flow pattern were not taken into consideration. In another study, Kumar et al. (2013) applied a neural network in their prediction model, which incorporates traffic volume, speed, density, time and day of week as input variables. Their neural network structure has 19 neurons for the input layer and 6 neurons for the hidden layer. Similarity of traffic flow pattern and effects of neighbouring roads were not used as variables in their study.
Previously, we conducted prediction of traffic flow in normal conditions based on all days' data using the multiple regression method (Bagus & Azlina, 2017). We believe that a more efficient model with fewer variables can be developed to reduce errors by taking into consideration the similarity of traffic flow pattern. Based on our previous study, we found that the traffic flow propagation pattern on neighbouring roads is different on weekdays and on weekends. Prediction on weekdays using only weekdays' data gives equal or better results compared with using all days' data.

METHODS
To achieve our main research objective, which is to develop and evaluate a model to predict traffic flow propagation, we propose the methodology shown in Figure 3. The methodology comprises two main phases: the Initial Phase and the Development and Evaluation Phase. In this paper, we report findings from experiments conducted in the Initial Phase of the study. The main objective of the experiments is to determine the relationship between neighbouring roads. From the results, we developed a neural network model with time cluster and high correlation roads as input factors.
For our experiments, we used a data set from IoT traffic sensors in Aarhus, Denmark (Barnaghi, Ralf, & Jan, 2013; Bischof, Karapantelakis, Sheth, Mileo, & Barnaghi, 2014). There are 449 IoT sensors in total, and their locations are shown in Figure 4. For example, the sensor at A is labelled 190100. This sensor covers the stretch from Nørreport 93, Aarhus, Denmark to Spanien 63, Aarhus, Denmark. The distance between the two points is 1490 meters. In this experiment we used only average speed, as the representation of traffic flow, and timestamp data. Specific details of this sensor are described in Table 1 and Table 2.
To predict average speed on neighbouring roads, we need to find the roads with the highest correlation among all roads in the neighbouring area. Initially, we considered two roads to be in the same neighbouring area if the distance between them is not more than four (4) kilometers. We calculated correlation among all neighbouring roads using the formula below.
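The correlation formula itself is not reproduced in this text. As a minimal sketch, assuming the Pearson correlation coefficient was intended (an assumption that the source does not confirm), the correlation between the mean average speeds of two roads could be computed as follows. The function name and the sample speeds are hypothetical and are not taken from the Aarhus data set.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length series.

    Assumption: the paper's (missing) formula is the Pearson coefficient.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical mean average speeds of two roads over five days.
speeds_road_a = [52.0, 48.0, 31.0, 45.0, 50.0]
speeds_road_b = [60.0, 55.0, 38.0, 51.0, 58.0]
r = pearson_correlation(speeds_road_a, speeds_road_b)
```

A value of `r` close to 1 would indicate that the two roads slow down and speed up together, which is the property used here to pick input roads.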
We observed the traffic flow pattern on two roads and found that the patterns differ between weekdays and weekends. Weekdays' data is traffic data from Monday to Thursday, and weekends' data is traffic data from Friday to Sunday. For road 158324, we calculated the correlation of all neighbouring roads based on the mean value of average speed in the interval 06.00 am to 07.00 am from 13-02-2014 to 02-06-2014. We chose 06.00 am to 07.00 am because congestion occurred at this time, as seen in Figure 5. For road 193294, we calculated correlation based on the mean value of average speed in the interval 15.00 to 16.00, because congestion occurred at this time, as seen in Figure 6. The correlation results of road 158324 and its neighbouring roads are shown in Table 3 (a). The correlation results of road 193294 and its neighbouring roads are shown in Table 3 (b). The visualisation of the results is shown in Figure 7 and Figure 8. The stars in these figures represent the six highest correlation values. From Figure 8, we observed that one of the roads with a high correlation value is not connected to road 193294 if we set the distance between them as not more than four (4) km. Thus, we redefined two roads as being in the same neighbouring area if the distance between them is not more than three (3) km. With this new definition, we observed from Table 3 that roads 158386, 158595 and 158536 are less than three (3) km from road 158324 and have high correlation values (>0.5). For road 193294, the roads that are within three (3) km and have high correlation values (>0.5) are roads 193348, 193268, 193322 and 195923.
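The selection rule described above (distance of not more than 3 km and correlation above 0.5) can be sketched as a simple filter. This is only an illustration: the function name, the road identifiers and the distance and correlation values below are hypothetical and are not the values reported in Table 3.

```python
def select_input_roads(candidates, max_distance_km=3.0, min_correlation=0.5):
    """Return IDs of roads that are close enough and sufficiently correlated.

    `candidates` maps a road ID to a (distance_km, correlation) pair.
    The result is ordered from highest to lowest correlation.
    """
    selected = [
        road_id
        for road_id, (distance_km, correlation) in candidates.items()
        if distance_km <= max_distance_km and correlation > min_correlation
    ]
    return sorted(selected, key=lambda rid: candidates[rid][1], reverse=True)


# Hypothetical (distance in km, correlation) pairs for four candidate roads.
candidates = {
    "road_a": (1.2, 0.81),
    "road_b": (2.7, 0.64),
    "road_c": (3.6, 0.77),  # highly correlated but farther than 3 km
    "road_d": (0.9, 0.32),  # close but weakly correlated
}
selected_roads = select_input_roads(candidates)
```

With these made-up values only `road_a` and `road_b` pass both conditions, mirroring how a highly correlated but distant road is excluded once the neighbourhood radius is reduced from 4 km to 3 km.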

Predicting speed using back propagation neural network
In an artificial neural network, there is an important process called "training". This process involves calculating the relationships between inputs and outputs (Lee et al., 2007). A neural network learns to generate outputs from inputs and changes its weight values based on the expected value and the resulting output value. In this study, we used the back propagation method to predict the average speed of vehicles on roads 158324 and 193294 using our proposed neural network model. The training method of this algorithm is as follows:
1. Set the input value to the input layer of the neural network and calculate the output.
2. Calculate the difference between the expected value and the resulting output value.
3. Calculate the weight adjustments from that difference so as to reduce it.
4. Update the weights based on step 3.
5. Repeat steps 1-4 until the optimum is reached, that is, when the difference between the expected value and the resulting output value is very low.
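The five steps above can be sketched in plain Python for a network with a single hidden neuron. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid activation, the linear output, the squared-error loss, the learning rate, the fixed epoch count used as the stopping rule, and the toy normalised data are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Forward pass: one sigmoid hidden neuron followed by a linear output."""
    w_hidden, b_hidden, w_out, b_out = params
    h = sigmoid(sum(w * xi for w, xi in zip(w_hidden, x)) + b_hidden)
    return w_out * h + b_out


def train(samples, targets, epochs=2000, lr=0.1, seed=0):
    """Train by back propagation; returns the weights and per-epoch mean squared error."""
    rng = random.Random(seed)
    w_hidden = [rng.uniform(-0.5, 0.5) for _ in samples[0]]
    b_hidden = 0.0
    w_out, b_out = rng.uniform(-0.5, 0.5), 0.0
    history = []
    for _ in range(epochs):  # Step 5: repeat steps 1-4 (fixed epoch count assumed)
        total = 0.0
        for x, y in zip(samples, targets):
            # Step 1: feed the inputs forward and calculate the output.
            h = sigmoid(sum(w * xi for w, xi in zip(w_hidden, x)) + b_hidden)
            y_hat = w_out * h + b_out
            # Step 2: difference between the output and the expected value.
            error = y_hat - y
            total += error ** 2
            # Step 3: gradients of 0.5 * error**2 with respect to each weight.
            grad_w_out = error * h
            delta_h = error * w_out * h * (1.0 - h)
            # Step 4: update the weights against the gradients.
            w_out -= lr * grad_w_out
            b_out -= lr * error
            w_hidden = [w - lr * delta_h * xi for w, xi in zip(w_hidden, x)]
            b_hidden -= lr * delta_h
        history.append(total / len(samples))
    return (w_hidden, b_hidden, w_out, b_out), history


# Toy data: normalised speed on one neighbouring road -> normalised speed on the target road.
samples = [[0.1], [0.3], [0.5], [0.7], [0.9]]
targets = [0.35, 0.5, 0.6, 0.75, 0.9]
params, history = train(samples, targets)
```

Comparing `history[0]` with `history[-1]` shows how far the mean squared error has fallen over training, which is the quantity step 5 monitors.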
From our experiments, we identified three roads as the independent variables to predict the average speed on road 158324. They are roads 158386, 158595 and 158536. The dependent variable is road 158324 (Y). For predicting road 193294, the independent variables are roads 193348, 193268, 193322, and 195923. The dependent variable is road 193294 (Y).
Our neural network model for predicting road 158324 consists of an input layer with ten neurons. Three neurons are for high correlation roads, six neurons are for time clustering, and one neuron is for weekday or weekend, as shown in Figure 12. For predicting road 193294, our neural network model consists of an input layer with eleven neurons. Four neurons are for high correlation roads, six neurons are for time clustering, and one neuron is for weekday or weekend. We use only one hidden layer with one neuron, as shown in Figure 13.
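To make the input layout concrete, the sketch below assembles a ten-element input vector for road 158324: three neighbouring-road speeds, six time-cluster neurons and one weekday or weekend neuron. The cluster boundaries are not given in this section, so equal four-hour blocks with one-hot encoding are assumed here purely for illustration; the function names and the speed values are also hypothetical. The weekday grouping follows the Monday to Thursday definition used in this study.

```python
from datetime import datetime

N_TIME_CLUSTERS = 6  # six time-cluster neurons, as in the proposed model


def time_cluster(hour):
    """Map an hour (0-23) to one of six clusters.

    Assumption: equal 4-hour blocks; the paper's actual clusters may differ.
    """
    return hour // 4


def build_input_vector(neighbour_speeds, timestamp):
    """Concatenate neighbour speeds, a one-hot time cluster and a weekday flag."""
    one_hot = [0.0] * N_TIME_CLUSTERS
    one_hot[time_cluster(timestamp.hour)] = 1.0
    # Monday (0) to Thursday (3) are weekdays here; Friday to Sunday are weekend.
    weekday_flag = 1.0 if timestamp.weekday() <= 3 else 0.0
    return list(neighbour_speeds) + one_hot + [weekday_flag]


# Hypothetical speeds on the three high correlation roads at 06:30 on Monday 02-06-2014.
vector = build_input_vector([41.0, 38.5, 44.0], datetime(2014, 6, 2, 6, 30))
```

Passing four neighbouring-road speeds instead of three would give the eleven-element vector used for road 193294.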

RESULTS AND DISCUSSION
Results of the experiment for short-term prediction on Monday 02/06/2014 (50 minutes) on road 158324 are shown in Table 4. Actual values are compared with values obtained using our proposed NN with time cluster, NN without time cluster and multiple linear regression (MLR) for road 158324. Errors were then calculated and are displayed in Table 5. Similarly, the results of the experiment for road 193294 are shown in Table 6 and the calculated errors are displayed in Table 7. The errors were calculated using mean absolute deviation (MAD), root mean square error (RMSE), and mean absolute percentage error (MAPE). For both roads 158324 and 193294, we plotted line charts to compare traffic flow predictions using our proposed model with the other methods, as shown in Figure 14 and Figure 15. From Table 4, Table 5, Table 6 and Table 7, we observed that our proposed NN with time cluster produced better results than NN without time cluster and multiple linear regression (MLR). Thus, we can conclude that our proposed model using time cluster based on similarity of traffic flow pattern has the potential to generate more accurate results in traffic flow prediction.
Figure 16 and Figure 17 are bar charts of the results obtained from the experiments. Both figures show that prediction of traffic flow using the neural network with time cluster generally produced better results than prediction using the multiple linear regression method and the neural network without time cluster.
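For reference, the three error measures named above can be computed as in the following sketch, which uses their standard definitions. The actual and predicted speeds are made-up numbers for illustration only and are not values from Tables 4 to 7.

```python
import math


def mad(actual, predicted):
    """Mean absolute deviation between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up average speeds for four time steps.
actual = [50.0, 40.0, 60.0, 50.0]
predicted = [45.0, 44.0, 57.0, 52.0]

print(mad(actual, predicted))   # 3.5
print(rmse(actual, predicted))  # about 3.674
print(mape(actual, predicted))  # about 7.25
```

MAD and RMSE are in the same unit as the speed data, while MAPE is a percentage, which makes it easier to compare errors between roads with different typical speeds.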

CONCLUSION
The main aim of our experiments in this study was to investigate the relationship between traffic flow on one road and traffic flow on neighbouring roads. The results showed that when congestion occurs, there is a relationship between a road and its neighbouring roads. Our investigations showed that using this relationship in a neural network improved the accuracy of predicting the average speed of vehicles on neighbouring roads. Further, we observed a similarity pattern of traffic flow on weekdays and weekends. From this observation, we developed our neural network model with time cluster and high correlation roads as input factors. Using the back propagation method, we compared traffic flow predictions from our time-cluster neural network model with those from a neural network model without time cluster and from the multiple linear regression method. Our time-cluster neural network model produced better results than both the neural network without time cluster and the multiple linear regression method.
Figure 2. Traffic congestion pattern on weekdays (2013.04.30 (Tue), 2013.05.0 (Wed)) and traffic congestion pattern on weekend (2013.05.04 (Sat), 2013.05.05 (Sun)).

Figure 3. Methodology applied to develop traffic flow propagation prediction model.

Figure 4. Map of 449 IoT traffic sensors in city of Aarhus, Denmark.

Figure 7. Visualization of results for road 158324 and its neighbouring roads.

Figure 8. Visualization of results for road 193294 and its neighbouring roads.

Figure 11. Proposed neural network model with time cluster and high correlation roads.

Figure 12. Neural network with time cluster model to predict average speed on road 158324.

Figure 13. Neural network with time cluster model to predict average speed on road 193294.

Figure 14. Result of short time prediction on road 158324.

Figure 15. Result of short time prediction on road 193294.

Figure 16. Comparison of NN time cluster with multiple linear regression and NN without time cluster for road 158324.

Figure 17. Comparison of NN time cluster with multiple linear regression and NN without time cluster for road 193294.

Table 1
Example traffic data taken from sensor 190100

Table 2
Example traffic data taken from sensor 190100

Table 3
Correlation of road 158324 and road 193294 with neighbouring roads

Table 4
Results of short time prediction using NN with and without time cluster and multiple linear regressions on road 158324

Table 5
Error of short time prediction on road 158324

Table 6
Results of short time prediction using NN with and without time cluster and multiple linear regressions (MLR) on road 193294

Table 7
Error of short time prediction on road 193294