CONSISTENCY OF ONLINE CONSUMERS’ PERCEPTIONS OF POSTED COMMENTS: AN ANALYSIS OF TRIPADVISOR REVIEWS

Ratings and comments play a dominant role in online reviews. The question thus arises as to whether there is any consistency in consumers' perception of the reviews, and how future choices might be influenced. We analysed 2000 comments on 20 different hotels posted on TripAdvisor to determine whether the comments posted by previous guests of a hotel influence the decisions of potential guests. Two hundred human raters were each asked to consider 20 reviews and to rate a hotel based on those reviews. Cohen's Kappa coefficient was used to evaluate the degree of agreement between the hotel quality as determined by the human raters and the star rating given by the original reviewer. The results showed a high consistency between the human raters' evaluation and the reviewers' star rating. This research reveals the importance of feedback on websites such as TripAdvisor in influencing consumer choice.


INTRODUCTION
Social media is a rich platform for seeking out and sharing products, experiences and information (Chan, Lacka, Yee, & Lim, 2015). From a social psychology perspective, consumers tend to review online comments before making purchase decisions (Wong, 2015) or during the employment-seeking process (Sander, Teh, & Sloka, 2015). Users provide feedback through comments (Siersdorfer, Chelaru, Pedro, Altingovde, & Nejdl, 2014). The online review has become a notable way to share products worldwide and has been termed online word-of-mouth (WOM) (Zhang, Guo, & Goes, 2013). In addition, feedback tracked online contributes to opinion mining (Raut & Londhe, 2015), which can indicate where a product or service requires improvement (Leskovec, 2011). With the advancement of information technology, various tools can be adopted in the hotel industry to capture feedback, meet greater customer expectations and thus strengthen customer relationships (Hassan, Hussain, & Saibu Rahman, 2013).
There are various ways to provide feedback (Kauffman, Laf, Lin, & Chang, 2009). For example, rating or ranking a post is one of the common ways in social media. TripAdvisor (Wyndham, 2015), YouTube (Alec, 2013) and Metacritic (Dietz, 2015) use rating as their review mechanism. YouTube, for instance, uses symbols such as Thumb Up or the Like button to indicate the degree of agreement, support or helpfulness of the shared information or links. Consumers provide feedback in social media through comments and ratings for the products and services they have consumed. Whether a review offers a helpful or an unhelpful rating thus becomes an issue (Krestel & Dokoohaki, 2011). Bermingham (2014) suggested that comments are useful information for formulating a business strategy (Chan et al., 2015), facilitating marketing research on customer behaviour, and supporting the management disciplines. Evidently, user reviews and ratings are capable of influencing consumers' decision-making, which makes them worth investigating. Chen, Guo, Tseng and Yang (2011) stated that a prestigious user's comment received a higher number of followers. They also found that the quality of a comment judged editorially is almost uncorrelated with the rating it received. The research question thus arises as to whether the value expressed in the reviews is consistent with the human raters' opinion of the hotel's quality as conveyed by those reviews. Krestel and Dokoohaki (2011) postulated that some comments and ratings are inaccurate, irrelevant or inconsistent. Zhang, Guo and Goes (2013) reported the same finding, i.e. some of the reviews are inconsistent and some are irrelevant. If one could rank the reviews based on their relevance to a user's favour, it would help readers understand more about the reviewed products. Therefore, to identify whether a reader's response has any relevance to the level of a user's favour, we need to determine whether comments posted by previous guests of the hotels have any relevance and thus influence the decisions of potential guests.

RELATED WORK
TripAdvisor is a popular online travel website worldwide (Vásquez, 2011). It also assists travellers in making decisions about their hotel stay (Matos-Rodríguez, 2014). A useful comment increases the website's reputation and serves as a precious asset to the website, and the user's gratification assessment often appears in the form of ratings (Moghaddam & Martin, 2011). Moreover, Chen et al. (2011) stated that it also generates greater support in the web community. This is further corroborated when experienced customers agree on a statement or judgment about the overall quality of products and thus influence other buyers' purchase decisions (Klan & Ries, 2014).
Web communities allow comments in both object and written formats. Written comments take more time to interpret than objects. According to Samsudin, Puteh, Hamdan and Ahmad Nazri (2013), noisy text is a common phenomenon in online reviews and it affects data mining exercises. Also, comments may be irrelevant or casual (Zhang et al., 2013). Among text analysis techniques, a content analysis tool has been used to determine the positive and negative emotions in blog texts (Gill, French, Gergle, & Oberlander, 2008). According to Gill et al. (2008), a positive writer tends to use positive words frequently. Gill et al. tabulated positive and negative emotional concept words to determine the significant differences between means.
The Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) was used to test emotional intelligence (EI) (Abe, 2011). Students' performance as rated by their supervisors was associated with positive emotive words, whereas students' perceived benefits and practical courses were less associated with positive emotive words. Several studies have examined emotive words in various ways. For instance, in a cognitive study, Filippi, Ocklenburg, Bowling, Heege, Gunturkun, Newen and de Boer (2016) stated that humans typically combine linguistic and non-linguistic information to comprehend emotions. The same study discussed language evolution; however, it was not specific to emotive word processing. In developmental science, on the other hand, Li and Yu (2015) suggested that children's emotive word comprehension (EWC) develops with age. A psycholinguistic study by Sereno, Scott, Yao, Thaden, and O'Donnell (2015) showed a significant interaction between participant mood and word emotionality. However, there has been limited study of the emotive words used in online comments.
The consideration of emotive words is fundamental to the theoretical formation of affective language (Hinojosa, Rincón-Pérez, Romero-Ferreiro, Martínez-García, Villalba-García, Montoro & Pozo, 2016). However, emotive words are not the only means of expressing emotion. Sentiment analysis is moving beyond basic text mining and extracting opinions from the text alone; emoticons and other means of expressing sentiment are now part of the analysis. Researchers have revealed that emoticons are not just for fun; in fact, they serve as an additional and valuable supplement to communication methods (Huang, Yen, & Zhang, 2008). In addition, a more recent study by Teh, Rayson, Pak, Piao, and Yeng (2016) found that emoticons have a significant ability to reverse the polarity of comments. This further corroborates that processing and understanding the emotions expressed in comments is vitally important in sentiment analysis.
According to Zhang, Zhang, and Yang (2016), travellers' rating behaviour is affected by a number of expert reviews on a particular hotel, the level of the reviewers' expertise, and the expertise recognized by the website. However, in their comment analysis, Fong, Lei, and Law (2016) pointed out that a review is not necessarily consistent with its sentiment. In their study, only one hotel's reviews were analysed. To address this gap, we investigated and compared the consistency between the rating provided by readers of the comments and the original reviewers' rating for 20 hotels. In addition, this study assessed the consistency between the human raters and the online comments without revealing the original reviewers' hotel star rating to the human raters.

METHODOLOGY
In this study, we selected TripAdvisor as our main platform for aggregating the comments because of its worldwide popularity. TripAdvisor is an American travel website founded in February 2000 (TripAdvisor, 2015). It has since expanded to 45 countries, has been translated into 28 languages, has 280 million users, and has accumulated almost 170 million comments. Users create a personal profile with reviews and ratings (Matos-Rodríguez, 2014) and share their experiences with others. The TripAdvisor bubble rating is shown as a 5-point Likert scale: 1 = "Terrible", 2 = "Poor", 3 = "Average", 4 = "Very Good" and 5 = "Excellent" (TripAdvisor, 2015). Hypotheses were formulated to examine the relationship between the items, the online reviews, and the human raters, as well as their inter-relationship and the degree of agreement between the review samples and their rating by 200 human raters.

Hypothesis Development
Two hypotheses were developed to examine the consistency between the readers' positive and negative ratings and the original reviewers' rating without revealing the hotel star rating by the original reviewers.
Hypothesis 1: There is a significant relationship between the TripAdvisor positive rating and the human raters' rating.
Hypothesis 2: There is a significant relationship between the TripAdvisor negative rating and the human raters' rating.

Questionnaire Design
We recruited human raters to evaluate the sampled reviews, which were chosen on the basis of the top ten (10) most frequently used positive and negative emotive words regardless of the star rating they belonged to, in order to examine the quality level that the reviews attribute to the hotel. The aim was to identify whether the reviews have the potential to attract new or returning customers, based on the consistency between the original review ratings and the readers' opinions of quality. Two sets of questionnaires were designed for this study in order to obtain a balanced sampling of the selected hotel reviews across all rankings and word polarities. Each questionnaire comprised a set of 20 sample comments; the total number of questions across the two sets was thus 40. Of the 20 comments in each questionnaire, ten were positive in nature and ten were negative; each comment contained one of the emotive words given in Tables 1 and 2. Correspondingly, for each category of star-rated hotels, there was a total of ten comments across the two questionnaires: five positive comments and five negative comments.
Each questionnaire set consisted of two (2) sections. Section 1 comprised reviews containing positive emotive words from Table 4, with one sample for each of the listed top 10 emotive words. Similarly, Section 2 comprised reviews containing negative emotive words from the list in Table 5. Each section had 10 reviews, covering the top ten emotive words appearing in the positive and negative reviews respectively.

Sampling Techniques
TripAdvisor categorized the hotels into four different star ratings: 2-Star (lowest-rated hotels), 3-Star, 4-Star, and 5-Star (highest-rated hotels). Five "Top Traveller's choice hotels" (as defined by TripAdvisor) were selected from each star category. For each hotel selected, the 100 most recent comments were extracted (as of 30 June 2016).

Sampling Population/Selection Criteria for Respondents
The questionnaires were distributed to 200 students aged between 22 and 30 years. Table 3 shows that 93 or 88.5% of respondents reported that they sometimes, often, or always referred to reviews and booked hotels online. Approximately 88.5% responded that they often or always booked a hotel online. Ninety-five respondents (47.5%) stated that they frequently commented on the hotel-booking website, 54 (27%) indicated that they sometimes commented on the website, 39 (19.5%) admitted that they always commented on the website, and 12 (6%) said they rarely commented on the website.

Sampling-size Calculation/Justification
Two hundred (200) questionnaires were distributed. Responses from the returned questionnaires were collected and keyed into SPSS for statistical analysis. The Pearson correlation was run to determine the relationship between the two variables (Evans, 1996), and inter-rater reliability was examined to evaluate the agreement between the two classifications. Cohen's Kappa is typically interpreted on a scale from 0 (chance-level agreement) to 1.0 (perfect agreement), although negative values are possible when agreement is worse than chance (Cohen, 1960; Fleiss et al., 2003); a higher score indicates stronger agreement between the two classifications. We recorded the participants' ratings of the reviews based on their feelings and perceptions. The questionnaires were administered at a private university in Malaysia, and the average time spent answering a questionnaire was 10 minutes. Each respondent answered one of the two questionnaires, assigned at random, and was not allowed to participate twice.
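To make the agreement statistic concrete, the following is a minimal Python sketch of Cohen's Kappa for two sets of categorical ratings. It is an illustration only: the study itself used SPSS, and the function name `cohens_kappa` and the toy ratings are assumptions introduced here, not the study's data.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy ratings (not the study's data).
tripadvisor = ["Excellent", "Very Good", "Good", "Fair", "Excellent", "Good"]
human = ["Excellent", "Very Good", "Very Good", "Fair", "Excellent", "Good"]
kappa = cohens_kappa(tripadvisor, human)
```

In this toy example the two raters agree on five of six items, and the kappa is lower than the raw agreement rate because part of that agreement is expected by chance.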

Data Analysis Strategy
A total frequency of 279 emotive words was categorized by means of Wmatrix (Rayson, 2008), and the words "happy" and "happily" appeared at least once in every hotel's reviews across all rankings. When all the reviews were analysed again, the top 10 most frequently used words were recorded. Table 4 and Table 5 show the top 10 most frequently used positive and negative emotive words. Loved (0.059%), like (0.056%), disappointed (0.026%) and disappointment (0.011%) were the top two positive and top two negative emotive words, respectively. Figure 1 shows the ratings given by respondents after they read the review samples from questionnaire Set A. The circle indicates the highest rank of the positive emotive words. Eight out of the ten positive words were rated "Excellent", and only two were rated "Very Good". This result suggests that the positive words used in the comments were coherent with the readers' perception. However, the use of negative emotive words was not coherent with a negative rating of impression. For example, in Figure 2, only four comments in Set A were rated "Fair" by the respondents: NW2a (68%), NW3a (56%), NW4a (49%) and NW5a (51%). Another four were rated "Good" by a higher number of respondents, namely NW1a (60%), NW7a (57%), NW8a (75%), and NW9a (56%). These four figures sum to 248, indicating that more respondents rated the better impression. Even where negative emotive words were used, the value of the reviews therefore appeared incoherent with the rating. Reviews were extracted to examine how the negative words were used; NW5a and NW10a were extracted as examples.
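The relative frequencies quoted above can be illustrated with a short Python sketch. This is not the Wmatrix procedure used in the study: it swaps in a plain regular-expression tokenizer, and the function name `relative_frequencies` and the sample comments are hypothetical.

```python
import re
from collections import Counter


def relative_frequencies(comments, target_words):
    """Return {word: percentage of all tokens} for each target word."""
    tokens = []
    for comment in comments:
        # Simple lowercase word tokenizer (an assumption; Wmatrix tags text differently).
        tokens.extend(re.findall(r"[a-z']+", comment.lower()))
    if not tokens:
        return {word: 0.0 for word in target_words}
    counts = Counter(tokens)
    return {word: 100.0 * counts[word] / len(tokens) for word in target_words}


# Hypothetical comments (not taken from the study's data).
comments = [
    "We loved the pool and the staff.",
    "I was disappointed with the room.",
    "Loved it, would stay again.",
]
freqs = relative_frequencies(comments, ["loved", "disappointed"])
```

Because each percentage is taken over all tokens rather than over emotive words only, the values for a top-10 list will not sum to 100%, which matches the note attached to the frequency tables.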

Respondent Rating of Impression after Reading the Reviews
Overall, the words used conveyed a "very good" impression and "excellent" rating to the readers.
We repeated the investigative steps on the Set B questionnaires. Figure 3 displays a similar consistency. The top ten positive emotive words used in the reviews made a strong impression on the readers. Six questions were rated "Very Good": PW1b (48%), PW2b (53%), PW3b (54%), PW4b (48%), PW7b (50%) and PW9b (53%). The others, rated "Excellent", were PW5b (51%), PW6b (60%), PW8b (51%) and PW10b (60%). Overall, the positive emotive words consistently gave the readers a positive impression. In Set B, the negative words were again not coherent with the respondents' ratings: the comments were rated "Fair" three times, "Good" four times, "Very Good" twice and "Excellent" once. We further examined the overall sentiment value of the reviews.
Figure 5 shows the consistency in the number of positive emotive words used across the higher star-ranked hotels. The number of positive emotive words increased consistently with the star ranking; reviewers of higher-ranked hotels appeared to express themselves with more positive emotive words. However, there were only two data points, 2SB (Hotel B of the 2-star rating) and 2SC (Hotel C of the 2-star rating), at which negative words outnumbered positive words, which provided us with a comparison between these comments. The analysis revealed that more negative than positive emotive words were used for these two hotels. In the semantic tag-set for 2SB, there were 9 positive emotive actions and 11 negative emotive actions. For 2SC, there were 12 positive emotive actions and 18 negative emotive actions. In addition, the positive and negative lines moved consistently with each other.
The top ten positive words from Set1A and Set2A were correlated using Pearson's correlation. The result indicated that there is no significant relationship between Set1A and Set2A (r = .408, p > .05). The same procedure was repeated with the top ten negative words from Set1B and Set2B (r = .397, p > .05). The results showed that there are no significant relationships between the two sets of either positive or negative words. Therefore, the words are not associated and are thus used independently in the survey.
Hypothesis 1: There is a significant relationship between the TripAdvisor positive rating and the human raters' rating.
The result indicated that there is a significant positive relationship between the TripAdvisor positive rating and the human raters' rating (r = .690, p < .05). Furthermore, the result suggested that a positive rating on TripAdvisor was reflected in a similar rating by the human raters. This further emphasizes the importance of online comments, as they reflect the human raters' ratings and views of the accommodation or services used.
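The correlations reported in this section can be illustrated with a minimal Python sketch of Pearson's r and its t statistic. The study computed these in SPSS; the function names and the ten paired scores below are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length with at least 3 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical ratings on the 1-5 bubble scale (not the study's data).
tripadvisor_scores = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]
human_scores = [5, 4, 3, 3, 4, 2, 4, 5, 2, 4]
r = pearson_r(tripadvisor_scores, human_scores)
t = t_statistic(r, len(tripadvisor_scores))
```

As a rough check, if ten pairs are assumed (as in the top-ten word comparison; the sample size for the hypothesis tests is not stated), the two-tailed 5% critical value of t at 8 degrees of freedom is about 2.306. Under that assumption, r = .408 gives t of about 1.26, which is below the critical value and in line with the reported p > .05.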
Hypothesis 2: There is a significant relationship between the TripAdvisor negative rating and the human raters' rating.
The result indicated that there is a significant positive relationship between the TripAdvisor negative rating and the human raters' rating (r = .818, p < .05). A negative rating on TripAdvisor was thus associated with a similar rating by the human raters.
Cohen's Kappa was run to determine whether the two raters' judgments agreed, and thus to understand better the degree to which the two raters concurred in their judgments of Fair, Good, Very Good and Excellent. Table 7 reports the human raters' and TripAdvisor ratings; one respondent gave a Fair rating as agreed by both raters. Six (6) respondents gave a Good rating as agreed by both raters, and nine (9) respondents gave a Very Good rating as agreed by both raters. Eleven (11) respondents gave an Excellent rating as agreed by both raters. Meanwhile, there were 13 respondents (i.e. 6 + 1 + 2 + 2 + 2 = 13) on whose rating the two raters did not agree.
Cohen's Kappa (κ) was reported to be .553 (p < .05). Kappa expresses the proportion of agreement over and above chance agreement; κ = .553 represents a moderate strength of agreement. In conclusion, there was moderate agreement between the two raters' judgments, κ = .553 (p < .05).
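As a worked illustration of what this value implies, the agreement counts quoted above (1 + 6 + 9 + 11 agreements and 13 disagreements) can be placed in the standard kappa formula. This assumes that these 40 ratings make up the whole cross-tabulation; the marginal totals of Table 7 are not reproduced here, so the chance-agreement term p_e is back-calculated from the reported κ rather than taken from the table.

```latex
\begin{align*}
\kappa &= \frac{p_o - p_e}{1 - p_e}, \\
p_o &= \frac{1 + 6 + 9 + 11}{40} = \frac{27}{40} = 0.675, \\
p_e &= \frac{p_o - \kappa}{1 - \kappa} = \frac{0.675 - 0.553}{1 - 0.553} \approx 0.273 .
\end{align*}
```

Under this assumption, roughly 27% agreement would be expected by chance alone, and the observed 67.5% agreement corresponds to the reported moderate kappa.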

DISCUSSION
The findings above are important for the decision-making process and further support the importance of electronic word-of-mouth as a new form of marketing (Vermeulen & Seegers, 2008). Pre-connection and perception are established even before the visit (Brown, 2012). This is consistent with the findings of Ling, Beenen, and Ludford (2005) that individuals are inclined to contribute to an online community when they see a similarity in opinion and are then more likely to share their personal views. Online reviews can influence consumers' attitudes and thus affect their decisions about consuming services and/or products. Classification techniques have been applied to hotel reviews to suggest helpful yet contrasting reviews to end-users (O'Mahony & Smyth, 2010). Because individuals are bound by social norms, they tend to refer to online comments before making decisions. Users do not just look at hotel options; 77% read the TripAdvisor reviews before actually choosing a hotel (PhoCusWright, 2014). Online hotel reviews increase the average chance of an individual choosing a certain hotel service and promote greater awareness of lesser-known hotels (Vermeulen & Seegers, 2008). On the other hand, negative comments lead to higher conformity effects and gain higher acceptance among reviewers (Lee, Park, & Han, 2008). Thus, it is essential for business managers to employ a dedicated monitoring mechanism and take note of negative comments before they go viral and create negative publicity for their services (Dean, 2004).
This study fills the gaps in the studies conducted by Fong et al. (2016) and Dincer and Alrawadieh (2017). The former focused on one hotel and the latter focused only on 424 negative hotel reviews. Our study encompassed both positive and negative reviews from 20 hotels. Although studies on TripAdvisor are common, to our knowledge none has assessed them as extensively as the present study. Indeed, general research on online reviews and consumer behaviour is still in its infancy. Therefore, it is sensible to monitor online reviews (both in text and in emoticon format) in order to improve hotel management and customer satisfaction and to realise the full potential of effective complaint management.

CONCLUSION
Our study concluded that the online reviews and the human raters were consistent, and the result indicated a moderate level of agreement between their judgments. The human raters' evaluation and the reviewers' star rating were consistent. This corroborates the importance of comments and opinions by previous guests, regardless of the hotel's star rating. The analysis of the human raters' agreement with the original reviewers' comments indicates that the comments are a reliable and dependable source of reference prior to making a decision.
The hotel star rating does not influence the guests' opinion; even the lowest star-rated hotel is still perceived as good by the guests. Future studies should consider examining whether there is any difference in the area of satisfaction (rated as excellent) among hotels of different ratings. This could offer some advice to the hotel industry about consumer preference irrespective of the hotel star rating. This research has also validated the importance of online reviews on business websites, such as TripAdvisor, in influencing consumer decision-making.

Figures 1, 2, 3, and 4 show the respondents' rating, from poor to excellent, of the opinions gained after reading the whole sample of reviews. These reviews were selected based on the most used emotive words. Figures 1 and 2 are the results plotted from the Set A questionnaire, while Figures 3 and 4 are the results plotted from the Set B questionnaire.

Figure 1. Respondent rating after reading the sample review (Positive words: Set A questionnaire).

* Abbreviations: NW1a = comment containing the first negative emotive word for the Set A questionnaire; NW2a = comment containing the second negative emotive word for the Set A questionnaire, etc.

Figure 2. Respondent rating after reading the sample review (Negative words: Set A questionnaire).

* Abbreviations: PW1b = comment containing the first positive emotive word for the Set B questionnaire; PW2b = comment containing the second positive emotive word for the Set B questionnaire, etc.

Figure 3. Respondent rating after reading the sample review (Positive words: Set B questionnaire).

Figure 4. Respondent rating after reading the sample review (Negative words: Set B questionnaire).

Figure 5. Consistency of emotive words throughout the star rating.

Table 1
Questions Categorized for Set A Questionnaire.

Table 2
Questions Categorized for Set B Questionnaire.

Table 3
The Frequency of Referring, Booking, and Writing Comments

Table 4
Top 10 Frequently Used Positive Emotive Words

Table 5
Top 10 Frequently Used Negative Emotive Words

Columns: No.; Top 10 negative emotive words; Frequency; Relative frequency [1] (between comments of the 20 hotels)
[1] Relative frequency does not sum to 100% as it only reports the frequency of the 10 most used positive/negative emotive words.

Table 6
Correlation Table

Table 7
Cross-Tabulation of TripAdvisor Rating Versus Human Raters