ReviewMeta Analysis Test: Phrase Repetition

April 27th, 2016

One of the most obvious ways to detect suspicious reviews is to analyze the language used within the body of each review. While it’s difficult for us to draw any conclusions from the language of a single review by itself, looking at the aggregate data can help us identify which reviews might have been created unnaturally.

Our process for the Phrase Repetition test is a little more involved than other tests.  In a nutshell, we compile a list of phrases that are used across multiple reviews for a given product, then identify which reviews contain these common phrases, and finally compare their average rating to the average rating of reviews which don’t contain these common phrases.

To compile the list of repeated phrases, we start by looking for phrases of 3 or more words that appear in multiple different reviews for the same product. We also have a formula to make sure the phrase is somewhat substantial. For example, the three-word phrase “it was the” is not substantial, while “surpassed all expectancies” would be considered substantial. Our formula takes into account the phrase length, complexity and type of words being used to make sure that each phrase on the list is more than just a string of prepositions, articles and pronouns commonly used in everyday English.
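As a rough illustration of the step above, here is a minimal Python sketch that collects word sequences shared by at least two reviews and then drops the ones made up mostly of filler words. It is not our actual formula: the `STOPWORDS` list, the `is_substantial` rule (at least two non-filler words) and the names `repeated_phrases`, `min_len` and `max_len` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical filler-word list; the real formula's word categories are not shown here.
STOPWORDS = {
    "a", "an", "the", "it", "i", "you", "we", "they", "this", "that",
    "is", "was", "are", "were", "be", "of", "in", "on", "at", "to",
    "for", "with", "and", "or", "but", "all", "my", "me", "one",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_substantial(words, min_content_words=2):
    """Assumed rule: a phrase counts if it has enough non-filler words."""
    content = [w for w in words if w not in STOPWORDS]
    return len(content) >= min_content_words


def repeated_phrases(reviews, min_len=3, max_len=5, min_reviews=2):
    """Map each substantial phrase to the set of review indexes containing it."""
    seen_in = defaultdict(set)
    for idx, text in enumerate(reviews):
        words = tokenize(text)
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                seen_in[tuple(words[i:i + n])].add(idx)
    return {
        " ".join(phrase): ids
        for phrase, ids in seen_in.items()
        if len(ids) >= min_reviews and is_substantial(phrase)
    }
```

Calling `repeated_phrases` on a list of review texts for one product returns only the phrases that both repeat across reviews and pass the filler-word check, so a phrase like “it was the” is discarded even when it appears in every review.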

Once we have our list of repeated phrases, we check each review to see if (and how often) it uses these phrases. We assign each review a score, taking into account factors like word count, the number of repeated phrases found and the substantiality of those phrases. A low score indicates that few or no repetitive phrases are used in that review. Reviews that surpass a certain threshold are flagged as using repetitive phrases.
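To make the scoring idea concrete, here is a small Python sketch under stated assumptions. The score is simply the weighted number of words covered by repeated phrases divided by the review’s word count, and the `threshold` of 0.2 is an arbitrary placeholder. The function names, the `phrase_weights` mapping (phrase to a substantiality weight) and the formula itself are hypothetical, not our production scoring.

```python
import re


def score_review(text, phrase_weights):
    """Return an assumed repetition score: weighted phrase words per review word."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    # Pad with spaces so phrases only match on whole-word boundaries.
    padded = " " + " ".join(words) + " "
    total = 0.0
    for phrase, weight in phrase_weights.items():
        hits = padded.count(" " + phrase + " ")  # non-overlapping matches
        total += hits * weight * len(phrase.split())
    return total / len(words)


def is_flagged(text, phrase_weights, threshold=0.2):
    """Flag a review whose score exceeds the (placeholder) threshold."""
    return score_review(text, phrase_weights) > threshold
```

Dividing by word count means a long, detailed review that happens to contain one common phrase scores much lower than a two-line review made up almost entirely of repeated language.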

If a high number of reviews use repetitive phrases, it can be an indication that the reviews were not created naturally. However, there are still plenty of valid reasons we’d see repeated phrases that don’t necessarily mean the reviews are biased. For example, multiple reviewers may mention the same product features, which is often necessary to write a thorough review. But if several reviewers are regurgitating the same marketing language word for word, it might be a sign that these reviews were from hired guns.

In order to determine if these reviews are malicious or benign, we group all reviews with repetitive phrases and check what percentage of the product’s reviews they make up. While it’s not immediately problematic to see a small percentage of reviews with repetitive phrases, an excessive number can trigger a warning or failure. Next, we check to see if reviews with repetitive phrases have a higher average rating than reviews without repetitive phrases. If they do, we’ll check to see if this discrepancy is statistically significant. This means that we run the data through an equation that takes into account the total number of reviews along with the variance of the individual ratings and tells us if the discrepancy is more than just the result of random chance. (You can read more about our statistical significance tests here.) If the reviews with repetitive phrases have a significantly higher rating than reviews without repetitive phrases, this strongly suggests that the reviews with repetitive phrases are not benign and are unfairly inflating the overall product rating.
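The comparison described above can be sketched in Python as follows. This example uses Welch’s t statistic, which is one common way to compare two averages using group sizes and variances; it is an assumption chosen for illustration and not necessarily the equation we run. The function name `compare_groups` and the cutoff `critical_t=1.645` (a one-sided 5% level under a normal approximation, which is loose for small groups) are likewise hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def compare_groups(flagged, unflagged, critical_t=1.645):
    """Compare ratings of reviews with and without repetitive phrases."""
    if len(flagged) < 2 or len(unflagged) < 2:
        raise ValueError("need at least two ratings in each group")
    # Share of all reviews that use repetitive phrases.
    share = len(flagged) / (len(flagged) + len(unflagged))
    diff = mean(flagged) - mean(unflagged)
    # Standard error of the difference, from each group's sample variance and size.
    se = sqrt(variance(flagged) / len(flagged)
              + variance(unflagged) / len(unflagged))
    if se > 0:
        t = diff / se
    else:
        # No variation in either group: any positive gap counts as decisive.
        t = float("inf") if diff > 0 else 0.0
    return {
        "share_flagged": share,
        "rating_gap": diff,
        "t_statistic": t,
        "significant": t > critical_t,
    }
```

For instance, `compare_groups([5, 5, 5, 4, 5, 5], [3, 4, 2, 5, 3, 4, 1, 3])` reports that the flagged group is rated noticeably higher and that the gap clears the placeholder cutoff, whereas swapping the two groups does not, because the test only looks for flagged reviews rating higher.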