Exploratory Analysis with Text Mining

We wanted to explore whether there is any correlation between the review score and the text of the comment in the Amazon Food Reviews data. It is interesting to see how accurately the review scores reflect what users actually think about the product.

We first used SQL to extract the “review_text” for each review score. Then, using Python, we removed symbols and counted the frequency of each word. We looked through the high-frequency words and chose some meaningful ones to investigate further in SQL. With SQL, we counted the instances of those key words in the “review_text”. Finally, we exported the SQL results to Excel to combine similar words and categories.
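A minimal sketch of the Python step, assuming the review texts for one score have already been pulled from SQL into a list of strings. The function name, the sample reviews, and the exact cleaning rule (lowercasing, keeping only letters and apostrophes) are illustrative assumptions, not the original script.

```python
import re
from collections import Counter

def word_frequencies(reviews):
    """Count how often each word appears across a list of review texts."""
    counts = Counter()
    for text in reviews:
        # Assumed cleaning rule: lowercase, then replace anything that is not
        # a letter, an apostrophe, or whitespace with a space.
        cleaned = re.sub(r"[^a-z'\s]", " ", text.lower())
        counts.update(cleaned.split())
    return counts

# Hypothetical sample standing in for the "review_text" values of one score.
sample_reviews = [
    "Delicious! I love this tea, and the price is good.",
    "Good price, good taste. Highly recommend!",
]

freq = word_frequencies(sample_reviews)
print(freq.most_common(3))
```

The most frequent words from a run like this are the candidates one would scan by hand to pick the meaningful key words.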

Our study produced a couple of interesting outcomes, summarized as follows.

Review Scores: We initially expected more low review scores, because we thought people are more likely to leave a comment after a negative experience. Surprisingly, we found that the number of reviews with a score of 5 was almost 7 times that of the other reviews.

Emotional Analysis: We aggregated all of the positive emotion words together and all of the negative emotion words together. Examples of positive emotions are Good, Perfect, Impressive, Delicious, and Love, while examples of negative emotions are Bad, Yuck, Disgusting, Hate, and Horrible. We found that positive emotions became more frequent as the review score increased and negative emotions became more frequent as it decreased, which is exactly what we expected.

We then looked at the ratio of the frequency of positive emotions to the frequency of negative emotions. We found that this ratio increased as the review score increased.
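The ratio can be sketched in Python as below. This is only an illustration under stated assumptions: the two word sets contain just the example words listed above (the full lists used in the study are not given here), words are matched exactly after lowercasing, and the function name and sample rows are hypothetical.

```python
import re
from collections import defaultdict

# Only the example words from the text; the complete lists are an unknown.
POSITIVE = {"good", "perfect", "impressive", "delicious", "love"}
NEGATIVE = {"bad", "yuck", "disgusting", "hate", "horrible"}

def emotion_ratio_by_score(rows):
    """Return {score: positive_count / negative_count} for (score, text) rows."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for score, text in rows:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in POSITIVE:
                pos[score] += 1
            elif word in NEGATIVE:
                neg[score] += 1
    ratios = {}
    for score in sorted(set(pos) | set(neg)):
        # Leave the ratio undefined (None) when no negative word was seen.
        ratios[score] = pos[score] / neg[score] if neg[score] else None
    return ratios

# Hypothetical (score, review_text) rows.
rows = [
    (1, "Horrible. I hate it, bad taste, though the box looked good."),
    (5, "Love it! Delicious and perfect, not bad at all."),
]
ratios = emotion_ratio_by_score(rows)
print(ratios)
```

Note that simple word counting treats "not bad" as a negative mention, so a ratio built this way is a rough signal rather than a precise sentiment measure.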

“Amazon” Frequency: It is interesting to note that the frequency of “Amazon” in the comments was high at the two ends of the score range and low in the middle. This may be because people with a negative experience want Amazon to look into the product, or blame Amazon, while people with a positive experience may have enjoyed the whole “Amazon” experience.

Customers who warn others to not buy the product: We found that as the score decreases, more people warn others not to purchase the product, which is exactly what we would expect. Interestingly, there were also instances of people warning others not to purchase the product in reviews with a score of 4 or 5.

Customers who highly recommend the product: We found that as the score increases, a higher percentage of the reviews containing the word “recommend” used it in the phrase “highly recommend”. Interestingly, a high percentage of the lower-scored reviews containing “recommend” also used “highly recommend”. This may be because those reviewers highly recommend that others avoid the product.
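As a rough illustration of this percentage, the sketch below computes, for each score, the share of reviews mentioning “recommend” that also contain “highly recommend”. The counting in the study was done in SQL; this Python version, its function name, the substring matching rule, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict

def highly_recommend_share(rows):
    """For each score, return the fraction of reviews mentioning "recommend"
    that also contain the phrase "highly recommend"."""
    mentions = defaultdict(int)
    highly = defaultdict(int)
    for score, text in rows:
        lowered = text.lower()
        # Substring match, so "recommended" and "recommends" also count.
        if "recommend" in lowered:
            mentions[score] += 1
            if "highly recommend" in lowered:
                highly[score] += 1
    # Scores with no "recommend" mention are omitted to avoid dividing by zero.
    return {score: highly[score] / mentions[score] for score in sorted(mentions)}

# Hypothetical (score, review_text) rows.
rows = [
    (5, "Highly recommended for tea lovers."),
    (5, "I would recommend it."),
    (1, "I highly recommend you avoid this."),
    (3, "It is okay."),
]
shares = highly_recommend_share(rows)
print(shares)
```

The score 1 row shows why a low-scored review can still match: the phrase is used to recommend avoiding the product.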

In addition, the top key word search results suggest that customers consider different things when giving different review scores.

Figure: top key words, Score = 5.0

Figure: top key words, Score = 1.0

Comparing these two graphs, we can see that price, Amazon, and health are mentioned more when customers give high scores, while children are mentioned more when they give low scores.

COE Toolbox

Useful tools of the operations research trade
