{"id":1180,"date":"2015-04-30T09:03:56","date_gmt":"2015-04-30T16:03:56","guid":{"rendered":"https:\/\/blogs.ubc.ca\/coetoolbox\/?p=1180"},"modified":"2015-05-05T13:03:15","modified_gmt":"2015-05-05T20:03:15","slug":"exploratory-analysis-with-text-mining","status":"publish","type":"post","link":"https:\/\/blogs.ubc.ca\/coetoolbox\/2015\/04\/30\/exploratory-analysis-with-text-mining\/","title":{"rendered":"Exploratory Analysis with Text Mining"},"content":{"rendered":"<p><u><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" src=\"http:\/\/s2.postimg.org\/gu469wgwp\/image.png\" alt=\"\" width=\"192\" height=\"115\" \/><\/u><\/p>\n<p>We wanted to explore whether there is any correlation between the review score and the actual comment in the Amazon Food Reviews. It is interesting to see how accurately the review scores reflect what the users actually think about the product.<\/p>\n<p>We first used SQL to extract the \u201creview_text\u201d for each review score. Then, using Python, we removed symbols and counted the frequency of each word. We looked through the high-frequency words and chose some meaningful ones to investigate further in SQL. With SQL, we counted the instances of those keywords in the \u201creview_text\u201d. We then exported the results from SQL to Excel to combine similar words\/categories.<\/p>\n<p><!--more-->Our study shows a couple of interesting outcomes, summarized as follows.<\/p>\n<p>Review Scores: We initially expected more instances of low review scores because we thought that people are more likely to leave a comment when they have had a negative experience, but surprisingly we found that the number of reviews with a score of 5 was almost 7 times that of the other reviews.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/s13.postimg.org\/y44pshilj\/image.png\" alt=\"\" \/><\/p>\n<p style=\"font-size: 1rem\">Emotional Analysis: We aggregated all of the positive emotions together and the negative emotions together. 
Examples of positive emotions are Good, Perfect, Impressive, Delicious, and Love, while examples of negative emotions are Bad, Yuck, Disgusting, Hate, and Horrible. We found that positive emotions increased as the review score increased and vice versa, which is exactly what we expected.<\/p>\n<p><img decoding=\"async\" class=\" aligncenter\" src=\"http:\/\/s30.postimg.org\/j0zb61iz5\/image.png\" alt=\"\" \/><img decoding=\"async\" src=\"http:\/\/s1.postimg.org\/cgrjqadwf\/image.png\" alt=\"\" \/><\/p>\n<p><span style=\"line-height: 1.714285714;font-size: 1rem\">We then looked at the ratio of the frequency of positive emotions to the frequency of negative emotions. We found that the ratio increased as the review score increased.<\/span><\/p>\n<p style=\"font-size: 1rem\"><img decoding=\"async\" src=\"http:\/\/s16.postimg.org\/k7kcutfgl\/image.png\" alt=\"\" \/><\/p>\n<p style=\"font-size: 1rem\">\u201cAmazon\u201d Frequency: It is interesting to note that the frequency of \u201cAmazon\u201d in the comments was high at the two ends of the spectrum and low in the middle. This may be because people with a negative experience want Amazon to look into the product or blame Amazon, while people with a positive experience may have enjoyed the whole \u201cAmazon\u201d experience.<\/p>\n<p style=\"font-size: 1rem\"><img decoding=\"async\" src=\"http:\/\/s14.postimg.org\/t8b57tqf5\/image.png\" alt=\"\" \/><\/p>\n<p style=\"font-size: 1rem\">Customers who warn others not to buy the product: We found that as the score decreases, more people warn others not to purchase the product, which is exactly what we would expect. 
Interestingly, there were instances of people warning others not to purchase the product in reviews with a score of 4 or 5.<\/p>\n<p style=\"font-size: 1rem\"><img decoding=\"async\" src=\"http:\/\/s2.postimg.org\/gu469wgwp\/image.png\" alt=\"\" \/><\/p>\n<p style=\"font-size: 1rem\">Customers who highly recommend the product: We found that as the score increases, a higher percentage of the reviews containing the word \u201crecommend\u201d say \u201chighly recommend\u201d. Interestingly, a high percentage of lower-scored reviews also contained the phrase \u201chighly recommend\u201d. This may be because those reviewers highly recommend that others avoid the product.<\/p>\n<p><img decoding=\"async\" style=\"line-height: 1.714285714;font-size: 1rem\" src=\"http:\/\/s2.postimg.org\/gszadlleh\/image.png\" alt=\"\" \/><\/p>\n<p>In addition, the top keyword search results suggest that customers consider different things when giving different review scores.<\/p>\n<p><strong><u>Score = 5.0<\/u><\/strong><\/p>\n<p><img decoding=\"async\" src=\"http:\/\/s2.postimg.org\/f00z30pll\/image.png\" alt=\"\" \/><\/p>\n<p><strong><u>Score = 1.0<\/u><\/strong><\/p>\n<p><img decoding=\"async\" src=\"http:\/\/s29.postimg.org\/cv59kr3if\/image.png\" alt=\"\" \/><\/p>\n<p>Comparing these two graphs, we can see that price, Amazon, and health are considered more when giving high scores, while children are considered more when giving low scores.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We wanted to explore whether there is any correlation between the review score and the actual comment in the Amazon Food Reviews. It is interesting to see how accurately the review scores reflect what the users actually think about the product. We first used SQL to extract the \u201creview_text\u201d for each review score. 
Then, [&hellip;]<\/p>\n","protected":false},"author":26972,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1054137],"tags":[],"class_list":["post-1180","post","type-post","status-publish","format-standard","hentry","category-text-analytics"],"_links":{"self":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1180","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/users\/26972"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/comments?post=1180"}],"version-history":[{"count":9,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1180\/revisions"}],"predecessor-version":[{"id":1192,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1180\/revisions\/1192"}],"wp:attachment":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/media?parent=1180"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/categories?post=1180"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/tags?post=1180"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}