{"id":1151,"date":"2015-04-29T09:12:56","date_gmt":"2015-04-29T16:12:56","guid":{"rendered":"https:\/\/blogs.ubc.ca\/coetoolbox\/?p=1151"},"modified":"2015-05-22T16:28:35","modified_gmt":"2015-05-22T23:28:35","slug":"amazon-foods-reviews-activity-by-tank-brigade","status":"publish","type":"post","link":"https:\/\/blogs.ubc.ca\/coetoolbox\/2015\/04\/29\/amazon-foods-reviews-activity-by-tank-brigade\/","title":{"rendered":"Amazon Foods Reviews Activity &#8211; by Tank Brigade"},"content":{"rendered":"<p><a href=\"https:\/\/blogs.ubc.ca\/coetoolbox\/?p=1151\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" src=\"https:\/\/lh6.googleusercontent.com\/-TQ29SdR5G3E\/VUAbTN4KtII\/AAAAAAAANHk\/-miHBWyIkYw\/w1430-h770-no\/Score2.png\" alt=\"\" width=\"235\" height=\"129\" \/><\/a>Following our previous post on <a href=\"https:\/\/blogs.ubc.ca\/coetoolbox\/2015\/04\/27\/the-text-mining-tm-package-in-r\/\">how to create a word cloud in R<\/a>, we have decided to try those techniques out. By inspecting the data, we observed that some words seemed to appear more often in reviews with higher scores, while some others were more likely to appear in reviews with lower scores. Our initial hypothesis is that some words are positively correlated with good reviews and some words are negatively correlated with good reviews. <!--more--><\/p>\n<p>The tools we used include: Excel, R, SQL, and Tableau. 
We aggregated our data by review scores.\u00a0 A good\u00a0review had a score of 4 or 5, an okay review had a score of 3, and a bad\u00a0review had a score of 1 or 2.\u00a0 Unfortunately, due to time constraints we were unable to analyze the entire dataset.\u00a0 A random sample of 100,000 reviews was used to generate the following word clouds.\u00a0 This involved a heavy amount of work transforming and cleaning\u00a0the review text.<\/p>\n<p>Our findings are summarized by the following data visualizations.<\/p>\n<p>&nbsp;<\/p>\n<p><a href=\"https:\/\/lh6.googleusercontent.com\/-GVXCSffe_A4\/VUAbWRBrWPI\/AAAAAAAANIE\/VpBJXcNdZ7Q\/w467-h896-no\/Score1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/lh6.googleusercontent.com\/-GVXCSffe_A4\/VUAbWRBrWPI\/AAAAAAAANIE\/VpBJXcNdZ7Q\/w467-h896-no\/Score1.png\" alt=\"\" width=\"467\" height=\"896\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Words like &#8216;taste&#8217;, &#8216;coffee&#8217;, and &#8216;good&#8217; appear in all word clouds because they are high-frequency words in the sample dataset.\u00a0 We then computed, for each word, the probability of it appearing in each review score category and picked the top 15 words for both good and bad reviews.<\/p>\n<p>For instance, the word &#8216;paid&#8217; appears in 95 reviews, all of which had a score under 2.\u00a0 So the probability of the word being in a BAD review was 100%.\u00a0 Our interesting findings are summarized in the infographic below.<\/p>\n<p><a href=\"https:\/\/lh6.googleusercontent.com\/-TQ29SdR5G3E\/VUAbTN4KtII\/AAAAAAAANHk\/-miHBWyIkYw\/w1430-h770-no\/Score2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone\" src=\"https:\/\/lh6.googleusercontent.com\/-TQ29SdR5G3E\/VUAbTN4KtII\/AAAAAAAANHk\/-miHBWyIkYw\/w1430-h770-no\/Score2.png\" alt=\"\" width=\"1430\" height=\"770\" \/><\/a><\/p>\n<p>Stay tuned for more interesting insights from the Tank Brigade in future blog 
posts.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Following our previous post on how to create a word cloud in R, we have decided to try those techniques out. By inspecting the data, we observed that some words seemed to appear more often in reviews with higher scores, while some others were more likely to appear in reviews with lower scores. Our initial [&hellip;]<\/p>\n","protected":false},"author":27014,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1054137],"tags":[7967,534813,1054147],"class_list":["post-1151","post","type-post","status-publish","format-standard","hentry","category-text-analytics","tag-r","tag-tableau","tag-text-analytics"],"_links":{"self":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1151","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/users\/27014"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/comments?post=1151"}],"version-history":[{"count":10,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1151\/revisions"}],"predecessor-version":[{"id":1173,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/posts\/1151\/revisions\/1173"}],"wp:attachment":[{"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/media?parent=1151"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/categories?post=1151"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.ubc.ca\/coetoolbox\/wp-json\/wp\/v2\/tags?post=1151"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}