Our analysis began by using SQL to count the reviews for each review score from 1 to 5 and to compute the proportion of reviews falling into each category. Below is a summary of the number of comments for each rating. The total number of comments in the system is 568,454.
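The counting step can be sketched in Python with the standard-library sqlite3 module. This is a minimal sketch, not the query the authors ran: it assumes a hypothetical table reviews with an integer column score, and the handful of sample rows is made up purely to exercise the query.

```python
import sqlite3


def score_distribution(conn):
    """Return (score, n_reviews, proportion) rows, one per review score."""
    # Hypothetical schema: reviews(id INTEGER PRIMARY KEY, score INTEGER)
    query = """
        SELECT score,
               COUNT(*) AS n_reviews,
               COUNT(*) * 1.0 / (SELECT COUNT(*) FROM reviews) AS proportion
        FROM reviews
        GROUP BY score
        ORDER BY score
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, score INTEGER)")
    # Made-up sample rows; the real dataset has 568,454 reviews.
    conn.executemany("INSERT INTO reviews (score) VALUES (?)",
                     [(5,), (5,), (4,), (1,), (5,), (3,), (2,), (5,)])
    for score, n, p in score_distribution(conn):
        print(f"score {score}: {n} reviews ({p:.1%})")
```

Multiplying by 1.0 forces floating-point division, so the proportion is not truncated to zero by integer arithmetic.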
Amazon Foods Reviews Activity – by Tank Brigade
Following our previous post on how to create a word cloud in R, we decided to try those techniques out. By inspecting the data, we observed that some words seemed to appear more often in reviews with higher scores, while others were more likely to appear in reviews with lower scores. Our initial hypothesis is that some words are positively correlated with good reviews and others are negatively correlated with them.
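One way to test this hypothesis is to measure, for each word, the correlation between "the word appears in a review" and "the review is good". The Python sketch below computes the phi coefficient (the Pearson correlation of two binary variables) from a 2×2 contingency table. It is a swapped-in technique, not the approach used in the post, and the 4-star threshold for a "good" review, the min_count cutoff and the simple tokenizer are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case and split on non-letters; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z']+", text.lower()))


def word_correlations(reviews, good_threshold=4, min_count=2):
    """Phi coefficient between 'word appears in review' and 'review is good'.

    reviews: list of (text, score) pairs; a review is good if score >= good_threshold.
    Returns {word: phi}; positive values lean towards good reviews.
    """
    n = len(reviews)
    docs = [(tokenize(text), score >= good_threshold) for text, score in reviews]
    n_good = sum(good for _, good in docs)
    word_total = Counter()  # reviews containing the word
    word_good = Counter()   # good reviews containing the word
    for words, good in docs:
        for w in words:
            word_total[w] += 1
            if good:
                word_good[w] += 1
    result = {}
    for w, n_w in word_total.items():
        if n_w < min_count:
            continue  # skip rare words, whose phi is unreliable
        # 2x2 contingency table: word present/absent vs good/not good
        a = word_good[w]      # present and good
        b = n_w - a           # present and not good
        c = n_good - a        # absent and good
        d = n - n_w - c       # absent and not good
        denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
        if denom == 0:
            continue  # undefined when a row or column of the table is empty
        result[w] = (a * d - b * c) / denom
    return result
```

Words with phi near +1 lean strongly towards good reviews, words near -1 towards poor ones, and words near 0 (such as "and") carry little signal. With real data, a min_count well above 2 keeps rare words from producing extreme but unreliable values.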
Word Play
Our analysis began by getting a feel for the data from the Web data: Amazon Fine Foods reviews dataset (https://snap.stanford.edu/data/web-FineFoods.html). This meant filtering the file with a set of simple positive words to include and a set of negative and neutral words and phrases to exclude. Below is a list of the words that were included in each of these categories.
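The include/exclude filter can be sketched in Python. The post's actual word lists are not reproduced here, so POSITIVE and EXCLUDE are hypothetical placeholders; matching whole words with regular expressions is also an assumption about how the filtering was done.

```python
import re

# Hypothetical word lists; the post's real lists are not reproduced here.
POSITIVE = {"great", "love", "delicious"}
EXCLUDE = {"bad", "awful", "okay", "not great"}


def matches(text, phrase):
    """True if phrase occurs in text as whole word(s), ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def filter_reviews(texts, include=POSITIVE, exclude=EXCLUDE):
    """Keep texts that contain any include term and none of the exclude terms."""
    return [
        t for t in texts
        if any(matches(t, w) for w in include)
        and not any(matches(t, w) for w in exclude)
    ]
```

Listing a negated phrase such as "not great" in the exclusion set stops a review from passing just because it contains the positive word "great".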
The Text Mining ‘tm’ Package in R
Tank Group – Haider Shah, Tony Guo and Chris Pang
For text mining in R, there is a useful package called ‘tm’, which provides functions for text handling, processing and management. The package is built around the concept of a ‘corpus’, a collection of text documents to operate on. A corpus can be held either in memory in R as a Volatile Corpus (VCorpus) or in an external data store, such as a database, as a Permanent Corpus (PCorpus).
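The corpus idea can also be illustrated outside R. This Python sketch is only a loose analogue of an in-memory (volatile) corpus: a plain list of documents, a corpus_map helper that applies one cleaning step to every document in the spirit of tm's tm_map(), and a term count of the kind a word cloud is drawn from. None of these names belong to the tm API, and the stopword list is a tiny illustrative assumption.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not tm's stopwords()).
STOPWORDS = {"the", "a", "an", "and", "is", "to", "of", "it"}


def to_lower(doc):
    return doc.lower()


def remove_punctuation(doc):
    # Replace anything that is not a word character or whitespace with a space.
    return re.sub(r"[^\w\s]", " ", doc)


def remove_stopwords(doc):
    return " ".join(w for w in doc.split() if w not in STOPWORDS)


def corpus_map(corpus, transform):
    """Apply one transformation to every document, returning a new in-memory corpus."""
    return [transform(doc) for doc in corpus]


def term_frequencies(corpus):
    """Total count of each term across the corpus (the input to a word cloud)."""
    counts = Counter()
    for doc in corpus:
        counts.update(doc.split())
    return counts
```

In tm itself, the corresponding steps would use functions such as tm_map(), removePunctuation(), removeWords() with stopwords(), and TermDocumentMatrix().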
Example: Building a Word Cloud from Twitter Feeds
Summer Bootcamp
Welcome to the project season, padawans! You’ve learned quite a bit of theory over the last few months, and now you are ready to face some real-world OR problems. Before diving into the projects, we will spend a couple of days brushing up on some practical skills that will be useful during the project season and beyond. Until you get started on your project, you will work in teams with your office mates towards the following objectives:
Let’s talk about Text Analytics
Text analytics has been attracting increasing attention from the OR and analytics community as the amount of information stored as text grows. Just think about the amount of data stored in emails, news articles and social media, not to mention contact-center notes, surveys, feedback forms, and so forth. Some estimate that the text analytics market will grow by 25% to 40% over the next 5 years.[1][2] But let’s make sure we are talking about the same thing when we talk about text analytics.
So, what is text analytics?
Processing blood inventory simulation data with VBA
In the project Improving Blood Inventory Management through Simulation, the goal was to develop an inventory management model to explore new policies for hospitals in BC. The model simulates how blood inventory is managed in BC hospitals and is implemented in an Excel workbook that uses VBA code to run a number of trials simulating the daily operations of a network of hospitals.
One of the challenges was how to generate and handle inventory data from multiple hospitals, days and simulation replications. The approach was to structure the tool as a decision support system, with the simulation model running separately from the user interface. The input values for the blood supply and demand distributions were read from an Excel spreadsheet. Daily values were generated using the built-in Excel functions, and for the triangular distribution we wrote our own function. Global variables were used to store parameters shared across hospitals, and dynamic arrays were used to record hospital inventory levels, consolidate replication outputs and calculate network metrics.
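The triangular-distribution function and the array bookkeeping can be sketched in Python rather than VBA. This is an illustrative toy, not the project's model: triangular() uses inverse-transform sampling, and simulate() records inventory levels in a nested list indexed as inventory[replication][day][hospital]. The supply and demand parameters, the starting level and the update rule (yesterday's stock plus supply minus demand, floored at zero) are all hypothetical.

```python
import math
import random


def triangular(a, c, b, u=None):
    """Sample triangular(min=a, mode=c, max=b) by inverse-transform sampling.

    u is a uniform(0, 1) draw; pass it explicitly for reproducible results.
    """
    if not a <= c <= b or a == b:
        raise ValueError("need a <= c <= b and a < b")
    if u is None:
        u = random.random()
    f_c = (c - a) / (b - a)  # CDF value at the mode
    if u < f_c:
        return a + math.sqrt(u * (b - a) * (c - a))   # left (rising) side
    return b - math.sqrt((1 - u) * (b - a) * (b - c))  # right (falling) side


def simulate(n_reps, n_days, n_hospitals, supply=(5, 10, 15), demand=(4, 9, 16),
             start_level=20, seed=0):
    """Toy network simulation; returns inventory[rep][day][hospital] levels.

    supply and demand are hypothetical (min, mode, max) triples.
    """
    rng = random.Random(seed)
    inventory = []
    for _ in range(n_reps):
        levels = [start_level] * n_hospitals
        rep_days = []
        for _ in range(n_days):
            for h in range(n_hospitals):
                units_in = triangular(*supply, u=rng.random())
                units_out = triangular(*demand, u=rng.random())
                levels[h] = max(0.0, levels[h] + units_in - units_out)
            rep_days.append(list(levels))  # copy, so later days don't overwrite it
        inventory.append(rep_days)
    return inventory


def mean_network_inventory(inventory):
    """Average total network inventory per day across replications."""
    n_reps = len(inventory)
    n_days = len(inventory[0])
    return [sum(sum(inventory[r][d]) for r in range(n_reps)) / n_reps
            for d in range(n_days)]
```

Python's standard library already provides random.triangular(low, high, mode); writing the inverse CDF explicitly mirrors the custom function described for the VBA model and makes each draw reproducible from a given uniform number.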
Text mining with SQL to find bill shock calls
In Predicting Bill Shocks at Telus Mobility, the goal was to develop a model to predict unexpectedly high bills (bill shocks) based on customer profile and usage patterns, allowing Telus to identify customers who are likely to have a bill shock and determine their main characteristics. Although a large amount of data related to bill shocks was available, such as customer bills, bill shock credits and customers’ call records, there was no way of knowing whether a given bill had caused a bill shock.
Solution: a SQL query was created to read through call memos looking for bill-shock-related keywords using the LIKE comparison operator. Examples of such keywords are: “shock”, “roaming charg”, “changed his plan”, “went over her data”, or French phrases like “pas couvert” (“not covered”), referring to features not covered. Also, by looking at the memo types individually, we noticed that certain memos should not be flagged as bill shock because of the context of the call. These memo types were filtered out using a NOT IN condition. The query skimmed through the call memos and flagged the matching entries as bill shock calls.
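The keyword search can be sketched with Python's sqlite3 module. The table call_memos, its columns (memo_id, memo_type, memo_text) and the excluded memo types are hypothetical; the keywords are the examples quoted in the post. Each keyword becomes a LIKE pattern wrapped in % wildcards, and the exclusions become a parameterized NOT IN list.

```python
import sqlite3

# Keywords quoted in the post; a real list would be longer.
KEYWORDS = ["shock", "roaming charg", "changed his plan",
            "went over her data", "pas couvert"]
# Hypothetical memo types whose context rules out a bill shock.
EXCLUDED_MEMO_TYPES = ["PORT_IN", "NEW_ACTIVATION"]


def flag_bill_shock_calls(conn, keywords=KEYWORDS, excluded=EXCLUDED_MEMO_TYPES):
    """Return ids of memos matching any keyword whose memo type is not excluded."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # One "LOWER(memo_text) LIKE ?" test per keyword, OR-ed together.
    where = "(" + " OR ".join("LOWER(memo_text) LIKE ?" for _ in keywords) + ")"
    params = [f"%{k.lower()}%" for k in keywords]
    if excluded:
        placeholders = ", ".join(["?"] * len(excluded))
        where += f" AND memo_type NOT IN ({placeholders})"
        params += list(excluded)
    query = f"SELECT memo_id FROM call_memos WHERE {where} ORDER BY memo_id"
    return [row[0] for row in conn.execute(query, params)]
```

Short stems such as “roaming charg” work because the surrounding % wildcards also match “roaming charges” and “roaming charged”. Memos with a NULL memo_type are dropped by NOT IN, since the comparison evaluates to NULL.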
Forecasting sports match results
Here’s an interesting forecasting exercise: Microsoft folks are doing quite well at predicting the results of sports matches. See “After crushing the World Cup, Microsoft is predicting the NFL’s whole season”.
The COE Tool Box is a knowledge repository of tools and technologies used to develop OR applications. We started with the tools covered in the course BAMS 580D; however, the goal is to extend it to other tools through collaboration and knowledge sharing.
A large part of the toolbox is dedicated to programming in Visual Basic for Applications (VBA) and using it in conjunction with Excel, since Excel is the most commonly used spreadsheet application and many enterprise applications use spreadsheets as a starting or ending point.
We have also included the structure and activities of the BAMS 580D course to support both students taking the course and self-learners. We hope this webpage becomes a useful resource for the COE community: students, instructors and practitioners alike.