Welcome to the project season, padawans! You’ve learned quite a bit of theory over the last few months, and now you are ready to face some real-world OR problems. Before diving into the projects, we will spend a couple of days brushing up on some practical skills that will be useful during the project season and beyond. Until you get started with your project, you will work in teams with your office mates towards the following objectives:
- Get introduced to “text analytics”, an increasingly important topic in our field.
- Analyze text data using SQL Server, COE’s database management software.
- Get prepared to code in Python, one of the most common languages for text analytics.
Here is how we’ll do it:
Activity 1: Get introduced to “text analytics”
Read the post “Let’s talk about text analytics” and make a contribution by sharing a technique or application you find interesting (mentioned in the article or a new one). You can add a comment to the post or write a new one (one per group). Please explain clearly and include references/links where appropriate.
Activity 2: Analyze text data using SQL Server
Set up a database with food product reviews from Amazon (ask for further directions). Compute some descriptive statistics to refresh your query basics. Then try to apply some of the techniques mentioned in the post, such as stemming or word counting, to get some insights from the reviews (comparison keywords might be handy). For instance, are there words that correlate more strongly with positive or negative product reviews? Don’t get too fancy, but try to see if you can get something interesting. Put your findings in a blog post (one per group), explaining briefly how you crunched the data. Don’t forget the data visualization tips (or your post might not be published).
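If you want to sanity-check the word-counting idea outside the database, here is a minimal Python sketch. It is only an illustration, not the required approach for this activity: the sample reviews are made up, the function names are hypothetical, and the thresholds (4 stars and up counts as positive, 2 stars and down as negative) are assumptions you should adjust.

```python
import re
from collections import Counter

# Hypothetical sample rows: (star rating, review text). Replace with your own data.
SAMPLE_REVIEWS = [
    (5, "Great taste, great price. Will buy again!"),
    (4, "Tasty and fresh, great snack."),
    (1, "Stale and bland. Terrible taste."),
    (2, "Arrived stale, would not buy again."),
]


def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts_by_sentiment(reviews, positive_min=4, negative_max=2):
    """Count words separately for positive and negative reviews.

    The score thresholds are assumptions; reviews in between are ignored.
    """
    positive, negative = Counter(), Counter()
    for score, text in reviews:
        if score >= positive_min:
            positive.update(tokenize(text))
        elif score <= negative_max:
            negative.update(tokenize(text))
    return positive, negative


def distinctive_words(positive, negative, top=3):
    """Rank words by (count in positive reviews) minus (count in negative reviews)."""
    diff = {w: positive[w] - negative[w] for w in set(positive) | set(negative)}
    most_positive = sorted(diff, key=lambda w: (-diff[w], w))[:top]
    most_negative = sorted(diff, key=lambda w: (diff[w], w))[:top]
    return most_positive, most_negative


if __name__ == "__main__":
    pos, neg = word_counts_by_sentiment(SAMPLE_REVIEWS)
    most_pos, most_neg = distinctive_words(pos, neg)
    print("More common in positive reviews:", most_pos)
    print("More common in negative reviews:", most_neg)
```

A raw count difference is the crudest possible measure. On the real data you would probably want to drop very common words and normalize by the number of reviews in each group before drawing conclusions.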
Activity 3: Get prepared to code in Python
Get Python set up on your computer and do Google’s Python Class. This is a two-day tutorial and you might need to start working on your project before you finish, but try to get as far as you can. Alternatively, you can do the Python course on Codecademy, which doesn’t require installing anything and is very easy to follow. Even if you don’t end up using Python, the coding skills will come in handy sooner than you think. If you feel confident and prepared for new challenges, try to parse the original food review dataset using Python.
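For the parsing challenge, here is one possible starting point. It assumes the raw file stores each review as a block of "key: value" lines with a blank line between reviews. That layout, the field names in the sample, and the function name are assumptions for illustration, so check them against your copy of the data before relying on this.

```python
import io

# Hypothetical sample in the assumed "key: value" layout; the real file may differ.
SAMPLE = """\
review/score: 5.0
review/summary: Great snack
review/text: Tasty and fresh.

review/score: 1.0
review/summary: Disappointed
review/text: Arrived stale.
"""


def parse_reviews(lines):
    """Yield one dict per review from an iterable of text lines."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current record.
            if record:
                yield record
                record = {}
            continue
        # Split on the first ": " only, so colons inside the value are kept.
        key, sep, value = line.partition(": ")
        if sep:
            record[key] = value
        # Lines without ": " are skipped in this sketch.
    if record:
        # Emit the last record if the input does not end with a blank line.
        yield record


if __name__ == "__main__":
    for review in parse_reviews(io.StringIO(SAMPLE)):
        print(review["review/score"], review["review/summary"])
```

To read a real file, pass an open file object instead of the sample string. Because the function is a generator, it handles one review at a time rather than loading the whole file into memory.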
Good luck…

Quick Tip: until we start using the preferred text editor (which Rene plans to tell us about), I’ve found that you can right-click on a .py file (e.g. hello.py) and click “Edit with IDLE”. This opens a pretty decent text editor for Python scripts.
An alternative choice is Notepad++ (mentioned in the Google tutorial):
Download it here: http://notepad-plus-plus.org/download/v6.7.7.html (choose the zip package)
No need for installation.
Hi guys, the tools we are considering for coding in Python are:
IPython http://en.wikipedia.org/wiki/IPython
good for exploratory analysis because it integrates data visualization with code snippets and the results of the code.
Spyder http://en.wikipedia.org/wiki/Spyder_(software)
good for more complicated development because it lets you organize code into libraries.
Orange Interface Installation Instruction
Orange is a powerful data mining package for Python. Its interface is easy to use and has quite good visualization, especially for classifiers and rules.
Since installing the interface is not that straightforward, we wrote instructions for groups who want to use this software.
Instruction document: Orange Installation