I applied the Naïve Bayes classifier described previously to the Amazon food review data. The results were encouraging but very slow to come by: the algorithm took about 19 hours to run for the first set of results below and 43 hours for the second, even though both runs used only 5,000 rows. The benefit of this approach over the trial-and-error method of checking specific words and their counts across rating levels is that the algorithm detects words with predictive power for you; the presence of every word is considered, rather than just the words we can come up with from a bit of surface-level digging.
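To make the "every word is considered" point concrete, here is a minimal sketch of a multinomial Naïve Bayes classifier over word counts. It is not the code behind the results in this post: the function names `train` and `predict`, the add-one (Laplace) smoothing, the whitespace tokenizer, and the four toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (text, label) pairs. Returns the fitted model pieces."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of documents
    vocab = set()
    for text, label in docs:
        words = text.lower().split()    # naive whitespace tokenizer (assumption)
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        # Start from the log prior of the label
        score = math.log(label_counts[label] / n_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen in training
            # Add-one (Laplace) smoothing so unseen word/label pairs are not zero
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy reviews with star ratings, not the Amazon data
toy_reviews = [
    ("delicious and fresh great taste", 5),
    ("great snack my kids love it", 5),
    ("stale and awful waste of money", 1),
    ("awful taste arrived stale", 1),
]
model = train(toy_reviews)
print(predict(model, "great fresh taste"))
print(predict(model, "stale awful"))
```

Every word in the training text contributes to the score, so nobody has to decide in advance which words are worth checking. Working in log space and keeping the counts in dictionaries, as above, is one common way to keep this kind of calculation cheap.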
Applying Naive Bayes to Text Mining