Saving Results of Python Classifiers to Disk

If you build Naive Bayes classifiers with a package such as NLTK, you may notice that training on a large data set can take hours. To avoid losing that work between sessions, you can save the trained classifier to a file on disk using the pickle commands listed further below.

This is useful if you ‘train’ your classifier on one day and then want to use it to ‘predict’ classifications for a new test data set on another day. You can simply reload the classifier from disk into memory without having to rebuild it. Note: these pickle files can get very large.

The examples below store a Python classifier object called ‘classifier’ into a pickle file called ‘my_classifier.pickle’.

To Save

import pickle

# Open in binary write mode; the with block closes the file even if dump fails.
with open('my_classifier.pickle', 'wb') as f:
    pickle.dump(classifier, f)
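
Since these pickle files can get large, one option is to compress the file as it is written. The following is a minimal sketch, assuming the standard-library gzip module suits your setup. The `classifier` dictionary is only a stand-in for a real trained classifier, and the file names 'example.pickle' and 'example.pickle.gz' are made up for illustration.

```python
import gzip
import os
import pickle

# Stand-in for a trained classifier (assumption: any picklable object
# is saved the same way as a real classifier would be).
classifier = {'feature_%d' % i: 0.5 for i in range(10000)}

# Save uncompressed, for comparison.
with open('example.pickle', 'wb') as f:
    pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)

# Save compressed: gzip.open returns a file-like object pickle can write to.
with gzip.open('example.pickle.gz', 'wb') as f:
    pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)

print(os.path.getsize('example.pickle'), 'bytes uncompressed')
print(os.path.getsize('example.pickle.gz'), 'bytes compressed')

# Load the compressed file back the same way, in binary read mode.
with gzip.open('example.pickle.gz', 'rb') as f:
    restored = pickle.load(f)
```

How much space this saves depends on the classifier, and compressed files take somewhat longer to write and read.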

To Load Later

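import pickle

# Open in binary read mode ('rb'); text mode fails on Python 3.
# Only load pickle files from sources you trust.
with open('my_classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)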
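
Putting the two steps together, a common pattern is to load the classifier if the pickle file already exists and to train it only if it does not. The sketch below assumes a hypothetical `train_classifier()` function standing in for your real (slow) training code. The `load_or_train` helper and the file name 'example_classifier.pickle' are likewise made up for illustration.

```python
import os
import pickle

def train_classifier():
    # Hypothetical placeholder for slow training, e.g. with NLTK.
    # Returns a plain dict so the sketch runs without extra packages.
    return {'label_counts': {'pos': 3, 'neg': 2}}

def load_or_train(path):
    # Reuse the saved classifier when the file is already on disk.
    if os.path.exists(path):
        with open(path, 'rb') as f:
            return pickle.load(f)
    # Otherwise train once and save the result for next time.
    classifier = train_classifier()
    with open(path, 'wb') as f:
        pickle.dump(classifier, f)
    return classifier

first = load_or_train('example_classifier.pickle')   # trains and saves if no file exists yet
second = load_or_train('example_classifier.pickle')  # loads from disk
```

Remember to delete the pickle file whenever the training data changes, otherwise the old classifier is loaded again.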

Brought to you by Tank Brigade.
