[email protected] @IanOzsvald
BrightonPython June 2013
Scikit-learn (learn1.py)
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
train_set = [u"The Daily Apple...", …]
target = np.array([1, ...])
vectorizer = CountVectorizer(ngram_range=(1, 1))
train_set_dense = vectorizer.fit_transform(train_set).toarray()
vectorizer.get_feature_names()
'00', '01gzw6l7h8', '2nite', '40gb', 'applenews',
'co', # no 't'
'jam', 'sauce', 'iphone', 'mac', …
'would', 'wouldn', … 'ya', 'yay', 'yaaaay', …
# hashtags? http:// @users?
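The questions above (why 'co' shows up without its 't', and what happens to hashtags, URLs and @users) come down to tokenization. As a rough sketch using only the standard library: CountVectorizer lowercases text by default and extracts tokens with the regular expression `(?u)\b\w\w+\b`, so single-character tokens and punctuation such as '#' and '@' are dropped. The tweet text and `TWEET_TOKEN_PATTERN` below are made up for illustration and are not from the talk.

```python
import re

# Default token_pattern of scikit-learn's CountVectorizer: runs of two or
# more word characters, so single-character tokens are discarded.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

# Hypothetical alternative (an assumption, not from the slides) that keeps
# '#hashtags', '@users' and whole URLs as single tokens.
TWEET_TOKEN_PATTERN = r"(?u)https?://\S+|[#@]?\w+"


def tokenize(text, pattern):
    """Lowercase the text and return all non-overlapping pattern matches."""
    return re.findall(pattern, text.lower())


tweet = "Yay #apple jam from @sue http://t.co/abc"  # made-up example tweet

default_tokens = tokenize(tweet, DEFAULT_TOKEN_PATTERN)
tweet_tokens = tokenize(tweet, TWEET_TOKEN_PATTERN)

print(default_tokens)
# ['yay', 'apple', 'jam', 'from', 'sue', 'http', 'co', 'abc']
print(tweet_tokens)
# ['yay', '#apple', 'jam', 'from', '@sue', 'http://t.co/abc']
```

With the default pattern the URL falls apart into 'http', 'co' and 'abc', and the lone 't' is lost. If that matters for the classifier, a custom pattern along these lines could be passed as `CountVectorizer(token_pattern=...)`; whether it helps would need to be checked on the real data.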