Slide 36
Solution 2: Draconian text preprocessing
1. Strip punctuation, tags, etc.
2. Convert to ASCII, lowercase
3. Lemmatize
4. Discard 70% of comments, by count of tokens.
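
A minimal sketch of the four steps above, using only the Python standard library. Several details are assumptions, not taken from the slide: `toy_lemmatize` is a hypothetical suffix-stripping stand-in for a real lemmatizer, punctuation is stripped after lowercasing for convenience rather than first, and step 4 is read as "keep the 30% of comments with the most tokens", which the slide does not spell out.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")            # HTML/XML-style tags
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # anything left that is not a letter, digit, or space


def toy_lemmatize(token):
    """Hypothetical placeholder for a real lemmatizer: crude suffix stripping."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(comment):
    """Steps 1-3: strip tags and punctuation, fold to lowercase ASCII, lemmatize."""
    text = html.unescape(TAG_RE.sub(" ", comment))
    # Decompose accented characters, then drop whatever is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = NON_ALNUM_RE.sub(" ", text.lower())
    return [toy_lemmatize(tok) for tok in text.split()]


def keep_longest(token_lists, keep_fraction=0.3):
    """Step 4 (assumed reading): keep only the comments with the most tokens."""
    n_keep = max(1, round(len(token_lists) * keep_fraction))
    return sorted(token_lists, key=len, reverse=True)[:n_keep]
```

In use, every comment would go through `preprocess` first, and `keep_longest` would then be applied to the resulting token lists, so that the 70% cut is made on post-cleaning token counts.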
Harsh preprocessing is necessary for our
model to give good results. Is this a failing?