Slide 10
Decisions, Decisions…
• What is a term?
• Unigrams vs. n-grams
• Stems, lemmas, parts of speech, named entities, etc…
• What is a document?
• Books, chapters, pages, paragraphs, sentences, etc…
• What is the measure relating my rows to columns?
• Raw counts, TF-IDF, some other index, etc…
• Skip-grams, count of document co-occurrence, etc…
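To make these choices concrete, here is a minimal sketch of how two of them (unigrams vs. bigrams as terms, raw counts vs. TF-IDF as the measure) change the resulting term-document matrix. It is an illustration only: the three toy documents, the whitespace tokenizer, the helper names (`ngrams`, `term_document_matrix`, `tf_idf`), and the unsmoothed IDF formula log(N / df) are all assumptions made for this example, not anything prescribed by the slides.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_document_matrix(docs, n=1):
    """Rows are terms (n-grams), columns are documents, cells are raw counts."""
    # Assumption: lowercasing and whitespace splitting stand in for real tokenization.
    counts = [Counter(ngrams(d.lower().split(), n)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

def tf_idf(matrix):
    """Reweight raw counts by log(N / df), one common unsmoothed IDF variant."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for x in row if x > 0)  # number of documents containing the term
        idf = math.log(n_docs / df)
        weighted.append([x * idf for x in row])
    return weighted

# Hypothetical toy corpus: each string is one "document".
docs = ["the cat sat", "the dog sat", "the cat ran"]

vocab, counts = term_document_matrix(docs, n=1)               # unigram terms
weights = tf_idf(counts)                                      # same rows, TF-IDF measure
bigram_vocab, bigram_counts = term_document_matrix(docs, n=2) # bigram terms
```

With this corpus, "the" appears in every document, so its TF-IDF row is all zeros, while its raw-count row is all ones. Switching `n` from 1 to 2 changes what the rows are; redefining `docs` (sentences instead of books, say) changes what the columns are.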