
text_mining_slides_20180512


Leo Lu

May 12, 2018


Transcript

  1. About me

    - Leo Lu
    - […]
    - […]
    - Build data products
    - ETL
    - Models
    - Text mining
    - Viz
    - ...

    © leoluyi, 2018
  2. For English

    - normalization
    - stemming (extracting the word stem)
    - lemmatization (restoring the dictionary form of a word)
    - POS tagging
    - ...

    Get data ➜ Tokenize ➜ Embedding ➜ Viz ➜ Model
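The English preprocessing steps listed on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the tooling used in the talk. The function names `normalize`, `tokenize`, and `naive_stem` are made up for this example, and the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's.

```python
import re


def normalize(text):
    """Lowercase the text and replace anything that is not a letter,
    digit, or apostrophe with a single space."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9']+", " ", text)
    return text.strip()


def tokenize(text):
    """Split normalized text on whitespace."""
    return normalize(text).split()


def naive_stem(token):
    """Toy stemmer (an assumption for illustration, NOT the Porter algorithm):
    strip one common suffix if at least three characters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = tokenize("The cats were chasing mice, and the dog barked!")
stems = [naive_stem(t) for t in tokens]
print(tokens)
print(stems)
```

Note that the toy stemmer turns "chasing" into "chas", which is typical of stemming: the result need not be a dictionary word. Lemmatization, by contrast, needs a vocabulary or morphological rules and is left out of this sketch.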
  3. For Chinese

    - word segmentation
    - no word segmentation
    - POS tagging
    - ...
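Chinese text has no spaces between words, so tokenizing it usually requires a segmentation step. One simple, classic approach is dictionary-based forward maximum matching, sketched below. This is only an illustration and is not claimed to be how jiebaR or any other segmenter works. The function name `forward_max_match` and the tiny vocabulary `toy_vocab` are hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary
    word that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary: "text", "mining", "text mining", "very", "interesting".
toy_vocab = {"文本", "探勘", "文本探勘", "很", "有趣"}

# Input sentence: "Text mining is very interesting."
segments = forward_max_match("文本探勘很有趣", toy_vocab)
print(segments)
```

Skipping segmentation altogether corresponds to treating every character as its own token, which is what this function does when the vocabulary is empty.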
  4. R tools

    - stringr
    - jiebaR
  5. Embedding

    In a nutshell, Word Embedding turns text into numbers.

    - Embedding Layer [1]
    - Word2Vec
    - GloVe
    - doc2vec
    - sense2vec

    [1] https://machinelearningmastery.com/what-are-word-embeddings/
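To make "turns text into numbers" concrete, here is a minimal sketch that represents each word by the counts of its neighboring words and compares words with cosine similarity. This is a plain co-occurrence count model, deliberately much simpler than Word2Vec or GloVe, and it is not taken from the talk. The function names, the window size, and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector (a Counter) of the words that
    appear within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][sentence[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, already tokenized toy corpus.
corpus = [
    ["i", "drink", "coffee", "daily"],
    ["i", "drink", "tea", "daily"],
    ["stocks", "fell", "sharply", "today"],
]

vecs = cooccurrence_vectors(corpus)
sim_drinks = cosine(vecs["coffee"], vecs["tea"])    # identical contexts
sim_other = cosine(vecs["coffee"], vecs["stocks"])  # no shared context
print(sim_drinks, sim_other)
```

Words that occur in the same contexts end up with similar vectors. Learned embeddings follow the same intuition but compress it into dense, low-dimensional vectors.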
  6. Visualize

    - Dimension Reduction
    - t-SNE
    - PCA
    - Clustering
    - Interactive or static plots
  7. Tasks

    - Classification: text classification
    - Clustering: finding similar texts
    - Generative models: automatic text generation
  8. Summary

    1. Problem definition & specific goal: Get Curious About Text
    2. Finding Your Data
    3. Preprocessing Your Data
       - Removing stopwords, Stemming, Segmentation, ...
    4. Feature Extraction
       - Document-Term Matrix: tm, text2vec
       - Named Entity Recognition, POS tagging
       - Word embeddings: word2vec, GloVe
    5. More Text Mining Skills
       - sentiment analysis
       - topicmodels, LDAvis: LDA
    6. More Than Words - Visualizing Your Results
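As a small illustration of the preprocessing and feature-extraction steps in this summary, the sketch below removes stopwords and builds a document-term matrix with the Python standard library. It does not reproduce the API of tm or text2vec. The stopword list, the function name `document_term_matrix`, and the sample documents are assumptions made for this example.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "is", "of", "and"}


def document_term_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term,
    each cell holding the term's count in that document."""
    counts = [
        Counter(w for w in doc.lower().split() if w not in STOPWORDS)
        for doc in docs
    ]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["the cat sat", "the cat and the dog", "a dog is a dog"]
vocab, dtm = document_term_matrix(docs)
print(vocab)
for row in dtm:
    print(row)
```

Each row can then serve as a feature vector for classification, clustering, or topic models such as LDA.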