Building a Language Identifier

Use some classic machine learning techniques to build a document classifier for language identification.

http://langue.herokuapp.com

Zac Stewart

January 03, 2014

Transcript

  1. • ઍཬ೭ߦ﹐࢝ԙ଍Լɻ
    • Беда́ (никогда́) не прихо́дит одна́.
    • A buen entendedor, pocas palabras bastan
  2. n-grams “On ne peut désirer ce qu'on ne connaît pas.”

    Unigrams: on, ne, peut, désirer, ce, qu, on, ne, connaît, pas
    Bigrams: on ne, ne peut, peut désirer, désirer ce, ce qu, qu on, on ne, ne connaît, connaît pas
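The split shown on slide 2 can be reproduced with a few lines of Python. This is a minimal sketch, assuming tokens are lowercased runs of word characters (so the apostrophe in "qu'on" separates two tokens, as on the slide); the function names `tokenize` and `ngrams` are illustrative and do not come from the talk.

```python
import re
import unicodedata


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    # NFC keeps accented letters as single code points so \w+ matches them whole.
    normalized = unicodedata.normalize("NFC", text.lower())
    return re.findall(r"\w+", normalized)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "On ne peut désirer ce qu'on ne connaît pas."
tokens = tokenize(sentence)
unigrams = ngrams(tokens, 1)  # single words
bigrams = ngrams(tokens, 2)   # pairs of adjacent words
print(unigrams)
print(bigrams)
```

A sentence of 10 tokens yields 10 unigrams and 9 bigrams, which matches the two lists on the slide.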
  3. Term-count matrix (one row per document, one column per term):

    the   pourquoi   a   ⁋࢐   antidisestablishmentarianism   …
    10    0          7   0    2                              …
    0     8          9   0    0                              …
    0     0          0   1    0                              …
    6     0          3   0    0                              …
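A count matrix like the one on slide 3 can be built from raw text in a few steps: tokenize each document, collect the vocabulary, then count each term per document. The sketch below assumes whitespace tokenization and a tiny hypothetical corpus; neither the corpus nor the helper name `count_matrix` comes from the talk.

```python
from collections import Counter


def count_matrix(documents):
    """Build a sorted vocabulary and one row of term counts per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # missing terms count as 0
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical two-document corpus, not the data behind the slide.
docs = ["the cat saw the dog", "pourquoi a pourquoi"]
vocabulary, matrix = count_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Most cells are zero because each document uses only a small part of the full vocabulary, and the effect grows as more languages are added.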
  4. Original shape: M × N

    value: 4, 0, …, 0, 5, 0, …, 0, 3, 0, …, 6, 0, …
  5. Original shape: M × N

    index   value
    1       4
    2       0
    …       …
    666     0
    667     5
    668     0
    …       …
    986     0
    987     3
    989     0
    …       …
    1037    6
    1038    0
    …       …
  6. Original shape: M × N

    Index   Value
    1       4
    667     5
    987     3
    1037    6
    1408    10
    2867    2
    5680    1
    7896    1
    11763   4
    15879   9
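Slides 4 to 6 move from a flattened matrix of mostly zeros to a short list of (index, value) pairs plus the original shape. The following sketch illustrates that idea in plain Python, assuming row-major flattening and 1-based indices as on the slides; the helpers `to_sparse` and `to_dense` are hypothetical, and a real project would more likely rely on a sparse-matrix library.

```python
def to_sparse(dense_rows):
    """Flatten an M x N matrix row by row and keep only the non-zero cells.

    Returns the original shape plus (index, value) pairs with 1-based indices.
    """
    n_rows = len(dense_rows)
    n_cols = len(dense_rows[0]) if dense_rows else 0
    pairs = []
    for r, row in enumerate(dense_rows):
        for c, value in enumerate(row):
            if value != 0:
                pairs.append((r * n_cols + c + 1, value))
    return (n_rows, n_cols), pairs


def to_dense(shape, pairs):
    """Rebuild the full matrix from its shape and the non-zero pairs."""
    n_rows, n_cols = shape
    flat = [0] * (n_rows * n_cols)
    for index, value in pairs:
        flat[index - 1] = value
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Small made-up matrix: 12 cells, only 3 of them non-zero.
dense = [[4, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 3]]
shape, pairs = to_sparse(dense)
print(shape, pairs)
```

Keeping the shape alongside the pairs is what makes the representation lossless: `to_dense(shape, pairs)` returns the original matrix.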
  7. Hasher

    Input terms: the, pourquoi, a, ⁋࢐, antidisestablishmentarianism
    Output vector: 0 0 2 10 7
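The hasher on slide 7 maps each term straight to a position in a fixed-length vector, so no vocabulary has to be stored. The slide does not say which hash function is used; the sketch below assumes CRC32 from the standard library purely for illustration, and the name `hashed_counts` is hypothetical.

```python
import zlib


def hashed_counts(tokens, n_buckets):
    """Count tokens into a fixed-length vector indexed by a hash of each token."""
    vector = [0] * n_buckets
    for token in tokens:
        # crc32 is stable across runs, unlike Python's salted built-in hash().
        bucket = zlib.crc32(token.encode("utf-8")) % n_buckets
        vector[bucket] += 1
    return vector


tokens = ["the", "pourquoi", "a", "the", "a", "the"]
print(hashed_counts(tokens, 8))
```

Two different terms can land in the same bucket. A larger `n_buckets` makes such collisions rarer at the cost of a longer (but still sparse) vector.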
  8. Multinomial Naive Bayes

    Work from home today?
    • Any meetings today?
    • Is it raining?
    • Am I out of coffee?
    • What’s the temperature outside?
    % yes > % no
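The yes/no comparison on slide 8 carries over to language identification: each language gets a score built from a prior and the per-feature likelihoods, and the highest score wins. Below is a minimal from-scratch sketch of Multinomial Naive Bayes, assuming add-one (Laplace) smoothing and character bigrams as features. The slide specifies neither choice, and the training sentences are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.term_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.term_counts[label]
            total_terms = sum(counts.values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokens:
                # Add-one smoothing keeps unseen terms from zeroing the product.
                score += math.log((counts[token] + 1) / (total_terms + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


def char_bigrams(text):
    """Split text into overlapping two-character features."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


# Hypothetical toy training data, far too small for real use.
train = [
    ("the weather is nice today", "en"),
    ("where is the coffee", "en"),
    ("pourquoi est-ce que le café", "fr"),
    ("on ne peut pas le dire", "fr"),
]
model = MultinomialNB().fit(
    [char_bigrams(text) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(char_bigrams("the coffee is here")))
print(model.predict(char_bigrams("pourquoi pas")))
```

Working with log probabilities turns the product of many small likelihoods into a sum, which avoids numeric underflow on longer documents.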