Natural Language Processing with Swift

Talk given at Swift Language User Group in SF on 5 March 2015

Apple has offered an API for natural language processing since iOS 5, which allowed us to tokenize text, detect the language, and determine parts of speech. With Swift and the introduction of Playgrounds, it’s faster and more delightful than ever to experiment with linguistics. We’ll go over how to build a spam detector in Swift, starting with the basic theory and ending with a fully functional Naive Bayes classifier. Feel free to bring your laptop to code along!

Ayaka Nonaka

March 05, 2015

Transcript

  1. SPAM spam sp@M $PAM spam sp@m SP4M $p@m sp@M SPAM

    spam sp@M $PAM spam sp@m SP4M $p@m sp@M
  2. URGENT - HELP ME DISTRIBUTE MY $15 MILLION TO CHARITY

    IN SUMMARY:- I have 15,000,000.00 (fifteen million) U.S. Dollars and I want you to assist me in distributing the money to charity organizations. I agree to reward you with part of the money for your assistance, kindness and participation in this Godly project. This mail might come to you as a surprise and the temptation to ignore it as unserious could come into your mind but please consider it a divine wish and accept it with a deep sense of humility.
  3. See you at Natural Language Processing in Swift with Ayaka Nonaka of Venmo

    Swift Language User Group (San Francisco + Silicon Valley)
    Invite 1 friend
    Simply forward this email to a friend and have them join the Meetup.
  4. Forming a new startup and need an iOS developer to partner with and join me on this new, exciting venture.

    This startup will be the next “big thing” in social media, creating a new way for users to connect with one another, essentially creating its own niche among facebook, twitter and foursquare. If interested please contact the information below. XXXX XXXX [email protected] XXX XXX XXXX
  5. Probability of A & B?

    $P(A \cap B) = P(A) \times P(B \mid A)$
  6. Probability of A & B?

    $P(A \cap B) = P(A) \times P(B \mid A)$, with $P(A) = 1/4$
  7. Probability of A & B?

    $P(A \cap B) = P(A) \times P(B \mid A)$, with $P(A) = 1/4$ and $P(B \mid A) = 1/13$
  8. Probability of A & B?

    $P(A \cap B) = P(A) \times P(B \mid A) = 1/4 \times 1/13 = 1/52$
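The 1/4 and 1/13 on the slides above match drawing a suit and then a rank from a standard 52-card deck. The following is a minimal sketch under that assumption: it counts cards instead of using the formula and checks that both routes give 1/52. The choice of hearts and the king is purely illustrative and does not come from the slides.

```swift
// Assumption: a standard 52-card deck. "hearts" and "K" are hypothetical
// stand-ins for the suit and rank; they are not taken from the slides.
let suits = ["hearts", "diamonds", "clubs", "spades"]
let ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]

// Build the full deck as (suit, rank) pairs.
var deck: [(suit: String, rank: String)] = []
for suit in suits {
    for rank in ranks {
        deck.append((suit: suit, rank: rank))
    }
}

// Count the cards that match each event.
let hearts = deck.filter { $0.suit == "hearts" }
let kingOfHearts = hearts.filter { $0.rank == "K" }

// P(suit): 13 hearts out of 52 cards = 1/4
let pSuit = Double(hearts.count) / Double(deck.count)

// P(rank given suit): 1 king among the 13 hearts = 1/13
let pRankGivenSuit = Double(kingOfHearts.count) / Double(hearts.count)

// Chain rule: P(suit and rank) = P(suit) * P(rank given suit) = 1/52
let pBoth = pSuit * pRankGivenSuit

// Direct count for comparison: 1 matching card out of 52
let pBothByCounting = Double(kingOfHearts.count) / Double(deck.count)

print(pSuit, pRankGivenSuit, pBoth, pBothByCounting)
```

Counting directly and multiplying the two conditional pieces agree, which is the property the spam example on the following slides relies on.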
  9. • 30 emails of a total of 50 are spam
    • 20 out of the total 50 contain the word SODIUM
    • 15 of the emails that contain the word SODIUM are spam
  10. • 30 emails of a total of 50 are spam
    • 20 out of the total 50 contain the word SODIUM
    • 15 of the emails that contain the word SODIUM are spam

    What’s the probability that an email is spam given that it contains the word SODIUM?
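One way to work through the question on this slide, offered as a sketch and not as the talk's own derivation: read the counts as probabilities and apply the definition of conditional probability. The event names spam and SODIUM are shorthand for "the email is spam" and "the email contains the word SODIUM".

```latex
P(\text{spam} \mid \text{SODIUM})
  = \frac{P(\text{spam} \cap \text{SODIUM})}{P(\text{SODIUM})}
  = \frac{15/50}{20/50}
  = \frac{15}{20}
  = 0.75
```

The numerator can also be reached with the product rule from the earlier slides: $P(\text{spam} \cap \text{SODIUM}) = P(\text{spam}) \times P(\text{SODIUM} \mid \text{spam}) = \frac{30}{50} \times \frac{15}{30} = \frac{15}{50}$.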
  13. • 30 emails of a total of 50 are spam
    • 20 out of the total 50 contain the word SODIUM
    • 15 of the emails that contain the word SODIUM are spam
    • 15 out of the total 50 contain the word CHOLESTEROL
    • 10 of the emails that contain the word CHOLESTEROL are spam
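The counts above do not say how many emails contain both words, so combining them needs an extra assumption. The sketch below uses the Naive Bayes assumption (the two words occur independently once the class is known) to estimate the probability that an email containing both SODIUM and CHOLESTEROL is spam. The variable names are made up for illustration, and the non-spam ("ham") counts are derived by subtraction from the slide's numbers.

```swift
// Counts from the slide.
let totalEmails = 50.0
let spamEmails = 30.0
let sodiumEmails = 20.0
let sodiumInSpam = 15.0
let cholesterolEmails = 15.0
let cholesterolInSpam = 10.0

// Derived counts for non-spam ("ham") emails, by subtraction.
let hamEmails = totalEmails - spamEmails                    // 20
let sodiumInHam = sodiumEmails - sodiumInSpam               // 5
let cholesterolInHam = cholesterolEmails - cholesterolInSpam // 5

// Priors: how likely each class is before looking at any words.
let pSpam = spamEmails / totalEmails   // 0.6
let pHam = hamEmails / totalEmails     // 0.4

// Likelihoods: how often each word appears within each class.
let pSodiumGivenSpam = sodiumInSpam / spamEmails
let pCholesterolGivenSpam = cholesterolInSpam / spamEmails
let pSodiumGivenHam = sodiumInHam / hamEmails
let pCholesterolGivenHam = cholesterolInHam / hamEmails

// Naive Bayes assumption: multiply the per-word likelihoods as if the
// words were independent given the class.
let spamScore = pSpam * pSodiumGivenSpam * pCholesterolGivenSpam
let hamScore = pHam * pSodiumGivenHam * pCholesterolGivenHam

// Normalize so the two class probabilities sum to 1.
let pSpamGivenBoth = spamScore / (spamScore + hamScore)

print(pSpamGivenBoth)
```

With these numbers the spam score is 0.1 and the ham score is 0.025, so the estimate is 0.8. That figure rests entirely on the independence assumption, since the slide gives no joint count for the two words.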
  24. vs.