Unstructured Data is the New Gold: ML + NLP is the New Shiz!

Machine Learning (ML) is key to developing intelligent systems. It has had some success not only in data science and engineering but also in the information security domain. While the data gathered helps in identifying threats, it accounts for only a small part of the whole picture. Using Natural Language Processing (NLP) to make sense of unstructured resources such as social media posts, online news articles, and blog posts is what gives us the edge over the machines.

Jim Geovedi

August 25, 2017

Transcript

  1. Unstructured data is the new gold! Machine Learning and Natural Language Processing is the new $h*z! Jim Geovedi
  2. VIP

  3. Research areas • Confidence in the robustness of the decisions taken by machines and in the security of the systems within which they operate. • Ability to understand how machine learning systems work, and improved transparency and interpretability. • Better interaction between humans and computers. • Better handling of real-world biases and messiness in data.
  4. Machines can process huge amounts of data and extract complex patterns, but without a human to guide them, they only produce garbage.
  5. Software that creates machine-processable structure can utilize the linguistic, auditory, and visual structure that exists in all forms of human communication.
  6. NLP application areas • Finding relationships across text data and structured metadata. • Identifying underlying intents behind customer comments. • Classifying, labeling, and routing messages. • Surfacing and tracking trends in unstructured data. • Emotion analysis.
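To make "classifying, labeling, and routing messages" concrete, here is a minimal sketch of a keyword-based router. The queue names, keyword lists, and the `route` function are all hypothetical, invented for illustration; a real system would more likely learn these associations from labeled messages with one of the machine learning libraries rather than hard-code them.

```python
import re
from collections import Counter

# Hypothetical queues and keyword lists, invented for illustration only.
ROUTES = {
    "billing": {"invoice", "refund", "charge", "payment"},
    "security": {"phishing", "password", "breach", "malware"},
    "support": {"error", "crash", "install", "login"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def route(message, default="general"):
    """Return the queue whose keywords occur most often in the message.

    Falls back to `default` when no keyword matches at all.
    """
    counts = Counter(tokenize(message))
    scores = {
        queue: sum(counts[word] for word in words)
        for queue, words in ROUTES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


if __name__ == "__main__":
    print(route("Got a phishing email asking for my password"))
    print(route("Please refund the duplicate payment"))
```

Exact keyword matching is brittle (it misses "charged" for "charge", for example), which is one reason stemming and lemmatization matter for this kind of task.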
  7. Natural Language Processing • Semantics: Lexical semantics, Machine translation, Named entity recognition, Natural language generation, Natural language understanding, Optical character recognition, Question answering, Recognizing textual entailment, Relationship extraction, Sentiment analysis, Topic modelling, Word sense disambiguation • Syntax: Morphological segmentation, Part of speech tagging, Parsing, Stemming, Lemmatization, Sentence breaking, Word segmentation, Terminology extraction • Discourse: Automatic summarization, Coreference resolution, Discourse analysis • Speech: Speech recognition, Speech segmentation, Text to speech
  12. Theresa May [PERSON] was accused of a climbdown over the future sovereignty of British [GPE] courts after a newly published government paper appeared to leave open the possibility that the European [GPE] court of justice would influence UK [GPE] law after Brexit. The latest of a flurry of Brexit policy papers, due to be published on Wednesday [DATE], will repeat the government’s insistence that the “direct jurisdiction” of the Luxembourg [GPE]-based ECJ must end when Britain [GPE] leaves the EU [GPE] in March 2019 [DATE]. But it will set out a range of options for resolving future disputes between Britain [GPE] and the EU [GPE] – over the rules of any new trade deal, for example – some of which are likely to involve European [GPE] judges, or the application of ECJ case law.
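The slide above shows named entity recognition output, with a label placed after each recognized span. The sketch below shows one way such a rendering could be produced from character-offset spans. The `annotate` and `find_span` helpers are hypothetical, and the spans are written by hand here, with labels in square brackets so they stand apart from the text. In practice an NER model would supply the spans; spaCy, for instance, exposes `start_char`, `end_char`, and `label_` on each entity in `doc.ents`.

```python
def find_span(text, phrase, label):
    """Locate the first occurrence of `phrase` and return (start, end, label)."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def annotate(text, spans):
    """Insert each entity label right after the text it covers.

    `spans` holds (start, end, label) character offsets, assumed not to overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:end])  # text up to the end of the entity
        pieces.append(f" [{label}]")     # the label that follows it
        cursor = end
    pieces.append(text[cursor:])         # whatever remains after the last entity
    return "".join(pieces)


text = "Theresa May was accused of a climbdown over British courts."

# Hand-written spans for illustration; a trained model would normally provide them.
spans = [
    find_span(text, "Theresa May", "PERSON"),
    find_span(text, "British", "GPE"),
]

annotated = annotate(text, spans)

if __name__ == "__main__":
    print(annotated)
```

Keeping the entities as offsets rather than editing the text directly means the same spans can feed other steps, such as counting mentions or linking entities across documents.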
  13. Tools of the trade • General library: NLTK, spaCy, CoreNLP, OpenNLP, ClearNLP, gensim • Dependency parser: MaltParser, spaCy, SyntaxNet • Machine learning: scikit-learn, LIBLINEAR, LIBSVM, CRF++, Weka • Deep learning: Theano, TensorFlow, Torch/PyTorch, DyNet • Language modeling: IRSTLM, SRILM, KenLM • Machine translation: • Alignment: Berkeley aligner, GIZA++ • StatMT: Moses, Joshua, cdec, travatar • NMT: seq2seq, OpenNMT/OpenNMT-py, fairseq, sockeye, Lamtram, Marian • Services: Amazon Web Services, Google Cloud, Microsoft Azure, IBM Bluemix
  14. Bradford Cross, Five AI Startup Predictions for 2017: “Machine Learning as a Service is an idea we’ve been seeing for nearly 10 years and it’s been failing the whole time. The bottom line on why it doesn’t work: the people that know what they’re doing just use open source, and the people that don’t will not get anything to work, ever, even with APIs.”
  15. Q?