ML Session n°1

Adrien Couque

January 20, 2017

Transcript

  1. Format

    • Slack channel: #ml-courses
    • Today: concepts
    • 08 Feb: understanding an ML project: what are the good questions?
    • 20 Feb and following:
      ◦ more technical sessions
      ◦ optional “homework” between sessions (small projects)
      ◦ current plan: 7 technical sessions
      ◦ goal: be able to work on ML projects autonomously
  2. Explaining Machine Learning

    Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.
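To make the "generic algorithm, different data" idea concrete, here is a minimal sketch. It uses a 1-nearest-neighbour classifier, which is a technique chosen here for illustration and not one named on the slide. The function name and both datasets are made up. The same routine is fed two unrelated datasets and answers questions about each without any problem-specific code.

```python
def nearest_neighbor(examples, query):
    """Return the label of the training example closest to `query`.

    `examples` is a list of (features, label) pairs; nothing in this
    function knows what the features or labels mean.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(examples, key=lambda ex: squared_distance(ex[0], query))
    return label


# Made-up dataset 1: (height in cm, weight in kg) -> animal
animals = [((25, 4), "cat"), ((30, 5), "cat"), ((60, 30), "dog"), ((70, 35), "dog")]

# Made-up dataset 2: (word count, link count) -> mail category
mails = [((120, 0), "ham"), ((200, 1), "ham"), ((40, 8), "spam"), ((30, 12), "spam")]

animal_guess = nearest_neighbor(animals, (28, 6))
mail_guess = nearest_neighbor(mails, (35, 10))
print(animal_guess, mail_guess)
```

Only the data changes between the two calls; the "logic" of each classifier lives entirely in the examples it was given.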
  3. Machine Learning vs Statistics?

    They are both concerned with the same question: how do we learn from data?

    Statistics      | Machine Learning
    Estimation      | Learning
    Classifier      | Hypothesis
    Data point      | Example / Instance
    Regression      | Supervised learning
    Classification  | Supervised learning
    Covariate       | Feature
    Response        | Label
  4. Minsky’s Multiplicity (1960)

    Crucial parts for problem solving:
    • Induction
    • Planning
    • Search, knowledge representation
    • Pattern recognition
    • Learning

    Components needed to get to human-level AI
  6. Topics for Machine Learning

    • Self-driving cars
    • Human interaction:
      ◦ Handwriting
      ◦ Speech
      ◦ Natural language
    • OCR
    • Image recognition
    • Information retrieval
    • Artificial personal assistants
    • Recommendation systems
    • Drones
    • Game playing
    • ...
  7. Intuitions from linear regression

    • the algorithm is generic; the results depend on the data
    • the system is both the algorithm and the data
    • it is only as good as your data
    • it starts with a hypothesis about how we can represent the data (for linear regression: a straight line)
    • it can deal poorly with outliers
    • lots of calculation to learn, but very fast to apply (can run on mobile)
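These intuitions can be checked on a toy example. The sketch below fits a straight line with ordinary least squares; the function names and the data points are invented for illustration. It shows that a single outlier changes the fitted line noticeably, and that applying the learned model is just one multiplication and one addition.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the hypothesis y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Applying the model is cheap: one multiply and one add."""
    slope, intercept = model
    return slope * x + intercept


xs = [1, 2, 3, 4, 5]
clean_ys = [2, 4, 6, 8, 10]     # made-up data lying exactly on y = 2x
outlier_ys = [2, 4, 6, 8, 40]   # same data with the last point corrupted

clean_model = fit_line(xs, clean_ys)
outlier_model = fit_line(xs, outlier_ys)

print(clean_model, predict(clean_model, 6))
print(outlier_model, predict(outlier_model, 6))
```

The algorithm is identical in both fits; only the data differs, and the corrupted point alone is enough to pull the slope well away from 2.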
  8. Artificial Neural Networks (ANN)

    Each node does a linear combination of previous nodes.
    More nodes can handle more complexity.
    Training in multiple steps:
    - left to right to evaluate the training set
    - right to left to propagate errors
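The two training directions can be sketched in a few lines of plain Python. This is an illustrative toy and not the network from the slide: the layer sizes, the sigmoid activation, the squared-error loss, the starting weights, the learning rate and the OR dataset are all assumptions chosen to keep the example small.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_w, output_w):
    """Left to right: each node is a linear combination of the previous
    layer (weight row = [bias, w1, w2, ...]) passed through a sigmoid."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in hidden_w]
    output = sigmoid(output_w[0] + sum(wi * hi for wi, hi in zip(output_w[1:], hidden)))
    return hidden, output


def train_step(x, target, hidden_w, output_w, lr):
    """One forward pass, then right to left: propagate the error and
    update the weights in place (loss assumed: 0.5 * (output - target)^2)."""
    hidden, output = forward(x, hidden_w, output_w)
    # Error signal at the output node.
    delta_out = (output - target) * output * (1 - output)
    # Error signal at each hidden node, using the not-yet-updated output weights.
    delta_hidden = [delta_out * output_w[j + 1] * h * (1 - h)
                    for j, h in enumerate(hidden)]
    output_w[0] -= lr * delta_out
    for j, h in enumerate(hidden):
        output_w[j + 1] -= lr * delta_out * h
    for j, w in enumerate(hidden_w):
        w[0] -= lr * delta_hidden[j]
        for i, xi in enumerate(x):
            w[i + 1] -= lr * delta_hidden[j] * xi


def mean_squared_error(data, hidden_w, output_w):
    return sum((forward(x, hidden_w, output_w)[1] - t) ** 2 for x, t in data) / len(data)


# Toy training set (logical OR) and arbitrary starting weights.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
hidden_w = [[0.1, 0.4, -0.3], [-0.2, 0.2, 0.5]]   # 2 hidden nodes, 2 inputs each
output_w = [0.05, 0.3, -0.4]                      # 1 output node, 2 hidden inputs

loss_before = mean_squared_error(data, hidden_w, output_w)
for _ in range(2000):
    for x, target in data:
        train_step(x, target, hidden_w, output_w, lr=0.5)
loss_after = mean_squared_error(data, hidden_w, output_w)

print(loss_before, loss_after)
```

Repeating the forward and backward passes over the training set should drive the error down; adding hidden nodes means adding rows to `hidden_w` and entries to `output_w`.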
  9. IBM Watson

    • Healthcare:
      ◦ Diagnostics
      ◦ Test suggestions
      ◦ Prescription recommendations
    • Legal:
      ◦ Hired as a lawyer (“Ross”)
    • Teaching:
      ◦ Used as a TA (“Jill Watson”)
    • Cooking:
      ◦ published a recipe book
      ◦ new combinations
      ◦ able to avoid allergies
  10. Cloud Vision API

    Available on Google Cloud
    • Automatic labelling
    • Sentiment analysis
    • Text extraction
    • Landmark detection
    • Logo detection
    • Explicit content detection
  11. SyntaxNet and Parsey McParseface

    Parsey McParseface can correctly read:
    • The old man the boat.
    • While the man hunted the deer ran into the woods.
    • While Anna dressed the baby played in the crib.
    • Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

    It makes mistakes on:
    • I convinced her children are noisy.
    • The coach smiled at the player tossed the frisbee.
    • The cotton clothes are made up of grows in Mississippi.
    • James while John had had had had had had had had had had had a better effect on the teacher
  12. Applications of NLP at Quora

    - automatic grammar correction
    - question quality
    - duplicate question detection
    - related question suggestion
    - topic biography quality (= qualifications of writer)
    - topic labeler (from “science” to narrow topics like “tennis courts in Mountain View”)
    - search
    - answer summaries
    - automatic answers wiki
    - hate speech/harassment detection
    - spam detection
    - question edit quality