PyData BLN 2018: Five things I learned while prototyping ML papers

ellenkoenig

July 07, 2018

Transcript

  1. FIVE THINGS I LEARNED WHILE PROTOTYPING ML PAPERS
    ELLEN KÖNIG / @ELLEN_KOENIG
    SENIOR DATA SCIENTIST, NATIVE INSTRUMENTS
  2. A LONG, LONG TIME AGO… IN AN OFFICE FAR AWAY…

  3. HOW MANY OF YOU CAN RELATE TO OUR PROBLEM?

  4. None
  5. BUT WORK IS ALL ABOUT GROWTH, RIGHT??

  6. FORTUNATELY

  7. OUR USE CASE

  8. WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
    • „Somebody must have solved this before!“
    • No ready-to-use implementation
  9. A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on comparison criteria
    3. Evaluate your papers
    4. Prioritize approaches
    5. Prototype approaches
  10. STEP 1: SEARCH FOR RESEARCH FINDINGS
    Needed: An overview of the field
  11. COMPILING AN OVERVIEW OF THE FIELD
    Compile:
    • Foundational and cutting-edge papers
    • Problems and approaches
    Start with survey papers, follow references
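"Start with survey papers, follow references" can be read as a breadth-first walk over a citation graph. The sketch below is only an illustration of that reading: the `references` mapping and the paper names are hypothetical, and the depth limit is an assumed way to keep the reading list manageable.

```python
from collections import deque


def collect_reading_list(references, start, max_depth=2):
    """Breadth-first walk over cited papers, starting from a survey paper.

    references: dict mapping a paper to the papers it cites (hypothetical data).
    max_depth:  how many citation hops to follow away from the start paper.
    """
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow references beyond the depth limit
        for cited in references.get(paper, []):
            if cited not in seen:
                seen.add(cited)
                order.append(cited)
                queue.append((cited, depth + 1))
    return order


# Hypothetical citation graph; real entries would come from the papers' reference lists.
references = {
    "survey": ["paper_a", "paper_b"],
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c", "paper_d"],
    "paper_c": ["paper_e"],
}

reading_list = collect_reading_list(references, "survey", max_depth=2)
print(reading_list)
```

Papers closest to the survey come first in the list, which roughly matches reading foundational work before more specialized follow-ups.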
  12. STEP 2: DECIDE ON YOUR COMPARISON CRITERIA

  13. WHICH PAPERS ARE RIGHT FOR YOU?
    Summarize common metrics and baselines
    Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
    Pick simple metrics and baselines
    Minimally required metric targets?
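One simple baseline for a classification task is to always predict the most frequent class. The sketch below assumes that kind of baseline; the labels, the metric target, and the reported accuracy are made-up numbers for illustration and do not come from the talk.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return the most frequent training label; predicting it everywhere is the baseline."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for a document-classification task.
train_labels = ["invoice"] * 6 + ["contract"] * 3 + ["letter"] * 1
test_labels = ["invoice", "invoice", "contract", "letter", "invoice"]

baseline_label = majority_class_baseline(train_labels)
baseline_accuracy = accuracy(test_labels, [baseline_label] * len(test_labels))

# Assumed numbers: a minimally required target and a paper's reported accuracy.
metric_target = 0.85
reported_accuracy = 0.90

beats_baseline = reported_accuracy > baseline_accuracy
meets_target = reported_accuracy >= metric_target
print(baseline_label, baseline_accuracy, beats_baseline, meets_target)
```

Fixing the baseline and the target before reading papers gives every candidate the same two questions to answer.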
  14. STEP 3: EVALUATE YOUR PAPERS
    Groundbreaking? Copycat? Garbage?
    Journal / conference quality?
    Team experience?
  15. STEP 3: EVALUATE YOUR PAPERS: A CHECKLIST
    3. Results
    2. Methodology
    1. Abstract & Introduction
  16. ABSTRACT & INTRODUCTION
    Addresses your problem? Similar context?
    Approach: Groundbreaking or improvement?
    Results: Better than targets & baseline?
    Main question: Relevant to your problem?
    3. Results | 2. Methodology | ✔ Abstract & Introduction
  17. METHODOLOGY SECTION
    Main question: Approach reproducible?
    Data set size and content similar?
      ✓ 22k black-and-white pages
      ✓ German corpus
      ? Research documents rather than banking documents
    3. Results | ✔ Methodology | ✔ Abstract & Introduction
  18. METHODOLOGY SECTION
    Entire process described?
      ✓ Seems to be complete
    Pre-processing steps described completely?
      ✓ Image conversion and scaling are described
      ? OCR tool / approach is not mentioned
    Well-known methods? Or completely described methods?
      ✓ Neural network with descriptions of the configuration
    3. Results | ✔ Methodology | ✔ Abstract & Introduction
  19. RESULTS SECTION
    Main question: Results reliable?
    Relevant metrics for your use case?
      ✓ Accuracy
    Metrics appropriate for the problem?
      ✓ Common metric for classification
    Metrics appropriate for the dataset?
      ✗ Not suitable for imbalanced classes
    ✔ Results | ✔ Methodology | ✔ Abstract & Introduction
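Why accuracy is "not suitable for imbalanced classes" can be seen with a classifier that only ever predicts the frequent class. The sketch below uses hypothetical labels and compares accuracy with balanced accuracy (the mean of per-class recall), which is one common alternative and not necessarily the one used in the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 95 "common" documents, 5 "rare" ones.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate model that never predicts the rare class.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={acc:.2f} balanced_accuracy={bal_acc:.2f}")
```

The degenerate model looks strong on accuracy alone, while balanced accuracy shows it is no better than guessing across the two classes.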
  20. RESULTS SECTION
    Better than your baseline?
      ✓ Yes, by 0.23 over the baseline
    Better than the metrics target?
      ? They are close
    Any published review of the results?
      ? Not yet
    Improvement analyzed with suitable statistical tests?
      ✗ No statistical analysis, and reported measurements are not comparable
    ✔ Results | ✔ Methodology | ✔ Abstract & Introduction
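The slide does not name a specific statistical test. As one possible choice when two classifiers are scored on the same test set, the sketch below implements an exact McNemar test with the standard library only. The per-sample correctness lists are hypothetical.

```python
from math import comb


def mcnemar_exact(baseline_correct, candidate_correct):
    """Two-sided exact McNemar test on paired per-sample correctness flags.

    Only samples where the two models disagree carry information; under the
    null hypothesis each disagreement favours either model with probability 0.5.
    """
    pairs = list(zip(baseline_correct, candidate_correct))
    only_baseline = sum(1 for b, c in pairs if b and not c)
    only_candidate = sum(1 for b, c in pairs if c and not b)
    n = only_baseline + only_candidate
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence either way
    smaller = min(only_baseline, only_candidate)
    tail = sum(comb(n, k) for k in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical results on 100 test samples:
# 80 both correct, 2 only baseline correct, 10 only candidate correct, 8 both wrong.
baseline_correct = [True] * 80 + [True] * 2 + [False] * 10 + [False] * 8
candidate_correct = [True] * 80 + [False] * 2 + [True] * 10 + [False] * 8

p_value = mcnemar_exact(baseline_correct, candidate_correct)
print(f"p = {p_value:.4f}")
```

A test like this needs per-sample predictions from both models on the same data. Those are often missing from a paper, so the comparison may have to wait for the prototyping step.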
  21. RESULTS SECTION
    • For a refresher on model evaluation, see http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf
    • For a summary of statistical tests, see http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg
    ✔ Results | ✔ Methodology | ✔ Abstract & Introduction
  22. STEP 4: PRIORITIZE YOUR CHOSEN APPROACHES

  23. PRIORITIZATION MATRIX
    Axis labels: High Effort, High Impact
    Quadrants: Quick Wins, Major Projects, Thankless Tasks, Fill-in Jobs
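The four quadrant names can be turned into a small ranking helper. The transcript does not preserve which quadrant sits where, so the sketch below assumes the usual impact-effort mapping. The 0-to-1 scores, the 0.5 threshold, and the paper names are hypothetical.

```python
def quadrant(impact, effort, threshold=0.5):
    """Map impact and effort scores in [0, 1] to a quadrant name (assumed mapping)."""
    high_impact = impact >= threshold
    high_effort = effort >= threshold
    if high_impact and not high_effort:
        return "Quick Win"
    if high_impact and high_effort:
        return "Major Project"
    if not high_impact and not high_effort:
        return "Fill-in Job"
    return "Thankless Task"


# Assumed priority order: cheap high-impact work first, costly low-impact work last.
QUADRANT_ORDER = ["Quick Win", "Major Project", "Fill-in Job", "Thankless Task"]

# Hypothetical (impact, effort) estimates for three candidate approaches.
candidates = {
    "Paper A": (0.9, 0.8),
    "Paper B": (0.7, 0.2),
    "Paper C": (0.2, 0.9),
}

ranked = sorted(
    candidates,
    key=lambda name: QUADRANT_ORDER.index(quadrant(*candidates[name])),
)
for name in ranked:
    print(name, quadrant(*candidates[name]))
```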
  24. STEP 5: PROTOTYPE YOUR CHOSEN APPROACHES

  25. A FEW RECOMMENDATIONS
    Compile a glossary
    Understand all equations & code
    Higher-level language
    Reference sections of papers
  26. A FEW MORE RECOMMENDATIONS
    Verify under same conditions
    Compile the performance in a table
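A minimal way to "compile the performance in a table" is a plain-text table that puts the reported and the reproduced metric side by side. The approach names and numbers below are hypothetical placeholders, not results from the talk.

```python
def results_table(rows):
    """Format (approach, reported, reproduced) tuples as an aligned text table."""
    lines = [f"{'Approach':<20}{'Reported':>10}{'Reproduced':>12}{'Delta':>8}"]
    for name, reported, reproduced in rows:
        delta = reproduced - reported  # negative: the prototype falls short of the paper
        lines.append(f"{name:<20}{reported:>10.2f}{reproduced:>12.2f}{delta:>+8.2f}")
    return "\n".join(lines)


# Hypothetical accuracy values measured on the same test set and preprocessing.
rows = [
    ("Majority baseline", 0.60, 0.60),
    ("Paper A (CNN)", 0.90, 0.84),
]

table = results_table(rows)
print(table)
```

Keeping the delta column visible makes it obvious when a prototype does not reproduce the paper's numbers under your own conditions.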
  27. MORE RECOMMENDATIONS
    http://codecapsule.com/2012/01/18/how-to-implement-a-paper/

  28. SUMMARY: WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
    • „Somebody must have solved this before!“
    • No ready-to-use implementation
  29. SUMMARY: A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on your comparison criteria
    3. Evaluate quality, relevance and reproducibility
    4. Prioritize your chosen approaches
    5. Prototype the best approaches
  30. HAVE (MORE) FUN PROTOTYPING!
    Slides will be tweeted from @ellen_koenig
  31. IMAGE CREDITS
    • Title slide: https://www.flickr.com/photos/vblibrary/6671465981
    • Slide 2: Google calendar & maps
    • Slide 4: https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
    • Slide 6: https://pixabay.com/de/bremer-stadtmusikanten-skulptur-2444326/
  32. IMAGE CREDITS
    • Slide 8 & 28: pixabay.com
    • Slide 9: thenounproject.com
      • Search icon by Luis Prado
      • Scales icon by Veronica Karenina
      • Evaluation icon by Dinosoft Labs
      • Priorities icon by Arthur Shlain
      • Prototype icon by asianson design
  33. IMAGE CREDITS
    • Slide 10
      • https://pixabay.com/en/book-address-book-learning-learn-1171564/
      • https://en.wikipedia.org/wiki/Map#/media/File:World_Map_1689.JPG
    • Slide 11: thenounproject.com
      • Network icon by Gregor Cresner
      • Problem solving icon and razor blade icon by Vector Market
      • Bank icon by Stock image photo
  34. IMAGE CREDITS
    • Slide 12: https://pxhere.com/en/photo/536212
    • Slide 13
      • Bar chart icon: pixabay.com
      • Touch icon by Jasfart for thenounproject.com
      • Target icon by Libby Ventura for thenounproject.com
  35. IMAGE CREDITS
    • Slide 14: thenounproject.com
      • Ground breaking icon by faisalovers
      • Trash icon by UNICORN
      • Newspaper icon by Aman
    • Slide 14: Cat icon: pixabay.com
    • Slide 22: https://pxhere.com/en/photo/109282
    • Slide 23: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/
    • Slide 24: https://pixnio.com/objects/computer/programming-code-programmer-coding-coffee-cup-computer-copy-hands-computer-keyboard
  36. IMAGE CREDITS
    1. Slide 25: thenounproject.com
      • Table icon by Yu luck
      • Pi icon by Sumana Chamrumsorakist
      • Anaconda icon by parkjisun
      • Documents icon by Creative Stall
    2. Slide 26: thenounproject.com
      • Magnifying glass icon by afredocreates.com/icons and flaticons.com
      • Table icon by Douglas Santos
    3. Slide 29: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg