PyData BLN 2018: Five things I learned while prototyping ML papers

ellenkoenig

July 07, 2018

Transcript

  1. FIVE THINGS I LEARNED WHILE PROTOTYPING ML PAPERS
    ELLEN KÖNIG / @ELLEN_KOENIG
    SENIOR DATA SCIENTIST, NATIVE INSTRUMENTS
  2. WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
    • "Somebody must have solved this before!"
    • No ready-to-use implementation
  3. A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on comparison criteria
    3. Evaluate your papers
    4. Prioritize approaches
    5. Prototype approaches
  4. COMPILING AN OVERVIEW OF THE FIELD
    Compile:
    • Foundational and cutting-edge papers
    • Problems and approaches
    Start with survey papers, follow references
  5. WHICH PAPERS ARE RIGHT FOR YOU?
    • Summarize common metrics and baselines
      Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
    • Pick simple metrics and baselines
    • Minimally required metric targets?
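
To make "pick simple metrics and baselines" concrete, here is a minimal sketch of one very simple baseline: always predicting the most frequent class. The labels, the helper names (`majority_class_baseline`, `accuracy`, `meets_target`) and the target value are hypothetical and only illustrate the idea; the talk does not prescribe a particular baseline.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _count = Counter(train_labels).most_common(1)[0]
    return lambda _example: most_common_label


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def meets_target(score, minimum_target):
    """Check a score against a minimally required metric target."""
    return score >= minimum_target


# Hypothetical document labels, invented for illustration only.
train_labels = ["invoice"] * 6 + ["contract"] * 3 + ["letter"] * 1
test_labels = ["invoice", "invoice", "contract", "letter", "invoice"]

predict = majority_class_baseline(train_labels)
baseline_accuracy = accuracy(test_labels, [predict(x) for x in test_labels])

# Assumed target: a paper's approach is only interesting above this value.
MINIMUM_TARGET = 0.8
print(f"baseline accuracy: {baseline_accuracy:.2f}")
print(f"baseline meets target: {meets_target(baseline_accuracy, MINIMUM_TARGET)}")
```

A paper's reported result can then be compared both against this baseline and against the minimum target before it is shortlisted.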
  6. STEP 3: EVALUATE YOUR PAPERS - A CHECKLIST
    1. Abstract & Introduction
    2. Methodology
    3. Results
  7. ABSTRACT & INTRODUCTION
    Main question: Relevant to your problem?
    • Addresses your problem? Similar context?
    • Approach: Groundbreaking or improvement?
    • Results: Better than targets & baseline?
    Checklist: ✔ Abstract & Introduction, 2. Methodology, 3. Results
  8. METHODOLOGY SECTION
    Main question: Approach reproducible?
    • Data set size and content similar?
      ✓ 22k black-and-white pages
      ✓ German corpus
      ? Research documents rather than banking documents
    Checklist: ✔ Abstract & Introduction, ✔ Methodology, 3. Results
  9. METHODOLOGY SECTION
    • Entire process described?
      ✓ Seems to be complete
    • Pre-processing steps described completely?
      ✓ Image conversion and scaling is described
      ? OCR tool / approach is not mentioned
    • Well-known methods? Or completely described methods?
      ✓ Neural network with descriptions of the configuration
    Checklist: ✔ Abstract & Introduction, ✔ Methodology, 3. Results
  10. RESULTS SECTION
    Main question: Results reliable?
    • Relevant metrics for your use case?
      ✓ Accuracy
    • Metrics appropriate for the problem?
      ✓ Common metric for classification
    • Metrics appropriate for the dataset?
      X Not suitable for imbalanced classes
    Checklist: ✔ Abstract & Introduction, ✔ Methodology, ✔ Results
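
The point that accuracy is not suitable for imbalanced classes can be seen in a small sketch. The data below is invented for illustration, and balanced accuracy (the mean of per-class recall) is used here as one possible alternative metric; it is an assumption, not a metric named in the talk.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in indices)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 95 "common" documents, 5 "rare" ones.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate classifier that never predicts the rare class.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {acc:.2f}")
print(f"balanced accuracy: {bal_acc:.2f}")
```

The classifier never finds a single rare document, yet its plain accuracy looks strong; the per-class view exposes the problem.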
  11. RESULTS SECTION
    • Better than your baseline?
      ✓ Yes, by 0.23 over the baseline
    • Better than the metrics target?
      ? They are close
    • Any published review of the results?
      ? Not yet
    • Improvement analyzed with suitable statistical tests?
      X No statistical analysis, and reported measurements are not comparable
    Checklist: ✔ Abstract & Introduction, ✔ Methodology, ✔ Results
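
When a paper reports no statistical analysis, you can run one yourself while prototyping, provided you have per-example predictions from both the baseline and the new approach on the same test set. The sketch below uses an exact McNemar test, which is one common choice for comparing two classifiers on paired data; the choice of test, the function name `mcnemar_exact_p_value` and the data are assumptions for illustration, not something the talk specifies.

```python
from math import comb


def mcnemar_exact_p_value(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the examples where A and B disagree.

    Under the null hypothesis that both classifiers have the same error
    rate, each disagreement is equally likely to favour A or B, so the
    count follows a Binomial(n, 0.5) distribution.
    """
    a_only = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    b_only = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n = a_only + b_only
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical predictions on 20 test examples (true label is always 1).
y_true = [1] * 20
baseline_pred = [0] * 8 + [1] * 12         # baseline is wrong on the first 8
new_pred = [1] * 8 + [0] + [1] * 11        # new approach is wrong on 1 other

p_value = mcnemar_exact_p_value(y_true, baseline_pred, new_pred)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the difference between the two approaches is unlikely to be due to chance alone on this test set; it says nothing about whether the improvement is large enough to matter for your use case.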
  12. RESULTS SECTION
    • For a refresher on model evaluation, see: http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf
    • For a summary of statistical tests, see: http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg
    Checklist: ✔ Abstract & Introduction, ✔ Methodology, ✔ Results
  13. A FEW RECOMMENDATIONS
    • Compile a glossary
    • Understand all equations & code
    • Higher level language
    • Reference sections of papers
  14. SUMMARY: WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
    • "Somebody must have solved this before!"
    • No ready-to-use implementation
  15. SUMMARY: A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on your comparison criteria
    3. Evaluate quality, relevance and reproducibility
    4. Prioritize your chosen approaches
    5. Prototype the best approaches
  16. IMAGE CREDITS
    • Title slide: https://www.flickr.com/photos/vblibrary/6671465981
    • Slide 2: Google calendar & maps
    • Slide 4: https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
    • Slide 6: https://pixabay.com/de/bremer-stadtmusikanten-skulptur-2444326/
  17. IMAGE CREDITS
    • Slide 8 & 28: pixabay.com
    • Slide 9: thenounproject.com
      • Search icon by Luis Prado
      • Scales icon by Veronica Karenina
      • Evaluation icon by Dinosoft Labs
      • Priorities icon by Arthur Shlain
      • Prototype icon by asianson design
  18. IMAGE CREDITS
    • Slide 10:
      • https://pixabay.com/en/book-address-book-learning-learn-1171564/
      • https://en.wikipedia.org/wiki/Map#/media/File:World_Map_1689.JPG
    • Slide 11: thenounproject.com
      • Network icon by Gregor Cresner
      • Problem solving icon and razor blade icon by Vector Market
      • Bank icon by Stock image photo
  19. IMAGE CREDITS
    • Slide 12: https://pxhere.com/en/photo/536212
    • Slide 13:
      • Bar chart icon: pixabay.com
      • Touch icon by Jasfart for thenounproject.com
      • Target icon by Libby Ventura for thenounproject.com
  20. IMAGE CREDITS
    • Slide 14: thenounproject.com
      • Ground breaking icon by faisalovers
      • Trash icon by UNICORN
      • Newspaper icon by Aman
    • Slide 14: Cat icon: pixabay.com
    • Slide 22: https://pxhere.com/en/photo/109282
    • Slide 23: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/
    • Slide 24: https://pixnio.com/objects/computer/programming-code-programmer-coding-coffee-cup-computer-copy-hands-computer-keyboard
  21. IMAGE CREDITS
    • Slide 25: thenounproject.com
      • Table icon by Yu luck
      • Pi icon by Sumana Chamrumsorakist
      • Anaconda icon by parkjisun
      • Documents icon by Creative Stall
    • Slide 26: thenounproject.com
      • Magnifying glass icon by afredocreates.com/icons and flaticons.com
      • Table icon by Douglas Santos
    • Slide 29: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg