
PyLadies meetup: 5 Things I learned from prototyping ML papers

ellenkoenig

May 15, 2018

Transcript

  1. FIVE THINGS I LEARNED FROM TURNING RESEARCH PAPERS INTO INDUSTRY PROTOTYPES
    ELLEN KÖNIG / @ELLEN_KOENIG
  2. MOTIVATION

  3. WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
    • General problem
    • No library exists
  4. A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on comparison criteria
    3. Evaluate your papers
    4. Prioritize approaches
    5. Prototype approaches
  5. STEP 1: SEARCH FOR RESEARCH FINDINGS
    Goal: Get an overview of the field
    • Compile: foundational and cutting-edge work
    • Compile: problems and approaches
    • Survey papers, follow references
  6. STEP 2: DECIDE ON YOUR COMPARISON CRITERIA
    Goal: What makes a good paper?
    • Summarize common metrics and baselines
    • Pick simple metrics and baselines
    • Minimally required metric targets?
    Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
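One simple baseline for a classification problem is to always predict the most frequent class. The sketch below is only an illustration of "pick simple metrics and baselines" and "minimally required metric targets"; the function names, labels, and the 0.85 target are hypothetical and do not come from the talk.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


def worth_a_closer_look(reported_score, baseline, target):
    """A reported score is interesting only if it beats the baseline and meets the target."""
    return reported_score > baseline and reported_score >= target


# Hypothetical document labels, for illustration only.
train = ["invoice"] * 70 + ["contract"] * 30
test = ["invoice"] * 7 + ["contract"] * 3

baseline = majority_baseline_accuracy(train, test)
print(f"majority-class baseline accuracy: {baseline:.2f}")
print(worth_a_closer_look(0.90, baseline, target=0.85))
print(worth_a_closer_look(0.72, baseline, target=0.85))
```

Fixing the baseline and the target before reading papers in detail makes it harder to be swayed by a result that merely sounds high.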
  7. STEP 3: EVALUATE YOUR PAPERS
    Goal: Identify three types of papers:
    ✓ Groundbreaking
    ? Copycat
    X Garbage
    • Journal / conference quality?
    • Team experience?
    • Log your findings
  8. STEP 3: EVALUATE YOUR PAPERS: A CHECKLIST
    1. Abstract & Introduction
    2. Methodology
    3. Results
  9. EVALUATING PAPERS CHECKLIST: ABSTRACT & INTRODUCTION
    ? Question: Relevant to your problem?
    ✓ Addresses your problem?
    ✓ Similar context?
    ✓ Approach: Groundbreaking or improvement?
    ✓ Results: Better than targets & baseline?
    ✓ Age: Still relevant?
    Checklist progress: ✔ Abstract & Introduction · 2. Methodology · 3. Results
  10. EXAMPLE
    Problem, Context, Approach, Results
  11. EVALUATING PAPERS CHECKLIST: METHODOLOGY SECTION
    ? Question: Approach reproducible?
    ✓ Data set size and content similar?
    ✓ Entire process described?
    ✓ Pre-processing steps described completely?
    ✓ Well-known methods?
    ✓ Or: Complete descriptions of the methods?
    Checklist progress: ✔ Abstract & Introduction · ✔ Methodology · 3. Results
  12. EXAMPLE
    ✓ Data set size and content similar?
      ✓ 22k black-and-white pages
      ✓ German corpus
      ? Research documents rather than banking documents
    ✓ Entire process described?
      ✓ Seems to be complete
    ✓ Pre-processing steps described completely?
      ✓ Image conversion and scaling are described
      ? OCR tool / approach is not mentioned
    ✓ Well-known methods?
      ✓ Neural networks with descriptions of the configuration
  13. EVALUATING PAPERS CHECKLIST: RESULTS SECTION
    ? Question: Results reliable?
    ✓ Relevant metrics?
    ✓ Metrics appropriate for the problem and dataset?
    ✓ Better than your baseline and metric targets?
    ✓ Any published review of the results?
    ✓ Improvements analyzed with suitable statistical tests?
    Checklist progress: ✔ Abstract & Introduction · ✔ Methodology · ✔ Results
  14. EVALUATING PAPERS CHECKLIST: RESULTS SECTION
    • For a refresher on model evaluation, see: http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf
    • For a summary of statistical tests, see: http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg
    Checklist progress: ✔ Abstract & Introduction · ✔ Methodology · ✔ Results
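The results checklist asks whether improvements were analyzed with suitable statistical tests, but the slides do not name a specific test. As one possible illustration, the sketch below uses an exact McNemar test, which compares two classifiers evaluated on the same test set by looking only at the examples where exactly one of them is correct. The choice of test, the function name, and the data are assumptions for illustration.

```python
from math import comb


def mcnemar_exact_p(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same test set."""
    # Discordant pairs: exactly one of the two models is correct.
    only_a = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    only_b = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = only_a + only_b
    if n == 0:
        return 1.0  # The models never disagree in correctness.
    k = min(only_a, only_b)
    # Under the null hypothesis, each discordant pair favors either model with p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical predictions: model A is right on all 10 examples, model B on only 2.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 8 + [1] * 2

p_value = mcnemar_exact_p(y_true, pred_a, pred_b)
print(f"p-value: {p_value:.4f}")
```

A paper that reports only a difference in accuracy, without any test of this kind, leaves open whether the improvement could be noise.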
  15. EXAMPLE
    ✓ Relevant metrics? Accuracy
    ✓ Metrics appropriate for the problem? Common metric for classification
    X Metrics appropriate for the dataset? Not suitable for the given imbalanced classes
    ✓ Better than your baseline? Yes, by 0.25 over the baseline
    ? Better than the metrics target? They are close
    ? Any published review of the results? Not yet
    X Improvement analyzed with suitable statistical tests? No statistical analysis, and reported measurements are not comparable
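The example above marks accuracy as unsuitable for imbalanced classes. A minimal sketch of why, using made-up labels: a model that always predicts the majority class scores high accuracy, while balanced accuracy (the mean of per-class recall) exposes that it never finds the minority class. Balanced accuracy is just one alternative metric and is not named in the slides.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in indices) / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {acc:.2f}, balanced accuracy: {bal_acc:.2f}")
```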
  16. STEP 4: PRIORITIZE YOUR CHOSEN APPROACHES
    Axis labels: High Effort, High Impact
    • Quick Wins
    • Major Projects
    • Thankless Tasks
    • Fill-in Jobs
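The four quadrants can be turned into a rough sorting rule. The sketch below assumes the usual reading of an impact-effort matrix (Quick Wins are high impact and low effort, Major Projects high impact and high effort, Fill-in Jobs low impact and low effort, Thankless Tasks low impact and high effort). The 0 to 1 scores, the 0.5 threshold, the priority order, and the approach names are hypothetical.

```python
def quadrant(impact, effort, threshold=0.5):
    """Place an approach in an impact-effort quadrant (scores assumed in [0, 1])."""
    high_impact = impact >= threshold
    high_effort = effort >= threshold
    if high_impact and not high_effort:
        return "Quick Win"
    if high_impact and high_effort:
        return "Major Project"
    if not high_impact and not high_effort:
        return "Fill-in Job"
    return "Thankless Task"


# Assumed priority order; adjust to your own situation.
PRIORITY = ["Quick Win", "Major Project", "Fill-in Job", "Thankless Task"]


def prioritize(approaches):
    """Sort by quadrant priority, then by descending impact within a quadrant."""
    return sorted(
        approaches,
        key=lambda a: (PRIORITY.index(quadrant(a["impact"], a["effort"])), -a["impact"]),
    )


# Hypothetical candidate approaches with estimated scores.
candidates = [
    {"name": "custom architecture", "impact": 0.9, "effort": 0.9},
    {"name": "hand-tuned rules", "impact": 0.2, "effort": 0.8},
    {"name": "pretrained model", "impact": 0.8, "effort": 0.2},
    {"name": "extra feature", "impact": 0.3, "effort": 0.1},
]

ranked = [a["name"] for a in prioritize(candidates)]
print(ranked)
```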
  17. STEP 5: PROTOTYPE YOUR CHOSEN APPROACHES
    A few recommendations
    • Compile a glossary
    • Understand all equations & code
    • Higher-level language
    • Reference sections of papers
  18. STEP 5: PROTOTYPE YOUR CHOSEN APPROACHES
    A few more recommendations
    • Verify under same conditions
    • Compile the performance in a table
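One way to compile the performance in a table is to put the score each paper reports next to the score of your own prototype, measured under the same conditions. The sketch below is a minimal plain-Python version; the paper names and all numbers are hypothetical.

```python
def compare_results(reported, reproduced):
    """Pair reported and reproduced scores; delta is reproduced minus reported."""
    rows = []
    for name in sorted(reported):
        ours = reproduced.get(name)  # None if not prototyped yet
        delta = None if ours is None else round(ours - reported[name], 3)
        rows.append(
            {"approach": name, "reported": reported[name], "reproduced": ours, "delta": delta}
        )
    return rows


def format_table(rows):
    """Render the comparison rows as a fixed-width text table."""
    lines = [f"{'approach':<12}{'reported':>10}{'reproduced':>12}{'delta':>8}"]
    for row in rows:
        ours = "n/a" if row["reproduced"] is None else f"{row['reproduced']:.2f}"
        delta = "n/a" if row["delta"] is None else f"{row['delta']:+.2f}"
        lines.append(f"{row['approach']:<12}{row['reported']:>10.2f}{ours:>12}{delta:>8}")
    return "\n".join(lines)


# Hypothetical scores, for illustration only.
reported = {"paper_a": 0.91, "paper_b": 0.87}
reproduced = {"paper_a": 0.88}

rows = compare_results(reported, reproduced)
print(format_table(rows))
```

Keeping the not-yet-reproduced approaches in the table as "n/a" makes the remaining work visible.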
  19. MORE RECOMMENDATIONS
    http://codecapsule.com/2012/01/18/how-to-implement-a-paper/
  20. SUMMARY: A WORKFLOW FOR PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on your comparison criteria
    3. Evaluate quality, relevance and reproducibility
    4. Prioritize your chosen approaches
    5. Prototype the best approaches
    Slides will be tweeted from @ellen_koenig
  21. IMAGE CREDITS
    1. Title slide: https://www.flickr.com/photos/vblibrary/6671465981
    2. Slide 3: pixabay.com
    3. Slide 4: thenounproject.com
      • Search icon by Luis Prado
      • Scales icon by Veronica Karenina
      • Evaluation icon by Dinosoft Labs
      • Priorities icon by Arthur Shlain
      • Prototype icon by asianson design
  22. IMAGE CREDITS
    1. Slide 5: thenounproject.com
      • Map icon by Nikita Kozin
      • Network icon by Gregor Cresner
      • Problem solving icon and razor blade icon by Vector Market
      • Bank icon by Stock image photo
    2. Slide 6
      • Document icon by afredocreates.com/icons and flaticons.com for thenounproject.com
      • Bar chart icon: pixabay.com
      • Touch icon by Jasfart for thenounproject.com
      • Target icon by Libby Ventura for thenounproject.com
  23. IMAGE CREDITS
    1. Slide 7: thenounproject.com
      • Ground breaking icon by faisalovers
      • Trash icon by UNICORN
      • Newspaper icon by Aman
      • Log icon by Sercan Dogan
    2. Slide 7: Cat icon: pixabay.com
    3. Slide 17: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/
  24. IMAGE CREDITS
    1. Slide 18: thenounproject.com
      • Table icon by Yu luck
      • Pi icon by Sumana Chamrumsorakist
      • Anaconda icon by parkjisun
      • Documents icon by Creative Stall
    2. Slide 19: thenounproject.com
      • Magnifying glass icon by afredocreates.com/icons and flaticons.com
      • Table icon by Douglas Santos
    3. Slide 21: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg