PyData BLN 2018: Five things I learned while prototyping ML papers

ellenkoenig

July 07, 2018

Transcript

  1. FIVE THINGS I LEARNED WHILE
    PROTOTYPING ML PAPERS
    ELLEN KÖNIG / @ELLEN_KOENIG
    SENIOR DATA SCIENTIST
    NATIVE INSTRUMENTS

  2. A LONG, LONG
    TIME AGO…
    IN AN OFFICE
    FAR AWAY…

  3. HOW MANY OF YOU CAN
    RELATE TO OUR PROBLEM?

  4. BUT WORK IS ALL ABOUT
    GROWTH, RIGHT??

  5. WHEN SHOULD YOU LOOK FOR
    RESEARCH PAPERS?
    • "Somebody must have solved this before!"
    • No ready-to-use implementation

  6. A WORKFLOW FOR
    PROTOTYPING ML PAPERS
    1. Search for
    research findings
    2. Decide on
    comparison criteria
    3. Evaluate your
    papers
    4. Prioritize
    approaches
    5. Prototype
    approaches

  7. STEP 1: SEARCH FOR RESEARCH
    FINDINGS
    Needed:
    An overview of the field

  8. COMPILING AN OVERVIEW OF
    THE FIELD
    Compile
    Foundational and
    cutting edge papers
    Problems and
    approaches
    Start with survey
    papers,
    follow references

  9. STEP 2: DECIDE ON YOUR
    COMPARISON CRITERIA

  10. WHICH PAPERS ARE RIGHT FOR
    YOU?
    Summarize
    common metrics
    and baselines
    Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
    Pick simple
    metrics and
    baselines
    Minimally required
    metric targets?
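
The idea of a simple baseline plus a minimally required target can be sketched in a few lines of Python. This is only an illustration: the labels, the class names and the `minimum_target` value are made up for the example and do not come from the talk. The baseline here is the majority-class predictor, one of the simplest baselines for classification.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


# Hypothetical labels for a document classifier (not from the talk).
train = ["invoice"] * 6 + ["contract"] * 3 + ["letter"] * 1
test = ["invoice"] * 5 + ["contract"] * 4 + ["letter"] * 1

label, baseline_accuracy = majority_class_baseline(train, test)
minimum_target = 0.80  # assumed minimally required accuracy for the use case

# A paper is only worth prototyping if its result clears both bars.
bar_to_clear = max(baseline_accuracy, minimum_target)
print(f"baseline: always predict {label!r}, accuracy {baseline_accuracy:.2f}")
print(f"reported results should exceed {bar_to_clear:.2f}")
```

Writing both numbers down before reading the papers makes the later "better than targets and baseline?" check a simple comparison.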

  11. STEP 3: EVALUATE YOUR
    PAPERS
    Groundbreaking? Copycat? Garbage?
    Journal / conference
    quality?
    Team experience?

  12. STEP 3: EVALUATE YOUR
    PAPERS — A CHECKLIST
    3. Results
    2. Methodology
    1. Abstract & Introduction

  13. ABSTRACT & INTRODUCTION
    Addresses your problem?
    Similar context?
    Approach: Groundbreaking or
    improvement?
    Results: Better than
    targets & baseline?
    Main question: Relevant to your problem?
    3. Results
    2. Methodology
    ✔ Abstract & Introduction

  14. METHODOLOGY SECTION
    Main question: Approach reproducible?
    Data set size and content similar?
    ✓ 22k black-and-white pages
    ✓ German corpus
    ? Research documents rather than banking documents
    3. Results
    ✔ Methodology
    ✔ Abstract & Introduction

  15. METHODOLOGY SECTION
    Entire process described?
    ✓ Seems to be complete
    Pre-processing steps described completely?
    ✓ Image conversion and scaling are described
    ? OCR tool / approach is not mentioned
    Well-known methods? Or completely described methods?
    ✓ Neural network with descriptions of the configuration
    3. Results
    ✔ Methodology
    ✔ Abstract & Introduction

  16. RESULTS SECTION
    Main question: Results reliable?
    Relevant metrics for your use case?
    ✓ Accuracy
    Metrics appropriate for the problem?
    ✓ Common metric for classification
    Metrics appropriate for the dataset?
    X Not suitable for imbalanced classes
    ✔ Results
    ✔ Methodology
    ✔ Abstract & Introduction
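
Why accuracy is a poor fit for imbalanced classes can be shown with a small sketch. The data below is invented for illustration (95 negatives, 5 positives), and balanced accuracy, the mean of per-class recall, is one possible alternative metric chosen here, not something the talk prescribes.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in indices) / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless model that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {acc:.2f}, balanced accuracy: {bal:.2f}")
```

The always-negative model looks strong on accuracy while never finding a single positive example, which is the situation the checklist item warns about.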

  17. RESULTS SECTION
    Better than your baseline?
    ✓ Yes, by 0.23 over the baseline
    Better than the metrics target?
    ? They are close
    Any published review of the results?
    ? Not yet
    Improvement analyzed with suitable statistical tests?
    X No statistical analysis, and reported measurements are not comparable
    ✔ Results
    ✔ Methodology
    ✔ Abstract & Introduction
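
To make "suitable statistical tests" concrete, here is a minimal sketch of one common choice for comparing two classifiers on the same test set: an exact McNemar test. The choice of test and all predictions below are assumptions for illustration; the talk does not say which test the paper should have used. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on the examples where A and B disagree.

    Returns (b, c, p_value), where b counts cases only A gets right
    and c counts cases only B gets right.
    """
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under the null hypothesis, each disagreement favors A or B with p = 0.5.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical test set of 20 examples, all with true label 1.
y_true = [1] * 20
baseline = [1] * 12 + [0] * 8           # 12 of 20 correct
paper = [1] * 10 + [0] * 2 + [1] * 8    # 18 of 20 correct

b, c, p_value = mcnemar_exact(y_true, baseline, paper)
print(f"only baseline correct: {b}, only paper model correct: {c}")
print(f"p-value: {p_value:.3f}")
```

In this made-up case the paper model is ahead by 0.30 in accuracy, yet with only 20 examples the p-value stays above 0.05, so a gain reported without any test deserves some skepticism.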

  18. RESULTS SECTION
    • For a refresher on model evaluation, see: http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf
    • For a summary of statistical tests, see: http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg
    ✔ Results
    ✔ Methodology
    ✔ Abstract & Introduction

  19. STEP 4: PRIORITIZE YOUR
    CHOSEN APPROACHES

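  20. PRIORITIZATION MATRIX
    Axes: effort (low to high), impact (low to high)
    High impact, low effort: Quick Wins
    High impact, high effort: Major Projects
    Low impact, low effort: Fill-in Jobs
    Low impact, high effort: Thankless Tasks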
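
The four quadrants of the matrix can be expressed as a tiny helper. This is a sketch under assumptions: the 0 to 1 scoring scale, the 0.5 threshold, the paper names and the ranking order (quick wins first) are all hypothetical and would be replaced by your team's own estimates.

```python
def quadrant(impact, effort, threshold=0.5):
    """Map impact and effort scores in [0, 1] to a matrix quadrant."""
    high_impact = impact >= threshold
    high_effort = effort >= threshold
    if high_impact:
        return "Major Projects" if high_effort else "Quick Wins"
    return "Thankless Tasks" if high_effort else "Fill-in Jobs"


# Hypothetical team estimates: (impact, effort), both on a 0-1 scale.
approaches = {
    "Paper A": (0.8, 0.3),
    "Paper B": (0.9, 0.9),
    "Paper C": (0.2, 0.2),
    "Paper D": (0.3, 0.8),
}

# Assumed working order: start with quick wins, leave thankless tasks for last.
order = ["Quick Wins", "Major Projects", "Fill-in Jobs", "Thankless Tasks"]
ranked = sorted(approaches, key=lambda name: order.index(quadrant(*approaches[name])))

for name in ranked:
    print(f"{name}: {quadrant(*approaches[name])}")
```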

  21. STEP 5: PROTOTYPE YOUR
    CHOSEN APPROACHES

  22. A FEW RECOMMENDATIONS
    Compile a glossary
    Understand all
    equations & code
    Higher level
    language
    Reference
    sections of
    papers

  23. A FEW MORE
    RECOMMENDATIONS
    Verify under same conditions
    Compile the
    performance in a table
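
One lightweight way to compile the performance in a table is to keep the reported and the reproduced numbers side by side, together with the conditions they were measured under. The sketch below uses only the standard library; the column names and every value in `rows` are placeholders invented for the example, not results from the talk.

```python
def format_results_table(rows, columns):
    """Render a list of dicts as an aligned plain-text table."""
    cells = [[str(row[col]) for col in columns] for row in rows]
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(columns)
    ]

    def fmt(values):
        return " | ".join(v.ljust(w) for v, w in zip(values, widths))

    lines = [fmt(columns), "-+-".join("-" * w for w in widths)]
    lines += [fmt(line) for line in cells]
    return "\n".join(lines)


columns = ["approach", "dataset", "metric", "reported", "reproduced"]
# Placeholder entries: replace with the paper's numbers and your own runs.
rows = [
    {"approach": "baseline", "dataset": "paper test split",
     "metric": "accuracy", "reported": "n/a", "reproduced": 0.50},
    {"approach": "Paper A", "dataset": "paper test split",
     "metric": "accuracy", "reported": 0.73, "reproduced": 0.70},
]

table = format_results_table(rows, columns)
print(table)
```

Keeping the dataset and metric in the table makes it obvious when two numbers were not measured under the same conditions and therefore should not be compared.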

  24. MORE RECOMMENDATIONS
    http://codecapsule.com/2012/01/18/how-to-implement-a-paper/

  25. SUMMARY: WHEN SHOULD YOU
    LOOK FOR RESEARCH PAPERS?
    • "Somebody must have solved this before!"
    • No ready-to-use implementation

  26. SUMMARY: A WORKFLOW FOR
    PROTOTYPING ML PAPERS
    1. Search for research findings
    2. Decide on your comparison criteria
    3. Evaluate quality, relevance and reproducibility
    4. Prioritize your chosen approaches
    5. Prototype the best approaches

  27. HAVE (MORE) FUN PROTOTYPING!
    Slides will be tweeted from @ellen_koenig

  28. IMAGE CREDITS
    • Title slide: https://www.flickr.com/photos/vblibrary/6671465981
    • Slide 2: Google calendar & maps
    • Slide 4: https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
    • Slide 6: https://pixabay.com/de/bremer-stadtmusikanten-skulptur-2444326/

  29. IMAGE CREDITS
    • Slide 8 & 28: pixabay.com
    • Slide 9: thenounproject.com
    • Search icon by Luis Prado
    • Scales icon by Veronica Karenina
    • Evaluation icon by Dinosoft Labs
    • Priorities icon by Arthur Shlain
    • Prototype icon by asianson design

  30. IMAGE CREDITS
    • Slide 10
    • https://pixabay.com/en/book-address-book-learning-learn-1171564/
    • https://en.wikipedia.org/wiki/Map#/media/File:World_Map_1689.JPG
    • Slide 11: thenounproject.com
    • Network icon by Gregor Cresner
    • Problem solving icon and razor blade icon by Vector Market
    • Bank icon by Stock image photo

  31. IMAGE CREDITS
    • Slide 12: https://pxhere.com/en/photo/536212
    • Slide 13
    • Bar chart icon: pixabay.com
    • Touch icon by Jasfart for thenounproject.com
    • Target icon by Libby Ventura for thenounproject.com

  32. IMAGE CREDITS
    • Slide 14: thenounproject.com
    • Ground breaking icon by faisalovers
    • Trash icon by UNICORN
    • Newspaper icon by Aman
    • Slide 14: Cat icon: pixabay.com
    • Slide 22: https://pxhere.com/en/photo/109282
    • Slide 23: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/
    • Slide 24: https://pixnio.com/objects/computer/programming-code-programmer-coding-coffee-cup-computer-copy-hands-computer-keyboard

  33. IMAGE CREDITS
    1. Slide 25: thenounproject.com
    • Table icon by Yu luck
    • Pi icon by Sumana Chamrumsorakist
    • Anaconda icon by parkjisun
    • Documents icon by Creative Stall
    2. Slide 26: thenounproject.com
    • Magnifying glass icon by afredocreates.com/icons and flaticons.com
    • Table icon by Douglas Santos
    3. Slide 29: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg
