WWC Talk: Five things I learned from turning research papers into industry prototypes

FIVE THINGS I LEARNED FROM TURNING RESEARCH PAPERS INTO INDUSTRY PROTOTYPES
ELLEN KÖNIG / @ELLEN_KOENIG

MOTIVATION

WHEN SHOULD YOU LOOK FOR RESEARCH PAPERS?
• General problem
• No library exists

A WORKFLOW FOR PROTOTYPING ML PAPERS
1. Search for research findings
2. Decide on your comparison criteria
3. Evaluate quality, relevance and reproducibility
4. Prioritize your chosen approaches
5. Prototype the best approaches

STEP 1: SEARCH FOR RESEARCH FINDINGS
• Goal: Get an overview of the field with Google Scholar
• Start with survey papers, follow references
• Compile your findings:
  • Common problems & approaches
  • Foundational and cutting edge papers

STEP 2: DECIDE ON YOUR COMPARISON CRITERIA
• Goal: Decide what will be a good paper
• Summarize common metrics and baselines
• Pick a few simple metrics and baselines
• Decide which metric target is minimally required
• Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
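A minimal sketch of what "pick a simple baseline and a minimum target" can look like in practice. The majority-class baseline, the label names, and the 0.8 target are illustrative assumptions, not values from the talk.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return majority_label, hits / len(test_labels)


def meets_criteria(score, baseline, minimum_target):
    """A paper's reported score is interesting only if it beats the
    baseline AND reaches the minimally required target."""
    return score > baseline and score >= minimum_target


# Hypothetical document-classification labels (assumed for illustration).
train = ["invoice"] * 70 + ["letter"] * 20 + ["contract"] * 10
test = ["invoice"] * 7 + ["letter"] * 2 + ["contract"] * 1

label, baseline_acc = majority_class_baseline(train, test)
print(f"baseline: always predict {label!r}, accuracy {baseline_acc:.2f}")
print("0.85 good enough?", meets_criteria(0.85, baseline_acc, minimum_target=0.8))
print("0.75 good enough?", meets_criteria(0.75, baseline_acc, minimum_target=0.8))
```

Fixing the baseline and the target before reading papers keeps the comparison honest: a reported score is judged against numbers that were chosen in advance.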

STEP 3: EVALUATE YOUR PAPERS
• Goal: Identify three types of papers:
  ✓ Groundbreaking
  ? Copycat
  X Garbage
• Check journal / conference quality and team experience
• Keep a log of your evaluation results
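One possible way to keep the evaluation log is a small CSV file with one row per paper. The field names below are assumptions chosen to mirror the three verdicts and the three checklist questions of this talk; the example entry is a placeholder, not a real paper.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

VERDICTS = ("groundbreaking", "copycat", "garbage")


@dataclass
class PaperEvaluation:
    title: str
    venue: str
    relevant: bool          # abstract & introduction check
    reproducible: bool      # methodology check
    reliable_results: bool  # results check
    verdict: str
    notes: str = ""

    def __post_init__(self):
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")


def write_log(evaluations, stream):
    """Write all evaluations as CSV rows to an open text stream."""
    names = [f.name for f in fields(PaperEvaluation)]
    writer = csv.DictWriter(stream, fieldnames=names)
    writer.writeheader()
    for evaluation in evaluations:
        writer.writerow(asdict(evaluation))


# Placeholder entry; replace with the papers you actually reviewed.
log = [
    PaperEvaluation("Hypothetical paper A", "placeholder venue",
                    relevant=True, reproducible=True, reliable_results=False,
                    verdict="copycat", notes="no statistical tests"),
]
buffer = io.StringIO()  # use open("paper_log.csv", "w", newline="") for a file
write_log(log, buffer)
print(buffer.getvalue())
```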

STEP 3: EVALUATE YOUR PAPERS - A CHECKLIST
1. Abstract & Introduction
2. Methodology
3. Results

EVALUATING PAPERS CHECKLIST: ABSTRACT & INTRODUCTION
? Question: Is the approach relevant to your problem?
✓ Problem: Do they address your problem?
✓ Context: Is it similar to yours?
✓ Approach: Is it groundbreaking or an improvement?
✓ Results: Better than your targets & baseline?
✓ Age: How old is the paper?
[Checklist progress: ✔ Abstract & Introduction | 2. Methodology | 3. Results]

EXAMPLE
[Slide image with the labels: Problem, Context, Approach, Results]

EVALUATING PAPERS CHECKLIST: METHODOLOGY SECTION
? Question: Will you be able to reproduce the approach?
✓ Are the data set size and data types similar to yours?
✓ Do they describe the entire process they used?
✓ Are the pre-processing steps described completely?
✓ Do they use standard methods?
✓ If not, do they provide complete algorithmic descriptions of the methods?
[Checklist progress: ✔ Abstract & Introduction | ✔ Methodology | 3. Results]

EXAMPLE
✓ Are the data set size and content similar to yours?
  ✓ 22k black-and-white pages
  ✓ German corpus
  ? Research documents rather than banking documents
✓ Do they describe the entire process they used?
  ✓ Seems to be complete
✓ Are the pre-processing steps described completely?
  ✓ Image conversion and scaling are described
  ? OCR tool / approach is not mentioned
✓ Are the ML and statistical methods standard methods?
  ✓ Standard methods (neural networks) with descriptions of the configuration

EVALUATING PAPERS CHECKLIST: RESULTS SECTION
? Question: Are the results reliable?
✓ Are the metrics relevant to your problem?
✓ Are the metrics appropriate for the type of ML problem?
✓ Are the metrics appropriate for the dataset (imbalanced classes, outliers, …)?
[Checklist progress: ✔ Abstract & Introduction | ✔ Methodology | ✔ Results]

EVALUATING PAPERS CHECKLIST: RESULTS SECTION
? Question continued: Are the results reliable?
✓ Are the results better than your baseline?
✓ Are the results better than your metric targets?
✓ Was any critique or review of the results published?
✓ Are improvements over existing methods analyzed with proper statistical tests?
[Checklist progress: ✔ Abstract & Introduction | ✔ Methodology | ✔ Results]
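To make "proper statistical tests" concrete, here is a sketch of one common choice for comparing two classifiers on the same test set: the exact (binomial) McNemar test. The slides do not name a specific test, so this choice and the toy predictions are assumptions for illustration.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Only the discordant pairs matter:
      b = examples model A gets right and model B gets wrong
      c = examples model A gets wrong and model B gets right
    Under the null hypothesis (both models equally good), each
    discordant pair is a fair coin flip, so min(b, c) follows
    Binomial(b + c, 0.5).
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)
    c = sum(1 for t, a, p in triples if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # the models never disagree on correctness
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy data (assumed): 20 examples, all with true label 1.
y_true = [1] * 20
pred_a = [1] * 8 + [0] * 1 + [1] * 11   # model A is wrong once
pred_b = [0] * 8 + [1] * 12             # model B is wrong eight times

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, p={p_value:.4f}")
```

A paper that reports only two accuracy numbers, without a test of this kind or at least confidence intervals, leaves open whether the improvement is more than noise.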

EVALUATING PAPERS CHECKLIST: RESULTS SECTION
• For a refresher on model evaluation, see: http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf
• For a summary of statistical tests, see: http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg
[Checklist progress: ✔ Abstract & Introduction | ✔ Methodology | ✔ Results]

EXAMPLE
✓ Are the metrics relevant to your problem? Accuracy
✓ Are the metrics appropriate for the type of ML problem? Accuracy is a common metric for classification
X Are the metrics appropriate for the dataset? Accuracy is not as suitable for imbalanced classes, and the labels are reported as "uneven"
✓ Are the results better than your baseline? Yes, by 0.25 over the baseline
? Are the results suitable for the business problem? They are close
? Was any critique or review of the results published? Not yet
X Are improvements over existing methods analyzed with proper statistical tests? No statistical analysis, and reported measurements are not comparable
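The point about accuracy and imbalanced classes can be illustrated with a small sketch. The 90/10 class split and the label names are made up for the illustration; balanced accuracy (the mean of per-class recall) is one of several alternatives and is not prescribed by the talk.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Assumed imbalanced data set: 90 "common" and 10 "rare" examples.
y_true = ["common"] * 90 + ["rare"] * 10
# A useless model that always predicts the frequent class.
y_pred = ["common"] * 100

print(f"accuracy:          {accuracy(y_true, y_pred):.2f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")
```

The always-"common" model looks strong on plain accuracy while never finding a single rare example, which is why a paper that reports only accuracy on uneven labels deserves the X above.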

STEP 4: PRIORITIZE YOUR CHOSEN APPROACHES
Impact-effort matrix:
              Low Effort      High Effort
High Impact   Quick Wins      Major Projects
Low Impact    Fill-in Jobs    Thankless Tasks
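The matrix can be applied mechanically once each candidate approach has a rough impact and effort estimate. The sketch below assumes estimates on a 0 to 1 scale, a threshold of 0.5, and a priority order of the quadrants; all three are illustrative assumptions, as are the approach names.

```python
from dataclasses import dataclass


@dataclass
class Approach:
    name: str      # hypothetical label for a candidate paper / approach
    impact: float  # your own estimate, 0.0 (none) to 1.0 (high)
    effort: float  # your own estimate, 0.0 (trivial) to 1.0 (huge)


def quadrant(approach, threshold=0.5):
    """Map an approach to its quadrant of the impact-effort matrix."""
    high_impact = approach.impact >= threshold
    high_effort = approach.effort >= threshold
    if high_impact:
        return "Major Projects" if high_effort else "Quick Wins"
    return "Thankless Tasks" if high_effort else "Fill-in Jobs"


# Assumed order in which to work through the quadrants.
PRIORITY = {"Quick Wins": 0, "Major Projects": 1,
            "Fill-in Jobs": 2, "Thankless Tasks": 3}


def prioritize(approaches):
    """Sort by quadrant, then by higher impact, then by lower effort."""
    return sorted(approaches,
                  key=lambda a: (PRIORITY[quadrant(a)], -a.impact, a.effort))


candidates = [
    Approach("approach A", impact=0.9, effort=0.8),
    Approach("approach B", impact=0.8, effort=0.2),
    Approach("approach C", impact=0.2, effort=0.9),
    Approach("approach D", impact=0.3, effort=0.1),
]
ranked = [(a.name, quadrant(a)) for a in prioritize(candidates)]
for name, quad in ranked:
    print(f"{name}: {quad}")
```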

STEP 5: PROTOTYPE YOUR CHOSEN APPROACHES
A few recommendations
• Compile a glossary of all unfamiliar terms and methods
• Make sure you understand all equations & code
• Prototype first in a higher level language (Python, R, Octave, Julia, …)
• Reference papers and sections of papers in your code documentation

STEP 5: PROTOTYPE YOUR CHOSEN APPROACHES
A few more recommendations
• Verify the results in the paper under the same conditions before adapting
• Compile the performance of each approach in a table
• More recommendations: http://codecapsule.com/2012/01/18/how-to-implement-a-paper/
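A small sketch of the last two recommendations combined: a plain-text table that lists each approach's reported and reproduced score and flags whether the reproduction matched. The approach names, scores, and the 0.02 tolerance are invented for illustration.

```python
def format_results_table(rows, tolerance=0.02):
    """Render (name, reported, reproduced) rows as a plain-text table.

    `reported` is the score claimed in the paper (None for your own
    baseline); `reproduced` is what your prototype achieved under the
    same conditions.
    """
    header = f"{'approach':<26}{'reported':>10}{'reproduced':>12}{'status':>14}"
    lines = [header, "-" * len(header)]
    for name, reported, reproduced in rows:
        if reported is None:
            shown, status = "n/a", "baseline"
        else:
            shown = f"{reported:.2f}"
            matches = abs(reproduced - reported) <= tolerance
            status = "reproduced" if matches else "mismatch"
        lines.append(f"{name:<26}{shown:>10}{reproduced:>12.2f}{status:>14}")
    return "\n".join(lines)


# Hypothetical results; replace with your own measurements.
rows = [
    ("majority-class baseline", None, 0.70),
    ("hypothetical approach A", 0.95, 0.94),
    ("hypothetical approach B", 0.90, 0.80),
]
table = format_results_table(rows)
print(table)
```

Only once a row is marked as reproduced does it make sense to start adapting that approach to your own data.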

SHORT ADVERTISEMENT FOR PEOPLE BASED IN BERLIN

BI-WEEKLY OPENTECHSCHOOL DATA SCIENCE CO-LEARNING
• We are looking for volunteer coaches and learners!
• Kick-off: Thursday, 19 at 7:30 PM in Kreuzberg
• https://www.meetup.com/opentechschool-berlin/events/249735100/
• OpenTechSchool is a non-profit, volunteer-run tech education community

END OF ADVERTISEMENT BLOCK

SUMMARY: A WORKFLOW FOR PROTOTYPING ML PAPERS
1. Search for research findings
2. Decide on your comparison criteria
3. Evaluate quality, relevance and reproducibility
4. Prioritize your chosen approaches
5. Prototype the best approaches
Slides will be tweeted from @ellen_koenig

IMAGE CREDITS
1. Title slide: https://www.flickr.com/photos/vblibrary/6671465981
2. Slide 3: pixabay.com
3. Slides 4 & 22: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg
4. Slides 5-7: pixabay.com
5. Slide 16: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/
