Slide 1

Slide 1 text

ProRail’s Story on getting Machine Learning Ready! Job Wegman (ProRail) Clint de Keizer (Vantage AI)

Slide 2

Slide 2 text

A LITTLE INTRODUCTION
Clint de Keizer, Machine Learning Engineer, [email protected]
Job Wegman, Product Owner, [email protected]

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

How many times can you circle around the earth with our 1.1 Billion images of the railways? 10cm x 15cm

Slide 5

Slide 5 text

How many times can you circle around the earth with our 1.1 Billion images of the railways? 4.4 times

Slide 6

Slide 6 text

WHY DO WE USE MACHINE LEARNING? RELIABLE TRAIN TRAVEL SAFER TRAIN TRAVEL SUSTAINABLE MAINTENANCE EFFICIENT MAINTENANCE

Slide 7

Slide 7 text

WHY DO WE USE MACHINE LEARNING? Functionality Location Placing dates

Slide 8

Slide 8 text

HOW DO WE USE MACHINE LEARNING?

Slide 9

Slide 9 text

HOW DO WE USE MACHINE LEARNING? VIDEO INSPECTION TRAIN CONFIGURATION TEAM (6) SLEEPER: WOOD = DAMAGED TASKFORCE SEGMENTATION CLASSIFICATION CONDITION TEAM (4)

Slide 10

Slide 10 text

HOW DO WE USE MACHINE LEARNING?
[Figure: camera 0 through camera 9]

Slide 11

Slide 11 text

GEO-CODE: 117
MILEAGE: 6,239508
RD X: 139548,196
RD Y: 450651,831
TIME: 21-03-2019 13:42
CROSSING
CURRENT BOX: RD X: 139525,796, RD Y: 450594,283, STATUS: BROKEN
JOINT: RD X: 139597,648, RD Y: 450571,347, DEGRADATION: 25 %
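The RD X and RD Y values above are Rijksdriehoek coordinates, which are expressed in metres and written here with a Dutch decimal comma. As a minimal sketch of how such metadata could be used, the snippet below parses two coordinate pairs from the slide and computes the straight-line distance between them. The function names are illustrative assumptions and are not taken from ProRail's tooling.

```python
import math


def parse_dutch_number(text: str) -> float:
    """Convert a number written with a decimal comma (e.g. '139548,196') to float."""
    return float(text.replace(",", "."))


def rd_distance_m(x1: str, y1: str, x2: str, y2: str) -> float:
    """Euclidean distance in metres between two RD (Rijksdriehoek) points."""
    dx = parse_dutch_number(x2) - parse_dutch_number(x1)
    dy = parse_dutch_number(y2) - parse_dutch_number(y1)
    return math.hypot(dx, dy)


# First coordinate pair on the slide (recording position) and the
# second coordinate pair (a detected object), copied verbatim.
distance = rd_distance_m("139548,196", "450651,831", "139525,796", "450594,283")
print(f"{distance:.1f} m")
```

Because RD is a projected coordinate system, a plain Euclidean distance is a reasonable approximation over the short ranges involved here.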

Slide 12

Slide 12 text

HOW DO WE USE MACHINE LEARNING? Sleepers Joints No Joint Joint TYPE: NS90

Slide 13

Slide 13 text

CHALLENGES
The challenges we face from PoCs to production within ProRail’s business processes
1. Loss of knowledge
2. Dependency on suppliers and other teams
3. Translating predictions to useful input

Slide 14

Slide 14 text

CHALLENGE #1: LOSS OF KNOWLEDGE
From an ML perspective
• The code
From a business perspective
• Domain specific knowledge

Slide 15

Slide 15 text

Loss of Knowledge: The Code
1. Unorganized
2. No software engineering practices: difficult to debug, harder to test code and write clean code
3. Hard to work with Git
4. No automatic processes

Slide 16

Slide 16 text

Lessons Learned: Getting more organized! The Code of Honor

Slide 17

Slide 17 text

The Code of Honor’s 4 commandments
1. Thou shalt write clean code --> formatting, linting
2. Thou shalt make thy work reproducible --> venv, dependencies in order
3. Thou shalt test thy code --> unit testing, code coverage
4. Thou shalt use Git to automate checks of the commandments --> pre-commit and pipelines

Slide 18

Slide 18 text

HOW DO WE ACHIEVE THIS?
CLEAN CODE: FLAKE8
REPRODUCIBLE WORK: 1. requirements.txt 2. Virtual environments 3. Version packages
TEST WORK: 1. unittest and pytest 2. Code coverage ≥ 80%
PRE-COMMIT (LOCAL) --> PUSH --> DEVOPS (CLOUD)
DEVOPS (CLOUD): 1. Pipelines doing a final check that’s also done in pre-commit 2. Checks from colleagues
Outcome: IMPROVEMENTS NEEDED or MERGE TO REPO
VANTAGE 101
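To make the "test work" step concrete, here is a minimal sketch of a small function together with pytest-style tests. The helper `parse_degradation` is hypothetical and only loosely inspired by the "DEGRADATION: 25 %" field shown earlier in the deck. It is not code from the project.

```python
def parse_degradation(text: str) -> float:
    """Turn a percentage string such as '25 %' into a fraction (0.25)."""
    value = float(text.replace("%", "").replace(",", ".").strip())
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"degradation out of range: {text!r}")
    return value / 100.0


# pytest collects functions whose names start with `test_`.
def test_parses_percentage_with_space():
    assert parse_degradation("25 %") == 0.25


def test_parses_decimal_comma():
    assert parse_degradation("12,5 %") == 0.125


def test_rejects_out_of_range_value():
    try:
        parse_degradation("140 %")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an out-of-range percentage")
```

One way to enforce the 80% threshold would be the pytest-cov plugin's `--cov-fail-under=80` option. Whether the team uses that plugin is not stated in the slides.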

Slide 19

Slide 19 text

Loss of Knowledge: Domain Knowledge Rail knowledge GIS IT infrastructure

Slide 20

Slide 20 text

LESSONS LEARNED: LOSS OF KNOWLEDGE
1. Let people take ownership! For every Confluence page, make sure someone owns that knowledge. When colleagues change, a successor can easily take over the ownership because it has been documented.
2. Create a stakeholder map. Let people know where to go for which knowledge.
3. Make tools to make your life easier.

Slide 21

Slide 21 text

CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS Data Dependency Organisational Dependency

Slide 22

Slide 22 text

DATA QUALITY: IMAGES

Slide 23

Slide 23 text

DATA QUALITY: METADATA

Slide 24

Slide 24 text

DATA DRIFT
Data drift is defined as a variation in the production data relative to the data that was used to test and validate the model before deploying it in production.
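One common way to quantify such a variation is to compare the distribution of a simple feature between validation data and production data. The sketch below computes the two-sample Kolmogorov-Smirnov statistic in plain Python. The choice of feature (mean image brightness), the sample values, and the alert threshold are all assumptions made for illustration, not something taken from the talk.

```python
from bisect import bisect_right


def ks_statistic(reference: list[float], production: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    prod = sorted(production)
    max_gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for value in ref + prod:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_prod = bisect_right(prod, value) / len(prod)
        max_gap = max(max_gap, abs(cdf_ref - cdf_prod))
    return max_gap


# Hypothetical mean brightness per image (0 = black, 1 = white).
validation_brightness = [0.42, 0.45, 0.47, 0.50, 0.52, 0.55]
production_brightness = [0.70, 0.72, 0.75, 0.78, 0.80, 0.83]

DRIFT_THRESHOLD = 0.5  # illustrative value, to be tuned per use case

drift_score = ks_statistic(validation_brightness, production_brightness)
if drift_score > DRIFT_THRESHOLD:
    print(f"Possible data drift: KS statistic = {drift_score:.2f}")
```

A score of 0 means the two samples have identical empirical distributions. A score of 1 means they do not overlap at all.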

Slide 25

Slide 25 text

DATA DRIFT Seasonal change JOINT New Cameras

Slide 26

Slide 26 text

LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
Make good agreements! Think of the long term:
1. EXPLAINABLE AI
2. DATA-DRIVEN APPROACH
WARNING! Data quality, data drift, supplier
LABEL: Positive (there is a joint in this image). But where does my model look to decide this?

Slide 27

Slide 27 text

CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS Supplier Data team SAP team GIS team

Slide 28

Slide 28 text

LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
1. Plan ahead and manage relations
2. Explain your processes to your stakeholders
3. Communicate frequently with teams you and other stakeholders depend on
4. Compromise between business needs and model needs

Slide 29

Slide 29 text

CHALLENGE #3: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
Who is our user?
• Precise
• Experienced
• Specific rail knowledge
• Not a data scientist

Slide 30

Slide 30 text

LESSONS LEARNED: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
1. Spell everything out. We are not dealing with data scientists, so you have to show what is possible.
2. Visualize as much as possible.
3. Think from the business needs and not from your outputs.
   1. Use the right metrics to evaluate your model’s success.
   2. Know how the user will process your results and tailor what you deliver to that.
4. Connect as many sources as possible to your outcomes. This gives a complete overview of the checks to be done.
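As an illustration of choosing metrics that match the business need, the sketch below computes precision and recall for a binary "joint / no joint" classifier. The framing is an assumption: for an inspector, recall corresponds to "how many real joints did we find" and precision to "how many of the flagged images are worth checking". The labels are made-up example data.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 = joint, 0 = no joint."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up example: 3 images with a joint, 5 without.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"Of the flagged images, {precision:.0%} contain a joint.")
print(f"Of the real joints, {recall:.0%} were found.")
```

Phrasing the result as sentences like these, rather than as raw scores, is one way to present model quality to users who are not data scientists.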

Slide 31

Slide 31 text

MAIN TAKEAWAYS
1. Make agreements about your processes and organize them. Let people take ownership of these processes and make sure they are checked (digitally and by colleagues).
2. Make agreements for the long term! Don’t think in quick fixes; do what is best for the data science processes, since that will be best for the business as well.
3. Put yourself in the shoes of the user(s)! You can have the best-performing model, but it won’t be used if nobody understands your output.

Slide 32

Slide 32 text

QUESTIONS? Visit our stand #53

Slide 33

Slide 33 text

Contact
VANTAGE AI: Coltbaan 4C, 3439 NG, NIEUWEGEIN. Email: [email protected]
PRORAIL: Moreelsepark 3, 3511 EP, UTRECHT. Email: [email protected]