ProRail’s Story on getting Machine Learning Ready!

Job Wegman (ProRail)
Clint de Keizer (Vantage AI)

A LITTLE INTRODUCTION
Clint de Keizer, Machine Learning Engineer
Job Wegman, Product Owner

How many times can you circle around the earth with our 1.1 billion images of the railways (10 cm x 15 cm each)?
4.4 times.

WHY DO WE USE MACHINE LEARNING?
Reliable train travel, safer train travel, sustainable maintenance, efficient maintenance

WHY DO WE USE MACHINE LEARNING?
Functionality, location, placing dates

HOW DO WE USE MACHINE LEARNING?

[Figure: video inspection train; segmentation; classification, e.g. SLEEPER: WOOD = DAMAGED; taskforce: configuration team (6), condition team (4)]

[Figure: camera layout, camera 0 to camera 9]

[Figure: example inspection image with metadata and detections]
GEO-CODE: 117; MILEAGE: 6,239508; RD X: 139548,196; RD Y: 450651,831; TIME: 21-03-2019 13:42
Crossing current box: RD X: 139525,796; RD Y: 450594,283; STATUS: BROKEN
Joint: RD X: 139597,648; RD Y: 450571,347; DEGRADATION: 25 %
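The metadata on this slide maps naturally onto a small record type. The sketch below is a minimal illustration under assumptions: `InspectionImage`, `Detection` and all field names are hypothetical (not ProRail's schema), decimal commas are written as points in Python, RD coordinates are taken to be metres in the Dutch RD system, and the unit of the mileage is left open.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class Detection:
    """One detected asset in an inspection image (hypothetical schema)."""

    asset: str                               # e.g. "joint", "crossing current box"
    rd_x: float                              # RD x-coordinate (metres)
    rd_y: float                              # RD y-coordinate (metres)
    status: Optional[str] = None             # e.g. "broken"
    degradation_pct: Optional[float] = None  # e.g. 25 for 25 %

    def summary(self) -> str:
        """Render the detection as one line an inspector can read."""
        parts = [f"{self.asset} at RD ({self.rd_x:.3f}, {self.rd_y:.3f})"]
        if self.status is not None:
            parts.append(f"status: {self.status}")
        if self.degradation_pct is not None:
            parts.append(f"degradation: {self.degradation_pct:.0f} %")
        return ", ".join(parts)


@dataclass(frozen=True)
class InspectionImage:
    """Metadata of one camera image plus its detections (hypothetical schema)."""

    geo_code: int
    mileage: float  # unit as used on the slide; not assumed here
    rd_x: float
    rd_y: float
    taken_at: datetime
    detections: Tuple[Detection, ...] = ()


# The example from the slide, with decimal commas converted to points.
image = InspectionImage(
    geo_code=117,
    mileage=6.239508,
    rd_x=139548.196,
    rd_y=450651.831,
    taken_at=datetime(2019, 3, 21, 13, 42),
    detections=(
        Detection("crossing current box", 139525.796, 450594.283, status="broken"),
        Detection("joint", 139597.648, 450571.347, degradation_pct=25),
    ),
)

for detection in image.detections:
    print(detection.summary())
```

Keeping the record immutable (`frozen=True`) means a detection cannot be silently edited after the model produced it.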

[Figure: detected sleepers and joints; joint classification "No Joint" / "Joint", TYPE: NS90]

CHALLENGES
The challenges we face in moving from PoCs to production within ProRail’s business processes:
1. Loss of knowledge
2. Dependency on suppliers and other teams
3. Translating predictions to useful input

CHALLENGE #1: LOSS OF KNOWLEDGE
From an ML perspective:
• The code
From a business perspective:
• Domain-specific knowledge

Loss of Knowledge: The Code
1. Unorganized
2. No software engineering practices: code that is difficult to debug, hard to test and hard to keep clean
3. Hard to work with Git
4. No automated processes

Lessons Learned: Getting more organized! The Code of Honor

The Code of Honor’s 4 commandments:
1. Thou shalt write clean code --> formatting, linting
2. Thou shalt make thy work reproducible --> virtual environments (venv), dependencies in order
3. Thou shalt test thy code --> unit testing, code coverage
4. Thou shalt use Git to automate checks of the commandments --> pre-commit hooks and pipelines
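For the second commandment, one cheap check is whether the active virtual environment actually matches the pinned versions in requirements.txt. The sketch below assumes exact `name==version` pins and uses only the standard library (`importlib.metadata`); `parse_pins` and `find_mismatches` are hypothetical helper names, and the parsing is deliberately simplified.

```python
from importlib import metadata
from typing import Dict, List


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Collect exact `name==version` pins from requirements.txt content.

    Simplified on purpose: comments and environment markers are stripped,
    extras are dropped from the name, and lines without `==` are skipped.
    """
    pins: Dict[str, str] = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower()
        pins[name] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> List[str]:
    """Compare pinned versions with the packages installed in the active environment."""
    problems: List[str] = []
    for name, wanted in sorted(pins.items()):
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (pinned {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, pinned {wanted}")
    return problems
```

In a pre-commit hook or pipeline step, a non-empty result of `find_mismatches(parse_pins(text))`, where `text` is the content of requirements.txt, could be used to fail the check.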

HOW DO WE ACHIEVE THIS?
Clean code: flake8
Reproducible work: 1. requirements.txt 2. virtual environments 3. versioned packages
Tested work: 1. unittest and pytest 2. code coverage ≥ 80%
Workflow: pre-commit (local) --> push --> DevOps (cloud) --> merge to repo; if improvements are needed, back to local work before pushing again.
DevOps checks: 1. pipelines doing a final check that repeats the pre-commit checks 2. checks by colleagues
VANTAGE 101
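To make the test step concrete, here is a minimal `unittest` test case, assuming a hypothetical helper `parse_decimal_comma` that turns metadata values such as "139548,196" into floats. pytest also collects `unittest.TestCase` classes, so the same file runs under both `python -m pytest` and `python -m unittest`.

```python
import unittest


def parse_decimal_comma(value: str) -> float:
    """Parse a number written with a decimal comma, e.g. "139548,196".

    Hypothetical helper; assumes no thousands separators are present.
    """
    cleaned = value.strip()
    if cleaned.count(",") > 1:
        raise ValueError(f"ambiguous number: {value!r}")
    return float(cleaned.replace(",", "."))


class TestParseDecimalComma(unittest.TestCase):
    def test_decimal_comma(self):
        self.assertAlmostEqual(parse_decimal_comma("139548,196"), 139548.196)

    def test_integer(self):
        self.assertEqual(parse_decimal_comma("117"), 117.0)

    def test_surrounding_whitespace(self):
        self.assertAlmostEqual(parse_decimal_comma(" 6,239508 "), 6.239508)

    def test_rejects_two_commas(self):
        with self.assertRaises(ValueError):
            parse_decimal_comma("1,234,5")
```

With coverage.py, `coverage run -m pytest` followed by `coverage report --fail-under=80` makes the run fail when total coverage drops below 80%, a check that pre-commit or a pipeline can enforce.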

Loss of Knowledge: Domain Knowledge
Rail knowledge, GIS, IT infrastructure

LESSONS LEARNED: LOSS OF KNOWLEDGE
1. Let people take ownership! Make sure someone owns the knowledge on every Confluence page. If people change roles, colleagues can easily take over ownership because it is documented.
2. Create a stakeholder map, so people know whom to go to for which knowledge.
3. Build tools that make your life easier.

CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
• Data dependency
• Organisational dependency

DATA QUALITY: IMAGES

DATA QUALITY: METADATA

DATA DRIFT
Data drift is a shift in the production data away from the data that was used to test and validate the model before it was deployed to production.
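One common way to put a number on data drift is the population stability index (PSI), which compares how a feature is distributed in the validation data and in production. The sketch below is one possible approach, not necessarily the one used at ProRail; the function names and the feature being monitored (for instance mean image brightness, which seasonal change or new cameras could shift) are assumptions.

```python
import bisect
import math
from typing import List, Sequence


def _quantile_edges(reference: Sequence[float], n_bins: int) -> List[float]:
    """Inner bin edges placed at quantiles of the reference (validation) data."""
    ordered = sorted(reference)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def _bin_shares(values: Sequence[float], edges: List[float], eps: float) -> List[float]:
    """Share of values per bin; empty bins get `eps` so the log stays finite."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect.bisect_right(edges, value)] += 1
    return [max(count / len(values), eps) for count in counts]


def population_stability_index(
    reference: Sequence[float],
    production: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-4,
) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    if not reference or not production:
        raise ValueError("both samples must be non-empty")
    edges = _quantile_edges(reference, n_bins)
    expected = _bin_shares(reference, edges, eps)
    actual = _bin_shares(production, edges, eps)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

A frequently quoted rule of thumb reads a PSI below 0.1 as stable and above 0.25 as a significant shift; computing it per camera or per season would help tell the drift sources on the next slide apart, but thresholds should be tuned per feature.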

Causes of data drift: seasonal change, new cameras [Figure: example images of a joint]

LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
Make good agreements! Think long-term:
1. Explainable AI
2. Data-driven approach
WARNING! Data quality, data drift, supplier
[Figure: LABEL: Positive (there is a joint in this image). But where does my model look to decide this?]
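The question on the slide, where the model looks, can be approached with occlusion sensitivity: cover one patch of the image at a time and measure how much the model's score drops. The sketch below is one simple option among several explainability methods; it uses plain Python lists as images and a hypothetical toy scoring function in place of a trained joint classifier. With a real model, `score` would return its predicted probability for "joint".

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(
    image: Image,
    score: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> List[List[float]]:
    """Score drop when each patch x patch window is covered with `fill`.

    Large values mark regions the model relies on for its prediction.
    """
    height, width = len(image), len(image[0])
    baseline = score(image)
    heat = []
    for top in range(0, height, patch):
        row = []
        for left in range(0, width, patch):
            occluded = [list(r) for r in image]  # copy, keep the original intact
            for y in range(top, min(top + patch, height)):
                for x in range(left, min(left + patch, width)):
                    occluded[y][x] = fill
            row.append(baseline - score(occluded))
        heat.append(row)
    return heat


def toy_joint_score(img: Image) -> float:
    """Hypothetical stand-in for a classifier that only 'looks' at columns 2 and 3."""
    return sum(row[2] + row[3] for row in img) / (2 * len(img))


image = [[1.0] * 6 for _ in range(4)]
heat = occlusion_map(image, toy_joint_score, patch=2)
print(heat)  # only the middle column of patches changes the score
```

If the high values in such a map fall on the rail joint itself rather than on, say, the ballast or the image border, that is some evidence the model learned the right thing.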

CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
Supplier, data team, SAP team, GIS team

LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
1. Plan ahead and manage relations.
2. Explain your processes to your stakeholders.
3. Communicate frequently with the teams that you and other stakeholders depend on.
4. Find a compromise between business needs and model needs.

CHALLENGE #3: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
Who is our user? Precise, experienced, with specific rail knowledge, and not a data scientist.

LESSONS LEARNED: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
1. Spell everything out. Our users are not data scientists, so you have to show what is possible.
2. Visualize as much as possible.
3. Think from the business needs, not from your outputs.
   1. Use the right metrics to evaluate your model’s success.
   2. Know how the user will process your results, and shape your output accordingly.
4. Connect as many sources as possible to your outcomes. This gives a complete overview of the checks to be done.
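Picking the right metric matters because defects are rare. The sketch below uses hypothetical labels (1 = broken joint, 0 = joint is fine) and a hypothetical helper name to show how a model that never flags anything can still score high on accuracy while missing every broken joint, which recall exposes immediately.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for a binary label (1 = defect, 0 = fine)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defects found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defects missed
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# 100 joints, 5 of them broken; a "model" that always predicts "fine".
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(detection_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0}
```

Which trade-off between precision (fewer false alarms for inspectors) and recall (fewer missed defects) is acceptable is a business decision, not only a modelling one.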

MAIN TAKEAWAYS
1. Make agreements about your processes and organize them, let people take ownership of these processes, and make sure they are checked (digitally and by colleagues).
2. Make agreements for the long term! Don’t think in quick fixes; do what is best for the data science processes, since that will also be best for the business.
3. Put yourself in the shoes of the user(s)! You can have the best-performing model, but it won’t be used if nobody understands its output.

QUESTIONS? Visit our stand #53

Contact
Vantage AI: Coltbaan 4C, 3439 NG Nieuwegein
ProRail: Moreelsepark 3, 3511 EP Utrecht
