How many times can you circle the Earth with our 1.1 billion images of the railways?
Image size: 10 cm × 15 cm
Slide 5
How many times can you circle the Earth with our 1.1 billion images of the railways?
4.4 times
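The estimate on the slide is simple arithmetic: the total length of the images laid end to end, divided by the Earth's circumference. The sketch below assumes the images are placed along their 15 cm side and uses an equatorial circumference of about 40,075 km; neither assumption is stated on the slide.

```python
# Sketch of the "circle the Earth" estimate.
# Assumptions (not on the slide): images are laid end to end along their
# 15 cm side, and the Earth's equatorial circumference is ~40,075 km.
IMAGE_COUNT = 1.1e9               # rounded figure from the slide
IMAGE_LENGTH_M = 0.15             # long side of a 10 cm x 15 cm image
EARTH_CIRCUMFERENCE_M = 40_075_000


def laps_around_earth(count: float, length_m: float) -> float:
    """How many times a line of `count` images, each `length_m` metres long, circles the Earth."""
    return count * length_m / EARTH_CIRCUMFERENCE_M


laps = laps_around_earth(IMAGE_COUNT, IMAGE_LENGTH_M)
print(f"{laps:.1f} laps")
```

With these rounded inputs the result is about 4.1 laps. Reaching 4.4 laps would take roughly 1.18 billion images, so the slide's figure presumably rests on an unrounded image count that is not given here.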
Slide 6
WHY DO WE USE MACHINE LEARNING?
RELIABLE TRAIN TRAVEL
SAFER TRAIN TRAVEL
SUSTAINABLE MAINTENANCE
EFFICIENT MAINTENANCE
Slide 7
WHY DO WE USE MACHINE LEARNING?
Functionality
Location
Placing dates
Slide 8
HOW DO WE USE MACHINE LEARNING?
Slide 9
HOW DO WE USE MACHINE LEARNING?
VIDEO INSPECTION TRAIN
CONFIGURATION
TEAM (6)
SLEEPER: WOOD = DAMAGED
TASKFORCE
SEGMENTATION
CLASSIFICATION
CONDITION
TEAM (4)
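The slide names three model stages (segmentation, classification and condition) with an example outcome of "SLEEPER: WOOD = DAMAGED". Below is a minimal sketch of how such stages might be chained, assuming each stage runs once per object found by segmentation. The `Detection` record, the bounding-box format and the stage functions are hypothetical stand-ins, not ProRail's actual code; real models would replace the lambdas.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical box format: (x, y, width, height) in pixels.
BBox = Tuple[int, int, int, int]


@dataclass
class Detection:
    object_type: str    # set by segmentation, e.g. "SLEEPER"
    bbox: BBox
    material: str = ""  # set by classification, e.g. "WOOD"
    condition: str = ""  # set by the condition model, e.g. "DAMAGED"


def run_pipeline(
    image: object,
    segment: Callable[[object], List[Detection]],
    classify: Callable[[object, Detection], str],
    assess: Callable[[object, Detection], str],
) -> List[Detection]:
    """Find objects, classify each one, then rate its condition."""
    detections = segment(image)
    for det in detections:
        det.material = classify(image, det)
        det.condition = assess(image, det)
    return detections


# Usage with stub "models" standing in for the real networks (hypothetical).
result = run_pipeline(
    image=None,
    segment=lambda img: [Detection("SLEEPER", (0, 0, 260, 30))],
    classify=lambda img, det: "WOOD",
    assess=lambda img, det: "DAMAGED",
)
print(result[0])
```

Passing the stages in as functions keeps each team's model swappable without touching the orchestration code.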
Slide 10
HOW DO WE USE MACHINE LEARNING?
[Figure: camera layout, cameras 0 to 9]
HOW DO WE USE MACHINE LEARNING?
Sleepers
Joints
No Joint / Joint
TYPE: NS90
Slide 13
CHALLENGES
The challenges we face going from PoCs to production within ProRail’s business processes:
1. Loss of knowledge
2. Dependency on suppliers and other teams
3. Translating predictions to useful input
Slide 14
CHALLENGE #1: LOSS OF KNOWLEDGE
From an ML perspective
• The code
From a business perspective
• Domain-specific knowledge
Slide 15
Loss of Knowledge: The Code
1. Unorganized code
2. No software engineering practices: code that is difficult to debug, hard to test and hard to keep clean
3. Hard to work with Git
4. No automated processes
Slide 16
Lessons Learned: Getting more organized!
The Code of Honor
Slide 17
The Code of Honor’s four commandments
1. Thou shalt write clean code --> formatting, linting
2. Thou shalt make thy work reproducible --> virtual environments (venv), dependencies in order
3. Thou shalt test thy code --> unit testing, code coverage
4. Thou shalt use Git to automate checks of the commandments --> pre-commit hooks and pipelines
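To illustrate the third commandment, here is a small `unittest` test case for a hypothetical helper, `to_label`, that turns a model probability into a joint label. The helper and its 0.5 threshold are assumptions made for illustration. pytest also collects `unittest.TestCase` classes, so the same file runs under either tool.

```python
import unittest


def to_label(probability: float, threshold: float = 0.5) -> str:
    """Hypothetical helper: map a model probability to a joint label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {probability}")
    return "Joint" if probability >= threshold else "No Joint"


class TestToLabel(unittest.TestCase):
    def test_above_threshold_is_joint(self):
        self.assertEqual(to_label(0.9), "Joint")

    def test_below_threshold_is_no_joint(self):
        self.assertEqual(to_label(0.1), "No Joint")

    def test_threshold_is_inclusive(self):
        self.assertEqual(to_label(0.5), "Joint")

    def test_invalid_probability_raises(self):
        with self.assertRaises(ValueError):
            to_label(1.5)
```

Run it with `python -m pytest` or `python -m unittest`. With coverage.py installed, `coverage run -m pytest` followed by `coverage report --fail-under=80` fails the run when coverage drops below the 80% target.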
Slide 18
HOW DO WE ACHIEVE THIS?
CLEAN CODE
• flake8
REPRODUCIBLE WORK
1. requirements.txt
2. Virtual environments
3. Versioned packages
TEST WORK
1. unittest and pytest
2. Code coverage ≥ 80%
PRE-COMMIT (LOCAL) --> PUSH --> DEVOPS (CLOUD) --> MERGE TO REPO
IMPROVEMENTS NEEDED
1. Pipelines that repeat, as a final check, the checks already done in pre-commit
2. Checks by colleagues
VANTAGE 101
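Reproducible work depends on versioned packages in requirements.txt. One way to enforce this automatically is a small check that flags requirement lines without an exact version. The function names are hypothetical and the parsing is deliberately simple: it treats only `==` as pinned and skips pip option lines such as `-r`.

```python
from pathlib import Path
from typing import Iterable, List


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, --index-url, ...)
            continue
        spec = line.split(";", 1)[0]          # ignore environment markers
        if "==" not in spec:
            unpinned.append(line)
    return unpinned


def main(paths: List[str]) -> int:
    """Check each requirements file; a non-zero return signals failure."""
    status = 0
    for path in paths:
        for req in unpinned_requirements(Path(path).read_text().splitlines()):
            print(f"{path}: unpinned requirement: {req}")
            status = 1
    return status
```

pre-commit passes the names of matching staged files to a hook as arguments, so a script ending in `sys.exit(main(sys.argv[1:]))` could be registered as a local hook, and the same script could run again as the final check in the DevOps pipeline.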
Slide 19
Loss of Knowledge: Domain Knowledge
Rail knowledge
GIS
IT infrastructure
Slide 20
LESSONS LEARNED: LOSS OF KNOWLEDGE
1. Let people take ownership! Make sure someone owns the knowledge on every Confluence page. Because that knowledge is documented, colleagues can easily take over ownership when people change roles.
2. Create a stakeholder map, so people know where to go for which knowledge.
3. Build tools that make your life easier.
Slide 21
CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
Data Dependency
Organisational Dependency
Slide 22
DATA QUALITY: IMAGES
Slide 23
DATA QUALITY: METADATA
Slide 24
DATA DRIFT
Data drift is a change in the production data relative to the data that was used to test and validate the model before it was deployed to production.
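One common way to quantify drift in a single numeric feature (for example mean image brightness, which might shift with seasonal change or new cameras) is the Population Stability Index (PSI). This is a swapped-in illustration, not necessarily how ProRail monitors drift: the sketch bins production values using the reference data's range and sums (p - r) * ln(p / r) over the bins, where r and p are the reference and production proportions.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], production: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index of `production` against `reference`."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        eps = 1e-6  # avoid log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p, prod_p = proportions(reference), proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


reference = [i / 100 for i in range(100)]         # hypothetical values, spread over 0.00..0.99
production = [0.5 + i / 200 for i in range(100)]  # hypothetical values, shifted to 0.50..0.995
print(round(psi(reference, production), 2))
```

A frequently cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but thresholds are best tuned per feature.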
Slide 25
DATA DRIFT
Seasonal change
JOINT
New Cameras
Slide 26
LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
WARNING!
• Data quality
• Data drift
• Supplier
Make good agreements! Think long-term.
1. EXPLAINABLE AI
2. DATA-DRIVEN APPROACH
LABEL: Positive (there is a joint in this image).
But where does my model look to decide this?
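Explainable-AI methods try to show where a model looks when it decides that an image contains a joint. One simple, model-agnostic method is occlusion sensitivity: blank out one patch of the image at a time and record how much the model's score drops. This is a swapped-in technique, not necessarily what the team used. In the sketch, `score_fn` stands for any function returning the model's probability for "joint", and the toy scorer in the usage lines is hypothetical.

```python
from typing import Callable

import numpy as np


def occlusion_map(
    image: np.ndarray,
    score_fn: Callable[[np.ndarray], float],
    patch: int = 8,
    stride: int = 8,
    fill: float = 0.0,
) -> np.ndarray:
    """Score drop when each patch is replaced by `fill`; higher means more important."""
    h, w = image.shape[:2]
    base = score_fn(image)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            occluded = image.copy()
            y, x = i * stride, j * stride
            occluded[y:y + patch, x:x + patch] = fill  # hide one patch
            heat[i, j] = base - score_fn(occluded)
    return heat


# Toy example: a hypothetical "joint detector" that only looks at one region.
image = np.zeros((32, 32))
image[8:16, 16:24] = 1.0
heat = occlusion_map(image, lambda img: float(img[8:16, 16:24].mean()))
print(np.unravel_index(heat.argmax(), heat.shape))
```

Upsampling `heat` to the image size and overlaying it on the original gives the familiar heat-map view that inspectors can check against their own judgement.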
Slide 27
CHALLENGE #2: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
Supplier
Data team
SAP team
GIS team
Slide 28
LESSONS LEARNED: DEPENDENCY ON SUPPLIERS AND OTHER TEAMS
1. Plan ahead and manage relationships
2. Explain your processes to your stakeholders
3. Communicate frequently with the teams that you and other stakeholders depend on
4. Find a compromise between business needs and model needs
Slide 29
CHALLENGE #3: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
Who is our user?
Precise
Experienced
Specific rail knowledge
Not a data scientist
Slide 30
LESSONS LEARNED: TRANSLATE YOUR PREDICTIONS TO USEFUL INPUT
1. Spell everything out. Our users are not data scientists, so you have to show them what is possible.
2. Visualize as much as possible.
3. Start from the business needs, not from your outputs.
   a. Use the right metrics to evaluate your model’s success.
   b. Know how the user will process your results, and shape your results into input for that process.
4. Connect as many sources as possible to your outcomes. This gives a complete overview of the checks that need to be done.
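Choosing the right metric matters because rare defects make accuracy misleading. This sketch computes accuracy, precision and recall for binary labels (1 = damaged); the example of 100 sleepers with 5 damaged ones is hypothetical.

```python
from typing import Dict, Sequence


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of flagged items that are real defects
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real defects that were flagged
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


y_true = [1] * 5 + [0] * 95  # hypothetical: 5 of 100 sleepers are damaged
y_pred = [0] * 100           # a model that never flags damage
print(precision_recall(y_true, y_pred))  # accuracy 0.95, precision 0.0, recall 0.0
```

A model that never flags anything still scores 95% accuracy while missing every damaged sleeper, which is why recall (missed defects) and precision (false alarms) usually say more to an inspector.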
Slide 31
MAIN TAKEAWAYS
1. Make agreements about your processes and organize them, let people take ownership of these processes, and make sure they are checked (digitally and by colleagues).
2. Make agreements for the long term! Don’t go for quick fixes; do what is best for the data science processes, since that will also be best for the business.
3. Put yourself in the shoes of the user(s)! You can have the best-performing model, but it won’t be used if nobody understands your output.