
Classifying train and car journeys using telematics data

As insurers move away from traditional demographic factors when pricing car insurance, telematics is becoming an increasingly popular way to assess individual driver risk, optimise motor insurance premiums and encourage safer driving habits. Because MyDrive Solutions offers a mobile application as one option for capturing data, it is crucial for the company to distinguish between train and car journeys so that policyholders' driving behaviour is assessed fairly.

In this talk we will go through a methodology to classify train and car journeys, reviewing the various tools used to accomplish each task and highlighting the challenges encountered.

Annabelle Rolland

May 09, 2016

Transcript

  1. Classifying train and car journeys using telematics data

  2. About Me

    • Applied mathematics and statistics background
    • First experience in Finance (Asset Management)
    • “Escaped the City” late 2013
    • Joined Hailo’s data team at the beginning of 2014
    • Joined MyDrive Solutions in February 2016

    Copyright © 2016 MyDrive Solutions. All rights reserved.
  3. About MyDrive http://www.mydrivesolutions.com/

  4. The Data Team

    Maša Avakumović, Data Scientist
  5. Who are MyDrive Solutions?

    • MyDrive Solutions is a London-based company founded in 2010
    • Acquired last summer by Generali, a leading insurance company serving 65 million customers in more than 60 countries
    • We have two core pillars: data science and software engineering
    • We apply robust data science to the challenges faced by insurance companies
    • Our prime competence is currently in the driving domain
    • We are also working on applying our technology to smart homes, e-health…
  6. Our Solutions

    • Device-agnostic implementation
    • Accurate data collection
    • Contextualised mapping against road topography
  7. Our Solutions

  8. Our Solutions

    • Driver behavioural scoring
    • Educational driving feedback: top road safety tips to help improve driving habits
  9. The Challenge

  10. Why do we need a classifier?

    • Data collected through a mobile application
    • Autostart feature
    • We only want to take into account trips where YOU are the driver
  11. NOT THAT GUY!

    Source: http://metro.co.uk/2011/02/01/britains-youngest-train-driver-alex-clements-is-just-17-635051/
  12. Data & Tools

  13. MyDrive - Trips Data

    Highly precise 1-second GPS data

    For each point:
    • TripId
    • Latitude
    • Longitude
    • Timestamp
    • Speed
    • Satellites
  14. HERE - Stations Data

    HERE provides data for Train Stations and Commuter Rail Stations

    Columns:
    • Station Name
    • Station Type
    • Latitude
    • Longitude
  15. Tools

    • QGIS (2.14.0-Essen)
    • R (3.2.3)
    • RStudio
    • caret, ggplot2, leaflet, geosphere
    • Spark (1.6.1)
    • Zeppelin Notebook
    • PySpark, SQL, MLlib
  16. Exploring Variables

  17. Getting the Variables

  18. Train Stations per Trip

  19. Train Stations per Trip

    Median:
    • Car: 0
    • Train: 5
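One way to derive a stations-per-trip count is to count the distinct stations that have at least one GPS point within a fixed radius. The sketch below is a minimal Python illustration under assumptions that are not in the talk: the 100 m radius, the helper names `haversine_m` and `count_distinct_stations`, and all coordinates are hypothetical. The slides do not say how the feature was actually computed.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_distinct_stations(trip_points, stations, radius_m=100.0):
    """Count stations that have at least one trip point within radius_m.

    trip_points: list of (lat, lon); stations: list of (name, lat, lon).
    radius_m is an assumed threshold, not a value from the talk.
    """
    hit = set()
    for name, s_lat, s_lon in stations:
        for lat, lon in trip_points:
            if haversine_m(lat, lon, s_lat, s_lon) <= radius_m:
                hit.add(name)
                break  # one nearby point is enough for this station
    return len(hit)


# Made-up stations and trip points, for illustration only.
stations = [
    ("Station A", 51.5000, -0.1000),
    ("Station B", 51.5200, -0.1000),
    ("Station C", 51.6000, 0.2000),
]
trip = [(51.5000, -0.1001), (51.5100, -0.1000), (51.5200, -0.1002)]

print(count_distinct_stations(trip, stations))
```

The nested loop is quadratic in points and stations, which is fine for a sketch; on full 1-second GPS data a spatial index or a bounding-box prefilter would likely be needed.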
  20. Cannot just rely on stations!

  21. % Points Snapped to Roads

  22. % Points Snapped to Roads

    [Figure panels: Car, Train]
  23. % Points Snapped to Roads

    Median:
    • Car: 57%
    • Train: 15%
  24. Path Efficiency

  25. Path Efficiency

    [Figure panels: Car, Train]
  26. Path Efficiency

    Median:
    • Car: 66%
    • Train: 83%
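The slides do not define path efficiency. A common definition, assumed here, is the straight-line distance between the first and last point of a trip divided by the distance actually travelled, so a perfectly straight trip scores 1. The Python sketch below follows that assumed definition; the function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_efficiency(points):
    """Assumed definition: start-to-end distance / travelled distance, in [0, 1]."""
    if len(points) < 2:
        return 0.0
    # Sum the lengths of consecutive GPS segments.
    travelled = sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))
    if travelled == 0.0:
        return 0.0  # the device never moved
    direct = haversine_m(*points[0], *points[-1])
    return direct / travelled


# Made-up trips: one straight line north, one with a right-angle turn.
straight = [(0.0, 0.0), (0.01, 0.0), (0.02, 0.0)]
detour = [(0.0, 0.0), (0.01, 0.0), (0.01, 0.01)]

print(round(path_efficiency(straight), 3))
print(round(path_efficiency(detour), 3))
```

Under this definition the right-angle trip scores about 0.71, because the direct distance is the hypotenuse of two equal legs.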
  27. Speed

  28. Train Speed

  29. Standard Deviation Speed

    Median:
    • Car: 26
    • Train: 42
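A per-trip standard deviation of speed can be computed by grouping the GPS points by TripId. The Python sketch below is illustrative only: the function name `speed_sd_by_trip` and the sample values are hypothetical, and the slides do not state the speed unit or whether the sample or population formula was used (the sample formula is assumed here).

```python
from collections import defaultdict
from statistics import stdev


def speed_sd_by_trip(rows):
    """Return {trip_id: sample standard deviation of speed}.

    rows: iterable of (trip_id, speed) pairs, one per GPS point.
    Trips with fewer than two points are skipped, since the sample
    standard deviation is undefined for a single value.
    """
    speeds = defaultdict(list)
    for trip_id, speed in rows:
        speeds[trip_id].append(speed)
    return {t: stdev(v) for t, v in speeds.items() if len(v) >= 2}


# Made-up points: a steady trip and one with large speed swings.
rows = [
    ("trip-1", 10), ("trip-1", 12), ("trip-1", 14),
    ("trip-2", 0), ("trip-2", 40), ("trip-2", 80),
    ("trip-3", 30),  # single point, skipped
]

print(speed_sd_by_trip(rows))
```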
  30. Other variables

    Other variables considered:
    • Braking
    • Acceleration
    • Distance to link…

    Top Variables:
    • % Points Snapped to Roads
    • Train Stations per Trip
    • Path Efficiency
  31. Building the Model

  32. Building and testing the model

    Logistic Regression

    library(caret)

    # 10-fold cross-validation, scored by area under the ROC curve
    cv_opts <- trainControl(method = "cv", number = 10,
                            summaryFunction = twoClassSummary,
                            classProbs = TRUE)

    my_model <- train(train_flag ~ distinct_stations + pct_kept + path_efficiency,
                      data = model_data_training,
                      method = "glm", family = "binomial",
                      trControl = cv_opts, metric = "ROC")

    Accuracy: 86%

                Reference
    Prediction  car   train
    car         50%   4%
    train       11%   36%
  33. Building and testing the model

    Random Forests

    # mtry values to try; cv_opts is the trainControl object from the previous slide
    rf_opts <- data.frame(.mtry = c(1:3))

    # ntree (not n.tree) is the argument name randomForest expects
    my_model2 <- train(train_flag ~ distinct_stations + pct_kept + path_efficiency,
                       data = model_data_training,
                       method = "rf", tuneGrid = rf_opts, ntree = 100,
                       trControl = cv_opts, metric = "ROC")

    Accuracy: 98%

                Reference
    Prediction  car   train
    car         59%   0%
    train       2%    39%
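The reported accuracies can be checked against the two confusion matrices, and the same cells also give precision and recall for the train class. The Python sketch below does that arithmetic. It assumes the caret layout, with rows as predictions and columns as reference labels, and treats each cell as a percentage of all test trips. The function name `summarise` is hypothetical, and because the slide values are rounded, the derived ratios are approximate.

```python
def summarise(cm):
    """Summarise a 2x2 confusion matrix given as cm[predicted][reference].

    Cell values are percentages of all test trips, as on the slides.
    """
    true_train = cm["train"]["train"]
    return {
        # Cells are already percentages, so accuracy is the diagonal sum.
        "accuracy_pct": cm["car"]["car"] + true_train,
        # Share of real train trips that were flagged as train.
        "train_recall": true_train / (cm["car"]["train"] + true_train),
        # Share of trips flagged as train that really were train trips.
        "train_precision": true_train / (cm["train"]["car"] + true_train),
    }


# Values copied from the two slides above.
logistic = {"car": {"car": 50, "train": 4}, "train": {"car": 11, "train": 36}}
forest = {"car": {"car": 59, "train": 0}, "train": {"car": 2, "train": 39}}

for name, cm in [("logistic regression", logistic), ("random forest", forest)]:
    print(name, summarise(cm))
```

For the insurance use case, train precision is the figure to watch: every trip wrongly flagged as a train journey is a real car journey that would drop out of the driver's score. On these rounded numbers it is roughly 36/47 for logistic regression and 39/41 for the random forest.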
  34. Spark MLlib
