
Presentation to Google and ESRC for Google Data Analytics Social Science Research Call

nickbearman
January 14, 2015


Transcript

  1. Dr Nick Bearman, CGeog (GIS), AHEA • Department of Geography and Planning • CO2 Emissions and Home to School Travel • @nickbearmanuk • http://www.fotolog.com/luckyshot/41568423/ • https://twitter.com/Peter_Tennant/status/444404118330044416
  2. CO2 Emissions & Home to School Travel • Project overview • Findings • Challenges • Opportunities • Suggestions
  3. Home to School Travel • About 7.5m school aged children in England • (Most) have to travel from home to school • Images: D Sharon Pruitt http://www.flickr.com/photos/pinksherbet/234942843/ • http://www.flickr.com/photos/bike/8560715649/in/photostream/ • http://en.wikipedia.org/wiki/File:Alpine_Travel_bus_DVG517_(YMB_517W)_1981_Bristol_VRT_SL3_ECW,_10_July_2006.jpg
  4. • Why is it important? – Active Transport – CO2 emissions / air pollution – Congestion • Data sources • School Census – Pupil home postcode – “Usual” mode of travel (11 options) • Images: http://chestercycling.files.wordpress.com/2011/02/cimg2369.jpg • http://static2.stuff.co.nz/1333093574/087/6670087.jpg
  5. Traditional estimation technique • Euclidean distance • Straight lines will typically underestimate true distances • Average emissions values for mode of transport • No sensitivity to different vehicle types
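The traditional technique on slide 5 amounts to multiplying a straight-line distance by a per-mode average. A minimal Python sketch follows, assuming home and school are available as projected coordinates in metres. The emission factors and the `CYC`, `CAR` and `BUS` mode codes are hypothetical placeholders, not values from the project; only `WLK` appears on the slides.

```python
import math

# Hypothetical per-mode emission factors in g CO2 per pupil-km.
# Placeholder numbers for illustration only, not values from the project.
EMISSION_G_PER_KM = {"WLK": 0.0, "CYC": 0.0, "CAR": 150.0, "BUS": 80.0}


def euclidean_km(home, school):
    """Straight-line distance in km between two (easting, northing) points in metres."""
    return math.hypot(school[0] - home[0], school[1] - home[1]) / 1000.0


def traditional_estimate_g(home, school, mode):
    """One-way estimate: straight-line distance times the mode's average factor."""
    return euclidean_km(home, school) * EMISSION_G_PER_KM[mode]


# Made-up coordinates: the school is 3 km east and 4 km north of the home.
home = (340000.0, 394000.0)
school = (343000.0, 398000.0)

print(euclidean_km(home, school))                    # 5.0
print(traditional_estimate_g(home, school, "CAR"))   # 750.0
```

Because the straight line is the shortest possible path, any network route between the same two points is at least this long, which is why this estimate tends to be too low.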
  6. • Street network - OpenStreetMap & Routino • Home: L11 4SH School: L11 0BP Mode: WLK • Process was repeated for each school child • Processing time ~8.5 days in total
  7. Technical challenges • Large data sets • Multiple routing modes • Long run times • Confidential data – Home postcode • Image: Arne Hückelheim (author) http://en.wikipedia.org/wiki/File:SunsetTracksCrop.JPG
  8. • Google Compute Engine – need data to stay in EU, ideally UK – Data Protection & Dept. of Edu. • Can’t though - “No guarantee that your data at rest is kept only in that region” (then 15/03/2014) • 4.1 Data Storage. … may determine .. data .. stored permanently, at rest, in either the United States or the European Union…. • 4.2 Transient Storage. … may be stored transiently or cached in any country in which Google or its agents maintain facilities (now 05/01/2015) • https://cloud.google.com/terms/service-terms • https://developers.google.com/compute/docs/faq#selectedcountries
  9. • Limit on Google Routing API – 2500 within 24 hours • Google, Bing and Routino routing solutions highly comparable
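To get a feel for what the 2500-per-24-hours limit on slide 9 means at national scale, here is a rough back-of-the-envelope calculation in Python. It assumes one request per pupil record and borrows the 3.8m records figure from slide 24; the real number of routes in the project may differ, so treat the result as an order of magnitude only.

```python
import math


def days_needed(n_routes, daily_limit):
    """Whole days required to send n_routes requests at daily_limit per day."""
    return math.ceil(n_routes / daily_limit)


DAILY_LIMIT = 2500       # requests per 24 hours, from slide 9
N_ROUTES = 3_800_000     # assumption: one request per pupil record (slide 24)

days = days_needed(N_ROUTES, DAILY_LIMIT)
print(days)                      # 1520
print(round(days / 365.25, 1))   # 4.2 (years)
```

Under these assumptions the rate-limited API would take years, compared with the ~8.5 days of local processing reported on slide 6.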
  10. Findings • Non geo models underestimate emissions for urban areas • Emissions increased with each academic year • Selective & religious schools have higher emissions • Can model altering the % of pupils who use active travel
  11. [Figure; values shown: 12032, 11966] • Bearman, N. & Singleton, A.D. (2014) Modelling the potential impact on CO2 emissions of an increased uptake of active travel for the home to school commute using individual level data, Journal of Transport & Health, 1(4) p. 295–304. doi:10.1016/j.jth.2014.09.009, Open Access
  12. Challenges • Cloud processing – geographic location • Integration of open source and private / confidential data • Bus routes – Local Authority specific • http://www.ft.com/cms/s/0/e2672ccc-349f-11e2-8986-00144feabdc0.html#axzz2xAmBRk6u
  13. Opportunities • Enables CO2 estimates and policy targeting at higher resolution • Methodology developed can be applied across many types of transport and many areas (inc. health access) • All code in public domain, application to assist this analysis at LA level could be developed.
  14. Opportunities • Data from this project has been included in the Transport Map Book • For each Local Authority in England • http://www.alex-singleton.com/r/2014/09/09/Transport-Map-Book/ • http://data.alex-singleton.com/transport/E09000007_Camden.pdf • Camden • 325 LAs, 58 variables, 18,850 total
  15. Suggestions • Geographic restrictions on cloud storage and processing • Education of data suppliers – they are the gatekeepers • Reduce / remove academic usage limits • Promote tools through academic channels • “We provide digital solutions for UK education and research”
  16. Extra Slides • Presented at GISRUK2014 at the University of Glasgow on 18th April 2014 • http://www.nickbearman.me.uk/2014/04/modelling-home-to-school-travel-for-state-pupils-in-england-2008-2011/
  17. Dr Nick Bearman, CGeog (GIS) • Department of Geography and Planning • Modelling home to school travel for state pupils in England, 2008-2011 • @nickbearmanuk • http://www.fotolog.com/luckyshot/41568423/ • https://twitter.com/Peter_Tennant/status/444404118330044416
  18. Overview • Home to school travel • Data, technical challenges & modeling routes • Primary to Secondary transition
  19. Home to School Travel • About 7.5m school aged children in England • (Most) have to travel from home to school • Images: D Sharon Pruitt http://www.flickr.com/photos/pinksherbet/234942843/ • http://www.flickr.com/photos/bike/8560715649/in/photostream/ • http://en.wikipedia.org/wiki/File:Alpine_Travel_bus_DVG517_(YMB_517W)_1981_Bristol_VRT_SL3_ECW,_10_July_2006.jpg
  20. • What influences choice? – Distance – Pupil age – Road infrastructure – Family factors • Why is it important? – Active Transport – CO2 emissions / air pollution – Congestion • Images: http://chestercycling.files.wordpress.com/2011/02/cimg2369.jpg • http://static2.stuff.co.nz/1333093574/087/6670087.jpg
  21. Data sources • Pupil Level Annual School Census (2008-2011) – Pupil home postcode – “Usual” mode of travel (11 options) – Data for each year – State schools only (Independent schools ~ 7%) • Edubase - school information • CO2 emissions by travel mode
  22. Traditional estimation technique • Euclidean distance • Average emissions values for mode of transport • Straight lines will typically underestimate true distances • No sensitivity to different vehicle types
  23. Technical challenges • Large data sets • Multiple routing modes • Long run times • Confidential data • Image: Arne Hückelheim (author) http://en.wikipedia.org/wiki/File:SunsetTracksCrop.JPG
  24. Routing: School census • 5m school children with records for 2008-2011 • 3.8m (75.9%) • with complete mode data • Not outliers – Tukey outlier (Tukey, 1977) – weighted by mode (m), year (a), local authority (g) – not too far from station – journey not too long
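The Tukey outlier rule on slide 24 can be illustrated with the standard 1.5 × IQR fences. The Python sketch below is one possible reading, in which "weighted by mode, year, local authority" is taken to mean that fences are computed separately within each (mode, year, local authority) group. That interpretation, the record field names, the group-size cut-off and the sample distances are all assumptions for illustration, not the project's code or data.

```python
from collections import defaultdict
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(records, k=1.5):
    """Keep records whose distance lies inside the fences of their own group."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["mode"], rec["year"], rec["la"])].append(rec)

    kept = []
    for group in groups.values():
        if len(group) < 4:
            # Too few journeys for meaningful quartiles: keep them all.
            kept.extend(group)
            continue
        low, high = tukey_fences([r["distance_km"] for r in group], k)
        kept.extend(r for r in group if low <= r["distance_km"] <= high)
    return kept


# Made-up walking distances for one group; 30 km is implausible for a walk.
distances = [1.0, 1.2, 1.5, 1.8, 2.0, 2.2, 2.5, 30.0]
records = [
    {"mode": "WLK", "year": 2008, "la": "LA1", "distance_km": d}
    for d in distances
]

kept = drop_outliers(records)
print(len(kept))                            # 7
print(max(r["distance_km"] for r in kept))  # 2.5
```

Grouping before computing the fences matters because a distance that is normal for a bus journey can be an obvious outlier for a walk.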
  25. Routing: Mode of travel • Home: L11 4SH School: L11 0BP Mode: WLK • Street network • Non-street network • Image: C. G. P. Grey http://en.wikipedia.org/wiki/File:Citadis_dublin.jpg
  26. • Street network - OpenStreetMap & Routino – Walk, Cycle, Car (+ Car Share), Taxi – Bus (public, school, unknown) • Text file: • Distance • Route
  27. Home: L11 4SH School: L11 0BP Mode: WLK • R then called Routino or pgRouting • Process was repeated for each school child • And for each year (2008-2011) – If either Mode, Start postcode or End postcode were different • Processing time ~8.5 days in total
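Slide 27 says a route was only recalculated when the mode, start postcode or end postcode changed. One way to sketch that idea in Python is a cache keyed on the (mode, home, school) triple. The `fake_router` function below is a hypothetical stand-in for the call out to Routino or pgRouting, and the record layout and `CAR` code are assumptions; the postcodes and `WLK` are the example from the slide.

```python
def route_unique_journeys(records, route_fn):
    """Route each distinct (mode, home, school) combination once and reuse it."""
    cache = {}
    routed = []
    for rec in records:
        key = (rec["mode"], rec["home"], rec["school"])
        if key not in cache:
            cache[key] = route_fn(*key)  # the expensive routing call
        routed.append({**rec, "distance_km": cache[key]})
    return routed, len(cache)


calls = []


def fake_router(mode, home, school):
    """Hypothetical stand-in for a real routing engine; records each call."""
    calls.append((mode, home, school))
    return 1.0  # placeholder distance, not a real route length


# One pupil over three years: same journey twice, then a change of mode.
records = [
    {"pupil": 1, "year": 2008, "mode": "WLK", "home": "L11 4SH", "school": "L11 0BP"},
    {"pupil": 1, "year": 2009, "mode": "WLK", "home": "L11 4SH", "school": "L11 0BP"},
    {"pupil": 1, "year": 2010, "mode": "CAR", "home": "L11 4SH", "school": "L11 0BP"},
]

routed, n_routes = route_unique_journeys(records, fake_router)
print(len(routed))  # 3 records come back, one per pupil-year
print(n_routes)     # 2 routing calls were needed
```

With run times measured in days, skipping repeated journeys like this directly reduces the number of routing calls.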
  28. Why these programs? • Routino – Easy way of getting street based routes • pgRouting – Routes for custom networks • R - Open Source – can handle big data (o/n running) • OS X & R – good command line interface – stable (?)
  29. Big Data and the Cloud • Run time an issue • But confidential data • Cloud solutions complex for permissions • “Where is the data stored?” • Keeping locally is a solution • http://www.ft.com/cms/s/0/e2672ccc-349f-11e2-8986-00144feabdc0.html#axzz2xAmBRk6u
  30. Primary to Secondary mode choice • Big change primary to secondary – longer distances – Bus travel – limited average change in mode after this • Image: http://pixabay.com/en/boy-girl-hand-in-hand-kids-school-160168/
  31. [Figure: mode by distance] • Secondary: Walk to 2.75km (1.75 miles), then Bus (+ Car) • Primary: Walk to 1.25km (0.75 miles), then Car (+ Bus) • Maj. of BUS is DSB • Maj. of NON is WLK
  32. CO2 Emissions • Have routes for all school traffic • Can calculate the CO2 impact of school traffic • Can model altering the % of pupils who use active travel
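The scenario modelling on slide 32 can be illustrated with a toy calculation: total emissions are summed over routed journeys, then recomputed with a share of car journeys switched to active travel. This Python sketch assumes the switch applies evenly to all car journeys and uses hypothetical emission factors and journeys; the published model (Bearman & Singleton, 2014) may work differently.

```python
# Hypothetical emission factors in g CO2 per pupil-km (placeholders only).
EMISSION_G_PER_KM = {"WLK": 0.0, "CAR": 150.0, "BUS": 80.0}


def total_emissions_g(journeys, active_share_of_car=0.0):
    """Sum emissions over (mode, distance_km) journeys.

    active_share_of_car is the fraction of car travel assumed to switch
    to active modes, which are treated here as emitting nothing.
    """
    total = 0.0
    for mode, distance_km in journeys:
        emitted = distance_km * EMISSION_G_PER_KM[mode]
        if mode == "CAR":
            emitted *= 1.0 - active_share_of_car
        total += emitted
    return total


# Made-up routed journeys: (mode, network distance in km).
journeys = [("CAR", 2.0), ("CAR", 4.0), ("BUS", 5.0), ("WLK", 1.0)]

print(total_emissions_g(journeys))        # 1300.0 (baseline)
print(total_emissions_g(journeys, 0.5))   # 850.0 (half of car travel switched)
```

A fuller model would probably restrict the switch to journeys short enough to walk or cycle, which is where having individual routed distances becomes useful.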
  33. Conclusion • Home to school travel is important • Can model individual routes nationally • Large processing can be done locally • Limitations • Future work