Seeing a Better World with Big and Heavy GeoSpatial Data at DigitalGlobe

Elastic Co
October 06, 2015

DigitalGlobe is an industry leader in geospatial search and analytics, handling data at a ludicrous scale. They use Elasticsearch to drill down into the vector data that accompanies their 90 PB imagery archive. This talk covers how and why DigitalGlobe chose Elasticsearch, and how they architected and deployed it to give their clients, including aid organizations in Nepal, instant access to their full geospatial database so they can deliver life-saving services.

Transcript

  1. DigitalGlobe Proprietary and Business Confidential
     Seeing a Better World… with Big and Heavy Data
     Mark Giaconia
  2. About me
     • Green Beret turned computer scientist
     • Big Data ecosystem developer
     • Impressionist oil painter
     • Flamenco guitarist
     • Open source enthusiast; Apache OpenNLP committer
     • Father of an awesome nerdy family
  3. What We Do
     • Take the best pictures of the Earth in the world (we've covered it about 8 times)
       - In 30 cm squares
       - The highest resolution and positional accuracy in the commercial world
     • Provide the GBDx PaaS for exploiting our spectral data
       - Google Maps and Bing only show you RGB. Our data goes beyond RGB
     • Maintain a digital inventory of the surface of the Earth
     • Deep learning for object detection at global scale
     • Vectors, AKA "layers" (this is where Elasticsearch comes in…)
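To make "vectors as layers" concrete, here is a minimal sketch of how one layer of heterogeneous vectors could be described as an Elasticsearch index mapping. The field names (`geometry`, `acquired`, `source`, `text`, `attributes`) are hypothetical, not DigitalGlobe's actual schema; `geo_shape`, `date`, `keyword`, `text`, and `object` are standard Elasticsearch field types, and the typeless `mappings` layout assumes Elasticsearch 7.x or later.

```python
import json


def vector_layer_mapping() -> dict:
    """Build a hypothetical index body for a layer of heterogeneous vectors."""
    return {
        "mappings": {
            "properties": {
                # Any GeoJSON geometry: points, lines, polygons, multi-geometries.
                "geometry": {"type": "geo_shape"},
                # When the feature was captured or observed.
                "acquired": {"type": "date"},
                # Which provider or pipeline produced the feature, e.g. "osm".
                "source": {"type": "keyword"},
                # Free text such as names or tags, searchable by relevance.
                "text": {"type": "text"},
                # Source-specific attributes kept as-is; dynamic mapping lets
                # each provider bring its own fields without a shared schema.
                "attributes": {"type": "object", "dynamic": True},
            }
        }
    }


if __name__ == "__main__":
    # This body could be sent with PUT /<index-name> to create the index.
    print(json.dumps(vector_layer_mapping(), indent=2))
```

Fixing only the few fields every feature shares, and leaving the rest dynamic, is one way to let many schemas coexist in one index.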
  4. Ludicrous Scale
     • 90 PB archive
     • We beam down ~70 TB per day
     • Multispectral pixels: 8 or 16 bands+
     • Millions of ~30 GB files = Heavy
       - Images
     • Billions of Vectors.* = Big
       - Social, PAI, vectors (OSM, etc.)
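The "Heavy" versus "Big" distinction is easier to see with a little arithmetic on the figures above. This sketch uses only the slide's numbers and decimal units; treating the whole archive as ~30 GB files is a simplifying assumption, not something the slide states.

```python
# Back-of-envelope arithmetic on the slide's figures (decimal units).
PB = 10**15  # bytes in a petabyte
TB = 10**12  # bytes in a terabyte
GB = 10**9   # bytes in a gigabyte

archive_bytes = 90 * PB
daily_downlink = 70 * TB
typical_file = 30 * GB

# Assumption: if the whole archive were ~30 GB files, how many would there be?
files_equivalent = archive_bytes // typical_file

# How much new data a year of ~70 TB/day downlink adds, in petabytes.
yearly_growth_pb = daily_downlink * 365 / PB

print(f"~{files_equivalent:,} files of ~30 GB")
print(f"~{yearly_growth_pb:.2f} PB per year of new data")
```

A few million very large files is "Heavy"; billions of small vector records is "Big", and the two call for different storage and indexing strategies.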
  5. Our data is Heavy… but we turn it into Big to answer questions
     GBDx Platform
  6. GBDx Platform: Unified Vector Index
     [Diagram: PaaS; billions of shapes; 90 PB archive]
  7. Unified Vector Index (Data Logistics)
     • Any vector of any kind
     • Write your own data via OAuth REST calls (async and sync)
     • Query your own data by time, location, text, or a free-form query
     • Share data globally
     • Analyze data in unconventional geo ways
     • Upload to our S3 dropbox
     • Analysts access data via QGIS and an ArcMap add-in, both open source
     • NOGIS: "Not Only GIS"
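Querying "by time, location, text" maps naturally onto an Elasticsearch `bool` query. The sketch below only builds a request body, so it runs without a cluster. The field names `acquired`, `geometry`, and `text` are hypothetical, and the slide does not say how the GBDx REST API actually exposes queries. `bool` with `filter`, `range`, `geo_shape` with an `envelope` shape, and `match` are standard Elasticsearch Query DSL (a `filter` clause inside `bool` assumes Elasticsearch 2.x or later).

```python
import json
from typing import Optional, Tuple


def build_vector_query(
    text: Optional[str],
    start: str,
    end: str,
    bbox: Tuple[float, float, float, float],
) -> dict:
    """Build a search body filtered by time window and bounding box, ranked by text.

    bbox is (min_lon, min_lat, max_lon, max_lat). The field names
    "acquired", "geometry", and "text" are hypothetical.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    filters = [
        # Time: keep features captured inside [start, end].
        {"range": {"acquired": {"gte": start, "lte": end}}},
        # Location: keep features whose shape intersects the box.
        # Elasticsearch envelopes are [[left, top], [right, bottom]].
        {
            "geo_shape": {
                "geometry": {
                    "shape": {
                        "type": "envelope",
                        "coordinates": [[min_lon, max_lat], [max_lon, min_lat]],
                    },
                    "relation": "intersects",
                }
            }
        },
    ]
    # Text: score by relevance when given, otherwise match everything that passes the filters.
    must = [{"match": {"text": text}}] if text else [{"match_all": {}}]
    return {"query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    # Illustrative values only: a box roughly around Kathmandu after April 2015.
    body = build_vector_query(
        "collapsed building", "2015-04-25", "2015-05-31", (85.2, 27.6, 85.5, 27.8)
    )
    print(json.dumps(body, indent=2))  # body for POST /<index-name>/_search
```

Putting time and location in `filter` keeps them out of relevance scoring (and lets Elasticsearch cache them), while the text clause alone decides ranking.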
  8. Why Elasticsearch?
     • Because…
       - Geospatial Big Data is even nastier than regular Big Data
       - Everyone draws or generates their own vectors
       - into their own data models
       - using their own schema or no schema at all
       - at different spatial scales based on different imagery
     • At different resolutions and veracity levels
     • With different positional accuracy and currency
     • For their own uses
     • And it's massive scale when you're trying to fuse the stuff
     • And it's seriously duplicative
     • And Elasticsearch can…
       - Cleanly represent heterogeneous data in a common JSON-y way
       - Provide basic analytics over massive data
       - Scale by adding nodes
     • Then we can…
       - Store and index vectors of any structure extremely creatively at ludicrous scale
       - Aggregate, analyze, and discover data spatially
       - Deliver heterogeneous GBD seamlessly into GIS systems and other tools via a slick API that we put over it all
       - Slim down our architecture by feeding tools like Hadoop straight from Elasticsearch
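The "common JSON-y way" and "seriously duplicative" points can be illustrated together. Below is a minimal sketch, not DigitalGlobe's method: two hypothetical source layouts are normalized into one document shape, and a hash of the rounded geometry flags likely duplicates. The record layouts, key names, and the `normalize` / `geometry_fingerprint` helpers are all illustrative assumptions.

```python
import hashlib
import json

# Keys consumed by normalize(); everything else is kept under "attributes".
KNOWN_KEYS = {"geojson", "date", "label", "lat", "lon", "timestamp", "name"}


def normalize(record: dict, source: str) -> dict:
    """Map a source-specific record onto one common document shape.

    The two input layouts below are hypothetical stand-ins for
    "everyone uses their own schema"; real sources need their own rules.
    """
    if "geojson" in record:  # a provider that already ships GeoJSON geometry
        geometry = record["geojson"]
        when = record.get("date")
        text = record.get("label", "")
    elif "lat" in record and "lon" in record:  # a flat table of points
        geometry = {"type": "Point", "coordinates": [record["lon"], record["lat"]]}
        when = record.get("timestamp")
        text = record.get("name", "")
    else:
        raise ValueError(f"unrecognized record layout from source {source!r}")

    return {
        "source": source,
        "geometry": geometry,
        "acquired": when,
        "text": text,
        "attributes": {k: v for k, v in record.items() if k not in KNOWN_KEYS},
    }


def _round_coords(value, precision: int):
    """Recursively round numeric coordinates so tiny float noise is ignored."""
    if isinstance(value, list):
        return [_round_coords(v, precision) for v in value]
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return round(float(value), precision)
    return value


def geometry_fingerprint(doc: dict, precision: int = 6) -> str:
    """Hash a normalized geometry; equal hashes flag likely duplicate features."""
    geom = doc["geometry"]
    canonical = json.dumps(
        {"type": geom["type"], "coordinates": _round_coords(geom["coordinates"], precision)},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = normalize({"lat": 27.7172, "lon": 85.324, "name": "school", "floors": 3}, "survey")
    b = normalize(
        {"geojson": {"type": "Point", "coordinates": [85.3240000001, 27.7172]},
         "label": "School", "date": "2015-05-01"},
        "osm",
    )
    print(geometry_fingerprint(a) == geometry_fingerprint(b))  # True
```

Exact hashing only catches features that match up to rounding. The same building traced independently by two analysts would need a spatial-overlap comparison instead.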