Seeing a Better World with Big and Heavy GeoSpatial Data at DigitalGlobe

October 06, 2015

DigitalGlobe is an industry leader in geospatial search and analytics, handling data at a ludicrous scale. They use Elasticsearch to drill down on their 90 PB data archive of vector imagery. This talk focuses on how and why DigitalGlobe chose Elasticsearch and how they've architected and deployed it to give their clients, including aid organizations in Nepal, instant access to their full geospatial database in order to deliver life-saving services.


     What We Do

•  Take the best pictures of the earth in the world (we've covered it about 8 times)
- In 30 cm squares
- The highest resolution and positional accuracy in the commercial world
•  Provide GBDx PaaS for exploiting our spectral data
- Google Maps and Bing only show you RGB. Our data goes beyond RGB
•  Maintain a digital inventory of the surface of the Earth
•  Deep Learning for object detection at global scale
•  Vectors, AKA "layers" (This is where Elasticsearch comes in…)
     Ludicrous Scale
•  90 PB Archive
•  We beam down ~70 TB per day
•  Multispectral pixels: 8 or 16 band +
•  Millions of ~30GB files = Heavy
- Images
•  Billions of Vectors.* = Big
- Social, PAI, Vectors (OSM etc)  
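
As a rough sanity check on how these figures relate, the arithmetic can be sketched as below. It uses only the numbers on the slide and assumes decimal units (1 PB = 1000 TB); the derived values are back-of-envelope estimates, not figures from the talk.

```python
# Back-of-envelope arithmetic from the slide's figures.
# Assumption: decimal units (1 PB = 1000 TB = 1,000,000 GB).
ARCHIVE_PB = 90
DAILY_DOWNLINK_TB = 70
TYPICAL_FILE_GB = 30

archive_tb = ARCHIVE_PB * 1000
archive_gb = archive_tb * 1000

# "Millions of ~30GB files": how many such files would fill the archive?
approx_file_count = archive_gb / TYPICAL_FILE_GB

# How many days of downlink at today's rate equal the whole archive?
days_at_current_rate = archive_tb / DAILY_DOWNLINK_TB

# How much does the archive grow per year at today's rate?
yearly_growth_pb = DAILY_DOWNLINK_TB * 365 / 1000

print(f"~{approx_file_count:,.0f} files of {TYPICAL_FILE_GB} GB")
print(f"~{days_at_current_rate:,.0f} days of downlink at {DAILY_DOWNLINK_TB} TB/day")
print(f"~{yearly_growth_pb:.2f} PB added per year")
```

The result of about three million 30 GB files is consistent with the "millions" on the slide.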
     GBDx Platform

[Diagram: Heavy Data, Big Data, Image Mining; GBDx Platform (PaaS); billions of shapes; 90 PB archive]

    Unified Vector Index Data Logistics

•  Any Vector of any kind
•  Write your own data via OAuth REST calls (async and sync)
•  Query your own data by time, location, text, query
•  Share data globally
•  Analyze data in unconventional geo ways
•  Upload to our S3 dropbox
•  Analysts access data via QGIS and an ArcMap add-in: both open source
•  NOGIS: "Not Only GIS"  
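
To make "query by time, location, text" concrete, here is a minimal sketch of an Elasticsearch request body that combines the three. It is not DigitalGlobe's API: the field names (`text`, `timestamp`, `geometry`) and the helper `build_vector_query` are hypothetical, and it assumes the `bool`/`filter` query form of Elasticsearch 2.0 and later, which may differ from the version used at the time of the talk.

```python
import json


def build_vector_query(text, bbox, start, end, size=100):
    """Build a request body filtering by text, time range, and bounding box.

    bbox is (west, south, east, north) in degrees.
    Field names are hypothetical placeholders, not a real schema.
    """
    west, south, east, north = bbox
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text relevance on a free-text field.
                "must": [{"match": {"text": text}}],
                "filter": [
                    # Time window.
                    {"range": {"timestamp": {"gte": start, "lte": end}}},
                    # Spatial filter: an envelope is [[west, north], [east, south]].
                    {
                        "geo_shape": {
                            "geometry": {
                                "shape": {
                                    "type": "envelope",
                                    "coordinates": [[west, north], [east, south]],
                                },
                                "relation": "intersects",
                            }
                        }
                    },
                ],
            }
        },
    }


# Illustrative values only: an approximate bounding box around Nepal.
body = build_vector_query(
    text="collapsed building",
    bbox=(80.0, 26.3, 88.2, 30.5),
    start="2015-04-25",
    end="2015-05-25",
)
print(json.dumps(body, indent=2))
```

Putting the time and location clauses under `filter` keeps them out of relevance scoring, so only the text match affects ranking.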
     Why Elasticsearch?

•  Because…
-  Geospatial Big Data is even nastier than regular Big Data
-  Everyone draws or generates their own vectors
-  into their own data models
-  using their own schema or no schema at all
-  at different spatial scales based on different imagery
-  at different resolutions and veracity levels
-  with different positional accuracy and currency
-  for their own uses
-  And it's massive scale when you're trying to fuse the stuff
-  And it's seriously duplicative
•  And Elasticsearch can…
-  Cleanly represent heterogeneous data in a common JSON-y way
-  Provide basic analytics over massive data
-  Scale by adding nodes
•  Then we can…
-  Store and index vectors of any structure extremely creatively at ludicrous scale
-  Aggregate, analyze, and discover data spatially
-  Deliver heterogeneous GBD seamlessly into GIS systems and other tools via a slick API that we put over it all
-  Reduce our architecture by feeding tools like Hadoop straight from Elasticsearch
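
One way to picture "heterogeneous data in a common JSON-y way" is a shared envelope around each vector, with the source-specific attributes left untouched inside it. The sketch below is an illustration under that assumption, not DigitalGlobe's actual document model: the function `to_common_document` and every field name are hypothetical, and the sample records are made up.

```python
import json


def to_common_document(source, geometry, properties, ingest_time):
    """Wrap one vector, whatever its origin, in a shared envelope.

    All field names are hypothetical, chosen for illustration only.
    """
    return {
        "source": source,
        "ingest_time": ingest_time,
        "geometry": geometry,            # GeoJSON geometry, any type
        "properties": dict(properties),  # source-specific attributes, kept as-is
        # Flattened attribute values so every record is text-searchable.
        "text": " ".join(str(v) for v in properties.values()),
    }


# Two records with unrelated schemas (made-up sample data).
osm_road = to_common_document(
    source="osm",
    geometry={"type": "LineString", "coordinates": [[85.30, 27.70], [85.32, 27.71]]},
    properties={"highway": "residential", "name": "Main St"},
    ingest_time="2015-05-01T00:00:00Z",
)
social_post = to_common_document(
    source="social",
    geometry={"type": "Point", "coordinates": [85.31, 27.70]},
    properties={"message": "bridge closed", "lang": "en"},
    ingest_time="2015-05-01T00:05:00Z",
)

# One JSON document per line, ready to be sent to an index.
for doc in (osm_road, social_post):
    print(json.dumps(doc, sort_keys=True))
```

Both records end up with the same top-level keys, so a single query over `geometry`, `ingest_time`, or `text` can reach them regardless of where they came from, while the original attributes remain available under `properties`.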
     Thank You

Questions / Comments
Mark.giaconia@digitalglobe.com
@giaconiamark