
FOSS4G NA | Taming Billions of LIDAR Points with GPU Database MapD

LIDAR, or “Light Detection And Ranging”, is a technology that generates 3D point clouds by bouncing laser pulses off objects. It can be deployed from satellite, airplane, drone or ground-mounted sensors, and generates hundreds of millions to billions of data points. Such datasets can overwhelm conventional visualization tools and databases. For example, Florida has comprehensively mapped its coastal zones with aerial LIDAR, producing an open dataset of several billion points. These points are managed as thousands of arbitrarily tiled files, and must often be heavily pre-processed before they can even be opened in a GIS.

The advent of open source GPU databases such as MapD allows us to visualize LIDAR data directly and completely, and speeds up the process of generating useful derivatives. We demonstrate a use case for Florida’s Department of Environmental Protection (DEP). The DEP is currently working on a conservation plan for 16 endangered species, whose habitat extends across more than 400 miles of Florida’s beaches. They need to know not only where built structures exist on or near beaches, but also their ground and roof elevations. This is important in assessing vulnerability to sea level rise, as well as potential light pollution impacts on sea turtles. We demonstrate here how LIDAR data can be combined with parcel geographies to estimate key 3D building characteristics at scale. This technique is not limited to coastal areas, but could prove valuable wherever such datasets are available.

OmniSci

May 15, 2018

Transcript

  1. © MapD 2018
    Taming Billions of LIDAR Points with GPU Database MapD
    Mike Flaxman & Randy Zwitch
    May 15, 2018
  2. Introductions
    Randy Zwitch, Senior Developer Advocate: @randyzwitch, [email protected], /in/randyzwitch/, /randyzwitch
    Mike Flaxman, Founder, Geodesign Technologies: @geodesigntech, [email protected], /in/mflaxman/, /mflaxman10
    slides: https://speakerdeck.com/mapd/
  3. Core Density Makes a Huge Difference
    GPU Processing: 40,000 cores. CPU Processing: 20 cores. (*fictitious example)
    CPU: latency of 1 ns per task (1 task/ns) x 20 cores = throughput of 20 tasks/ns
    GPU: latency of 10 ns per task (0.1 tasks/ns) x 40,000 cores = throughput of 4,000 tasks/ns
    Latency: time to do a task. Throughput: number of tasks per unit time.
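The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's fictitious example; the function name `throughput` is made up here, and it assumes the work is perfectly parallel, which real workloads rarely are.

```python
def throughput(latency_ns: float, cores: int) -> float:
    """Tasks completed per nanosecond, assuming every core stays busy
    and tasks are fully independent (an idealization)."""
    return cores / latency_ns


# Figures taken from the slide's fictitious example.
cpu = throughput(latency_ns=1.0, cores=20)
gpu = throughput(latency_ns=10.0, cores=40_000)

print(f"CPU: {cpu:.0f} tasks/ns")
print(f"GPU: {gpu:.0f} tasks/ns")
print(f"GPU advantage: {gpu / cpu:.0f}x")
```

Each GPU core is ten times slower per task in this example, yet the core count more than makes up for it once the workload is wide enough.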
  4. And Now … Native Geospatial!
    First Data Types
    • POINT
    • LINE
    • POLYGON
    First Functions
    • DISTANCE
    • CONTAINS
    Get Involved
    • Roadmap Being Discussed: MapD (OSS) Working Group, [email protected]
    • Beta Available Now: Email Aaron - [email protected]
  5. Advanced memory management
    Three-tier caching to GPU RAM for speed and to SSDs for persistent storage
    SSD or NVRAM STORAGE (L ) GB to TB - GB/sec
    CPU RAM (L ) GB to TB - GB/sec
    GPU RAM (L ) GB to GB - GB/sec
    Hot Data: Speedup = x to x Over Cold Data
    Warm Data: Speedup = x to x Over Cold Data
    Cold Data
    COMPUTE LAYER
    STORAGE LAYER
    Data Lake/Data Warehouse/System Of Record
  6. Next Steps
    • Catch up with us at FOSS4G! Visit our table any time to see our latest demos. Thurs 2-5pm: MapD for Analysts Workshop
    • github.com/mapd: OSS repo
    • mapd.cloud: Get a MapD instance in less than 60 seconds
    • community.mapd.com: Ask questions and share your experiences
  7. Thank you! Any questions?
    Randy Zwitch, Senior Developer Advocate: @randyzwitch, [email protected], /in/randyzwitch/, /randyzwitch
    Mike Flaxman, Founder, Geodesign Technologies: @geodesigntech, [email protected], /in/mflaxman/, /mflaxman10
    slides: https://speakerdeck.com/mapd/
  8. The Opportunity
    LIDAR (light detection and ranging) is a form of “reality capture” which brings full 3D data into computational environments
  9. The Challenges
    ❖ LIDAR data is typically HUGE
    ❖ LIDAR is not natively well-structured
    ❖ Conventional desktop tools require massive downloads & processing
  10. Why Use a GPU Database?
    ❖ Can handle millions to billions of points natively
    ❖ Can “cross-filter” to conduct quality assurance
    ❖ Persist and store “only the good stuff” for efficient access
    ❖ Eliminate “piles of tiles” file management
    ❖ Run large queries faster than PostGIS
  11. Bringing 3 Worlds Together
    LIDAR 3D Point Clouds: collection scan order, specialized file formats
    GIS Data: many projections, specialized file formats
    Tabular Geo Data: two projections, new geo ops
  12. MapD Geodata & Ops
    ❖ MapD is currently being extended to support native vector GIS datatypes
      ❖ Points, Lines, Polygons
    ❖ New Geo Functions also available
      ❖ ST_Distance, ST_Contains
    ❖ Conventional GIS works fine for 100K-1M records
    ❖ MapD starts to shine at 1M+
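To make the containment operation concrete, here is a small pure-Python sketch of a point-in-polygon test using ray casting. It illustrates the general idea behind a function like ST_Contains; it is not MapD's implementation, and boundary handling in a real database may differ. The names `contains` and `count_points_in_parcel` are invented for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def contains(polygon: List[Point], point: Point) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray
    from the point crosses; an odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's y value can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_in_parcel(parcel: List[Point], points: List[Point]) -> int:
    """Number of LIDAR returns that fall within one parcel polygon."""
    return sum(1 for p in points if contains(parcel, p))


# Hypothetical square parcel and three sample points.
parcel = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
samples = [(5.0, 5.0), (15.0, 5.0), (1.0, 9.0)]
print(count_points_in_parcel(parcel, samples))
```

A loop like this is fine for thousands of points; at the million-plus scale the slide mentions, the same test is what a GPU database would run in parallel across all rows.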
  13. Case Study: Lake Tahoe
    ❖ Tahoe is a beautiful and highly-valued place
    ❖ Looks like a national park…
    ❖ But contains 50,000 buildings
    ❖ It is also highly prone to fire
  14. LIDAR for Tahoe
    ❖ In 2011, the Tahoe Regional Planning Authority (TRPA) commissioned LIDAR at 1 ft resolution
    ❖ Given basin size, this is roughly 12 billion points
    ❖ All management agencies have GIS
    ❖ But few can make, or have made, good use of this information
    ❖ Why? Too big, too weird, too dis-integrated
  15. Tahoe Project Context
    ❖ Current Fire Risk Maps are “Landscape Scale”
    ❖ Based on National 30m Landsat Classifications
    ❖ Ignore individual structures in the woods
    ❖ Tahoe’s fire risk is all about houses in the woods
    Los Angeles Times Syndicate Photo, Don Barletti
  16. Project Goals
    ❖ Ultimately, to build better fire models for “Wildland Urban Intermix”
    ❖ To start - develop an open database including urban forest structural characteristics
    ❖ Develop FOSS workflows for LIDAR -> MapD
    ❖ Characterize “low, medium & high” vertical vegetation density. “Fuel ladders” are a major concern, since they carry fire from ground into the canopy.
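One way to picture the “low, medium & high” characterization is to bin each LIDAR return by its height above ground and report the share of returns in each stratum. The sketch below is illustrative only: the 2 m and 8 m break points, the 10% share used to flag a possible fuel ladder, and all function names are assumptions made for this example, not values from the project.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical stratum boundaries in metres above ground.
LOW_MAX_M = 2.0
MEDIUM_MAX_M = 8.0


def stratum(height_m: float) -> str:
    """Assign one return to a vertical stratum by height above ground."""
    if height_m < LOW_MAX_M:
        return "low"
    if height_m < MEDIUM_MAX_M:
        return "medium"
    return "high"


def vertical_density(heights_m: List[float]) -> Dict[str, float]:
    """Share of returns in each stratum (all zeros for an empty input)."""
    counts = Counter(stratum(h) for h in heights_m)
    total = len(heights_m)
    return {
        name: (counts[name] / total if total else 0.0)
        for name in ("low", "medium", "high")
    }


def has_fuel_ladder(density: Dict[str, float], min_share: float = 0.1) -> bool:
    """Flag continuous fuel from ground to canopy: every stratum holds
    at least min_share of the returns (threshold is an assumption)."""
    return all(share >= min_share for share in density.values())


density = vertical_density([0.5, 1.0, 3.0, 12.0])
print(density, has_fuel_ladder(density))
```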
  17. Coordinate Systems & LIDAR
    ❖ LIDAR data is typically available in local coordinate systems
    ❖ MapD 3.6 supports Web Mercator & Geographic Coordinates (WGS84 lat/lon)
    ❖ Back-projection into Geocoordinates usually required
    ❖ MapD installs have “GDAL” and “PROJ” libraries available, which can perform required conversions
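For intuition about the two coordinate systems named above, the standard spherical Web Mercator forward formula can be written in a few lines. This sketch only covers WGS84 lon/lat to Web Mercator; converting from a local survey coordinate system needs a full projection library such as PROJ. The function name and the approximate Lake Tahoe coordinates are assumptions for illustration.

```python
import math

# Sphere radius used by Web Mercator (the WGS84 semi-major axis), in metres.
EARTH_RADIUS_M = 6378137.0


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float):
    """Forward spherical Mercator: degrees in, metres out."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y


# Approximate centre of Lake Tahoe (illustrative values only).
x, y = wgs84_to_web_mercator(-120.04, 39.09)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```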
  18. Methods: Data Formats
    ❖ Found that reading LIDAR points is tricky
    ❖ All formats are unique to the data source
    ❖ LAS format is verbose
    ❖ LAZ format has very limited reading support
    ❖ Many libraries have complex dependencies, difficult installs
  19. Results - LIDAR formats
    ❖ GDAL doesn’t natively read/write LIDAR formats
    ❖ Found that “PDAL” is best open source option
    ❖ “libLAS” tools powerful, but Windows-based and not FOSS
    ❖ Library installation a big barrier for “geo” audience (especially when it requires makefile hacking)
    ❖ Best options: running PDAL via Docker or (new) Conda install
  20. Example LIDAR->Geo CSV
    ❖ sudo docker run -v /projects/tahoe/lidar/:/data:z pdal/pdal:1.7 pdal ground --classify -i /data/LID2007_098821_W.las -o /data/LID2007_098821_W_ground.csv -f filters.reprojection --filters.reprojection.out_srs="EPSG:4326"
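The same read, reproject, write-CSV flow can also be expressed as a PDAL pipeline definition, which may be easier to reuse across many tiles. The sketch below only builds the JSON in Python; it uses the `readers.las`, `filters.reprojection` and `writers.text` stages, and it leaves out the ground classification step that the command above performs. The output file name and the `build_pipeline` helper are assumptions for illustration.

```python
import json
from typing import Dict


def build_pipeline(las_path: str, csv_path: str,
                   out_srs: str = "EPSG:4326") -> Dict:
    """Describe a LAS -> reprojected CSV conversion as a PDAL pipeline."""
    return {
        "pipeline": [
            {"type": "readers.las", "filename": las_path},
            # Reproject from the file's own SRS to geographic coordinates.
            {"type": "filters.reprojection", "out_srs": out_srs},
            {
                "type": "writers.text",
                "format": "csv",
                "order": "X,Y,Z",
                "keep_unspecified": "false",
                "filename": csv_path,
            },
        ]
    }


pipeline = build_pipeline(
    "/data/LID2007_098821_W.las",
    "/data/LID2007_098821_W.csv",  # hypothetical output name
)
pipeline_json = json.dumps(pipeline, indent=2)
print(pipeline_json)
```

Saved to a file, a definition like this would typically be run with `pdal pipeline <file>.json`, inside the same Docker container if PDAL is not installed locally.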
  21. PDAL File-based Transfer Options
    ❖ Have either
      ❖ CSV files by LIDAR tile, or
      ❖ GeoJSON by tile
    ❖ Can compress and concatenate Gzip files
  22. Results of File-based Imports
    ❖ GeoJSON roughly twice as verbose as CSV for points
    ❖ MapD recently added the ability to directly read compressed CSV files
    ❖ Both formats compress down to nearly the same size
    ❖ Reading time is nearly identical across all formats
    ❖ No major performance differences by format
    ❖ No reason not to use compression other than compression time
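Concatenating gzip files works because the gzip format allows several compressed members back to back in one file. The sketch below demonstrates this with Python's standard `gzip` module, keeping the CSV header only from the first tile so the combined file has a single header row. The helper name and the tiny sample tiles are made up for illustration; whether a given loader accepts multi-member gzip input should be checked against that loader.

```python
import gzip
from typing import List


def concat_gzip_csv(tiles: List[bytes]) -> bytes:
    """Compress each CSV tile separately and join the gzip members.
    The header line is dropped from every tile after the first."""
    members = []
    for index, csv_bytes in enumerate(tiles):
        if index > 0:
            # Discard everything up to and including the first newline.
            _, _, csv_bytes = csv_bytes.partition(b"\n")
        members.append(gzip.compress(csv_bytes))
    return b"".join(members)


# Two tiny hypothetical tiles with identical headers.
tile_a = b"x,y,z\n1,2,3\n"
tile_b = b"x,y,z\n4,5,6\n"
combined = concat_gzip_csv([tile_a, tile_b])

# gzip.decompress reads all members of a multi-member stream.
print(gzip.decompress(combined).decode())
```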
  23. MapD Import Options
    ❖ Two choices: front-end or back-end load
    ❖ The simplest import method is Immerse “drag and drop” (of any supported geo format)
    ❖ The fastest option is “COPY FROM ... WITH (geo)”
    ❖ COPY table_name FROM 'full_path' WITH (geo='true')
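When loading many tiles, the COPY statement is usually generated by a script rather than typed by hand. Here is a minimal sketch that only builds the SQL string in the form shown on the slide; the helper name, table name and path are hypothetical. Sending the statement to the server (for example through a Python client such as pymapd) is left out, since connection details vary by install.

```python
def copy_from_sql(table: str, path: str, geo: bool = True) -> str:
    """Build a COPY FROM statement in the form shown on the slide."""
    # Allow only simple identifiers so the table name cannot inject SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    escaped_path = path.replace("'", "''")  # escape single quotes
    geo_flag = "true" if geo else "false"
    return f"COPY {table} FROM '{escaped_path}' WITH (geo='{geo_flag}');"


# Hypothetical table and server-side file path.
statement = copy_from_sql("tahoe_lidar", "/data/tile.shp")
print(statement)
```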
  24. Binary GIS Import - Results
    ❖ One standard(ish) LIDAR tile in binary shape file
    ❖ 200 MB, 6.5M points
    ❖ Took about 15 minutes to load over network
    ❖ Same file, COPY FROM takes c. 10 seconds
  25. PDAL -> Shapefile?
    ❖ PDAL can write ESRI binary shape files directly (-o *.shp)
    ❖ MapD can read shape files directly (drag and drop or batch)
    ❖ Found major limitations for PDAL point files->MapD
    ❖ PDAL writes 3D points without attributes
    ❖ Can write one LIDAR channel as a ‘measure’
    ❖ But MapD cannot read 3D points or measures
    ❖ Shapefile format limits files to 2 GB, and MapD cannot ‘append’ geo files
    Currently - Technical Dead End!
  26. PDAL -> MapD Direct?
    ❖ PDAL has a Python API
    ❖ MapD has a Python API
    ❖ Why not pass data directly in-memory?
    Currently - Technical Dead End!
  27. Proof of Concept PDAL Pipeline MapD Writer
    Successfully loaded 42,327,667 records to mapd table "tahoe_lidar_hag"
    Loading data took 68.60 seconds
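The load figures quoted on slides 24 and 27 are easier to compare as records per second. The short calculation below uses only numbers from those slides; the final extrapolation to the full 12 billion point basin is a rough assumption that the proof-of-concept rate would hold at that scale.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Average load rate over the whole run."""
    return records / seconds


# Slide 27: proof-of-concept PDAL pipeline writer.
pipeline_rate = records_per_second(42_327_667, 68.60)
# Slide 24: one 6.5M point tile via COPY FROM (about 10 s)...
copy_rate = records_per_second(6_500_000, 10)
# ...versus the same tile loaded over the network (about 15 min).
network_rate = records_per_second(6_500_000, 15 * 60)

print(f"PDAL pipeline writer: {pipeline_rate:,.0f} records/s")
print(f"COPY FROM:            {copy_rate:,.0f} records/s")
print(f"Network upload:       {network_rate:,.0f} records/s")

# Rough extrapolation only: 12 billion points at the pipeline rate.
hours = 12_000_000_000 / pipeline_rate / 3600
print(f"Full basin at pipeline rate: about {hours:.1f} hours")
```

The pipeline writer and COPY FROM land in the same range, both roughly two orders of magnitude faster than the network upload.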
  28. Case Study Conclusions
    ❖ LIDAR data provides information of significant value for urban forest characterization
    ❖ Our next steps will be to conduct “point in polygon” analyses using ST_Contains() to characterize vegetation/risk per parcel
  29. Future Work
    ❖ MapD Geo Community now forming
    ❖ If you are interested, join us!
    ❖ Our Geo “road map” is under active discussion