An Open Source GIS Architecture for Connected and Linked Data

Presented by
Jerry Hayes
Frank Hardisty

Transcript

  1. World Data Trends
     • Huge increase in data volume in recent years.
     • Ninety percent of the world's data has been generated in the last two years.
     • Much data is unstructured and exhibits "relationships" between objects.
     Figure: Data Volume (Exabytes). Source: IDC.
     Are relational databases the best architecture to meet all future GIS data needs?
  2. Relational and Graph Databases
     • GIS platforms today use relational databases for storing connected network data.
     • Should GIS platforms begin to integrate database architectures?
     Figure: Example transportation network.
  3. Typical Connected Network Model
     • A network model is a directed weighted graph.
     • Physical properties are abstracted as edge costs.
     • Network models are typically in tabular format.
     Figure panels: Physical Model, Network Model.
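To make the "directed weighted graph" idea concrete, here is a minimal Python sketch of a least-cost search (Dijkstra's algorithm) over a toy network. The node names, the costs, and the `shortest_path_cost` helper are illustrative assumptions and do not come from the slides.

```python
import heapq

# Hypothetical toy network: node -> list of (neighbor, edge_cost).
# Edge costs stand in for physical properties such as length or travel time.
GRAPH = {
    "A": [("B", 4.0), ("C", 1.0)],
    "B": [("D", 1.0)],
    "C": [("B", 2.0), ("D", 5.0)],
    "D": [],
}


def shortest_path_cost(graph, start, goal):
    """Return the minimum total edge cost from start to goal, or None if unreachable."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, edge_cost in graph[node]:
            new_cost = cost + edge_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None
```

Because the edges are directed, `shortest_path_cost(GRAPH, "A", "D")` finds a route while the reverse query from "D" to "A" finds none.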
  4. Simple Network Model in Postgres
     • A row represents a graph {road} edge {segment}.
     • Each edge defines a "source" and "target" node.
     • Costs control traversing in forward and reverse.
     Figure: Postgres Network Table (annotation: Implied direction).
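The row-per-edge layout described above can be sketched in Python as follows. The column order, the sample rows, and the convention that a negative cost marks a direction as not traversable are assumptions made for illustration; the slide only says that costs control forward and reverse traversal.

```python
from collections import defaultdict

# Hypothetical rows: (edge_id, source, target, cost, reverse_cost).
ROWS = [
    (1, 10, 20, 5.0, 5.0),   # two-way segment
    (2, 20, 30, 3.0, -1.0),  # one-way segment (assumed: negative = not traversable)
    (3, 30, 10, 2.0, 4.0),   # two-way segment with different costs per direction
]


def build_adjacency(rows):
    """Expand tabular edge rows into a directed adjacency list."""
    adjacency = defaultdict(list)
    for edge_id, source, target, cost, reverse_cost in rows:
        if cost >= 0:
            # Forward direction: source -> target.
            adjacency[source].append((target, cost, edge_id))
        if reverse_cost >= 0:
            # Reverse direction: target -> source.
            adjacency[target].append((source, reverse_cost, edge_id))
    return dict(adjacency)
```

Each table row yields up to two directed edges, which is one way to read the "implied direction" of a source/target pair.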
  5. Traversing Network Models using SQL
     • Use 'WITH RECURSIVE' queries in Postgres.
     • Deep traversal depths are prohibitive!
     • Performance scales poorly with graph size!
     Figures: Postgres Recursive SQL Query; Postgres Recursive SQL Performance.
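The slide's actual query is not captured in the transcript. As a rough illustration of a depth-limited `WITH RECURSIVE` traversal, the sketch below uses Python's built-in `sqlite3` module as a stand-in for Postgres so that it runs without a server. The `edges` table, its columns, and the `reachable_within` helper are assumptions, not the presenters' schema.

```python
import sqlite3


def reachable_within(conn, start, max_depth):
    """Return {node: minimum hop count} for nodes reachable within max_depth hops."""
    query = """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.target, w.depth + 1
            FROM walk AS w
            JOIN edges AS e ON e.source = w.node
            WHERE w.depth < ?
        )
        SELECT node, MIN(depth) FROM walk GROUP BY node ORDER BY node
    """
    return dict(conn.execute(query, (start, max_depth)).fetchall())


# Toy one-way ring 1 -> 2 -> 3 -> 4 -> 1; the depth bound stops the cycle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (source INTEGER, target INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?, ?)",
    [(1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 1, 1.0)],
)
```

Each recursion step joins the current frontier back against the edge table, so the number of intermediate rows can grow quickly with depth on well-connected graphs, which is consistent with the slide's warning about deep traversals.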
  6. Some Open Source Alternatives for Traversing
     Column labels: Uses Non-Native Graph Storage; Uses Native Graph Storage.
     • Issues to consider when selecting an alternative:
       o Size of dataset
       o Expected depth of traversing
       o Number of concurrent users
       o Degree of "connectivity"
       o Complexity of relationships
  7. Storage & Build Time Comparisons
     • Hard Drive Storage Advantage
       o pgRouting was an order of magnitude smaller.
     • Database Build Time Advantage
       o Neo4j was much faster.
     Figures: Hard Drive Storage; Database Build Time.
  8. Runtime Memory Usage Comparisons
     • Concurrent Users Advantage
       • pgRouting does not share memory between threads.
     • Runtime Memory Advantage
       • pgRouting reloads entire dataset for each traversal.
     Figures: pgRouting Memory History; Neo4j Memory History.
  9. Traversal Performance Comparisons
     • Speed vs Dataset Size Advantage
       • Neo4j is constant with size… pgRouting degrades with size.
     • Traversal Depth Advantage … depends on application.
       • For deep traversals on small datasets
       • For shallow traversals on large datasets
  10. Performance Query Statistics
      Figure labels: Neo4j; Postgres; Cold Cache.
      • Neo4j outperformed Postgres in nearly all cases.
      • Cold cache observed in 10% of Neo4j queries.
      • Impact of Cold Cache was moderate to severe.
  11. Linked Data … the Next Web Frontier
      § Connects data objects on the Semantic Web.
      § Each data object is uniquely identified with a URI.
      § Links describe relationships between data.
      § Relationships enable automated data discovery.
      § Traversal depths are typically shallow.
      Figure labels: Links; Data.
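As a small illustration of link-following discovery, the Python sketch below stores relationships as (subject, predicate, object) URI triples and follows them to a shallow depth. The `example.org` URIs, the predicate names, and the `discover` helper are all made up for this example.

```python
# Hypothetical triples (subject URI, predicate URI, object URI).
TRIPLES = [
    ("http://example.org/road/1", "http://example.org/rel/connectsTo", "http://example.org/road/2"),
    ("http://example.org/road/2", "http://example.org/rel/crosses", "http://example.org/river/9"),
    ("http://example.org/river/9", "http://example.org/rel/flowsInto", "http://example.org/lake/3"),
]


def discover(triples, start_uri, max_depth=2):
    """Follow outgoing links breadth-first; return {uri: depth} within max_depth links."""
    found = {start_uri: 0}
    frontier = [start_uri]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for subject, _predicate, obj in triples:
            if subject in frontier and obj not in found:
                found[obj] = depth
                next_frontier.append(obj)
        frontier = next_frontier
    return found
```

With the default depth of 2, starting from `road/1` reaches `road/2` and `river/9` but stops before `lake/3`, mirroring the shallow traversals the slide describes.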
  12. Open Source System Architecture
      § Server side is stateless.
      § PostGIS used for …
        • Storing physical model
        • Data visualization
      § Neo4j used for …
        • Storing logical model
        • Graph traversals
      Implemented in the IBM Cloud!!!
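The split between a physical store and a logical store behind a stateless server can be sketched as below. The two dictionaries are in-memory stand-ins for PostGIS and Neo4j, and the request shape and `handle_request` function are hypothetical; nothing here reflects the presenters' actual API.

```python
# In-memory stand-ins for the two stores (assumed data, for illustration only).
PHYSICAL_STORE = {  # plays the role of PostGIS: edge id -> geometry
    1: {"geometry": [(0.0, 0.0), (1.0, 0.0)]},
    2: {"geometry": [(1.0, 0.0), (1.0, 1.0)]},
}
LOGICAL_STORE = {1: [2], 2: []}  # plays the role of Neo4j: edge id -> next edge ids


def handle_request(request):
    """Stateless handler: the request carries everything needed; nothing persists between calls."""
    kind = request["kind"]
    edge_id = request["edge_id"]
    if kind == "geometry":
        # Physical-model question: answered from the geometry store.
        return {"edge_id": edge_id, "geometry": PHYSICAL_STORE[edge_id]["geometry"]}
    if kind == "neighbors":
        # Logical-model question: answered from the graph store.
        return {"edge_id": edge_id, "neighbors": LOGICAL_STORE[edge_id]}
    raise ValueError(f"unknown request kind: {kind}")
```

Because the handler keeps no session state, identical requests return identical answers regardless of which server instance receives them.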
  13. Servlet Architecture
      § Provides RESTful API.
      § Enables spatial analytics.
      § Enables "data" discovery.
      § Integrates physical and logical model processing.
      Implemented in the IBM Cloud!!!