Graph Databases for SQL Server Professionals

Presented @ The Ottawa SQL Server User Group (Ottawa PASS Chapter)

Graph databases are used to represent graph structures with nodes, edges and properties. Neo4j, an open-source graph database, is reliable and fast for managing and querying highly connected data. This session explores how to install and configure Neo4j, create nodes and relationships, query with the Cypher Query Language, import data, and use Neo4j in concert with SQL Server, providing answers and insight, with visual diagrams, about the connected data you have in your SQL Server databases.

Stéphane Fréchette

September 18, 2014

Transcript

  1. Graph Databases for SQL Server Professionals
     Stéphane Fréchette
     Thursday, September 18, 2014
  2. Who am I?
     My name is Stéphane Fréchette.
     SQL Server MVP | Consultant | Speaker | Database & BI Architect | NoSQL.
     Drums, good food and fine wine. Founder @ukubu, @GatineauOuverte, @TEDxGatineau
     I have a passion for architecting, designing and building solutions that matter.
     Twitter: @sfrechette
     Blog: stephanefrechette.com
     Email: [email protected]
  3. Session Outline
     • What is a Graph?
     • What is Neo4j?
     • Data Modeling – The Property Graph
     • Cypher Query Language
     • Importing Data…
     • Use Cases
     • Demos
     • Resources
  4. What is Neo4j?
     An open-source graph database by Neo Technology. Neo4j stores data in nodes connected by directed, typed relationships, with properties on both. This model is also known as a Property Graph.
     • Fully ACID compliant
     • Massively scalable, up to several billion nodes/relationships/properties
     • Highly available when distributed across multiple machines
     • Accessible through a convenient REST interface or an object-oriented Java API
  5. Data Modeling
     From SQL Server to Graph
     Property Graph
  6. Example: Meetup Data In SQL Server
     Member
       ID  Member
       1   Daniel
       2   Stephane
       3   John
       4   Randy
     Meetup
       ID  Name
       1   Ottawa SQL Server User Group
       2   Ottawa JavaScript
       3   Ottawa Visio User Group
       4   Ottawa Tableau User Group
       5   Dirty Dancing Ottawa
     MeetupOrganizer
       MemberID  MeetupID
       2         1
       1         2
       3         3
       2         4
       3         5
     MeetupMember
       MemberID  MeetupID
       3         1
       3         2
       4         2
       4         4
       1         5
  7. Example: Meetup Data In a Graph
     [Diagram: Member nodes connected to Meetup nodes by IS_ORGANIZER and IS_MEMBER relationships]
     Member nodes (name): 'Daniel', 'Stephane', 'John', 'Randy'
     Meetup nodes (name): 'Ottawa SQL Server User Group', 'Ottawa JavaScript', 'Ottawa Visio User Group', 'Ottawa Tableau User Group', 'Dirty Dancing Ottawa'
     Relationship types: IS_ORGANIZER, IS_MEMBER
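The move from the four tables to a property graph can be sketched in Python. This is a minimal illustration rather than anything shown in the deck: the dictionary layout and the `to_graph` helper are assumptions made for the example, and the two link tables are read in the order they appear in the SQL Server slide (organizers first, members second).

```python
# Rows from the Meetup example, as they would come out of SQL Server.
members = {1: "Daniel", 2: "Stephane", 3: "John", 4: "Randy"}
meetups = {
    1: "Ottawa SQL Server User Group",
    2: "Ottawa JavaScript",
    3: "Ottawa Visio User Group",
    4: "Ottawa Tableau User Group",
    5: "Dirty Dancing Ottawa",
}
# (MemberID, MeetupID) pairs from the two link tables.
meetup_organizer = [(2, 1), (1, 2), (3, 3), (2, 4), (3, 5)]
meetup_member = [(3, 1), (3, 2), (4, 2), (4, 4), (1, 5)]


def to_graph(members, meetups, organizers, memberships):
    """Turn entity rows into nodes and link-table rows into typed relationships.

    Hypothetical helper: the node and relationship shapes are illustrative only.
    """
    nodes = {}
    for member_id, name in members.items():
        nodes[("Member", member_id)] = {"label": "Member", "name": name}
    for meetup_id, name in meetups.items():
        nodes[("Meetup", meetup_id)] = {"label": "Meetup", "name": name}

    relationships = []
    for rel_type, rows in (("IS_ORGANIZER", organizers), ("IS_MEMBER", memberships)):
        for member_id, meetup_id in rows:
            # Each link-table row becomes one directed, typed relationship.
            relationships.append((("Member", member_id), rel_type, ("Meetup", meetup_id)))
    return nodes, relationships


nodes, relationships = to_graph(members, meetups, meetup_organizer, meetup_member)
print(len(nodes), "nodes,", len(relationships), "relationships")
```

The point of the sketch is that the join tables disappear: every row that existed only to connect two IDs becomes a relationship with a type of its own.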
  8. Cypher Query Language
     Cypher is a declarative graph query language that allows for expressive and efficient querying and updating of the graph store.
     • Pattern matching
     • Declarative: what to retrieve, not how to retrieve it
     • Inspired by other well-known languages (SQL, SPARQL, Haskell, Python)
     • Aggregation, ordering, limit
     • Update the graph
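To make "what to retrieve, not how" concrete, here is a hedged sketch. The Cypher text in the comment is an illustrative query against the Meetup example, not one taken from the slides, and `meetups_organized_by` is a hypothetical Python stand-in that spells out by hand the filtering, ordering and limiting that the single pattern expresses.

```python
# Illustrative Cypher (labels and property names assumed from the Meetup example):
#   MATCH (m:Member)-[:IS_ORGANIZER]->(g:Meetup)
#   WHERE m.name = 'Stephane'
#   RETURN g.name ORDER BY g.name LIMIT 10

# A tiny in-memory graph: (member name, relationship type, meetup name).
edges = [
    ("Stephane", "IS_ORGANIZER", "Ottawa SQL Server User Group"),
    ("Stephane", "IS_ORGANIZER", "Ottawa Tableau User Group"),
    ("Daniel", "IS_ORGANIZER", "Ottawa JavaScript"),
    ("John", "IS_MEMBER", "Ottawa SQL Server User Group"),
]


def meetups_organized_by(edges, member_name, limit=10):
    """Hand-written equivalent of the pattern above: filter, sort, limit."""
    matches = [
        meetup
        for member, rel_type, meetup in edges
        if rel_type == "IS_ORGANIZER" and member == member_name
    ]
    return sorted(matches)[:limit]


print(meetups_organized_by(edges, "Stephane"))
```

In the Python version the traversal strategy is fixed by the code; in Cypher the pattern only describes the shape of the result and the database decides how to find it.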
  9. Cypher and T-SQL
     Cypher also has a number of keywords that have a direct equivalent in SQL, which makes it a curiously familiar language.
     • WHERE
     • ORDER BY
     • LIMIT
     • SUM, COUNT, STDEVP, MIN, MAX etc…
     • LTRIM, UPPER, LOWER, REPLACE, LEFT, RIGHT, SUBSTRING
     • DISTINCT
     • CASE
     (SQL Server Pros)-[:WILL_LOVE]->(Cypher)
  10. Importing Data…
     Some important considerations…
     Different import scenarios
     • Dataset size: 1000s, 100000s, 10000000s
     • Dataset format (source): Database, File (CSV, Spreadsheet, GraphML, Geoff), Service, Other
     • Import type: Initial Bulk Load, Incremental Load, Initial Bulk Load + Incremental Load
     Different import tools
     • Spreadsheet based
     • Neo4j-shell based: (Cypher, neo4j-shell-tools, Cypher LOAD CSV)
     • Command-line based: Batch Importer
     • Neo4j Browser based
     • ETL Tools: (Talend, MuleSoft, Pentaho Kettle)
     • Custom software: (Java API, REST API, Spring Data Neo4j)
  11. Many different mappings
     Not always clear what you should be using
     Depends on your skill set, dataset size… (lots of other stuff)
     Choose wisely!
     [Diagram: Import Scenarios mapped to Import Tools]
  12. Importing using Spreadsheets
     Very small datasets (< 1000), easy to use
     Format data in spreadsheet → Generate Cypher statements with formulas → Copy and execute Cypher in Neo4j browser
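The spreadsheet technique amounts to string concatenation, one row at a time. The sketch below does the same thing in Python instead of a spreadsheet formula. The helper names `cypher_literal` and `create_clause` are hypothetical, and the `id` and `name` property names are assumptions carried over from the Meetup example.

```python
def cypher_literal(value):
    """Render a Python value as a Cypher literal (integers bare, everything else quoted)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value)
    # Escape backslashes first, then single quotes, so the result stays a valid string.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def create_clause(label, row):
    """Build one CREATE clause for a row, as a spreadsheet formula would per line."""
    props = ", ".join(f"{key}: {cypher_literal(val)}" for key, val in row.items())
    return f"CREATE (:{label} {{{props}}})"


rows = [
    {"id": 1, "name": "Daniel"},
    {"id": 2, "name": "Stephane"},
]
# One clause per row, joined by newlines, ready to paste and run.
script = "\n".join(create_clause("Member", row) for row in rows)
print(script)
```

Escaping matters even at this scale: a single quote inside a name would otherwise end the string early, which is easy to miss when the statements are built by a formula.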
  13. Importing using neo4j-shell-tools
     Small to medium size datasets
     https://github.com/jexp/neo4j-shell-tools
     Format data in CSV files → Create import-cypher commands for neo4j-shell-tools → Execute commands from neo4j-shell
  14. Importing using LOAD CSV
     Native Cypher
     Format data in CSV files → Create "LOAD CSV" commands → Execute command from neo4j-shell or browser → Additional "cleanup" for Labels and RelTypes
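The first two steps of the LOAD CSV flow can be sketched as follows. This is an assumption-laden illustration: the file name `members.csv`, the `id` and `name` columns, and both helper functions are hypothetical, and the exact file URL form that a given Neo4j version accepts should be checked against its documentation.

```python
import csv
import tempfile
from pathlib import Path


def write_members_csv(path, rows):
    """Step 1: write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)


def load_csv_statement(path):
    """Step 2: build a LOAD CSV statement for that file.

    LOAD CSV yields every field as a string, so id is stored as text here.
    """
    return (
        f"LOAD CSV WITH HEADERS FROM '{Path(path).as_uri()}' AS row\n"
        "CREATE (:Member {id: row.id, name: row.name})"
    )


rows = [{"id": 1, "name": "Daniel"}, {"id": 2, "name": "Stephane"}]
csv_path = Path(tempfile.gettempdir()) / "members.csv"
write_members_csv(csv_path, rows)
print(load_csv_statement(csv_path))
```

The printed statement is what would then be executed from neo4j-shell or the browser. Unlike the spreadsheet approach, the data stays in the file and only one statement is issued for all rows.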
  15. Importing using Batch Importer
     Non-transactional import, suited for very, very large datasets
     Format data in TSV files → Execute Batch Import command → Copy store files to Neo4j Server directory → Start Neo4j Server with generated store files
  16. Use Cases
     Principal uses of graph databases include:
     • Network and Data Center Management (Queries: Impact Analysis, Root Cause Analysis, Quality-of-Service Mapping, Asset Management)
     • Authorization and Access (Queries: Access Management, Interconnected Group Organization, Provenance)
     • Social (Queries: Friend Recommendations, Sharing & Collaboration, Influencer Analysis)
     • Geo (Queries: Routing, Logistics, Capacity Planning)
     • Recommendations (Queries: Product, Social, Service, and Professional Recommendations)
     • Fraud Detection
     http://www.neotechnology.com/neo4j-use-cases/
  17. Resources
     • Neo Technology http://www.neotechnology.com/
     • Neo4j.org (Learn, Develop, Downloads,…) http://www.neo4j.org/
     • Neo4j on Vimeo http://vimeo.com/neo4j
     • Neo4j on SlideShare http://www.slideshare.net/neo4j
     • Neo4j on Github https://github.com/neo4j
     • Neo4j Cypher Cheat Sheet http://docs.neo4j.org/refcard/2.1/
     • Neo4j Graph Database as a Service http://www.graphenedb.com/
     • Linkurious – The easiest way to explore graph databases http://linkurio.us/
     • KeyLines – Visualize dynamic networks http://keylines.com/
     • Experiments with NEO4J: Using a graph database as a SQL Server metadata hub http://bit.ly/V2PrxN
     • Kenny Bastani http://www.kennybastani.com/
     • Rik Van Bruggen http://blog.bruggen.com/
     • Max de Marzi http://maxdemarzi.com/
     • Better Software Development http://jexp.de/blog/
     • Graph Databases (Free Book) http://graphdatabases.com/
     • Neo4j GraphGist http://gist.neo4j.org/
     • GraphConnect Conference http://graphconnect.com/
     • Titan – Distributed Graph Database https://thinkaurelius.github.io/titan/
     • InfiniteGraph http://www.infinitegraph.com/
     • OrientDB http://www.orientechnologies.com/
     • Cayley by Google https://github.com/google/cayley