Design the graph data with nosql

A brief overview of what NoSQL is, how it differs from SQL, what graph data is, and several ways to implement graph data on top of NoSQL.

vonstark

May 09, 2012

Transcript

  1. Programmer @ NeoArk

    A newbie, but I like: Ruby, Rails, Neo4j, Riak, scaling...
    Contact: @vonstark32
    http://nosql.org.tw
    http://neo4j.tw
    http://vonstark.co
    Wednesday, May 9, 12
  2. SQL?

    Example: Online Game - Bid
    Scenario: Looking for trustworthy & affordable equipment
    Description:
      1. In this town
      2. Seller has a reputation of more than 3
      3. Recommended by friends or members of the same guild
      4. Attack 80+
      5. Ordered by bids
  3. SQL Columns

    Equipment
      Id | Item_name    | Attack | Require Level
      1  | Golden Knife | 87     | 52
      2  | Silver Knife | 77     | 50

    TownAuction
      Id  | Item_id | Town_id | Seller_id
      355 | 1       | 1       | 5
      356 | 2       | 2       | 6

    MemberReputation
      Id | member_id | receive_id
      10 | 5         | 1
      15 | 2         | 2

    MemberRecommend
      Id | member_id | receive_id
      13 | 7         | 5
      22 | 9         | 5

    Member (the slide labels one entry "ME")
      Id
      1
      2
      3
      4
      5
      6

    MemberFriend
      member_id | receive_id
      1         | 3
      3         | 1
      3         | 5
      4         | 5
      5         | 3
      5         | 4

    MemberGuild
      member_id | guild_id
      1         | 8
      2         | 8
      3         | 5
      4         | 5
      5         | 7
      6         | 8

    MemberBid
      Id  | member_id | auction_id
      99  | 33        | 5
      102 | 39        | 5

    Auction
      Id  | member_id | auction_id
      99  | 33        | 5
      102 | 39        | 5
  4. Or?

    SELECT a.inV
    FROM graph as a
    WHERE a.outV=?

    SELECT b.inV
    FROM graph as a, graph as b
    WHERE a.inV=b.outV AND a.outV=?

    SELECT c.inV
    FROM graph as a, graph as b, graph as c
    WHERE a.inV=b.outV AND b.inV=c.outV AND a.outV=?

    SELECT d.inV
    FROM graph as a, graph as b, graph as c, graph as d
    WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND a.outV=?

    SELECT e.inV
    FROM graph as a, graph as b, graph as c, graph as d, graph as e
    WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND d.inV=e.outV AND a.outV=?
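The queries on this slide add one self-join per hop. A minimal sketch of that pattern follows, assuming an in-memory SQLite table named `graph` with `outV` and `inV` columns as on the slide. The helper names `k_hop_query` and `k_hop` and the sample edges are hypothetical, not part of the talk.

```python
import sqlite3


def k_hop_query(k):
    """Build the SQL for vertices exactly k edges away from a start vertex."""
    if k < 1:
        raise ValueError("k must be at least 1")
    aliases = [f"g{i}" for i in range(k)]
    tables = ", ".join(f"graph AS {a}" for a in aliases)
    # Chain the hops: the head of one edge is the tail of the next.
    joins = [f"{a}.inV = {b}.outV" for a, b in zip(aliases, aliases[1:])]
    where = " AND ".join(joins + [f"{aliases[0]}.outV = ?"])
    return f"SELECT {aliases[-1]}.inV FROM {tables} WHERE {where}"


def k_hop(conn, start, k):
    """Run the k-hop query and return the distinct end vertices, sorted."""
    rows = conn.execute(k_hop_query(k), (start,)).fetchall()
    return sorted({row[0] for row in rows})


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (outV INTEGER, inV INTEGER)")
# Hypothetical edges: 1 -> 2 -> 3 -> 4, plus 1 -> 5
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)", [(1, 2), (2, 3), (3, 4), (1, 5)]
)

print(k_hop(conn, 1, 1))  # [2, 5]
print(k_hop(conn, 1, 2))  # [3]
print(k_hop(conn, 1, 3))  # [4]
```

Each extra hop adds another copy of the table to the FROM clause, which is the cost the next slide refers to as deep-level joins.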
  5. Problems?

    • Implicit graph
    • Inflexible schema
    • Complex data structure
    • Deep traversals (many joins) are hard to write and slow
    • Hard to scale
  6. If you still want to use MySQL with Graph....

    http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
    http://rgl.rubyforge.org/rgl/index.html
    http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
    http://www.slideshare.net/PerconaPerformance/trees-and-more-with-post q-l
  7. Definition

    G = (V, E), where
    V represents the set of vertices (nodes)
    E represents the set of edges (links)
    Both vertices and edges may contain additional information
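One way to picture G = (V, E) with additional information on both vertices and edges is a small in-memory structure. This is only an illustrative sketch: the `Graph` class and its method names are hypothetical, and the sample values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """G = (V, E); vertices and edges both carry a property dict."""

    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: list = field(default_factory=list)     # (source, target, properties)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, source, target, **props):
        # An edge may only connect vertices that are already in V.
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be in V")
        self.edges.append((source, target, props))

    def out_neighbors(self, vid):
        return [t for s, t, _ in self.edges if s == vid]


g = Graph()
g.add_vertex("A", age=23)
g.add_vertex("B", age=18)
g.add_edge("A", "B", label="my_friend", via="Web Meetup")
print(g.out_neighbors("A"))  # ['B']
```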
  8. Traditional Graph

    • Graph: Neo4j, Dex, Sones, Infinite, Allegro...
    • Pros: Traversal is fast and easy. Less modeling.
    • Cons: Scalable? Fail-over? Who uses it?
  9. RDF (Resource Description Framework)

    Subject  Predicate (property)  Object

    Examples:
      1) <#i> <#love> <#you>.
      2) <#i> <#10billion> <#wallet>.
      3) <#i> have <#10billion> in my <#wallet>.
      4) <#sara> <#age> 24; <#tall> 189 .
         <#andy> <#age> 22; <#tall> 180 .
         <#peter> <#age> 33; <#tall> 175 .
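The subject/predicate/object shape maps naturally onto 3-tuples. The sketch below stores example 4 as plain Python tuples and matches them against a pattern with wildcards. The `match` helper is hypothetical and is not an RDF library API.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Example 4 from the slide, written as (subject, predicate, object) tuples.
triples = [
    ("sara", "age", 24), ("sara", "tall", 189),
    ("andy", "age", 22), ("andy", "tall", 180),
    ("peter", "age", 33), ("peter", "tall", 175),
]

print(match(triples, subject="sara"))
# [('sara', 'age', 24), ('sara', 'tall', 189)]
print([s for s, _, o in match(triples, predicate="tall") if o >= 180])
# ['sara', 'andy']
```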
  10. Edge & Vertex

    Vertex A: Age: 23, Gender: male, Job: Engineer
    Vertex B: Age: 18, Gender: female, Job: Designer
    Edge my_friend: started_from: 2010.05.07, via: Web Meetup
    Edge my_friend: started_from: 2011.03.05, via: Facebook Request
  11. Hyper Graph

    • Graph: HyperGraph, Neo4j
    • Pros: Higher-order, n-ary relations
    • Cons: Who uses it?
  12. Map/Reduce Graph

    • Most big-data or key/value stores + Map/Reduce (Hadoop, CouchDB, MongoDB, Cassandra, Riak...)
    • Pros: Easy to scale, proven example (Hadoop)
    • Cons: Map/Reduce is hard to write, and it is not really meant for graph processing
  13. Scenario

    FoF (friends of friends) who live in Taipei and also like Jazz
    Diagram: vertices M, Taipei, Jazz; edges Know, Live, Like
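Before turning to any particular database, the scenario can be sketched with plain dictionaries. All names and sample data below are hypothetical. Unlike a bare two-hop traversal, this sketch also excludes the person and their direct friends from the result.

```python
def friends_of_friends(know, person):
    """People two 'Know' hops away, excluding the person and direct friends."""
    direct = know.get(person, set())
    fof = set()
    for friend in direct:
        fof |= know.get(friend, set())
    return fof - direct - {person}


def fof_matching(know, live, like, person, city, interest):
    """Friends of friends who live in `city` and like `interest`."""
    return sorted(
        p
        for p in friends_of_friends(know, person)
        if live.get(p) == city and interest in like.get(p, set())
    )


# Hypothetical sample data: M knows a and b, who in turn know c, d and e.
know = {"M": {"a", "b"}, "a": {"M", "c", "d"}, "b": {"d", "e"}}
live = {"c": "Taipei", "d": "Taipei", "e": "Tainan"}
like = {"c": {"Jazz"}, "d": {"Rock"}, "e": {"Jazz"}}

print(fof_matching(know, live, like, "M", "Taipei", "Jazz"))  # ['c']
```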
  14. Introducing Riak

    • Document-oriented key/value store
    • Open source
    • Based on Amazon Dynamo; the Riak ring makes it easy to scale
    • A(vailability), P(artition tolerance) & eventual consistency
    • HTTP / Protocol Buffers interface
    • Written in Erlang (stable & highly concurrent)
    http://wiki.basho.com
  15. Introducing Neo4j

    • Graph data model
    • Open source under GPL v3
    • Master/slave (horizontal scaling is planned)
    • A(tomicity), C(onsistency), I(solation), D(urability)
    • Various clients & Cypher, Gremlin, SPARQL support
    • Written in Java, cross-platform
    http://neo4j.org
  16. Introducing Redis

    • In-memory key/value store (persisted by dumping to a file on disk)
    • C(onsistency), P(artition tolerance) & A(vailability) with some hacks
    • Master/slave replication
    • Sponsored by VMware!!!
    • Various clients & Map/Reduce with Lua
    • Written in ANSI C; works on Linux, *BSD, OS X
    http://redis.io
  17. Redis Graph

    http://amix.dk/blog/post/19592

        from redis_wrap import *

        #--- Edges ----------------------------------------------
        def add_edge(from_node, to_node, system='default'):
            # The outgoing edges of a node are kept in one set named after the node.
            edges = get_set( from_node, system=system )
            edges.add( to_node )

        def delete_edge(from_node, to_node, system='default'):
            edges = get_set( from_node, system=system )
            key_node_y = to_node
            if key_node_y in edges:
                edges.remove( key_node_y )

        def has_edge(from_node, to_node, system='default'):
            edges = get_set( from_node, system=system )
            return to_node in edges

        def neighbors(node_x, system='default'):
            return get_set( node_x, system=system )

        #--- Node values ----------------------------------------------
        def get_node_value(node_x, system='default'):
            # Node values are plain keys with an 'nv:' prefix.
            node_key = 'nv:%s' % node_x
            return get_redis(system).get( node_key )

        def set_node_value(node_x, value, system='default'):
            node_key = 'nv:%s' % node_x
            return get_redis(system).set( node_key, value )

        #--- Edge values ----------------------------------------------
        def get_edge_value(edge_x, system='default'):
            # Edge values are plain keys with an 'ev:' prefix.
            edge_key = 'ev:%s' % edge_x
            return get_redis(system).get( edge_key )

        def set_edge_value(edge_x, value, system='default'):
            edge_key = 'ev:%s' % edge_x
            return get_redis(system).set( edge_key, value )
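With only `add_edge` and `neighbors`, a traversal has to be written by hand on the client side. The sketch below shows a breadth-first search on top of those two operations. To stay self-contained it swaps the Redis sets for an in-memory dict of sets, so `_edges` and `within_hops` are hypothetical stand-ins rather than part of the linked code.

```python
from collections import defaultdict, deque

# In-memory stand-in for the per-node Redis sets (an assumption for this sketch).
_edges = defaultdict(set)


def add_edge(from_node, to_node):
    _edges[from_node].add(to_node)


def neighbors(node):
    return _edges.get(node, set())


def within_hops(start, max_hops):
    """Breadth-first search: map each reachable node to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


add_edge("a", "b")
add_edge("b", "c")
add_edge("c", "d")
add_edge("a", "c")

print(sorted(within_hops("a", 2).items()))
# [('a', 0), ('b', 1), ('c', 1), ('d', 2)]
```

Against a real Redis server every `neighbors` call would be a network round trip, which is one reason deep traversals on a key/value store get expensive.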
  18. Introducing Gremlin

    • Syntax based on XPath
    • Supports every graph database that implements Blueprints
    • Native support for Java, Scala, Groovy
    http://gremlin.tinkerpop.com/
  19. Some basics

    • g => Client
    • g.V => Vertices
    • g.E => Edges
    • g.id => identifier of an element
    • .out(E/V) => outgoing vertices/edges
    • .in(E/V) => incoming vertices/edges
    • .both => both vertices/edges
    • .filter => filter with conditions
    • .has => allow if the element has the property
    • .hasNot => allow if the element does not have the property
    • .back => go back to the results of n steps ago
    • .or => emit if any pipe matches
    • .and => emit if all pipes match
    • .as => names the previous step
  20. Introducing SPARQL

    • Inspired by SQL
    • RDF compatible
    • Recommended by the W3C for the Semantic Web
    • Standardized by the RDF Data Access Working Group
    http://www.w3.org/TR/rdf-sparql-query/
  21. Some basics

    Prefix Declarations: abbreviating URI namespaces
      e.g. Prefix namespace
    Result Clause: what to return from the query
      e.g. SELECT ?name .....
    Query Pattern: specifying what to query in the dataset
      e.g. WHERE { .... }
    Query Modifier: slicing, ordering, or otherwise rearranging results
      e.g. ORDER BY ... LIMIT ... OFFSET ...
    Variables: have a '?' prepended
      e.g. ?name
    http://eneumann.org/talks/Sparql_tutorial.html#(1)
  22. SPARQL example

    SELECT *
    WHERE {
      ?member :friend ?f .
      ?f :friend ?fof .
      ?fof :live ?lives_in .
      ?fof :like ?likes .
      FILTER(?lives_in = "Taipei" && ?likes = "Jazz")
    }
  23. Cypher

    • Implemented in Scala with parser combinators
    • Only for Neo4j
    • Inspired by SQL & SPARQL
  24. Cypher basics

    START: Where to start? Can be a (node) ID or an index lookup
    MATCH: usually follows START; describes the traversal
    WHERE: filters the traversed results
    RETURN: the shape of the returned result
  25. Cypher Example

    START von=node:node_auto_index(name = 'Von')
    MATCH von-[:friend]->()-[:friend]->fof,
          fof-[:lives]->city,
          fof-[:likes]->interest
    WHERE city.name = 'Taipei' AND interest.name = 'Jazz'
    RETURN fof.name
  26. Traverse Patterns

    • Backtrack
    • Except/Retain
    • Flow Rank
    • Path
    • Loop
    • Split/Merge
    • Map/Reduce
    • Tree
    • Pattern Match Pattern
  27. Key/Value with Graph?

    Wait!! You just told me graph is easier to implement
    Key/Value
  29. Reasons

    • Schema free
    • Scaling without pain
    • Map the data and Reduce the complexity
    • Proven in production
  30. Facts

    Facebook - Hadoop cluster with more than 1PB of data, and 2TB of new data daily
    Yahoo - Hadoop cluster with more than 4PB of data, with Webmap
    Google - More than 20PB of data processed with Map/Reduce every day
  31. Some Terms

    • Phase - A step within the job
    • Job - Sequence of phases & inputs
    • Map - Data collection phase
    • Shuffle and Sort - Global sort
    • Reduce - Data collection or processing phase
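The terms above can be lined up in a tiny single-process sketch. Here the job counts incoming edges per vertex from an edge list. The function names and sample edges are hypothetical, and nothing in it is distributed.

```python
from collections import defaultdict


def map_phase(edges):
    """Map: emit (target, 1) for every edge."""
    for source, target in edges:
        yield target, 1


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


def run_job(edges):
    """Job: the sequence of phases applied to the input."""
    return reduce_phase(shuffle_and_sort(map_phase(edges)))


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
print(run_job(edges))  # {'b': 1, 'c': 3}
```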
  32. Map/Reduce Differences - distributed

    • Hadoop - across multiple machines
    • Riak - across multiple machines
    • CouchDB - runs over all docs in a single database
    • MongoDB - not spread across multiple machines
  33. Map/Reduce Differences - execution

    • Hadoop - parallel
    • Riak - parallel
    • CouchDB - runs over all docs in a single database
    • MongoDB - not parallel
  34. Map/Reduce Limitations

    • Shared global synchronization
    • Real time
    • Small datasets
    http://mapreduce.me
  35. MapReduce Alternatives

    • Pregel - Inspired by Bulk Synchronous Parallel
        Designed for large scale graph algorithms
        Mystery
    • Dryad - Streaming process algorithm as arbitrary dataflow graphs
        Acyclic graph
        Vertex - developer-specified computations
        Edges - data channels that capture dependencies
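To make the Bulk Synchronous Parallel idea behind Pregel concrete, here is a single-process sketch of vertex-centric supersteps that spreads the maximum value through a graph. It is not Pregel's API. The function name `propagate_max` and the sample graph are hypothetical.

```python
def propagate_max(values, out_edges):
    """Run supersteps until no messages are in flight.

    Every vertex ends up holding the largest value that can reach it
    along directed edges. Returns (final values, number of supersteps).
    """
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbors.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for t in out_edges.get(v, ()):
            inbox[t].append(val)
    supersteps = 1
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays halted
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for t in out_edges.get(v, ()):
                    outbox[t].append(best)
        # Barrier: messages sent now become visible only in the next superstep.
        inbox = outbox
        supersteps += 1
    return values, supersteps


# Hypothetical chain a - b - c - d with edges in both directions.
start = {"a": 3, "b": 6, "c": 2, "d": 1}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = propagate_max(start, links)
print(final)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

The point of the barrier is that each vertex only needs its own state and its inbox, so the vertex loop inside a superstep could in principle run on many machines at once.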
  36. Graph Frameworks

    • Giraph
    • GraphLab
    • Phoebus
    • Golden Orb
    • Signal/Collect
    • Spark
    • Piccolo
    • HaLoop
  37. NoSQL Taiwan Needs You

    http://nosql.org.tw
    Join us: http://fb.nosql.org.tw
    Discussion:
      http://hbase.nosql.org.tw
      http://mongo.nosql.org.tw
      http://redis.nosql.org.tw
      http://riak.nosql.org.tw
      http://couchdb.nosql.org.tw
    Looking for:
      * Speakers
      * Contributors
      * Supporters