Design the Graph Data with NoSQL

Slide 1

Slide 1 text

Design the Graph Data with NoSQL Von Stark Wednesday, May 9, 12

Slide 2

Slide 2 text

Programmer @ NeoArk. Newbie, but likes: Ruby, Rails, Neo4j, Riak, scaling... Contact: @vonstark32, http://nosql.org.tw, http://neo4j.tw, http://vonstark.co

Slide 3

Slide 3 text

No to SQL?

Slide 4

Slide 4 text

SQL?
Example: Online game bid
Scenario: Looking for trustworthy & affordable equipment
Description:
1. In this town
2. Seller has a reputation of more than 3
3. Recommended by friends or members of the same guild
4. Attack 80+
5. Order by bids

Slide 5

Slide 5 text

SQL Columns

Slide 6

Slide 6 text

SQL Columns
Equipment (Id | Item_name | Attack | Require Level)
  1 | Golden Knife | 87 | 52
  2 | Silver Knife | 77 | 50
TownAuction (Id | Item_id | Town_id | Seller_id)
  355 | 1 | 1 | 5
  356 | 2 | 2 | 6
MemberReputation (Id | member_id | receive_id)
  10 | 5 | 1
  15 | 2 | 2
MemberRecommend (Id | member_id | receive_id)
  13 | 7 | 5
  22 | 9 | 5
Member (Id)
  ME, 1, 2, 3, 4, 5, 6
MemberFriend (member_id | receive_id)
  1 | 3
  3 | 1
  3 | 5
  4 | 5
  5 | 3
  5 | 4
MemberGuild (member_id | guild_id)
  1 | 8
  2 | 8
  3 | 5
  4 | 5
  5 | 7
  6 | 8
MemberBid (Id | member_id | auction_id)
  99 | 33 | 5
  102 | 39 | 5
Auction (Id | member_id | auction_id)
  99 | 33 | 5
  102 | 39 | 5

Slide 7

Slide 7 text


Slide 8

Slide 8 text

WhateverModel.Join(*&^%&^%).Join(&^*%@^ %).Join(^@&^#).Join(*&&@%).Join (*&*&@).where(“^@^%#^*&@%*^#%^*%$^*%#^ %$^#@%*^&%*^&%*^%^*%*&^%&*^%*^& %&^*%*^@%*^&#%*^&%@#^*&%#*^&@# %&*^@%&^*@$%&*^$%&^*@%*&$^%&@*^ %#@#()@*&@)(*&@#)*(#&)(#*&#(*)&#)(*&#()@#*&) (#*@&#)@(*&#(&#()*#&()*#&#(*&)*(#@&)(*#@&# %*^&”).order(“*&%&^% DESC”)

Slide 9

Slide 9 text

SQL? WhateverModel.Join(*&^%&^%).Join(&^*%@^ %).Join(^@&^#).Join(*&&@%).Join (*&*&@).where(“^@^%#^*&@%*^#%^*%$^*%#^ %$^#@%*^&%*^&%*^%^*%*&^%&*^%*^& %&^*%*^@%*^&#%*^&%@#^*&%#*^&@# %&*^@%&^*@$%&*^$%&^*@%*&$^%&@*^ %#@#()@*&@)(*&@#)*(#&)(#*&#(*)&#)(*&#()@#*&) (#*@&#)@(*&#(&#()*#&()*#&#(*&)*(#@&)(*#@&# %*^&”).order(“*&%&^% DESC”)

Slide 10

Slide 10 text

Or?
SELECT a.inV FROM graph as a WHERE a.outV=?
SELECT b.inV FROM graph as a, graph as b WHERE a.inV=b.outV AND a.outV=?
SELECT c.inV FROM graph as a, graph as b, graph as c WHERE a.inV=b.outV AND b.inV=c.outV AND a.outV=?
SELECT d.inV FROM graph as a, graph as b, graph as c, graph as d WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND a.outV=?
SELECT e.inV FROM graph as a, graph as b, graph as c, graph as d, graph as e WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND d.inV=e.outV AND a.outV=?
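Each extra hop in the queries above adds another self-join. The rough sketch below is not from the talk. It generates the same query shape for any depth and runs it against an in-memory SQLite table named graph with outV/inV columns. The edge data is made up for illustration.

```python
import sqlite3


def n_hop_query(depth: int) -> str:
    """Build the slide's query shape: one extra self-join of `graph` per hop."""
    aliases = [f"t{i}" for i in range(depth)]
    tables = ", ".join(f"graph AS {a}" for a in aliases)
    # Chain the hops: the inV of one alias must be the outV of the next.
    joins = [f"{a}.inV = {b}.outV" for a, b in zip(aliases, aliases[1:])]
    where = " AND ".join(joins + [f"{aliases[0]}.outV = ?"])
    return f"SELECT {aliases[-1]}.inV FROM {tables} WHERE {where}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (outV TEXT, inV TEXT)")
# Made-up edges: me -> a -> c and me -> b -> d
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)",
    [("me", "a"), ("me", "b"), ("a", "c"), ("b", "d")],
)

for depth in (1, 2, 3):
    rows = conn.execute(n_hop_query(depth), ("me",)).fetchall()
    print(depth, sorted(row[0] for row in rows))
```

A depth-n query needs n copies of graph in the FROM clause. That growth in joins is what the slide is pointing at.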

Slide 11

Slide 11 text

Wait........

Slide 12

Slide 12 text


Slide 13

Slide 13 text


Slide 14

Slide 14 text

Problems?
• Implicit graph
• Inflexible schema
• Complex data structures
• Hard and slow to traverse deep levels (many joins)
• Hard to scale

Slide 15

Slide 15 text

If you still want to use MySQL with graphs....
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
http://rgl.rubyforge.org/rgl/index.html
http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
http://www.slideshare.net/PerconaPerformance/trees-and-more-with-postgresql

Slide 16

Slide 16 text

No single database can do everything for you.

Slide 17

Slide 17 text

NoSQL = Not Only SQL

Slide 18

Slide 18 text

Pick the one you need

Slide 19

Slide 19 text

Where is Graph?

Slide 20

Slide 20 text

Graph

Slide 21

Slide 21 text

Graph

Slide 22

Slide 22 text

Graph Again!

Slide 23

Slide 23 text

It’s Everywhere!

Slide 24

Slide 24 text

What is a Graph?

Slide 25

Slide 25 text

Definition: G = (V, E), where V represents the set of vertices (nodes) and E represents the set of edges (links). Both vertices and edges may contain additional information.
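Here is a minimal in-memory sketch of G = (V, E) in which both vertices and edges carry a property dictionary. The class and method names are hypothetical. The sample data reuses vertices A and B and one of the two my_friend edges from the Edge & Vertex slide. The edge direction A -> B is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    props: dict = field(default_factory=dict)   # additional information on the vertex


@dataclass
class Edge:
    source: str
    target: str
    label: str
    props: dict = field(default_factory=dict)   # additional information on the edge


@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)  # V: vertex id -> Vertex
    edges: list = field(default_factory=list)     # E: list of Edge

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)

    def add_edge(self, source, target, label, **props):
        self.edges.append(Edge(source, target, label, props))

    def out(self, vid, label=None):
        """Ids of vertices one edge away from vid, optionally filtered by label."""
        return [e.target for e in self.edges
                if e.source == vid and (label is None or e.label == label)]


g = Graph()
g.add_vertex("A", age=23, gender="male", job="Engineer")
g.add_vertex("B", age=18, gender="female", job="Designer")
# Direction A -> B is assumed; the slide does not say which way the edge points.
g.add_edge("A", "B", "my_friend", started_from="2010.05.07", via="Web Meetup")
```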

Slide 26

Slide 26 text

How do we design a graph?

Slide 27

Slide 27 text

We need to make Life Easier!

Slide 28

Slide 28 text

We need to make Life Easier!

Slide 29

Slide 29 text

Traditional Graph
• Graph: Neo4j, DEX, Sones, InfiniteGraph, AllegroGraph...
• Pros: traversal is fast and easy; less modeling.
• Cons: Scalable? Failover? Who uses it?

Slide 30

Slide 30 text

RDF (Resource Description Framework): Subject, Predicate (property), Object

Slide 31

Slide 31 text

RDF (Resource Description Framework): Subject, Predicate (property), Object
Examples:
1) <#i> <#love> <#you>.
2) <#i> <#10billion> <#wallet>.
3) <#i> have <#10billion> in my <#wallet>.
4) <#sara> <#age> 24; <#tall> 189 .
   <#andy> <#age> 22; <#tall> 180 .
   <#peter> <#age> 33; <#tall> 175 .
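To make the triple model concrete, the sketch below stores the statements from example 4 as plain (subject, predicate, object) tuples and matches them using wildcards. It is only an illustration, not an RDF library. The IRIs are kept as the short strings used on the slide.

```python
# Example 4 from the slide as (subject, predicate, object) tuples.
# IRIs are kept as the slide's short strings; this is not an RDF library.
triples = [
    ("#sara", "#age", 24), ("#sara", "#tall", 189),
    ("#andy", "#age", 22), ("#andy", "#tall", 180),
    ("#peter", "#age", 33), ("#peter", "#tall", 175),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]


ages = match(p="#age")
taller_than_178 = [s for s, _, height in match(p="#tall") if height > 178]
```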

Slide 32

Slide 32 text

Edge & Vertex

Slide 33

Slide 33 text

Edge & Vertex
[Figure: vertices A and B joined by two my_friend edges.
A: Age 23, Gender male, Job Engineer
B: Age 18, Gender female, Job Designer
my_friend: started_from 2010.05.07, via Web Meetup
my_friend: started_from 2011.03.05, via Facebook Request]

Slide 34

Slide 34 text

Hyper Graph
• Graph: HyperGraph, Neo4j
• Pros: higher-order, n-ary relationships
• Cons: Who uses it?
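A hypergraph edge can connect more than two vertices at once. The sketch below uses a hypothetical class and made-up game data. It stores each hyperedge as a set of vertices and keeps an incidence index alongside. An n-ary relation, such as a trade between a buyer, a seller, and an item, is then a single edge.

```python
from collections import defaultdict


class HyperGraph:
    """Toy hypergraph: each hyperedge links any number of vertices (n-ary)."""

    def __init__(self):
        self.edges = {}                    # edge id -> (label, frozenset of vertices)
        self.incidence = defaultdict(set)  # vertex -> ids of the edges touching it

    def add_edge(self, edge_id, label, *vertices):
        self.edges[edge_id] = (label, frozenset(vertices))
        for v in vertices:
            self.incidence[v].add(edge_id)

    def edges_of(self, vertex):
        return sorted(self.incidence.get(vertex, ()))

    def neighbors(self, vertex):
        """All vertices sharing at least one hyperedge with `vertex`."""
        result = set()
        for edge_id in self.incidence.get(vertex, ()):
            result |= self.edges[edge_id][1]
        result.discard(vertex)
        return result


hg = HyperGraph()
# Made-up data: one 3-ary "trade" edge (buyer, seller, item) and one "guild" edge.
hg.add_edge("t1", "trade", "player5", "player6", "golden_knife")
hg.add_edge("g1", "guild", "player1", "player2", "player6")
```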

Slide 35

Slide 35 text

Map/Reduce Graph
• Most big-data or key/value stores plus Map/Reduce (Hadoop, CouchDB, MongoDB, Cassandra, Riak...)
• Pros: easy to scale; proven in practice (Hadoop)
• Cons: Map/Reduce jobs are hard to write, and the model is not really designed for graph processing

Slide 36

Slide 36 text

Show me the code!

Slide 37

Slide 37 text

Scenario: friends-of-friends (FoF) who live in Taipei and also like Jazz
[Figure: graph linking M, Taipei and Jazz through Know, Live and Like edges]
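The scenario can first be sketched over plain Python dictionaries, before turning to the database-specific versions. The data is invented for illustration. Like the queries on the following slides, the sketch takes exactly two Know hops and does not exclude direct friends.

```python
# Made-up data; relation names follow the slide: Know, Live, Like.
knows = {"M": {"A", "B"}, "A": {"M", "C"}, "B": {"D", "E"}}
lives = {"C": "Taipei", "D": "Taipei", "E": "Tainan"}
likes = {"C": {"Rock"}, "D": {"Jazz"}, "E": {"Jazz"}}


def fof_in_city_liking(person, city, interest):
    """Friends-of-friends of `person` who live in `city` and like `interest`."""
    fofs = set()
    for friend in knows.get(person, set()):
        fofs |= knows.get(friend, set())      # second Know hop
    return sorted(p for p in fofs
                  if lives.get(p) == city and interest in likes.get(p, set()))


print(fof_in_city_liking("M", "Taipei", "Jazz"))
```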

Slide 38

Slide 38 text

Introducing Riak
• Document-oriented key/value store
• Open source
• Based on Amazon Dynamo; the Riak ring makes it easy to scale
• A(vailability), P(artition tolerance) & eventual consistency
• HTTP / Protocol Buffers interface
• Written in Erlang (stable & highly concurrent)
http://wiki.basho.com

Slide 39

Slide 39 text

Link walking in Riak
me.walk({:keep=>true,:tag=>"friend"}).walk({:keep=>true,:tag=>"friend"})
  .walk({:keep=>true,:tag=>"lives"}).select{|city| city.name=="Taipei" }
  .walk({:keep=>true,:tag=>"likes"}).select{|like| like.name=="Jazz" }
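Riak link walking follows links by tag, one phase at a time. The keep flag decides whether a phase's results are returned. The sketch below loosely imitates that behaviour over a Python dict. It is not the Riak client API, and the data is made up. Real link phases also filter by bucket, which this sketch omits.

```python
# In-memory stand-in for Riak objects (NOT the Riak client API):
# key -> {"data": ..., "links": [(tag, target_key), ...]}
store = {
    "me":     {"data": {"name": "Von"},    "links": [("friend", "amy")]},
    "amy":    {"data": {"name": "Amy"},    "links": [("friend", "bob"), ("friend", "cat")]},
    "bob":    {"data": {"name": "Bob"},    "links": [("lives", "taipei"), ("likes", "jazz")]},
    "cat":    {"data": {"name": "Cat"},    "links": [("lives", "taipei"), ("likes", "rock")]},
    "taipei": {"data": {"name": "Taipei"}, "links": []},
    "jazz":   {"data": {"name": "Jazz"},   "links": []},
    "rock":   {"data": {"name": "Rock"},   "links": []},
}


def walk(start_keys, phases):
    """Follow links tag by tag; return one result list per phase whose keep flag is True."""
    current, kept = list(start_keys), []
    for tag, keep in phases:
        current = [target for key in current
                   for link_tag, target in store[key]["links"] if link_tag == tag]
        if keep:
            kept.append(current)
    return kept


def name_of(key):
    return store[key]["data"]["name"]


fofs = walk(["me"], [("friend", False), ("friend", True)])[0]
matches = [k for k in fofs
           if any(name_of(c) == "Taipei" for c in walk([k], [("lives", True)])[0])
           and any(name_of(i) == "Jazz" for i in walk([k], [("likes", True)])[0])]
```

Each phase starts from the previous phase's output. For that reason, this sketch checks the city and the interest separately for each FoF instead of chaining the two walks.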

Slide 40

Slide 40 text

Introducing Neo4j
• Graph data model
• Open source (GPL v3)
• Master/slave (horizontal scaling is planned)
• A(tomicity), C(onsistency), I(solation), D(urability)
• Various clients & Cypher, Gremlin, SPARQL support
• Written in Java, cross-platform
http://neo4j.org

Slide 41

Slide 41 text

Traverse in Neo4j
me.outgoing(:friend).depth(2).filter{|path_to_m|
  path_to_m.end_node.outgoing(:lives).filter{|path_to_live| path_to_live=="Taipei" }
    .outgoing(:like).filter{|path_to_like| path_to_like=="Jazz" }
}

Slide 42

Slide 42 text

Introducing Redis
• In-memory key/value store (persisted by dumping to a file on disk)
• C(onsistency), P(artition tolerance) & some hacks for A(vailability)
• Master/slave replication
• Sponsored by VMware!!!
• Various clients & Map/Reduce with Lua
• Written in ANSI C; works on Linux, *BSD, OS X
http://redis.io

Slide 43

Slide 43 text

Redis Graph
http://amix.dk/blog/post/19592

from redis_wrap import *

#--- Edges ----------------------------------------------
# A node's outgoing edges are stored as a Redis set keyed by the node
def add_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    edges.add( to_node )

def delete_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    key_node_y = to_node
    if key_node_y in edges:
        edges.remove( key_node_y )

def has_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    return to_node in edges

def neighbors(node_x, system='default'):
    return get_set( node_x, system=system )

#--- Node values ----------------------------------------------
# Node values are plain string keys of the form 'nv:<node>'
def get_node_value(node_x, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).get( node_key )

def set_node_value(node_x, value, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).set( node_key, value )

#--- Edge values ----------------------------------------------
# Edge values are plain string keys of the form 'ev:<edge>'
def get_edge_value(edge_x, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).get( edge_key )

def set_edge_value(edge_x, value, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).set( edge_key, value )
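The Redis Graph functions keep one Redis set per node for adjacency, plus nv:/ev: string keys for values. The hypothetical stand-in below mimics the same key scheme with Python dicts and sets, so it runs without a Redis server. friends_of_friends is an added helper that is not from amix's post. It takes the union of neighbor sets, which is roughly what the Redis SUNION command would compute on the server.

```python
from collections import defaultdict

# Hypothetical stand-in for the Redis data, so this runs without a server:
# one set per node (adjacency) plus plain string keys like 'nv:<node>'.
sets = defaultdict(set)
strings = {}


def add_edge(from_node, to_node):
    sets[from_node].add(to_node)


def neighbors(node):
    return set(sets.get(node, ()))


def set_node_value(node, value):
    strings["nv:%s" % node] = value


def get_node_value(node):
    return strings.get("nv:%s" % node)


def friends_of_friends(node):
    """Union of the neighbor sets of node's neighbors (roughly what Redis SUNION computes)."""
    result = set()
    for friend in neighbors(node):
        result |= neighbors(friend)
    return result


add_edge("von", "amy")
add_edge("von", "carl")
add_edge("amy", "bob")
add_edge("carl", "dan")
set_node_value("bob", "likes Jazz")
```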

Slide 44

Slide 44 text

SQL of Graph

Slide 45

Slide 45 text

Introducing Gremlin
• Syntax based on XPath
• Supports all graph databases that implement Blueprints
• Native support for Java, Scala, Groovy
http://gremlin.tinkerpop.com/

Slide 46

Slide 46 text

Some basics
• g => Client
• g.V => vertices
• g.E => edges
• g.id => identifier of an element
• .out(E/V) => outgoing vertices/edges
• .in(E/V) => incoming vertices/edges
• .both => both vertices/edges
• .filter => filter with conditions
• .has => allow if it has the property
• .hasNot => allow if it does not have the property
• .back => go back to the results of n steps earlier
• .or => emit if any of the pipes emit
• .and => emit if all of the pipes emit
• .as => names the previous step
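The steps listed above compose into a pipeline in which each traverser remembers its path. That path is what lets .as name a step and .back return to it. The toy Pipe class below is a simplification written for this illustration, not the Gremlin implementation. It matches vertices by id rather than by property, and it uses as_ because as is a Python keyword.

```python
class Pipe:
    """Toy imitation of a few Gremlin steps. Each traverser is a path (list of
    vertex ids), which is what lets back() return to an earlier named step."""

    def __init__(self, edges, paths, depth=0, names=None):
        self.edges = edges          # list of (out_vertex, label, in_vertex)
        self.paths = paths          # the last element of each path is the current vertex
        self.depth = depth          # index of the current vertex inside every path
        self.names = names or {}    # step name -> path index

    def out(self, label):
        paths = [path + [dst] for path in self.paths
                 for src, lbl, dst in self.edges if src == path[-1] and lbl == label]
        return Pipe(self.edges, paths, self.depth + 1, self.names)

    def has(self, vertex_id):
        # Simplification: match on the vertex id rather than on a property.
        paths = [path for path in self.paths if path[-1] == vertex_id]
        return Pipe(self.edges, paths, self.depth, self.names)

    def as_(self, name):
        # 'as' is a Python keyword, hence the trailing underscore.
        return Pipe(self.edges, self.paths, self.depth, {**self.names, name: self.depth})

    def back(self, name):
        idx = self.names[name]
        kept = {n: i for n, i in self.names.items() if i <= idx}
        return Pipe(self.edges, [path[: idx + 1] for path in self.paths], idx, kept)

    def to_list(self):
        return [path[-1] for path in self.paths]


edges = [
    ("von", "friend", "amy"), ("amy", "friend", "bob"), ("amy", "friend", "cat"),
    ("bob", "lives", "Taipei"), ("bob", "likes", "Jazz"),
    ("cat", "lives", "Taipei"), ("cat", "likes", "Rock"),
]
result = (Pipe(edges, [["von"]])
          .out("friend").out("friend").as_("fof")
          .out("lives").has("Taipei").back("fof")
          .out("likes").has("Jazz").back("fof")
          .to_list())
```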

Slide 47

Slide 47 text

Gremlin example
g.v(21).as('I')
  .out('friend').as('my_friend')
  .out('friend').as('fof')
  .outE[[label:'lives']].inV[[name:'Taipei']].back('fof')
  .outE[[label:'likes']].inV[[name:'Jazz']].back('fof')
http://www.youtube.com/watch?v=5wpTtEBK4-E

Slide 48

Slide 48 text

Introducing SPARQL
• Inspired by SQL
• RDF-compatible
• W3C Recommendation for the Semantic Web
• Standardized by the RDF Data Access Working Group
http://www.w3.org/TR/rdf-sparql-query/

Slide 49

Slide 49 text

Some basics
• Prefix declarations: abbreviate URI namespaces, e.g. PREFIX foaf: <http://xmlns.com/foaf/0.1/>
• Result clause: what to return from the query, e.g. SELECT ?name .....
• Query pattern: what to match in the dataset, e.g. WHERE { .... }
• Query modifiers: slicing, ordering, or otherwise rearranging results, e.g. ORDER BY ... LIMIT ... OFFSET ...
• Variables: prefixed with '?', e.g. ?name
http://eneumann.org/talks/Sparql_tutorial.html#(1)

Slide 50

Slide 50 text

SPARQL example
SELECT * WHERE {
  ?member :friend ?f .
  ?f :friend ?fof .
  ?fof :live ?lives_in .
  ?fof :like ?likes .
  FILTER (?lives_in = "Taipei" && ?likes = "Jazz")
}
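A SPARQL WHERE clause is a set of triple patterns that must all match with consistent variable bindings. The sketch below evaluates the example query that way over a few made-up triples. It joins the patterns one at a time and then applies the FILTER as a Python predicate. It is a teaching toy, not a SPARQL engine, and it ignores prefixes.

```python
# Made-up triples; predicate names follow the SPARQL example.
triples = [
    ("von", "friend", "amy"), ("amy", "friend", "bob"), ("amy", "friend", "cat"),
    ("bob", "live", "Taipei"), ("bob", "like", "Jazz"),
    ("cat", "live", "Taipei"), ("cat", "like", "Rock"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match one triple."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break      # constant term does not match
        else:
            yield new


def select(patterns, keep=lambda b: True):
    bindings = [{}]
    for pattern in patterns:  # join the triple patterns one at a time
        bindings = [nb for b in bindings for nb in match_pattern(pattern, b)]
    return [b for b in bindings if keep(b)]  # the FILTER step


rows = select(
    [("?member", "friend", "?f"), ("?f", "friend", "?fof"),
     ("?fof", "live", "?lives_in"), ("?fof", "like", "?likes")],
    keep=lambda b: b["?lives_in"] == "Taipei" and b["?likes"] == "Jazz",
)
```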

Slide 51

Slide 51 text

Cypher
• Implemented in Scala with parser combinators
• Neo4j only
• Inspired by SQL & SPARQL

Slide 52

Slide 52 text

Cypher basics
• START: where to start? Can be a node ID or an index lookup
• MATCH: usually follows START; describes the pattern to traverse
• WHERE: filters the traversed results
• RETURN: what to return

Slide 53

Slide 53 text

Cypher Example
START von=node:node_auto_index(name = 'Von')
MATCH von-[:friend]->()-[:friend]->fof, fof-[:lives]->city, fof-[:likes]->interest
WHERE city.name = 'Taipei' AND interest.name = 'Jazz'
RETURN fof.name

Slide 54

Slide 54 text

Traverse Patterns
• Backtrack
• Except/Retain
• Flow Rank
• Path
• Loop
• Split/Merge
• Map/Reduce
• Tree
• Pattern Match
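Two of the listed patterns, Loop and Except/Retain, can be sketched in a few lines of Python. Loop repeats an out step a fixed number of times, and Except drops vertices already seen. The version below is an illustrative breadth-first sketch with made-up data. It does not show how any particular graph database implements these patterns.

```python
def within_hops(adjacency, start, max_depth):
    """Loop: repeat an 'out' step max_depth times.
    Except: drop vertices already seen, so the start vertex and closer
    vertices are never reported again at a deeper level."""
    seen = {start}
    frontier = [start]
    levels = []
    for _ in range(max_depth):
        next_frontier = []
        for v in frontier:
            for w in adjacency.get(v, ()):
                if w not in seen:          # the Except filter
                    seen.add(w)
                    next_frontier.append(w)
        levels.append(sorted(next_frontier))
        frontier = next_frontier
    return levels


friends = {"von": ["amy", "bob"], "amy": ["von", "bob", "cat"], "bob": ["dan"], "cat": ["von"]}
levels = within_hops(friends, "von", 3)
strict_fof = levels[1]  # friends-of-friends who are not already direct friends
```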

Slide 55

Slide 55 text

How about Map Reduce?

Slide 56

Slide 56 text

Key/Value with Graph? Wait!! You just told me graph is easier to implement Key/Value

Slide 57

Slide 57 text

Key/Value with Graph? Wait!! You just told me graph is easier to implement Key/Value

Slide 58

Slide 58 text

Reasons
• Schema-free
• Scaling without pain
• Map the data and Reduce the complexity
• Proven in practice

Slide 59

Slide 59 text

Facts
Facebook - Hadoop cluster with more than 1 PB of data, and 2 TB of new data daily
Yahoo - Hadoop cluster with more than 4 PB of data (Webmap)
Google - Processes more than 20 PB of data with Map/Reduce every day

Slide 60

Slide 60 text

Some Terms
• Phase: a step within a job
• Job: a sequence of phases & inputs
• Map: data collection phase
• Shuffle and Sort: global sort
• Reduce: data collection or processing phase
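To see the phases in action, the sketch below runs a classic graph job, counting mutual friends, as map, shuffle-and-sort, and reduce functions in a single Python process. The data and function names are hypothetical. A real Hadoop or Riak job would spread each phase across machines.

```python
from collections import defaultdict
from itertools import combinations

friends = {  # made-up, undirected friendships
    "von": ["amy", "bob"],
    "amy": ["von", "bob", "cat"],
    "bob": ["von", "amy", "cat"],
    "cat": ["amy", "bob"],
}


def map_phase(user, friend_list):
    # Any two friends of `user` have `user` as a mutual friend.
    for a, b in combinations(sorted(friend_list), 2):
        yield (a, b), 1


def shuffle_and_sort(pairs):
    # Group values by key and sort by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    return key, sum(values)


mapped = [kv for user, fl in friends.items() for kv in map_phase(user, fl)]
mutual = dict(reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped))
# "People you may know": pairs who are not yet friends, with their mutual-friend count.
suggestions = {pair: n for pair, n in mutual.items() if pair[1] not in friends[pair[0]]}
```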

Slide 61

Slide 61 text

Map/Reduce Differences: distribution
• Hadoop: across multiple machines
• Riak: across multiple machines
• CouchDB: runs over all docs in a single database
• MongoDB: not spread across multiple machines

Slide 62

Slide 62 text

Map/Reduce Differences: execution
• Hadoop: parallel
• Riak: parallel
• CouchDB: runs over all docs in a single database
• MongoDB: not parallel

Slide 63

Slide 63 text

Map/Reduce Limitations (poor fit for)
• Shared global synchronization
• Real-time processing
• Small datasets
http://mapreduce.me

Slide 64

Slide 64 text

MapReduce Alternatives
• Pregel: inspired by Bulk Synchronous Parallel
• Dryad: stream processing, with algorithms expressed as arbitrary dataflow graphs
Designed for large-scale graph algorithms
Directed acyclic graph:
  Vertices: developer-specified computations
  Edges: data channels that capture dependencies
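Pregel's model is vertex-centric. In each superstep, every active vertex reads its messages, updates its value, and sends messages to its neighbors, with a global barrier between supersteps. The sketch below simulates that loop in one process to compute hop distances from a source vertex. It illustrates the BSP idea and is not Pregel's API.

```python
import math


def pregel_hops(edges, source):
    """Simulate BSP supersteps: each vertex with messages takes the minimum,
    and if that improves its value it messages its out-neighbors. The run stops
    when a superstep produces no messages (every vertex has voted to halt)."""
    vertices = {v for edge in edges for v in edge} | {source}
    value = {v: math.inf for v in vertices}
    out = {v: [] for v in vertices}
    for src, dst in edges:
        out[src].append(dst)

    inbox = {source: [0]}   # seed message for the first superstep
    supersteps = 0
    while inbox:            # one iteration per superstep; the barrier sits between them
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            if best < value[v]:          # the per-vertex compute() step
                value[v] = best
                for w in out[v]:
                    outbox.setdefault(w, []).append(best + 1)
        inbox = outbox
        supersteps += 1
    return value, supersteps


value, supersteps = pregel_hops([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")], "a")
```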

Slide 65

Slide 65 text

Graph Frameworks
• Giraph
• GraphLab
• Phoebus
• Golden Orb
• Signal/Collect
• Spark
• Piccolo
• HaLoop

Slide 66

Slide 66 text

NoSQL Taiwan Needs You
http://nosql.org.tw
Join us: http://fb.nosql.org.tw
Discussion: http://hbase.nosql.org.tw http://mongo.nosql.org.tw http://redis.nosql.org.tw http://riak.nosql.org.tw http://couchdb.nosql.org.tw
Looking for:
* Speakers
* Contributors
* Supporters

Slide 67

Slide 67 text

We are hiring
• Front-end / Back-end Engineer
• UI / UX Designer
Apply: [email protected]

Slide 68

Slide 68 text

Q&A