Design the graph data with nosql

Design the Graph Data with NoSQL
Von Stark
Wednesday, May 9, 12

Programmer @ NeoArk
Newbie, but likes: Ruby, Rails, Neo4j, Riak, Scale....
Contact: @vonstark32
http://nosql.org.tw
http://neo4j.tw
http://vonstark.co

No to SQL?

SQL?
Example: Online Game - Bid
Scenario: Looking for trustworthy & affordable equipment
Description:
1. In this town
2. Seller has a reputation of more than 3
3. Recommended by friends or members of the same guild
4. Attack 80+
5. Ordered by bids

SQL Columns

SQL Columns

Equipment
Id | Item_name | Attack | Require Level
1 | Golden Knife | 87 | 52
2 | Silver Knife | 77 | 50

TownAuction
Id | Item_id | Town_id | Seller_id
355 | 1 | 1 | 5
356 | 2 | 2 | 6

MemberReputation
Id | member_id | receive_id
10 | 5 | 1
15 | 2 | 2

MemberRecommend
Id | member_id | receive_id
13 | 7 | 5
22 | 9 | 5

Member
Id
ME, 1, 2, 3, 4, 5, 6

MemberFriend
member_id | receive_id
1 | 3
3 | 1
3 | 5
4 | 5
5 | 3
5 | 4

MemberGuild
member_id | guild_id
1 | 8
2 | 8
3 | 5
4 | 5
5 | 7
6 | 8

MemberBid
Id | member_id | auction_id
99 | 33 | 5
102 | 39 | 5

Auction
Id | member_id | auction_id
99 | 33 | 5
102 | 39 | 5

SQL?
WhateverModel.Join(*&^%&^%).Join(&^*%@^%).Join(^@&^#).Join(*&&@%).Join(*&*&@).where("^@^%#^*&@%*^#%^*%$^*%#^ %$^#@%*^&%*^&%*^%^*%*&^%&*^%*^& %&^*%*^@%*^&#%*^&%@#^*&%#*^&@# %&*^@%&^*@$%&*^$%&^*@%*&$^%&@*^ %#@#()@*&@)(*&@#)*(#&)(#*&#(*)&#)(*&#()@#*&) (#*@&#)@(*&#(&#()*#&()*#&#(*&)*(#@&)(*#@&# %*^&").order("*&%&^% DESC")

Or ?
SELECT a.inV FROM graph AS a WHERE a.outV=?;
SELECT b.inV FROM graph AS a, graph AS b WHERE a.inV=b.outV AND a.outV=?;
SELECT c.inV FROM graph AS a, graph AS b, graph AS c WHERE a.inV=b.outV AND b.inV=c.outV AND a.outV=?;
SELECT d.inV FROM graph AS a, graph AS b, graph AS c, graph AS d WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND a.outV=?;
SELECT e.inV FROM graph AS a, graph AS b, graph AS c, graph AS d, graph AS e WHERE a.inV=b.outV AND b.inV=c.outV AND c.inV=d.outV AND d.inV=e.outV AND a.outV=?;
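
The growing self-joins above can be generated and run mechanically. A minimal sketch, assuming an edge table named `graph` with `outV`/`inV` columns as on the slide; the helper names `hop_query` and `reachable` and the sample edges are hypothetical, and SQLite stands in for whatever RDBMS is actually used.

```python
import sqlite3

def hop_query(depth):
    """Build the n-hop self-join query from the slide for a given depth."""
    aliases = [chr(ord("a") + i) for i in range(depth)]
    tables = ", ".join(f"graph AS {x}" for x in aliases)
    # Chain each alias to the next one: a.inV = b.outV, b.inV = c.outV, ...
    joins = [f"{p}.inV = {n}.outV" for p, n in zip(aliases, aliases[1:])]
    where = " AND ".join(joins + ["a.outV = ?"])
    return f"SELECT {aliases[-1]}.inV FROM {tables} WHERE {where}"

# Hypothetical sample edges (outV -> inV), kept in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph (outV INTEGER, inV INTEGER)")
conn.executemany(
    "INSERT INTO graph VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (1, 5), (5, 3)],
)

def reachable(start, depth):
    """Vertices exactly `depth` hops away from `start`, deduplicated."""
    rows = conn.execute(hop_query(depth), (start,)).fetchall()
    return sorted({row[0] for row in rows})
```

Every extra hop adds one more alias and one more join condition, which is the cost the next slides complain about.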

Wait........

Problems?
• Implicit graph
• Schema is inflexible
• Complex data structure
• Hard and slow to traverse at deep levels (joins)
• Hard to scale

If you still want to use MySQL with Graph....
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
http://rgl.rubyforge.org/rgl/index.html
http://www.slideshare.net/quipo/rdbms-in-the-social-networks-age
http://www.slideshare.net/PerconaPerformance/trees-and-more-with-post q-l

No single database can do everything for you.

NoSQL = Not Only SQL

Pick the one you need

Where is Graph?

Graph

Graph Again!

It’s Everywhere!

What is Graph?

Definition
G = (V, E), where
V represents the set of vertices (nodes)
E represents the set of edges (links)
Both vertices and edges may contain additional information
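
The definition maps directly onto a small data structure. A minimal sketch, assuming vertices are keyed by an id and edges are stored as labeled tuples; the `Graph` class, its method names, and the sample data are hypothetical and only illustrate G = (V, E) with properties on both.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    # V: vertex id -> property dict
    vertices: dict = field(default_factory=dict)
    # E: list of (source, label, target, property dict)
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        # An edge may only connect vertices that are already in V.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be vertices of the graph")
        self.edges.append((src, label, dst, props))

    def out(self, vid, label):
        """Targets of outgoing edges with the given label."""
        return [dst for src, lbl, dst, _ in self.edges
                if src == vid and lbl == label]

g = Graph()
g.add_vertex("A", age=23)
g.add_vertex("B", age=18)
g.add_edge("A", "my_friend", "B", via="Web Meetup")
```

Here `g.out("A", "my_friend")` walks one labeled edge; a real graph database would index edges by vertex rather than scanning the whole list.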

How do we design a graph?

We need to make life easier!

Traditional Graph
• Graph: Neo4j, Dex, Sones, Infinite, Allegro...
• Pros: Traversal is fast and easy. Less modeling.
• Cons: Scalable? Fail-over? Who uses it?

RDF (Resource Description Framework)
Subject Predicate (property) Object
Examples:
1) <#i> <#love> <#you>.
2) <#i> <#10billion> <#wallet>.
3) <#i> have <#10billion> in my <#wallet>.
4) <#sara> <#age> 24; <#tall> 189 .
   <#andy> <#age> 22; <#tall> 180 .
   <#peter> <#age> 33; <#tall> 175 .
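
Subject/predicate/object statements can be held as plain tuples and queried by pattern. A minimal sketch using the sara/andy/peter data from example 4; the `match` helper and the use of `None` as a wildcard are assumptions for illustration, not part of RDF itself.

```python
# Each triple is (subject, predicate, object), taken from example 4.
triples = {
    ("sara", "age", 24), ("sara", "tall", 189),
    ("andy", "age", 22), ("andy", "tall", 180),
    ("peter", "age", 33), ("peter", "tall", 175),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Everything known about sara.
about_sara = match(s="sara")

# Subjects taller than 178.
tall_people = [s for s, _, height in match(p="tall") if height > 178]
```

Leaving a position open is the same idea as a `?variable` in a SPARQL query pattern.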

Edge & Vertex
Vertex A: Age: 23, Gender: male, Job: Engineer
Vertex B: Age: 18, Gender: female, Job: Designer
Edge my_friend: started_from: 2010.05.07, via: Web Meetup
Edge my_friend: started_from: 2011.03.05, via: Facebook Request

Hyper Graph
• Graph: HyperGraph, Neo4j
• Pros: Higher order, n-ary relations
• Cons: Who uses it?

Map/Reduce Graph
• Most BigData or Key/Value + Map/Reduce (Hadoop, CouchDB, MongoDB, Cassandra, Riak ...)
• Pros: Easy to scale, proven example (Hadoop)
• Cons: Map/Reduce is hard to write, and it’s not really for graph processing

Show me the code!

Scenario
FoF (friends of friends) who live in Taipei and also like Jazz
Diagram: nodes M, Taipei, Jazz; edges Know, Live, Like
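
Before looking at each database, the scenario itself can be stated in a few lines of plain Python. A minimal sketch over in-memory dictionaries; all names and sample data are hypothetical, and excluding the person and their direct friends from the result is a design choice of this sketch, not something the slide specifies.

```python
# Hypothetical sample data: Know, Live and Like edges as dictionaries.
friends = {
    "me": {"ann", "bob"},
    "ann": {"me", "carl", "dora"},
    "bob": {"me", "dora", "eve"},
}
lives = {"carl": "Taipei", "dora": "Taipei", "eve": "Tainan"}
likes = {"carl": {"Rock"}, "dora": {"Jazz"}, "eve": {"Jazz"}}

def fof_in_city_liking(person, city, interest):
    """Friends of friends who live in `city` and like `interest`."""
    direct = friends.get(person, set())
    fof = set()
    for friend in direct:
        fof |= friends.get(friend, set())
    # Drop the person and their direct friends (assumption of this sketch).
    fof -= direct | {person}
    return sorted(
        p for p in fof
        if lives.get(p) == city and interest in likes.get(p, set())
    )

result = fof_in_city_liking("me", "Taipei", "Jazz")
```

The Riak, Neo4j, Gremlin, SPARQL and Cypher slides that follow express this same two-hop traversal plus two filters.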

Introduce Riak
• Document-Oriented Key/Value
• Open source
• Inspired by Amazon Dynamo; the Riak ring makes it easy to scale
• A(vailability), P(artition tolerance) & eventual consistency
• HTTP / Protocol Buffers interface
• Written in Erlang (stable & highly concurrent)
http://wiki.basho.com

Link walking in Riak
me.walk({:keep=>true, :tag=>"friend"})
  .walk({:keep=>true, :tag=>"friend"})
  .walk({:keep=>true, :tag=>"lives"}).select { |city| city.name == "Taipei" }
  .walk({:keep=>true, :tag=>"likes"}).select { |like| like.name == "Jazz" }

Introduce Neo4j
• Graph Data Model
• Open source with GPL v3
• Master/Slave (horizontal scaling is planned)
• A(tomicity) C(onsistency) I(solation) D(urability)
• Various clients & Cypher, Gremlin, SPARQL support
• Written in Java, cross-platform
http://neo4j.org

Traverse in Neo4j
me.outgoing(:friend).depth(2).filter { |path_to_m|
  path_to_m.end_node.outgoing(:lives).filter { |path_to_live|
    path_to_live == "Taipei"
  }.outgoing(:like).filter { |path_to_like|
    path_to_like == "Jazz"
  }
}

Introduce Redis
• In-memory Key/Value (persisted by dumping to a file on disk)
• C(onsistency), P(artition tolerance) & some hacks for A(vailability)
• Master/Slave replication
• Sponsored by VMware !!!
• Various clients & Map/Reduce with Lua
• Written in ANSI C, works on Linux, *BSD, OS X
http://redis.io

Redis Graph
http://amix.dk/blog/post/19592

from redis_wrap import *

#--- Edges ----------------------------------------------
def add_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    edges.add( to_node )

def delete_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    key_node_y = to_node
    if key_node_y in edges:
        edges.remove( key_node_y )

def has_edge(from_node, to_node, system='default'):
    edges = get_set( from_node, system=system )
    return to_node in edges

def neighbors(node_x, system='default'):
    return get_set( node_x, system=system )

#--- Node values ----------------------------------------------
def get_node_value(node_x, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).get( node_key )

def set_node_value(node_x, value, system='default'):
    node_key = 'nv:%s' % node_x
    return get_redis(system).set( node_key, value )

#--- Edge values ----------------------------------------------
def get_edge_value(edge_x, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).get( edge_key )

def set_edge_value(edge_x, value, system='default'):
    edge_key = 'ev:%s' % edge_x
    return get_redis(system).set( edge_key, value )

SQL of Graph

Introduce Gremlin
• Syntax based on XPath
• Supports all graph DBs that implement Blueprints
• Native support for Java, Scala, Groovy
http://gremlin.tinkerpop.com/

Some basics
• g => Client
• g.V => Vertices
• g.E => Edges
• g.id => identifier of an element
• .out(E/V) => outgoing vertices/edges
• .in(E/V) => incoming vertices/edges
• .both => both vertices/edges
• .filter => filter with conditions
• .has => allow if the element has the property
• .hasNot => allow if the element does not have the property
• .back => go back to the results of n steps ago
• .or => emit if any pipe matches
• .and => emit if all pipes match
• .as => names the previous step

Gremlin example
g.v(21).as('I')
 .out('friend').as('my_friend')
 .out('friend').as('fof')
 .outE[[label:'lives']].inV[[name:'Taipei']]
 .outE[[label:'likes']].inV[[name:'Jazz']]
http://www.youtube.com/watch?v=5wpTtEBK4-E

Introduce SPARQL
• Inspired by SQL
• RDF compatible
• Recommended by the W3C for the Semantic Web
• Standardized by the RDF Data Access Working Group
http://www.w3.org/TR/rdf-sparql-query/

Some basics
Prefix Declarations: abbreviating URI namespaces, e.g. Prefix namespace
Result Clause: what to return from the query, e.g. SELECT ?name .....
Query Pattern: specifying what to query in the dataset, e.g. WHERE { .... }
Query Modifier: slicing, ordering, or otherwise rearranging results, e.g. ORDER BY ... LIMIT ... OFFSET ...
Variables: have a ‘?’ prepended, e.g. ?name
http://eneumann.org/talks/Sparql_tutorial.html#(1)

SPARQL example
SELECT * WHERE {
  ?member :friend ?f .
  ?f :friend ?fof .
  ?fof :live ?lives_in .
  ?fof :like ?likes .
  FILTER(?lives_in = "Taipei" && ?likes = "Jazz")
}

Cypher
• Implemented in Scala with parser combinators
• Only for Neo4j
• Inspired by SQL & SPARQL

Cypher basics
START: Where to start? Can be a (node) ID or an index lookup
MATCH: usually after START, describes the traversal
WHERE: filters the traversed results
RETURN: the format of the returned results

Cypher Example
START von=node:node_auto_index(name = 'Von')
MATCH von-[:friend]->()-[:friend]->fof, fof-[:lives]->city, fof-[:likes]->interest
WHERE city.name='Taipei' AND interest.name='Jazz'
RETURN fof.name

Traverse Patterns
• Backtrack
• Except/Retain
• Flow Rank
• Path
• Loop
• Split/Merge
• Map/Reduce
• Tree
• Pattern Match

How about Map Reduce?

Key/Value with Graph?
Wait!! Didn't you just tell me a graph database is easier than implementing a graph on Key/Value?

Reasons
• Schema free
• Scaling without pain
• Map the data and Reduce the complexity
• Proven

Facts
Facebook - Hadoop cluster with more than 1PB of data, and 2TB of new data daily
Yahoo - Hadoop cluster with more than 4PB of data with Webmap
Google - More than 20PB of data with Map/Reduce every day

Some Terms
• Phase - A step within the job
• Job - Sequence of phases & inputs
• Map - Data collection phase
• Shuffle and Sort - Global sort
• Reduce - Data collection or processing phase
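
The three phases can be shown on a graph-flavored job: counting incoming edges per vertex. A minimal single-process sketch, assuming an edge list as input; the function names and sample edges are hypothetical, and a real framework would run the map and reduce calls in parallel across machines.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records: (source, target) edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

def map_phase(edge):
    """Map: emit (target, 1) for every edge."""
    _, dst = edge
    yield (dst, 1)

def shuffle_and_sort(pairs):
    """Shuffle and sort: order by key and group the values per key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one vertex."""
    return key, sum(values)

def run_job(records):
    """Job: run the phases in sequence over the input records."""
    mapped = [kv for record in records for kv in map_phase(record)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle_and_sort(mapped))

in_degree = run_job(edges)
```

One job answers a one-hop question; a multi-hop traversal needs a chain of such jobs, which is part of why Map/Reduce is awkward for graph processing.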

Map/Reduce Differences - distributed
• Hadoop - across multiple machines
• Riak - across multiple machines
• CouchDB - run over all docs in single database
• MongoDB - not spread across multiple machines

Map/Reduce Differences - execution
• Hadoop - parallel
• Riak - parallel
• CouchDB - run over all docs in single database
• MongoDB - not parallel

Map Reduce Limitation
• Shared global synchronization
• Real time
• Small datasets
http://mapreduce.me

MapReduce Alternative
• Pregel - Inspired by Bulk Synchronous Parallel
  Designed for large scale graph algorithms
  Mystery
• Dryad - Streaming process algorithm as arbitrary dataflow graphs
  Acyclic graph
  Vertex - developer-specified computations
  Edges - data channels that capture dependencies
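
The Bulk Synchronous Parallel idea behind Pregel can be sketched as a loop of supersteps. A minimal single-process sketch, assuming the classic "propagate the maximum value" example; `pregel_max` and the sample graph are hypothetical, and this is not Pregel's actual API.

```python
def pregel_max(values, out_edges, max_supersteps=30):
    """Propagate the largest vertex value through the graph, BSP style."""
    values = dict(values)
    # Superstep 0: every vertex sends its value to its out-neighbours.
    inbox = {v: [] for v in values}
    for v, val in values.items():
        for n in out_edges.get(v, []):
            inbox[n].append(val)
    supersteps = 1
    # Keep going while any message is in flight.
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in values}
        for v, msgs in inbox.items():
            if msgs and max(msgs) > values[v]:
                # The vertex learned a larger value: update and notify.
                values[v] = max(msgs)
                for n in out_edges.get(v, []):
                    outbox[n].append(values[v])
            # Otherwise the vertex votes to halt and sends nothing.
        inbox = outbox  # barrier: messages arrive in the next superstep
        supersteps += 1
    return values, supersteps

# Hypothetical sample graph.
start = {"a": 3, "b": 6, "c": 2, "d": 1}
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final, steps = pregel_max(start, links)
```

Each vertex only sees messages sent in the previous superstep, so the per-vertex logic stays local while the barrier keeps the whole computation in lockstep.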

Graph Frameworks
• Giraph
• GraphLab
• Phoebus
• GoldenOrb
• Signal/Collect
• Spark
• Piccolo
• HaLoop

NoSQL Taiwan Needs You
http://nosql.org.tw
Join Us: http://fb.nosql.org.tw
Discussion:
http://hbase.nosql.org.tw
http://mongo.nosql.org.tw
http://redis.nosql.org.tw
http://riak.nosql.org.tw
http://couchdb.nosql.org.tw
Looking for
* Speakers
* Contributors
* Supporters

We are hiring
• Front-end / Back-end Engineer
• UI / UX Designer
Apply: von@vonstark.co

Q&A
