Large scale graph processing with apache giraph
André Kelpe
May 23, 2012
Transcript
Large scale graph processing with apache giraph
André Kelpe, @fs111, http://kel.pe
graphs 101
vertices and edges
[Figure: simple graph, vertices v1 to v10 connected by edges]
graphs are everywhere: road networks, the WWW, social graphs etc.
graphs can be huge
google knows!
Pregel
Pregel by Google: describes a graph processing approach based on BSP (Bulk Synchronous Parallel)
pro-tip: search for "pregel_paper.pdf" on GitHub ;-)
Properties of Pregel: batch-oriented, scalable, fault-tolerant processing of graphs
It is not a graph database. It is a processing framework.
BSP: vertex-centric processing in so-called supersteps
BSP: vertices send messages to each other
BSP: synchronization points between supersteps
execution of superstep S: each vertex processes the messages generated in S-1, sends messages to be processed in S+1, and decides whether to halt.
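The superstep cycle can be imitated in a few lines of plain Java. The following is only a sketch under stated assumptions: it runs in a single process, uses no Giraph or Hadoop classes, and the class name `BspMaxValueSketch` and the method `run` are made up for illustration. Each vertex spreads the largest value it has seen, which is the introductory example from the Pregel paper. The hand-over from outbox to inbox stands in for the synchronization point between supersteps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy, single-process BSP loop. Not Giraph code; all names are hypothetical. */
public class BspMaxValueSketch {

    /**
     * Runs supersteps until every vertex has voted to halt and no messages are in flight.
     * values[v] is the value of vertex v, neighbours[v] lists the vertices v sends to.
     * Returns the number of supersteps in which at least one vertex was active.
     */
    public static int run(int[] values, int[][] neighbours) {
        int n = values.length;
        boolean[] halted = new boolean[n];
        // Messages generated in superstep S-1, keyed by the receiving vertex.
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        int superstep = 0;
        while (true) {
            // Messages generated now, to be processed in superstep S+1.
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            boolean anyActive = false;
            for (int v = 0; v < n; v++) {
                List<Integer> msgs = inbox.getOrDefault(v, List.of());
                // A halted vertex is only woken up by an incoming message.
                if (halted[v] && msgs.isEmpty()) {
                    continue;
                }
                anyActive = true;
                // In superstep 0 every vertex announces its own value.
                boolean changed = (superstep == 0);
                for (int m : msgs) {
                    if (m > values[v]) {
                        values[v] = m;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int target : neighbours[v]) {
                        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(values[v]);
                    }
                }
                halted[v] = true; // vote to halt
            }
            if (!anyActive) {
                return superstep;
            }
            inbox = outbox; // synchronization point between supersteps
            superstep++;
        }
    }

    public static void main(String[] args) {
        int[] values = {3, 6, 2};
        int[][] neighbours = {{1}, {0, 2}, {1}}; // chain 0 - 1 - 2
        int steps = run(values, neighbours);
        System.out.println(Arrays.toString(values) + " after " + steps + " supersteps");
    }
}
```

The loop stops only when no vertex is active and no message is pending, which mirrors the halting rule on the slide: a vertex that voted to halt comes back to life as soon as it receives a message.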
apache giraph
giraph: loose implementation of Pregel ideas on top of Hadoop M/R, coming from Yahoo
apache giraph http://incubator.apache.org/giraph/
giraph: avoids the overhead of the classic M/R process but reuses existing infrastructure
giraph: simple map jobs in a master/worker setup. coordination via ZooKeeper. messaging via its own RPC protocol. in-memory processing. custom input and output formats.
current status: version 0.1 released. compatible with a multitude of hadoop versions (we use CDH3 at work). still lots of things to do, join the fun!
the APIs
Vertex-API

/**
 * @param <I> vertex id
 * @param <V> vertex data
 * @param <E> edge data
 * @param <M> message data
 */
class BasicVertex<I extends WritableComparable,
                  V extends Writable,
                  E extends Writable,
                  M extends Writable>

    void compute(Iterator<M> msgIterator);
    void sendMsg(I id, M msg);
    void voteToHalt();
Shortest path example
https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example
[Figure: simple graph, vertices v1 to v10 connected by edges]
private boolean isSource() {
    // The id of the source vertex comes from the job configuration.
    return (getVertexId().get() ==
        getContext().getConfiguration().getLong(SOURCE_ID, SOURCE_ID_DEFAULT));
}

@Override
public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() == 0) {
        // Every vertex starts out "infinitely" far away.
        setVertexValue(new DoubleWritable(Double.MAX_VALUE));
    }
    // Smallest distance offered by any incoming message (0 for the source itself).
    double minDist = isSource() ? 0d : Double.MAX_VALUE;
    while (msgIterator.hasNext()) {
        minDist = Math.min(minDist, msgIterator.next().get());
    }
    if (minDist < getVertexValue().get()) {
        // Shorter path found: store it and offer it to all neighbours.
        setVertexValue(new DoubleWritable(minDist));
        for (Edge<LongWritable, FloatWritable> edge : getOutEdgeMap().values()) {
            sendMsg(edge.getDestVertexId(),
                new DoubleWritable(minDist + edge.getEdgeValue().get()));
        }
    }
    // Sleep until a new message arrives.
    voteToHalt();
}
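To see how the compute() method above converges, the same logic can be replayed without a cluster. This is a minimal sketch, assuming a single process and a tiny hand-made graph. The class `ShortestPathsSketch`, its nested `Edge` class and the helper `sampleGraph` are hypothetical names for illustration and are not part of Giraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Single-process imitation of the shortest paths vertex program. Not Giraph code. */
public class ShortestPathsSketch {

    /** Hypothetical weighted, directed edge. */
    public static final class Edge {
        final int target;
        final double weight;

        Edge(int target, double weight) {
            this.target = target;
            this.weight = weight;
        }
    }

    private static List<List<Double>> emptyInboxes(int n) {
        List<List<Double>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }

    /** Returns the distance from source to each vertex; unreachable ones keep Double.MAX_VALUE. */
    public static double[] run(List<List<Edge>> outEdges, int source) {
        int n = outEdges.size();
        double[] dist = new double[n];
        Arrays.fill(dist, Double.MAX_VALUE); // what compute() does in superstep 0
        List<List<Double>> inbox = emptyInboxes(n);
        boolean firstSuperstep = true;
        while (firstSuperstep || inbox.stream().anyMatch(msgs -> !msgs.isEmpty())) {
            List<List<Double>> outbox = emptyInboxes(n);
            for (int v = 0; v < n; v++) {
                // After superstep 0 a vertex only runs when it has received messages.
                if (!firstSuperstep && inbox.get(v).isEmpty()) {
                    continue;
                }
                double minDist = (v == source) ? 0d : Double.MAX_VALUE;
                for (double m : inbox.get(v)) {
                    minDist = Math.min(minDist, m);
                }
                if (minDist < dist[v]) {
                    dist[v] = minDist;
                    for (Edge e : outEdges.get(v)) {
                        outbox.get(e.target).add(minDist + e.weight);
                    }
                }
                // voteToHalt(): nothing to do, the vertex sleeps until new messages arrive.
            }
            inbox = outbox; // barrier between supersteps
            firstSuperstep = false;
        }
        return dist;
    }

    /** Made-up graph: 0 -> 1 (1.0), 0 -> 2 (4.0), 1 -> 2 (2.0); vertex 3 is isolated. */
    public static List<List<Edge>> sampleGraph() {
        List<List<Edge>> g = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            g.add(new ArrayList<>());
        }
        g.get(0).add(new Edge(1, 1.0));
        g.get(0).add(new Edge(2, 4.0));
        g.get(1).add(new Edge(2, 2.0));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(sampleGraph(), 0)));
    }
}
```

In the sample graph, vertex 2 first hears about the direct edge with distance 4.0 and one superstep later about the cheaper detour via vertex 1. The update only happens because the new offer is strictly smaller than the stored value, and the run ends once no messages are left.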
GiraphJob

GiraphJob job = new GiraphJob(getConf(), getClass().getName());
job.setVertexClass(SimpleShortestPathsVertex.class);
job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class);
job.setVertexOutputFormatClass(SimpleShortestPathsVertexOutputFormat.class);
FileInputFormat.addInputPath(job, new Path("/foo/bar/baz"));
FileOutputFormat.setOutputPath(job, new Path("/foo/bar/quux"));
// The id of the source vertex is passed in on the command line.
job.getConfiguration().setLong(SimpleShortestPathsVertex.SOURCE_ID,
    Long.parseLong(argArray[2]));
job.setWorkerConfiguration(minWorkers, maxWorkers, 100.0f);
see also
http://incubator.apache.org/giraph/
https://cwiki.apache.org/confluence/display/GIRAPH/Shortest+Paths+Example
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html
Thanks! Questions?