Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Large scale graph processing with apache giraph
Search
André Kelpe
May 23, 2012
Programming
2
5.5k
Large scale graph processing with apache giraph
André Kelpe
May 23, 2012
Tweet
Share
More Decks by André Kelpe
See All by André Kelpe
Cascading 3 and beyond
fs111
0
180
The Cascading (big) data application framework
fs111
1
240
SELECT ALL THE THINGS - Cascading Lingual, ANSI SQL for Apache Hadoop
fs111
0
210
A whirlwind tour through Lingual: ANSI SQL for Apache Hadoop
fs111
1
190
Tor for everyone!
fs111
0
200
Other Decks in Programming
See All in Programming
AIコードレビューがチームの"文脈"を 読めるようになるまで
marutaku
0
310
ID管理機能開発の裏側 高速にSaaS連携を実現したチームのAI活用編
atzzcokek
0
190
「文字列→日付」の落とし穴 〜Ruby Date.parseの意外な挙動〜
sg4k0
0
360
TVerのWeb内製化 - 開発スピードと品質を両立させるまでの道のり
techtver
PRO
3
1.4k
20251127_ぼっちのための懇親会対策会議
kokamoto01_metaps
2
400
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
570
TUIライブラリつくってみた / i-just-make-TUI-library
kazto
1
310
Google Antigravity and Vibe Coding: Agentic Development Guide
mickey_kubo
2
130
DSPy Meetup Tokyo #1 - はじめてのDSPy
masahiro_nishimi
1
150
Socio-Technical Evolution: Growing an Architecture and Its Organization for Fast Flow
cer
PRO
0
260
CSC305 Lecture 15
javiergs
PRO
0
250
配送計画の均等化機能を提供する取り組みについて(⽩⾦鉱業 Meetup Vol.21@六本⽊(数理最適化編))
izu_nori
0
120
Featured
See All Featured
Bash Introduction
62gerente
615
210k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.4k
Navigating Team Friction
lara
191
16k
Optimising Largest Contentful Paint
csswizardry
37
3.5k
Side Projects
sachag
455
43k
Context Engineering - Making Every Token Count
addyosmani
9
460
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
55
3.1k
What's in a price? How to price your products and services
michaelherold
246
12k
Code Reviewing Like a Champion
maltzj
527
40k
YesSQL, Process and Tooling at Scale
rocio
174
15k
Transcript
Large scale graph processing with apache giraph André Kelpe @fs111
http://kel.pe
graphs 101
vertices and edges
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
graphs are everywhere road network, the www, social graphs etc.
graphs can be huge
google knows!
Pregel
Pregel by google Describes graph processing approach based on BSP
(Bulk Synchronous Parallel)
pro-tip: search for „pregel_paper.pdf“ on github ;-)
Properties of Pregel batch-oriented, scalable, fault tolerant processing of graphs
It is not a graph database It is a processing
framework
BSP vertex centric processing in so called supersteps
BSP vertices send messages to each other
BSP synchronization points between supersteps
execution of superstep S Each vertex processes messages generated in
S-1 and send messages to be processed in S+1 and determines to halt.
None
apache giraph
giraph Loose implementation of Pregel ideas on top of Hadoop
M/R coming from yahoo
apache giraph http://incubator.apache.org/giraph/
giraph avoid overhead of classic M/R process but reuse existing
infrastructure
giraph simple map jobs in master worker setup. coordination via
zookeeper. messaging via own RPC protocol. in memory processing. custom input and output formats.
current status version 0.1 released compatible with a multitude of
hadoop versions (we use CDH3 at work) still lots of things to do, join the fun!
the APIs the APIs
Vertex-API /** *@param <I> vertex id * @param <V> vertex
data * @param <E> edge data * @param <M> message data */ class BasicVertex<I extends WritableComparable, V extends Writable, E extends Writable, M extends Writable> void compute(Iterator<M> msgIterator); void sendMsg(I id, M msg); void voteToHalt();
Shortest path example https://cwiki.apache.org/confl uence/display/GIRAPH/Shorte st+Paths+Example
v2 v5 v4 v7 v3 v8 v6 v1 v9 v8
v10 simple graph
private boolean isSource() { return (getVertexId().get() == getContext().getConfiguration().getLong(SOURCE_ID, SOURCE_ID_DEFAULT)); }
@Override public void compute(Iterator<DoubleWritable> msgIterator) { if (getSuperstep() == 0) { setVertexValue(new DoubleWritable(Double.MAX_VALUE)); } double minDist = isSource() ? 0d : Double.MAX_VALUE; while (msgIterator.hasNext()) { minDist = Math.min(minDist, msgIterator.next().get()); } if (minDist < getVertexValue().get()) { setVertexValue(new DoubleWritable(minDist)); for (Edge<LongWritable, FloatWritable> edge : getOutEdgeMap().values()) { sendMsg(edge.getDestVertexId(), new DoubleWritable(minDist + edge.getEdgeValue().get())); } } voteToHalt(); }
GiraphJob job = new GiraphJob(getConf(), getClass().getName()); job.setVertexClass(SimpleShortestPathVertex.class); job.setVertexInputFormatClass(SimpleShortestPathsVertexInputFormat.class); job.setVertexOutputFormatClass( SimpleShortestPathsVertexOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(„/foo/bar/baz“)); FileOutputFormat.setOutputPath(job, new Path(„/foo/bar/quux“)); job.getConfiguration().setLong(SimpleShortestPathsVertex.SOURCE_ID, Long.parseLong(argArray[2])); job.setWorkerConfiguration(minWorkers, maxWorkers), 100.0f); GiraphJob
see also http://incubator.apache.org/giraph/ https://cwiki.apache.org/confluence/displ ay/GIRAPH/Shortest+Paths+Example http://googleresearch.blogspot.com/2009/ 06/large-scale-graph-computing-at- google.html
Thanks! Questions?