Slide 1

Slide 1 text

Graph-Tool: The Efficient Network Analyzing Tool for Python
Mosky

Slide 2

Slide 2 text

Graph-Tool in Practice
 Mosky

Slide 8

Slide 8 text

MOSKY
• Python Charmer at Pinkoi
• An author of the Python packages:
  • MoSQL, Clime, Uniout, ZIPCodeTW, …
• A speaker at conferences:
  • 2014: PyCon APAC, OSDC; 2013: PyCon APAC, PyCon TW, COSCUP, …
• A Python instructor
• mosky.tw

Slide 14

Slide 14 text

OUTLINE
• Introduction
• Create Graph
• Visualize Graph
• Analyze Graph
• Conclusion

Slide 15

Slide 15 text

INTRODUCTION

Slide 20

Slide 20 text

GRAPH-TOOL
• It's for analyzing graphs.
• Fast. It is based on the Boost Graph Library in C++.
• Powerful visualization
• Lots of useful algorithms

Slide 23

Slide 23 text

GET GRAPH-TOOL
• Super easy on Debian / Ubuntu
  • http://graph-tool.skewed.de/download#debian
• Super hard on Mac
  • http://graph-tool.skewed.de/download#macos
  • Install the dependencies with Homebrew and pip, then compile it from source.
  • Note that it may take you 3 to 4 hours. I warned you!

Slide 24

Slide 24 text

CREATE GRAPH

Slide 28

Slide 28 text

BEFORE STARTING
• Define your problem.
• Convert it into graph form.
• Parse raw data.

Slide 31

Slide 31 text

MY PROBLEM
• To improve the visit duration of an online marketplace.
• Which product browsing flow do users prefer?

Slide 32

Slide 32 text

IN GRAPH FORM

          What                Weight
Vertex    Product             Count
Edge      Directed browsing   Count
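The mapping in this table can be tried out before touching graph-tool. The following is only a minimal sketch, assuming each browsing session is already an ordered list of product names; the `sessions` data and the `count_weights` helper are hypothetical and not part of the talk.

```python
from collections import Counter

def count_weights(sessions):
    """Return (vertex_counts, edge_counts) for a list of browsing sessions."""
    vertex_counts = Counter()
    edge_counts = Counter()
    for session in sessions:
        # vertex weight: how many times each product was viewed
        vertex_counts.update(session)
        # edge weight: how many times a user went from one product to the next
        edge_counts.update(zip(session, session[1:]))
    return vertex_counts, edge_counts

# hypothetical sample data: three sessions over three products
sessions = [
    ['mug', 'bag', 'ring'],
    ['bag', 'ring'],
    ['mug', 'bag'],
]
vertex_counts, edge_counts = count_weights(sessions)
```

Edges are keyed by an ordered pair, so ('mug', 'bag') and ('bag', 'mug') are counted separately, which is what makes the graph directed.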

Slide 36

Slide 36 text

PARSING
• Regular expression
  • Filter out garbage.
• Sorting
• Pickle
  • HIGHEST_PROTOCOL
  • Use tuples to save space/time.
  • Save into serial files.
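The parsing steps above might look roughly like this. It is a sketch under stated assumptions: the log line format, `LINE_RE`, and `parse_lines` are made up for illustration, since the talk does not show the real data.

```python
import pickle
import re

# Hypothetical log format: "<timestamp> <user_id> GET /product/<product_id>"
LINE_RE = re.compile(r'^(\d+) (\w+) GET /product/(\w+)$')

def parse_lines(lines):
    """Parse log lines into sorted (user_id, timestamp, product_id) tuples."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # filter out garbage lines
        timestamp, user_id, product_id = m.groups()
        # tuples are smaller and faster to pickle than dicts or objects
        records.append((user_id, int(timestamp), product_id))
    # sorting by user, then time, puts each user's browsing flow in order
    records.sort()
    return records

lines = [
    '1405700002 u2 GET /product/bag',
    'garbage line',
    '1405700001 u1 GET /product/mug',
    '1405700003 u1 GET /product/bag',
]
records = parse_lines(lines)

# HIGHEST_PROTOCOL picks the newest binary pickle protocol available
data = pickle.dumps(records, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(data)
```

To save to a file instead, `pickle.dump(records, f, pickle.HIGHEST_PROTOCOL)` takes the same protocol argument, with `f` opened in binary mode.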

Slide 37

Slide 37 text

VERTEX AND EDGE

import graph_tool.all as gt

g = gt.Graph()
v1 = g.add_vertex()
v2 = g.add_vertex()
e = g.add_edge(v1, v2)

Slide 38

Slide 38 text

PROPERTY

v_count_p = g.new_vertex_property('int')

# store it in our graph, optionally
g.vp['count'] = v_count_p

Slide 39

Slide 39 text

FASTER IMPORT

from graph_tool import Graph

Slide 40

Slide 40 text

COUNTING

name_v_map = {}
for name in names:
    v = name_v_map.get(name)
    if v is None:
        v = g.add_vertex()
        v_count_p[v] = 0
        name_v_map[name] = v
    v_count_p[v] += 1

Slide 41

Slide 41 text

VISUALIZE GRAPH

Slide 42

Slide 42 text

THE SIMPLEST

gt.graph_draw(
    g,
    output = 'output.pdf',
)

gt.graph_draw(
    g,
    output = 'output.png',
)

Slide 44

Slide 44 text

USE CONSTANTS

SIZE = 400
V_SIZE = SIZE / 20.
E_PWIDTH = V_SIZE / 4.

gt.graph_draw(
    …
    output_size = (SIZE, SIZE),
    vertex_size = V_SIZE,
    edge_pen_width = E_PWIDTH,
)

Slide 46

Slide 46 text

USE PROP_TO_SIZE

v_size_p = gt.prop_to_size(
    v_count_p,
    MI_V_SIZE,
    MA_V_SIZE,
)
…
gt.graph_draw(
    …
    vertex_size = v_size_p,
    edge_pen_width = e_pwidth_p,
)
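To build some intuition for what rescaling counts into a size range means, here is a plain-Python stand-in. It is not graph-tool's implementation: the `rescale` helper and its `power` parameter are assumptions modeled on my reading of the `prop_to_size` documentation, so check the docs for the exact behavior.

```python
def rescale(values, mi, ma, power=0.5):
    """Map values into [mi, ma]; a simplified stand-in for gt.prop_to_size."""
    lo, hi = min(values), max(values)
    # fall back to 1.0 so equal values do not cause a division by zero
    delta = float(hi - lo) or 1.0
    return [mi + (ma - mi) * ((v - lo) / delta) ** power for v in values]

# power=1 gives a plain linear min-max rescale
sizes = rescale([1, 4, 9, 16], mi=2.0, ma=20.0, power=1)

# when every value is the same, everything maps to the minimum size
flat = rescale([5, 5], mi=1, ma=3)
```

The smallest count always lands on `mi` and the largest on `ma`; a `power` below 1 compresses the gap between large and small counts, which helps when a few products dominate.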

Slide 48

Slide 48 text

USE FILL_COLOR

gt.graph_draw(
    …
    vertex_fill_color = v_size_p,
)

Slide 50

Slide 50 text

ANALYZE GRAPH

Slide 54

Slide 54 text

CHOOSE AN ALGORITHM
• Search algorithms
  • BFS …
• Assessing graph topology
  • shortest path …
• Centrality measures
  • pagerank, betweenness, closeness …

Slide 58

Slide 58 text

• Maximum flow algorithms
• Community structures
• Clustering coefficients

Slide 61

Slide 61 text

CENTRALITY MEASURES
• Degree centrality
  • the number of links incident upon a node
  • the immediate risk of taking a node out
• Closeness centrality
  • the inverse of the sum of a node's distances to all other nodes
  • the cost to spread information to all other nodes

Slide 64

Slide 64 text

• Betweenness centrality
  • the number of times a node acts as a bridge
  • the control of a node on the communication between other nodes
• Eigenvector centrality
  • the influence of a node in a network
  • Google's PageRank is a variant of the eigenvector centrality measure
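Two of these measures are simple enough to sketch in plain Python on a toy directed graph. This is an illustration only: the adjacency dict and both helpers are hypothetical, the closeness here uses one common normalization (reachable nodes divided by the sum of hop distances), and graph-tool's own functions may normalize differently.

```python
from collections import deque

def degree(adj):
    """Number of links incident upon each node (incoming plus outgoing)."""
    deg = {v: len(ns) for v, ns in adj.items()}
    for ns in adj.values():
        for w in ns:
            deg[w] += 1
    return deg

def closeness(adj, source):
    """Reachable nodes divided by the sum of BFS hop distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    # a node that reaches nothing gets 0.0 instead of a division by zero
    return (len(dist) - 1) / float(total) if total else 0.0

# toy graph: a -> b, a -> c, b -> c, c -> d
adj = {'a': ['b', 'c'], 'b': ['c'], 'c': ['d'], 'd': []}
deg = degree(adj)
cl_a = closeness(adj, 'a')
cl_d = closeness(adj, 'd')
```

Node 'c' has the most incident links, while 'a' is the node from which the others are cheapest to reach; 'd' has no outgoing links, so its closeness falls back to 0.0.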

Slide 67

Slide 67 text

MY CHOICE
• Centrality measures - Closeness centrality
• Find the products from which all other products are easiest to reach.

Slide 68

Slide 68 text

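CALCULATE CLOSENESS

import numpy as np

# invert the counts: a frequently browsed edge becomes a short distance
e_icount_p = g.new_edge_property('int')
e_icount_p.a = e_count_p.a.max() - e_count_p.a

# closeness lives in graph_tool.centrality (also available as gt.closeness)
v_cl_p = closeness(g, weight=e_icount_p)

# replace NaN values with 0
v_cl_p.a = np.nan_to_num(v_cl_p.a)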
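The weight inversion is the subtle step here: shortest-path algorithms treat a weight as a distance, so a high browsing count has to become a small number. The sketch below shows the effect in plain Python with Dijkstra's algorithm; the edge data and both helpers are hypothetical and only illustrate the idea, not graph-tool's internals.

```python
import heapq

def invert(counts):
    """Turn counts into distances: the most-used edge gets distance 0."""
    top = max(counts.values())
    return {edge: top - c for edge, c in counts.items()}

def shortest_distances(dist_of_edge, source):
    """Dijkstra over a dict {(u, v): distance} of directed edges."""
    adj = {}
    for (u, v), d in dist_of_edge.items():
        adj.setdefault(u, []).append((v, d))
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float('inf')):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < best.get(v, float('inf')):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

# hypothetical browsing counts: a -> b -> c is popular, a -> c is rare
edge_counts = {('a', 'b'): 9, ('b', 'c'): 8, ('a', 'c'): 1}
distances = shortest_distances(invert(edge_counts), 'a')
```

After inversion the popular two-step route from 'a' to 'c' is shorter than the rarely used direct edge, so closeness computed on these distances rewards the flows users actually follow.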

Slide 69

Slide 69 text

DRAW CLOSENESS

v_cl_size_p = gt.prop_to_size(
    v_cl_p,
    MI_V_SIZE,
    MA_V_SIZE,
)
…
gt.graph_draw(
    …
    vertex_fill_color = v_cl_size_p,
)

Slide 71

Slide 71 text

ON THE FLY FILTERING

v_pck_p = g.new_vertex_property('bool')
v_pck_p.a = v_count_p.a > v_count_p.a.mean()

g.set_vertex_filter(v_pck_p)
# g.set_vertex_filter(None) # unset

Slide 73

Slide 73 text

TOP N

t10_idxs = v_count_p.a.argsort()[-10:][::-1]

t1_idx = t10_idxs[0]
t1_v = g.vertex(t1_idx)
t1_name = v_name_p[t1_v]
t1_count = v_count_p[t1_v]
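The `argsort()[-10:][::-1]` idiom reads as: sort the indexes by ascending count, keep the last ten, then reverse them so the largest comes first. The same steps in plain Python, as a sketch on a hypothetical list of counts (`top_n_indexes` is not a graph-tool or NumPy function):

```python
def top_n_indexes(counts, n):
    """Indexes of the n largest counts, largest first."""
    if n <= 0:
        return []  # guard: order[-0:] would return every index
    # indexes sorted by ascending count, like numpy's argsort()
    order = sorted(range(len(counts)), key=counts.__getitem__)
    # keep the last n, then reverse to get descending order
    return order[-n:][::-1]

counts = [5, 42, 7, 19, 3]
top3 = top_n_indexes(counts, 3)
```

The result holds positions, not values, which is why the slide then turns the first index back into a vertex with `g.vertex(...)` before looking up its name and count.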

Slide 74

Slide 74 text

SFDP LAYOUT

gt.graph_draw(
    …
    pos = gt.sfdp_layout(g),
)

Slide 76

Slide 76 text

gt.graph_draw(
    …
    pos = gt.sfdp_layout(
        g, eweight=e_count_p
    ),
)

gt.graph_draw(
    …
    pos = gt.sfdp_layout(
        g,
        eweight=e_count_p, vweight=v_count_p
    ),
)

Slide 80

Slide 80 text

FR LAYOUT

gt.graph_draw(
    …
    pos = gt.fruchterman_reingold_layout(g),
)

gt.graph_draw(
    …
    pos = gt.fruchterman_reingold_layout(
        g, weight=e_count_p
    ),
)

Slide 84

Slide 84 text

ARF LAYOUT

gt.graph_draw(
    …
    pos = gt.arf_layout(g),
)

gt.graph_draw(
    …
    pos = gt.arf_layout(
        g, weight=e_count_p
    ),
)

Slide 88

Slide 88 text

MY GRAPH

Slide 90

Slide 90 text

CONCLUSION

Slide 98

Slide 98 text

CONCLUSION
• Define the problem in graph form.
• Parse raw data.
  • Watch out! Your data will bite you.
• Visualize to understand.
• Choose a proper algorithm.
• Filter the data that interests you.
• Visualize again to convince.
• mosky.tw

Slide 99

Slide 99 text

DEMO

Slide 100

Slide 100 text

COSCUP 2014 2014.07.19 - 2014.07.20 | Academia Sinica, Taipei, Taiwan

Slide 101

Slide 101 text

LINKS
• Quick start using graph-tool
  http://graph-tool.skewed.de/static/doc/quickstart.html
• Learn more about Graph object
  http://graph-tool.skewed.de/static/doc/graph_tool.html
• The possible property value types
  http://graph-tool.skewed.de/static/doc/graph_tool.html#graph_tool.PropertyMap

Slide 102

Slide 102 text

• Graph drawing and layout
  http://graph-tool.skewed.de/static/doc/draw.html
• Available subpackages - Graph-Tool
  http://graph-tool.skewed.de/static/doc/graph_tool.html#available-subpackages
• Centrality - Wiki
  http://en.wikipedia.org/wiki/Centrality
• NumPy Reference
  http://docs.scipy.org/doc/numpy/reference/