Graph-Tool in Practice

This talk, titled "Graph-Tool: The Efficient Network Analyzing Tool for Python", was given at PyCon APAC 2014 [1] and PyCon SG 2014 [2]. It introduces Graph-Tool through a large number of code snippets.

[1] https://tw.pycon.org/2014apac
[2] https://pycon.sg/

Mosky Liu

May 17, 2014

Transcript

  1. Graph-Tool
    The Efficient Network
    Analyzing Tool for Python

    Mosky

  2. Graph-Tool
    in Practice

    Mosky

  8. MOSKY
    • Python Charmer at Pinkoi
    • An author of the Python packages:
      • MoSQL, Clime, Uniout, ZIPCodeTW, …
    • A speaker at conferences:
      • 2014: PyCon APAC, OSDC; 2013: PyCon APAC, PyCon TW, COSCUP, …
    • A Python instructor
    • mosky.tw

  14. OUTLINE
    • Introduction
    • Create Graph
    • Visualize Graph
    • Analyze Graph
    • Conclusion

  15. INTRODUCTION

  20. GRAPH-TOOL
    • It's for analyzing graphs.
    • Fast. It is based on the Boost Graph Library in C++.
    • Powerful visualization
    • Lots of useful algorithms

  23. GET GRAPH-TOOL
    • Super easy on Debian / Ubuntu
      • http://graph-tool.skewed.de/download#debian
    • Super hard on Mac
      • http://graph-tool.skewed.de/download#macos
      • Install the dependencies with Homebrew and pip, then compile it from source.
      • Note that it may take you 3~4 hours. I warned you!

  24. CREATE GRAPH

  28. BEFORE STARTING
    • Define your problem.
    • Convert it into graph form.
    • Parse raw data.

  31. MY PROBLEM
    • To increase how long users stay on an online marketplace.
    • Which product browsing flow do users prefer?

  32. IN GRAPH FORM
              What                Weight
    Vertex    Product             Count
    Edge      Directed browsing   Count

  36. PARSING
    • Regular expression
      • Filter out garbage.
    • Sorting
    • Pickle
      • HIGHEST_PROTOCOL
      • Use tuples to save space/time.
      • Save into serial files.
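
The parsing steps above might be sketched roughly as follows. The log line format, the regular expression and the sample data are assumptions made for illustration only; the talk does not show the actual raw data.

```python
import pickle
import re

# Hypothetical log format: "<timestamp> <user> /product/<name>"
LINE_RE = re.compile(r'^(\d+) (\w+) /product/(\w+)$')


def parse_lines(lines):
    """Return sorted (timestamp, user, product) tuples, skipping garbage."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # filter out lines that are not product views
        ts, user, product = m.groups()
        records.append((int(ts), user, product))  # plain tuples stay small
    records.sort()  # orders by timestamp first
    return records


def dump_records(records):
    """Serialize with the newest pickle protocol available."""
    return pickle.dumps(records, protocol=pickle.HIGHEST_PROTOCOL)


raw = [
    '20 alice /product/mug',
    'not a log line',
    '10 alice /product/bag',
    '15 bob /product/mug',
]
records = parse_lines(raw)
restored = pickle.loads(dump_records(records))
```

In practice the pickled bytes would be written to files so that the slow parsing step only has to run once.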

  37. VERTEX AND EDGE
    import graph_tool.all as gt

    g = gt.Graph()          # a new graph
    v1 = g.add_vertex()
    v2 = g.add_vertex()
    e = g.add_edge(v1, v2)  # edge from v1 to v2

  38. PROPERTY
    v_count_p = g.new_vertex_property('int')

    # store it in our graph, optionally
    g.vp['count'] = v_count_p

  39. FASTER IMPORT
    # import only the names you need instead of graph_tool.all
    from graph_tool import Graph

  40. COUNTING
    name_v_map = {}
    for name in names:
        v = name_v_map.get(name)
        if v is None:
            v = g.add_vertex()
            v_count_p[v] = 0
            name_v_map[name] = v
        v_count_p[v] += 1
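
The same counting idea extends to edges. Below is a sketch in plain Python, assuming each browsing session is available as an ordered list of product names (the `sessions` data and the `count_browsing` helper are made up for illustration). Every consecutive pair of products becomes a directed edge whose weight is the number of times it was seen.

```python
from collections import Counter


def count_browsing(sessions):
    """Count product views (vertices) and product-to-product steps (edges)."""
    v_counts = Counter()
    e_counts = Counter()
    for names in sessions:
        v_counts.update(names)
        # consecutive pairs: (first, second), (second, third), ...
        e_counts.update(zip(names, names[1:]))
    return v_counts, e_counts


sessions = [
    ['bag', 'mug', 'bag'],
    ['mug', 'bag'],
]
v_counts, e_counts = count_browsing(sessions)
```

The resulting counts could then be copied into vertex and edge property maps in the same way the slide fills `v_count_p`.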

  41. VISUALIZE GRAPH

  42. THE SIMPLEST
    # the output format is inferred from the file extension
    gt.graph_draw(
        g,
        output = 'output.pdf',
    )

    gt.graph_draw(
        g,
        output = 'output.png',
    )

  44. USE CONSTANTS
    SIZE = 400
    V_SIZE = SIZE / 20.
    E_PWIDTH = V_SIZE / 4.
    gt.graph_draw(
        g,
        output_size = (SIZE, SIZE),
        vertex_size = V_SIZE,
        edge_pen_width = E_PWIDTH,
    )

  46. USE PROP_TO_SIZE
    # scale the counts into the range MI_V_SIZE..MA_V_SIZE
    v_size_p = gt.prop_to_size(
        v_count_p,
        MI_V_SIZE,
        MA_V_SIZE,
    )

    gt.graph_draw(
        g,
        vertex_size = v_size_p,
        edge_pen_width = e_pwidth_p,
    )

  48. USE FILL_COLOR
    gt.graph_draw(
        g,
        vertex_fill_color = v_size_p,
    )

  50. ANALYZE GRAPH

  54. CHOOSE AN ALGORITHM
    • Search algorithms
      • BFS search …
    • Assessing graph topology
      • shortest path …
    • Centrality measures
      • pagerank, betweenness, closeness …

  58. • Maximum flow algorithms
    • Community structures
    • Clustering coefficients

  61. CENTRALITY MEASURES
    • Degree centrality
      • the number of links incident upon a node
      • the immediate risk of taking a node out
    • Closeness centrality
      • the reciprocal of the sum of a node's distances to all other nodes
      • the cost to spread information to all other nodes

  64. • Betweenness centrality
      • the number of times a node acts as a bridge
      • the control of a node on the communication between other nodes
    • Eigenvector centrality
      • the influence of a node in a network
      • Google's PageRank is a variant of the eigenvector centrality measure
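
To make the first two measures concrete, here is a small sketch in plain Python for an unweighted directed graph stored as an adjacency dict. It follows the textbook definitions (degree as the number of incident links, closeness as the reciprocal of the summed shortest-path distances) and is not how graph-tool implements them; the helper names and the toy graph are made up for illustration.

```python
from collections import deque


def degree_centrality(adj):
    """Number of links incident upon each node (outgoing plus incoming)."""
    degree = {v: len(targets) for v, targets in adj.items()}
    for targets in adj.values():
        for t in targets:
            degree[t] += 1
    return degree


def closeness_centrality(adj):
    """Reciprocal of the summed BFS distances to all reachable nodes."""
    closeness = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for t in adj[v]:
                if t not in dist:
                    dist[t] = dist[v] + 1
                    queue.append(t)
        total = sum(dist.values())
        # a node that reaches nothing gets 0 here by convention
        closeness[source] = 1.0 / total if total else 0.0
    return closeness


# toy graph: a -> b -> c (every node must appear as a key)
adj = {'a': ['b'], 'b': ['c'], 'c': []}
degree = degree_centrality(adj)
closeness = closeness_centrality(adj)
```

In this toy graph `b` has the highest degree because it has one incoming and one outgoing link, while `c` gets a closeness of 0 because it cannot reach any other node.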

  67. MY CHOICE
    • Centrality measures - Closeness centrality
    • Find the products from which all other products are easiest to reach.

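  68. CALCULATE CLOSENESS
    import numpy as np

    # invert the counts so that heavily browsed edges act as short distances
    e_icount_p = g.new_edge_property('int')
    e_icount_p.a = e_count_p.a.max() - e_count_p.a

    v_cl_p = gt.closeness(g, weight=e_icount_p)

    # closeness can come out as NaN for some vertices; replace NaN with 0
    v_cl_p.a = np.nan_to_num(v_cl_p.a)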
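
The two array tricks used for the closeness calculation can be tried on plain NumPy arrays, since the `.a` attribute of a property map exposes its values as a NumPy array. The numbers below are made up for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

# hypothetical browsing counts for three edges
counts = np.array([1, 5, 3])

# invert: the most browsed edge gets the smallest "distance"
inverted = counts.max() - counts

# hypothetical closeness values, one of them undefined
scores = np.array([0.5, np.nan, 0.25])

# replace NaN with 0.0 so the values can be drawn and sorted
cleaned = np.nan_to_num(scores)
```

Note that with this inversion the most browsed edge ends up with a distance of exactly 0.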

  69. DRAW CLOSENESS
    v_cl_size_p = gt.prop_to_size(
        v_cl_p,
        MI_V_SIZE,
        MA_V_SIZE,
    )

    gt.graph_draw(
        g,
        vertex_fill_color = v_cl_size_p,
    )

  71. ON THE FLY FILTERING
    # keep only the vertices whose count is above the mean
    v_pck_p = g.new_vertex_property('bool')
    v_pck_p.a = v_count_p.a > v_count_p.a.mean()

    g.set_vertex_filter(v_pck_p)
    # g.set_vertex_filter(None) # unset

  73. TOP N
    # indices of the 10 largest counts, largest first
    t10_idxs = v_count_p.a.argsort()[-10:][::-1]

    t1_idx = t10_idxs[0]
    t1_v = g.vertex(t1_idx)
    t1_name = v_name_p[t1_v]
    t1_count = v_count_p[t1_v]
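
The filtering mask and the top-N indexing are both ordinary NumPy operations, so they can be checked on a small array before being applied to a real graph. The counts below are made up, and NumPy is assumed to be installed.

```python
import numpy as np

# hypothetical view counts for five vertices (index = vertex index)
counts = np.array([4, 9, 1, 7, 3])

# boolean mask: True where the count is above the mean (4.8 here)
mask = counts > counts.mean()

# argsort sorts ascending, so take the last 3 indices and reverse them
top3_idxs = counts.argsort()[-3:][::-1]
top3_counts = counts[top3_idxs]
```

The mask has the same shape as what the filtering slide assigns to `v_pck_p.a`, and `top3_idxs` plays the role of `t10_idxs` with N reduced to 3.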

  74. SFDP LAYOUT
    gt.graph_draw(
        g,
        pos = gt.sfdp_layout(g),
    )

  76. gt.graph_draw(
        g,
        pos = gt.sfdp_layout(
            g, eweight=e_count_p
        ),
    )

    gt.graph_draw(
        g,
        pos = gt.sfdp_layout(
            g,
            eweight=e_count_p, vweight=v_count_p
        ),
    )

  80. FR LAYOUT
    gt.graph_draw(
        g,
        pos = gt.fruchterman_reingold_layout(g),
    )

    gt.graph_draw(
        g,
        pos = gt.fruchterman_reingold_layout(
            g, weight=e_count_p
        ),
    )

  84. ARF LAYOUT
    gt.graph_draw(
        g,
        pos = gt.arf_layout(g),
    )

    gt.graph_draw(
        g,
        pos = gt.arf_layout(
            g, weight=e_count_p
        ),
    )

  88. MY GRAPH

  90. CONCLUSION

  98. CONCLUSION
    • Define the problem in graph form.
    • Parse raw data.
      • Watch out! Your data will bite you.
    • Visualize to understand.
    • Choose a proper algorithm.
    • Filter the data that interests you.
    • Visualize again to convince.
    • mosky.tw

  99. DEMO

  100. COSCUP 2014
    2014.07.19 - 2014.07.20 | Academia Sinica, Taipei, Taiwan

  101. LINKS
    • Quick start using graph-tool
      http://graph-tool.skewed.de/static/doc/quickstart.html
    • Learn more about Graph object
      http://graph-tool.skewed.de/static/doc/graph_tool.html
    • The possible property value types
      http://graph-tool.skewed.de/static/doc/graph_tool.html#graph_tool.PropertyMap

  102. • Graph drawing and layout
      http://graph-tool.skewed.de/static/doc/draw.html
    • Available subpackages - Graph-Tool
      http://graph-tool.skewed.de/static/doc/graph_tool.html#available-subpackages
    • Centrality - Wiki
      http://en.wikipedia.org/wiki/Centrality
    • NumPy Reference
      http://docs.scipy.org/doc/numpy/reference/