Shulhi Sapli
August 23, 2015

# Who owns what? Graph theory application @ PyCon MY 2015

Finding companies' indirect ownership can be tricky. See how we use graph theory and linear algebra to solve a real-world problem.

## Transcript

1. WHO OWNS WHAT?
A graph theory application.
@shulhi

2. Background
Mathematics & Actuarial Science
shulhi @ gmail, github, twitter, etc

3. Overview
1. Problems
2. Optimizations

4. Problems
Given a list of companies and their respective
shareholders, find the effective ownership of
every company.

5.

| Company | Shareholder | Share (%) |
|---|---|---|
| Foo Sdn. Bhd | A | 50 |
| Foo Sdn. Bhd | Bar Sdn. Bhd | 50 |
| Bar Sdn. Bhd | B | 50 |
| Bar Sdn. Bhd | C | 50 |

6. Effective ownership
[Diagram: Foo is owned 50% by A and 50% by Bar; Bar is owned 50% by B and 50% by C. Each edge is an "own-a" relationship.]

7. Effective ownership
[Diagram: the same graph, with B and C each annotated as effectively owning 25% of Foo (50% of Bar x 50% of Foo).]

9. Trial & Error #1
Graph traversal

10. Graph traversal
1. Simple for linear relationships. Remember
our first example? That's linear.
[Diagram: Foo is owned 50% by A and 50% by Bar; Bar is owned 50% by B and 50% by C.]
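
The traversal on the first example can be sketched as below. This is a minimal illustration, not the talk's code: the function name and the dictionary layout (company mapped to a list of shareholder/fraction pairs) are assumptions, and the loop only terminates when the graph has no cycles.

```python
def effective_ownership(owners, company):
    """Return {shareholder: effective fraction of `company`}.

    `owners` maps a company to its direct (shareholder, fraction) pairs.
    Assumes the ownership graph is acyclic; a cycle would loop forever.
    """
    totals = {}
    stack = [(company, 1.0)]  # (node, fraction of `company` reached via this path)
    while stack:
        node, weight = stack.pop()
        for holder, fraction in owners.get(node, []):
            share = weight * fraction
            totals[holder] = totals.get(holder, 0.0) + share
            stack.append((holder, share))  # keep walking up through this holder
    return totals


# Data from the shareholder table, with percentages written as fractions.
OWNERS = {
    "Foo": [("A", 0.5), ("Bar", 0.5)],
    "Bar": [("B", 0.5), ("C", 0.5)],
}

result = effective_ownership(OWNERS, "Foo")
print(result)
```

B and C each end up with 0.5 x 0.5 = 0.25 of Foo, which is the 25% shown in the effective-ownership figure.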

11. Graph traversal
[Diagram: Foo, Bar, A, B and C, with edge labels 33%, 34%, 50%, 33% and 50%.]

A -> Bar = 0.20481928
C -> Foo = 0.19879518
B -> Foo = 0.19879518

13. Graph traversal
3. A cyclic relationship needs to be converted
into a geometric series formula in order to be
calculated correctly.
[Diagram: Foo, Bar, A, B and C, with edge labels 33%, 34%, 50%, 33% and 50%.]
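
A worked sketch of the geometric series, in Python. The edge assignment is an assumption inferred from the edge labels in the figure: A and Bar each own 50% of Foo, while B, C and Foo own 33%, 33% and 34% of Bar. Under that assumption the series reproduces the values quoted earlier (B -> Foo = 0.19879518, A -> Bar = 0.20481928).

```python
# Assumed direct holdings (inferred from the figure, not stated in the text).
A_OWNS_FOO = 0.50
BAR_OWNS_FOO = 0.50
B_OWNS_BAR = 0.33
FOO_OWNS_BAR = 0.34

# One full trip around the cycle Foo -> Bar -> Foo scales a stake by this ratio.
loop = FOO_OWNS_BAR * BAR_OWNS_FOO  # 0.17

# B reaches Foo through Bar once, then again after every trip around the cycle:
# direct + direct*loop + direct*loop^2 + ...
direct = B_OWNS_BAR * BAR_OWNS_FOO
partial_sum = sum(direct * loop ** k for k in range(60))

# Closed form of the same geometric series: direct / (1 - loop).
b_in_foo = direct / (1 - loop)

# Same idea for A's indirect stake in Bar (A -> Foo -> Bar, then around the cycle).
a_in_bar = A_OWNS_FOO * FOO_OWNS_BAR / (1 - loop)

print(round(partial_sum, 8), round(b_in_foo, 8), round(a_in_bar, 8))
```

Every extra cycle in the graph adds another ratio to track, which is why this bookkeeping gets expensive on real data.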

14. Graph traversal
4. Lots of cycles in real data :\

5. Lots of tracking to be done.

15. Graph traversal
6. In our runs, it took more than a week
to calculate all the results.

16. Trial & Error #2
Graph traversal + The Matrix

17. Matrix: Revolutions
1. Model companies/shareholders' relationships

2. Feed into given equation

3. Calculate the inverse

4. Profit!

18. Equation

19. Equation
where
I is the identity matrix and A is the adjacency matrix
corresponding to the relationships between companies.

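$$
A =
\begin{array}{c|ccccc}
 & \text{A} & \text{C} & \text{B} & \text{Foo} & \text{Bar} \\
\hline
\text{A} & 0 & 0 & 0 & 0.5 & 0 \\
\text{C} & 0 & 0 & 0 & 0 & 0.5 \\
\text{B} & 0 & 0 & 0 & 0 & 0.5 \\
\text{Foo} & 0 & 0 & 0 & 0 & 0 \\
\text{Bar} & 0 & 0 & 0 & 0.5 & 0
\end{array}
$$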

21. Even works for cycles!
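
The equation itself was an image and is not part of this transcript. As a hedged sketch, one standard formulation consistent with the definitions of I and A is the sum of all path products, A + A^2 + A^3 + ..., which has the closed form (I - A)^-1 - I whenever the series converges. It may differ in detail from the equation in the talk or in the referenced paper. The helper names are illustrative, and the cyclic edge weights are inferred from the figures rather than stated in the text.

```python
import numpy as np

NAMES = ["A", "C", "B", "Foo", "Bar"]  # same order as the matrix above
INDEX = {name: i for i, name in enumerate(NAMES)}


def adjacency(edges):
    """Build A where A[owner, company] is the fraction held directly."""
    matrix = np.zeros((len(NAMES), len(NAMES)))
    for owner, company, fraction in edges:
        matrix[INDEX[owner], INDEX[company]] = fraction
    return matrix


def total_ownership(direct):
    """Direct plus indirect holdings: A + A^2 + ... = (I - A)^-1 - I."""
    identity = np.eye(direct.shape[0])
    return np.linalg.inv(identity - direct) - identity


# The acyclic example: exactly the matrix shown above.
acyclic = total_ownership(adjacency([
    ("A", "Foo", 0.5), ("Bar", "Foo", 0.5),
    ("B", "Bar", 0.5), ("C", "Bar", 0.5),
]))

# The cyclic example (assumed edges): Foo also owns 34% of Bar.
cyclic = total_ownership(adjacency([
    ("A", "Foo", 0.5), ("Bar", "Foo", 0.5),
    ("B", "Bar", 0.33), ("C", "Bar", 0.33), ("Foo", "Bar", 0.34),
]))

print(round(float(acyclic[INDEX["B"], INDEX["Foo"]]), 8))
print(round(float(cyclic[INDEX["B"], INDEX["Foo"]]), 8))
print(round(float(cyclic[INDEX["A"], INDEX["Bar"]]), 8))
```

The same function handles both graphs, so no special-casing of cycles is needed. For large systems, `np.linalg.solve` is usually preferred over forming the inverse explicitly.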

22. Optimizations

23. CPU Utilization
1. OpenBLAS - multicore NumPy
NumPy - makes use of BLAS
BLAS - low-level linear algebra routines

24. CPU Utilization
2. Multi-threading and multi-processing do not work for our use case.

• Multi-thread - no true parallelization.

• The bottleneck is CPU-bound, not I/O-bound.

• Multi-process - a single process is
already consuming lots of CPU, so adding
more means lots of context switching.

25. Memory usage
1. Iterative vs recursive algorithm

• Stack frame

• No support for tail-call optimization

2. del keyword in Python. Manual
management of object reference count.
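
A small sketch of the stack-frame point, written for Python 3. Each recursive call adds a frame, and because CPython does not eliminate tail calls, a long enough ownership chain exhausts the recursion limit while the equivalent loop does not. The function names and the chain data are made up for illustration.

```python
import sys


def chain_share_recursive(chain, i=0):
    """Multiply the fractions along a chain, one stack frame per link."""
    if i == len(chain):
        return 1.0
    return chain[i] * chain_share_recursive(chain, i + 1)


def chain_share_iterative(chain):
    """Same product with a loop: constant stack depth."""
    share = 1.0
    for fraction in chain:
        share *= fraction
    return share


# A hypothetical chain of 100% holdings, longer than the interpreter's limit.
chain = [1.0] * (sys.getrecursionlimit() + 100)

try:
    chain_share_recursive(chain)
    recursive_ok = True
except RuntimeError:  # RecursionError (Python 3.5+) is a subclass of RuntimeError
    recursive_ok = False

iterative_result = chain_share_iterative(chain)
print(recursive_ok, iterative_result)
```

Raising the limit with `sys.setrecursionlimit` only postpones the problem, since every frame still costs memory.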

26. Memory usage - del keyword
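
The code on this slide is not part of the transcript. As a minimal sketch of the idea, assuming CPython's reference counting: `del` removes a name, and once the last reference is gone the object is freed immediately rather than at the end of the function. The `Matrix` class is a hypothetical stand-in for a large intermediate result.

```python
import weakref


class Matrix:
    """Hypothetical stand-in for a large intermediate result."""

    def __init__(self, n):
        self.rows = [[0.0] * n for _ in range(n)]


def run():
    big = Matrix(200)
    probe = weakref.ref(big)  # lets us observe the object without keeping it alive
    cell_count = sum(len(row) for row in big.rows)
    del big  # drop the only strong reference before doing further work
    freed = probe() is None  # True on CPython, where the refcount hit zero
    return cell_count, freed


cell_count, freed = run()
print(cell_count, freed)
```

Other interpreters, such as PyPy, may free the object later, so the immediate release is a CPython behaviour rather than a language guarantee.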

27. Numpy quirks
  File "/usr/local/lib/python2.7/dist-packages/scipy/linalg/decomp_svd.py", line 103, in svd
    raise LinAlgError("SVD did not converge")
numpy.linalg.linalg.LinAlgError: SVD did not converge
• SVD does not converge.

• The Moore-Penrose pseudo-inverse makes use of
SVD. By definition, an SVD always exists.

• Numpy has a low iteration limit hard-coded into
its source code.

• It raises "SVD did not converge" if it fails to
converge within this iteration limit.

• Refer to the file dlapack_lite.c

28. Data structures
• Know when to use Set vs List

• Lookup: Set O(1) on average vs List O(n)

• Numpy matrix formats - sparse vs
dense
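
A quick illustration of the lookup difference using the standard library only. The sizes and variable names are arbitrary, chosen to mimic checking whether a node has already been visited.

```python
import timeit

node_ids = list(range(100000))
visited_list = node_ids
visited_set = set(node_ids)
target = node_ids[-1]  # worst case for the list: every element is compared

# The same membership test against each container, repeated 100 times.
list_seconds = timeit.timeit(lambda: target in visited_list, number=100)
set_seconds = timeit.timeit(lambda: target in visited_set, number=100)

print("list: %.6f s, set: %.6f s" % (list_seconds, set_seconds))
```

The list scan grows with the number of nodes, while the set lookup stays roughly flat. That matters when the check sits inside a traversal loop.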

29. Algorithm
1. Re-frame the problem

2. Matrix inversion is always hard and CPU
intensive

• If we can't invent an algorithm that does the
calculation in O(1), try to limit the n
• Because matrix inversion becomes
slower as n becomes larger

30. Algorithm - Limiting the n
• Reducing memory usage

• Reducing CPU utilization

A is an n x n matrix, where
n depends on the total number of companies/shareholders.
Assuming n is 50,000:
50,000 x 50,000 x 8 bytes = 20 GB (160 gigabits) of memory
usage just to hold the data in memory.
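
The arithmetic can be checked directly. The sketch below assumes a dense matrix of 8-byte floats; the split into 50 components of 1,000 nodes each is a made-up illustration of why limiting n helps, not a figure from the talk.

```python
BYTES_PER_FLOAT64 = 8


def dense_bytes(n):
    """Memory needed for a dense n x n matrix of 8-byte floats."""
    return n * n * BYTES_PER_FLOAT64


whole = dense_bytes(50000)
print(whole, "bytes =", whole / 10 ** 9, "GB =", whole * 8 / 10 ** 9, "gigabits")

# Hypothetical split: 50 independent components of 1,000 nodes, solved one at a time.
component_sizes = [1000] * 50
largest = max(dense_bytes(size) for size in component_sizes)
print(largest, "bytes for the largest component")
```

Because memory grows with n squared, many small matrices cost far less than a single large one covering the same number of nodes.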

31. Algorithm - Limiting the n
1. Find smallest connected components
2. Calculate on each component
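
A sketch of the first step, assuming "connected components" means groups of nodes linked by any chain of holdings, ignoring edge direction. The function name and the extra Baz/D/E records are hypothetical, and this is not necessarily how the talk implemented it.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group nodes linked by any chain of holdings (direction ignored)."""
    neighbours = defaultdict(set)
    for owner, company, _fraction in edges:
        neighbours[owner].add(company)
        neighbours[company].add(owner)

    seen = set()
    components = []
    for start in list(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components


EDGES = [
    ("A", "Foo", 0.5), ("Bar", "Foo", 0.5),
    ("B", "Bar", 0.5), ("C", "Bar", 0.5),
    ("D", "Baz", 0.6), ("E", "Baz", 0.4),  # hypothetical, unrelated group
]

components = connected_components(EDGES)
print(sorted(sorted(component) for component in components))
```

Each component can then be turned into its own, much smaller matrix and solved independently.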

32. Algorithm - Limiting the n
[Diagram: a graph split into regions.]

• Regions with lots of cyclic nodes: matrix approach.

• Elsewhere: use the normal approach for each
connected component, since linear multiplication
between nodes is trivial in CPU cost.
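
One way to decide which approach a component gets is to test it for cycles. The sketch below uses Kahn's topological sort and assumes that "normal approach" refers to plain traversal. The function names are illustrative and the talk does not say how the split was actually made.

```python
from collections import defaultdict, deque


def has_cycle(edges):
    """Kahn's algorithm: if some node is never freed, a cycle exists."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for owner, company, _fraction in edges:
        successors[owner].append(company)
        indegree[company] += 1
        nodes.update((owner, company))

    queue = deque(node for node in nodes if indegree[node] == 0)
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return processed < len(nodes)


def choose_approach(edges):
    """Pick the matrix method only where cycles make traversal awkward."""
    return "matrix" if has_cycle(edges) else "traversal"


LINEAR = [("A", "Foo", 0.5), ("Bar", "Foo", 0.5), ("B", "Bar", 0.5), ("C", "Bar", 0.5)]
CYCLIC = LINEAR + [("Foo", "Bar", 0.34)]  # hypothetical edge that closes a loop

print(choose_approach(LINEAR), choose_approach(CYCLIC))
```

This keeps the costly inversion for the components that need it and leaves the rest to cheap multiplication along paths.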

33. Result
Memory usage

~< 30Gb vs > 120Gb for our early trials & errors

Run time calculation

~< 3 hours vs > 1 week for graph traversal
approach

34. Reference
Dr. Ivan Keglević, Matrix Approach to the Calculation of
Indirect Quotas

http://web.math.pmf.unizg.hr/applmath99/245-251.pdf

35. We’re hiring!
Contact me at:

shulhi @ gmail, github, twitter, etc