Slide 1

Slide 1 text

CellIQ: Real-Time Cellular Network Analytics at Scale
Anand Iyer (UC Berkeley), Li Erran Li (Bell Labs), Ion Stoica (UC Berkeley)

Slide 2

Slide 2 text

Cellular networks have seen exponential growth and have become part of our lives

Slide 3

Slide 3 text

Image courtesy: Alcatel-Lucent

Slide 4

Slide 4 text

What is needed to solve these issues?
• Are some regions in the network hotspots?
  – Better load balancing
• How is user traffic moving in the network?
  – Better resource provisioning
• What are the popular handoff sequences?
  – Troubleshoot handoff-related problems

Slide 5

Slide 5 text

Cellular Network Analytics Today

Slide 8

Slide 8 text

Problem
Existing cellular network analytics systems do not support advanced analytics tasks efficiently.

Slide 9

Slide 9 text

Challenges
• High-velocity data
• Continuous monitoring
• Advanced tasks
• Timely spatio-temporal analysis

Slide 10

Slide 10 text

CellIQ is a cellular network analytics system that supports rich analysis tasks efficiently by leveraging domain-specific optimizations

Slide 11

Slide 11 text

Cellular Data as Time-Evolving Graphs
Tasks easily expressed in graphs:
• Hotspot computation → Connected components
• Handoff sequences & user traffic → Pregel model
[Figure: graph of base stations (BS1, BS2) and user equipment (UE1 to UE5), annotated with vertex properties and edge properties]

Slide 12

Slide 12 text

Why Not Use a Graph Parallel Framework?
[Figure: chart; labels not recoverable]
Fails to produce results!
Domain-specific optimizations are key for efficient analysis

Slide 13

Slide 13 text

CellIQ Implementation
• Implemented as a layer on GraphX*
• Incorporates several domain-specific optimizations
[Figure: software stack with components Spark, GraphX, Pregel API, PageRank, Connected Comp., K-core, Triangle Count, LDA, SVD++, and CellIQ]
*Gonzalez et al., “GraphX: Graph Processing in a Distributed Dataflow Framework”, OSDI 2014

Slide 14

Slide 14 text

Computational Model
[Figure: one graph snapshot over nodes BS1, BS2, UE1 to UE5]

Slide 15

Slide 15 text

Computational Model
[Figure: two graph snapshots, each over nodes BS1, BS2, UE1 to UE5]

Slide 16

Slide 16 text

Computational Model
[Figure: three graph snapshots, each over nodes BS1, BS2, UE1 to UE5]

Slide 17

Slide 17 text

Computational Model: GStreams
[Figure: three graph snapshots, each over nodes BS1, BS2, UE1 to UE5]
• Domain-specific graph partitioning
• Spatial operations
• Window operations

Slide 19

Slide 19 text

Graph Partitioning
Graph computation frameworks rely on partitioning to minimize communication and balance computation
[Figure: graph with vertices A to F partitioned across Machine 1 and Machine 2]

Slide 20

Slide 20 text

CellIQ Graph Partitioning
Partition geographically close-by entities
2D → 1D?
[Figure: vertices A to H to be placed on Machines 1 to 4]

Slide 21

Slide 21 text

Graph Partitioning
Random (hashed) partitioning
[Figure: vertices A to H spread across Machines 1 to 4]

Slide 22

Slide 22 text

Graph Partitioning
Random (hashed) partitioning results in poor spatial locality
[Figure: vertices A to H spread across Machines 1 to 4]

Slide 23

Slide 23 text

CellIQ Graph Partitioning
• Uses Hilbert space-filling curves
[Figure: vertices A to H assigned to Machines 1 to 4]

Slide 24

Slide 24 text

CellIQ Graph Partitioning
• Uses Hilbert space-filling curves
• Use the curve’s distance as the 1-dimensional key
[Figure: vertices A to H in cells numbered 0 to 3 along the curve, assigned to Machines 1 to 4]

Slide 25

Slide 25 text

CellIQ Graph Partitioning
• Uses Hilbert space-filling curves
• Use the curve’s distance as the 1-dimensional key
• Range-partition the key space
[Figure: vertices A to H in cells numbered 0 to 3 along the curve, assigned to Machines 1 to 4]

Slide 26

Slide 26 text

CellIQ Graph Partitioning
• Uses Hilbert space-filling curves
• Use the curve’s distance as the 1-dimensional key
• Range-partition the key space
[Figure: vertices A to H in cells numbered 0 to 15 along the curve, assigned to Machines 1 to 4]
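
The three steps above can be sketched in a few lines of Scala. This is only an illustration under stated assumptions, not CellIQ's partitioner: the object and method names (`HilbertPartitioning`, `xy2d`, `partitionOf`) are hypothetical, locations are assumed to be already snapped to an n-by-n grid with n a power of two, and the key space is split into equal-sized ranges.

```scala
object HilbertPartitioning {
  // Map grid cell (x, y) on an n-by-n grid (n a power of two) to its
  // distance along the Hilbert curve (standard iterative conversion).
  def xy2d(n: Int, x0: Int, y0: Int): Int = {
    var x = x0
    var y = y0
    var d = 0
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s * s * ((3 * rx) ^ ry)
      // Rotate/flip the quadrant so the sub-curve is oriented correctly.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x
          y = n - 1 - y
        }
        val t = x
        x = y
        y = t
      }
      s /= 2
    }
    d
  }

  // Range-partition the 1-D key space [0, n*n) into contiguous,
  // equal-sized ranges (equal sizing is an assumption of this sketch).
  def partitionOf(n: Int, x: Int, y: Int, numPartitions: Int): Int = {
    val cellsPerPartition = math.ceil(n.toDouble * n / numPartitions).toInt
    xy2d(n, x, y) / cellsPerPartition
  }
}
```

On a 4-by-4 grid with 4 partitions, each partition in this sketch receives one 2-by-2 block of cells, so geographically close entities land on the same machine.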

Slide 27

Slide 27 text

Computational Model: GStreams
[Figure: three graph snapshots, each over nodes BS1, BS2, UE1 to UE5]
• Domain-specific graph partitioning
• Spatial operations
• Window operations

Slide 28

Slide 28 text

GeoGraph API
class GeoGraph[V, E] {
  // Broadcast a message to all
  // vertices within a radius
  def sendMsg(radius)
  // Create a spatially aggregated
  // graph by combining vertices
  // and edges
  def spatialAG(reduceV: (V, V) => V,
                reduceE: (E, E) => E)
}

Slide 29

Slide 29 text

Tracking user traffic gradients
Goal: Detect and track the direction of movement of user groups

Slide 30

Slide 30 text

Tracking user traffic gradients
[Figure: graph of base stations A to F]

Slide 31

Slide 31 text

Tracking user traffic gradients
[Figure: graph of base stations A to F]

Slide 32

Slide 32 text

Tracking user traffic gradients
Hop-by-hop propagation
[Figure: graph of base stations A to F]

Slide 33

Slide 33 text

Tracking user traffic gradients
Hop-by-hop propagation is inefficient
[Figure: graph of base stations A to F]

Slide 34

Slide 34 text

Tracking user traffic gradients
Instead, CellIQ enables radius-based broadcast
[Figure: graph of base stations A to F]
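
To make the idea of a radius-based broadcast concrete, here is a minimal sketch of how the recipient set could be computed in one step rather than hop by hop. It is not the GeoGraph `sendMsg` implementation: `Station`, `recipients`, and the use of planar coordinates with Euclidean distance are all assumptions made for illustration.

```scala
object RadiusBroadcast {
  // Hypothetical base-station record with planar coordinates.
  final case class Station(id: String, x: Double, y: Double)

  // Recipients of a broadcast from `source`: every other station whose
  // Euclidean distance to the source is at most `radius`.
  def recipients(source: Station, radius: Double, all: Seq[Station]): Seq[String] =
    all.filter { s =>
      s.id != source.id &&
      math.hypot(s.x - source.x, s.y - source.y) <= radius
    }.map(_.id)
}
```

The recipient set depends only on positions, not on graph edges, so a message reaches every station in range without waiting for one superstep per hop.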

Slide 35

Slide 35 text

Routing Table in GraphX enables Multicast
[Figure: GraphX graph representation over vertices A to F: Vertex Table (RDD), Routing Table (RDD), and Edge Table (RDD), split into Part. 1 and Part. 2 across Machine 1 and Machine 2]
Slide courtesy: Joey Gonzalez

Slide 36

Slide 36 text

Can compute destination partitions easily due to the use of the geo-partitioner
[Figure: GraphX graph representation over vertices A to F: Vertex Table (RDD), Routing Table (RDD), and Edge Table (RDD), split into Part. 1 and Part. 2 across Machine 1 and Machine 2]
Slide courtesy: Joey Gonzalez

Slide 37

Slide 37 text

GeoGraph API
class GeoGraph[V, E] {
  // Broadcast a message to all
  // vertices within a radius
  def sendMsg(radius)
  // Create a spatially aggregated
  // graph by combining vertices
  // and edges
  def spatialAG(reduceV: (V, V) => V,
                reduceE: (E, E) => E)
}

Slide 38

Slide 38 text

Spatial Clustering
Goal: Combine spatially close-by vertices
[Figure: original graph over vertices A to F and a spatially aggregated graph over vertices B’, D, E, F]

Slide 39

Slide 39 text

Spatial Clustering
Two ways to enable spatial aggregation:
– Using a (supplied) field in properties
– Leveraging the geo-partitioner
[Figure: grid of cells labeled 00 to 33]

Slide 40

Slide 40 text

Spatial Clustering
Two ways to enable spatial aggregation:
– Using a (supplied) field in properties
– Leveraging the geo-partitioner
[Figure: grid of cells labeled 00 to 33, grouped into coarser cells labeled 0 to 3]
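
The first option (aggregating on a supplied field) can be sketched on plain Scala collections. This is an illustration only, not the `spatialAG` implementation: the graph is modeled as two maps, `cellOf` stands in for the supplied field, and dropping edges that fall inside a single cell is an assumption of the sketch.

```scala
object SpatialAggregation {
  // Merge all vertices that share a cell key, then merge parallel edges.
  // `cellOf` maps a vertex id to its cell key (the "supplied field").
  def spatialAG[V, E](
      vertices: Map[String, V],
      edges: Map[(String, String), E],
      cellOf: String => String,
      reduceV: (V, V) => V,
      reduceE: (E, E) => E
  ): (Map[String, V], Map[(String, String), E]) = {
    val newVertices = vertices.toSeq
      .groupBy { case (id, _) => cellOf(id) }
      .map { case (cell, vs) => cell -> vs.map(_._2).reduce(reduceV) }
    val newEdges = edges.toSeq
      .map { case ((src, dst), e) => (cellOf(src), cellOf(dst)) -> e }
      .filter { case ((src, dst), _) => src != dst } // drop edges inside one cell
      .groupBy(_._1)
      .map { case (key, es) => key -> es.map(_._2).reduce(reduceE) }
    (newVertices, newEdges)
  }
}
```

For the second option, the cell key could instead be derived from the partitioner's curve distance, since cells that share a coarser key are spatially adjacent.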

Slide 41

Slide 41 text

Computational Model: GStreams
[Figure: three graph snapshots, each over nodes BS1, BS2, UE1 to UE5]
• Domain-specific graph partitioning
• Spatial operations
• Window operations

Slide 42

Slide 42 text

Tracking Persistent Hotspots
Goal: Detect and track groups of base stations with high traffic volume
Equivalent to finding connected components
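
The equivalence with connected components can be illustrated with a small union-find sketch. It makes several assumptions not stated on the slide: "high traffic volume" is modeled as a load at or above a threshold, neighbor links are given as pairs of base-station ids, and the names `Hotspots` and `hotspots` are hypothetical.

```scala
object Hotspots {
  // Group "hot" base stations (load >= threshold) into connected components,
  // using only neighbor links whose two endpoints are both hot.
  def hotspots(
      load: Map[String, Double],
      neighbors: Seq[(String, String)],
      threshold: Double
  ): Set[Set[String]] = {
    val hot = load.collect { case (bs, l) if l >= threshold => bs }.toSet
    val parent = scala.collection.mutable.Map[String, String]()
    hot.foreach(bs => parent(bs) = bs)

    // Union-find: follow parent pointers to the representative.
    def find(bs: String): String = {
      var root = bs
      while (parent(root) != root) root = parent(root)
      root
    }

    // Union the components of every link between two hot stations.
    for ((a, b) <- neighbors if hot(a) && hot(b)) parent(find(a)) = find(b)

    hot.groupBy(find).values.toSet
  }
}
```

Each returned set is one hotspot: a group of adjacent base stations that are all above the threshold.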

Slide 43

Slide 43 text

Tracking Persistent Hotspots
[Figure: per-batch graphs over BS1, BS2, BS3 at times t1, t2, t3 within window W]
Combining graphs at the end of the window results in many join operations (inefficient)

Slide 44

Slide 44 text

Tracking Persistent Hotspots
[Figure: per-batch graphs at times t1, t2, t3 within window W, and a cumulative graph over BS1, BS2, BS3 with count labels 1 1 1, then 2 1 1, then 3 1 1]
Apply incremental updates to a cumulative graph

Slide 45

Slide 45 text

Tracking Persistent Hotspots
[Figure: per-batch graphs at times t1 to t4, and a cumulative graph over BS1, BS2, BS3 with count labels 1 1 1; 1 2 1; 1 3 1; 1 2 0]
Apply differential updates to a cumulative graph
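
A minimal sketch of incremental and differential updates, assuming the cumulative graph simply counts in how many batches of the window each base station appeared. The names (`WindowedCounts`, `add`, `subtract`, `slide`) and the count-based representation are assumptions for illustration, not CellIQ's data structures.

```scala
object WindowedCounts {
  type Counts = Map[String, Int]

  // Incremental update: fold a newly arrived batch into the cumulative counts.
  def add(cumulative: Counts, batch: Set[String]): Counts =
    batch.foldLeft(cumulative) { (acc, bs) =>
      acc.updated(bs, acc.getOrElse(bs, 0) + 1)
    }

  // Differential update: remove the batch that just left the window.
  def subtract(cumulative: Counts, batch: Set[String]): Counts =
    batch.foldLeft(cumulative) { (acc, bs) =>
      acc.updated(bs, acc.getOrElse(bs, 0) - 1)
    }

  // Slide the window by one batch without recombining every batch in it.
  def slide(cumulative: Counts, entering: Set[String], leaving: Set[String]): Counts =
    subtract(add(cumulative, entering), leaving)
}
```

Sliding costs work proportional to the two batches that changed, rather than to the whole window, which is why the benefit grows with window size.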

Slide 46

Slide 46 text

GStream API
class GStream[V, E] {
  def graphReduceByWindow(
    reduceFunc(Graph[V, E], Graph[V, E],
               fv: (V, V) => V,
               fe: (E, E) => E): Graph[V, E],
    invReduceFunc(Graph[V, E], Graph[V, E],
                  fv: (V, V) => V,
                  fe: (E, E) => E): Graph[V, E],
    windowDuration, slideDuration)
}

Slide 47

Slide 47 text

graphReduceByWindow
• Implemented using Spark’s CoGroupedRDD
• Two default reduce functions: graph intersection and union
• Further optimizations:
  – Co-partition graphs from multiple batches
  – Reuse indices and routing tables for graphs in the same window
More details in the paper!
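
To show what the two default reduce functions might compute, here is a sketch of graph union and intersection over edge maps, with `fe` merging the properties of an edge present in both graphs. It uses plain Scala maps rather than GraphX graphs or CoGroupedRDD, ignores vertex properties, and all names (`GraphReduce`, `Edges`, `reduceWindow`) are hypothetical.

```scala
object GraphReduce {
  // A graph reduced to its edge set: (source, destination) -> edge property.
  type Edges[E] = Map[(String, String), E]

  // Union: keep every edge from either graph; merge properties with `fe`
  // when the edge appears in both.
  def union[E](a: Edges[E], b: Edges[E], fe: (E, E) => E): Edges[E] =
    b.foldLeft(a) { case (acc, (k, e)) =>
      acc.updated(k, acc.get(k).map(fe(_, e)).getOrElse(e))
    }

  // Intersection: keep only edges present in both graphs.
  def intersection[E](a: Edges[E], b: Edges[E], fe: (E, E) => E): Edges[E] =
    a.collect { case (k, e) if b.contains(k) => k -> fe(e, b(k)) }

  // Reduce all per-batch graphs of one window with the chosen function.
  // Throws on an empty window, like any reduce over an empty sequence.
  def reduceWindow[E](batches: Seq[Edges[E]],
                      f: (Edges[E], Edges[E]) => Edges[E]): Edges[E] =
    batches.reduce(f)
}
```

Intuitively, intersection keeps only the edges that persist across every batch in the window, while union accumulates everything seen in it.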

Slide 48

Slide 48 text

How does CellIQ perform?

Slide 49

Slide 49 text

Evaluation Setup
• LTE control plane data from a major cellular network operator
• 1 million+ subscribers, live network
• 2 TB data from 1 week
  – 1 file per minute, 750k records, 100s of fields/line
  – 10 collection points, 10 hours per day
• Implemented several analysis tasks

Slide 50

Slide 50 text

Benefits of Geo-partitioning
[Figure: chart; labels not recoverable]

Slide 51

Slide 51 text

Benefits of Geo-partitioning
[Figure: chart; labels not recoverable]
• Small amount of data: movement not noticeable
• Default partitioner fails to produce results

Slide 52

Slide 52 text

Benefits of Incremental Updates
[Figure: chart; labels not recoverable]

Slide 53

Slide 53 text

Benefits of Incremental Updates
[Figure: chart; labels not recoverable]
2x to 5x improvements

Slide 54

Slide 54 text

Benefits of Incremental Updates
[Figure: chart; labels not recoverable]
Window size affects performance

Slide 55

Slide 55 text

Benefits of Differential Updates
[Figure: chart; labels not recoverable]

Slide 56

Slide 56 text

Benefits of Differential Updates
[Figure: chart; labels not recoverable]
• Larger windows see bigger benefits
• Graceful degradation in performance

Slide 57

Slide 57 text

Benefits of Radius-based Broadcast
[Figure: chart; labels not recoverable]

Slide 58

Slide 58 text

Benefits of Radius-based Broadcast
[Figure: chart; labels not recoverable]
Larger datasets result in more message exchanges per hop

Slide 59

Slide 59 text

CellIQ is a cellular network analytics system that uses domain-specific optimizations to achieve 2x to 5x improvements

Slide 60

Slide 60 text

CellIQ is a cellular network analytics system that uses domain-specific optimizations to achieve 2x to 5x improvements
Ongoing Work:
• Using techniques in CellIQ to perform root-cause analysis on operational LTE networks
• Generalized streaming graph analysis techniques