
CellIQ: Real-Time Cellular Network Analytics at Scale

Presented at NSDI 2015

Anand Iyer

May 05, 2015

Transcript

  1. CellIQ: Real-Time Cellular Network Analytics at Scale
    Anand Iyer (UC Berkeley), Li Erran Li (Bell Labs), Ion Stoica (UC Berkeley)
  2. What is needed to solve these issues?
    - Are some regions in the network hotspots?
        - Better load balancing
    - How is user traffic moving in the network?
        - Better resource provisioning
    - What are the popular handoff sequences?
        - Troubleshoot handoff-related problems
  3. CellIQ is a cellular network analytics system that supports rich analysis tasks efficiently by leveraging domain-specific optimizations.
  4. Cellular Data as Time-Evolving Graphs
    Tasks easily expressed in graphs:
    - Hotspot computation → connected components
    - Handoff sequences and user traffic → Pregel model
    [Figure: graph of base stations (BS1, BS2) and user equipment (UE1 to UE5), annotated with an edge property and a vertex property]
  5. Why Not Use a Graph-Parallel Framework?
    [Figure: performance chart; labels not recoverable]
    Fails to produce results! Domain-specific optimizations are key for efficient analysis.
  6. CellIQ Implementation
    - Implemented as a layer on GraphX*
    - Incorporates several domain-specific optimizations
    [Figure: software stack with Spark, GraphX, the Pregel API, GraphX libraries (PageRank, Connected Comp., K-core, Triangle Count, LDA, SVD++), and CellIQ]
    *Gonzalez et al., "GraphX: Graph Processing in a Distributed Dataflow Framework", OSDI 2014
  7. Computational Model
    [Figure: three snapshots of a graph over base stations BS1, BS2 and user equipment UE1 to UE5]
  8. Computational Model: GStreams
    [Figure: a stream of graph snapshots over BS1, BS2 and UE1 to UE5]
    - Domain-specific graph partitioning
    - Spatial operations
    - Window operations
  10. Graph computation frameworks rely on partitioning to minimize communication and balance computation
    [Figure: Graph Partitioning; a graph with vertices A to F split across Machine 1 and Machine 2]
  11. Partition geographically close-by entities
    [Figure: CellIQ Graph Partitioning; vertices A to H across Machines 1 to 4, annotated "2D → 1D ?"]
  12. Random (hashed) partitioning
    [Figure: Graph Partitioning; vertices A to H assigned to Machines 1 to 4]
  13. Random (hashed) partitioning results in poor spatial locality
    [Figure: Graph Partitioning; vertices A to H assigned to Machines 1 to 4]
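To make the locality argument concrete, here is a small plain-Scala sketch (no Spark or GraphX dependency) that counts cut edges, i.e. edges whose endpoints are assigned to different machines, under two partitioners. The toy graph, the 2-machine setup, and every name (`EdgeCutDemo`, `hashPart`, `spatialPart`) are hypothetical illustrations, not CellIQ or GraphX code.

```scala
// Toy model: count edges whose endpoints end up on different machines.
// Hypothetical names; this is not GraphX's PartitionStrategy API.
object EdgeCutDemo {
  // A vertex with a hypothetical 1-D position (e.g. distance along a road).
  final case class Vertex(id: Int, x: Double)

  // Every cut edge implies cross-machine communication during graph computation.
  def cutEdges(edges: Seq[(Int, Int)], part: Int => Int): Int =
    edges.count { case (src, dst) => part(src) != part(dst) }

  // Hash partitioning: assigns by vertex-id hash, ignoring geography.
  def hashPart(machines: Int): Int => Int =
    id => Math.floorMod(id.hashCode, machines)

  // Spatial partitioning: contiguous position ranges map to the same machine.
  def spatialPart(vs: Map[Int, Vertex], machines: Int, maxX: Double): Int => Int =
    id => math.min((vs(id).x / maxX * machines).toInt, machines - 1)
}
```

For eight vertices on a line at x = 0 to 7 with edges between consecutive vertices, `hashPart(2)` alternates machines and cuts all 7 edges, while `spatialPart` with `maxX = 8.0` keeps each half together and cuts only 1.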
  14. CellIQ Graph Partitioning
    - Uses Hilbert space-filling curves
    [Figure: vertices A to H across Machines 1 to 4]
  15. CellIQ Graph Partitioning
    - Uses Hilbert space-filling curves
    - Uses the curve's distance as the 1-dimensional key
    [Figure: grid cells numbered 0 to 3 along a first-order Hilbert curve; vertices A to H across Machines 1 to 4]
  16. CellIQ Graph Partitioning
    - Uses Hilbert space-filling curves
    - Uses the curve's distance as the 1-dimensional key
    - Range-partitions the key space
    [Figure: grid cells numbered 0 to 3 along a first-order Hilbert curve; vertices A to H assigned to Machines 1 to 4]
  17. CellIQ Graph Partitioning
    - Uses Hilbert space-filling curves
    - Uses the curve's distance as the 1-dimensional key
    - Range-partitions the key space
    [Figure: grid cells numbered 0 to 15 along a second-order Hilbert curve; vertices A to H assigned to Machines 1 to 4]
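The three steps on these slides (map each location to a Hilbert-curve distance, use that distance as a 1-D key, range-partition the keys) can be sketched in plain Scala as below. `xy2d` is the standard iterative Hilbert-index computation for an n x n grid; `toCell` (quantizing longitude/latitude into grid cells) and `partitionOf` (equal-width key ranges) are hypothetical helpers, and none of this is CellIQ's actual partitioner.

```scala
// Sketch: Hilbert-curve key per grid cell, then range partitioning of the keys.
object HilbertPartitioner {
  // Distance d along the Hilbert curve of cell (x, y) in an n x n grid
  // (n a power of two). Classic iterative form: one quadrant digit per level.
  def xy2d(n: Int, x0: Int, y0: Int): Int = {
    require(n > 0 && (n & (n - 1)) == 0, "n must be a power of two")
    var x = x0
    var y = y0
    var d = 0
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s * s * ((3 * rx) ^ ry)
      // Rotate/flip so the sub-quadrant's curve has the canonical orientation.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x
          y = n - 1 - y
        }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }

  // Hypothetical helper: quantize (lon, lat) into an n x n grid cell.
  def toCell(lon: Double, lat: Double, n: Int): (Int, Int) = {
    def q(v: Double, lo: Double, hi: Double): Int =
      math.min(((v - lo) / (hi - lo) * n).toInt, n - 1)
    (q(lon, -180.0, 180.0), q(lat, -90.0, 90.0))
  }

  // Range partitioning: split the key space [0, n * n) into equal contiguous ranges.
  def partitionOf(d: Int, n: Int, numPartitions: Int): Int =
    (d.toLong * numPartitions / (n.toLong * n)).toInt
}
```

Because a Hilbert curve finishes each quadrant before moving to the next, each contiguous key range covers a compact region: with n = 4 and 4 partitions, cells (0,0), (1,0), (1,1), (0,1) get keys 0 to 3 and all land in partition 0. In a Spark deployment, `org.apache.spark.RangePartitioner` is the built-in counterpart for range-partitioning keys; how CellIQ plugs its partitioner into GraphX is not covered by the slides.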
  19. GeoGraph API
        class GeoGraph[V, E] {
          // Broadcast a message to all
          // vertices within a radius
          def sendMsg(radius)
          // Create a spatially aggregated
          // graph by combining vertices
          // and edges
          def spatialAG(reduceV: (V, V) => V,
                        reduceE: (E, E) => E)
        }
  20. Tracking user traffic gradients
    [Figure: graph with vertices A to F; legend: Base Station]
  21. Tracking user traffic gradients
    [Figure: graph with vertices A to F]
  22. Tracking user traffic gradients
    - Hop-by-hop propagation
    [Figure: graph with vertices A to F]
  23. Tracking user traffic gradients
    - Hop-by-hop propagation is inefficient
    [Figure: graph with vertices A to F]
  24. Tracking user traffic gradients
    - Instead, CellIQ enables radius-based broadcast
    [Figure: graph with vertices A to F]
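The difference between hop-by-hop propagation and radius-based broadcast can be sketched as follows: with radius-based broadcast the recipients are found directly from coordinates in one step, whereas hop-by-hop propagation needs one superstep per hop. This plain-Scala model only counts supersteps on a toy graph; the names and the Euclidean-distance criterion are assumptions, and it does not model CellIQ's `sendMsg` or GraphX messaging.

```scala
// Toy comparison of radius-based broadcast vs. hop-by-hop propagation.
object RadiusBroadcast {
  final case class Pos(x: Double, y: Double)

  def dist(a: Pos, b: Pos): Double = math.hypot(a.x - b.x, a.y - b.y)

  // Radius-based: recipients follow directly from coordinates (a single step).
  def withinRadius(pos: Map[Int, Pos], src: Int, radius: Double): Set[Int] =
    pos.collect { case (id, p) if id != src && dist(pos(src), p) <= radius => id }.toSet

  // Hop-by-hop: the message spreads along edges one superstep at a time.
  // Returns the number of supersteps until all targets are reached,
  // or None if some target is unreachable.
  def hopRounds(adj: Map[Int, Set[Int]], src: Int, targets: Set[Int]): Option[Int] = {
    var frontier = Set(src)
    var seen = Set(src)
    var rounds = 0
    while (!targets.subsetOf(seen)) {
      val next = frontier.flatMap(v => adj.getOrElse(v, Set.empty[Int])) -- seen
      if (next.isEmpty) return None
      seen ++= next
      frontier = next
      rounds += 1
    }
    Some(rounds)
  }
}
```

On a path of five vertices spaced one unit apart, `withinRadius(pos, 0, 3.0)` returns vertices 1, 2 and 3, which hop-by-hop propagation from vertex 0 reaches only after 3 supersteps.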
  25. Routing Table in GraphX enables Multicast
    [Figure: GraphX's distributed graph layout across Machine 1 and Machine 2: a Vertex Table (RDD) split into Part. 1 and Part. 2, an Edge Table (RDD), and a Routing Table (RDD) mapping vertices A to F to partitions 1 and 2]
    Slide courtesy: Joey Gonzalez
  26. Routing Table in GraphX enables Multicast
    [Figure: GraphX's distributed graph layout across Machine 1 and Machine 2: a Vertex Table (RDD) split into Part. 1 and Part. 2, an Edge Table (RDD), and a Routing Table (RDD) mapping vertices A to F to partitions 1 and 2]
    Slide courtesy: Joey Gonzalez
    Destination partitions can be computed easily because CellIQ uses the geo-partitioner.
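One way to read "destination partitions can be computed easily": if vertices are range-partitioned by Hilbert key, the partitions a radius broadcast must reach follow from the grid cells around the sender, without a per-vertex routing-table lookup. The sketch below assumes a hypothetical scheme (locations quantized to an n x n grid, keyed by Hilbert distance, split into equal key ranges) and conservatively uses the square of cells enclosing the circle; none of these names come from CellIQ.

```scala
// Sketch: which range partitions must a radius broadcast reach?
object DestinationPartitions {
  // Standard iterative Hilbert index of cell (x, y) in an n x n grid (n a power of two).
  def xy2d(n: Int, x0: Int, y0: Int): Int = {
    var x = x0; var y = y0; var d = 0; var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s * s * ((3 * rx) ^ ry)
      if (ry == 0) {
        if (rx == 1) { x = n - 1 - x; y = n - 1 - y }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }

  // Equal-width ranges over the key space [0, n * n).
  def partitionOf(d: Int, n: Int, numPartitions: Int): Int =
    (d.toLong * numPartitions / (n.toLong * n)).toInt

  // Partitions owning any cell within r cells of the sender's cell (cx, cy):
  // a square neighborhood clipped to the grid, which over-approximates a circle.
  def destinations(cx: Int, cy: Int, r: Int, n: Int, numPartitions: Int): Set[Int] =
    (for {
      x <- math.max(0, cx - r) to math.min(n - 1, cx + r)
      y <- math.max(0, cy - r) to math.min(n - 1, cy + r)
    } yield partitionOf(xy2d(n, x, y), n, numPartitions)).toSet
}
```

A sender in a corner cell with r = 1 only needs its own partition, while a sender near the grid center with the same radius may need all of them; the message is multicast to exactly that set.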
  28. Spatial Clustering
    Goal: Combine spatially close-by vertices
    [Figure: example graph with vertices A to F and a clustered version containing a merged vertex B']
  29. Spatial Clustering
    Two ways to enable spatial aggregation:
    - Using a (supplied) field in properties
    - Leveraging the geo-partitioner
    [Figure: 16 cells labeled 00, 01, 02, 03, 10, 13, 12, 11, 20, 23, 22, 21, 32, 33, 30, 31]
  30. Spatial Clustering
    Two ways to enable spatial aggregation:
    - Using a (supplied) field in properties
    - Leveraging the geo-partitioner
    [Figure: 16 cells labeled 00, 01, 02, 03, 10, 13, 12, 11, 20, 23, 22, 21, 32, 33, 30, 31, grouped into coarser cells labeled 0, 3, 2, 1]
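Both aggregation routes on the slide can be sketched in plain Scala. The geo-partitioner route assumes, as the two-digit cell labels suggest, that each cell's key is a base-4 number whose leading digits identify its enclosing coarser cell, so dropping the last digit merges four sibling cells; the property-field route groups by a caller-supplied function. `reduceV` plays the same role as in `spatialAG`, but `byKeyPrefix` and `byField` are hypothetical names, not CellIQ's implementation.

```scala
// Sketch of spatial aggregation; hypothetical names, reduceV as in spatialAG.
object SpatialAggregation {
  // Geo-partitioner route: keys are assumed to be base-4 cell indices whose
  // leading digits name the enclosing coarser cell. Integer division by
  // 4^levels drops `levels` trailing digits, merging sibling cells.
  def byKeyPrefix[V](vertices: Seq[(Int, V)], levels: Int, reduceV: (V, V) => V): Map[Int, V] = {
    val factor = 1 << (2 * levels) // 4^levels
    vertices
      .groupBy { case (key, _) => key / factor }
      .map { case (parent, group) => parent -> group.map(_._2).reduce(reduceV) }
  }

  // Property-field route: group by a caller-supplied field of the vertex property.
  def byField[V, K](vertices: Seq[V], field: V => K, reduceV: (V, V) => V): Map[K, V] =
    vertices.groupBy(field).map { case (k, group) => k -> group.reduce(reduceV) }
}
```

For example, sixteen cells keyed 0 to 15 with a traffic count of 1 each collapse, with `levels = 1`, into four coarse cells with a count of 4 each.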
  32. Tracking Persistent Hotspots
    Goal: Detect and track groups of base stations with high traffic volume
    Equivalent to finding connected components
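A minimal single-machine sketch of this hotspot definition: keep base stations whose traffic meets a threshold, then take the connected components of the neighbor graph restricted to them. It uses a union-find (disjoint-set) structure for brevity; GraphX's built-in `connectedComponents` instead propagates minimum vertex ids with Pregel. The threshold, the neighbor list, and all names are assumptions for illustration.

```scala
// Sketch: hotspots = connected components of high-traffic base stations.
object Hotspots {
  // Union-find with path halving.
  final class UnionFind(n: Int) {
    private val parent = Array.tabulate(n)(i => i)
    def find(i: Int): Int = {
      var x = i
      while (parent(x) != x) { parent(x) = parent(parent(x)); x = parent(x) }
      x
    }
    def union(a: Int, b: Int): Unit = {
      val ra = find(a)
      val rb = find(b)
      if (ra != rb) parent(ra) = rb
    }
  }

  // traffic(i): traffic volume of base station i; neighbors: adjacent station pairs.
  def hotspots(traffic: IndexedSeq[Double], neighbors: Seq[(Int, Int)], threshold: Double): Set[Set[Int]] = {
    val hot = traffic.indices.filter(i => traffic(i) >= threshold).toSet
    val uf = new UnionFind(traffic.length)
    // Only link two stations if both are hot.
    for ((a, b) <- neighbors if hot(a) && hot(b)) uf.union(a, b)
    hot.groupBy(i => uf.find(i)).values.toSet
  }
}
```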
  33. Tracking Persistent Hotspots
    [Figure: graphs over BS1, BS2, BS3 at times t1, t2, t3 within a window W]
    Combining the graphs at the end of the window results in many join operations (inefficient)
  34. Tracking Persistent Hotspots
    [Figure: graphs over BS1, BS2, BS3 at t1, t2, t3 within window W; a cumulative graph over BS1, BS2, BS3 is updated at each step, with values 1 1 1, then 2 1 1, then 3 1 1]
    Apply incremental updates to a cumulative graph
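The contrast between joining all graphs at the end of the window and folding each batch into a cumulative graph as it arrives can be sketched with graphs reduced to property maps. `addBatch` merges one batch into the running graph using vertex and edge combiners (`fv`, `fe`); the representation and names are hypothetical, and per-vertex counts are just an example property.

```scala
// Sketch: fold each batch into a cumulative graph as it arrives, instead of
// joining every graph in the window at the end. Hypothetical representation.
object IncrementalWindow {
  // A snapshot graph reduced to vertex and edge property maps.
  final case class G[V, E](vertices: Map[String, V], edges: Map[(String, String), E])

  // Merge b into a, combining properties of keys present in both with f.
  private def mergeMaps[K, A](a: Map[K, A], b: Map[K, A], f: (A, A) => A): Map[K, A] =
    b.foldLeft(a) { case (acc, (k, v)) =>
      acc.updated(k, acc.get(k).map(old => f(old, v)).getOrElse(v))
    }

  // One incremental update: cumulative graph combined with the new batch.
  def addBatch[V, E](cum: G[V, E], batch: G[V, E], fv: (V, V) => V, fe: (E, E) => E): G[V, E] =
    G(mergeMaps(cum.vertices, batch.vertices, fv), mergeMaps(cum.edges, batch.edges, fe))
}
```

Each arriving batch costs one merge against the cumulative graph, so no large multi-way join is left for the end of the window.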
  35. Tracking Persistent Hotspots
    [Figure: graphs over BS1, BS2, BS3 at t1 to t4; as the window slides, the cumulative graph over BS1, BS2, BS3 shows values 1 1 1, 1 2 1, 1 3 1, and 1 2 0]
    Apply differential updates to a cumulative graph
  36. GStream API
        class GStream[V, E] {
          def graphReduceByWindow(
              reduceFunc(Graph[V, E], Graph[V, E],
                         fv: (V, V) => V,
                         fe: (E, E) => E): Graph[V, E],
              invReduceFunc(Graph[V, E], Graph[V, E],
                            fv: (V, V) => V,
                            fe: (E, E) => E): Graph[V, E],
              windowDuration, slideDuration)
        }
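The `invReduceFunc` parameter enables differential updates: when the window slides, the batch entering the window is added with `reduceFunc` and the batch leaving it is subtracted with `invReduceFunc`, instead of recomputing the whole window. Spark Streaming's `reduceByKeyAndWindow` with an inverse function follows the same idea for key-value streams. The sketch below applies it to per-vertex counts in plain Scala, assuming a slide of one batch; the names are hypothetical and edges are omitted for brevity.

```scala
// Sketch of a differential (inverse-reduce) sliding window over per-vertex counts.
object DifferentialWindow {
  type Counts = Map[String, Int]

  // reduceFunc: add a batch's per-vertex counts to the window aggregate.
  def reduce(acc: Counts, batch: Counts): Counts =
    batch.foldLeft(acc) { case (m, (k, v)) => m.updated(k, m.getOrElse(k, 0) + v) }

  // invReduceFunc: subtract the batch leaving the window; drop entries that reach zero.
  def invReduce(acc: Counts, old: Counts): Counts =
    old.foldLeft(acc) { case (m, (k, v)) =>
      val left = m.getOrElse(k, 0) - v
      if (left == 0) m - k else m.updated(k, left)
    }

  // Window of `window` batches sliding by one batch: each step applies one reduce
  // (new batch) and, once the window is full, one invReduce (oldest batch).
  def slide(batches: IndexedSeq[Counts], window: Int): IndexedSeq[Counts] =
    batches.indices.scanLeft(Map.empty[String, Int]) { (acc, i) =>
      val added = reduce(acc, batches(i))
      if (i >= window) invReduce(added, batches(i - window)) else added
    }.tail
}
```

Each windowed result matches a from-scratch recomputation over the last `window` batches, but each step touches only two batches, which is why larger windows benefit more.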
  37. graphReduceByWindow
    - Implemented using Spark's CoGroupedRDD
    - Two default reduce functions: graph intersection and union
    - Further optimizations:
        - Co-partition graphs from multiple batches
        - Reuse indices and routing tables for graphs in the same window
    More details in the paper!
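The two default reduce functions can be sketched on graphs represented as vertex and edge property maps: union keeps every element seen in either graph and combines shared ones with `fv`/`fe`, while intersection keeps only elements present in both. This plain-Scala version ignores the co-partitioning and index reuse listed on the slide, and `GraphSetOps` is a hypothetical name.

```scala
// Sketch: graph union and intersection over property maps (hypothetical names).
object GraphSetOps {
  final case class G[V, E](vertices: Map[Long, V], edges: Map[(Long, Long), E])

  // Union: every vertex/edge from either graph; shared ones combined with f.
  def union[V, E](a: G[V, E], b: G[V, E], fv: (V, V) => V, fe: (E, E) => E): G[V, E] = {
    def merge[K, X](x: Map[K, X], y: Map[K, X], f: (X, X) => X): Map[K, X] =
      (x.keySet ++ y.keySet).iterator.map { k =>
        val value = (x.get(k), y.get(k)) match {
          case (Some(p), Some(q)) => f(p, q)
          case (Some(p), None)    => p
          case (None, Some(q))    => q
          case (None, None)       => sys.error("unreachable: key came from one of the maps")
        }
        k -> value
      }.toMap
    G(merge(a.vertices, b.vertices, fv), merge(a.edges, b.edges, fe))
  }

  // Intersection: only vertices/edges present in both graphs, combined with f.
  def intersection[V, E](a: G[V, E], b: G[V, E], fv: (V, V) => V, fe: (E, E) => E): G[V, E] = {
    def meet[K, X](x: Map[K, X], y: Map[K, X], f: (X, X) => X): Map[K, X] =
      x.collect { case (k, p) if y.contains(k) => k -> f(p, y(k)) }
    G(meet(a.vertices, b.vertices, fv), meet(a.edges, b.edges, fe))
  }
}
```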
  38. Evaluation Setup
    - LTE control-plane data from a major cellular network operator
    - 1 million+ subscribers, live network
    - 2 TB of data from 1 week
        - 1 file per minute, 750k records, 100s of fields per line
        - 10 collection points, 10 hours per day
    - Implemented several analysis tasks
  39. Benefits of Geo-partitioning
    [Figure: chart; labels not recoverable]
  40. Benefits of Geo-partitioning
    [Figure: chart; labels not recoverable]
    - Small amount of data: data movement not noticeable
    - Default partitioner fails to produce results
  41. Benefits of Incremental Updates
    [Figure: chart; labels not recoverable]
  42. Benefits of Incremental Updates
    [Figure: chart; labels not recoverable]
    - 2x to 5x improvements
  43. Benefits of Incremental Updates
    [Figure: chart; labels not recoverable]
    - Window size affects performance
  44. Benefits of Differential Updates
    [Figure: chart; labels not recoverable]
  45. Benefits of Differential Updates
    [Figure: chart; labels not recoverable]
    - Larger windows see bigger benefits
    - Graceful degradation in performance
  46. Benefits of Radius-based Broadcast
    [Figure: chart; labels not recoverable]
  47. Benefits of Radius-based Broadcast
    [Figure: chart; labels not recoverable]
    - Larger datasets increase the number of messages exchanged per hop
  48. CellIQ is a cellular network analytics system that uses domain-specific optimizations to achieve 2x to 5x improvements.
  49. CellIQ is a cellular network analytics system that uses domain-specific optimizations to achieve 2x to 5x improvements.
    Ongoing work:
    - Using techniques in CellIQ to perform root-cause analysis on operational LTE networks
    - Generalized streaming graph analysis techniques