Mitigating the Latency-Accuracy Trade-off in Mobile Data Analytics Systems

Anand Iyer
November 01, 2018

Transcript

  1. Mitigating the Latency-Accuracy Trade-off in Mobile Data Analytics Systems

    Anand Iyer⋆, Li Erran Li⬩, Mosharaf Chowdhury✢, Ion Stoica⋆
    ⋆UC Berkeley, ⬩Fudan University/Pony.ai, ✢University of Michigan
    MobiCom, November 1, 2018
  5. Mobile Data Analytics Very Popular

    § Many emerging domains
    § Common goal: understand user/entity behavior
  12. Mobile Data Analytics

    [Figure: decision tree predicting Success or Drop from tests on Uplink SINR > -11.75, RSRQ Available?, RSRQ > -16.5, Uplink SINR > -5.86, and CQI > 5.875]
    § Tasks operate on data ingested in (near) real time for low-latency decisions
    § Model/predict per-user/per-entity behavior
  22. Latency-Accuracy Trade-off

    [Figure: model accuracy versus data collection latency]
    § Statistically insignificant
    § High latency
    § 1 hour latency for 94% accuracy
    § Staleness enforces short interval analyses
    § Highest achieved accuracy ~66%
    § Need to update models frequently
  28. Mitigating Latency-Accuracy Trade-off

    [Figure: model accuracy versus data collection latency]
    § Efficient task formulations: hybrid Multi-Task Learning
    § Intelligent data grouping: PCA-based partitioning
  31. Cellular RAN Performance Diagnostics

    § Goal: diagnose problems using data collected at base stations
  33. Base stations vary widely in data

    § Many base stations do not collect enough data in small intervals
  36. Latency-Accuracy Trade-off in RANs

    [Figure: accuracy (%) versus data collection latency (minutes, up to 60) for Random Forest (call drops) and Lasso Regression (throughput)]
    § High latency incurred for good accuracy
    § Staleness causes huge variance and errors
  40. CellScope Architecture

    [Figure: CellScope architecture. Labeled components: Domain-Specific MTL, Gradient Boosted Trees, RAN Performance Analyzer, ML Lib, Bearer Level Trace, Dashboards, Self-Organizing Networks (SON), Throughput, Drop, Feature Engineering, PCA-Based Similarity Grouping, Streaming, Hybrid MTL]
  47. Multi-Task Learning (MTL)

    § Jointly learn many tasks by exploiting commonalities and differences
    § Independent learning: each task (Task 1, Task 2, …, Task N) trains its own model on its own data: $h(x) = f(f_1(x), f_2(x), \ldots, f_N(x))$
    § MTL: the data of all tasks feeds one joint training step that produces a model per task: $h(x) = f_{MT}(f_1(x), f_2(x), \ldots, f_N(x))$
    § Assumes that all tasks are related
  51. MTL in CellScope

    § MTL over all tasks: one joint training step over the data of Task 1, Task 2, …, Task N: $h(x) = f_{MT}(f_1(x), f_2(x), \ldots, f_N(x))$
    § Grouped MTL: tasks are partitioned into Group 1, …, Group N, and the tasks in each group (Task 1, Task 2, …, Task K) are trained jointly: $h(x) = f_{g(MT)}(f_1(x), f_2(x), \ldots, f_N(x))$
    § Problem: scalable maintenance of a large number of models
  58. Hybrid MTL

    § Model estimation by L1 regularized loss minimization:

      $$\min \sum L\big(h(x : f_{bs}),\, y\big) + \lambda \lVert R(x : f_{bs}) \rVert$$

      where $L$ is the prediction error, $f_{bs}$ are the per base-station parameters, and $\lambda$ is the regularization parameter
    § Decompose parameters into shared common set $f_c$ and base station specific set $f_s$:

      $$\sum \Big( \sum L\big(h(x : f_s, f_c),\, y\big) + \lambda \lVert R(x : f_s) \rVert \Big) + \lambda \lVert R(x : f_c) \rVert$$

      where the parenthesized term is base-station specific
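
The decomposition into a shared set fc and a base station specific set fs can be sketched numerically. The snippet below is a minimal illustration under stated assumptions, not CellScope's implementation: it assumes a linear predictor whose weights are the sum of the shared and the specific vector, squared error as the prediction error, and an L1 norm as the penalty. The function name, the data layout, and the toy values are all hypothetical.

```python
import numpy as np

def hybrid_objective(data, f_c, f_s, lam):
    """Evaluate sum over base stations of (loss + lam * ||f_s||_1),
    plus lam * ||f_c||_1 for the shared parameters.

    data: dict mapping base-station id -> (X, y)
    f_c:  shared weight vector, shape (d,)
    f_s:  dict mapping base-station id -> specific weight vector, shape (d,)
    """
    total = 0.0
    for bs, (X, y) in data.items():
        pred = X @ (f_c + f_s[bs])        # assumed linear predictor h
        loss = np.sum((pred - y) ** 2)    # assumed squared-error loss L
        total += loss + lam * np.sum(np.abs(f_s[bs]))
    return total + lam * np.sum(np.abs(f_c))

# Toy data: two base stations, two features (synthetic values)
data = {
    "bs1": (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([1.0, 2.0])),
    "bs2": (np.array([[1.0, 1.0]]), np.array([3.0])),
}
f_c = np.array([1.0, 2.0])
f_s = {"bs1": np.zeros(2), "bs2": np.zeros(2)}
value = hybrid_objective(data, f_c, f_s, lam=0.1)
print(value)
```

In this toy setup the shared vector already fits both base stations, so only the shared penalty contributes. A nonzero specific vector pays its own penalty and is worthwhile only if it lowers that base station's loss by more.
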
  67. Hybrid MTL

    § Structure of $h(x : f_s, f_c)$ determines efficient implementation
    § Restrict models to be of form w · x
    § Leverage ensemble methods: Gradient Boosted Trees
    [Figure: Dataset feeding Model 1, Model 2, Model 3, Model 4, …, Model N (f1, f2, f3, f4, …, fN), combined into an Ensemble Model]
    § More details in the paper
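
One reason a restriction to models of the form w · x can make ensembles cheap is that a weighted sum of linear models is itself a single linear model. The sketch below only illustrates that algebraic fact; it does not reproduce the gradient boosted trees used in the talk, and the weights, ensemble coefficients, and function name are hypothetical.

```python
import numpy as np

def combine_linear_models(weights, alphas):
    """Collapse the ensemble sum_i alpha_i * (w_i . x) into one vector w."""
    W = np.asarray(weights, dtype=float)   # shape (n_models, n_features)
    a = np.asarray(alphas, dtype=float)    # shape (n_models,)
    return a @ W                           # shape (n_features,)

weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # hypothetical f1, f2, f3
alphas = [0.5, 0.25, 0.25]                      # hypothetical ensemble weights
x = np.array([2.0, 4.0])

w = combine_linear_models(weights, alphas)
# Evaluate every model separately, then evaluate the collapsed model once
ensemble_pred = sum(a * np.dot(wi, x) for a, wi in zip(alphas, weights))
single_pred = np.dot(w, x)
print(ensemble_pred, single_pred)
```

Both predictions agree, so in this linear setting the ensemble can be stored and evaluated as one weight vector instead of N separate models.
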
  76. Data Grouping for MTL

    § Key idea: use Principal Component Analysis (PCA) to find normal behavior
    § Project a large number of correlated dimensions to a small set of orthogonal dimensions
    [Figure: a measurement matrix with dimensions labeled n and m is reduced to a loadings matrix with dimensions labeled n and k]
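
The projection step can be sketched with a plain SVD-based PCA. This is an illustrative sketch, not the CellScope code (which the talk says runs on Apache Spark). It assumes the measurement matrix holds m observations as rows and n features as columns, and returns an n-by-k loadings matrix; that orientation, the function name, and the synthetic data are assumptions.

```python
import numpy as np

def pca_loadings(measurements, k):
    """Return (loadings, explained_variance_ratio) for the top-k components.

    measurements: array of shape (m, n), m observations of n features.
    loadings:     array of shape (n, k), one column per principal component.
    """
    X = np.asarray(measurements, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)                # variance per component
    return vt[:k].T, var[:k] / var.sum()

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Three strongly correlated features plus a little noise (synthetic data)
X = np.hstack([base, 2 * base, -base]) + 0.01 * rng.normal(size=(200, 3))
loadings, ratio = pca_loadings(X, k=1)
print(loadings.shape, ratio)
```

Because the three synthetic features are nearly collinear, a single component captures almost all of the variance, which is the situation the slide describes: many correlated dimensions collapse to a few orthogonal ones.
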
  82. PCA Similarity

    § Find the similarity between the principal components
    [Figure: two measurement matrices, with dimensions labeled mA, n and mB, n, are each reduced to a matrix with dimensions labeled n and k]
    [Equation: a score S, a double sum of absolute differences between corresponding entries of the two reduced matrices, multiplied by a weighting term]
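
A comparison of two sets of principal components can be sketched as a weighted sum of absolute differences between their loadings. The snippet is a hedged illustration rather than the exact score from the talk: the per-component weights (for example, the fraction of variance each component explains) and the sign alignment step are assumptions of this sketch, and the function name is hypothetical.

```python
import numpy as np

def loading_distance(A, B, weights):
    """Weighted L1 distance between two (n, k) loading matrices.

    weights: length-k per-component weights (an assumption of this sketch).
    Smaller values mean more similar principal components.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Principal components are defined only up to sign, so flip columns
    # of B that point away from the matching column of A.
    signs = np.where(np.sum(A * B, axis=0) < 0, -1.0, 1.0)
    diff = np.abs(A - B * signs)           # entry-wise absolute difference
    return float(np.sum(diff * np.asarray(weights, dtype=float)))

A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[-1.0, 0.0], [0.0, 1.0]])    # same components, one sign flipped
C = np.array([[0.0, 1.0], [1.0, 0.0]])     # components swapped
w = [0.75, 0.25]
print(loading_distance(A, B, w), loading_distance(A, C, w))
```

Base stations whose pairwise distance falls below some threshold could then be placed in the same group; how the threshold is chosen is outside this sketch.
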
  83. Implementation & Evaluation

    § Implemented on Apache Spark
    § Extends MLlib and provides a simple API for grouping
    § Evaluated using data from a live RAN
    § Data over several months
    § Models for two metrics: drops and throughput prediction
    § Also analyzed several issues in the wild
  84. CellScope Reduces Latency

    [Figure: accuracy (%) versus data collection latency (min, up to 60) for Per Base Station and CellScope; annotation: 3x]
    § > 90% accuracy with 3 minutes of data (compared to 60 minutes)
  85. CellScope Improves Accuracy

    [Figure: accuracy (%) versus data collection latency (min, up to 60) for Per Base Station and CellScope; annotations: 1.4x, 4x]
    § Achieves high accuracy in small timespans
  86. Real-world Analysis with CellScope

    § CellScope can significantly reduce operator efforts
      § Reduces the need for field trials, can build accurate models quickly
      § Up to 2 orders of magnitude (10s of hours → minutes)
    § CellScope found new issues previously unknown
      § E.g., grouping revealed high interference base station clusters
    § CellScope can aid domain experts
      § Can reduce the troubleshooting search space significantly
  87. Much more in the paper…

    § Extending the techniques to other domains
      § Straightforward & effective
    § Comparison with strawman grouping techniques
      § Why they're not sufficient
    § Implementation details of our hybrid MTL & API
    § Extensive evaluation
    § Real-world analysis & findings
  88. Summary

    § Mobile data analytics popular
      § Need low-latency decisions on live data
    § Latency-Accuracy Trade-off
      § Not enough data in small timespans, staleness determines bounds on data collection latencies
    § Intelligent grouping and efficient task formulations
      § Hybrid MTL and PCA based partitioning

    http://www.cs.berkeley.edu/~api
    api@cs.berkeley.edu