Mitigating the Latency-Accuracy Trade-off in Mobile Data Analytics Systems

Anand Iyer
November 01, 2018

Transcript

  1. Mitigating the Latency-Accuracy Trade-off in Mobile Data Analytics Systems

    Anand Iyer⋆, Li Erran Li⬩, Mosharaf Chowdhury✢, Ion Stoica⋆
    ⋆UC Berkeley, ⬩Fudan University/Pony.ai, ✢University of Michigan
    MobiCom, November 1, 2018
  4. Mobile Data Analytics

    [Figure: decision tree with Success and Drop leaves. Split conditions: Uplink SINR > -11.75, RSRQ > -16.5, RSRQ Available?, Uplink SINR > -5.86, CQI > 5.875.]

    Tasks operate on data ingested in (near) real-time for low-latency decisions.

    Model/predict per-user/per-entity behavior.
  7. Latency-Accuracy Trade-off

    [Figure: model accuracy vs. data collection latency, with regions labeled "Statistically insignificant" and "High latency".]

    1 hour latency for 94% accuracy.
    Staleness enforces short interval analyses.
    Highest achieved accuracy ~66%.
    Need to update models frequently.
  9. Mitigating Latency-Accuracy Trade-off

    [Figure: model accuracy vs. data collection latency.]

    Efficient Task Formulations: Hybrid Multi-Task Learning.
    Intelligent Data Grouping: PCA based partitioning.
  11. Base stations vary widely in data

    [Figure: plot; axis labels and values did not survive extraction.]

    Many base stations do not collect enough data in small intervals.
  14. Latency-Accuracy Trade-off in RANs

    [Figure: accuracy (%) vs. data collection latency (minutes; ticks 0 to 8, 10, 60) for Random Forest (Call drops) and Lasso Regression (Throughput).]

    High latency incurred for good accuracy.
    Staleness causes huge variance and errors.
  15. Cellscope Architecture

    [Figure: architecture diagram. Labels: CellScope; Domain-Specific MTL; Gradient Boosted Trees; RAN Performance Analyzer; ML Lib; Bearer Level Trace; Dashboards; Self-Organizing Networks (SON); Throughput; Drop; Feature Engineering; PCA-Based Similarity Grouping; Streaming; Hybrid MTL.]
  23. Multi Task Learning (MTL)

    Jointly learn many tasks by exploiting commonalities and differences.

    [Figure: Data → Train → Model, shown separately for Task 1, Task 2, …, Task N.]

    $h(x) = f(\phi_1(x), \phi_2(x), \ldots, \phi_n(x))$

    [Figure: the data of Task 1, Task 2, …, Task N feed a single Train step that produces one Model per task.]

    $h(x) = f_{bs}(\phi_1(x), \phi_2(x), \ldots, \phi_n(x))$

    Assumes that all tasks are related.
  26. MTL in Cellscope

    [Figure: a single Train step over the data of Task 1, Task 2, …, Task N produces one Model per task.]

    $h(x) = f_{bs}(\phi_1(x), \phi_2(x), \ldots, \phi_n(x))$

    [Figure: tasks are split into Group 1, …, Group N; within each group, a Train step over the data of Task 1, Task 2, …, Task K produces one Model per task.]

    $h(x) = f_{g(bs)}(\phi_1(x), \phi_2(x), \ldots, \phi_n(x))$

    Problem: Scalable maintenance of large number of models.
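  33. Hybrid MTL

    Model estimation by L1 regularized loss minimization:

    $\min \sum L\big(h(x : f_{bs}),\, y\big) + \lambda \lVert R(x : f_{bs}) \rVert$

    Here $L$ is the prediction error, $f_{bs}$ are the per base-station parameters, and $\lambda$ is the regularization parameter.

    Decompose parameters into shared common set $f_c$ and base station specific set $f_s$:

    $\min \sum \Big( \sum L\big(h(x : f_s, f_c),\, y\big) + \lambda \lVert R(x : f_s) \rVert \Big) + \lambda \lVert R(x : f_c) \rVert$

    The parenthesized term is base-station specific.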
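As a rough illustration of the decomposition into a shared set fc and a base station specific set fs, the sketch below evaluates an L1 regularized objective for a small group of base stations. Several choices here are assumptions made for the example and are not taken from the slides: the loss is squared error, the model is linear with weights `f_c + f_s`, the penalty is the plain L1 norm of the parameters, and the names `predict`, `l1`, and `hybrid_objective` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector x, label y)


def predict(x: Sequence[float], f_c: Sequence[float], f_s: Sequence[float]) -> float:
    """Linear model of the form w . x, assuming w = f_c + f_s."""
    return sum((c + s) * xi for c, s, xi in zip(f_c, f_s, x))


def l1(params: Sequence[float]) -> float:
    """L1 norm of a parameter vector."""
    return sum(abs(p) for p in params)


def hybrid_objective(
    data: Dict[str, List[Sample]],
    f_c: Sequence[float],
    f_s: Dict[str, Sequence[float]],
    lam: float,
) -> float:
    """Per base station (squared error + L1 penalty on its specific
    parameters), summed over the group, plus an L1 penalty on the shared
    parameters. Squared error is an assumption of this sketch."""
    total = 0.0
    for bs, samples in data.items():
        error = sum((predict(x, f_c, f_s[bs]) - y) ** 2 for x, y in samples)
        total += error + lam * l1(f_s[bs])
    return total + lam * l1(f_c)


# Two base stations sharing f_c; only bs1 needs a specific correction.
example_data = {"bs1": [((1.0, 0.0), 2.0)], "bs2": [((0.0, 1.0), 1.0)]}
example_f_c = [1.0, 1.0]
example_f_s = {"bs1": [1.0, 0.0], "bs2": [0.0, 0.0]}
example_value = hybrid_objective(example_data, example_f_c, example_f_s, lam=0.5)
```

In this toy setup both base stations are fit exactly, so the objective consists only of the two penalty terms. The L1 penalty on the specific parameters pushes them toward zero, which is one way to keep most of the model in the shared part.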
  40. Hybrid MTL

    Structure of $h(x : f_s, f_c)$ determines efficient implementation.

    Restrict models to be of form $w \cdot x$.
    Leverage ensemble methods: Gradient Boosted Trees.

    [Figure: a Dataset feeds Model 1, Model 2, Model 3, Model 4, …, Model N (labeled f1, f2, f3, f4, …, fN), which combine into an Ensemble Model.]

    More details in the paper.
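To make the ensemble idea concrete, here is a minimal gradient boosting sketch in plain Python that uses one-split regression stumps as the weak models. It is not the Cellscope implementation. In particular, the way the hybrid split is expressed (a few stages fit on pooled data from all base stations, followed by extra stages fit on one base station's data) is an assumption made for illustration, as are the function names and the learning rate.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump: (feature, threshold, left, right)."""
    best = None
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right


def ensemble_predict(ensemble, x, rate=0.5):
    """Additive ensemble: scaled sum of the weak models' outputs."""
    return sum(rate * stump_predict(s, x) for s in ensemble)


def boost(xs, ys, stages, rate=0.5, init=None):
    """Append `stages` stumps, each fit to the current ensemble's residuals."""
    ensemble = list(init or [])
    for _ in range(stages):
        residuals = [y - ensemble_predict(ensemble, x, rate) for x, y in zip(xs, ys)]
        ensemble.append(fit_stump(xs, residuals))
    return ensemble


def mse(ensemble, xs, ys, rate=0.5):
    return sum((y - ensemble_predict(ensemble, x, rate)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Two base stations with the same shape but a different offset.
xs_a, ys_a = [(0.0,), (1.0,), (2.0,), (3.0,)], [0.0, 0.0, 1.0, 1.0]
xs_b, ys_b = [(0.0,), (1.0,), (2.0,), (3.0,)], [0.5, 0.5, 1.5, 1.5]
shared = boost(xs_a + xs_b, ys_a + ys_b, stages=3)     # common stages
model_b = boost(xs_b, ys_b, stages=3, init=shared)     # plus specific stages
```

Because every stage is fit to what the earlier stages left unexplained, the stages added for one base station only need to capture how it differs from the pooled behavior.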
  48. Data Grouping for MTL

    Project large number of correlated dimensions to a small set of orthogonal dimensions.

    Key Idea: Use Principal Component Analysis (PCA) to find normal behavior.

    [Figure: a measurement matrix with dimensions m and n is reduced to a loadings matrix with dimensions n and k.]
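The projection step can be sketched with a small, dependency-free PCA. The version below assumes the measurement matrix has m rows (measurements) and n columns (dimensions) and returns an n x k loadings matrix. The use of power iteration with deflation and the function names are choices made for this example only; a production system would normally call a library routine instead.

```python
import math


def covariance(rows):
    """n x n sample covariance of an m x n measurement matrix."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (m - 1) for b in range(n)]
            for a in range(n)]


def top_component(cov, iters=500):
    """Dominant eigenvector and eigenvalue of `cov` by power iteration."""
    n = len(cov)
    # Non-uniform start so it is unlikely to be orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    value = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n)) for a in range(n))
    return v, value


def pca_loadings(rows, k):
    """Return (n x k loadings matrix, list of k component variances)."""
    cov = covariance(rows)
    n = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, value = top_component(cov)
        components.append(v)
        variances.append(value)
        # Deflate: subtract the found component so the next one is orthogonal.
        cov = [[cov[a][b] - value * v[a] * v[b] for b in range(n)] for a in range(n)]
    loadings = [[components[i][j] for i in range(k)] for j in range(n)]
    return loadings, variances


# Four measurements of two dimensions; most variance lies along the first.
rows = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
loadings, variances = pca_loadings(rows, k=2)
```

Each column of `loadings` is one principal direction, so keeping only the first k columns is what reduces the n correlated dimensions to k orthogonal ones.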
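  53. PCA Similarity

    Find the similarity between the principal components.

    [Figure: two measurement matrices, with dimensions mA and n and mB and n, each reduced to a matrix with dimensions n and k.]

    $S = \sum_{i=1}^{k} \sum_{j=1}^{n} \lvert A_{ji} - B_{ji} \rvert \times [\ldots]$

    The subscript on $S$ and the final multiplicative factor are not legible in the transcript.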
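A small sketch of comparing two base stations by their principal components follows. It sums the absolute differences between two n x k loadings matrices, component by component. The optional per-component `weights` argument is a hypothetical stand-in for the final factor of the slide's formula, which is not legible here, and the greedy threshold grouping in `group_by_similarity` is likewise an assumption for illustration, not the method from the talk. The sketch also assumes the two loadings matrices use consistent signs for corresponding components.

```python
def pca_distance(loadings_a, loadings_b, weights=None):
    """Sum over components i and dimensions j of |A[j][i] - B[j][i]|,
    optionally scaled per component. Smaller means more similar."""
    n, k = len(loadings_a), len(loadings_a[0])
    if len(loadings_b) != n or len(loadings_b[0]) != k:
        raise ValueError("loadings matrices must have the same shape")
    if weights is None:
        weights = [1.0] * k  # unweighted by default (an assumption)
    return sum(
        weights[i] * sum(abs(loadings_a[j][i] - loadings_b[j][i]) for j in range(n))
        for i in range(k)
    )


def group_by_similarity(loadings_by_bs, threshold, weights=None):
    """Greedy grouping: a base station joins the first group whose
    representative (first member) is within `threshold`, else starts one."""
    groups = []
    for bs, loadings in loadings_by_bs.items():
        for group in groups:
            representative = loadings_by_bs[group[0]]
            if pca_distance(representative, loadings, weights) <= threshold:
                group.append(bs)
                break
        else:
            groups.append([bs])
    return groups


# Three base stations: A and B behave alike, C differs.
stations = {
    "A": [[1.0, 0.0], [0.0, 1.0]],
    "B": [[0.9, 0.1], [0.1, 0.9]],
    "C": [[0.0, 1.0], [1.0, 0.0]],
}
groups = group_by_similarity(stations, threshold=1.0)
```

With these inputs, A and B end up in one group and C in its own, so a jointly trained model would pool data only from base stations whose dominant behavior looks alike.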
  54. Implementation & Evaluation

    § Implemented on Apache Spark
    § Extends MLlib and provides a simple API for grouping
    § Evaluated using data from a live RAN
    § Data over several months
    § Models for two metrics: drops and throughput prediction
    § Also analyzed several issues in the wild
  55. Cellscope Reduces Latency

    [Figure: accuracy (%) vs. data collection latency (min; ticks 0 to 8, 10, 60), comparing Per Base Station and Cellscope. Annotation: 3x.]

    Over 90% accuracy with 3 minutes of data (compared to 60 minutes).
  56. Cellscope Improves Accuracy

    [Figure: accuracy (%) vs. data collection latency (min; ticks 0 to 8, 10, 60), comparing Per Base Station and Cellscope. Annotations: 1.4x, 4x.]

    Achieves high accuracy in small timespans.
  57. Real-world Analysis with Cellscope

    § Cellscope can significantly reduce operator efforts
    § Reduces the need for field trials, can build accurate models quickly
    § Up to 2 orders of magnitude (10s of hours → minutes)
    § Cellscope found new issues previously unknown
    § E.g., grouping revealed high-interference base station clusters
    § Cellscope can aid domain experts
    § Can reduce the troubleshooting search space significantly
  58. Much more in the paper…

    § Extending the techniques to other domains
    § Straightforward & effective
    § Comparison with strawman grouping techniques
    § Why they're not sufficient
    § Implementation details of our hybrid MTL & API
    § Extensive evaluation
    § Real-world analysis & findings
  59. Summary

    § Mobile data analytics popular
    § Need low-latency decisions on live data
    § Latency-Accuracy Trade-off
    § Not enough data in small timespans, staleness determines bounds on data collection latencies
    § Intelligent grouping and efficient task formulations
    § Hybrid MTL and PCA based partitioning

    http://www.cs.berkeley.edu/~api