Watt-Wise Web: Architecting a Responsive and Energy-Efficient Mobile Web

Yuhao Zhu
October 26, 2020

Guest lecture given at University of Utah on Web browsing

Transcript

  1. Watt-Wise Web: Architecting a Responsive and Energy-Efficient Mobile Web. Yuhao Zhu, http://yuhaozhu.com
  2. None
  3. None
  4. Snake circa 2000

  5. Snake circa 2000 Snake circa 2020

  6. Performance

  7. Performance Power

  12. Mobile CPU’s Rise to Power [HPCA 2016]. [Figure: measured power (W, axis 0 to 8) by year, 2009 to 2015. Annotations: In pursuit of high performance; Throttling; SoC Thermal Design Power (TDP)]
  16. Mobile Processor Design “Strategy”: 2007 (In-order), 2011 (Out-of-order), 2012 (Multi-core), 2013 (Asymmetric Multi-core). Squeeze 3 decades of desktop CPU techniques into a 6-year span. [Figure labels: Performance, Power]
  20. “Improving” Energy Capacity. [Figure: Screen Size (inches) and Battery Capacity (mAh) for 600 smartphones from 2006 to 2014; data from http://www.gsmarena.com/makers.php3]
  21. “Improving” Energy Capacity 6

  26. Mobile Applications

  27. Mobile Applications ^ Web

  28. Mobile Applications ^ Web. Monthly Unique Mobile Users, Non-Web and Web (in slide order): 3.3 M and 8.9 M. Source: comScore Mobile Metrix, U.S., June 2015

  29. Mobile Applications ^ Web. [Figure: Monthly Unique Mobile Users (Million, axis 2 to 12), Web vs. Non-Web, quarterly from Jun 2014 to Jun 2016. Source: comScore 2016 U.S. Mobile App Report]
  30. Is This (Just) a Network Issue? [IEEE MICRO 2015]

  35. Is This (Just) a Network Issue? [IEEE MICRO 2015] [Figure: Load time (s, axis 2 to 38) vs. Network RTT (ms, log scale), with network types LTE, 3G, Adverse 3G, 2G, and Wi-Fi marked. Annotation: Compute]

  37. Is This (Just) a Network Issue? [IEEE MICRO 2015] [Figure: Measured Power (W, axis 0.0 to 9.0) by year, 2009 to 2015, for Screen, Radio, and CPU. Annotation: Compute]
  39. Web Browsing from a Compute Perspective. Compute

  40. Web Browsing from a Compute Perspective. Layers: Application, Runtime, Architecture. Components shown: Frameworks and Libraries; HTML, JavaScript, CSS; Language Runtime, Styling, Security, Local Storage, User Input, Layout, Render. Annotation: Compute
  41. Cross-Layer Optimizations. Layers: Application, Runtime, Architecture

  47. Cross-Layer Optimizations. Layers: Application, Runtime, Architecture. WebCore: Web-specific Processor Architecture [ISCA 2014] [TOCS 2017]. GreenWeb: Language Support for Quality-of-Experience [PLDI 2016]. WebRT: Fast, Energy-Efficient Mobile Web Runtime [HPCA 2013] [HPCA 2015] [ISCA 2019]
  53. Mobile Applications are Event-Driven. [Diagram: Interactions (Touching, Moving) → Events (click, touchstart, touchmove, scroll) → Event Queue → Event Loop → CPU] Event is the atomic unit of execution; optimize latency/energy at the event level.

  54. Mobile Applications are Event-Driven. ▸ Observation: Events have different latency slacks that enable energy optimizations
  57. Event-based Web Runtime Optimization ▸ Observation: Events have different latency slacks that enable energy optimizations ▸ Mechanism: Provide just enough energy to meet the QoE requirement for different events ▸ Implementation: Map events to different heterogeneous hardware configurations
  69. Event-Level Characterization. [Figure: Event Latency (ms, axis 0 to 150) across Events, with keyup marked Large Slack, change marked Small Slack, and click marked No Slack] ▸ Wide distribution of event latencies. Events exhibit different slacks. ▹ How to exploit event slacks?
  71. Event-based Scheduler (EBS) ▸ Goal: For each event, find the most energy-efficient hardware configuration that meets the latency target

  79. Event-based Scheduler (EBS). [Diagram: Thread Scheduling uses a Thread-based Scheduler targeting Throughput and Fairness; Event-based Scheduling uses an Event-based Scheduler over the Event Queue targeting Event Latency and Event Energy]
  82. Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off space ▸ Widely used in commodity devices. [Figure: Energy Consumption vs. Performance for Big Core and Small Core across Voltage/Frequency Levels]

  89. Leveraging Heterogeneous Hardware ▸ Offer a large performance-energy trade-off space. [Diagram: Event Latency split into Memory Operation time and CPU Operation time] $\text{Event Latency} = T_{\text{memory}} + N_{\text{dependent}} / f$. Source: Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03
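
The latency model above (Event Latency = Tmemory + Ndependent / f) and the EBS goal of finding the most energy-efficient configuration that still meets the latency target can be illustrated with a minimal sketch. This is not the EBS implementation from the talk: the `Config` type, the configuration names, and every frequency and power number below are hypothetical values chosen only for illustration, and energy is assumed to be latency times a constant average power.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Config:
    """One hardware configuration (hypothetical values, not measured)."""
    name: str
    freq_hz: float   # core clock frequency f
    power_w: float   # assumed average active power at this setting


# Hypothetical big/little settings for illustration only.
CONFIGS = [
    Config("little@0.6GHz", 0.6e9, 0.2),
    Config("little@1.2GHz", 1.2e9, 0.5),
    Config("big@1.6GHz", 1.6e9, 2.0),
]


def event_latency(t_memory_s: float, n_dependent: float, freq_hz: float) -> float:
    """Event Latency = T_memory + N_dependent / f."""
    return t_memory_s + n_dependent / freq_hz


def pick_config(configs: Sequence[Config], t_memory_s: float,
                n_dependent: float, deadline_s: float) -> Optional[Config]:
    """Return the lowest-energy configuration that meets the deadline, if any."""
    best: Optional[Config] = None
    best_energy = float("inf")
    for cfg in configs:
        latency = event_latency(t_memory_s, n_dependent, cfg.freq_hz)
        if latency > deadline_s:
            continue  # this setting would miss the latency target
        energy = latency * cfg.power_w  # assumed: energy = time x average power
        if energy < best_energy:
            best, best_energy = cfg, energy
    return best
```

With these made-up numbers, an event with 10 ms of memory time and 60 million dependent cycles gets the mid little-core setting under a 100 ms target, the big core under a 50 ms target, and no feasible setting under a 20 ms target.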
  90. Evaluation Methodology ▸ Implemented inside Google Chromium Web browser ▸ Representative hardware platform ▹ Exynos 5410 SoC (A15 + A7) ▸ UI-level record and replay for reproducibility ▹ Mosaic [ISPASS 2015] https://github.com/Matthalp/mosaic
  93. Evaluation Results. [Figure: Norm. Energy and Perf for Android and EBS] 29.2% Energy Savings

  96. Evaluation Results. [Figure: Norm. QoS Violation and Norm. Energy for Android and EBS] 0.8% More QoS violations
  100. Rethink Event-based Scheduling. Event-based Scheduler (Event Latency, Event Energy). Reactive Strategy: Pending Events in the Event Queue (onclick, GC task, Timer event)

  102. Rethink Event-based Scheduling. Event-based Scheduler (Event Latency, Event Energy). Proactive Strategy: Pending & Speculative Events. [Timeline: Event Queue over Time with onclick, GC task, Timer event, ontouchmove, onsubmit, grouped into Pending Events and Speculative Events]
  113. Inefficiency of Current Schedulers. [Timeline figure: input 1 and input 2 over Time, with QoS Deadline 1 and QoS Deadline 2; one row each for OS (E1, slack, E2), EBS (E1, E2, ?), and Oracle (E1, E2)]

  114. Inefficiency of Current Schedulers. Oracle: Schedule across both pending and future events
  118. PES: Proactive Event Scheduler. Prediction: Web Program Analysis, Machine Learning. Scheduling: Constrained Optimization
  123. Event Sequence Learning. [Diagram: Event Sequence E1, E2, E3 over Time feeds the Prediction Model, which outputs E4]

  128. Recurrent Prediction. [Diagram: Event Sequence E1, E2, E3, E4 over Time feeds the Prediction Model, which outputs E5, …]
  130. Prediction Model. Features encoding past interactions: The distance of click, The number of scrolls, The number of navigations, …

  131. Prediction Model. Features encoding past interactions: $\ln\left(\frac{p}{1-p}\right) = x\beta$

  132. Prediction Model. Features encoding past interactions → Prediction Model → Click, ScrollUp, ScrollDown, ZoomIn, ZoomOut, … with example outputs 0.10, 0.12, 0.58, 0.07, 0.02 (in slide order)

  133. Prediction Model. Same event types, with one marked ✔

  135. Prediction Model. All Event Types: Filter Unlikely Events!
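
The logit relation on the Prediction Model slide, ln(p / (1 − p)) = xβ, can be inverted to obtain a probability per event type, after which unlikely events are filtered out. The sketch below assumes one independent logistic model per event type; the slides do not say how the per-type scores are produced, so that structure, the feature layout, and all weights are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def logistic_probability(x: Sequence[float], beta: Sequence[float]) -> float:
    """Invert ln(p / (1 - p)) = x . beta to recover p."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-z))


def likely_events(x: Sequence[float],
                  weights: Dict[str, Sequence[float]],
                  min_p: float) -> List[str]:
    """Score every event type and keep those with probability >= min_p,
    most likely first (the "Filter Unlikely Events" step)."""
    scores = {event: logistic_probability(x, beta)
              for event, beta in weights.items()}
    kept = [event for event, p in scores.items() if p >= min_p]
    return sorted(kept, key=lambda event: -scores[event])


# Hypothetical feature vector: [bias, distance of click,
# number of scrolls, number of navigations].
FEATURES = [1.0, 0.2, 3.0, 1.0]

# Hypothetical per-event weights (not taken from the talk).
WEIGHTS = {
    "Click": [0.0, 0.0, 0.0, 0.0],
    "ScrollDown": [-1.0, 0.0, 1.0, 0.0],
    "ZoomOut": [-3.0, 0.0, 0.0, 0.0],
}
```

Calling `likely_events(FEATURES, WEIGHTS, 0.3)` keeps ScrollDown and Click and drops ZoomOut, whose score falls below the cut-off.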
  143. Program State Analysis. [Diagram: DOM Tree with nodes <html>, <body>, <div>, …, <href>, <collapsible>; the Viewport is overlaid on the tree]
  145. Integrating Program States. Features encoding past interactions + Current application states → Prediction Model → Click, ScrollUp, ScrollDown, ZoomIn, ZoomOut, …

  147. Overview of the Predictor. Features encoding past interactions + Current application states → Prediction Model (~10 us) → Click, ScrollUp, ScrollDown, ZoomIn, ZoomOut, … Stop once the cumulative confidence of the predicted event sequence falls below a threshold
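
The recurrent prediction loop with its stopping rule can be sketched as follows. Two things here are assumptions rather than statements from the talk: cumulative confidence is taken to be the product of the per-step probabilities, and the predictor is modeled as any function from the event history to a probability per event type. The `max_events` cap is likewise a hypothetical safety bound.

```python
from typing import Callable, Dict, List

# A predictor maps the event history to a probability per next event type.
Predictor = Callable[[List[str]], Dict[str, float]]


def predict_sequence(history: List[str], predict_next: Predictor,
                     threshold: float, max_events: int = 8) -> List[str]:
    """Predict future events one at a time, feeding each prediction back
    into the history, until the cumulative confidence drops below threshold."""
    predicted: List[str] = []
    context = list(history)      # do not mutate the caller's history
    cumulative = 1.0             # assumed: product of per-step probabilities
    while len(predicted) < max_events:
        probs = predict_next(context)
        if not probs:
            break
        event, p = max(probs.items(), key=lambda kv: kv[1])
        cumulative *= p
        if cumulative < threshold:
            break                # sequence is no longer trustworthy enough
        predicted.append(event)
        context.append(event)    # recurrent step: prediction becomes input
    return predicted


def toy_predictor(context: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in model that always favors ScrollDown."""
    return {"ScrollDown": 0.9, "Click": 0.1}
```

With the toy model and a threshold of 0.7, three events are predicted (confidence 0.9, 0.81, 0.729) before the fourth would take the product down to about 0.656 and end the loop.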
  148. PES: Proactive Event Scheduler. Prediction: Web Program Analysis, Machine Learning. Scheduling: Constrained Optimization
  156. Constrained Optimization Formulation. [Timeline: events E1, E2, E3 over Time, each with a start time $T_{\text{start}}$, an execution time $\Delta T_{\text{exe}}$, and a deadline $T_{\text{QoS}}$]

    ▸ Goal: minimize total energy while meeting deadlines

    Objective: $\min \sum_{i}^{N} \Delta T_{\text{exe}}(i) \times \text{Power}(i)$

    Constraints: Order: $T_{\text{end}}(i) \le T_{\text{start}}(i+1)$. Deadline: $T_{\text{start}}(i) + \Delta T_{\text{exe}}(i) \le T_{\text{QoS}}(i)$

    Scheduling knobs: Big/little + DVFS for each event.
  163. Problem Formulation ▸ Scheduling Problem → Constrained Optimization.

    Objective: $\min \sum_{i}^{N} \Delta T_{\text{exe}}(i) \times P_{\text{map}}(i)$

    Constraints: Order: $T_{\text{end}}(i) \le T_{\text{start}}(i+1)$. Deadline: $T_{\text{start}}(i) + \Delta T_{\text{exe}}(i) \le T_{\text{QoS}}(i)$

    Each Event: $\Delta T_{\text{exe}}(i) = T_{\text{memory}} + N_{\text{cycles}} / f(i)$ and $P_{\text{map}}(i) = \text{Freq2Power}(f(i))$. Annotations: Constants; Offline profile

  168. Problem Formulation, same objective and constraints. Each Event, with $\alpha(i,j) \in \{0,1\}$:

    $\Delta T_{\text{exe}}(i) = T_{\text{memory}} + \sum_{j=0}^{M} \{ N_{\text{cycles}} / f(i,j) \} \cdot \alpha(i,j)$

    $P_{\text{map}}(i) = \sum_{j=0}^{M} \text{Freq2Power}(f(i,j)) \cdot \alpha(i,j)$

    With the constraint: $\sum_{j=0}^{M} \alpha(i,j) = 1$. Only one DVFS setting is chosen for each event

  175. Problem Formulation, same formulation. Integer Linear Programming! Overhead: 10 DVFS configurations, 8 events look ahead, 80 variables in ILP. Runtime overhead: 10ms. Dynamic programming
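
To make the formulation concrete, here is a small sketch that picks one DVFS setting per event so that total energy (the sum of ΔTexe × Pmap) is minimal while the Order and Deadline constraints hold. It uses plain exhaustive search, which is neither the ILP solver nor the dynamic-programming approach named on the slide, and it only scales to a handful of events. The per-event release time, the setting table, and all numbers are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

# (release_s, t_memory_s, n_cycles, t_qos_s); release_s is an assumed
# earliest start time and is not part of the slide's formulation.
Event = Tuple[float, float, float, float]
# (freq_hz, power_w): one row of a hypothetical Freq2Power profile.
Setting = Tuple[float, float]


def schedule(events: Sequence[Event], settings: Sequence[Setting]
             ) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Return (setting index per event, total energy) for the cheapest
    feasible assignment, or (None, inf) if no assignment meets all deadlines."""
    best_choice: Optional[Tuple[int, ...]] = None
    best_energy = float("inf")
    for choice in product(range(len(settings)), repeat=len(events)):
        t_end_prev, energy, feasible = 0.0, 0.0, True
        for (release, t_mem, n_cycles, t_qos), j in zip(events, choice):
            freq, power = settings[j]
            t_start = max(t_end_prev, release)  # Order: T_end(i-1) <= T_start(i)
            dt_exe = t_mem + n_cycles / freq    # dT_exe = T_memory + N_cycles / f
            if t_start + dt_exe > t_qos:        # Deadline constraint violated
                feasible = False
                break
            energy += dt_exe * power            # dT_exe(i) x P_map(i)
            t_end_prev = t_start + dt_exe
        if feasible and energy < best_energy:
            best_choice, best_energy = choice, energy
    return best_choice, best_energy


# Hypothetical settings and a two-event look-ahead window.
SETTINGS: List[Setting] = [(0.6e9, 0.2), (1.2e9, 0.5), (1.6e9, 2.0)]
EVENTS: List[Event] = [
    (0.00, 0.010, 60e6, 0.120),
    (0.05, 0.010, 60e6, 0.165),
]
```

In this example the first event could meet its own deadline at the slowest setting, but that would force the second event onto the fastest, most power-hungry setting (about 0.117 J in total). Optimizing over both events selects the middle setting for each, at about 0.06 J, which is the kind of saving that scheduling across pending and future events is meant to capture.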
  186. Putting Things Together. [Diagram: Web Application and Rendering Engine around PES, whose components are the Predictor, Scheduler, and Controller. Labeled flows: Events, Predictions, Speculative Schedules, Pending Frames, Commit, Recover. Timeline over Time: Prediction, Schedule, and a Pending Frame Buffer holding F1, F2, F3, marked F1 ✔, F2 ✔, F3 ✘, followed by F3 ✔]
  187. Evaluation ▸ Traces ▹ Training: 10 users, more than 100 traces. ▹ Testing: 36 traces, 3 for each Web application. ▸ Baseline Mechanisms ▹ Interactive governor (Interactive): Android default ▹ EBS: a state-of-the-art reactive event-based scheduler ▹ Oracle: optimal scheduler ▸ Metrics ▹ Energy consumption ▹ QoS violation
  188. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter
  189. High Prediction Accuracy, Low Mis-Prediction Penalty 53 Prediction Accuracy (%)

    60 70 80 90 100 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter Mis-prediction Waste (ms) 0 10 20 30 40 163 msn slashdot youtube google amazon ebay sina espn bbc cnn twitter
  191. High Prediction Accuracy, Low Mis-Prediction Penalty

    [Plot] Prediction Accuracy (%), axis 60 to 100, and Mis-prediction Waste (ms), axis 0 to 40, per application: 163, msn, slashdot, youtube, google, amazon, ebay, sina, espn, bbc, cnn, twitter.
    Takeaway: 92% prediction accuracy and 20 ms of mis-prediction waste.
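One plausible way to compute the two quantities is sketched below. The definitions are assumptions and are not given on the slide: accuracy is taken as the percentage of predicted events that match the actual event at the same position, and waste as the total rendering time spent on frames for mispredicted events. The sample events and timings are made up for illustration, and the two sequences are assumed to have equal length.

```python
def prediction_accuracy(predicted, actual):
    """Percentage of positions where the predicted event equals the actual one."""
    if not actual:
        raise ValueError("need at least one event")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


def misprediction_waste_ms(predicted, actual, render_ms):
    """Total time (ms) spent rendering frames whose prediction was wrong."""
    return sum(t for p, a, t in zip(predicted, actual, render_ms) if p != a)


# Illustrative data only, not measurements from the talk.
predicted = ["scroll", "scroll", "click", "scroll"]
actual = ["scroll", "scroll", "touchmove", "scroll"]
render_ms = [12.0, 11.0, 18.0, 13.0]

accuracy = prediction_accuracy(predicted, actual)             # 75.0
waste = misprediction_waste_ms(predicted, actual, render_ms)  # 18.0
```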
  202. Experimental Result

    [Plot] Normalized Energy, axis 0 to 1, and QoS Violation, axis 0 to 0.6, per application (163, msn, slashdot, youtube, google, amazon, ebay, sina, espn, bbc, cnn, twitter) for four schedulers: Interactive, EBS, PES, Oracle.
    Takeaway: 61% less QoS violation and 26% energy reduction.
  203. Event-driven Processing is a Fundamental and Prevalent Paradigm

    Edge Computing, Database, Crowdsourced Sensor Network, Mobile Applications, Virtual Reality, Internet-of-things, Cloud Computing
  204. What’s Next?

  205. “Microkernel”-based Browser

    ▸ A web app/website exercises only a small portion of the Web specification. There are “hot” tags, CSS properties, etc.
    ▸ Design a minimalistic browser core, and load the rest only when requested by a specific webpage.
    https://www.advancedwebranking.com/html/
    https://maqentaer.com/devopera-static-backup/http/dev.opera.com/articles/view/mama-css-syntax/index.html
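The load-on-demand idea can be sketched as a toy model. Everything named here is hypothetical: the contents of `HOT_TAGS`, the `BrowserCore` class, and the `load_module` callback stand in for whatever feature granularity and loading mechanism a real browser would use.

```python
HOT_TAGS = {"div", "a", "span", "img", "p"}  # assumed "hot" set, for illustration


class BrowserCore:
    """Minimal core: handlers for hot tags are built in, the rest load lazily."""

    def __init__(self, load_module):
        self.handlers = {tag: f"builtin:{tag}" for tag in HOT_TAGS}
        self.load_module = load_module  # fetches a handler for a cold tag
        self.loads = []                 # record of on-demand loads, in order

    def handler_for(self, tag):
        """Return the handler for a tag, loading it on first use if needed."""
        if tag not in self.handlers:
            self.handlers[tag] = self.load_module(tag)
            self.loads.append(tag)
        return self.handlers[tag]


core = BrowserCore(load_module=lambda tag: f"loaded:{tag}")
page_tags = ["div", "p", "canvas", "div", "canvas", "video"]
used = [core.handler_for(tag) for tag in page_tags]
```

Only `canvas` and `video` trigger a load here, and each is loaded once. A page that uses only hot tags never pays for the rest.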
  213. User-Centric Language (Extensions)

    ▸ Web performance is a comprehensive user-centric objective that cannot be captured by one single metric.
      ▹ Old days: the time when onload is fired, which doesn’t correspond to UX
      ▹ Now: First Meaningful Paint (FMP), Time to Interactive (TTI), etc.
    ▸ Still, the focus is on performance analysis and inspection, without giving developers direct control over user-perceived performance.
      ▹ Developers are given a set of high-level, coarse-grained guidelines
    ▸ Proposal: empower developers to directly express their requirements of user-centric performance goals (up to the browser to deliver).
      ▹ Paint order: developers identify “hero” elements and specify the order in which hero elements are painted. div#hero {paint-order:1}
      ▹ Interactive state: developers specify the kind of interaction that should ideally be granted when a particular element is painted. div#menu{istate:touchstart,50}
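To make the two proposed annotations concrete, here is a small sketch of how a browser-side tool might read them. It is an illustration under assumptions, not part of the proposal: the parser handles only single-declaration rules, the `div#banner` rule is added for the example, and the slide does not say what the number in `istate:touchstart,50` means, so it is kept as an opaque integer. Note also that `paint-order` already exists as a CSS/SVG property with a different meaning; the usage here follows the slide.

```python
import re

# Matches rules of the form: selector { property : value }
RULE = re.compile(
    r"(?P<selector>[^{}\s]+)\s*\{\s*(?P<prop>[\w-]+)\s*:\s*(?P<value>[^{}]+?)\s*\}"
)


def parse_rules(css):
    """Return (selector, property, value) tuples from single-declaration rules."""
    return [(m["selector"], m["prop"], m["value"]) for m in RULE.finditer(css)]


def paint_schedule(rules):
    """Selectors that declare paint-order, sorted by ascending order value."""
    ordered = [(int(v), sel) for sel, prop, v in rules if prop == "paint-order"]
    return [sel for _, sel in sorted(ordered)]


def interactive_states(rules):
    """Map selector -> (event name, number) for istate declarations."""
    states = {}
    for sel, prop, value in rules:
        if prop == "istate":
            event, number = value.split(",")
            states[sel] = (event.strip(), int(number))
    return states


css = (
    "div#banner {paint-order:2} "      # hypothetical extra rule
    "div#hero {paint-order:1} "        # from the slide
    "div#menu{istate:touchstart,50}"   # from the slide
)
rules = parse_rules(css)
schedule = paint_schedule(rules)
states = interactive_states(rules)
```

With these inputs the hero element is scheduled before the banner, and the menu is associated with `touchstart` and the value 50. How a browser would actually honor such goals is left open by the slide.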