Proposal talk: Energy-Efficient Mobile Web Computing

Yuhao Zhu
February 17, 2016

Ph.D. Proposal


Transcript

  1. 1 Energy-Efficient Mobile Web Computing Yuhao Zhu UT Austin Advisor:

    Vijay Janapa Reddi Feb. 17th, 2016
  2. 2

  3. Call Text 2

  4. Call Text The (in)famous “snake game” 2

  5. 3

  6. 4 Architects Make Mobile Processors Faster

  7. 4 Architects Make Mobile Processors Faster In-order (2007)

  8. 4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010)

    Multi-core (2010) Asymmetric Multi-core (2014)
  9. 4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010)

    Multi-core (2010) Asymmetric Multi-core (2014) Performance
  10. 4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010)

    Multi-core (2010) Asymmetric Multi-core (2014) Performance Power
  11. 4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010)

    Multi-core (2010) Asymmetric Multi-core (2014) Performance Power At the Expense of Excessive Power
  12. Responsiveness 5

  13. Responsiveness Energy-Efficiency 5

  14. Responsiveness Energy-Efficiency Conflicting requirements 5

  15. Thesis Statement 6 Energy-Efficiency Conflicting requirements A mobile computing system

    that satisfies user QoS requirements on a mobile energy budget Responsiveness
  16. Thesis Statement 6 Energy-Efficiency Conflicting requirements A mobile computing system

    that satisfies user QoS requirements on a mobile energy budget Responsiveness for the mobile Web
  17. 7

  21. 8 Achieving Mobile Web Performance Mobile Client

  22. 8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers

  23. 8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers

    Cellular Network
  25. 8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers

    Cellular Network [MICRO 2015] (Top Picks Honorable Mention)
  26. 9 Achieving Mobile Web Performance Mobile Client Cellular Network

  28. 10 Isn’t Responsiveness a Network Issue? Mobile Client Cellular Network

  29. Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations

  30. Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations

    Resource loading is the bottleneck
  31. Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations

    Client compute doesn’t matter much Resource loading is the bottleneck
  32. Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations

    Client compute doesn’t matter much Resource loading is the bottleneck Conclusions circa 2010!
  33. 12 Isn’t Responsiveness a Network Issue? A Year 2015 Experiment!

    [Plot: load time (s) versus network RTT (ms, log scale)]
  34. 12 Isn’t Responsiveness a Network Issue? A Year 2015 Experiment!

    [Plot: load time (s) versus network RTT (ms, log scale)] ▸ Samsung Galaxy S4 smartphone. ▸ Hot webpages from Alexa [1]. ▸ Time measured using Navigation Timing API [2]. [1] http://www.alexa.com/ [2] https://www.w3.org/TR/navigation-timing-2/
  35. 12 Isn’t Responsiveness a Network Issue? A Year 2015 Experiment!

    [Plot: load time (s) versus network RTT (ms, log scale); network labels: LTE, 3G, Adverse 3G, 2G, Wi-Fi]
  36. 12 Isn’t Responsiveness a Network Issue? A Year 2015 Experiment!

    [Plot: load time (s) versus network RTT (ms, log scale); network labels: LTE, 3G, Adverse 3G, 2G, Wi-Fi; annotation: Circa 2010]
  39. 13 Responsiveness is also a Compute Issue! Mobile Client Cellular

    Network
  40. 13 Responsiveness is also a Compute Issue! Mobile Client Cellular

    Network This Proposal
  41. 14 Traditional Approach

  42. 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language

    Runtime Styling Security Local Storage User Input Layout Render
  43. 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language

    Runtime Styling Security Local Storage User Input Layout Render Application
  44. ▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application
  45. ▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture
  46. ▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture ▸ Voltage/frequency scaling on general-purpose processors
  47. ▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors
  48. ▸ Parallelize browser computation ▸ Ignored! 14 Traditional Approach Frameworks

    and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors
  49. ▸ Parallelize browser computation ▸ Ignored! 14 Traditional Approach Frameworks

    and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors ▸ End of Dennard Scaling! ▸ Diminishing returns
  50. ▸ Parallelize browser computation ▸ Ignored! 15 My Approach Frameworks

    and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture WebCore Web-specific Architecture
  51. ▸ Parallelize browser computation 15 My Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Lost page-level diversity ▸ Lost user QoS requirements WebCore Web-specific Architecture
  52. ▸ Parallelize browser computation 15 My Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture ▸ Lost page-level diversity ▸ Lost user QoS requirements WebCore Web-specific Architecture
  53. 16 My Approach Frameworks and Libraries HTML JavaScript CSS Language

    Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions
  54. 16 My Approach Frameworks and Libraries HTML JavaScript CSS Language

    Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime
  57. WebRT Energy-aware Web Runtime 16 My Approach Frameworks and Libraries

    HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime
  58. Runtime 17 My Approach Architecture Application WebRT Energy-aware Web Runtime

    WebCore Web-specific Architecture GreenWeb QoS Language Extensions
  59. Runtime 17 My Approach Architecture Application My Research Scope WebRT

    Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions [PLDI 2016] [ISCA 2014] [HPCA 2013] [HPCA 2015] [CAL 2014] (Best of CAL)
  60. Runtime 18 My Approach Architecture Application My Research Scope WebRT

    Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions [PLDI 2016] [ISCA 2014] [HPCA 2013] [HPCA 2015] [CAL 2014] (Best of CAL)
  61. 19 Execution Time Energy General-Purpose Designs WebCore: a Web-Specific Mobile

    Architecture
  62. 19 Execution Time Energy General-Purpose Designs WebCore: a Web-Specific Mobile

    Architecture Diminishing returns
  63. 19 Execution Time Energy ASIC? General-Purpose Designs WebCore: a Web-Specific

    Mobile Architecture
  64. 19 Execution Time Energy ASIC? Extremely challenging ‣ Chrome: 17M LoC,

    29 languages ▹ cf. H264 codec: 0.13M LoC, 6 languages ‣ Code base is very irregular ▹ No fine-grained parallelism General-Purpose Designs WebCore: a Web-Specific Mobile Architecture
  65. 19 Execution Time Energy ASIC? General-Purpose Designs WebCore: a Web-Specific

    Mobile Architecture Goal
  66. 19 Execution Time Energy ??? ASIC? General-Purpose Designs WebCore: a

    Web-Specific Mobile Architecture Goal
  67. WebCore Philosophy 20 Claim: Instead of jumping directly to full

    specialization, we must take it step by step
  68. WebCore Philosophy 20

  69. Web Software WebCore Philosophy 20

  70. Web Software WebCore Philosophy 20 General-purpose Processor (GPP)

  71. Web Software WebCore Philosophy 20 General-purpose Processor (GPP)

    → Customization (tune uarch parameters) → Customized GPP
  72. Web Software WebCore Philosophy 20 General-purpose Processor (GPP)

    → Customization (tune uarch parameters) → Customized GPP → Specialization (accelerate key kernels)
  73. Web Software WebCore Philosophy 20 General-purpose Processor (GPP)

    → Customization (tune uarch parameters) → Customized GPP → Specialization (accelerate key kernels) → WebCore
  74. WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose

    Designs Goal
  75. WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose

    Designs Customization Goal
  76. WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose

    Designs Customization Specialization Goal
  77. Customization: Find an Ideal General Purpose Architecture for the Mobile

    Web 22 22
  78. Customization: Find an Ideal General Purpose Architecture for the Mobile

    Web ▸What is a proper general purpose baseline architecture? ▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)? ▹Are existing general purpose mobile designs ideal? 22 22
  79. Customization: Find an Ideal General Purpose Architecture for the Mobile

    Web ▸What is a proper general purpose baseline architecture? ▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)? ▹Are existing general purpose mobile designs ideal? ▸Exhaustive design space exploration. 22 22
  81. Design Space Exploration (DSE) Setup ▸ Search space of over 3

    billion design points ▹ Leverage statistical inference models to increase search speed ▸ Use integrated simulators ▹ McPAT for power ▹ MARSSx86 for performance (x86 full-system simulator) ▸ Chromium Web browser 23
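Whatever model estimates time and energy for each design point, the exploration ends by comparing points on those two axes. The following is a minimal sketch of extracting the Pareto-optimal points; the `DesignPoint` type, the point names, and all numbers are hypothetical and are not taken from the talk, and this is not the statistical inference model mentioned on the slide.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str
    time_s: float    # estimated execution time (hypothetical value)
    energy_j: float  # estimated energy (hypothetical value)


def pareto_frontier(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep the points that no other point beats on both time and energy."""
    frontier = []
    best_energy = float("inf")
    # Visit points from fastest to slowest; a slower point is only worth
    # keeping if it uses strictly less energy than every faster point.
    for p in sorted(points, key=lambda p: (p.time_s, p.energy_j)):
        if p.energy_j < best_energy:
            frontier.append(p)
            best_energy = p.energy_j
    return frontier


designs = [
    DesignPoint("A", 2.4, 0.60),
    DesignPoint("B", 2.0, 0.80),
    DesignPoint("C", 2.2, 0.90),  # slower and hungrier than B, so dominated
    DesignPoint("D", 1.6, 1.10),
]
frontier_names = [p.name for p in pareto_frontier(designs)]
```

Here `frontier_names` lists the surviving designs from fastest to slowest.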
  82. Design Space Exploration (DSE) Findings 24

  85. Design Space Exploration (DSE) Findings ▸Out-of-order designs are more flexible

    24
  86. Understand the Difference Using Kernel Knowledge 25

  87. Understand the Difference Using Kernel Knowledge 25

    [Pie chart: breakdown across the Render, Style, Other, Layout, and DOM kernels; slice labels 10%, 13%, 17%, 25%, 35%]
  88. Understand the Difference Using Kernel Knowledge In-order design 25

  90. Understand the Difference Using Kernel Knowledge ▸In-order designs show strong

    kernel variance In-order design 25
  93. Understand the Difference Using Kernel Knowledge ▸In-order designs show strong

    kernel variance In-order design 25 Out-of-order design
  94. Understand the Difference Using Kernel Knowledge ▸In-order designs show strong

    kernel variance In-order design 25 Out-of-order design ▸An Out-of-order design can accommodate kernel variance
  95. Customization: Identifying Major Sources of Energy Inefficiency 26 26

  96. Customization: Identifying Major Sources of Energy Inefficiency 26 P2 P1

    26
  97. 26 Customization: Identifying Major Sources of Energy Inefficiency

    Parameter               P1     P2     ARM A15
    Issue width             1      3      3
    # Function units        2      3      8
    Load queue size         4      16     16
    Store queue size        4      16
    BTB size                1024   128    256
    ROB size                128    128    40+
    L1 I-$ size (KB)        64     128    32
    # Physical registers    128    140    ?
    L1 D-$ size (KB)        8      64     32
    L2-$ size (KB)          256    1024   <4096
  100. 27 Customization: Identifying Major Sources of Energy Inefficiency

    P2 P1 ▸ Instruction supply
  101. 27 Customization: Identifying Major Sources of Energy Inefficiency

    P2 P1 ▸ Instruction supply ▸ Data feeding
  102. Specialization: Fixing the Pending Inefficiencies 28 ▸Instruction supply ▹ Pack

    more operations in one instruction ▸Data feeding ▹ Move operands closer to operations
  105. Style Resolution Kernel ▸ Choose the Style kernel as the

    specialization target 29
  106. Style Resolution Kernel ▸ Choose the Style kernel as the

    specialization target 29 [Pie charts over the Render, Style, Other, Layout, and DOM kernels. Execution time breakdown: 10%, 13%, 17%, 25%, 35%. Energy breakdown: 12%, 14%, 16%, 18%, 40%.]
  108. Style Resolution Kernel ▸ Choose the Style kernel as the specialization target 29

    for (each rule in matchedRules) {
      for (each property in rule) {
        switch (property.id) {
          case Font: Style[Font] = Handler(property.value, DOMNode); break;
          case N: ...
        }
      }
    }
  112. Style Resolution Kernel ▸ Choose the Style kernel as the specialization target 29

    ▸ Rule-level Parallelism (RLP): the outer loop over matched rules ▸ Property-level Parallelism (PLP): the inner loop over the properties of one rule
  113. Style Resolution Kernel ▸ Choose the Style kernel as the specialization target 29

    ▸ Rule-level Parallelism (RLP) ▸ Property-level Parallelism (PLP) ▸ Exploiting the parallelism to increase the arithmetic intensity
  114. 30 Style Resolution Kernel ▸ A running example from www.cnn.com

    Style Rules
    Rule   Property 1 (id, value)   Property 2 (id, value)
    1      padding, 0               margin, 0
    2      padding, 6 px            width, 36 px
  115. 30 Style Resolution Kernel ▸ A running example from www.cnn.com

    Final Style Info holds an (id, value) pair for each of Property 1, Property 2, and Property 3.
  116. 30 Style Resolution Kernel ▸ A running example from www.cnn.com

    The two rules conflict on padding (0 versus 6 px); the value from the high-priority rule is kept. The other entries of Final Style Info are width 36 px and margin 0.
  122. 30 Style Resolution Kernel ▸ A running example from www.cnn.com

    ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP
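The kernel loop and the cnn.com example can be sketched in a few lines. This is a minimal illustration, not the browser's implementation: the `resolve_style` function and the numeric priorities are hypothetical, and it assumes that rule 2 is the high-priority rule, which the extracted slide text does not state unambiguously.

```python
from typing import Dict, List, Tuple

# A rule is (priority, {property id: value}). The priorities are assumptions.
Rule = Tuple[int, Dict[str, str]]


def resolve_style(matched_rules: List[Rule]) -> Dict[str, str]:
    """Merge matched rules into one final style, higher priority winning."""
    style: Dict[str, str] = {}
    # Order matters across rules: apply low priority first so that a
    # higher-priority rule overwrites any conflicting property.
    for _priority, properties in sorted(matched_rules, key=lambda r: r[0]):
        # Order does not matter within a rule: each property writes its
        # own slot of the final style.
        for prop_id, value in properties.items():
            style[prop_id] = value
    return style


rules = [
    (1, {"padding": "0", "margin": "0"}),
    (2, {"padding": "6px", "width": "36px"}),  # assumed high priority
]
final_style = resolve_style(rules)
```

Under that assumption `final_style` keeps `padding` from rule 2, and swapping the two properties inside either rule leaves the result unchanged.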
  124. 31 Style Resolution Unit ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP

    [Diagram: rules i and j with their id, start, and end fields; properties k, l, and m, with Prop m appearing twice; output styles k, l, and m]
  125. 31 Style Resolution Unit ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP

    [Diagram, adding: Input Scratchpad]
  127. 31 Style Resolution Unit ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP

    [Diagram, adding: Conflict Resolution; of the two Prop m entries, the higher-priority one is kept]
  130. 31 Style Resolution Unit ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP

    [Diagram, adding: Compute Lanes]
  131. 31 Style Resolution Unit ▸ Order Matters in RLP ▸ Order Does Not Matter in PLP

    [Diagram, adding: Output Scratchpad]
  132. Evaluation Results 32

  133. Evaluation Results 32 ▸Fully synthesized using Synopsys 28 nm toolchain

  134. Evaluation Results 32 ▸ Fully synthesized using Synopsys 28 nm toolchain

    ▸ Cost of specialization: 0.59 mm² area overhead ▹ SoC die area is 122 mm² in Samsung Galaxy S4
  135. Evaluation Results 32

    [Plot: energy (J) versus load time (s)]
  136. Evaluation Results 32

    [Plot: energy (J) versus load time (s); point: A15-like design]
  139. Evaluation Results 32

    [Plot: energy (J) versus load time (s); points: A15-like design, Customization; improvement labels: 18.6%, 22.2%]
  142. Evaluation Results 32

    [Plot: energy (J) versus load time (s); points: A15-like design, Customization, Specialization; improvement labels: 18.6%, 22.2%, 9.2%, 22.2%]
  143. Evaluation Results 32

    [Plot: energy (J) versus load time (s); points: A15-like design, Customization, Specialization; overall improvement labels: 29.2%, 47.0%]
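To put the cost of specialization in proportion, the two area figures quoted on the slide can be divided directly. This is only a worked ratio of those two numbers, under the assumption that both refer to the same 28 nm-class silicon area:

```latex
\frac{0.59\ \text{mm}^2}{122\ \text{mm}^2} \approx 0.0048 = 0.48\%
```

That is, the added logic is roughly half a percent of the SoC die area.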
  144. WebCore in SoC 33

  145. WebCore in SoC 33 CPUs

  146. WebCore in SoC 33 CPUs GPUs

  147. WebCore in SoC 33 CPUs GPUs Memory

  148. WebCore in SoC 33 CPUs GPUs Specialized Logics Memory

  149. WebCore in SoC 33 CPUs GPUs Specialized Logics Memory WebCore

    ▸ One of the cores in the multicore SoC ▸ Becomes “dark” when other applications are executing
  150. Runtime 34 My Approach Architecture Application My Research Scope WebRT

    Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions
  152. 35 Architecture Evolution

  153. 35 Architecture Evolution In-order (2007) Out-of-order (2011) CMP (2011) Complex!

    (Present)
  155. 35 Architecture Evolution ACMP (Big/Little) In-order (2007) Out-of-order (2011) CMP

    (2011) Complex! (Present)
  156. 36 WebRT: Energy-aware Web Runtime

  157. 36 WebRT: Energy-aware Web Runtime ▸ Why ACMP? It offers a large performance-energy trade-off space for

    energy optimizations ▹ Different microarchitectures (in-order + out-of-order) ▹ Different frequency settings
  158. 36 WebRT: Energy-aware Web Runtime ▸ Why ACMP? It offers a large performance-energy trade-off space for

    energy optimizations ▹ Different microarchitectures (in-order + out-of-order) ▹ Different frequency settings ▸ Idea: Spend just enough energy to meet the performance target
  159. 36 WebRT: Energy-aware Web Runtime ▸ Why ACMP? It offers a large performance-energy trade-off space for

    energy optimizations ▹ Different microarchitectures (in-order + out-of-order) ▹ Different frequency settings ▸ Idea: Spend just enough energy to meet the performance target ▸ Approach: Systematically understand user interactions and bridge the gap between user behavior and system execution.
  160. Interacting With a Mobile Web Application 37

  162. Interacting With a Mobile Web Application 37 Loading Interactions

  163. Interacting With a Mobile Web Application 37 Austin Loading Interactions

  165. Interacting With a Mobile Web Application 37 Austin Loading Touching

    Interactions
  167. Interacting With a Mobile Web Application 37 Austin Loading Touching

    Moving Interactions
  168. Interacting With a Mobile Web Application 38 Loading Touching Moving

    Interactions
  169. Interacting With a Mobile Web Application 38 Loading Touching Moving

    Interactions Once per usage session
  170. Interacting With a Mobile Web Application 38 Loading Touching Moving

    Interactions Proactive Mechanism WebRT Component
  171. Interacting With a Mobile Web Application 38 Loading Touching Moving

    Interactions Proactive Mechanism WebRT Component Repetitive in a usage session
  172. Interacting With a Mobile Web Application 38 Loading Touching Moving

    Interactions Proactive Mechanism WebRT Component History-based Mechanism
  173. 39 Loading Touching Moving Interactions Proactive Mechanism WebRT Component

    History-based Mechanism WebRT: Energy-aware Web Runtime
  174. Optimizing for Loading 40

  175. Optimizing for Loading 40 ▸ Observation: Web applications have different characteristics

    that lead to different loading times and energy consumption
  176. Optimizing for Loading 40 ▸ Observation: Web applications have different characteristics

    that lead to different loading times and energy consumption ▸ Mechanism: Predict the ideal ACMP configuration (<core, frequency>) and schedule application loading accordingly
  177. Optimizing for Loading 40 ▸ Observation: Web applications have different characteristics

    that lead to different loading times and energy consumption ▸ Mechanism: Predict the ideal ACMP configuration (<core, frequency>) and schedule application loading accordingly ▸ Effect: Properly provision the hardware resources based on application characteristics
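The mechanism amounts to choosing, among the available <core, frequency> pairs, the cheapest one that still loads the page in time. Below is a minimal sketch of that selection step only; the `pick_config` function, the prediction table, and the 4-second target are all hypothetical, and how WebRT actually produces its predictions is not shown in this transcript.

```python
from typing import Dict, Optional, Tuple

Config = Tuple[str, int]          # (core type, frequency in MHz)
Prediction = Tuple[float, float]  # (predicted load time in s, predicted energy in J)


def pick_config(predictions: Dict[Config, Prediction],
                target_s: float) -> Optional[Config]:
    """Return the lowest-energy config whose predicted load time meets the target."""
    feasible = {
        cfg: energy_j
        for cfg, (time_s, energy_j) in predictions.items()
        if time_s <= target_s
    }
    if not feasible:
        return None  # no configuration is predicted to meet the target
    return min(feasible, key=lambda cfg: feasible[cfg])


# Hypothetical predictions for one page; the numbers are made up.
predictions = {
    ("little", 600): (5.0, 2.0),
    ("big", 800): (3.5, 3.0),
    ("big", 1800): (2.0, 6.0),
}
choice = pick_config(predictions, target_s=4.0)
```

With these made-up numbers the little core misses the 4-second target, so the slowest big-core setting that meets it is chosen rather than the fastest one.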
  181. Big/Little Setup

    ODroid XU+E development board, which contains an Exynos 5410 SoC used in the Samsung Galaxy S4.
    ▸ Big core cluster: ARM Cortex A15, out-of-order (OoO), 3-issue. DVFS: 800 MHz to 1.8 GHz at a 100 MHz granularity
    ▸ Little core cluster: ARM Cortex A7, in-order, 2-issue. DVFS: 350 MHz to 600 MHz at a 50 MHz granularity
    Overhead:
    ▸ Frequency switch: 100 µs
    ▸ Core migration: 20 µs
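The configuration space described on this slide is small enough to enumerate. The following is a minimal sketch, assuming every frequency step in the quoted ranges is selectable; the function name `acmp_configurations` and the tuple layout are illustrative and do not come from the talk.

```python
def acmp_configurations():
    """Enumerate <core, frequency> pairs from the ranges quoted on the slide.

    Assumption: both range endpoints are valid operating points.
    """
    big = [("big", mhz) for mhz in range(800, 1800 + 1, 100)]      # Cortex A15
    little = [("little", mhz) for mhz in range(350, 600 + 1, 50)]  # Cortex A7
    return little + big


configs = acmp_configurations()
print(len(configs), configs[0], configs[-1])
```

Under these assumptions a scheduler only has to compare a handful of candidates, so an exhaustive search per page load is plausible.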
  182. Power and Energy Measurements

    [Circuit diagram labels: VRM, sense resistor (15 mΩ), SoC (ARM Cortex A9), amplifier (Vin+, Vin-, Vout, GND; gain x50), probe, Data Acquisition (DAQ)]
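A sense-resistor setup like this one is usually turned into power and energy figures with Ohm's law. The sketch below assumes the amplifier measures the voltage drop across the 15 mΩ resistor, that the DAQ samples the amplified output at a fixed period, and that the supply voltage is known separately; the function names and the supply-voltage parameter are hypothetical.

```python
SENSE_RESISTOR_OHMS = 0.015  # 15 mΩ, from the slide
AMPLIFIER_GAIN = 50.0        # x50, from the slide


def current_amps(v_out):
    """Recover the load current from the amplified sense voltage."""
    return v_out / (AMPLIFIER_GAIN * SENSE_RESISTOR_OHMS)


def power_watts(v_out, v_supply):
    """Instantaneous power, assuming v_supply is the rail voltage (not on the slide)."""
    return v_supply * current_amps(v_out)


def energy_joules(v_out_samples, v_supply, sample_period_s):
    """Approximate energy as the sum of per-sample power times the sample period."""
    return sum(power_watts(v, v_supply) for v in v_out_samples) * sample_period_s
```

For example, `energy_joules(samples, 1.2, 0.001)` would integrate a trace sampled every millisecond on a hypothetical 1.2 V rail.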
  183. Performance-Energy Trade-off

    [Figure, shown for three websites: energy consumption (J, axis 0 to 8) versus load time (s, axis 0 to 15), with one panel for the big core and one for the small core]
    ▸ www.autoblog.com
    ▸ www.newegg.com (annotated: 30%)
    ▸ www.adobe.com (annotated: 80%)
  202. Breaking Down the Computations

    Web Primitives:
    ▸ HTML (Structure): Tag, Attribute
    ▸ CSS (Style): Selector, Property
    ▸ DOM Tree
  208. HTML Tag Analysis

    [Figure: number of tags (K) across webpages; www.163.com and www.google.com are called out]
    ▸ Web applications have different tag counts
  213. Tag Processing Overhead

    [Figure: load time (ms, axis 0 to 700) and energy (mJ, axis 0 to 180) for the h3, table, and img tags]
    ▸ Web applications have different tag counts
    ▸ Tags have different processing overheads
  214. Root Cause of Web Application Variance

    ▸ Web applications have different tag counts
    ▸ Tags have different processing overheads
  215. Predicting Loading Performance & Energy

    Idea: predict load time & energy (responses) based on Web primitives (predictors)
  218. Predicting Loading Performance & Energy

    ▸ Identify Predictors: training using the hottest 2,500 webpages. Predictors (HTML, CSS); Responses (Time, Energy)
    ▸ Model Construction & Refinement: refine the linear model. Linear Regression; Model Non-Linearity; Mitigate Over-fitting
    ▸ Model Validation: validating on another 2,500 webpages. Loading Time Model; Energy Model
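The modeling pipeline above starts from linear regression. As a minimal sketch of that starting point only, the code below fits a one-predictor least-squares line and reports a relative prediction error. The single predictor (tag count), the sample numbers, and the function names are illustrative assumptions; the refinement steps named on the slide (non-linearity, over-fitting mitigation) are not shown.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_error(predicted, measured):
    """Prediction error as a fraction of the measured value."""
    return abs(predicted - measured) / measured


# Hypothetical training data: tag counts and measured load times in seconds.
tag_counts = [1000, 2000, 3000]
load_times = [1.5, 2.5, 3.5]
slope, intercept = fit_line(tag_counts, load_times)
print(slope * 2500 + intercept)  # predicted load time for a 2,500-tag page
```

A separate fit with energy as the response would give the energy model in the same way.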
  219. Predicting Loading Performance & Energy

    [Figure: prediction error (axis 0.00 to 0.20) for the performance model and the energy model]
    Median prediction error is less than 5%
  226. Webpage-aware Scheduler

    Scheduler operations along the normal Web application loading timeline:
    ▸ Network
    ▸ Parsing (~1%)
    ▸ Prediction (minimal overhead)
    ▸ Scheduling overhead (~120 µs)
    ▸ Rest of loading
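The scheduling step in this timeline can be sketched as a small search over configurations: keep the ones whose predicted load time meets a deadline, then take the one with the lowest predicted energy. This is an illustration under assumptions, not the talk's implementation; the predictor callables, the deadline parameter, and the fallback to the fastest configuration are all hypothetical.

```python
def pick_configuration(configs, predict_time_s, predict_energy_j, deadline_s):
    """Choose the lowest-energy configuration predicted to meet the deadline."""
    feasible = [c for c in configs if predict_time_s(c) <= deadline_s]
    if feasible:
        return min(feasible, key=predict_energy_j)
    # Assumption: if nothing meets the deadline, run as fast as possible.
    return min(configs, key=predict_time_s)


# Hypothetical model outputs for one webpage.
times = {("little", 600): 5.0, ("big", 800): 3.0, ("big", 1800): 1.5}
energies = {("little", 600): 1.0, ("big", 800): 2.0, ("big", 1800): 4.0}
choice = pick_configuration(list(times), times.get, energies.get, deadline_s=3.5)
print(choice)
```

With these made-up numbers the little core misses the 3.5 s deadline, so the cheaper of the two big-core settings is selected.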
  232. Evaluation

    ▸ Highest performance (Perf)
      ▹ Highest frequency on big core
      ▹ Standard to guarantee responsiveness
    ▸ OS DVFS strategies (OS)
      ▹ Ondemand governor (across big and little cores)
    ▸ Metrics:
      ▹ Energy Savings
      ▹ QoS Violations
    83.0% energy savings over Perf, 4.1% more QoS violations
    8.6% energy savings over OS, 0.1% more QoS violations
  234. WebRT: Energy-aware Web Runtime

    Loading Touching Moving Interactions Proactive Mechanism WebRT Component History-based Mechanism
  241. Optimizing Post-loading Interactions

    [Diagram: Touching and Moving interactions produce Events (click, touchstart, touchmove, scroll), shown with an Event Queue and an Event Loop]
    Optimize post-loading at event granularity
  245. Optimizing Post-loading Interactions

    ▸ Observation: Events have different execution latencies that enable energy optimizations
    ▸ Mechanism: Event-based scheduling to predict the ACMP configuration that exploits event slacks and saves energy
    ▸ Effect: Properly provision the hardware resources based on event characteristics
  257. Event-Level Characterization

    [Figure: event latency (ms, axis 0 to 150) across events; keyup, change, and click are called out, with regions labeled Large Slack, Small Slack, and No Slack]
    ▸ Wide distribution of event latencies. Events exhibit different slacks.
      ▹ How to exploit event slacks?
  259. Event-based Scheduler (EBS)

    ▸ Goal: For each event, find the most energy-efficient ACMP configuration that meets the latency target
  267. Event-based Scheduler (EBS)

    ▸ Thread Scheduling: Thread-based Scheduler; Throughput, Fairness
    ▸ Event-based Scheduling: Event-based Scheduler; Event Queue; Event Latency, Event Energy
  273. Predicting Event Latency

    [Diagram: an event's execution split into Memory Operation ($T_{memory}$) and CPU Operation ($N_{dependent}$, at frequency $f$)]
    $$\text{Event Latency} = T_{memory} + \frac{N_{dependent}}{f}$$
    Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03
  275. Predicting Event Latency

    Event Latency = Tmemory + Ndependent / f
    [Figure: Event Latency versus Frequency]
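Because the model has two unknowns, two latency measurements at different frequencies are enough to solve for Tmemory and Ndependent, and the model can then be inverted to find the slowest frequency that still meets a latency target. The sketch below shows one way to do that; the two-point fit, the function names, and the sample numbers are assumptions for illustration and are not stated on the slides.

```python
def fit_latency_model(f1_hz, t1_s, f2_hz, t2_s):
    """Solve latency = T_memory + N_dependent / f from two (frequency, latency) points."""
    n_dependent = (t1_s - t2_s) / (1.0 / f1_hz - 1.0 / f2_hz)
    t_memory = t1_s - n_dependent / f1_hz
    return t_memory, n_dependent


def predict_latency(t_memory, n_dependent, f_hz):
    """Evaluate the latency model at frequency f_hz."""
    return t_memory + n_dependent / f_hz


def lowest_frequency_meeting(t_memory, n_dependent, target_s, frequencies_hz):
    """Return the lowest available frequency whose predicted latency meets the target."""
    for f in sorted(frequencies_hz):
        if predict_latency(t_memory, n_dependent, f) <= target_s:
            return f
    return None  # no available frequency meets the target


# Hypothetical measurements: 60 ms at 800 MHz and 35 ms at 1.6 GHz.
t_mem, n_dep = fit_latency_model(800e6, 0.060, 1600e6, 0.035)
big_freqs = [mhz * 1e6 for mhz in range(800, 1801, 100)]
print(lowest_frequency_meeting(t_mem, n_dep, 0.045, big_freqs))
```

The memory term does not shrink with frequency, which is why events dominated by memory time gain little from a faster setting.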
  283. Event-based Scheduler

    [Diagram components: Events, Event-Based Scheduler, Model Constructor, Model, QoS Monitor, Big/Little Hardware, <core, freq>, Recalibrate]
    ▸ Fine-tune the model when it over- or under-predicts
    ▸ Recalibrate if it mispredicts too often
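The two feedback rules on this slide can be illustrated with a small monitor that compares predicted and measured latencies. This is a sketch under assumptions: the tolerance, the sliding window, the misprediction limit, and the way the correction factor is updated are all hypothetical choices, since the slides do not give them.

```python
from collections import deque


class QosMonitor:
    """Tracks prediction accuracy and signals when recalibration is due."""

    def __init__(self, tolerance=0.10, window=20, max_mispredictions=5):
        self.tolerance = tolerance                    # allowed relative error
        self.max_mispredictions = max_mispredictions  # limit within the window
        self.recent = deque(maxlen=window)            # recent misprediction flags
        self.correction = 1.0                         # multiplier for raw predictions

    def observe(self, predicted_s, measured_s):
        """Record one event; return True when the model should be recalibrated."""
        error = (measured_s - predicted_s) / measured_s
        mispredicted = abs(error) > self.tolerance
        self.recent.append(mispredicted)
        if mispredicted:
            # Fine-tune: move the correction halfway toward the observed ratio.
            self.correction = 0.5 * self.correction + 0.5 * (measured_s / predicted_s)
        return sum(self.recent) > self.max_mispredictions
```

A scheduler using this sketch would multiply raw model predictions by `correction` and rebuild the model whenever `observe` returns `True`.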
  286. Evaluation

    ▸ Baseline Mechanisms
      ▹ Highest performance (Perf): standard to guarantee responsiveness
      ▹ Minimal energy (Energy): minimize energy consumption
      ▹ Interactive governor (Interactive): Android default
    ▸ Metrics
      ▹ Energy Savings
      ▹ QoS Violations
    37.9% - 41.2% energy savings, 0.1% more QoS violations
  288. My Approach

    My Research Scope:
    ▸ Application: GreenWeb, QoS Language Extensions
    ▸ Runtime: WebRT, Energy-aware Web Runtime
    ▸ Architecture: WebCore, Web-specific Architecture
  292. GreenWeb: QoS Web Language Extensions

    ▸ Understanding Mobile QoS
    ▸ Abstracting Mobile QoS
    ▸ Expressing Abstractions
  306. GreenWeb: QoS Web Language Extensions

    [Figure: QoS Experience versus Performance Degradation, with Imperceptible, Tolerable, and Unusable regions and an Energy Savings annotation]
  309. GreenWeb: QoS Web Language Extensions

    [Figure: QoS Experience versus Performance Degradation, with Imperceptible, Tolerable, and Unusable regions]
    ▸ QoS Type: performance metric
    ▸ QoS Target: threshold performance values
  310. GreenWeb: QoS Web Language Extensions

    ▸ QoS Type: performance metric
    ▸ QoS Target: threshold performance values
    Syntax (CSS Compatible): element {event: Type, Target}
    Semantics: When event is triggered on element, the QoS type and QoS target are Type and Target, respectively.
  311. Future Work

    ▸ Automatic GreenWeb Annotation
      ▹ Empower the developers, but not overburden them!
    ▸ GreenWeb Composability
      ▹ Can GreenWeb programs be safely integrated with other code?
      ▹ How to compose comprehensive QoS abstractions?
    ▸ Integrating WebRT with GreenWeb
      ▹ How can WebRT adapt to different QoS constraints?
  312. Timeline

    Months shown: FEB, MAR, APR, MAY, JUNE, JULY, AUG
    Key Tasks:
    ▸ Program-level Composability Study (Goal: Improve the composability and flexibility of GreenWeb extensions.)
    ▸ Automatic Annotation System for GreenWeb (Goal: Explore the feasibility of automatically applying GreenWeb annotations.)
    ▸ WebRT Adaptivity Study (Goal: Evaluate the sensitivity of WebRT with respect to different QoS constraints.)
    ▸ Thesis Writing
  316. Retrospective: Three Principles Learnt

    Layers: Application, Runtime, Architecture
    ▸ Empowering the Developers
      ▹ GreenWeb Language Extensions Provide QoS Abstractions
    ▸ Exposing Hardware Complexities
      ▹ WebRT Leverages Core Type and Core Frequency
    ▸ General-purpose vs. Specialization
      ▹ WebCore combines general-purpose customization with domain specialization
  317. Publications (slide groups: GreenWeb, WebRT, WebCore, Motivational Studies, Server Microarch)

    ▸ [PLDI 2016] Yuhao Zhu, Vijay Janapa Reddi, “GreenWeb: Language Extensions for Energy-Efficient Mobile Web Computing”
    ▸ [HPCA 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “Event-Based Scheduling for Energy-Efficient QoS (eQoS) in Mobile Web Applications”
    ▸ [HPCA 2013] Yuhao Zhu, Vijay Janapa Reddi, “High-Performance and Energy-Efficient Mobile Web Browsing on Big/Little Systems”
    ▸ [CAL 2012] Yuhao Zhu, Aditya Srikanth, Jingwen Leng, Vijay Janapa Reddi, “Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing” (Best of CAL)
    ▸ [ISCA 2014] Yuhao Zhu, Vijay Janapa Reddi, “WebCore: Architectural Support for Mobile Web Browsing”
    ▸ [IEEE MICRO 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “The Role of the CPU in Energy-Efficient Mobile Web Browsing”
    ▸ [HPCA 2016] Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi, “Mobile CPU’s Rise to Power: Quantifying the Impact of Generational Mobile CPU Design Trends on Performance, Energy, and User Satisfaction”
    ▸ [MICRO 2015] Yuhao Zhu, Daniel Richins, Matthew Halpern, Vijay Janapa Reddi, “Microarchitectural Implications of Event-driven Server-side Web Applications” (Top Picks Honorable Mention)
  318. Other Publications (slide groups: GPGPU & IP Routing Architecture, Tools, Reliability)

    ▸ [DAC 2011] Yuhao Zhu, Yangdong Deng, Yubei Chen, “Hermes: An Integrated CPU/GPU Microarchitecture for IP Routing.”
    ▸ [DAC 2010] Bo Wang, Yuhao Zhu, Yangdong Deng, “Distributed Time, Conservative Parallel Logic Simulation on GPUs.”
    ▸ [TODAES 2011] Yuhao Zhu, Bo Wang, Yangdong Deng, “Massively Parallel Logic Simulation with GPUs.”
    ▸ [ISPASS 2015] Matthew Halpern, Yuhao Zhu, Ramesh Peri, and Vijay Janapa Reddi, “Mosaic: Cross-platform User-interaction Record and Replay for the Fragmented Android Ecosystem.”
    ▸ [IRPS 2014] Chen Zhou, Xiaofei Wang, Weichao Xu, Yuhao Zhu, Vijay Janapa Reddi, Chris Kim, “Estimation of Instantaneous Frequency Fluctuation in a Fast DVFS Environment Using an Empirical BTI Stress-Relaxation Model.”
  319. Coursework

    | Name | Instructor | Semester | SUP | Grade |
    | COMPILERS | Keshav Pingali | Fall 2010 | | A |
    | ADV EMBED MICROCONTROL SYS | Mark McDermott | Spring 2011 | | A- |
    | MEMORY MANAGEMENT | Kathryn McKinley | Spring 2011 | Y | A |
    | VLSI I | Jacob Abraham | Fall 2011 | | A- |
    | COMP ARCH: PARALLISM/LOCLTY | Mattan Erez | Fall 2011 | | A |
    | MICROARCHITECTURE | Yale Patt | Spring 2012 | | B |
    | DYNAMIC COMPILATION | Vijay Janapa Reddi | Spring 2012 | | A- |
    | COMP PERF EVAL/BENCHMARKING | Lizy John | Fall 2012 | | B+ |
    | PARALLEL COMP ARCHITECTURE | Derek Chiou | Spring 2013 | | B+ |
    | HUMAN COMPUT & CROWDSRCING | Matt Lease | Fall 2015 | Y | A- |
  320. Thank you!

  332. Scheduling Results

    Using a performance-oriented strategy as the baseline
    [Figure: Energy Savings (%, axis 0 to 100) and QoS Violations (%, axis 0 to 40) for OS (Big), OS (Little), and WS]
    83.0% energy savings over Perf, 4.1% more QoS violations
    8.6% energy savings over OS, 0.1% more QoS violations
  338. Evaluation Methodology

    ▸ Baseline Mechanisms
      ▹ Highest performance (Perf): standard to guarantee responsiveness
      ▹ Minimal energy (Energy): minimize energy consumption
      ▹ Interactive governor (Interactive): Android default
      ▹ On-demand governor (Ondemand)
    ▸ Scheduling Scenarios
      ▹ Scheduling for imperceptibility
      ▹ Scheduling for tolerability
    [Figure: QoS Experience versus Performance, with Unusable, Tolerable, and Imperceptible regions]
  348. Evaluation Results

    [Figure: QoS Violations (%, axis 0.0 to 6.0) for emberjs, gwt, jquery, backbone, paperjs, sina, google, and ebay under EBS, Perf, Interactive, and Energy; bars labeled 9.4, 17.8, 58.1, and 6.9. The first build also lists Ondemand, and two builds carry the annotation "No QoS Violations".]
    [Figure: Energy (J, axis 0.0 to 4.0) for the same applications; bars labeled 8.2 and 7.7]
    37.9% - 41.2% energy savings, 0.1% more QoS violations
  353. GreenWeb: QoS Web Language Extensions

    [Figure: QoS Experience versus Performance Degradation, with Imperceptible, Tolerable, and Unusable regions]
    ▸ QoS Type: performance metric
      ▹ Single (frame latency) vs. Continuous (frame throughput)
    ▸ QoS Target: threshold performance values
      ▹ Imperceptible target (Ti) vs. Usable target (Tu)
  360. 93 GreenWeb: QoS Web Language Extensions

    Expressing Abstractions
    button:QoS {onclick: single}
    Annotated parts of the example: Selector, {QoS Declaration}
  361. 94 GreenWeb: QoS Web Language Extensions

    Expressing Abstractions
    button:QoS {onclick: single}
    Annotated parts of the example: Selector, {QoS Declaration}
    Semantics: QoS is evaluated by a single frame latency when clicking the button
  366. 95 GreenWeb: QoS Web Language Extensions

    Expressing Abstractions
    Use default QoS targets:
      button:QoS {onclick: single}
      button:QoS {onclick: continuous}
    Overwrite default targets:
      button:QoS {onclick: continuous, 20, 100}
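
The declaration syntax shown on these slides can be illustrated with a small parser. This is a minimal sketch, not GreenWeb's implementation: the names `QoSRule`, `parse_qos`, and `classify` are hypothetical, the two optional numbers are assumed to be Ti followed by Tu, and `classify` assumes the targets are latencies where lower is better. The slides give no default target values, so omitted targets are returned as `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for the slide's examples:
#   selector:QoS {event: type[, Ti, Tu]}
_DECL = re.compile(
    r"^\s*(?P<selector>[^:{}\s]+):QoS\s*"
    r"\{\s*(?P<event>\w+)\s*:\s*(?P<type>single|continuous)"
    r"(?:\s*,\s*(?P<ti>\d+(?:\.\d+)?)\s*,\s*(?P<tu>\d+(?:\.\d+)?))?\s*\}\s*$"
)


@dataclass(frozen=True)
class QoSRule:
    selector: str
    event: str
    qos_type: str          # "single" (frame latency) or "continuous" (frame throughput)
    ti: Optional[float]    # imperceptible target; None means "use the default"
    tu: Optional[float]    # usable target; None means "use the default"


def parse_qos(text: str) -> QoSRule:
    """Parse one QoS declaration; raise ValueError if it does not match."""
    m = _DECL.match(text)
    if m is None:
        raise ValueError(f"not a QoS declaration: {text!r}")
    ti = float(m["ti"]) if m["ti"] is not None else None
    tu = float(m["tu"]) if m["tu"] is not None else None
    return QoSRule(m["selector"], m["event"], m["type"], ti, tu)


def classify(latency: float, ti: float, tu: float) -> str:
    """Map a measured latency onto the three QoS regions (assumed inclusive bounds)."""
    if latency <= ti:
        return "imperceptible"
    if latency <= tu:
        return "tolerable"
    return "unusable"
```

For example, `parse_qos("button:QoS {onclick: continuous, 20, 100}")` yields a rule with `ti=20.0` and `tu=100.0`, while `parse_qos("button:QoS {onclick: single}")` leaves both targets as `None` so a runtime could substitute its defaults.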
  371. 96 Design Space Exploration (DSE) Setup

    ▸ Webpages selected by Principal Component Analysis (PCA)
      ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
    [Figure: webpages plotted on PC1 (-5 to 5) and PC2 (log scale, 10^-4 to 10^1); annotations shown in intermediate builds: "dominated by # webpage elements" and "dominated by IPC"]
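
As a rough illustration of the PCA step named on this slide, the sketch below computes the leading principal components of a feature matrix using power iteration with deflation. It is a generic textbook procedure in plain Python, not the tooling used in the talk; the function names are hypothetical, features are only mean-centered (real feature sets with mixed units would usually also be standardized), and how webpages are then chosen from the component scores is not stated in the slides and is not modeled here.

```python
import math


def _center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def _covariance(centered):
    """Sample covariance matrix (requires at least two rows)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def _top_eigenvector(mat, iters=500):
    """Power iteration. The fixed start vector is a simplification: it fails
    if it happens to be orthogonal to the dominant eigenvector."""
    d = len(mat)
    start = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def principal_components(rows, k=2):
    """Return (components, scores): k unit vectors and each row's projection."""
    centered = _center(rows)
    cov = _covariance(centered)
    d = len(cov)
    comps = []
    for _ in range(k):
        v = _top_eigenvector(cov)
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflation: remove the variance explained by v before the next pass.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in comps] for r in centered]
    return comps, scores
```

Each row of `scores` gives a webpage's coordinates in the PC1/PC2 plane, which is the kind of scatter the slide's figure plots.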
  383. 97 Design Considerations

    How large should the scratchpad memory be? ~1 KB
      [Figure: Total Coverage (%) vs. RLP (0 to 16)]
    How many compute lanes should an SRU have? 32 Lanes
      [Figure: Total CSS Properties (%) vs. PLP (0 to 96)]
    [Diagram in intermediate builds: rules (Rule i, Rule j) with start/end ranges, their properties (Prop k, Prop l, Prop m), and the resulting styles (Style k, Style l, Style m)]
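
One plausible reading of the coverage plots on this slide is that a resource (scratchpad bytes, compute lanes) is sized at the smallest value that covers most observed demand. The sketch below shows that sizing rule under this assumption only. The function names, the sample data, and the target fraction are all hypothetical; the slides report the outcomes (~1 KB, 32 lanes) but not the threshold or the procedure used.

```python
def coverage(samples, capacity):
    """Fraction of observed demands that fit within `capacity`."""
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s <= capacity) / len(samples)


def smallest_capacity(samples, candidates, target):
    """Smallest candidate whose coverage reaches `target`, or None if none does."""
    samples = list(samples)
    for c in sorted(candidates):
        if coverage(samples, c) >= target:
            return c
    return None
```

With made-up per-rule property counts `[1, 2, 2, 3, 4, 8, 40]` and lane candidates `[4, 8, 16, 32, 64]`, a target of 0.85 selects 8 lanes, while requiring full coverage selects 64. This illustrates why a long tail in the distribution makes 100% coverage expensive.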
  384. 98 SRU Integration

    [Diagram: pipeline stages IF, ID, EX, MEM, WB; functional units ALU, MUL, FPU, SRU]
    Style_apply(DOMNodeId, matchedRules);
    Layers: Hardware Layer, API Layer, Runtime Layer
    Labels: Software Failsafe, SRU Access, ISA support
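
The "Software Failsafe" label suggests that a call such as `Style_apply(DOMNodeId, matchedRules)` can fall back to a software path when the SRU cannot serve it. The sketch below illustrates that pattern only. Everything except the call's name and its two arguments is an assumption: the `sru` callable, the `SRUUnavailable` exception, and the simplified software resolution (rules assumed pre-sorted so that later rules override earlier ones) are hypothetical and do not describe the talk's actual runtime.

```python
class SRUUnavailable(Exception):
    """Hypothetical signal that the hardware unit cannot handle a request."""


def software_style_apply(dom_node_id, matched_rules):
    """Simplified software path: later rules override earlier ones per property."""
    style = {}
    for rule in matched_rules:
        style.update(rule)
    return style


def style_apply(dom_node_id, matched_rules, sru=None):
    """Try the (hypothetical) SRU first, then fall back to software."""
    if sru is not None:
        try:
            return sru(dom_node_id, matched_rules)
        except SRUUnavailable:
            pass  # fall through to the software failsafe
    return software_style_apply(dom_node_id, matched_rules)
```

Keeping the fallback inside the API-level function means callers see one entry point regardless of whether the hardware path is present.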
  389. 99 Evaluation Methodology

    ▸ Fully synthesized using Synopsys 28 nm toolchain
    ▸ 24 representative webpages
      www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com
      Desktop and mobile versions
  405. 100 Evaluation Results

    [Figure: Energy (J) and Load Time (s) for the A15-like design, Customization, and Specialization; axis ticks 0.55 to 1.1 and 1.6 to 2.4]
    Percentage annotations added across builds: 18.6% and 22.2% (shown with Customization); 9.2% and 22.2% (shown with Specialization); then 29.2% and 47.0%
    ▸ Fully synthesized using Synopsys 28 nm toolchain
    ▸ Cost of specialization: 0.59 mm² area overhead
    ▸ Better than scaling-up approaches (builds highlight I$, D$, I+D$)
  406. 101 Energy-Efficiency Plateaued

    [Figure: energy efficiency across Smartphone Models: Motorola Droid (2009), Galaxy S (2010), Nexus (2011), Galaxy S3 (2012), Galaxy S4 (2013), Galaxy S5 (2014), Galaxy S6 (2015)]
  407. 102 Energy-Efficiency Plateaued

    [Figure: energy efficiency across Smartphone Models, 2009 to 2015; series: Coremark, SPEC CPU 2006]