Slide 1

Slide 1 text

Energy-Efficient Mobile Web Computing
Yuhao Zhu, UT Austin
Advisor: Vijay Janapa Reddi
Feb. 17th, 2016

Slide 2

Slide 2 text

2

Slide 3

Slide 3 text

Call Text 2

Slide 4

Slide 4 text

Call Text The (in)famous “snake game” 2

Slide 5

Slide 5 text

3

Slide 6

Slide 6 text

4 Architects Make Mobile Processors Faster

Slide 7

Slide 7 text

4 Architects Make Mobile Processors Faster In-order (2007)

Slide 8

Slide 8 text

4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010) Multi-core (2010) Asymmetric Multi-core (2014)

Slide 9

Slide 9 text

4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010) Multi-core (2010) Asymmetric Multi-core (2014) Performance

Slide 10

Slide 10 text

4 Architects Make Mobile Processors Faster In-order (2007) Out-of-order (2010) Multi-core (2010) Asymmetric Multi-core (2014) Performance Power

Slide 11

Slide 11 text

Architects Make Mobile Processors Faster
▸ In-order (2007)
▸ Out-of-order (2010)
▸ Multi-core (2010)
▸ Asymmetric Multi-core (2014)
[Figure: Performance and Power across processor generations]
At the Expense of Excessive Power

Slide 12

Slide 12 text

Responsiveness 5

Slide 13

Slide 13 text

Responsiveness Energy-Efficiency 5

Slide 14

Slide 14 text

Responsiveness Energy-Efficiency Conflicting requirements 5

Slide 15

Slide 15 text

Thesis Statement
A mobile computing system that satisfies user QoS requirements on a mobile energy budget
Responsiveness and Energy-Efficiency: conflicting requirements

Slide 16

Slide 16 text

Thesis Statement
A mobile computing system that satisfies user QoS requirements on a mobile energy budget for the mobile Web
Responsiveness and Energy-Efficiency: conflicting requirements

Slide 17

Slide 17 text

7

Slide 18

Slide 18 text

7

Slide 19

Slide 19 text

7

Slide 20

Slide 20 text

7

Slide 21

Slide 21 text

8 Achieving Mobile Web Performance Mobile Client

Slide 22

Slide 22 text

8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers

Slide 23

Slide 23 text

8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers Cellular Network

Slide 24

Slide 24 text

8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers Cellular Network

Slide 25

Slide 25 text

8 Achieving Mobile Web Performance Mobile Client Cloud Web Servers Cellular Network [MICRO 2015] (Top Picks Honorable Mention)

Slide 26

Slide 26 text

9 Achieving Mobile Web Performance Mobile Client Cellular Network

Slide 27

Slide 27 text

9 Achieving Mobile Web Performance Mobile Client Cellular Network

Slide 28

Slide 28 text

10 Isn’t Responsiveness a Network Issue? Mobile Client Cellular Network

Slide 29

Slide 29 text

Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations

Slide 30

Slide 30 text

Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations Resource loading is the bottleneck

Slide 31

Slide 31 text

Isn’t Responsiveness a Network Issue? 11 [HotMobile’11, WWW’12], 100+ citations Client compute doesn’t matter much Resource loading is the bottleneck

Slide 32

Slide 32 text

Isn’t Responsiveness a Network Issue?
[HotMobile’11, WWW’12], 100+ citations
Resource loading is the bottleneck
Client compute doesn’t matter much
Conclusions circa 2010!

Slide 33

Slide 33 text

Isn’t Responsiveness a Network Issue?
A Year 2015 Experiment!
[Figure: load time (s) versus network RTT (ms, log scale)]

Slide 34

Slide 34 text

Isn’t Responsiveness a Network Issue?
A Year 2015 Experiment!
[Figure: load time (s) versus network RTT (ms, log scale)]
▸ Samsung Galaxy S4 smartphone.
▸ Hot webpages from Alexa [1].
▸ Time measured using Navigation Timing API [2].
[1] http://www.alexa.com/
[2] https://www.w3.org/TR/navigation-timing-2/

Slide 35

Slide 35 text

Isn’t Responsiveness a Network Issue?
A Year 2015 Experiment!
[Figure: load time (s) versus network RTT (ms, log scale); network types marked: LTE, 3G, Adverse 3G, 2G, Wi-Fi]
▸ Samsung Galaxy S4 smartphone.
▸ Hot webpages from Alexa [1].
▸ Time measured using Navigation Timing API [2].
[1] http://www.alexa.com/
[2] https://www.w3.org/TR/navigation-timing-2/

Slide 36

Slide 36 text

Isn’t Responsiveness a Network Issue?
A Year 2015 Experiment!
[Figure: load time (s) versus network RTT (ms, log scale); network types marked: LTE, 3G, Adverse 3G, 2G, Wi-Fi; annotation: Circa 2010]
▸ Samsung Galaxy S4 smartphone.
▸ Hot webpages from Alexa [1].
▸ Time measured using Navigation Timing API [2].
[1] http://www.alexa.com/
[2] https://www.w3.org/TR/navigation-timing-2/

Slide 39

Slide 39 text

13 Responsiveness is also a Compute Issue! Mobile Client Cellular Network

Slide 40

Slide 40 text

13 Responsiveness is also a Compute Issue! Mobile Client Cellular Network This Proposal

Slide 41

Slide 41 text

14 Traditional Approach

Slide 42

Slide 42 text

14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render

Slide 43

Slide 43 text

14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application

Slide 44

Slide 44 text

▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application

Slide 45

Slide 45 text

▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture

Slide 46

Slide 46 text

▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture ▸ Voltage/frequency scaling on general-purpose processors

Slide 47

Slide 47 text

▸ Parallelize browser computation 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors

Slide 48

Slide 48 text

▸ Parallelize browser computation ▸ Ignored! 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors

Slide 49

Slide 49 text

▸ Parallelize browser computation ▸ Ignored! 14 Traditional Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Voltage/frequency scaling on general-purpose processors ▸ End of Dennard Scaling! ▸ Diminishing return

Slide 50

Slide 50 text

▸ Parallelize browser computation ▸ Ignored! 15 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture WebCore Web-specific Architecture

Slide 51

Slide 51 text

▸ Parallelize browser computation 15 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Inputs Architecture ▸ Lost page-level diversity ▸ Lost user QoS requirements WebCore Web-specific Architecture

Slide 52

Slide 52 text

▸ Parallelize browser computation 15 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture ▸ Lost page-level diversity ▸ Lost user QoS requirements WebCore Web-specific Architecture

Slide 53

Slide 53 text

16 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 54

Slide 54 text

16 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime

Slide 55

Slide 55 text

16 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime

Slide 56

Slide 56 text

16 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime

Slide 57

Slide 57 text

WebRT Energy-aware Web Runtime 16 My Approach Frameworks and Libraries HTML JavaScript CSS Language Runtime Styling Security Local Storage User Input Layout Render Application Architecture WebCore Web-specific Architecture GreenWeb QoS Language Extensions Runtime

Slide 58

Slide 58 text

Runtime 17 My Approach Architecture Application WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 59

Slide 59 text

Runtime 17 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions [PLDI 2016] [ISCA 2014] [HPCA 2013] [HPCA 2015] [CAL 2014] (Best of CAL)

Slide 60

Slide 60 text

Runtime 18 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions [PLDI 2016] [ISCA 2014] [HPCA 2013] [HPCA 2015] [CAL 2014] (Best of CAL)

Slide 61

Slide 61 text

19 Execution Time Energy General-Purpose Designs WebCore: a Web-Specific Mobile Architecture

Slide 62

Slide 62 text

19 Execution Time Energy General-Purpose Designs WebCore: a Web-Specific Mobile Architecture Diminishing return

Slide 63

Slide 63 text

19 Execution Time Energy ASIC? General-Purpose Designs WebCore: a Web-Specific Mobile Architecture

Slide 64

Slide 64 text

WebCore: a Web-Specific Mobile Architecture
[Figure: energy versus execution time; points labeled General-Purpose Designs and ASIC?]
ASIC? Extremely challenging
‣ Chrome: 17M LoC, 29 languages
 ▹ cf. H264 codec: 0.13M LoC, 6 languages
‣ Code base is very irregular
 ▹ No fine-grained parallelism

Slide 65

Slide 65 text

19 Execution Time Energy ASIC? General-Purpose Designs WebCore: a Web-Specific Mobile Architecture Goal

Slide 66

Slide 66 text

19 Execution Time Energy ??? ASIC? General-Purpose Designs WebCore: a Web-Specific Mobile Architecture Goal

Slide 67

Slide 67 text

WebCore Philosophy
Claim: Instead of jumping directly to full specialization, we must take it step by step

Slide 68

Slide 68 text

WebCore Philosophy 20

Slide 69

Slide 69 text

Web Software WebCore Philosophy 20

Slide 70

Slide 70 text

WebCore Philosophy
Web Software
General-purpose Processor (GPP)

Slide 71

Slide 71 text

WebCore Philosophy
Web Software
General-purpose Processor (GPP)
Customized GPP
Customization: Tune uarch parameters

Slide 72

Slide 72 text

WebCore Philosophy
Web Software
General-purpose Processor (GPP)
Customized GPP
Customization: Tune uarch parameters
Specialization: Accelerate key kernels

Slide 73

Slide 73 text

WebCore Philosophy
Web Software
General-purpose Processor (GPP)
Customized GPP
WebCore
Customization: Tune uarch parameters
Specialization: Accelerate key kernels

Slide 74

Slide 74 text

WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose Designs Goal

Slide 75

Slide 75 text

WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose Designs Customization Goal

Slide 76

Slide 76 text

WebCore: a Web-Specific Mobile Architecture 21 Execution Time Energy General-Purpose Designs Customization Specialization Goal

Slide 77

Slide 77 text

Customization: Find an Ideal General Purpose Architecture for the Mobile Web 22 22

Slide 78

Slide 78 text

Customization: Find an Ideal General Purpose Architecture for the Mobile Web
▸ What is a proper general purpose baseline architecture?
 ▹ Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
 ▹ Are existing general purpose mobile designs ideal?

Slide 79

Slide 79 text

Customization: Find an Ideal General Purpose Architecture for the Mobile Web
▸ What is a proper general purpose baseline architecture?
 ▹ Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
 ▹ Are existing general purpose mobile designs ideal?
▸ Exhaustive design space exploration.

Slide 81

Slide 81 text

Design Space Exploration (DSE) Setup
▸ Search space of over 3 billion design points
 ▹ Leverage statistical inference models to increase search speed
▸ Use integrated simulators
 ▹ McPAT for power
 ▹ MARSSx86 for performance (x86 full-system simulator)
▸ Chromium Web browser
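A design space exploration of this kind ends up comparing candidate designs in the execution time versus energy plane. The sketch below is only an illustration of that comparison step, not the tooling used in the talk: it filters a few made-up design points down to the Pareto-optimal ones, meaning those that no other point beats on both metrics. The `DesignPoint` type, the point names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DesignPoint:
    name: str        # hypothetical label for one microarchitecture configuration
    time_s: float    # simulated execution time in seconds (made-up value)
    energy_j: float  # simulated energy in joules (made-up value)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_frontier(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep the points that no other point dominates, sorted by execution time."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.time_s)


candidates = [
    DesignPoint("A", 1.6, 1.10),
    DesignPoint("B", 1.8, 0.90),
    DesignPoint("C", 2.0, 0.95),  # slower and hungrier than B, so dominated
    DesignPoint("D", 2.4, 0.60),
]
frontier_names = [p.name for p in pareto_frontier(candidates)]
print(frontier_names)
```

With a search space in the billions, the all-pairs check above would be far too slow; it is meant only for the small set of points that survive model-based pruning.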

Slide 82

Slide 82 text

Design Space Exploration (DSE) Findings 24

Slide 83

Slide 83 text

Design Space Exploration (DSE) Findings 24

Slide 84

Slide 84 text

Design Space Exploration (DSE) Findings 24

Slide 85

Slide 85 text

Design Space Exploration (DSE) Findings ▸Out-of-order designs are more flexible 24

Slide 86

Slide 86 text

Understand the Difference Using Kernel Knowledge 25

Slide 87

Slide 87 text

Understand the Difference Using Kernel Knowledge
[Pie chart: breakdown over the Render, Style, Other, Layout, and DOM kernels; slices of 10%, 13%, 17%, 25%, and 35%]

Slide 88

Slide 88 text

Understand the Difference Using Kernel Knowledge In-order design 25

Slide 89

Slide 89 text

Understand the Difference Using Kernel Knowledge In-order design 25

Slide 90

Slide 90 text

Understand the Difference Using Kernel Knowledge ▸In-order designs show strong kernel variance In-order design 25

Slide 91

Slide 91 text

Understand the Difference Using Kernel Knowledge ▸In-order designs show strong kernel variance In-order design 25

Slide 92

Slide 92 text

Understand the Difference Using Kernel Knowledge ▸In-order designs show strong kernel variance In-order design 25

Slide 93

Slide 93 text

Understand the Difference Using Kernel Knowledge ▸In-order designs show strong kernel variance In-order design 25 Out-of-order design

Slide 94

Slide 94 text

Understand the Difference Using Kernel Knowledge
[Figure: In-order design]
▸ In-order designs show strong kernel variance
[Figure: Out-of-order design]
▸ An out-of-order design can accommodate kernel variance

Slide 95

Slide 95 text

Customization: Identifying Major Sources of Energy Inefficiency 26 26

Slide 96

Slide 96 text

Customization: Identifying Major Sources of Energy Inefficiency 26 P2 P1 26

Slide 97

Slide 97 text

Customization: Identifying Major Sources of Energy Inefficiency

Parameter               P1     P2     ARM A15
Issue width             1      3      3
# Function units        2      3      8
Load queue size         4      16     16
Store queue size        4      16
BTB size                1024   128    256
ROB size                128    128    40+
L1 I-$ size (KB)        64     128    32
# Physical registers    128    140    ?
L1 D-$ size (KB)        8      64     32
L2-$ size (KB)          256    1024   <4096

Slide 98

Slide 98 text

Customization: Identifying Major Sources of Energy Inefficiency
[Figure: design points P1 and P2]

Parameter               P1     P2     ARM A15
Issue width             1      3      3
# Function units        2      3      8
Load queue size         4      16     16
Store queue size        4      16
BTB size                1024   128    256
ROB size                128    128    40+
L1 I-$ size (KB)        64     128    32
# Physical registers    128    140    ?
L1 D-$ size (KB)        8      64     32
L2-$ size (KB)          256    1024   <4096

Slide 100

Slide 100 text

Customization: Identifying Major Sources of Energy Inefficiency
[Figure: design points P1 and P2]

Parameter               P1     P2     ARM A15
Issue width             1      3      3
# Function units        2      3      8
Load queue size         4      16     16
Store queue size        4      16
BTB size                1024   128    256
ROB size                128    128    40+
L1 I-$ size (KB)        64     128    32
# Physical registers    128    140    ?
L1 D-$ size (KB)        8      64     32
L2-$ size (KB)          256    1024   <4096

▸ Instruction supply

Slide 101

Slide 101 text

Customization: Identifying Major Sources of Energy Inefficiency
[Figure: design points P1 and P2]

Parameter               P1     P2     ARM A15
Issue width             1      3      3
# Function units        2      3      8
Load queue size         4      16     16
Store queue size        4      16
BTB size                1024   128    256
ROB size                128    128    40+
L1 I-$ size (KB)        64     128    32
# Physical registers    128    140    ?
L1 D-$ size (KB)        8      64     32
L2-$ size (KB)          256    1024   <4096

▸ Instruction supply
▸ Data feeding

Slide 102

Slide 102 text

Specialization: Fixing the Pending Inefficiencies
▸ Instruction supply
 ▹ Pack more operations in one instruction
▸ Data feeding
 ▹ Move operands closer to operations

Slide 105

Slide 105 text

Style Resolution Kernel ▸ Choose the Style kernel as the specialization target 29

Slide 106

Slide 106 text

Style Resolution Kernel
▸ Choose the Style kernel as the specialization target
[Pie chart: execution time breakdown over Render, Style, Other, Layout, and DOM; slices of 10%, 13%, 17%, 25%, and 35%]
[Pie chart: energy breakdown over Render, Style, Other, Layout, and DOM; slices of 12%, 14%, 16%, 18%, and 40%]

Slide 108

Slide 108 text

Style Resolution Kernel
▸ Choose the Style kernel as the specialization target

    for (each rule in matchedRules) {
      for (each property in rule) {
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }

Slide 110

Slide 110 text

Style Resolution Kernel
▸ Choose the Style kernel as the specialization target

    for (each rule in matchedRules) {
      for (each property in rule) {
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }

Rule-level Parallelism (RLP)

Slide 112

Slide 112 text

Style Resolution Kernel
▸ Choose the Style kernel as the specialization target

    for (each rule in matchedRules) {
      for (each property in rule) {
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }

Rule-level Parallelism (RLP)
Property-level Parallelism (PLP)

Slide 113

Slide 113 text

Style Resolution Kernel
▸ Choose the Style kernel as the specialization target

    for (each rule in matchedRules) {
      for (each property in rule) {
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }

Rule-level Parallelism (RLP)
Property-level Parallelism (PLP)
▸ Exploiting the parallelism to increase the arithmetic intensity
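The loop nest on this slide can be made concrete with a small runnable sketch. This is a simplified reading of the pseudocode, not browser code: the `HANDLERS` table stands in for the `switch`, `parse_length` is a hypothetical handler, and the sketch assumes the matched rules arrive ordered from lowest to highest priority so that later rules overwrite earlier ones.

```python
from typing import Callable, Dict, List, Tuple

# A rule is a list of (property id, raw value) pairs.
Rule = List[Tuple[str, str]]


def parse_length(value: str, dom_node: str) -> int:
    """Hypothetical handler: turn '6 px' or '0' into an integer pixel count."""
    return int(value.split()[0])


# Dispatch table standing in for the switch statement in the pseudocode.
HANDLERS: Dict[str, Callable[[str, str], int]] = {
    "padding": parse_length,
    "margin": parse_length,
    "width": parse_length,
}


def resolve_style(matched_rules: List[Rule], dom_node: str) -> Dict[str, int]:
    style: Dict[str, int] = {}
    for rule in matched_rules:  # assumed order: lowest priority first
        for prop_id, value in rule:
            handler = HANDLERS[prop_id]
            style[prop_id] = handler(value, dom_node)  # later rules overwrite
    return style


# Two illustrative rules; which one has priority is an assumption of this sketch.
rules = [
    [("padding", "0"), ("margin", "0")],
    [("padding", "6 px"), ("width", "36 px")],
]
style = resolve_style(rules, dom_node="div")
print(style)
```

The outer loop is where rule-level parallelism would apply and the inner loop is where property-level parallelism would apply; the overwrite on `style[prop_id]` is the only point where two rules interact.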

Slide 114

Slide 114 text

Style Resolution Kernel
▸ A running example from www.cnn.com

Style Rules
Rule   Property 1          Property 2
       id        value     id       value
1      padding   0         margin   0
2      padding   6 px      width    36 px

[Diagram: property entries padding, width, and margin with values 0, 6 px, 36 px, and 0]

Slide 115

Slide 115 text

Style Resolution Kernel
▸ A running example from www.cnn.com

Style Rules
Rule   Property 1          Property 2
       id        value     id       value
1      padding   0         margin   0
2      padding   6 px      width    36 px

Final Style Info
Property 1       Property 2       Property 3
id     value     id     value     id     value

[Diagram: property entries padding, width, and margin with values 0, 6 px, 36 px, and 0]

Slide 116

Slide 116 text

Style Resolution Kernel
▸ A running example from www.cnn.com

Style Rules
Rule   Property 1          Property 2
       id        value     id       value
1      padding   0         margin   0
2      padding   6 px      width    36 px

Final Style Info
Property 1       Property 2       Property 3
id     value     id     value     id     value

[Diagram: property entries padding, width, and margin with values 0, 6 px, 36 px, and 0; annotation: High priority]

Slide 122

Slide 122 text

Style Resolution Kernel
▸ A running example from www.cnn.com

Style Rules
Rule   Property 1          Property 2
       id        value     id       value
1      padding   0         margin   0
2      padding   6 px      width    36 px

Final Style Info
Property 1       Property 2       Property 3
id     value     id     value     id     value

[Diagram: property entries padding, width, and margin with values 0, 6 px, 36 px, and 0; annotation: High priority]
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP

Slide 124

Slide 124 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m]

Slide 125

Slide 125 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m]
Stages shown: Input Scratchpad

Slide 126

Slide 126 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad

Slide 127

Slide 127 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad, Conflict Resolution

Slide 128

Slide 128 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad, Conflict Resolution (entries: Prop m, Prop m)

Slide 129

Slide 129 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad, Conflict Resolution (entry: Prop m)

Slide 130

Slide 130 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad, Conflict Resolution, Compute Lanes

Slide 131

Slide 131 text

Style Resolution Unit
▸ Order Matters in RLP
▸ Order Does Not Matter in PLP
[Figure: Rule i and Rule j entries (rule id with start and end pointers) index property entries Prop k, Prop l, and Prop m, which produce Style k, Style l, and Style m; annotation: Higher Priority]
Stages shown: Input Scratchpad, Conflict Resolution, Compute Lanes, Output Scratchpad
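One way to read the stages named on this slide (input scratchpad, conflict resolution, compute lanes, output scratchpad) is as the software model below. It is a rough sketch under stated assumptions rather than a description of the actual hardware: the numeric priority field, the `compute_lane` stand-in, and the thread pool that plays the role of the lanes are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# (rule priority, property id, raw value); a larger priority wins. Hypothetical layout.
Entry = Tuple[int, str, str]


def conflict_resolution(scratchpad: List[Entry]) -> Dict[str, str]:
    """Order matters across rules: keep the highest-priority value per property."""
    winners: Dict[str, Tuple[int, str]] = {}
    for priority, prop_id, value in scratchpad:
        if prop_id not in winners or priority > winners[prop_id][0]:
            winners[prop_id] = (priority, value)
    return {prop_id: value for prop_id, (_, value) in winners.items()}


def compute_lane(item: Tuple[str, str]) -> Tuple[str, int]:
    """Stand-in for one lane: evaluate a single property on its own."""
    prop_id, value = item
    return prop_id, int(value.split()[0])


def style_resolution_unit(scratchpad: List[Entry], lanes: int = 4) -> Dict[str, int]:
    survivors = conflict_resolution(scratchpad)
    # Order does not matter across properties, so lanes may finish in any order.
    with ThreadPoolExecutor(max_workers=lanes) as pool:
        return dict(pool.map(compute_lane, survivors.items()))


input_scratchpad = [
    (1, "padding", "0"), (1, "margin", "0"),
    (2, "padding", "6 px"), (2, "width", "36 px"),
]
output_scratchpad = style_resolution_unit(input_scratchpad)
print(output_scratchpad)
```

Resolving conflicts before the lanes run means each surviving property is computed once, and the result does not depend on the order in which entries sit in the input scratchpad.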

Slide 132

Slide 132 text

Evaluation Results 32

Slide 133

Slide 133 text

Evaluation Results 32 ▸Fully synthesized using Synopsys 28 nm toolchain

Slide 134

Slide 134 text

Evaluation Results
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4
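To put the area overhead in proportion, the two figures quoted on this slide can be divided directly. This is plain arithmetic on the slide's own numbers, with no additional data assumed:

```latex
\frac{0.59\ \text{mm}^2}{122\ \text{mm}^2} \approx 0.0048 \approx 0.48\%
```

So the specialized logic would occupy roughly half a percent of an SoC die of that size.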

Slide 135

Slide 135 text

Evaluation Results
[Figure: energy (J) versus load time (s)]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 136

Slide 136 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 137

Slide 137 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 138

Slide 138 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization; annotation: 18.6%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 139

Slide 139 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization; annotations: 18.6%, 22.2%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 140

Slide 140 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 141

Slide 141 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%, 22.2%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 142

Slide 142 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%, 9.2%, 22.2%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 143

Slide 143 text

Evaluation Results
[Figure: energy (J) versus load time (s); design points: A15-like design, Customization, Specialization; annotations: 29.2%, 47.0%]
▸ Fully synthesized using Synopsys 28 nm toolchain
▸ Cost of specialization: 0.59 mm² area overhead
 ▹ SoC die area is 122 mm² in Samsung Galaxy S4

Slide 144

Slide 144 text

WebCore in SoC 33

Slide 145

Slide 145 text

WebCore in SoC 33 CPUs

Slide 146

Slide 146 text

WebCore in SoC 33 CPUs GPUs

Slide 147

Slide 147 text

WebCore in SoC 33 CPUs GPUs Memory

Slide 148

Slide 148 text

WebCore in SoC 33 CPUs GPUs Specialized Logics Memory

Slide 149

Slide 149 text

WebCore in SoC
[Figure: SoC with CPUs, GPUs, Memory, Specialized Logic, and WebCore]
▸ One of the cores in the multicore SoC
▸ Becomes “dark” when other applications are executing

Slide 150

Slide 150 text

Runtime 34 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 151

Slide 151 text

Runtime 34 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 152

Slide 152 text

35 Architecture Evolution

Slide 153

Slide 153 text

35 Architecture Evolution In-order (2007) Out-of-order (2011) CMP (2011) Complex! (Present)

Slide 154

Slide 154 text

35 Architecture Evolution In-order (2007) Out-of-order (2011) CMP (2011) Complex! (Present)

Slide 155

Slide 155 text

35 Architecture Evolution ACMP (Big/Little) In-order (2007) Out-of-order (2011) CMP (2011) Complex! (Present)

Slide 156

Slide 156 text

36 WebRT: Energy-aware Web Runtime

Slide 157

Slide 157 text

WebRT: Energy-aware Web Runtime
▸ Why ACMP? It offers a large performance-energy trade-off space for energy optimizations
 ▹ Different microarchitectures (in-order + out-of-order)
 ▹ Different frequency settings

Slide 158

Slide 158 text

WebRT: Energy-aware Web Runtime
▸ Why ACMP? It offers a large performance-energy trade-off space for energy optimizations
 ▹ Different microarchitectures (in-order + out-of-order)
 ▹ Different frequency settings
▸ Idea: Provide just enough energy to meet the performance target

Slide 159

Slide 159 text

WebRT: Energy-aware Web Runtime
▸ Why ACMP? It offers a large performance-energy trade-off space for energy optimizations
 ▹ Different microarchitectures (in-order + out-of-order)
 ▹ Different frequency settings
▸ Idea: Provide just enough energy to meet the performance target
▸ Approach: Systematically understand user interactions and bridge the gap between user behavior and system execution.

Slide 160

Slide 160 text

Interacting With a Mobile Web Application 37

Slide 161

Slide 161 text

Interacting With a Mobile Web Application 37

Slide 162

Slide 162 text

Interacting With a Mobile Web Application 37 Loading Interactions

Slide 163

Slide 163 text

Interacting With a Mobile Web Application 37 Austin Loading Interactions

Slide 164

Slide 164 text

Interacting With a Mobile Web Application 37 Austin Loading Interactions

Slide 165

Slide 165 text

Interacting With a Mobile Web Application 37 Austin Loading Touching Interactions

Slide 166

Slide 166 text

Interacting With a Mobile Web Application 37 Austin Loading Touching Interactions

Slide 167

Slide 167 text

Interacting With a Mobile Web Application 37 Austin Loading Touching Moving Interactions

Slide 168

Slide 168 text

Interacting With a Mobile Web Application 38 Loading Touching Moving Interactions

Slide 169

Slide 169 text

Interacting With a Mobile Web Application 38 Loading Touching Moving Interactions Once per usage session

Slide 170

Slide 170 text

Interacting With a Mobile Web Application 38 Loading Touching Moving Interactions Proactive Mechanism WebRT Component

Slide 171

Slide 171 text

Interacting With a Mobile Web Application 38 Loading Touching Moving Interactions Proactive Mechanism WebRT Component Repetitive in a usage session

Slide 172

Slide 172 text

Interacting With a Mobile Web Application 38 Loading Touching Moving Interactions Proactive Mechanism WebRT Component History-based Mechanism

Slide 173

Slide 173 text

39 Loading Touching Moving Interactions Proactive Mechanism WebRT Component History-based Mechanism WebRT: Energy-aware Web Runtime

Slide 174

Slide 174 text

Optimizing for Loading 40

Slide 175

Slide 175 text

Optimizing for Loading ▸ Observation: Web applications have different characteristics that lead to different loading times and energy consumptions 40

Slide 176

Slide 176 text

Optimizing for Loading ▸ Observation: Web applications have different characteristics that lead to different loading times and energy consumptions 40 ▸ Mechanism: Predict the ideal ACMP configuration () and schedule application loading accordingly

Slide 177

Slide 177 text

Optimizing for Loading ▸ Observation: Web applications have different characteristics that lead to different loading times and energy consumptions 40 ▸ Mechanism: Predict the ideal ACMP configuration () and schedule application loading accordingly ▸ Effect: Properly provision the hardware resources based on application characteristics

Slide 178

Slide 178 text

Big/Little Setup 41 ODroid XU+E development board, which contains an Exynos 5410 SoC used in Samsung Galaxy S4.

Slide 179

Slide 179 text

Big/Little Setup 41 ODroid XU+E development board, which contains an Exynos 5410 SoC used in Samsung Galaxy S4. Big core cluster: ARM Cortex A15, out-of-order, 3-issue. DVFS: 800 MHz to 1.8 GHz at 100 MHz granularity.

Slide 180

Slide 180 text

Big/Little Setup 41 ODroid XU+E development board, which contains an Exynos 5410 SoC used in Samsung Galaxy S4. Big core cluster: ARM Cortex A15, out-of-order, 3-issue. DVFS: 800 MHz to 1.8 GHz at 100 MHz granularity. Little core cluster: ARM Cortex A7, in-order, 2-issue. DVFS: 350 MHz to 600 MHz at 50 MHz granularity.

Slide 181

Slide 181 text

Big/Little Setup 41 ODroid XU+E development board, which contains an Exynos 5410 SoC used in Samsung Galaxy S4. Big core cluster: ARM Cortex A15, out-of-order, 3-issue. DVFS: 800 MHz to 1.8 GHz at 100 MHz granularity. Little core cluster: ARM Cortex A7, in-order, 2-issue. DVFS: 350 MHz to 600 MHz at 50 MHz granularity. Overhead: ▸ Frequency switch: 100 µs ▸ Core migration: 20 µs
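The setup above defines a small, discrete set of (core type, frequency) choices. A minimal sketch that enumerates them, assuming both frequency ranges include their endpoints; the function and constant names are illustrative and do not come from the slides.

```python
from typing import List, Tuple

# Frequency ranges are taken from the setup slide.
# Treating both endpoints as valid settings is an assumption.
BIG_MHZ = range(800, 1800 + 1, 100)    # Cortex A15 cluster
LITTLE_MHZ = range(350, 600 + 1, 50)   # Cortex A7 cluster


def acmp_configurations() -> List[Tuple[str, int]]:
    """Return every (core type, frequency in MHz) pair a scheduler could pick."""
    configs = [("little", mhz) for mhz in LITTLE_MHZ]
    configs += [("big", mhz) for mhz in BIG_MHZ]
    return configs


if __name__ == "__main__":
    for core, mhz in acmp_configurations():
        print(f"{core:>6} @ {mhz} MHz")
```

Under these assumptions the space has 6 little-core and 11 big-core settings, which is small enough to search exhaustively at run time.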

Slide 182

Slide 182 text

Power and Energy Measurements 42 [Diagram: VRM, 15 mΩ sense resistor, SoC (ARM Cortex A9); probe amplifier with x50 gain (Vin+, Vin-, Vout, GND); Data Acquisition (DAQ)]
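A sense-resistor setup like this one typically recovers current from the amplified voltage drop and integrates power over time. The sketch below shows that arithmetic. The resistor value and gain come from the slide; the supply voltage, the sampling period, and all function names are assumptions for illustration.

```python
from typing import Sequence

SENSE_RESISTOR_OHMS = 0.015   # 15 mΩ sense resistor (from the slide)
AMPLIFIER_GAIN = 50.0         # x50 gain stage (from the slide)


def rail_current(v_out: float) -> float:
    """Recover the rail current (A) from the amplified sense voltage (V)."""
    return v_out / (AMPLIFIER_GAIN * SENSE_RESISTOR_OHMS)


def energy_joules(v_out_samples: Sequence[float],
                  v_supply: float,
                  sample_period_s: float) -> float:
    """Integrate power over time with a rectangle rule.

    v_supply is the rail voltage; the slide does not give it, so the caller
    must supply a measured or nominal value.
    """
    return sum(v_supply * rail_current(v) * sample_period_s
               for v in v_out_samples)
```

For example, a constant 0.75 V reading corresponds to about 1 A through the resistor, so 1,000 samples at 1 ms on a 1 V rail integrate to roughly 1 J.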

Slide 183

Slide 183 text

Performance-Energy Trade-off 43

Slide 184

Slide 184 text

Performance-Energy Trade-off 43 www.autoblog.com [Plot: Energy Consumption (J) vs. Load time (s), Big Core]

Slide 185

Slide 185 text

Performance-Energy Trade-off 43 www.autoblog.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 186

Slide 186 text

Performance-Energy Trade-off 43 www.autoblog.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 187

Slide 187 text

Performance-Energy Trade-off 43 www.autoblog.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 188

Slide 188 text

Performance-Energy Trade-off 44 www.newegg.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 189

Slide 189 text

Performance-Energy Trade-off 44 www.newegg.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 190

Slide 190 text

Performance-Energy Trade-off 44 www.newegg.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 191

Slide 191 text

Performance-Energy Trade-off 44 www.newegg.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core; annotation: 30%]

Slide 192

Slide 192 text

Performance-Energy Trade-off 45 www.adobe.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 193

Slide 193 text

Performance-Energy Trade-off 45 www.adobe.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 194

Slide 194 text

Performance-Energy Trade-off 45 www.adobe.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core]

Slide 195

Slide 195 text

Performance-Energy Trade-off 45 www.adobe.com [Plots: Energy Consumption (J) vs. Load time (s), Small Core and Big Core; annotation: 80%]

Slide 196

Slide 196 text

46 Breaking Down the Computations 46

Slide 197

Slide 197 text

46 Breaking Down the Computations HTML (Structure) CSS (Style) 46

Slide 198

Slide 198 text

46 Breaking Down the Computations Tag Attribute HTML (Structure) CSS (Style) 46

Slide 199

Slide 199 text

46 Breaking Down the Computations Tag Attribute HTML (Structure) CSS (Style) Selector Property 46

Slide 200

Slide 200 text

46 Breaking Down the Computations DOM Tree Tag Attribute HTML (Structure) CSS (Style) Selector Property 46

Slide 201

Slide 201 text

46 Breaking Down the Computations DOM Tree Tag Attribute HTML (Structure) CSS (Style) Selector Property 46 Web Primitives

Slide 202

Slide 202 text

46 Breaking Down the Computations DOM Tree Tag Attribute HTML (Structure) CSS (Style) Selector Property 46 Web Primitives

Slide 203

Slide 203 text

47 47 HTML Tag Analysis www.163.com

Slide 204

Slide 204 text

47 47 HTML Tag Analysis Number of Tags (K) 5 Webpages

Slide 205

Slide 205 text

47 47 HTML Tag Analysis Number of Tags (K) 5 Webpages www.google.com

Slide 206

Slide 206 text

47 47 HTML Tag Analysis Number of Tags (K) 5 Webpages

Slide 207

Slide 207 text

47 47 HTML Tag Analysis Number of Tags (K) 5 Webpages

Slide 208

Slide 208 text

47 47 HTML Tag Analysis Number of Tags (K) 5 Webpages ▸ Web applications have different tag counts

Slide 209

Slide 209 text

48 Tag Processing Overhead [Bar chart: Load time (ms) and Energy (mJ) for the h3, table, and img tags] ▸ Web applications have different tag counts

Slide 210

Slide 210 text

49 Tag Processing Overhead [Bar chart: Load time (ms) and Energy (mJ) for the h3, table, and img tags] ▸ Web applications have different tag counts

Slide 211

Slide 211 text

50 Tag Processing Overhead [Bar chart: Load time (ms) and Energy (mJ) for the h3, table, and img tags] ▸ Web applications have different tag counts

Slide 212

Slide 212 text

51 Tag Processing Overhead [Bar chart: Load time (ms) and Energy (mJ) for the h3, table, and img tags] ▸ Web applications have different tag counts

Slide 213

Slide 213 text

51 Tag Processing Overhead [Bar chart: Load time (ms) and Energy (mJ) for the h3, table, and img tags] ▸ Web applications have different tag counts ▸ Tags have different processing overheads

Slide 214

Slide 214 text

Root-cause of Web Application Variance 51 51 Tag Processing Overhead ▸ Tags have different processing overheads ▸ Web applications have different tag counts

Slide 215

Slide 215 text

Predicting Loading Performance & Energy 52 Idea: predict load time & energy (responses) based on Web primitives (predictors)

Slide 216

Slide 216 text

Predicting Loading Performance & Energy 52 Identify Predictors Training using hottest 2,500 webpages Predictors (HTML, CSS) Responses (Time, Energy)

Slide 217

Slide 217 text

Predicting Loading Performance & Energy 52 Identify Predictors Training using hottest 2,500 webpages Model Construction & Refinement Refine the linear model Predictors (HTML, CSS) Responses (Time, Energy) Mitigate Over-fitting Model Non-Linearity Linear Regression

Slide 218

Slide 218 text

Predicting Loading Performance & Energy 52 Identify Predictors Training using hottest 2,500 webpages Model Construction & Refinement Refine the linear model Model Validation Validating on another 2,500 webpages Predictors (HTML, CSS) Responses (Time, Energy) Mitigate Over-fitting Model Non-Linearity Linear Regression Loading Time Model Energy Model
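The pipeline above fits a regression from Web primitives to load time and energy. As a minimal sketch of that idea, the code below fits a linear model with ordinary least squares. The optional `ridge` penalty is one common way to mitigate over-fitting and is an assumption here, since the slides do not name the technique. The predictor columns in the usage note are hypothetical examples.

```python
from typing import List, Sequence


def fit_linear_model(rows: Sequence[Sequence[float]],
                     responses: Sequence[float],
                     ridge: float = 0.0) -> List[float]:
    """Fit response ~ w0 + w1*x1 + ... by solving the normal equations.

    `ridge` adds an L2 penalty on the non-intercept weights. The system must
    be non-singular (enough distinct training rows), otherwise the
    elimination divides by zero.
    """
    design = [[1.0, *row] for row in rows]   # prepend the intercept column
    n = len(design[0])
    # Augmented system [X^T X | X^T y].
    system = [[sum(r[i] * r[j] for r in design) for j in range(n)]
              + [sum(r[i] * y for r, y in zip(design, responses))]
              for i in range(n)]
    for i in range(1, n):
        system[i][i] += ridge
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(n):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b
                               for a, b in zip(system[row], system[col])]
    return [system[i][n] / system[i][i] for i in range(n)]


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one set of predictor values."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], row))
```

In use, each training row would hold counts of Web primitives for one page (for instance a tag count and a CSS rule count, both hypothetical choices), and one model would be fitted per response: one for load time and one for energy.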

Slide 219

Slide 219 text

53 Predicting Loading Performance & Energy [Plots: prediction error distributions for performance and energy, axis range 0.00 to 0.20] Median prediction error is less than 5%

Slide 220

Slide 220 text

Webpage-aware Scheduler 54

Slide 221

Slide 221 text

Webpage-aware Scheduler 54 Normal Web application loading Scheduler operations

Slide 222

Slide 222 text

Webpage-aware Scheduler 54 Network ........ Normal Web application loading Scheduler operations

Slide 223

Slide 223 text

Webpage-aware Scheduler 54 ........ Parsing (~1%) Normal Web application loading Scheduler operations

Slide 224

Slide 224 text

Webpage-aware Scheduler 54 ........ Prediction (minimal overhead) Normal Web application loading Scheduler operations

Slide 225

Slide 225 text

Webpage-aware Scheduler 54 ........ Scheduling Overhead (~120 us) Normal Web application loading Scheduler operations

Slide 226

Slide 226 text

Webpage-aware Scheduler 54 ........ Rest of loading Normal Web application loading Scheduler operations
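The scheduler steps above (parse, predict, switch configuration, finish loading) hinge on one selection step. A minimal sketch of that step, assuming the goal is the lowest predicted energy among configurations whose predicted load time meets a deadline. The deadline parameter, the fallback to the fastest configuration, and all names are assumptions and are not taken from the slides.

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[str, int]  # (core type, frequency in MHz)


def pick_configuration(configs: Sequence[Config],
                       predict_time_s: Callable[[Config], float],
                       predict_energy_j: Callable[[Config], float],
                       deadline_s: float) -> Config:
    """Pick the lowest-energy configuration predicted to meet the deadline.

    If no configuration is predicted to meet it, fall back to the one with
    the shortest predicted load time.
    """
    feasible = [c for c in configs if predict_time_s(c) <= deadline_s]
    if not feasible:
        return min(configs, key=predict_time_s)
    return min(feasible, key=predict_energy_j)
```

The two predictor callables stand in for the load time and energy models. Because the selection is a scan over a small configuration list, its cost is small next to the frequency switch and core migration overheads.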

Slide 227

Slide 227 text

55 Evaluation

Slide 228

Slide 228 text

55 Evaluation ▸ Highest performance (Perf) ▹Highest frequency on big core ▹Standard to guarantee responsiveness

Slide 229

Slide 229 text

55 Evaluation ▸ Highest performance (Perf) ▹Highest frequency on big core ▹Standard to guarantee responsiveness ▸ OS DVFS strategies (OS) ▹Ondemand governor (across big and little cores)

Slide 230

Slide 230 text

55 Evaluation ▸ Highest performance (Perf) ▹Highest frequency on big core ▹Standard to guarantee responsiveness ▸ OS DVFS strategies (OS) ▹Ondemand governor (across big and little cores) ▸ Metrics: ▹ Energy Savings ▹ QoS Violations

Slide 231

Slide 231 text

55 Evaluation ▸ Highest performance (Perf) ▹Highest frequency on big core ▹Standard to guarantee responsiveness ▸ OS DVFS strategies (OS) ▹Ondemand governor (across big and little cores) ▸ Metrics: ▹ Energy Savings ▹ QoS Violations 83.0% energy savings over Perf, 4.1% more QoS violations

Slide 232

Slide 232 text

55 Evaluation ▸ Highest performance (Perf) ▹Highest frequency on big core ▹Standard to guarantee responsiveness ▸ OS DVFS strategies (OS) ▹Ondemand governor (across big and little cores) ▸ Metrics: ▹ Energy Savings ▹ QoS Violations 83.0% energy savings over Perf, 4.1% more QoS violations 8.6% energy savings over OS, 0.1% more QoS violations

Slide 233

Slide 233 text

56 Loading Touching Moving Interactions Proactive Mechanism WebRT Component History-based Mechanism WebRT: Energy-aware Web Runtime

Slide 234

Slide 234 text

56 Loading Touching Moving Interactions Proactive Mechanism WebRT Component History-based Mechanism WebRT: Energy-aware Web Runtime

Slide 235

Slide 235 text

57 Optimizing Post-loading Interactions

Slide 236

Slide 236 text

57 Optimizing Post-loading Interactions Touching Moving Interactions

Slide 237

Slide 237 text

57 Optimizing Post-loading Interactions Touching Moving Interactions Events

Slide 238

Slide 238 text

57 Optimizing Post-loading Interactions Touching Moving Interactions Events click touchstart touchmove scroll

Slide 239

Slide 239 text

57 Optimizing Post-loading Interactions Touching Moving Interactions Events click touchstart touchmove scroll Event Queue

Slide 240

Slide 240 text

57 Optimizing Post-loading Interactions Touching Moving Interactions Events click touchstart touchmove scroll Event Loop Event Queue

Slide 241

Slide 241 text

57 Optimizing Post-loading Interactions Touching Moving Interactions Events click touchstart touchmove scroll Optimize post-loading at an event-granularity Event Loop Event Queue

Slide 242

Slide 242 text

▸ Observation: Events have different execution latencies that enable energy optimizations 57 Optimizing Post-loading Interactions Touching Moving Interactions Events click touchstart touchmove scroll Event Loop Event Queue

Slide 243

Slide 243 text

▸ Observation: Events have different execution latencies that enable energy optimizations 58 Optimizing Post-loading Interactions

Slide 244

Slide 244 text

▸ Observation: Events have different execution latencies that enable energy optimizations 58 ▸ Mechanism: Event-based scheduling to predict the ACMP configuration that exploits event slacks and saves energy Optimizing Post-loading Interactions

Slide 245

Slide 245 text

▸ Observation: Events have different execution latencies that enable energy optimizations 58 ▸ Mechanism: Event-based scheduling to predict the ACMP configuration that exploits event slacks and saves energy ▸ Effect: Properly provision the hardware resources based on event characteristics Optimizing Post-loading Interactions

Slide 246

Slide 246 text

Event-Level Characterization 59

Slide 247

Slide 247 text

Event-Level Characterization 59

Slide 248

Slide 248 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events

Slide 249

Slide 249 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events

Slide 250

Slide 250 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events keyup

Slide 251

Slide 251 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events keyup

Slide 252

Slide 252 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events Large Slack keyup

Slide 253

Slide 253 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events Large Slack change keyup

Slide 254

Slide 254 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events Large Slack change Small Slack keyup

Slide 255

Slide 255 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events Large Slack change Small Slack click keyup

Slide 256

Slide 256 text

Event-Level Characterization 59 150 100 50 0 Event Latency (ms) Events Large Slack change Small Slack No Slack click keyup

Slide 257

Slide 257 text

Event-Level Characterization 59 [Plot: Event Latency (ms) across Events; annotations: keyup, Large Slack; change, Small Slack; click, No Slack] ▸ Wide distribution of event latencies. Events exhibit different slacks. ▹ How to exploit event slacks?

Slide 258

Slide 258 text

60 Event-based Scheduler (EBS)

Slide 259

Slide 259 text

60 Event-based Scheduler (EBS) ▸ Goal: For each event, find the most energy-efficient ACMP configuration that meets the latency target

Slide 260

Slide 260 text

60 Event-based Scheduler (EBS) Thread Scheduling

Slide 261

Slide 261 text

60 Event-based Scheduler (EBS) Thread Scheduling

Slide 262

Slide 262 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling

Slide 263

Slide 263 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness

Slide 264

Slide 264 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness Events-based Scheduling

Slide 265

Slide 265 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness Events-based Scheduling Event Queue

Slide 266

Slide 266 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness Event-based Scheduler Events-based Scheduling Event Queue

Slide 267

Slide 267 text

60 Event-based Scheduler (EBS) Thread-based Scheduler Thread Scheduling Throughput Fairness Event-based Scheduler Events-based Scheduling Event Latency Event Energy Event Queue

Slide 268

Slide 268 text

61 Predicting Event Latency

Slide 269

Slide 269 text

61 Predicting Event Latency Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03

Slide 270

Slide 270 text

61 Predicting Event Latency Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency =

Slide 271

Slide 271 text

61 Predicting Event Latency Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory +

Slide 272

Slide 272 text

61 Predicting Event Latency Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory + Ndependent / f

Slide 273

Slide 273 text

61 Predicting Event Latency Memory Operation CPU Operation Tmemory Ndependent f Event Latency Xie, et al., Compile-Time Dynamic Voltage Scaling Settings: Opportunities and Limits, PLDI’03 Event Latency = Tmemory + Ndependent / f

Slide 274

Slide 274 text

61 Predicting Event Latency Event Latency = Tmemory + Ndependent / f Event Latency Frequency

Slide 275

Slide 275 text

61 Predicting Event Latency Event Latency = Tmemory + Ndependent / f Event Latency Frequency
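The model Event Latency = Tmemory + Ndependent / f has two unknowns, so two latency observations at different frequencies are enough to solve for them. A minimal sketch under that reading, assuming both observations come from the same core type (the cycle count would differ across microarchitectures). The function names and the frequency list in the usage note are illustrative.

```python
from typing import Iterable, Optional, Tuple


def fit_latency_model(f1_hz: float, latency1_s: float,
                      f2_hz: float, latency2_s: float) -> Tuple[float, float]:
    """Solve latency = t_memory + n_dependent / f from two observations.

    The two frequencies must differ.
    """
    n_dependent = (latency1_s - latency2_s) / (1.0 / f1_hz - 1.0 / f2_hz)
    t_memory = latency1_s - n_dependent / f1_hz
    return t_memory, n_dependent


def predict_latency(t_memory: float, n_dependent: float, f_hz: float) -> float:
    """Predicted event latency in seconds at frequency f_hz."""
    return t_memory + n_dependent / f_hz


def lowest_frequency_meeting(t_memory: float, n_dependent: float,
                             frequencies_hz: Iterable[float],
                             target_s: float) -> Optional[float]:
    """Return the lowest frequency predicted to meet the latency target."""
    for f_hz in sorted(frequencies_hz):
        if predict_latency(t_memory, n_dependent, f_hz) <= target_s:
            return f_hz
    return None  # no listed frequency is predicted to meet the target
```

For example, observations of 50 ms at 500 MHz and 30 ms at 1 GHz give a memory time of about 10 ms and about 20 million frequency-dependent cycles; with a 40 ms target, 800 MHz would be the lowest of {500, 800, 1000} MHz predicted to be sufficient.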

Slide 276

Slide 276 text

62 Event-based Scheduler

Slide 277

Slide 277 text

62 Event-based Scheduler Events

Slide 278

Slide 278 text

62 Event-based Scheduler Model Constructor Event-Based Scheduler Events

Slide 279

Slide 279 text

62 Event-based Scheduler QoS Monitor Model Constructor Event-Based Scheduler Model Events

Slide 280

Slide 280 text

62 Event-based Scheduler QoS Monitor Model Constructor Big/Little Hardware Event-Based Scheduler Model Events

Slide 281

Slide 281 text

62 Event-based Scheduler QoS Monitor Model Constructor Big/Little Hardware Event-Based Scheduler Model Events

Slide 282

Slide 282 text

62 Event-based Scheduler QoS Monitor Model Constructor Big/Little Hardware Event-Based Scheduler Model Events ▸ Fine-tune the model when it over- or under-predicts

Slide 283

Slide 283 text

62 Event-based Scheduler QoS Monitor Model Constructor Big/Little Hardware Event-Based Scheduler Model Recalibrate Events ▸ Fine-tune the model when it over- or under-predicts ▸ Recalibrate if it mispredicts too often
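The two feedback rules above (fine-tune on a misprediction, recalibrate after too many) can be sketched as a small monitor around the latency model. Everything below is a hypothetical illustration: the class name, the tolerance, the update step, and the miss threshold are assumptions, and the slides do not specify how the actual QoS Monitor adjusts the model.

```python
class LatencyModelMonitor:
    """Tracks mispredictions of latency = t_memory + n_dependent / f."""

    def __init__(self, t_memory: float, n_dependent: float,
                 step: float = 0.1, tolerance: float = 0.1,
                 max_misses: int = 3) -> None:
        self.t_memory = t_memory
        self.n_dependent = n_dependent
        self.step = step              # fraction of the error corrected per miss
        self.tolerance = tolerance    # relative error that counts as a miss
        self.max_misses = max_misses  # consecutive misses before recalibrating
        self.misses = 0

    def observe(self, f_hz: float, measured_s: float) -> bool:
        """Record one event; return True when recalibration is requested."""
        predicted = self.t_memory + self.n_dependent / f_hz
        error = measured_s - predicted
        if abs(error) <= self.tolerance * measured_s:
            self.misses = 0
            return False
        # Fine-tune: move the frequency-dependent term toward the observation.
        self.n_dependent += self.step * error * f_hz
        self.misses += 1
        if self.misses >= self.max_misses:
            self.misses = 0
            return True
        return False
```

A caller would rebuild the model from fresh measurements whenever `observe` returns True, which corresponds to the Recalibrate arrow in the diagram.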

Slide 284

Slide 284 text

Evaluation ▸Baseline Mechanisms ▹Highest performance (Perf) — Standard to guarantee responsiveness ▹Minimal energy (Energy) — Minimize energy consumption ▹Interactive governor (Interactive) — Android default 63

Slide 285

Slide 285 text

Evaluation ▸Baseline Mechanisms ▹Highest performance (Perf) — Standard to guarantee responsiveness ▹Minimal energy (Energy) — Minimize energy consumption ▹Interactive governor (Interactive) — Android default 63 ▸Metrics ▹Energy Savings ▹QoS Violations

Slide 286

Slide 286 text

Evaluation ▸Baseline Mechanisms ▹Highest performance (Perf) — Standard to guarantee responsiveness ▹Minimal energy (Energy) — Minimize energy consumption ▹Interactive governor (Interactive) — Android default 63 ▸Metrics ▹Energy Savings ▹QoS Violations 37.9% - 41.2% energy savings, 0.1% more QoS violations

Slide 287

Slide 287 text

Runtime 64 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 288

Slide 288 text

Runtime 64 My Approach Architecture Application My Research Scope WebRT Energy-aware Web Runtime WebCore Web-specific Architecture GreenWeb QoS Language Extensions

Slide 289

Slide 289 text

65 GreenWeb: QoS Web Language Extensions

Slide 290

Slide 290 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS

Slide 291

Slide 291 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile QoS

Slide 292

Slide 292 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile QoS Expressing Abstractions

Slide 293

Slide 293 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing

Slide 294

Slide 294 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience

Slide 295

Slide 295 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience

Slide 296

Slide 296 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible

Slide 297

Slide 297 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable

Slide 298

Slide 298 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable

Slide 299

Slide 299 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 300

Slide 300 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 301

Slide 301 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 302

Slide 302 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 303

Slide 303 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 304

Slide 304 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 305

Slide 305 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 306

Slide 306 text

65 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Performance Degradation QoS Experience Imperceptible Tolerable Unusable Energy Savings

Slide 307

Slide 307 text

Imperceptible Unusable Tolerable 66 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile QoS Expressing Abstractions Performance Degradation QoS Experience

Slide 308

Slide 308 text

▸ QoS Type: performance metric Imperceptible Unusable Tolerable 66 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile QoS Expressing Abstractions Performance Degradation QoS Experience

Slide 309

Slide 309 text

▸ QoS Type: performance metric ▸ QoS Target: threshold performance values Imperceptible Unusable Tolerable 66 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile QoS Expressing Abstractions Performance Degradation QoS Experience

Slide 310

Slide 310 text

67 GreenWeb: QoS Web Language Extensions Understanding Mobile QoS Abstracting Mobile Expressing Abstractions ▸ QoS Type: performance metric ▸ QoS Target: threshold performance values Syntax (CSS compatible): element {event: Type, Target} Semantics: When event is triggered on element, the QoS type and QoS target are Type and Target, respectively.

Slide 311

Slide 311 text

68 Future Work ▸ Automatic GreenWeb Annotation ▹ Empower the developers, but not overburden them! ▸ GreenWeb Composability ▹ Can GreenWeb programs be safely integrated with other code? ▹ How to compose comprehensive QoS abstractions? ▸ Integrating WebRT with GreenWeb ▹ How can WebRT adapt to different QoS constraints?

Slide 312

Slide 312 text

Timeline 69 Months: FEB, MAR, APR, MAY, JUNE, JULY, AUG. Key Tasks: WebRT Adaptivity Study (Goal: Evaluate the sensitivity of WebRT with respect to different QoS constraints.) Program-level Composability Study (Goal: Improve the composability and flexibility of GreenWeb extensions.) Automatic Annotation System for GreenWeb (Goal: Explore the feasibility of automatically applying GreenWeb annotations.) Thesis Writing

Slide 313

Slide 313 text

Retrospective: Three Principles Learnt 70 Runtime Application Architecture

Slide 314

Slide 314 text

Retrospective: Three Principles Learnt 70 Runtime Application Architecture ▸ General-purpose vs. Specialization ▹ WebCore combines general-purpose customization with domain specialization

Slide 315

Slide 315 text

Retrospective: Three Principles Learnt 70 Runtime Application Architecture ▸ Exposing Hardware Complexities ▹ WebRT Leverages Core Type and Core Frequency ▸ General-purpose vs. Specialization ▹ WebCore combines general-purpose customization with domain specialization

Slide 316

Slide 316 text

Retrospective: Three Principles Learnt 70 Runtime Application Architecture ▸ Empowering the Developers ▹ GreenWeb Language Extensions Provide QoS Abstractions ▸ Exposing Hardware Complexities ▹ WebRT Leverages Core Type and Core Frequency ▸ General-purpose vs. Specialization ▹ WebCore combines general-purpose customization with domain specialization

Slide 317

Slide 317 text

[PLDI 2016] Yuhao Zhu, Vijay Janapa Reddi, “GreenWeb: Language Extensions for Energy-Efficient Mobile Web Computing”
[HPCA 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “Event-Based Scheduling for Energy-Efficient QoS (eQoS) in Mobile Web Applications”
[HPCA 2013] Yuhao Zhu, Vijay Janapa Reddi, “High-Performance and Energy-Efficient Mobile Web Browsing on Big/Little Systems”
[CAL 2012] Yuhao Zhu, Aditya Srikanth, Jingwen Leng, Vijay Janapa Reddi, “Exploiting Webpage Characteristics for Energy-Efficient Mobile Web Browsing” (Best of CAL)
[ISCA 2014] Yuhao Zhu, Vijay Janapa Reddi, “WebCore: Architectural Support for Mobile Web Browsing”
[IEEE MICRO 2015] Yuhao Zhu, Matthew Halpern, Vijay Janapa Reddi, “The Role of the CPU in Energy-Efficient Mobile Web Browsing”
[HPCA 2016] Matthew Halpern, Yuhao Zhu, Vijay Janapa Reddi, “Mobile CPU’s Rise to Power: Quantifying the Impact of Generational Mobile CPU Design Trends on Performance, Energy, and User Satisfaction”
[MICRO 2015] Yuhao Zhu, Daniel Richins, Matthew Halpern, Vijay Janapa Reddi, “Microarchitectural Implications of Event-driven Server-side Web Applications” (Top Picks Honorable Mention)
Groups: GreenWeb; WebRT; WebCore; Motivational Studies; Server Microarch

Slide 318

Slide 318 text

[DAC 2011] Yuhao Zhu, Yangdong Deng, Yubei Chen, “Hermes: An Integrated CPU/GPU Microarchitecture for IP Routing.”
[DAC 2010] Bo Wang, Yuhao Zhu, Yangdong Deng, “Distributed Time, Conservative Parallel Logic Simulation on GPUs.”
[TODAES 2011] Yuhao Zhu, Bo Wang, Yangdong Deng, “Massively Parallel Logic Simulation with GPUs.”
[ISPASS 2015] Matthew Halpern, Yuhao Zhu, Ramesh Peri, and Vijay Janapa Reddi, “Mosaic: Cross-platform User-interaction Record and Replay for the Fragmented Android Ecosystem.”
[IRPS 2014] Chen Zhou, Xiaofei Wang, Weichao Xu, Yuhao Zhu, Vijay Janapa Reddi, Chris Kim, “Estimation of Instantaneous Frequency Fluctuation in a Fast DVFS Environment Using an Empirical BTI Stress-Relaxation Model.”
Groups: GPGPU & IP Routing Architecture; Tools; Reliability

Slide 319

Slide 319 text

Coursework 73
Name | Instructor | Semester | SUP | Grade
COMPILERS | Keshav Pingali | Fall 2010 | - | A
ADV EMBED MICROCONTROL SYS | Mark McDermott | Spring 2011 | - | A-
MEMORY MANAGEMENT | Kathryn McKinley | Spring 2011 | Y | A
VLSI I | Jacob Abraham | Fall 2011 | - | A-
COMP ARCH: PARALLISM/LOCLTY | Mattan Erez | Fall 2011 | - | A
MICROARCHITECTURE | Yale Patt | Spring 2012 | - | B
DYNAMIC COMPILATION | Vijay Janapa Reddi | Spring 2012 | - | A-
COMP PERF EVAL/BENCHMARKING | Lizy John | Fall 2012 | - | B+
PARALLEL COMP ARCHITECTURE | Derek Chiou | Spring 2013 | - | B+
HUMAN COMPUT & CROWDSRCING | Matt Lease | Fall 2015 | Y | A-

Slide 320

Slide 320 text

Thank you!

Slide 321

Slide 321 text

Scheduling Results 75 Using a performance-oriented strategy as the baseline

Slide 322

Slide 322 text

Scheduling Results 75 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 323

Slide 323 text

Scheduling Results 76 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 324

Slide 324 text

Scheduling Results 77 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 325

Slide 325 text

Scheduling Results 78 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 326

Slide 326 text

Scheduling Results 78 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 327

Slide 327 text

Scheduling Results 79 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 328

Slide 328 text

Scheduling Results 80 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 329

Slide 329 text

Scheduling Results 81 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 330

Slide 330 text

Scheduling Results 81 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline

Slide 331

Slide 331 text

Scheduling Results 81 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline. 83.0% energy savings over Perf, 4.1% more QoS violations

Slide 332

Slide 332 text

Scheduling Results 81 [Plots: Energy Savings (%) and QoS Violations (%) for OS (Big), OS (Little), and WS] Using a performance-oriented strategy as the baseline. 83.0% energy savings over Perf, 4.1% more QoS violations. 8.6% energy savings over OS, 0.1% more QoS violations

Slide 333

Slide 333 text

Evaluation Methodology ▸ Baseline Mechanisms ▹ Highest performance (Perf) — Standard to guarantee responsiveness ▹ Minimal energy (Energy) — Minimize energy consumption ▹ Interactive governor (Interactive) — Android default ▹ On-demand governor (Ondemand) 82

Slide 335

Slide 335 text

Evaluation Methodology 82 ▸ Baseline Mechanisms ▹ Highest performance (Perf): standard to guarantee responsiveness ▹ Minimal energy (Energy): minimize energy consumption ▹ Interactive governor (Interactive): Android default ▹ On-demand governor (Ondemand) ▸ Scheduling Scenarios [Figure: QoS experience (unusable, tolerable, imperceptible) as a function of performance]

Slide 336

Slide 336 text

Evaluation Methodology 82 ▸ Baseline Mechanisms ▹ Highest performance (Perf): standard to guarantee responsiveness ▹ Minimal energy (Energy): minimize energy consumption ▹ Interactive governor (Interactive): Android default ▹ On-demand governor (Ondemand) ▸ Scheduling Scenarios ▹ Scheduling for imperceptibility [Figure: QoS experience (unusable, tolerable, imperceptible) as a function of performance]

Slide 337

Slide 337 text

Evaluation Methodology 82 ▸ Baseline Mechanisms ▹ Highest performance (Perf): standard to guarantee responsiveness ▹ Minimal energy (Energy): minimize energy consumption ▹ Interactive governor (Interactive): Android default ▹ On-demand governor (Ondemand) ▸ Scheduling Scenarios ▹ Scheduling for imperceptibility ▹ Scheduling for tolerability [Figure: QoS experience (unusable, tolerable, imperceptible) as a function of performance]

Slide 339

Slide 339 text

Evaluation Results 83 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, Ondemand, and Energy]

Slide 340

Slide 340 text

Evaluation Results 84 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; annotation: No QoS Violations]

Slide 341

Slide 341 text

Evaluation Results 85 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; annotation: No QoS Violations]

Slide 342

Slide 342 text

Evaluation Results 86 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9]

Slide 343

Slide 343 text

Evaluation Results 87 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9]

Slide 344

Slide 344 text

Evaluation Results 88 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9] [Figure: energy (J) per application for the same mechanisms]

Slide 345

Slide 345 text

Evaluation Results 89 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9] [Figure: energy (J) per application for the same mechanisms; off-scale bar labels: 8.2, 7.7]

Slide 346

Slide 346 text

Evaluation Results 90 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9] [Figure: energy (J) per application for the same mechanisms; off-scale bar labels: 8.2, 7.7]

Slide 347

Slide 347 text

Evaluation Results 91 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9] [Figure: energy (J) per application for the same mechanisms; off-scale bar labels: 8.2, 7.7]

Slide 348

Slide 348 text

Evaluation Results 91 [Figure: QoS violations (%) per application (emberjs, gwt, jquery, backbone, paperjs, sina, google, ebay) for EBS, Perf, Interactive, and Energy; off-scale bar labels: 9.4, 17.8, 58.1, 6.9] [Figure: energy (J) per application for the same mechanisms; off-scale bar labels: 8.2, 7.7] ▸ 37.9% - 41.2% energy savings, 0.1% more QoS violations

Slide 349

Slide 349 text

GreenWeb: QoS Web Language Extensions 92 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] [Figure: QoS experience (imperceptible, tolerable, unusable) vs. performance degradation]

Slide 350

Slide 350 text

GreenWeb: QoS Web Language Extensions 92 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric [Figure: QoS experience (imperceptible, tolerable, unusable) vs. performance degradation]

Slide 351

Slide 351 text

GreenWeb: QoS Web Language Extensions 92 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) [Figure: QoS experience (imperceptible, tolerable, unusable) vs. performance degradation]

Slide 352

Slide 352 text

GreenWeb: QoS Web Language Extensions 92 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values [Figure: QoS experience (imperceptible, tolerable, unusable) vs. performance degradation]

Slide 353

Slide 353 text

GreenWeb: QoS Web Language Extensions 92 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) [Figure: QoS experience (imperceptible, tolerable, unusable) vs. performance degradation]
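
One way to read the two targets is as cut-offs that split measured latency into the three experience regions in the figure. The sketch below assumes that reading; the names `QoSExperience` and `classify_latency`, the millisecond unit, and the example thresholds are hypothetical and are not taken from the slides.

```python
from enum import Enum


class QoSExperience(Enum):
    IMPERCEPTIBLE = "imperceptible"
    TOLERABLE = "tolerable"
    UNUSABLE = "unusable"


def classify_latency(latency_ms: float, t_i: float, t_u: float) -> QoSExperience:
    """Map a frame latency to a QoS experience region.

    t_i is the imperceptible target (Ti) and t_u the usable target (Tu),
    both assumed to be latencies in milliseconds with t_i <= t_u.
    """
    if t_i > t_u:
        raise ValueError("imperceptible target must not exceed usable target")
    if latency_ms <= t_i:
        return QoSExperience.IMPERCEPTIBLE
    if latency_ms <= t_u:
        return QoSExperience.TOLERABLE
    return QoSExperience.UNUSABLE


# Assumed example thresholds: Ti = 100 ms, Tu = 300 ms.
example = [classify_latency(ms, t_i=100.0, t_u=300.0) for ms in (40.0, 180.0, 900.0)]
```

For the continuous QoS type the same check could be applied to each frame in a sequence, but that is an extrapolation from the slide's "frame throughput" wording.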

Slide 354

Slide 354 text

GreenWeb: QoS Web Language Extensions 93 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu)

Slide 358

Slide 358 text

GreenWeb: QoS Web Language Extensions 93 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Example: button:QoS {onclick: single}

Slide 359

Slide 359 text

GreenWeb: QoS Web Language Extensions 93 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Example: button:QoS {onclick: single} (callout: Selector)

Slide 360

Slide 360 text

GreenWeb: QoS Web Language Extensions 93 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Example: button:QoS {onclick: single} (callouts: Selector, {QoS Declaration})

Slide 361

Slide 361 text

GreenWeb: QoS Web Language Extensions 94 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Example: button:QoS {onclick: single} (callouts: Selector, {QoS Declaration}) ▸ Semantics: QoS is evaluated by a single frame latency when clicking the button

Slide 362

Slide 362 text

GreenWeb: QoS Web Language Extensions 95 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Examples: button:QoS {onclick: single} and button:QoS {onclick: continuous}

Slide 363

Slide 363 text

GreenWeb: QoS Web Language Extensions 95 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Use default QoS targets: button:QoS {onclick: single} and button:QoS {onclick: continuous}

Slide 364

Slide 364 text

GreenWeb: QoS Web Language Extensions 95 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Use default QoS targets: button:QoS {onclick: single} and button:QoS {onclick: continuous} ▸ button:QoS {onclick: continuous, 20, 100}

Slide 365

Slide 365 text

GreenWeb: QoS Web Language Extensions 95 [Outline: Understanding Mobile QoS, Abstracting Mobile QoS, Expressing Abstractions] ▸ QoS Type: performance metric ▹ Single (frame latency) vs. Continuous (frame throughput) ▸ QoS Target: threshold performance values ▹ Imperceptible target (Ti) vs. Usable target (Tu) ▸ Use default QoS targets: button:QoS {onclick: single} and button:QoS {onclick: continuous} ▸ Overwrite default targets: button:QoS {onclick: continuous, 20, 100}
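
To make the rule syntax concrete, here is a small parser sketch for the three example rules on this slide. It is an illustration only, not the GreenWeb implementation: the `QoSRule` type and `parse_qos_rule` function are hypothetical, the grammar is inferred from the examples alone, and reading the two optional numbers as Ti followed by Tu is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Grammar inferred from the slide examples only:
#   <selector>:QoS {<event>: <type>[, <Ti>, <Tu>]}
_RULE = re.compile(
    r"^\s*(?P<selector>[^:{}\s]+):QoS\s*"
    r"\{\s*(?P<event>\w+)\s*:\s*(?P<body>[^{}]*)\}\s*$"
)


@dataclass(frozen=True)
class QoSRule:
    selector: str
    event: str
    qos_type: str            # "single" or "continuous"
    t_i: Optional[float]     # None means "use the default target"
    t_u: Optional[float]


def parse_qos_rule(text: str) -> QoSRule:
    match = _RULE.match(text)
    if match is None:
        raise ValueError(f"not a QoS rule: {text!r}")
    parts = [p.strip() for p in match.group("body").split(",")]
    qos_type = parts[0]
    if qos_type not in ("single", "continuous"):
        raise ValueError(f"unknown QoS type: {qos_type!r}")
    if len(parts) == 1:
        # No explicit targets: defaults apply.
        return QoSRule(match.group("selector"), match.group("event"), qos_type, None, None)
    if len(parts) == 3:
        # Assumed order: imperceptible target first, usable target second.
        return QoSRule(match.group("selector"), match.group("event"), qos_type,
                       float(parts[1]), float(parts[2]))
    raise ValueError("expected either no targets or exactly two targets")


default_rule = parse_qos_rule("button:QoS {onclick: single}")
custom_rule = parse_qos_rule("button:QoS {onclick: continuous, 20, 100}")
```

Leaving the targets as `None` when they are omitted keeps the choice of defaults with the runtime, which matches the slide's distinction between using and overwriting default targets.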

Slide 367

Slide 367 text

Design Space Exploration (DSE) Setup 96 ▸ Webpages selected by Principal Component Analysis (PCA) ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)

Slide 368

Slide 368 text

Design Space Exploration (DSE) Setup 96 ▸ Webpages selected by Principal Component Analysis (PCA) ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total) [Figure: scatter plot of webpages, PC1 vs. PC2 (log scale)]

Slide 369

Slide 369 text

Design Space Exploration (DSE) Setup 96 ▸ Webpages selected by Principal Component Analysis (PCA) ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total) [Figure: scatter plot of webpages, PC1 vs. PC2 (log scale); annotation: dominated by # webpage elements]

Slide 370

Slide 370 text

Design Space Exploration (DSE) Setup 96 ▸ Webpages selected by Principal Component Analysis (PCA) ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total) [Figure: scatter plot of webpages, PC1 vs. PC2 (log scale); annotation: dominated by IPC]

Slide 371

Slide 371 text

Design Space Exploration (DSE) Setup 96 ▸ Webpages selected by Principal Component Analysis (PCA) ▹ PCs calculated from webpage-inherent and µarch-dependent features (~400 in total) [Figure: scatter plot of webpages, PC1 vs. PC2 (log scale)]
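
As a rough illustration of the PCA step, the sketch below computes the first principal component of a tiny feature matrix with power iteration and projects each page onto it. It is a toy under stated assumptions: the two features (element count and IPC) and their values are made up, the features are not standardized, and the slides do not say which PCA implementation or preprocessing was used.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(rows: Matrix) -> Matrix:
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_component(cov: Matrix, iters: int = 200) -> List[float]:
    """Dominant eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered: Matrix, v: List[float]) -> List[float]:
    """Score of each row along direction v."""
    return [sum(a * b for a, b in zip(r, v)) for r in centered]


# Hypothetical pages described by (number of elements, IPC).
pages = [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0], [40.0, 4.0]]
centered_pages = center(pages)
pc1 = first_component(covariance(centered_pages))
scores = project(centered_pages, pc1)
```

With roughly 400 features of very different scales, standardizing each feature before computing the covariance would normally matter; it is omitted here to keep the sketch short.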

Slide 372

Slide 372 text

Design Considerations 97 How large should the scratchpad memory be?

Slide 373

Slide 373 text

Design Considerations 97 How large should the scratchpad memory be? [Figure: diagram of style rules (Rule i, Rule j), their properties (Prop k, Prop l, Prop m), and the resulting styles (Style k, Style l, Style m)]

Slide 374

Slide 374 text

Design Considerations 97 How large should the scratchpad memory be? [Figure: total coverage (%) vs. RLP]

Slide 377

Slide 377 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP]

Slide 378

Slide 378 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have?

Slide 379

Slide 379 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have? [Figure: diagram of style rules (Rule i, Rule j), their properties (Prop k, Prop l, Prop m), and the resulting styles (Style k, Style l, Style m)]

Slide 380

Slide 380 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have? [Figure: total CSS properties (%) vs. PLP]

Slide 381

Slide 381 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have? [Figure: total CSS properties (%) vs. PLP]

Slide 382

Slide 382 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have? [Figure: total CSS properties (%) vs. PLP]

Slide 383

Slide 383 text

Design Considerations 97 How large should the scratchpad memory be? ~1 KB [Figure: total coverage (%) vs. RLP] How many compute lanes should an SRU have? 32 Lanes [Figure: total CSS properties (%) vs. PLP]
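
Both sizing questions appear to be answered by reading a coverage curve: pick the smallest resource size that covers most of the observed cases. A minimal sketch of that selection follows. The function name, the histogram, and the 90% threshold are hypothetical and are not the measurements behind the slide's ~1 KB and 32-lane choices.

```python
from typing import Dict


def smallest_size_for_coverage(histogram: Dict[int, int], target_pct: float) -> int:
    """Smallest size whose cumulative share of cases reaches target_pct.

    histogram maps a required size (for example, the number of properties
    that could be processed in parallel) to how many cases need that size.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram is empty")
    covered = 0
    for size in sorted(histogram):
        covered += histogram[size]
        if 100.0 * covered / total >= target_pct:
            return size
    return max(histogram)


# Made-up distribution: most cases need few lanes, a small tail needs many.
plp_histogram = {8: 50, 16: 30, 32: 15, 96: 5}
lanes = smallest_size_for_coverage(plp_histogram, target_pct=90.0)
```

The long tail is the reason for stopping short of full coverage: in this made-up data, going from 95% to 100% would triple the lane count.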

Slide 384

Slide 384 text

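SRU Integration 98 [Figure: SRU integration across three layers (Hardware Layer, API Layer, Runtime Layer). Labels: pipeline stages IF, ID, EX, MEM, WB; functional units ALU, MUL, FPU, SRU; API call Style_apply(DOMNodeId, matchedRules); Software Failsafe; SRU Access; ISA support]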
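
The "Software Failsafe" label suggests that the runtime can fall back to a software path when the SRU cannot be used. The sketch below shows one way such a wrapper could look. Everything in it is an assumption for illustration: the `SRUUnavailable` exception, the callable standing in for the hardware path, and the simplified "later rule wins" merge are not taken from the slides.

```python
from typing import Callable, Dict, List, Optional

Style = Dict[str, str]
SRUCall = Callable[[int, List[Style]], Style]


class SRUUnavailable(Exception):
    """Raised by the (hypothetical) hardware path when it cannot serve a request."""


def style_apply_software(dom_node_id: int, matched_rules: List[Style]) -> Style:
    """Software path: merge matched rules, letting later rules override earlier ones."""
    style: Style = {}
    for rule in matched_rules:
        style.update(rule)
    return style


def style_apply(dom_node_id: int, matched_rules: List[Style],
                sru: Optional[SRUCall] = None) -> Style:
    """Try the hardware path first, then fall back to software."""
    if sru is not None:
        try:
            return sru(dom_node_id, matched_rules)
        except SRUUnavailable:
            pass  # failsafe: continue on the software path
    return style_apply_software(dom_node_id, matched_rules)


rules = [{"color": "red"}, {"color": "blue", "margin": "0"}]
resolved = style_apply(1, rules)
```

Keeping the fallback inside one entry point means callers use the same call whether or not the hardware unit is present.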

Slide 385

Slide 385 text

Evaluation Methodology 99

Slide 386

Slide 386 text

Evaluation Methodology 99 ▸ Fully synthesized using Synopsys 28 nm toolchain

Slide 387

Slide 387 text

Evaluation Methodology 99 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ 24 representative webpages

Slide 389

Slide 389 text

Evaluation Methodology 99 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ 24 representative webpages: desktop and mobile versions of www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com

Slide 390

Slide 390 text

Evaluation Results 100

Slide 391

Slide 391 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain

Slide 392

Slide 392 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s)]

Slide 393

Slide 393 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); point labeled: A15-like design]

Slide 394

Slide 394 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization]

Slide 395

Slide 395 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization; annotation: 18.6%]

Slide 396

Slide 396 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization; annotations: 18.6%, 22.2%]

Slide 397

Slide 397 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%]

Slide 398

Slide 398 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%, 22.2%]

Slide 399

Slide 399 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization; annotations: 18.6%, 22.2%, 9.2%, 22.2%]

Slide 400

Slide 400 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization; annotations: 29.2%, 47.0%]

Slide 401

Slide 401 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ Cost of specialization: 0.59 mm² area overhead [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization; annotations: 29.2%, 47.0%]

Slide 402

Slide 402 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ Cost of specialization: 0.59 mm² area overhead ▸ Better than scaling-up approaches [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization]

Slide 403

Slide 403 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ Cost of specialization: 0.59 mm² area overhead ▸ Better than scaling-up approaches [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization, I$]

Slide 404

Slide 404 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ Cost of specialization: 0.59 mm² area overhead ▸ Better than scaling-up approaches [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization, D$]

Slide 405

Slide 405 text

Evaluation Results 100 ▸ Fully synthesized using Synopsys 28 nm toolchain ▸ Cost of specialization: 0.59 mm² area overhead ▸ Better than scaling-up approaches [Figure: energy (J) vs. load time (s); points labeled: A15-like design, Customization, Specialization, I+D$]

Slide 406

Slide 406 text

Energy-Efficiency Plateaued 101 [Figure: energy efficiency across smartphone models by year: Motorola Droid (2009), Galaxy S (2010), Nexus (2011), Galaxy S3 (2012), Galaxy S4 (2013), Galaxy S5 (2014), Galaxy S6 (2015)]

Slide 407

Slide 407 text

Energy-Efficiency Plateaued 102 [Figure: energy efficiency across smartphone models, 2009 to 2015, measured with Coremark and SPEC CPU 2006]