Slide 1

Slide 1 text

WebCore: Architectural Support for Mobile Web Browsing
Yuhao Zhu, Vijay Janapa Reddi
Department of Electrical and Computer Engineering, The University of Texas at Austin
ISCA MainTalk, June 18th, 2014

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Swift

Slide 8

Slide 8 text

No content

Slide 15

Slide 15 text

The Fundamental Challenges
Achieving High Performance Demanded by End-User
Conserving Energy Due to Limited Battery Capacity
Conflicting requirements: How to achieve high performance with low energy?
WebCore: A mobile architecture

Slide 17

Slide 17 text

Executive Summary
[Figure: plot with axes Time and Energy showing General Purpose Designs, annotated "Diminishing return"]

Slide 19

Slide 19 text

Executive Summary
[Figure: plot with axes Time and Energy; labels: General Purpose Designs, ASIC?]
ASIC? Extremely challenging
‣Chrome: 7M LoC, 29 languages
‣Firefox: 10M LoC, 33 languages

Slide 22

Slide 22 text

Executive Summary
[Figure: plot with axes Time and Energy; labels: General Purpose Designs, ASIC?, WebCore Goal, ???]

Slide 26

Slide 26 text

Executive Summary
[Figure: plot with axes Time and Energy; labels: General Purpose Designs, WebCore Goal]
Customizing µarch Parameters
Specialized FU and Memory

Slide 30

Slide 30 text

Agenda of Today’s Talk
▸Motivation of our work: energy-efficiency of the mobile Web
▸How does WebCore improve the energy-efficiency?
▹Customization
▹Specialization
▸Evaluation Results
▸Related Work

Slide 37

Slide 37 text

Customization: Find the Ideal General Purpose Baseline Architecture
▸Why customization?!?
▸What is a proper general purpose baseline architecture?
▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
▹Are existing general purpose mobile designs ideal?
▸Exhaustive design space exploration

Slide 40

Slide 40 text

Design Space Exploration (DSE) Setup
▸Integrated power model (McPAT) and x86 full-system performance simulator (Marss86)
▸WebKit engine in the Chromium Web browser

Slide 45

Slide 45 text

Design Space Exploration (DSE) Setup
▸Webpage selection using PCA
▹PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
[Figure: scatter plot of webpages, PC1 (ticks -5 to 5) vs. PC2 (log scale, 10^-4 to 10^1); callout: "dominated by # webpage elements"]

Slide 46

Slide 46 text

Design Space Exploration (DSE) Setup
▸Webpage selection using PCA
▹PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
[Figure: scatter plot of webpages, PC1 (ticks -5 to 5) vs. PC2 (log scale, 10^-4 to 10^1); callout: "dominated by IPC"]
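
The PCA-based selection step can be illustrated with a minimal sketch. It is not the authors' tooling: the three features, their values, the pure-Python power iteration, and the greedy farthest-point heuristic in `pick_representatives` are all assumptions made for illustration (the talk uses roughly 400 features and does not describe how pages are picked from the PC space).

```python
import math
import random

def standardize(rows):
    """Center each feature column and scale it to unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / n for b in range(d)]
            for a in range(d)]

def top_components(cov, k, iters=500, seed=0):
    """Power iteration with deflation: returns k leading eigenvectors."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                  for a in range(d))
        comps.append(v)
        for a in range(d):  # deflate: remove the component just found
            for b in range(d):
                cov[a][b] -= lam * v[a] * v[b]
    return comps

def project(rows, comps):
    """Coordinates of each row along each principal component."""
    return [[sum(r[j] * c[j] for j in range(len(c))) for c in comps]
            for r in rows]

def pick_representatives(scores, count):
    """Greedy farthest-point selection in PC space (assumed heuristic)."""
    chosen = [0]
    while len(chosen) < count:
        best = max((i for i in range(len(scores)) if i not in chosen),
                   key=lambda i: min(math.dist(scores[i], scores[c])
                                     for c in chosen))
        chosen.append(best)
    return chosen

# Hypothetical pages: [DOM nodes, CSS rules, IPC]. Values are made up.
pages = [[200, 50, 1.1], [220, 55, 1.0], [1500, 400, 0.7],
         [1600, 420, 0.75], [800, 200, 0.9]]
z = standardize(pages)
comps = top_components(covariance(z), k=2)
scores = project(z, comps)
picked = pick_representatives(scores, 3)
print(picked)
```

Farthest-point selection spreads the chosen pages across the PC1/PC2 plane, which is one plausible way to obtain a small but diverse benchmark set.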

Slide 53

Slide 53 text

Design Space Exploration (DSE) Findings
▸Out-of-order µarchitecture is much more flexible
▸In-order cores are acceptable if end-users can tolerate latency
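
One common way to read the output of an exhaustive exploration is to keep only the designs that are not beaten on both load time and energy. The sketch below shows that filter under stated assumptions: the design names and numbers are hypothetical, and in the talk's setup each point would instead come from a Marss86 and McPAT run.

```python
def pareto_front(points):
    """Return the points not dominated in both time and energy (lower is better)."""
    front = []
    best_energy = float("inf")
    # Walk from fastest to slowest; a point survives only if it uses
    # strictly less energy than every faster (or equally fast) point.
    for p in sorted(points, key=lambda p: (p["time"], p["energy"])):
        if p["energy"] < best_energy:
            front.append(p)
            best_energy = p["energy"]
    return front

# Hypothetical design points (time in seconds, energy in joules).
designs = [
    {"name": "A", "time": 2.4, "energy": 0.60},
    {"name": "B", "time": 2.0, "energy": 0.70},
    {"name": "C", "time": 2.1, "energy": 0.90},  # dominated by B
    {"name": "D", "time": 1.7, "energy": 1.00},
    {"name": "E", "time": 1.7, "energy": 1.10},  # dominated by D
]

front = pareto_front(designs)
print([p["name"] for p in front])
```

Computing one frontier per core type (in-order and out-of-order) and comparing how far each extends along the time axis is one way to see which µarchitecture offers the wider range of trade-offs.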

Slide 55

Slide 55 text

Understand the Difference Using Kernel Knowledge
[Figure: execution time breakdown]

Slide 62

Slide 62 text

Understand the Difference Using Kernel Knowledge
[Figure panels: In-order design; Out-of-order design]
▸In-order designs show strong kernel variance
▸An out-of-order design can accommodate kernel variance

Slide 71

Slide 71 text

Customization: Identifying Major Sources of Energy Inefficiency
[Figure: design points P1 and P2]

Parameter               P1      P2      ARM A15
Issue width             1       3       3
# Function units        2       3       8
Load queue size         4       16      16
Store queue size        4       16      16
BTB size                1024    128     256
ROB size                128     128     40+
L1 I-$ size (KB)        64      128     32
# Physical registers    128     140     ?
L1 D-$ size (KB)        8       64      32
L2-$ size (KB)          256     1024    <4096

▸Instruction delivery
▸Data feeding

Slide 72

Slide 72 text

Agenda of Today’s Talk
▸Motivation of our work: energy-efficiency of the mobile Web
▸How does WebCore improve the energy-efficiency?
▹Customization
▹Specialization
-Mitigating instruction delivery inefficiency: Style resolution unit (SRU)
-Improving data feeding: Browser engine cache
▸Evaluation Results
▸Related Work

Slide 84

Slide 84 text

WebCore Specialization Overview
Hardware Layer
▸Customized core: IF, ID, MEM, WB pipeline stages; execution units ALU, MUL, FPU, SRU
▸L1 D-cache
▸Browser Engine Cache
API Layer
▸Style_apply(Id);
▸DOM_LD(Id, &attr);
▸DOM_ST(Id, &attr);
Runtime Layer
▸Cache Management
▸SRU Access
▸Software Failsafe
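
To make the layering concrete, here is a toy model of how DOM_LD and DOM_ST might interact with a browser engine cache and a software fallback. Everything beyond the three API names on the slide is an assumption: the capacity, the FIFO eviction, the write-through policy, and the idea that the failsafe runs on a cache miss are illustrative choices, not details taken from the talk.

```python
class BrowserEngineCacheModel:
    """Toy model of a small cache keyed by DOM node id (all policies assumed)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.lines = {}               # node id -> attribute dict
        self.backing = backing_store  # stands in for the software-managed DOM
        self.hits = 0
        self.fallbacks = 0

    def dom_ld(self, node_id):
        """Model of DOM_LD: serve from the cache, else take the software path."""
        if node_id in self.lines:
            self.hits += 1
            return self.lines[node_id]
        # Assumed failsafe: read through ordinary software, then fill the cache.
        self.fallbacks += 1
        attrs = self.backing[node_id]
        self._fill(node_id, attrs)
        return attrs

    def dom_st(self, node_id, attrs):
        """Model of DOM_ST: write-through to the backing store (assumed)."""
        self.backing[node_id] = attrs
        self._fill(node_id, attrs)

    def _fill(self, node_id, attrs):
        if node_id not in self.lines and len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))  # FIFO eviction (assumed)
            del self.lines[oldest]
        self.lines[node_id] = attrs


# Hypothetical DOM with three nodes and a two-entry cache.
dom = {1: {"padding": "0"}, 2: {"width": "36 px"}, 3: {"margin": "0"}}
cache = BrowserEngineCacheModel(capacity=2, backing_store=dom)
for node in (1, 1, 2, 3, 1):
    cache.dom_ld(node)
print(cache.hits, cache.fallbacks)
```

The point of the sketch is the control flow: the fast path never touches software-managed state, and correctness does not depend on the cache because the fallback can always serve the request.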

Slide 85

Slide 85 text

Agenda of Today’s Talk
▸Motivation of our work: energy-efficiency of the mobile Web
▸How does WebCore improve the energy-efficiency?
▹Customization
▹Specialization
-Mitigating instruction delivery inefficiency: Style resolution unit (SRU)
-Improving data feeding: Browser engine cache
▸Evaluation Results
▸Related Work

Slide 87

Slide 87 text

Style Resolution Unit
▸Style kernel is the most critical kernel
[Figures: execution time breakdown; energy consumption breakdown]

Slide 93

Slide 93 text

Style Resolution Unit
▸Style kernel is the most critical kernel

for (each rule in matchedRules) {          // Rule-level Parallelism (RLP)
  for (each property in rule) {            // Property-level Parallelism (PLP)
    switch (property.id) {
      case Font:
        Style[Font] = Handler(property.value, DOMNode);
        break;
      case N: ...
    }
  }
}

▸Exploiting the parallelism to increase the arithmetic intensity and reduce instruction footprint

Slide 104

Slide 104 text

Style Resolution Unit (2)
▸A running example from www.cnn.com

Style Rules
Rule    Property 1 id    Property 1 value    Property 2 id    Property 2 value
1       padding          0                   margin           0
2       padding          6 px                width            36 px

Final Style Info: three (id, value) slots, Property 1 to Property 3, filled from the rules above (padding, width, margin). Both rules set padding, so the value from the rule marked "High priority" is the one that is kept.

▸Order Matters in RLP
▸Order Does Not Matter in PLP
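
The two ordering observations can be checked with a short sketch built on the running example's two rules. It is a simplified software model, not the SRU itself, and it assumes rules are applied from low to high priority so that a later rule overrides an earlier one; this transcript does not record which of the two rules is the one marked high priority, so both orders are shown.

```python
def resolve_style(matched_rules):
    """Apply rules from low to high priority (assumed ordering convention).

    Each rule is a list of (property_id, value) pairs.
    """
    final = {}
    for rule in matched_rules:       # rule order matters: later rules override
        for prop_id, value in rule:  # order inside a rule does not matter
            final[prop_id] = value
    return final

rule1 = [("padding", "0"), ("margin", "0")]
rule2 = [("padding", "6 px"), ("width", "36 px")]

rule2_wins = resolve_style([rule1, rule2])
rule1_wins = resolve_style([rule2, rule1])
shuffled = resolve_style([rule1[::-1], rule2[::-1]])

print(rule2_wins)
print(rule1_wins)
```

Swapping the two rules changes the final padding, which is why rule-level parallelism needs conflict resolution by priority. Reversing the properties inside each rule leaves the result unchanged, because a single rule sets each property at most once here, so properties can be handled in any order or in parallel.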

Slide 112

Slide 112 text

Style Resolution Unit (3)
[Figure: SRU datapath. Input Scratchpad Memory (rule entries Rule i and Rule j with ids and start/end pointers; property entries Prop k, Prop m, Prop l; one rule marked "Higher Priority"), Conflict Resolution, Compute Lanes, Output Scratchpad Memory (Style k, Style m, Style l)]
▸Order Matters in RLP
▸Order Does Not Matter in PLP

Slide 113

Slide 113 text

Agenda of Today’s Talk
▸Motivation of our work: energy-efficiency of the mobile Web
▸How does WebCore improve the energy-efficiency?
▹Customization
▹Specialization
▸Evaluation Results
▸Related Work

Slide 116

Slide 116 text

Evaluations
▸Fully synthesized using Synopsys 28 nm toolchain
▸24 representative webpages (desktop and mobile versions of 12 sites):
www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk, www.twitter.com, www.espn.go.com,
www.bbc.co.uk, www.slashdot.org, www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com

Slide 124

Slide 124 text

Evaluations
[Figure: Energy (J) vs. Load Time (s) for three designs: A15-like design, Customization, Specialization. Callouts added with Customization: 18.6%, 22.2%. Callouts added with Specialization: 22.2%, 9.2%.]

Slide 125

Slide 125 text

Evaluations
[Figure: Energy (J) vs. Load Time (s) for three designs: A15-like design, Customization, Specialization. Callouts: 29.2%, 47.0%.]

Slide 130

Slide 130 text

Evaluations
[Figure: Energy (J) vs. Load Time (s) for the A15-like design, Customization, and Specialization, with scaling-up points labeled I$, D$, and I+D$]
▸Cost of specialization: 0.59 mm² area overhead
▸Better than scaling-up approaches

Slide 135

Slide 135 text

Related Work
[Figure: quadrant chart with axes Hardware vs. Software and Focus on Performance vs. Focus on Energy-Efficiency]
▸Algorithm-level parallelization: Zoomm, Mozilla Servo
▸System-level optimizations: redundancy removal, prefetching, big/little scheduling
▸ASIC: Tegra 4 WebRTC accelerator, SiChrome
▸WebCore

Slide 136

Slide 136 text

Conclusions
▸The Web browser has become a general purpose platform that supports a wide range of mobile Web applications
▸Customization allows us to find the ideal general-purpose baseline architecture
▸Hardware/software collaborative specialization leverages application knowledge to mitigate inefficiencies in general-purpose architectures

Slide 137

Slide 137 text

Thank you