WebCore: Architectural Support for Mobile Web Browsing

ISCA 2014 Main talk

Yuhao Zhu

June 18, 2014
Transcript

  1. WebCore:
    Architectural Support for Mobile Web Browsing
    Yuhao Zhu, Vijay Janapa Reddi
    Department of Electrical and Computer Engineering
    The University of Texas at Austin
    ISCA Main Talk, June 18th, 2014

  8. The Fundamental Challenges
    How to achieve high performance with low energy?
    ▸Achieving high performance demanded by end-users
    ▸Conserving energy due to limited battery capacity
    ▸These are conflicting requirements
    WebCore: a mobile architecture

  10. Executive Summary
    [Figure: plot with Time and Energy axes showing general-purpose designs]
    ▸General-purpose designs: diminishing return

  12. Executive Summary
    [Figure: the same Time/Energy plot with an "ASIC?" point added]
    ▸ASIC? Extremely challenging
    ‣Chrome: 7M LoC, 29 languages
    ‣Firefox: 10M LoC, 33 languages

  15. Executive Summary
    [Figure: the same Time/Energy plot with general-purpose designs, "ASIC?", and the WebCore goal; the route to the goal is marked "???"]

  19. Executive Summary
    [Figure: the same Time/Energy plot, moving from general-purpose designs to the WebCore goal]
    ▸Customizing µarch parameters
    ▸Specialized FU and memory

  24. Agenda of Today’s Talk
    ▸Motivation of our work: energy efficiency of the mobile Web
    ▸How does WebCore improve energy efficiency?
    ▹Customization
    ▹Specialization
    ▸Evaluation Results
    ▸Related Work

  32. Customization: Find the Ideal General-Purpose Baseline Architecture
    ▸Why customization?
    ▸What is a proper general-purpose baseline architecture?
    ▹Out-of-order (Silvermont, A15) or in-order (Saltwell, A7)?
    ▹Are existing general-purpose mobile designs ideal?
    ▸Exhaustive design space exploration
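
The exhaustive exploration named on this slide can be sketched as a brute-force sweep followed by a filter. The sketch below is only an illustration under stated assumptions: the parameter ranges in `SPACE`, the `Design` fields, and the `toy_cost` formula are hypothetical stand-ins for real simulator runs, and keeping only the designs that no other design beats in both time and energy (a Pareto filter) is an assumed selection rule that the slides do not spell out.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Design:
    issue_width: int
    rob_size: int
    l1i_kb: int


# Hypothetical parameter ranges, chosen only to keep the sweep small.
SPACE = {
    "issue_width": [1, 2, 3],
    "rob_size": [32, 64, 128],
    "l1i_kb": [16, 32, 64],
}


def toy_cost(design):
    """Hypothetical stand-in for one simulator run; returns (time, energy)."""
    time = 4.0 / design.issue_width + 64.0 / design.rob_size + 16.0 / design.l1i_kb
    energy = 0.2 * design.issue_width + 0.002 * design.rob_size + 0.004 * design.l1i_kb
    return time, energy


def enumerate_designs(space):
    """Yield every combination of parameter values (the exhaustive sweep)."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield Design(**dict(zip(names, values)))


def pareto_frontier(points):
    """Keep the (design, time, energy) points that no other point dominates."""
    frontier = []
    for design, t, e in points:
        dominated = any(
            t2 <= t and e2 <= e and (t2 < t or e2 < e)
            for _, t2, e2 in points
        )
        if not dominated:
            frontier.append((design, t, e))
    return sorted(frontier, key=lambda p: p[1])  # fastest design first


points = [(d,) + toy_cost(d) for d in enumerate_designs(SPACE)]
for design, t, e in pareto_frontier(points):
    print(design, round(t, 3), round(e, 3))
```

Sorting the surviving designs by time makes the trade-off visible: moving down the list, each design is slower but costs no more energy than the one before it.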

  34. Design Space Exploration (DSE) Setup
    ▸Power model (McPAT) integrated with an x86 full-system performance simulator (Marss86)
    ▸WebKit engine in the Chromium Web browser

  40. Design Space Exploration (DSE) Setup
    ▸Webpage selection using PCA
    ▹PCs calculated from webpage-inherent and µarch-dependent features (~400 in total)
    [Figure: scatter plot of webpages with PC1 on a linear axis and PC2 on a log axis; plot annotations: "dominated by # webpage elements" and "dominated by IPC"]
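
A minimal sketch of the PCA step, assuming each webpage is described by a row of numeric features. It uses only the standard library and finds just the first principal component by power iteration; the feature values in `pages` are hypothetical, and a real study with ~400 features would normally standardize the features first and use a linear-algebra library. How pages are then chosen from the PC space is not stated on the slide, so the sketch stops at the per-page scores.

```python
import math


def center(rows):
    """Subtract the per-feature mean from every row."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, m = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
        for i in range(m)
    ]


def first_pc(cov, iters=200):
    """Dominant eigenvector of cov via power iteration.

    Assumes the start vector is not orthogonal to that eigenvector.
    """
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all-zero covariance: no variance to explain
        v = [x / norm for x in w]
    return v


def project(centered, pc):
    """Score of every row along the given principal component."""
    return [sum(a * b for a, b in zip(row, pc)) for row in centered]


# Hypothetical feature rows: one row per webpage, one column per feature.
pages = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(pages)
pc1 = first_pc(covariance(centered))
scores = project(centered, pc1)
print(pc1, scores)
```

Pages with similar scores behave similarly along that component, which is what makes a small set of pages representative of a larger one.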

  46. Design Space Exploration (DSE) Findings
    ▸Out-of-order µarchitecture is much more flexible
    ▸In-order cores are acceptable if end-users can tolerate latency

  55. Understand the Difference Using Kernel Knowledge
    [Figure: per-kernel execution time breakdown for an in-order design and an out-of-order design]
    ▸In-order designs show strong kernel variance
    ▸An out-of-order design can accommodate kernel variance

  64. Customization: Identifying Major Sources of Energy Inefficiency
    [Figure: design space with design points P1 and P2 marked]
    Parameter               P1     P2     ARM A15
    Issue width             1      3      3
    # Function units        2      3      8
    Load queue size         4      16     16
    Store queue size        4      16     16
    BTB size                1024   128    256
    ROB size                128    128    40+
    L1 I-$ size (KB)        64     128    32
    # Physical registers    128    140    ?
    L1 D-$ size (KB)        8      64     32
    L2-$ size (KB)          256    1024   <4096
    ▸Instruction delivery
    ▸Data feeding

  65. Agenda of Today’s Talk
    ▸Motivation of our work: energy efficiency of the mobile Web
    ▸How does WebCore improve energy efficiency?
    ▹Customization
    ▹Specialization
    -Mitigating the instruction delivery bottleneck: Style Resolution Unit (SRU)
    -Improving data feeding: browser engine cache
    ▸Evaluation Results
    ▸Related Work

  77. WebCore Specialization Overview
    [Figure: three-layer diagram of WebCore]
    ▸Hardware layer: customized core (IF, ID, EX, MEM, WB) with ALU, MUL, FPU, and SRU units in the execute stage; L1 D-cache and browser engine cache
    ▸API layer: Style_apply(Id); DOM_LD(Id, &attr); DOM_ST(Id, &attr);
    ▸Runtime layer: cache management, SRU access, software failsafe
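
One way to picture how the runtime layer's cache management and software failsafe could sit behind the `DOM_LD` / `DOM_ST` calls is sketched below. Only the two call names come from the slide; everything else is an assumption made for illustration: the `BrowserEngineCache` and `Runtime` classes, the LRU replacement policy, the write-through store, and the use of a plain dictionary as the software copy of the DOM.

```python
from collections import OrderedDict


class BrowserEngineCache:
    """Toy model (assumed): small LRU store keyed by (node id, attribute)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, key):
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True, self.lines[key]
        self.misses += 1
        return False, None

    def fill(self, key, value):
        self.lines[key] = value
        self.lines.move_to_end(key)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


class Runtime:
    """Toy runtime layer (assumed): tries the cache, falls back to software."""

    def __init__(self, dom, cache):
        self.dom = dom  # software copy: {node_id: {attr: value}}
        self.cache = cache

    def dom_ld(self, node_id, attr):
        hit, value = self.cache.lookup((node_id, attr))
        if hit:
            return value
        value = self.dom[node_id][attr]  # software failsafe path on a miss
        self.cache.fill((node_id, attr), value)
        return value

    def dom_st(self, node_id, attr, value):
        self.dom[node_id][attr] = value  # write-through (assumed policy)
        self.cache.fill((node_id, attr), value)


dom = {1: {"width": 10}, 2: {"width": 20}}
runtime = Runtime(dom, BrowserEngineCache(capacity=1))
runtime.dom_ld(1, "width")
runtime.dom_ld(1, "width")
print(runtime.cache.hits, runtime.cache.misses)
```

The point of the failsafe path is that a miss is never an error: the access simply completes through ordinary software, so correctness does not depend on what the specialized cache happens to hold.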

  86. Style Resolution Unit
    ▸Style kernel is the most critical kernel
    [Figure: execution time breakdown and energy consumption breakdown]
    for (each rule in matchedRules) {        // Rule-level Parallelism (RLP)
      for (each property in rule) {          // Property-level Parallelism (PLP)
        switch (property.id) {
          case Font:
            Style[Font] = Handler(property.value, DOMNode);
            break;
          case N: ...
        }
      }
    }
    ▸Exploiting the parallelism to increase the arithmetic intensity and reduce instruction footprint
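
The loop nest on this slide can be made runnable in a few lines. This is a sketch rather than browser code: the handler functions `px` and `font`, the `HANDLERS` table that replaces the `switch`, and the representation of a rule as a dictionary are all hypothetical, and the sketch assumes that rules arrive in ascending priority so that a later rule overwrites an earlier one.

```python
def px(value, dom_node):
    """Hypothetical handler: turn '36 px' or '0' into an integer pixel count."""
    return int(value.split()[0])


def font(value, dom_node):
    """Hypothetical handler: resolve 'inherit' against the DOM node."""
    return dom_node["inherited_font"] if value == "inherit" else value


# Dispatch table standing in for the switch over property ids.
HANDLERS = {"font": font, "padding": px, "margin": px, "width": px}


def resolve_style(matched_rules, handlers, dom_node):
    """Apply every property of every matched rule to build the final style."""
    style = {}
    for rule in matched_rules:                 # assumed: ascending priority
        for prop_id, value in rule.items():
            handler = handlers.get(prop_id)
            if handler is None:
                continue                       # unknown property: skipped here
            style[prop_id] = handler(value, dom_node)
    return style


rules = [{"font": "inherit", "padding": "0"}, {"padding": "6 px"}]
print(resolve_style(rules, HANDLERS, {"inherited_font": "serif"}))
```

Each inner-loop iteration does little arithmetic but jumps to a different handler, which is consistent with the slide's point about instruction footprint.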

  97. Style Resolution Unit (2)
    ▸A running example from www.cnn.com
    Style Rules (one rule is marked "High priority")
      Rule   Property 1          Property 2
             id        value     id       value
      1      padding   0         margin   0
      2      padding   6 px      width    36 px
    Final Style Info (Property 1 to Property 3, each with an id and a value)
      padding (0 or 6 px, taken from the high-priority rule), width 36 px, margin 0
    ▸Order Matters in RLP
    ▸Order Does Not Matter in PLP
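
The two observations on this slide can be checked directly with the two rules from the example. The sketch assumes that a later rule in the list has higher priority; which of the two rules is actually the high-priority one is not recoverable from the transcript, so both orders are tried. `apply_rules` is a hypothetical helper, not a browser API.

```python
from itertools import permutations

# The two rules from the www.cnn.com example, as (property id, value) pairs.
RULE_1 = [("padding", "0"), ("margin", "0")]
RULE_2 = [("padding", "6 px"), ("width", "36 px")]


def apply_rules(rules_low_to_high):
    """Merge rules; a later (higher-priority) rule overwrites earlier values."""
    final = {}
    for rule in rules_low_to_high:
        for prop_id, value in rule:
            final[prop_id] = value
    return final


# Order matters across rules (RLP): swapping the rules changes the padding.
print(apply_rules([RULE_1, RULE_2])["padding"])
print(apply_rules([RULE_2, RULE_1])["padding"])

# Order does not matter within a rule (PLP): every property order agrees.
reference = apply_rules([RULE_1, RULE_2])
same = all(
    apply_rules([list(p1), list(p2)]) == reference
    for p1 in permutations(RULE_1)
    for p2 in permutations(RULE_2)
)
print(same)
```

Only `padding` appears in both rules, so it is the only property whose final value depends on rule order; `margin` and `width` come out the same either way.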

  105. Style Resolution Unit (3)
    ▸Order Matters in RLP
    ▸Order Does Not Matter in PLP
    [Figure: SRU datapath, built up in four stages]
    ▹Input scratchpad memory: rule entries (Rule i.id, Rule j.id, ...) with start/end fields, and their properties (Prop k, Prop l, Prop m, ...)
    ▹Conflict resolution: Rule i and Rule j both carry Prop m; the higher-priority copy is kept
    ▹Compute lanes: produce Style l, Style m, Style k
    ▹Output scratchpad memory
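
The four stages in this figure can be modelled in software to show why conflict resolution comes before the compute lanes. This is a behavioural sketch under assumptions, not the hardware design: the data layout, the `conflict_resolution` and `compute_lanes` functions, the numeric priorities (assumed unique per rule), and the batch-per-cycle timing are all hypothetical.

```python
def conflict_resolution(input_scratchpad):
    """Keep, for each property id, the value from the highest-priority rule.

    input_scratchpad: list of (priority, [(prop_id, value), ...]) entries.
    """
    winners = {}
    for priority, props in input_scratchpad:
        for prop_id, value in props:
            if prop_id not in winners or priority > winners[prop_id][0]:
                winners[prop_id] = (priority, value)
    return [(prop_id, value) for prop_id, (_, value) in winners.items()]


def compute_lanes(work_items, handler, num_lanes):
    """Run the surviving properties through the lanes, num_lanes per step.

    Returns the output scratchpad and the number of steps taken. Items in
    one step are independent, so their order inside the step is irrelevant.
    """
    output_scratchpad = {}
    steps = 0
    for start in range(0, len(work_items), num_lanes):
        for prop_id, value in work_items[start:start + num_lanes]:
            output_scratchpad[prop_id] = handler(prop_id, value)
        steps += 1
    return output_scratchpad, steps


# Rule i and Rule j both carry Prop m; Rule j is given the higher priority here.
scratchpad = [
    (1, [("k", "a"), ("m", "low")]),   # Rule i
    (2, [("m", "high"), ("l", "b")]),  # Rule j
]
survivors = conflict_resolution(scratchpad)
styles, steps = compute_lanes(survivors, lambda pid, v: v.upper(), num_lanes=2)
print(styles, steps)
```

Because only one copy of each property survives conflict resolution, the lanes never compute a style that would be overwritten, and the result no longer depends on the order in which rules were stored.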

  109. Evaluations
    ▸Fully synthesized using Synopsys 28 nm toolchain
    ▸24 representative webpages (desktop and mobile versions):
    www.amazon.com, www.cnn.com, www.msn.com, www.google.com.hk,
    www.twitter.com, www.espn.go.com, www.bbc.co.uk, www.slashdot.org,
    www.youtube.com, www.ebay.com, www.sina.com.cn, www.163.com

  118. Evaluations
    [Figure: Energy (J) vs. Load Time (s) for the A15-like design, Customization, and Specialization]
    ▸Annotations shown when the Customization point appears: 18.6% and 22.2%
    ▸Annotations added when the Specialization point appears: 9.2% and 22.2%
    ▸Annotations on the final build of the slide: 29.2% and 47.0%

  123. Evaluations
    [Figure: Energy (J) vs. Load Time (s) for the A15-like design, Customization, and Specialization, with I$, D$, and I+D$ points added]
    ▸Cost of specialization: 0.59 mm² area overhead
    ▸Better than scaling-up approaches (I$, D$, I+D$)

  128. Related Work
    [Figure: related work arranged along two axes, Hardware vs. Software and Focus on Performance vs. Focus on Energy-Efficiency]
    ▸Parallelization: algorithm-level, Zoomm, Mozilla Servo
    ▸System-level optimizations: redundancy removal, prefetching, big/little scheduling
    ▸ASIC: Tegra 4 WebRTC accelerator, SiChrome
    ▸WebCore

  129. Conclusions
    ▸The Web browser has become a general-purpose platform that supports a wide range of mobile Web applications
    ▸Customization allows us to find the ideal general-purpose baseline architecture
    ▸Hardware/software collaborative specialization leverages application knowledge to mitigate inefficiencies in general-purpose architectures