Slide 1

Slide 1 text

Bringing SIMD to the web via Dart
John McCutchan

Slide 2

Slide 2 text

Biography
● inotify
  ○ Linux kernel system for monitoring file systems for changes
● Port Bullet Physics to the PS3
  ○ SPUs are fun
● Optimizer for PS3 and PS Vita
  ○ Make games run faster
● PS4 CPU/GPU Expert
  ○ Hardware architecture and algorithms

Slide 3

Slide 3 text

Biography
● The Web
  ○ Dart
  ○ WebGL
  ○ HTML5

Slide 4

Slide 4 text

1. Structure
   a. Tool-visible type system
   b. Class-based, object-oriented
   c. Lexical this
   d. ; required
2. Performance
   a. Dart is designed to run fast by being less permissive
   b. New VM opens up new possibilities
      i. SIMD

Slide 5

Slide 5 text

Why can Dart run fast?
Some of the reasons...

Slide 6

Slide 6 text

Static Object Shape

JavaScript, where an object's shape can change at run time:

    function MyClass() {
      this.a = 1;
      this.b = "hey";
    }
    // Shape of MyClass: a, b

    foo = new MyClass();
    ...
    foo.c = 3.14159;
    // Shape of foo: a, b, c

Object shape change: previous optimizations invalid.

Slide 7

Slide 7 text

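Hole Free Arrays

    bigData = [];
    bigData[0] = 0.0;
    bigData[1] = 1.0;
    ...
    bigData[50] = 50.0;
    // Indices 0, 1, ..., 49, 50 hold 0.0, 1.0, ..., 49.0, 50.0

    bigData[20000] = 22.0;
    // Indices 0 to 50 hold 0.0 to 50.0 and index 20000 holds 22.0,
    // leaving a hole between index 50 and index 20000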

Slide 8

Slide 8 text

Distinction between growable and fixed sized arrays

    var growable = new List();   // length == 0
    var fixed = new List(200);

    for (int i = 0; i < fixed.length; i++) {
      // Safely query length only once
      // Bounds check hoisted out of loop
    }

    for (int i = 0; i < growable.length; i++) {
      // May have to query length many times
      // Bounds check inside the loop
    }
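The distinction above can be sketched in present-day Dart syntax. This is a minimal illustration, not code from the talk: the helper names makeFixed, makeGrowable and canGrow are hypothetical, and it assumes a null-safe Dart SDK, where (as far as I know) the slide's `new List(200)` form is no longer accepted and `List.filled` is the usual way to get a fixed-length list.

```dart
// Hypothetical helpers for illustration only (not from the talk).

// Fixed-length list: its length can never change after creation.
List<double> makeFixed(int n) => List<double>.filled(n, 0.0);

// Growable list: starts empty and may grow.
List<double> makeGrowable() => <double>[];

// Returns true if an element could be appended to the list.
bool canGrow(List<double> list) {
  try {
    list.add(1.0);
    return true;
  } on UnsupportedError {
    // Fixed-length lists reject add().
    return false;
  }
}

void main() {
  print(canGrow(makeFixed(200)));
  print(canGrow(makeGrowable()));
}
```

Because a fixed-length list can never be resized, its length is a loop invariant, which is what makes the hoisted bounds check on the slide possible.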

Slide 9

Slide 9 text

No prototype chain

    foo = new SomeClass();
    foo.someFunction();

    SomeClass.prototype           Respond to someFunction?
    ParentClass.prototype         Respond to someFunction?
    GrandParentClass.prototype    Respond to someFunction?

Slide 10

Slide 10 text

Distinction between integer and double numbers
● JavaScript only has double
  ○ Double arithmetic slower than Integer arithmetic
    ■ For mobile processors difference is greater
● Dart has both double and integer
  ○ Gives choice to developer

              Double   Integer   Double slowdown
    Multiply  6        2         3x
    Addition  4        1         4x
    Load      2        2         N/A
    Store     2        2         N/A

Source: http://infocenter.arm.com/help/index.jsp - Cortex A9 CPU

Slide 11

Slide 11 text

What is SIMD?
... and why does it matter?

Slide 12

Slide 12 text

What is SIMD?
Single Instruction Single Data (SISD)

    1.0 + 2.0 = 3.0

Slide 13

Slide 13 text

What is SIMD?
Single Instruction Multiple Data (SIMD)

    (1.0, 3.0, 5.0, 7.0) + (2.0, 4.0, 6.0, 8.0) = (3.0, 7.0, 11.0, 15.0)

Vector Processor

Slide 14

Slide 14 text

Why does SIMD matter?
● SIMD can provide substantial speedup to:
  ○ 3D Graphics
  ○ 3D Physics
  ○ Image Processing
  ○ Signal Processing
  ○ Numerical Processing

Slide 15

Slide 15 text

Why does SIMD matter to the web?
● SIMD can provide substantial speedup to:
  ○ WebGL
  ○ Canvas
  ○ Animation
  ○ Games
  ○ Physics

Slide 16

Slide 16 text

Why does SIMD matter to the web?
Console Games 1998
Web Games 2013

Slide 17

Slide 17 text

Why does SIMD matter?

Slide 18

Slide 18 text

Why does SIMD matter?
● SIMD requires fewer instructions to be executed
  ○ Fewer instructions mean longer battery life

Slide 19

Slide 19 text

Why does SIMD matter?
● Mozilla is attempting to automatically use SIMD in IonMonkey VM
  ○ Gaussian Blur sped up
    ■ https://bugzilla.mozilla.org/show_bug.cgi?id=832718
  ○ Based on pattern recognition
    ■ Programs must be written to patterns detectable by VM
  ○ "Automatic Vectorization"
    ■ Open research topic

Slide 20

Slide 20 text

SIMD in Dart

Slide 21

Slide 21 text

SIMD in Dart
● New types
  ○ Float32x4: 4 IEEE-754 32-bit Floating Point Numbers
  ○ Float32x4List: List of Float32x4
  ○ Uint32x4: 4 Unsigned 32-bit Integer Numbers
● Composable operations
  ○ Arithmetic
  ○ Logical
  ○ Comparisons
  ○ Reordering (shuffling)

Slide 22

Slide 22 text

SIMD in Dart
Float32x4
● +
● -
● /
● *
● sqrt (square root)
● reciprocal
● rsqrt (reciprocal square root)
● min
● max
● clamp
● abs (absolute value)

Lanes: x y z w
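A few of the operations listed above can be sketched with the Float32x4 type from dart:typed_data. This is a minimal sketch under assumptions: the function names rootsOfMagnitudes and clampToRange are hypothetical, and the method names used (abs, sqrt, clamp, and the Float32x4.splat constructor) are those of the released SDK, which may differ from the names on the slide.

```dart
import 'dart:typed_data';

// Hypothetical helper: square root of the absolute value, per lane.
Float32x4 rootsOfMagnitudes(Float32x4 v) => v.abs().sqrt();

// Hypothetical helper: clamp every lane to [lo, hi].
Float32x4 clampToRange(Float32x4 v, double lo, double hi) =>
    v.clamp(Float32x4.splat(lo), Float32x4.splat(hi));

void main() {
  final v = Float32x4(4.0, -9.0, 16.0, 25.0);

  // Each call operates on all four lanes at once.
  final roots = rootsOfMagnitudes(v); // lanes: 2, 3, 4, 5
  final clamped = clampToRange(roots, 2.5, 4.5); // lanes: 2.5, 3, 4, 4.5
  final floor = v.max(Float32x4.zero()); // lanes: 4, 0, 16, 25
  final ceiling = v.min(Float32x4.zero()); // lanes: 0, -9, 0, 0

  print([roots, clamped, floor, ceiling]);
}
```

The operations compose: each returns a new Float32x4, so calls can be chained without leaving vector form.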

Slide 23

Slide 23 text

Constructing

    var a = new Float32x4(1.0, 2.0, 3.0, 4.0);  // x = 1.0, y = 2.0, z = 3.0, w = 4.0
    var b = new Float32x4.zero();               // x = 0.0, y = 0.0, z = 0.0, w = 0.0

Slide 24

Slide 24 text

Accessing and Modifying Individual Elements

    var a = new Float32x4(1.0, 2.0, 3.0, 4.0);
    var b = a.x;           // 1.0
    var c = a.withX(5.0);  // (5.0, 2.0, 3.0, 4.0); a is still (1.0, 2.0, 3.0, 4.0)

Slide 25

Slide 25 text

Arithmetic

    var a = new Float32x4(1.0, 2.0, 3.0, 4.0);
    var b = new Float32x4(5.0, 10.0, 15.0, 20.0);
    var c = a + b;  // (6.0, 12.0, 18.0, 24.0)

Slide 26

Slide 26 text

Example

    double average(Float32List list) {
      var n = list.length;
      var sum = 0.0;
      for (int i = 0; i < n; i++) {
        sum += list[i];
      }
      return sum / n;
    }

Slide 27

Slide 27 text

Example

    double average(Float32x4List list) {
      var n = list.length;
      var sum = new Float32x4.zero();
      for (int i = 0; i < n; i++) {
        sum += list[i];  // adds four floats per iteration
      }
      // Horizontal add of the four lane sums
      var total = sum.x + sum.y + sum.z + sum.w;
      return total / (n * 4);
    }
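The SIMD average above takes a Float32x4List, so data held in a plain Float32List has to be reinterpreted first. The following is one possible sketch, not the talk's code: the name averageSimd is hypothetical, it assumes the list starts at a 16-byte-aligned offset in its buffer (true for a freshly allocated Float32List), and it adds a scalar loop for the leftover elements when the length is not a multiple of four.

```dart
import 'dart:typed_data';

// Hypothetical helper: averages a Float32List four lanes at a time.
double averageSimd(Float32List values) {
  if (values.isEmpty) return 0.0; // Assumption: an empty list averages to 0.

  // View the same bytes as a list of 4-wide vectors (no copy).
  final vectorCount = values.length ~/ 4;
  final vectors =
      Float32x4List.view(values.buffer, values.offsetInBytes, vectorCount);

  var sum = Float32x4.zero();
  for (var i = 0; i < vectorCount; i++) {
    sum += vectors[i];
  }
  var total = sum.x + sum.y + sum.z + sum.w;

  // Scalar tail for the 0 to 3 elements that do not fill a whole vector.
  for (var i = vectorCount * 4; i < values.length; i++) {
    total += values[i];
  }
  return total / values.length;
}

void main() {
  final data = Float32List.fromList([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);
  print(averageSimd(data));
}
```

Using a view keeps the data in one buffer, so scalar and SIMD code can share it without conversion.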

Slide 28

Slide 28 text

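Example

[Diagram: the twelve list values 1.0 3.0 7.0 7.0 2.0 5.0 6.0 8.0 3.0 7.0 11.0 15.0 are accumulated into four lane sums 17.0 24.0 16.0 18.0, which are then added together to give the total 75.0]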

Slide 29

Slide 29 text

SIMD in Dart
75% fewer loads
75% fewer adds
75% fewer stores
4 times faster!

Slide 30

Slide 30 text

The inner loop

    sum += list[i];

    ;; Load list[i]  (load 4 floats)
    0x4ccddcc  d1ff         sar edi, 1
    0x4ccddce  0f104c3807   movups xmm1,[eax+edi*0x1+0x7]
    0x4ccddd3  03ff         add edi,edi
    ;; sum +=  (add 4 floats)
    0x4ccddde  0f59ca       addps xmm2,xmm1

Slide 31

Slide 31 text

Shuffling

    var a = new Float32x4(1.0, 2.0, 3.0, 4.0);
    var b = a.xxyy;  // (1.0, 1.0, 2.0, 2.0)
    var c = a.wwww;  // (4.0, 4.0, 4.0, 4.0)
    var d = a.wzyx;  // (4.0, 3.0, 2.0, 1.0)
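The shuffles above use per-pattern getters such as a.xxyy. In the released dart:typed_data library the same reordering is, as far as I know, written with a single shuffle method that takes a mask constant. The sketch below assumes that API; the function name reverseLanes is hypothetical.

```dart
import 'dart:typed_data';

// Hypothetical helper: reverses lane order, (x, y, z, w) -> (w, z, y, x).
Float32x4 reverseLanes(Float32x4 v) => v.shuffle(Float32x4.wzyx);

void main() {
  final a = Float32x4(1.0, 2.0, 3.0, 4.0);

  final b = a.shuffle(Float32x4.xxyy); // lanes: 1, 1, 2, 2
  final c = a.shuffle(Float32x4.wwww); // lanes: 4, 4, 4, 4
  final d = reverseLanes(a); // lanes: 4, 3, 2, 1

  print([b, c, d]);
}
```

Each mask names, lane by lane, which source lane feeds the result, so Float32x4.wwww broadcasts the w lane to all four positions.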

Slide 32

Slide 32 text

Branching

    double max(double a, double b) {
      if (a > b) {
        return a;
      } else {
        return b;
      }
    }

    max(4.0, 5.0) -> 5.0

Slide 33

Slide 33 text

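Branching

    // Does not carry over to SIMD: comparing four lanes gives four
    // answers, not the single true/false that an if statement needs.
    Float32x4 max(Float32x4 a, Float32x4 b) {
      if (a > b) {
        return a;
      } else {
        return b;
      }
    }

    a = (1.0, 2.0, 3.0, 4.0)
    b = (0.0, 3.0, 5.0, 2.0)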

Slide 34

Slide 34 text

Branching

    Float32x4 max(Float32x4 a, Float32x4 b) {
      Uint32x4 greaterThan = a.greaterThan(b);
      return greaterThan.select(a, b);
    }

Step 1, the comparison produces a per-lane mask:

    a           = (1.0, 2.0, 3.0, 4.0)
    b           = (0.0, 3.0, 5.0, 2.0)
    greaterThan = (0xF, 0x0, 0x0, 0xF)

Slide 35

Slide 35 text

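Branching

    Float32x4 max(Float32x4 a, Float32x4 b) {
      Uint32x4 greaterThan = a.greaterThan(b);
      return greaterThan.select(a, b);
    }

Step 2, SELECT takes the lane from a where the mask is 0xF and from b where it is 0x0:

    greaterThan = (0xF, 0x0, 0x0, 0xF)
    a           = (1.0, 2.0, 3.0, 4.0)
    b           = (0.0, 3.0, 5.0, 2.0)
    result      = (1.0, 3.0, 5.0, 4.0)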
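The compare-then-select pattern above can be written as a runnable sketch. It assumes the released dart:typed_data API, in which (to my knowledge) the mask type is named Int32x4 rather than the Uint32x4 shown on the slide. The function name laneMax is hypothetical.

```dart
import 'dart:typed_data';

// Hypothetical helper: branch-free per-lane maximum.
Float32x4 laneMax(Float32x4 a, Float32x4 b) {
  // Each mask lane is all ones where a > b, and all zeros otherwise.
  final Int32x4 mask = a.greaterThan(b);
  // Take the lane from a where the mask is set, otherwise from b.
  return mask.select(a, b);
}

void main() {
  final a = Float32x4(1.0, 2.0, 3.0, 4.0);
  final b = Float32x4(0.0, 3.0, 5.0, 2.0);
  final m = laneMax(a, b); // lanes: 1, 3, 5, 4
  print(m);
}
```

The same mask-and-select shape works for any per-lane condition, which is how branches are generally expressed in SIMD code.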

Slide 36

Slide 36 text

How does the VM optimize for SIMD?
1. Unboxing
   a. Boxed -> allocated in memory
   b. Unboxed -> held in CPU registers
2. Replacing method calls with inlined machine instructions
   a. Allows values to remain unboxed (in registers)
   b. Avoids method call overhead

Slide 37

Slide 37 text

More Benchmarks

Slide 38

Slide 38 text

Wrap up

Slide 39

Slide 39 text

Wrap Up
● Dart SIMD has landed*
  ○ Try it out!
  ○ Use your entire CPU

Dart VM stretches the performance envelope.
Dart VM makes new, magical experiences possible.

Fewer Instructions
Faster Performance
Longer Battery
A better Web

Slide 40

Slide 40 text

Why does SIMD matter to the web?

Slide 41

Slide 41 text

Wrap Up
● The web needs SIMD if we want this:

Slide 42

Slide 42 text

Wait, what exactly is "fast"?
... and when will web programs be "fast"?

Slide 43

Slide 43 text

Fast The Web December, 2011 - Joel Webber's blog

Slide 44

Slide 44 text

Questions!
www.johnmccutchan.com
www.dartgamedevs.org
Follow me on Twitter @johnmccutchan
Circle me on Google+ gplus.to/cutch

Slide 45

Slide 45 text

Wrap Up
● SIMD References
  ○ Wikipedia
    ■ http://en.wikipedia.org/wiki/SIMD
  ○ Intel's site
    ■ http://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms
    ■ http://software.intel.com/en-us/articles/optimizing-the-rendering-pipeline-of-animated-models-using-the-intel-streaming-simd-extensions
  ○ ARM's site
    ■ http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/
  ○ www.gamasutra.com
  ○ www.gamedev.net

Slide 46

Slide 46 text

What is SIMD?