Slide 1

Slide 1 text

Hastega: Challenge for GPGPU on Elixir
Susumu Yamazaki (ZACKY)
Associate Professor, Univ. of Kitakyushu
Adviser at fukuoka.ex

Slide 2

Slide 2 text

Susumu Yamazaki (@ZACKY1972)
Associate Professor, Univ. of Kitakyushu
Adviser at fukuoka.ex
I traveled a whole day from Japan to give this presentation on Hastega.
I have only one year of experience with Elixir!

Slide 3

Slide 3 text

We are fukuoka.ex, a community in Fukuoka, Kyushu, Japan, with about 300 participating engineers.

Slide 4

Slide 4 text

We’re three Samurai from fukuoka.ex, like “Les Trois Mousquetaires” (The Three Musketeers).

Slide 5

Slide 5 text

We’ll show you
 our spirit of
 Samurai and Zen…

Slide 6

Slide 6 text

Our motivation

Slide 7

Slide 7 text

Data Explosion has come! 55.1 exabytes in 2013, and it’s growing exponentially.
(Chart: amount of data by year, as reported by Cisco, 2013)

Slide 8

Slide 8 text

We need more power! But CPU clock rates stopped improving over 15 years ago, around 2003.
(Chart: clock rate by year, from Computer Architecture: A Quantitative Approach)

Slide 9

Slide 9 text

We need more power!
(Figure: an Intel Core Extreme X-series CPU and an Intel Core i-series XE CPU compared by clock rate in GHz and number of cores)
• CPU clocks haven’t grown
• The number of cores is growing rapidly

Slide 10

Slide 10 text

We need more power!
• The number of cores is growing, but CPU clocks haven’t grown
• So we are required to do parallel computing

Slide 11

Slide 11 text

We need more power!
• But we have no effective parallel programming languages
• Multi-threaded programming is too hard for us to use correctly

Slide 12

Slide 12 text

Long, long distance… The world with
 destructive updates is…

Slide 13

Slide 13 text

Dystopia! © 2014, Teresa Prater

Slide 14

Slide 14 text

Their work becomes a
 dystopia! © 2011, Pavel Medzyun

Slide 15

Slide 15 text

The reason for dystopia
• Suppose some data are shared among several cores
(Figure: Core #1 and Core #2 share data holding the value 3.14)

Slide 16

Slide 16 text

The reason for dystopia
• If a core updates the shared data,
• then it notifies the other cores, and they stop processing
• This causes a slowdown
(Figure: Core #1 updates the shared data from 3.14 to 1.5 and notifies Core #2, which stops processing)

Slide 17

Slide 17 text

The reason for dystopia
• If there are many cores,
 the waiting time grows sharply
(Figure: Core #1 updates the shared data from 3.14 to 1.5 and notifies the other cores, which stop processing)

Slide 18

Slide 18 text

This causes dystopia! © 2014, Teresa Prater

Slide 19

Slide 19 text

Elixir is a drastic solution
• Elixir data is immutable
• That is, it forbids all updates of shared data
• Thus, the other cores don’t need to stop processing
(Figure: Core #1 does not update the shared data, 3.14, so Core #2 doesn’t need to stop processing)
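To make the immutability point concrete, here is a minimal Elixir sketch; the module and function names are hypothetical, chosen only for illustration. On the BEAM, an "update" always builds a new value, and each process works on its own copy of the data it was given, so one process can never observe a change made by another.

```elixir
defmodule ImmutabilityDemo do
  # "Updating" returns a new list; the list passed in is never modified.
  def update_first(list, value), do: List.replace_at(list, 0, value)

  # Two tasks receive the same list. One builds an updated version, the
  # other reads the head; neither can observe a change made by the other.
  def run_two_tasks(shared) do
    writer = Task.async(fn -> update_first(shared, 1.5) end)
    reader = Task.async(fn -> hd(shared) end)
    {Task.await(writer), Task.await(reader), shared}
  end
end

IO.inspect(ImmutabilityDemo.run_two_tasks([3.14, 2.71]))
# {[1.5, 2.71], 3.14, [3.14, 2.71]}
```

The reader always sees 3.14 and the caller's list is unchanged, which is why no core has to stop and wait for another core's write.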

Slide 20

Slide 20 text

Elixir and Phoenix are the most promising solution to the Data Explosion problem
• Java fails under request rates above 1,200 requests/sec
• Elixir withstands request rates of up to 1,800 requests/sec
• Server: a quad-core computer with 6 GB RAM; client: an eight-core computer with 12 GB RAM
(Chart: ↑ slower, ↓ faster)
Fedrecheski, G., Costa, L. C. P. and Zuffo, M. K.: Elixir programming language evaluation for IoT, 2016 IEEE International Symposium on Consumer Electronics (ISCE), pp. 105–106, DOI: 10.1109/ISCE.2016.7797392 (2016).

Slide 21

Slide 21 text

In 2016…

Slide 22

Slide 22 text

José creates Flow!!!
• It’s not magic! (by José)
• But I think it’s quite marvelous and fantastic magic!
• CPU- and/or IO-bound work can be parallelized and accelerated on multi-core CPUs
• Elixir got a stylish and powerful parallel computing technology
Single-process code with Enum:
  1..1_000_000
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))
Multi-process code with Flow:
  1..1_000_000
  |> Flow.from_enumerable()
  |> Flow.map(&foo(&1))
  |> Flow.map(&bar(&1))
  |> Enum.to_list()
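Flow is an external dependency. As a rough, stdlib-only point of comparison, the same pipeline can be parallelized with Task.async_stream/3. This is a minimal sketch, not how Flow works internally: foo/1 and bar/1 are hypothetical stand-ins, since the slide leaves them unspecified. Note that Flow does not promise to preserve input order, while Task.async_stream/3 does by default (ordered: true).

```elixir
defmodule ParallelPipeline do
  # Hypothetical stand-ins for the slide's foo/1 and bar/1.
  def foo(x), do: x * 2
  def bar(x), do: x + 1

  # Single-process version, as on the slide.
  def sequential(range), do: range |> Enum.map(&foo(&1)) |> Enum.map(&bar(&1))

  # Stdlib-only parallel analogue: Task.async_stream runs up to
  # System.schedulers_online() tasks at once and, with ordered: true
  # (the default), yields results in input order.
  def parallel(range) do
    range
    |> Task.async_stream(fn x -> x |> foo() |> bar() end, ordered: true)
    |> Enum.map(fn {:ok, y} -> y end)
  end
end

IO.inspect(ParallelPipeline.parallel(1..5))  # [3, 5, 7, 9, 11]
```

With one task per element and very cheap per-element work, process overhead can dominate, so this is best read as an illustration of the API shape rather than a benchmark candidate.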

Slide 23

Slide 23 text

After a few years…

Slide 24

Slide 24 text

An even more pessimistic prediction
 of the Data Explosion:
• 40 zettabytes (= 40,000 exabytes) in 2020
• 180 zettabytes (= 180,000 exabytes) in 2025
©2014 IDC

Slide 25

Slide 25 text

To summarize our motivation,
 I have a question: © Roy Blumenthal

Slide 26

Slide 26 text

Will Elixir and Phoenix really be enough for the future Data Explosion? © Roy Blumenthal

Slide 27

Slide 27 text

Our solution:
 Hastega!!!

Slide 28

Slide 28 text

Hastega
• It’s magic!!
• The ultimate spell for accelerating your party in Final Fantasy! (Stronger than Haste)
• It will be the ultimate technology for accelerating our machines in the Elixir ecosystem!
It’s inspired by Flow

Slide 29

Slide 29 text

Let’s cast Hastega on the Samurai

Slide 30

Slide 30 text

4-8x faster than Elixir using Flow, and also faster than P-map (parallel map)!
(Chart: ↑ slower, ↓ faster; series include Flow)
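The slide compares against P-map without showing it. The following is one common way to write a parallel map in Elixir, offered as a minimal sketch; it is not necessarily the P-map implementation that was benchmarked, and the module name PMap is hypothetical.

```elixir
defmodule PMap do
  # Classic "p-map": spawn one Task per element, then await each result in
  # input order. Simple, but it pays process start-up cost per element.
  def pmap(enum, fun) do
    enum
    |> Enum.map(fn x -> Task.async(fn -> fun.(x) end) end)
    |> Enum.map(&Task.await/1)
  end
end

IO.inspect(PMap.pmap(1..5, &(&1 * &1)))  # [1, 4, 9, 16, 25]
```

Spawning a process per element keeps the code short, but for cheap per-element work the spawn and await overhead can outweigh the parallel speedup.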

Slide 31

Slide 31 text

More than 3x faster than Python on the GPU (CuPy)!!!
(Chart: ↑ slower, ↓ faster)

Slide 32

Slide 32 text

Only a little slower than native code (Rust)!! A dead heat!!
(Chart: ↑ slower, ↓ faster)

Slide 33

Slide 33 text

Overwhelmingly effective, like a berserk Samurai

Slide 34

Slide 34 text

By the way,
 which is better:
 recursive calls or Enum.map?

Slide 35

Slide 35 text

Enum.map is simpler! It’s like Zen! © Stephane DAlu

Slide 36

Slide 36 text

Why is Enum.map Zen? © Stephane DAlu
• Zen is essential beauty
• The essence of programming is data transformation
• Enum.map describes only the data transformation
  list = 1..1_000_000 |> Enum.to_list()
  list
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))

Slide 37

Slide 37 text

Comparison
• Code A, B and C are equivalent.
• A: in the loop style in Java
• B: in the recursive call style
• C: using Enum.map
(A)
  int i;
  int[] array = new int[1000000];
  for (i = 0; i < 1000000; i++)
    array[i] = i + 1;
  for (i = 0; i < 1000000; i++)
    array[i] = foo(array[i]);
  for (i = 0; i < 1000000; i++)
    array[i] = bar(array[i]);
(B)
  1..1_000_000
  |> Enum.to_list()
  |> func()

  def func([]), do: []
  def func([head | tail]) do
    [head |> foo() |> bar() | func(tail)]
  end
(C)
  1..1_000_000
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))
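The equivalence claim can be checked directly for the two Elixir styles. This minimal sketch assumes hypothetical definitions of foo/1 and bar/1, since the slides don't give them, and compares the recursive version with the Enum.map version on the same input.

```elixir
defmodule Equivalence do
  # Hypothetical foo/1 and bar/1; the slides leave them unspecified.
  def foo(x), do: x * 3
  def bar(x), do: rem(x, 7)

  # Recursive call style, as on the slide.
  def func([]), do: []
  def func([head | tail]), do: [head |> foo() |> bar() | func(tail)]

  # The Enum.map style.
  def zen(list), do: list |> Enum.map(&foo(&1)) |> Enum.map(&bar(&1))
end

list = Enum.to_list(1..1_000)
IO.inspect(Equivalence.func(list) == Equivalence.zen(list))  # true
```

Because foo/1 and bar/1 are pure, applying them element by element in either style yields the same list in the same order.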

Slide 38

Slide 38 text

Why are they not Zen? © Stephane DAlu
• A loop describes the flow of processing, a loop counter and destructive updates
  int i;
  int[] array = new int[1000000];
  for (i = 0; i < 1000000; i++)
    array[i] = i + 1;
  for (i = 0; i < 1000000; i++)
    array[i] = foo(array[i]);
  for (i = 0; i < 1000000; i++)
    array[i] = bar(array[i]);

Slide 39

Slide 39 text

Why are they not Zen? © Stephane DAlu
• A recursive call describes not the data transformation but the flow of processing
  1..1_000_000
  |> Enum.to_list()
  |> func()

  def func([]), do: []
  def func([head | tail]) do
    [head |> foo() |> bar() | func(tail)]
  end
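The recursive func above is body-recursive: it builds each cons cell after the recursive call returns. A common alternative is tail recursion with an accumulator, reversed once at the end. This is a minimal sketch with hypothetical foo/1 and bar/1; the Erlang/OTP Efficiency Guide cautions that tail recursion is not automatically faster than body recursion for list building, so which variant wins should be measured, not assumed.

```elixir
defmodule TailRec do
  def foo(x), do: x + 1   # hypothetical
  def bar(x), do: x * 2   # hypothetical

  # Tail-recursive variant: accumulate results in reverse, then flip once.
  def func(list), do: do_func(list, [])

  defp do_func([], acc), do: :lists.reverse(acc)
  defp do_func([head | tail], acc), do: do_func(tail, [head |> foo() |> bar() | acc])
end

IO.inspect(TailRec.func([1, 2, 3]))  # [4, 6, 8]
```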

Slide 40

Slide 40 text

The Elixir Zen style
  1..1_000_000
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))
• We propose calling this style of writing with Enum.map the Elixir Zen style
• It is a good programming practice
• Because it’s more readable and maintainable

Slide 41

Slide 41 text

Then, what happens? © Stephane DAlu

Slide 42

Slide 42 text

Performance Evaluation
In Elixir on the Erlang VM, the Elixir Zen style is 20 percent slower than the recursive call
Elixir Zen style:
  list = 1..1_000_000 |> Enum.to_list()
  list
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))
Recursive call:
  list
  |> func()

  def func([]), do: []
  def func([head | tail]) do
    [head |> foo() |> bar() | func(tail)]
  end
(Chart: ↑ slower, ↓ faster; annotated 6 msec)
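To reproduce this kind of comparison yourself, a minimal timing harness can use :timer.tc/1, which returns the elapsed time in microseconds together with the result. The workload functions below are hypothetical, and a single run is noisy; a benchmarking library such as Benchee gives more reliable numbers.

```elixir
defmodule ZenBench do
  def foo(x), do: x * 2   # hypothetical workload
  def bar(x), do: x + 1   # hypothetical workload

  def zen(list), do: list |> Enum.map(&foo(&1)) |> Enum.map(&bar(&1))

  def func([]), do: []
  def func([head | tail]), do: [head |> foo() |> bar() | func(tail)]

  # Elapsed microseconds for a single call of fun.
  def time(fun) do
    {usec, _result} = :timer.tc(fun)
    usec
  end
end

list = Enum.to_list(1..1_000_000)
IO.puts("zen:       #{ZenBench.time(fn -> ZenBench.zen(list) end)} microseconds")
IO.puts("recursive: #{ZenBench.time(fn -> ZenBench.func(list) end)} microseconds")
```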

Slide 43

Slide 43 text

It’s Zen, but not Samurai…! It will lead to dystopia! © 2011, Pavel Medzyun

Slide 44

Slide 44 text

Hastega will…
• make Elixir Zen style code faster
• by casting its spell on the Samurai to turn it berserk
• that is, by transforming the code into the fastest native code, using all computing resources:
• not only multi-core CPUs (with SIMD instructions) but also GPUs
We feel Wabi-Sabi in it

Slide 45

Slide 45 text

Inspiration from Enum.map
• This code has a potential parallelism of 1,000,000:
• Each element is transformed by the composition of foo and bar
• There are no dependencies between elements
  1..1_000_000
  |> Enum.map(&foo(&1))
  |> Enum.map(&bar(&1))
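The observation that each element goes through the composition of foo and bar independently can be shown as map fusion: two Enum.map passes give the same result as one pass with bar(foo(x)). This minimal sketch uses hypothetical foo/1 and bar/1; the fused, per-element form is what lends itself to one GPU work-item per element.

```elixir
defmodule Fusion do
  # Hypothetical foo/1 and bar/1.
  def foo(x), do: x * x
  def bar(x), do: x - 1

  # Two passes, with an intermediate list between the stages.
  def two_pass(range), do: range |> Enum.map(&foo(&1)) |> Enum.map(&bar(&1))

  # One pass with the composed function bar(foo(x)). Each output element
  # depends only on its own input element, so the work is data-parallel.
  def fused(range), do: Enum.map(range, fn x -> x |> foo() |> bar() end)
end

IO.inspect(Fusion.fused(1..4))  # [0, 3, 8, 15]
```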

Slide 46

Slide 46 text

GPUs
• State-of-the-art GPUs have 3,000+ SIMD cores with 1.5+ GHz clocks

Slide 47

Slide 47 text

Principle of Hastega
• The code can easily be transformed into SIMD native code, such as OpenCL, which drives multi-core CPUs and GPUs
• Writing Hastega code is simple
• All you have to do is wrap the def blocks you want to optimize in a defhastega do block
  defhastega do
    def func do
      1..1_000_000
      |> Enum.map(&foo(&1))
      |> Enum.map(&bar(&1))
    end
  end
OpenCL:
  __kernel void calc(
      __global long* input,
      __global long* output) {
    size_t i = get_global_id(0);
    long temp = input[i];
    temp = foo(temp);
    temp = bar(temp);
    output[i] = temp;
  }
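To show the shape of the defhastega API without any GPU, here is a hypothetical stand-in macro. It is NOT the real Hastega: it simply emits the wrapped def blocks unchanged, so code written for defhastega still compiles and runs on the BEAM. The module names HastegaStub and Example are invented for this sketch.

```elixir
defmodule HastegaStub do
  # Hypothetical stand-in, NOT the real Hastega: it returns the wrapped
  # def blocks unchanged, so they are compiled as ordinary Elixir code.
  defmacro defhastega(do: block), do: block
end

defmodule Example do
  import HastegaStub

  def foo(x), do: x + 1
  def bar(x), do: x * 10

  defhastega do
    def func do
      1..5
      |> Enum.map(&foo(&1))
      |> Enum.map(&bar(&1))
    end
  end
end

IO.inspect(Example.func())  # [20, 30, 40, 50, 60]
```

A real implementation would inspect the quoted block at this point and replace the Enum.map chain with a call into generated native code.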

Slide 48

Slide 48 text

The performance of Hastega on Zen style code is much better than that of the recursive call
(Charts: ↑ slower, ↓ faster; series include Flow)

Slide 49

Slide 49 text

Hastega transforms the most beautiful code into the fastest code!

Slide 50

Slide 50 text

Demo
I’m sorry I can’t explain the details well in English. But I believe our common language is Elixir! Please feel our passion through the Elixir code…

Slide 51

Slide 51 text

The prototype code of Hastega is available at:
 https://github.com/zeam-vm/logistic_map
 (branch: range)

Slide 52

Slide 52 text

Inside Hastega
• Hastega has two subsystems that we are developing:
• SumMag: a meta-programming library
• Magicite: an Elixir-LLVM binding via NIFs written with Rustler
• Each code name comes from Final Fantasy (FF)

Slide 53

Slide 53 text

SumMag: a meta-programming library
• extracts the code block of each Enum.map in a series into a new function
• without writing a parser from scratch
• Thank you, José, for providing Elixir’s meta-programming infrastructure
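The extraction step can be sketched with Elixir's own macro tools. This is a hypothetical helper, NOT SumMag's API: it uses Macro.unpipe/1 to split a quoted pipeline into its stages and keeps the function argument of every Enum.map stage. Turning each extracted function into a new named function, as the slide describes, is left out.

```elixir
defmodule PipeExtract do
  # Hypothetical helper, NOT SumMag's API: given a quoted pipeline, return
  # the quoted function argument of every Enum.map stage, in order.
  def map_stages(quoted) do
    quoted
    |> Macro.unpipe()
    |> Enum.flat_map(fn
      {{{:., _, [{:__aliases__, _, [:Enum]}, :map]}, _, [fun]}, _pos} -> [fun]
      _other -> []
    end)
  end
end

ast =
  quote do
    1..1_000_000
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  end

ast
|> PipeExtract.map_stages()
|> Enum.map(&Macro.to_string/1)
|> IO.inspect()
```

Macro.unpipe/1 removes the piped-in argument from each call, which is why each Enum.map stage appears with a single argument: the function to extract.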

Slide 54

Slide 54 text

Magicite: an Elixir-LLVM binding via NIFs written with Rustler
• uses LLVM, the state-of-the-art compiler infrastructure, to generate native code
• LLVM is driven from Rust via Rustler, which in turn is invoked from Elixir
• You write only Elixir code, not Rust code

Slide 55

Slide 55 text

Code Viewing: Is Rust a common language for us?

Slide 56

Slide 56 text

Roadmap to Implement
• First, Hastega will support x86_64 CPUs,
• using SIMD instructions (but only on a single core)
• Next, it will support GPUs, including AMD and NVIDIA GPUs that support OpenCL,
• implemented by sending messages to a process that monopolizes communication with the GPU
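The "process monopolizing communication to a GPU" design can be sketched as a GenServer that owns the device, so every request is serialized through its mailbox. This is a hypothetical sketch: GpuServer is an invented name, and the actual OpenCL dispatch is replaced by a plain Enum.map on the CPU.

```elixir
defmodule GpuServer do
  # Hypothetical sketch: a single process owns the (simulated) device, so
  # all requests are serialized through its mailbox.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Callers send a list and a "kernel" function; it runs inside the server.
  def run(server, list, kernel), do: GenServer.call(server, {:run, list, kernel})

  @impl true
  def init(:ok), do: {:ok, %{jobs: 0}}

  @impl true
  def handle_call({:run, list, kernel}, _from, state) do
    # Placeholder for an OpenCL dispatch: here we just map on the CPU.
    result = Enum.map(list, kernel)
    {:reply, result, %{state | jobs: state.jobs + 1}}
  end
end

{:ok, pid} = GpuServer.start_link()
GpuServer.run(pid, [1, 2, 3], &(&1 * 2)) |> IO.inspect()  # [2, 4, 6]
```

Funneling all device access through one process avoids concurrent callers contending for the GPU, at the cost of that process becoming a potential bottleneck.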

Slide 57

Slide 57 text

Roadmap to Implement
• Supporting multi-core processing in Hastega may be a little difficult to implement on the current Erlang VM,
• because we observed that our prototypes are inefficient at starting and synchronizing new processes
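One common way to reduce process start-up and synchronization cost is to spawn one task per chunk rather than one per element. This minimal sketch (the module name ChunkedPMap is hypothetical) splits the input into about one chunk per scheduler and maps each chunk inside a single task.

```elixir
defmodule ChunkedPMap do
  # Split the input into roughly one chunk per scheduler, so only a handful
  # of processes are started and awaited instead of one per element.
  def map(list, fun, chunks \\ System.schedulers_online()) do
    size = max(div(length(list) + chunks - 1, chunks), 1)

    list
    |> Enum.chunk_every(size)
    |> Task.async_stream(fn chunk -> Enum.map(chunk, fun) end, ordered: true)
    |> Enum.flat_map(fn {:ok, mapped} -> mapped end)
  end
end

IO.inspect(ChunkedPMap.map(Enum.to_list(1..8), &(&1 * 3), 2))
# [3, 6, 9, 12, 15, 18, 21, 24]
```

The trade-off is load balance: with few large chunks, one slow chunk can delay the whole result, so the chunk count is a tuning parameter.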

Slide 58

Slide 58 text

Roadmap to Implement
• I’m also interested in supporting Metal and CUDA to achieve higher efficiency,
• and in load-balancing between CPUs and GPUs to make programming with Hastega easier

Slide 59

Slide 59 text

Roadmap to Implement
• In the future, I want to support not only server-side computing, for database manipulation and machine learning on servers,
• but also edge computing and web clients, with JS, WebAssembly, WebGL and WebGPU code generated from Elixir,
• for UI, computer vision and machine learning on the edge and/or web clients

Slide 60

Slide 60 text

We’ll create a new world, a paradise for
 all engineers

Slide 61

Slide 61 text

Our mission is to establish technologies, including Elixir, that keep us from dystopia, for the happiness of all people!!!

Slide 62

Slide 62 text

Do you want the power of Hastega now?

Slide 63

Slide 63 text

I’m sorry, but the first practical version of Hastega will be released before summer 2019 m(_ _)m

Slide 64

Slide 64 text

Conclusion
• We call the programming style using Enum.map “the Elixir Zen style”

Slide 65

Slide 65 text

Conclusion
• We should use Hastega to transform Elixir Zen style code into native code optimized for CPUs with SIMD instructions and GPUs, like a berserk Samurai

Slide 66

Slide 66 text

Conclusion
• Hastega will be released before summer 2019
 (I hope…)

Slide 67

Slide 67 text

Our adventure of Samurai, Zen and Wabi-Sabi continues…

Slide 68

Slide 68 text

I’ll be back to share new research results!

Slide 69

Slide 69 text

One more thing…

Slide 70

Slide 70 text

Other Samurai have released more Elixir products!!!
• Materia: a collection of powerful web authentication APIs that manage accounts, mail, errors and multi-transactions. https://github.com/karabiner-inc/materia
• Esuna: a data science platform built on Phoenix that lets you convert and aggregate data with a GUI, much like data manipulation with Python's pandas. What will happen when Esuna meets Hastega…!!! https://qiita.com/piacere_ex/items/ab0b32c521293d4ab38e

Slide 71

Slide 71 text

Materia and Esuna will be released
 by Karabiner Inc. We develop systems with Elixir/Phoenix and other technologies.
 https://www.karabiner.tech/

Slide 72

Slide 72 text

Follow #fukuokaex. You can join us online every month: https://fukuokaex.connpass.com/

Slide 73

Slide 73 text

We’ll be back! #fukuokaex