Hastega: Challenge for GPGPU on Elixir @ Lonestar ElixirConf 2019

We have implemented a demonstration program in which Elixir code directly invokes a GPGPU benchmark via Rustler. We propose Hastega (Hyper Accelerator of Spreading Tasks for Elixir with GPU Activation), a method that converts Elixir code using Enum/Flow into executable code for GPUs or multi-core CPUs with SIMD.
We evaluated an experimental GPGPU implementation of the Hastega method using a logistic map benchmark, with the following results:
Hastega is 4-8 times faster than pure Elixir running on the CPU only
Hastega is up to 3 times faster than CuPy/Python running on the GPU
Hastega is only 1.5 times slower than native code running on the GPU
We are now implementing linear regression and a neural network in Elixir, and will accelerate them with Hastega. Our main future work is to implement a compiler from Elixir code using Enum/Flow to native code for GPUs and/or multi-core CPUs with SIMD.

# Bio

Susumu Yamazaki (ZACKY) is currently an Associate Professor at the University of Kitakyushu. His current research projects focus on programming language processors, software engineering, programming education and social implementation of software systems.



Susumu Yamazaki (ZACKY)

March 02, 2019


  1. Hastega: Challenge for GPGPU on Elixir Susumu Yamazaki (ZACKY) Associate

    Professor at Univ. of Kitakyushu Adviser at fukuoka.ex
  2. Susumu Yamazaki @ZACKY1972, Associate Professor, Univ. of Kitakyushu, Adviser at fukuoka.ex

    I traveled for a whole day from Japan to give this presentation on Hastega. I have only one year of experience with Elixir!
  3. We are fukuoka.ex, in Fukuoka, Kyushu, Japan

    About 300 engineers have joined it
  4. We’re three Samurais from fukuoka.ex, like “Les Trois Mousquetaires” (The

    Three Musketeers)
  5. We’ll show you our spirit of Samurai and Zen…

  6. Our motivation

  7. The Data Explosion has come!

    55.1 exabytes in 2013. It’s growing exponentially.
    [Chart: amount of data by year, reported by Cisco, 2013]
  8. We need more power!

    But the growth of CPU clock rates ended over 15 years ago (around 2003)
    [Chart: clock rate by year; source: Computer Architecture: A Quantitative Approach]
  9. We need more power!

    [Table: Intel Core XE: clocks (GHz), cores]
    • CPU clocks haven’t grown
    • # of cores is growing rapidly
  10. We need more power!

    • # of cores is growing, but CPU clocks haven’t grown
    • This requires us to use parallel computing
  11. We need more power!

    • But we have no effective parallel programming languages
    • Multi-threaded programming is not yet mature enough to be used correctly
  12. Long, long distance… The world with
 destructive updates is…

  13. Dystopia! © 2014, Teresa Prater

  14. © 2011, Pavel Medzyun Their works become

  15. The reason for the dystopia

    • Suppose some data are shared among several cores
    [Figure: Shared Data (3.14), Core #1, Core #2]
  16. The reason for the dystopia

    • If a core updates the shared data
    • then it notifies the other cores, and they stop processing
    • This causes a slowdown
    [Figure: Shared Data (3.14 → 1.5), Core #1, Core #2; Update, Notify, Stop processing]
  17. The reason for the dystopia

    • If there are many cores, waiting time grows exponentially
    [Figure: Shared Data (3.14 → 1.5), Core #1, Core #2; Update, Notify, Stop processing]
  18. It causes dystopia! © 2014, Teresa Prater

  19. Elixir is a drastic solution

    • Data in Elixir is immutable
    • That is, it forbids all updates of shared data
    • Thus, the other cores don’t need to stop processing
    [Figure: Shared Data (3.14), Core #1, Core #2; Don’t Update, Don’t need to stop processing]
  20. Elixir and Phoenix are the most promising solution for the Data Explosion problem

    • Java breaks down under highly frequent requests, at more than 1,200 requests/sec
    • Elixir withstands highly frequent requests, at less than 1,800 requests/sec
    • Server: a quad-core computer with 6GB RAM
    • Client: an eight-core computer with 12GB RAM
    [Chart: ↑ Slower ↓ Faster]
    Fedrecheski, G., Costa, L. C. P. and Zuffo, M. K.: Elixir programming language evaluation for IoT, 2016 IEEE International Symposium on Consumer Electronics (ISCE), pp. 105–106 (online), DOI: 10.1109/ISCE.2016.7797392 (2016).
  21. In 2016…

  22. José creates Flow!!!

    • It’s not magic! (by José)
    • But I think it’s quite marvelous and fantastic magic!
    • CPU- and/or IO-bound work will be parallelized and accelerated by multi-core CPUs
    • Elixir got a stylish and powerful parallel computing technology

    Single-processing code with Enum:
    1..1_000_000
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))

    Multiple-processing code with Flow:
    1..1_000_000
    |> Flow.from_enumerable
    |> Flow.map(&foo(&1))
    |> Flow.map(&bar(&1))
    |> Enum.to_list
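
Flow is an external package, so the following is only a minimal sketch of the same idea using `Task.async_stream/3` from the Elixir standard library as a stand-in. The module name `ParallelSketch`, the bodies of `foo/1` and `bar/1`, and the chunk size are assumptions made for illustration; they do not come from the talk.

```elixir
defmodule ParallelSketch do
  # Hypothetical stand-ins for the slide's foo/1 and bar/1.
  def foo(x), do: x * 2
  def bar(x), do: x + 1

  # Single-process version, as in the Enum example on the slide.
  def sequential(enum) do
    enum
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  end

  # Multi-process version: split the input into chunks, map each chunk in
  # its own task, then concatenate the chunk results in input order.
  def parallel(enum, chunk_size \\ 10_000) do
    enum
    |> Stream.chunk_every(chunk_size)
    |> Task.async_stream(fn chunk ->
      Enum.map(chunk, fn x -> x |> foo() |> bar() end)
    end)
    |> Enum.flat_map(fn {:ok, mapped_chunk} -> mapped_chunk end)
  end
end

IO.inspect(ParallelSketch.parallel(1..10))
```

Chunking is used here so that the cost of starting a task is spread over many elements. Unlike this sketch, Flow does not guarantee that results come back in input order.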
  23. After a few years…

  24. A more pessimistic prediction of the Data Explosion

    40 zettabytes (= 40,000 exabytes) in 2020
    180 zettabytes (= 180,000 exabytes) in 2025
    ©2014 IDC
  25. © Roy Blumenthal To summarize our motivation,
 I have a

  26. © Roy Blumenthal Will Elixir and Phoenix really be enough

    for the future Data Explosion?
  27. Our solution:

  28. Hastega

    • It’s magic!!
    • The most highly evolved spell for accelerating our party in Final Fantasy! (Stronger than Haste)
    • It will be the most highly evolved technology for accelerating our machines in the Elixir ecosystem!
    • It’s inspired by Flow
  29. Let’s cast a spell of Hastega on Samurai

  30. 4-8x faster than Elixir using Flow! and also faster than

 (Parallel map) ↑ Slower ↓ Faster Flow
  31. 3x+ faster than Python with GPU (CuPy) !!! ↑ Slower

    ↓ Faster
  32. Only a little slower than native code (Rust)!! Dead heat!!

    ↑ Slower ↓ Faster
  33. Overwhelming Effectiveness like a berserk Samurai

  34. By the way,
 which is better, 
 recursive call and

  35. Enum.map is simpler! It’s like Zen! © Stephane D'Alu

  36. Why is Enum.map Zen? © Stephane D'Alu

    • Zen is essential beauty
    • The essence of programming is data transformation
    • Enum.map describes only data transformation

    list = 1..1_000_000 |> Enum.to_list
    list
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  37. Comparison

    • Code A, B and C are equivalent.
    • A: in the loop style in Java
    • B: in the recursive call style
    • C: using Enum.map

    (A)
    int i;
    int[] array = new int[1000000];
    for(i = 0; i < 1000000; i++)
      array[i] = i + 1;
    for(i = 0; i < 1000000; i++)
      array[i] = foo(array[i]);
    for(i = 0; i < 1000000; i++)
      array[i] = bar(array[i]);

    (B)
    1..1_000_000
    |> Enum.to_list
    |> func()

    def func( [] ), do: []
    def func( [ head | tail ] ) do
      [ head |> foo |> bar
      | func(tail) ]
    end

    (C)
    1..1_000_000
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
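
The claimed equivalence of the recursive style and the Enum.map style can be checked with a small runnable sketch. The module name `ZenComparison` and the bodies of `foo/1` and `bar/1` are placeholders assumed for illustration, since the slides do not define them.

```elixir
defmodule ZenComparison do
  # Hypothetical stand-ins for the slide's foo/1 and bar/1.
  def foo(x), do: x * 2
  def bar(x), do: x + 1

  # Recursive call style: walks the list one element at a time.
  def recursive([]), do: []

  def recursive([head | tail]) do
    [head |> foo() |> bar() | recursive(tail)]
  end

  # Enum.map style: describes only the two data transformations.
  def zen(list) do
    list
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  end
end

list = Enum.to_list(1..1_000)
IO.inspect(ZenComparison.recursive(list) == ZenComparison.zen(list))
```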
  38. Why are they not Zen? © Stephane D'Alu

    • A loop describes the flow of processing, a loop counter and destructive updates

    int i;
    int[] array = new int[1000000];
    for(i = 0; i < 1000000; i++)
      array[i] = i + 1;
    for(i = 0; i < 1000000; i++)
      array[i] = foo(array[i]);
    for(i = 0; i < 1000000; i++)
      array[i] = bar(array[i]);
  39. Why are they not Zen? © Stephane D'Alu

    • A recursive call describes not data transformation but the flow of processing

    1..1_000_000
    |> Enum.to_list
    |> func()

    def func( [] ), do: []
    def func( [ head | tail ] ) do
      [ head |> foo |> bar
      | func(tail) ]
    end
  40. The Elixir Zen style

    1..1_000_000
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))

    • We propose calling code written with Enum.map “the Elixir Zen style”
    • It is good programming practice
    • because it’s more readable and maintainable
  41. Then, what will happen? © Stephane D'Alu

  42. Performance Evaluation

    In Elixir on the Erlang VM, the Elixir Zen style is 20 percent slower than the recursive call

    list = 1..1_000_000 |> Enum.to_list

    list
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))

    list
    |> func()

    def func( [] ), do: []
    def func( [ head | tail ] ) do
      [ head |> foo |> bar
      | func(tail) ]
    end

    [Chart: execution time, ↑ Slower ↓ Faster; label: 6msec]
  43. © 2011, Pavel Medzyun It’s Zen, but not Samurai…! It

    will cause dystopia!
  44. Hastega will…

    • make Elixir Zen-style code faster
    • by casting its spell on the Samurai to make them berserk
    • that is, by transforming the code into the fastest native code, using all computing resources,
    • not only multi-core CPUs (with SIMD instructions) but also GPUs

    We feel Wabi-Sabi in it
  45. Inspiration from Enum.map

    • This code has potential 1,000,000-way parallelism:
    • Each element will be transformed by the composition of foo and bar
    • There are no dependencies between elements

    1..1_000_000
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
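
Both properties can be illustrated with a short sketch: two chained maps give the same result as one map over the composed function, and any single output element can be computed from its own input alone. The module name `FusionSketch` and the bodies of `foo/1` and `bar/1` are assumptions made for illustration.

```elixir
defmodule FusionSketch do
  # Hypothetical stand-ins for the slide's foo/1 and bar/1.
  def foo(x), do: x * 2
  def bar(x), do: x + 1

  # Two passes over the data, as written on the slide.
  def two_passes(enum) do
    enum
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  end

  # One pass applying the composition of foo and bar to each element.
  def fused(enum), do: Enum.map(enum, fn x -> x |> foo() |> bar() end)

  # A single output element needs only its own input element, which is
  # what would let every element be processed by a separate worker.
  def element(enum, index), do: enum |> Enum.at(index) |> foo() |> bar()
end

range = 1..1_000
IO.inspect(FusionSketch.two_passes(range) == FusionSketch.fused(range))
IO.inspect(FusionSketch.element(range, 499) == Enum.at(FusionSketch.fused(range), 499))
```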
  46. GPUs • The State-of-the-art GPUs have 3,000+ SIMD cores with

    1.5+GHz clocks
  47. Principle of Hastega

    • Enum.map code can easily be transformed to SIMD native code such as OpenCL, which drives multi-core CPUs and GPUs.
    • Writing Hastega code is simple
    • All you have to do is write defhastega with a do block that includes the def blocks you want to optimize

    defhastega do
      def func(list) do
        list
        |> Enum.map(&foo(&1))
        |> Enum.map(&bar(&1))
      end
    end

    __kernel void calc(
      __global long* input,
      __global long* output) {
      size_t i = get_global_id(0);
      long temp = input[i];
      temp = foo(temp);
      temp = bar(temp);
      output[i] = temp;
    }
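
How a macro can receive a `do` block containing `def` definitions can be sketched with plain Elixir macros. The macro `defaccel` below is hypothetical and is not Hastega's `defhastega`: it returns the block unchanged, at the point where a real implementation would presumably rewrite the Enum.map chain into native code. The module names and the bodies of `foo/1` and `bar/1` are also assumptions.

```elixir
defmodule AccelSketch do
  # Hypothetical macro for illustration only (not Hastega's defhastega).
  # It receives the quoted do block and returns it unchanged.
  defmacro defaccel(do: block) do
    block
  end
end

defmodule Demo do
  import AccelSketch

  defaccel do
    def func(list) do
      list
      |> Enum.map(&foo(&1))
      |> Enum.map(&bar(&1))
    end
  end

  # Hypothetical stand-ins for the slide's foo/1 and bar/1.
  def foo(x), do: x * 2
  def bar(x), do: x + 1
end

IO.inspect(Demo.func([1, 2, 3]))
```

Because the wrapped code stays ordinary Elixir, it still runs on the Erlang VM when no acceleration is applied.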
  48. Performance of Hastega from Zen is much better than recursive call

    [Charts: ↑ Slower ↓ Faster; one bar labeled Flow]
  49. Hastega makes the most beautiful code to be transformed the

  50. Demo

    I’m sorry that I can’t explain the details well in English. But I believe our common language is Elixir! Please feel our passion in the Elixir code…
  51. Prototype code of Hastega is available at:
 branch: range

  52. Inside of Hastega • Hastega has two subsystems that we

    are developing: • SumMag: a meta-programming library • Magicite: an Elixir-LLVM binding via NIFs in Rustler • each code name is from FF
  53. SumMag: a meta-programming library

    • to extract each code block of a series of Enum.map calls into a new function
    • without writing a parser from scratch
    • Thank you, José, for providing the meta-programming infrastructure of Elixir
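
The following is a minimal sketch of how Elixir's built-in meta-programming tools can find a series of Enum.map calls without a hand-written parser. It is not SumMag's actual code; the module name `MapChainSketch` and the function `mappers/1` are hypothetical. It quotes a pipeline and walks the resulting AST with `Macro.prewalk/3`.

```elixir
defmodule MapChainSketch do
  # Collect the function argument of every piped `Enum.map(fun)` call
  # found in a quoted expression, in pipeline order.
  def mappers(ast) do
    {_ast, acc} =
      Macro.prewalk(ast, [], fn
        {{:., _, [{:__aliases__, _, [:Enum]}, :map]}, _, [fun]} = node, acc ->
          {node, [fun | acc]}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(acc)
  end
end

ast =
  quote do
    list
    |> Enum.map(&foo(&1))
    |> Enum.map(&bar(&1))
  end

# Print each extracted mapper function as source text.
ast
|> MapChainSketch.mappers()
|> Enum.map(&Macro.to_string/1)
|> IO.inspect()
```

The pattern only matches the one-argument form that appears inside a pipe; a direct call such as `Enum.map(list, fun)` has two arguments and would need its own clause.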
  54. Magicite: an Elixir-LLVM binding via NIFs in Rustler • using

    the state-of-the-art compiler infrastructure, LLVM • for generating native code • commanded by Rust via Rustler • invoked by Elixir • You’ll write only Elixir code, not Rust code
  55. Code Viewing: Is Rust a common language for us?

  56. Roadmap to Implement • Firstly, Hastega will support x86_64 CPUs,

    • using SIMD instructions (but on only a single core) • Next, it will support GPUs including AMD and NVIDIA, • which support OpenCL, • implemented by messaging to a process monopolizing communication to a GPU
  57. Roadmap to Implement • Supporting multi-core processing in Hastega •

    may be a little difficult • to implement in current Erlang VM • because we observed that • our prototypes are inefficient • to start and to synchronize new processes
  58. Roadmap to Implement

    • I’m also interested
    • in supporting Metal and CUDA
    • to achieve high efficiency,
    • and in load-balancing between CPUs and GPUs
    • to make programming with Hastega easier
  59. Roadmap to Implement

    • In the future, I want to implement
    • not only server-side computing
    • for database manipulation and machine learning on servers
    • but also edge computing and web clients
    • via JS, WebAssembly, WebGL and WebGPU
    • generated from Elixir
    • for UI, computer vision and machine learning on the edge and/or web clients
  60. We’ll create the new world, the paradise of
 all of

  61. Our mission is to establish the technologies, including Elixir, to

    save us from dystopia, for the happiness of all people!!!
  62. Do you wanna get power of Hastega, now?

  63. I’m sorry, but the first version of Hastega for practical use

    will be released before summer 2019 m(_ _)m
  64. Conclusion • We would call the programming style using Enum.map

    • “the Elixir Zen style”
  65. Conclusion

    • We should use Hastega
    • to transform Elixir Zen-style code
    • into native code
    • optimized for CPUs with SIMD instructions and for GPUs
    • like a berserk Samurai
  66. Conclusion • Hastega will be released before Summer, 2019

  67. Our adventure of Samurai, Zen and Wabi-Sabi continues…

  68. I’ll be back! to share new research results

  69. One more thing…

  70. Other Samurais have released more Elixir products!!!

    Materia: A collection of powerful web authentication APIs for managing accounts, mail, errors and multi-transactions. https://github.com/karabiner-inc/materia
    Esuna: A data science platform built on Phoenix, enabling you to convert and aggregate data with a GUI. It offers data manipulation similar to Python's pandas. What will happen if Esuna meets Hastega…!!! https://qiita.com/piacere_ex/items/ab0b32c521293d4ab38e
  71. Materia and Esuna will be released
 from Karabiner.inc We are

    developing systems with Elixir / Phoenix and others.
  72. Follow #fukuokaex. You can join us online every month

  73. We’ll be back! #fukuokaex