
Creating a serverless Python environment for scientific computing with WebAssembly, for data scientists and Python lovers

Jeongkyu Shin
September 01, 2020

Summary (English)

Machine learning and data science demand ever more computational resources. High-performance computing resources are concentrating in ultra-large-scale computing environments and the cloud. At the same time, more and more people are entering the machine learning and data science fields. It is not only cluster computing resources that are growing; the computing power of personal computers is also advancing rapidly. So, is there a way to use this local environment to perform scientific calculations easily, anywhere? All the more so if you are still a beginner and do not yet need distributed processing or large-scale pipelines!

In this talk, you will learn how to use a Python-based scientific computing and machine learning environment through a browser engine, without server-side computation. The talk describes how the technological changes surrounding the web environment in recent years led to the Iodide project, an experimental project of the Mozilla Foundation to build a scientific computing environment based on WebAssembly. After that, I introduce the Pyodide project, which uses Iodide to build a WebAssembly-based Python environment, along with its build process. Finally, I share my experience of using Pyodide to develop, as a personal project, an open-source desktop app and web page that provide a browser-only Python environment.

As a practical example, I use the platform described above to demonstrate several small scientific computations based on Python+WebAssembly: performing data analysis with Pandas in the browser and drawing graphs with Matplotlib. I also demonstrate training a simple machine learning model by linking this data directly with TensorFlow.js. Lastly, based on the performance measurements and the advantages and disadvantages seen during the demonstration, I discuss the technical limitations of the WebAssembly-based Python runtime and scientific computing environment, as well as the fascinating potential of Python to expand into new areas.

Summary (Korean)

머신러닝 및 데이터 과학 분야는 갈수록 많은 연산 자원을 요구합니다. 고성능 컴퓨팅 자원은 초대규모 연산 환경과 클라우드로 집중되고 있습니다. 동시에, 머신러닝 및 데이터 과학에 입문하는 사람들 또한 해가 갈수록 늘어나고 있습니다. 서버 연산 자원만 늘어나는 것이 아니라 개인용 컴퓨터의 연산 자원 또한 빠르게 증가하고 있습니다. 그럼 이 환경을 좀 더 쉽게 잘 써서 어디서나 과학 연산을 쉽게 해 볼 수 있는 방법은 없을까요? 아직 입문자라서 분산 처리나 대용량 파이프라인까진 안 가도 되는 경우라면 더더욱!

이 세션에서는 Python 기반의 과학 연산 환경 및 머신러닝 환경을 서버측 연산 없이 브라우저 엔진을 통해 사용하는 방법에 대해 알아봅니다. 최근 몇 년 동안의 웹환경을 둘러싼 기술적인 변화가, WebAssembly 기반의 과학 연산환경을 구축하려는 Mozilla 재단의 실험 프로젝트인 Iodide 프로젝트로 이어진 과정을 설명합니다. 그 후, iodide를 이용하여 WebAssembly기반의 Python 환경을 구축하는 Pyodide 프로젝트 및 빌드 과정을 소개합니다. Pyodide를 이용하여, 브라우저 단독으로 실행 가능한 Python 환경을 제공하는 오픈소스 데스크탑 앱 및 웹을 개인적으로 개발한 경험을 공유합니다.

실제 사용예를 위해, 위에서 설명한 플랫폼으로 Python+WebAssembly 기반의 여러가지 작은 과학 연산을 수행하는 과정을 브라우저 상에서 Pandas를 이용해 데이터 분석을 수행하고 Matplotlib 라이브러리를 이용해 그래프를 그리는 과정을 시연합니다. 또한 이렇게 나온 데이터를 TensorFlow.js와 바로 연동하여 간단한 머신러닝 모델을 훈련하는 과정을 데모합니다. 마지막으로는 앞에서 다룬 성능 측정 결과 및 시연 과정에서 보이는 몇가지 장단점을 바탕으로, WebAssembly 기반의 python 런타임 및 과학 연산 환경의 기술적인 한계를 이야기하고, 그럼에도 불구하고 매력적인 Python의 새로운 영역으로의 확장 가능성에 대해서도 다루어 보겠습니다.

Transcript

  1. Creating a serverless Python environment for scientific computing with WebAssembly, for data scientists and Python lovers
    Jeongkyu Shin
    Lablup Inc. / Google Developers Expert
  2. None
  3. None
  4. [1] https://openwho.org/channels/covid-19-national-languages [2] https://www.youtube.com/watch?v=PjhoPEUcrmI?t=33

  5. What I will talk about today
    • Python
    • Scientific computation / environments
    • Calculation resources
    • Web and Mozilla
    • Iodide Project
    • App shells
    • Demo
    • What to do (with you)
  6. Python
    • Not only the greatest language for beginners, but the greatest scientific language
    • Julia: Hello?
    • Slow but fast
    • GIL but scalable
    • Community-driven
    • Combines with many tools / frameworks / libraries
    • Can bind anything written with a keyboard
    [1] https://xkcd.com/353/
  7. Scientific Computation
    • How humankind consumes electricity:
    • Office work?
    • Games?
    • Adult video?
    • No! Human beings use energy for scientific computation!
    • From FORTRAN to Python
    • IMSL, BLAS, LAPACK, OpenMP
    • cuBLAS, cuLAPACK, cuSOLVER
    • GSL, ROOTS, Numerical Recipes
    • And numpy / scipy
  8. Scientific environments
    • Libraries
    • NumPy, SciPy, Pandas, Matplotlib, SciKit-Learn…
    • Platforms
    • Anaconda, Canopy, ActivePython, PyIMSL, Python(x,y)
    • Container Images
    • MLWorkspace, Backend.AI Scientific Kernels
  9. Burning fire:
    • Complex computation resources
    • CPU, GPU, ASICs
    • Drivers, Libraries
    • Ultra-scale computation resources
    • GPU Cloud
    • Distributed Clusters
    [1] https://cloud.google.com/blog/products/ai-machine-learning/cloud-tpu-pods-break-ai-training-records
  10. Cut the chicken with a sledge knife
    • We do not need the nuke
    • e.g. studying machine learning
    • A fraction of a GPU is enough (2GB)
    • In fact, no GPU is needed for studying
    • Wag the dog: scientific training workshop
    • Preparation: 2hr.
    • Training: 4hr.
    [1] http://www.inven.co.kr/board/webzine/2097/1177426 (Before modification)
    [2] https://www.yna.co.kr/view/GYH20090602001500044 (Note: typo now corrected)
  11. None
  12. Back to the future: battlefield web
    • JAVA / MS JAVA
    • Active X
    • Shockwave Adobe Flash
    • NaCl / PNaCl
    • For Chrome / ChromeOS extension
    • .EXE everywhere
    • Facepalm
    [1] https://en.wikipedia.org/wiki/Facepalm#/media/File:Paris_Tuileries_Garden_Facepalm_statue.jpg
  13. None
  14. Mozilla
    • Firefox
    • Rust
    • WebAssembly (WASM) by W3C
    • Can be a compilation target for low-level languages
    • Emscripten
    • LLVM-based toolchain for compiling to asm.js / WASM
    • So, what can we do with this?
  15. [1] https://www.pcgamesn.com/quake-live-ditching-web-browsers-standalone-client

  16. Iodide Project
    • An experimental tool for scientific communication and exploration on the web
    • Scientific computing
    • Data science
    • Why no web tech. for scientific computing?
    • JavaScript in the early 21st century
    • Are you serious?
    • Now: everyone uses Jupyter as the UI for the Python stack
    https://github.com/iodide-project
  17. Iodide to Pyodide
    • Web-based scientific environment
    • Complete notebook
    • Visualization
    • Everybody loves Python
    • Why don't we compile the scientific Python stack to WASM?
    • And run it in the JavaScript VM in the browser?
    • ?!
    • Does it work? Does it?
    • It works!
    https://github.com/iodide-project/pyodide
  18. Pyodide: The Python science stack in the browser
    • Created by Michael Droettboom
    • Python runtime + scientific packages compiled to WASM
    • NumPy, SciPy, Pandas, Matplotlib…
    • Pros
    • Instant, easy
    • Can combine magical ideas
    • Cons
    • Big & slow
  19. Pyodide: Packages (Aug. 2020). Only common packages are listed.

  20. Pyodide: Bootstrapping
        <!DOCTYPE html>
        <html>
        <head>
          <script type="text/javascript">
            window.languagePluginUrl = 'pyodide/';
          </script>
          <script src="pyodide/pyodide.js"></script>
        </head>
        <body>
          Runtime test:
          <div id="result-pane"></div>
          <script type="text/javascript">
            let resultPane = document.querySelector('#result-pane');
            // languagePluginLoader resolves once the Pyodide runtime is loaded
            languagePluginLoader.then(() => {
              resultPane.innerHTML = '';
              // runPython returns the value of the last expression
              let result = pyodide.runPython(`
        import sys
        sys.version
        `);
              resultPane.innerHTML = result;
            });
          </script>
        </body>
        </html>
  21. Pyodide: Bootstrapping
        let resultPane = document.querySelector('#result-pane');
        languagePluginLoader.then(() => {
          resultPane.innerHTML = '';
          let result = pyodide.runPython(`
        import sys
        sys.version
        `);
          resultPane.innerHTML = result;
        });
    • Note
    • Sometimes result will be 'undefined'
    • Timing issue (covered later)
  22. Practical problems
    • Everyday use
    • File system to store code / results
    • Python module loading: numpy, scipy, matplotlib
    • Web UI to work / study
    • Stand-alone / Portable
    • Iodide: written in Django
    • Needs installation on a server / web connections
    • Stand-alone / Portable solution
    • To keep usage scenarios simple
  23. Solutions for Practical problems
    • File system for runtime
    • Use BrowserFS (https://github.com/jvilk/BrowserFS)
        BrowserFS.install(window);
        BrowserFS.configure({ fs: "LocalStorage" }, function(e) {
          let fs = BrowserFS.BFSRequire('fs');
          fs.writeFileSync('/test.txt', 'Python+WebAssembly is Awesome!');
          languagePluginLoader.then(async () => {
            let FS = pyodide._module.FS;
            let PATH = pyodide._module.PATH;
            // Create an Emscripten interface for the BrowserFS
            let BFS = new BrowserFS.EmscriptenFS(FS, PATH);
            // Create mount point in Emscripten FS
            FS.createFolder(FS.root, 'data', true, true);
            // Mount BrowserFS into Emscripten FS
            FS.mount(BFS, {root: '/'}, '/data');
            // Open file in BrowserFS from python and show contents
            let result = await pyodide.runPythonAsync(`
        import numpy as np
        import sys
        import glob
        import js
        print(sys.version)
        print(np.__version__)
        f = open('/data/test.txt')
        print(f.readline())
        `);
          });
        });
    • Workflow
    • LocalStorage in this example
    • Use Emscripten for WASM backend
  24. Solutions for Practical problems
    • Module loading: use the Promise-ready running API
    • runPythonAsync automatically detects the packages the code imports and loads them dynamically.
        let resultPane = document.querySelector('#result-pane');
        languagePluginLoader.then(async () => {
          resultPane.innerHTML = '';
          let result = await pyodide.runPythonAsync(`
        import numpy as np
        import sys
        print(sys.version)
        np.__version__
        `)
          .then(result => {
            // Guard against the 'undefined' result caused by the timing issue
            if (typeof result !== "undefined") {
              resultPane.innerHTML = result;
            }
          });
        });
    Console log
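
The slide does not show how the automatic package detection works. One plausible approach, sketched here with only the standard library `ast` module, is to parse the code and collect the top-level module names it imports. `find_imports` is an illustrative helper written for this example, not necessarily what Pyodide does internally.

```python
import ast


def find_imports(code):
    """Return the sorted top-level module names imported by `code`."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            # "import a.b as c" -> "a"
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x"
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return sorted(names)


code = "import numpy as np\nimport sys\nfrom os import path\n"
modules = find_imports(code)
print(modules)
```

A loader could then compare these names against its list of available packages and fetch only the ones that are missing before running the code.
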
  25. Making / Testing Simple Python REPL IDE
    • Problem 1: cannot get stdout
    • Solution: manual stdout reading pipeline
        languagePluginLoader.then(() => {
          // Redirect Python's stdout to an in-memory buffer
          pyodide.runPython(`
        import sys
        import io
        sys.stdout = io.StringIO()
        `);
          ...
        });
    After executing each block:
        let stdout = pyodide.runPython("sys.stdout.getvalue()");
        let stdout_console = document.createElement('div');
        stdout_console.innerText = stdout;
        resultPane.appendChild(stdout_console);
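
The Python side of this pipeline can be tried outside the browser. The sketch below is a minimal stand-in, assuming each REPL block is executed with `exec` in a shared namespace; `run_block` is a hypothetical helper for illustration, not part of Pyodide.

```python
import io
import sys


def run_block(code, namespace):
    """Execute one block of code and return everything it printed."""
    previous = sys.stdout
    sys.stdout = buffer = io.StringIO()  # same redirection as on the slide
    try:
        exec(code, namespace)
    finally:
        sys.stdout = previous  # always restore the real stdout
    return buffer.getvalue()


namespace = {}  # shared between blocks, like a REPL session
first = run_block("x = 21\nprint('x is', x)", namespace)
second = run_block("print(x * 2)", namespace)
```

Here each block gets a fresh buffer, so no cleanup is needed. The approach on the slide keeps one long-lived buffer instead, which is why it has to be emptied after every execution.
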
  26. Making / Testing Simple Python REPL IDE
    • Problem 2: Iodide dependency
    • Solution: make an Iodide mock-up in code
    • matplotlib (and other plotting libraries) is monkeypatched to create a canvas through iodide.output and draw on it
    • So let's provide a mock-up object like this:
        globalThis.iodide = {
          output: {
            // Create the requested element and attach it to the result pane
            element: (tagName) => {
              let outputPane = document.createElement(tagName);
              document.querySelector("#result-pane").appendChild(outputPane);
              return outputPane;
            }
          }
        };
  27. 'More' Practical problems
    • Data science
    • Size, access, speed, ease of use
    • Runtime size
    • Too large to deliver over an internet connection
    • 150~450MB for the full stack
    • Depends on which libraries you compile
  28. Problem Solving with application
    • Electron app. with Chromium
    • Stand-alone browser environment with Node.js
    • App / Web dual mode
    • App mode
    • Complete scientific stack with local Pyodide packages
    • Web mode
    • Selectable library loading with ESNext dynamic import
    • pyodide.runPythonAsync will do the job
  29. Architecture (overview)
    Limitation: WebWorkers cannot modify the DOM because they run in a sandbox
  30. Architecture (implementation)
    • Implemented with Web Components
    • Custom pyodide.module for ES modules

  31. Application Building & Distributing
    • Automatic build script
    • Dockerized build environment for WASM / Pyodide
    • rollup.js for Electron app
    • Electron packager with build script
    • Distribution
    • GitHub with source code and runtime
    • (Planned) Linux / Windows / Mac app stores for easier distribution on Windows
  32. 'More' Troubles
    • Problem 3: String buffer
    • The string buffer must be cleaned up!
    • Solution: run stdout cleanup code after each execution
        pyodide.runPython(`sys.stdout.truncate(0);sys.stdout.seek(0)`);
    • Problem 4: WebWorker limitation
    • The WebWorker runtime does not provide main-thread DOM access
    • Solution: build a data pipeline routine on the message system
    • Limitation: multimedia outputs
    • They are generated as part of WorkerGlobalScope
    • Still no good way to solve the problem: any ideas?
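
Both calls in the cleanup line matter: `truncate(0)` discards the buffered text but leaves the stream position where it was, so `seek(0)` is needed to rewind before the next write. A minimal sketch of the same cleanup in plain Python, where `drain` is a hypothetical helper name:

```python
import io

buffer = io.StringIO()  # stands in for the redirected sys.stdout


def drain(buf):
    """Return the text collected so far and reset the buffer for reuse."""
    text = buf.getvalue()
    buf.truncate(0)  # drop the contents; the stream position is not moved
    buf.seek(0)      # rewind so the next write starts at the beginning
    return text


print("first block", file=buffer)
out1 = drain(buffer)
print("second block", file=buffer)
out2 = drain(buffer)
```

Without the reset, every call to `getvalue()` would return the output of all previous blocks as well as the latest one.
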
  33. With that work, plus other hidden (and tedious) details, the app is now ready to use: let's test the stand-alone IDE app!
  34. Demo: Simple data science • Data → Analyze → Visualize

  35. None
  36. Go further
    • Connect with Web
    • pyodide.pyimport: access a Python object from JavaScript
    • Internal module: js
        from js import document, window
    • Provides direct access from Python to container DOM
    • Connect with local file storage
    • Convert BrowserFS to node.js FileSystem API
    • Enable nodeIntegration=True on Electron newWindow option
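
As a rough illustration of the `js` module, the sketch below appends a line of text to the page from Python. It assumes the `#result-pane` element from the bootstrapping slides exists; the `show` helper is hypothetical. Because `js` is only importable inside Pyodide, and `document` is not available in a worker, the code falls back to printing elsewhere.

```python
try:
    from js import document  # importable only inside Pyodide's main thread
except ImportError:
    document = None  # plain CPython, or no DOM available


def show(text):
    """Append `text` to the result pane, or print it when there is no DOM."""
    if document is None:
        print(text)
        return "stdout"
    pane = document.querySelector("#result-pane")
    line = document.createElement("div")
    line.innerText = text
    pane.appendChild(line)
    return "dom"


where = show("Hello from Python")
```
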
  37. Building Pyodide
    • To enable FileSystem access with NODEFS
    • Recent versions of Emscripten remove node.js FileSystem support from the default FS support
    • https://emscripten.org/docs/api_reference/Filesystem-API.html
    • How
    • Add options to ./Makefile
        OPTFLAGS=-O3 -lnodefs.js -lworkerfs.js -s NODERAWFS=1 … LDFLAGS=-s NODERAWFS=1
  38. Building Pyodide
    • To add / update libraries
    • If you have your own scientific libraries, compile them
    • Building a Pyodide package
    • Use 'mkpkg' in the Pyodide source code
    • Generates meta.yaml
        bin/pyodide mkpkg [PACKAGE_NAME]

        package:
          name: numpy
          version: 1.15.4
        source:
          url: https://files.pythonhosted.org/packages/...
          sha256: 3d734559db35aa3697dadcea492a423118c5c...
          patches:
            - patches/add-emscripten-cpu.patch
            - patches/disable-maybe-uninitialized.patch
            - patches/dont-include-execinfo.patch
            - patches/fix-longdouble.patch
            - patches/fix-static-init-of-nditer-pywrap.patch
            - patches/force_malloc.patch
            - patches/init-alloc-cache.patch
            - patches/use-local-blas-lapack.patch
            - patches/fix-install-with-skip-build.patch
        build:
          skip_host: False
          cflags: -include math.h -I../../config
        test:
          imports:
            - numpy
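
If you edit a generated meta.yaml by hand, for example to point `source.url` at a newer release, the `sha256` field has to match the new archive. The sketch below shows one way to compute that value with the standard library; `sha256_hex` is a hypothetical helper, and the temporary file merely stands in for a downloaded source archive.

```python
import hashlib
import os
import tempfile


def sha256_hex(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a downloaded source archive
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
archive_digest = sha256_hex(tmp.name)
os.remove(tmp.name)
print("sha256:", archive_digest)
```
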
  39. Demo: Web+Pyodide+Web+Fun

  40. None
  41. Performance
    Three timings per benchmark, in slide order (column labels are not in the transcript):
    • Matrix Dot (4096x4096): 0.44s, 11.95s, 12.85s
    • Vector Dot (524288): 0.04ms, 1.30ms, 0.82ms
    • SVD (2048x1024): 28.00s, 0.37s, 12.85s
    • Cholesky decomp. (2048x2048): 0.06s, 3.96s, 1.99s
    • Eigendecomposition (2048x2048): 3.91s, 149.73s, 67.16s
    Tested on iMac Intel i9 9900k (8 Core / 5GHz)
    • Near native
    • Basically, it is single-threaded
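
The benchmark script behind these numbers is not part of the transcript. As a rough sketch of how such a comparison could be made, the harness below times a function several times and keeps the best run; running the same file natively and inside Pyodide would give comparable figures. The pure-Python `vector_dot` is only a stand-in for the NumPy operations on the slide, and both names are illustrative.

```python
import time


def best_of(fn, repeat=3):
    """Call fn `repeat` times and return the shortest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def vector_dot(n=100_000):
    """Dot product of [0, 1, ..., n-1] with a vector of ones, in pure Python."""
    a = [float(i) for i in range(n)]
    b = [1.0] * n
    return sum(x * y for x, y in zip(a, b))


elapsed = best_of(vector_dot)
print(f"vector dot: {elapsed * 1000:.2f} ms")
```

Taking the minimum rather than the mean reduces the influence of one-off delays such as garbage collection or a busy browser tab.
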
  42. Limitations
    • Good enough, but not for production
    • With Firefox, the user may need to change a config: dom.max_script_run_time
    • Single-threaded
    • Slow when performing matrix calculations
    • Bad for heavy workloads, good enough for studying
  43. Ideas
    • WASM+Micropython
    • "Usable" Python runtime in browsers
    • Python-based SPA solution
    • Full Web-Python ecosystem with WASM
    • Micropip: (experimental) supports pure Python package installation
    • PyPI for WASM-Python
    • Dynamically loadable Python packages on the web
    • JupyterLab integration
    • Use local Pyodide runtime as IPython kernel
    • Some projects exist (e.g. jyve) but with security holes
    • And…
  44. Thank you for listening :)
    inureyes@gmail.com / inureyes / inureyes / jeongkyu.shin
    End!
    Source code: https://github.com/inureyes/pyodide-console