Creating a serverless Python environment for scientific computing with WebAssembly, for data scientists and Python lovers

Jeongkyu Shin
September 01, 2020

Summary (English)

Machine learning and data science demand ever more computational resources. High-performance computing resources are concentrated in ultra-large-scale computing environments and the cloud. At the same time, more and more people are entering the machine learning and data science fields. Not only are cluster computing resources increasing, but the computing resources of personal computers are also advancing rapidly. So, is there a way to use this local environment to perform scientific calculations easily, anywhere? This matters even more if you are still a beginner and do not yet need distributed processing or large-scale pipelines!

In this talk, you will learn how to use a Python-based scientific computing and machine learning environment through a browser engine, without server-side computation. The talk describes the technological changes surrounding the web environment in recent years, which led to the Iodide project, an experimental project of the Mozilla Foundation to build a scientific computing environment based on WebAssembly. After that, I introduce the Pyodide project, which uses Iodide to build a Python environment based on WebAssembly, and its build process. I then share my experience of personally developing open-source desktop and web apps that use Pyodide to provide a browser-only Python environment.

As practical examples, I demonstrate performing various small scientific computations based on Python+WebAssembly on the platform described above, analyzing data in the browser with Pandas, and drawing graphs with Matplotlib. I also demonstrate training a simple machine learning model by linking this data directly with TensorFlow.js. Lastly, based on the performance measurements and some advantages and disadvantages seen during the demonstration, I will talk about the fascinating potential of Python to expand into new areas, as well as the technical limitations of the WebAssembly-based Python runtime and scientific computing environment.

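Summary (Korean, translated)

The fields of machine learning and data science demand more and more computing resources. High-performance computing resources are concentrating in hyperscale computing environments and the cloud. At the same time, the number of people getting started in machine learning and data science grows every year. Not only server computing resources but also the computing resources of personal computers are increasing rapidly. So is there a way to make better use of this environment and easily try scientific computing anywhere? All the more so if you are still a beginner who does not yet need distributed processing or large-scale pipelines!

This session looks at how to use Python-based scientific computing and machine learning environments through a browser engine, without server-side computation. It explains how the technical changes surrounding the web environment over the past few years led to the Iodide project, an experimental Mozilla Foundation project to build a scientific computing environment based on WebAssembly. It then introduces the Pyodide project, which uses Iodide to build a WebAssembly-based Python environment, and its build process. I share my experience of personally developing an open-source desktop app and web app that use Pyodide to provide a Python environment that runs in the browser alone.

As practical examples, I demonstrate performing several small scientific computations based on Python+WebAssembly on the platform described above, carrying out data analysis in the browser with Pandas, and drawing graphs with the Matplotlib library. I also demo training a simple machine learning model by linking the resulting data directly with TensorFlow.js. Finally, based on the performance measurements and some of the pros and cons seen during the demonstration, I discuss the technical limitations of the WebAssembly-based Python runtime and scientific computing environment, and also cover the nonetheless attractive potential for Python to expand into new areas.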

Transcript

  1. Creating a serverless Python environment
    for scientific computing
    with WebAssembly,
    for data scientists and Python lovers
    Jeongkyu Shin
    Lablup Inc. / Google Developers Expert

  2. [1] https://openwho.org/channels/covid-19-national-languages
    [2] https://www.youtube.com/watch?v=PjhoPEUcrmI?t=33

  3. What I will talk about today…
    • Python
    • Scientific computation / environments
    • Calculation resources
    • Web and Mozilla
    • Iodide Project
    • App shells
    • Demo
    • What to do (with you)

  4. Python
    • Not only the greatest language for beginners, but also the greatest scientific language
    • Julia: Hello?
    • Slow but Fast
    • GIL but scalable
    • Community-driven
    • Combine with many tools / frameworks / libraries
    • Can bind anything written with a keyboard
    [1] https://xkcd.com/353/

  5. Scientific Computation
    • How humankind consumes electricity:
    • Office work?
    • Games?
    • Adult video?
    • No! Human beings use energy for scientific computation!
    • From FORTRAN to Python
    • IMSL, BLAS, LAPACK, OpenMP
    • cuBLAS, cuLAPACK, cuSOLVER
    • GSL, ROOTS, Numerical Recipes
    • And numpy / scipy

  6. Scientific environments
    • Libraries
    • NumPy, SciPy, Pandas, Matplotlib, SciKit-Learn…
    • Platforms
    • Anaconda, Canopy, ActivePython, PyIMSL, Python(x,y)
    • Container Images
    • MLWorkspace, Backend.AI Scientific Kernels

  7. Burning fire:
    • Complex computation resources
    • CPU, GPU, ASICs
    • Drivers, Libraries
    • Ultra-scale computation resources
    • GPU Cloud
    • Distributed Clusters
    [1] https://cloud.google.com/blog/products/ai-machine-learning/cloud-tpu-pods-break-ai-training-records

  8. Cut the chicken with a sledge knife
    • We do not need the nuke
    • e.g. Machine Learning study
    • Fraction of GPU is enough (2GB)
    • In fact, no need to use GPU for studying
    • Wag the dog: Scientific training workshop
    • Preparation: 2hr.
    • Training: 4hr.
    [1] http://www.inven.co.kr/board/webzine/2097/1177426 (Before modification)
    [2] https://www.yna.co.kr/view/GYH20090602001500044 (Note: now typo modified)

  9. Back to the future: battlefield web
    • JAVA / MS JAVA
    • Active X
    • Shockwave Adobe Flash
    • NaCl / PNaCl
    • For Chrome / ChromeOS extension
    • .EXE everywhere
    • Facepalm
    [1] https://en.wikipedia.org/wiki/Facepalm#/media/File:Paris_Tuileries_Garden_Facepalm_statue.jpg

  10. Mozilla
    • Firefox
    • Rust
    • WebAssembly (WASM) by W3C
    • Can be a compilation target for low-level languages
    • Emscripten
    • LLVM-based Toolchain for compiling to asm.js / WASM
    • So, what can we do with this?

  11. [1] https://www.pcgamesn.com/quake-live-ditching-web-browsers-standalone-client

  12. Iodide Project
    • An experimental tool for scientific communication and exploration on the web
    • Scientific computing
    • Data science
    • Why no web tech. for scientific computing?
    • JavaScript in early 21st century
    • Are you serious?
    • Now: everyone uses Jupyter as UI for Python stack
    https://github.com/iodide-project

  13. Iodide to Pyodide
    • Web-based scientific environment
    • Complete notebook
    • Visualization
    • Everybody loves Python
    • Why don’t we compile the scientific Python stack to WASM?
    • And run it in the JavaScript VM in the browser?
    • ?!
    • Does it work? Does it?
    • It works!
    https://github.com/iodide-project/pyodide

  14. Pyodide: The Python science stack in the browser
    • Created by Michael Droettboom
    • Python runtime + scientific packages compiled to WASM
    • NumPy, SciPy, Pandas, Matplotlib…
    • Pros
    • Instant, easy
    • Can combine magical ideas
    • Cons
    • Big & slow

  15. Pyodide: Packages
    As of August 2020. Only common packages are listed.

  16. Pyodide: Bootstrapping
    window.languagePluginUrl = 'pyodide/';

    Runtime test:

    let resultPane = document.querySelector('#result-pane');
    languagePluginLoader.then(() => {
      resultPane.innerHTML = '';
      let result = pyodide.runPython(`
    import sys
    sys.version
    `);
      resultPane.innerHTML = result;
    });

  17. Pyodide: Bootstrapping
    let resultPane = document.querySelector('#result-pane');
    languagePluginLoader.then(() => {
      resultPane.innerHTML = '';
      let result = pyodide.runPython(`
    import sys
    sys.version
    `);
      resultPane.innerHTML = result;
    });
    • Note
    • Sometimes the result will be 'undefined'
    • This is a timing issue (covered later)

  18. Practical problems
    • Everyday use
    • File system to store code / results
    • Python module loading: numpy, scipy, matplotlib
    • Web UI to work / study
    • Stand-alone / Portable
    • Iodide: Written in Django
    • Needs installation on a server / a web connection
    • Stand-alone / Portable solution
    • To make usage scenarios simple

  19. Solutions for Practical problems
    • File system for runtime
    • Use BrowserFS (https://github.com/jvilk/BrowserFS)
    BrowserFS.install(window);
    BrowserFS.configure({
      fs: "LocalStorage"
    }, function(e) {
      let fs = BrowserFS.BFSRequire('fs');
      fs.writeFileSync('/test.txt', 'Python+WebAssembly is Awesome!');
      languagePluginLoader.then(async () => {
        let FS = pyodide._module.FS;
        let PATH = pyodide._module.PATH;
        // Create an Emscripten interface for the BrowserFS
        let BFS = new BrowserFS.EmscriptenFS(FS, PATH);
        // Create mount point in Emscripten FS
        FS.createFolder(FS.root, 'data', true, true);
        // Mount BrowserFS into Emscripten FS
        FS.mount(BFS, {root: '/'}, '/data');
        // Open file in BrowserFS from python and show contents
        let result = await pyodide.runPythonAsync(`
    import numpy as np
    import sys
    import glob
    import js
    print(sys.version)
    print(np.__version__)
    f = open('/data/test.txt')
    print(f.readline())
    `);
      });
    });
    • Workflow
    • LocalStorage in this example
    • Use Emscripten for WASM backend
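
Once the mount is in place, the Python side only needs ordinary file APIs. The following is a minimal sketch of that Python half, runnable in any Python interpreter. A temporary directory stands in for the `/data` mount point (an assumption made only so the example is self-contained), and `read_first_line` is a hypothetical helper, not part of Pyodide or BrowserFS.

```python
import glob
import os
import tempfile


def read_first_line(path):
    """Read the first line of a text file, closing the handle afterwards."""
    with open(path) as f:
        return f.readline()


# Assumption: a temporary directory plays the role of the '/data' mount point.
with tempfile.TemporaryDirectory() as data_dir:
    target = os.path.join(data_dir, "test.txt")
    # In the slide, this file is written from JavaScript through BrowserFS.
    with open(target, "w") as f:
        f.write("Python+WebAssembly is Awesome!\n")

    # List what is visible under the mount point, then read the file back.
    listing = [os.path.basename(p) for p in glob.glob(os.path.join(data_dir, "*"))]
    first_line = read_first_line(target)

print(listing)
print(first_line)
```

Unlike the snippet on the slide, this version uses a `with` block so the file handle is closed after reading.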

  20. Solutions for Practical problems
    • Module loading: Use the Promise-ready running API
    • runPythonAsync automatically detects packages and imports them dynamically.
    let resultPane = document.querySelector('#result-pane');
    languagePluginLoader.then(async () => {
      resultPane.innerHTML = '';
      let result = await pyodide.runPythonAsync(`
    import numpy as np
    import sys
    print(sys.version)
    np.__version__
    `).then(result => {
        if (typeof result !== "undefined") {
          resultPane.innerHTML = result;
        }
      });
    });
    Console Log
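
To make the idea of detecting packages from source code concrete, here is a small sketch that scans a code cell for imported module names with the standard `ast` module. It illustrates one way such detection could work and is not a description of how `runPythonAsync` is implemented; `find_imports` and `cell` are names chosen for this example.

```python
import ast


def find_imports(source):
    """Return the sorted top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (`from . import x`) name no installable package.
            if node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


cell = """
import numpy as np
import sys
from os import path
"""
detected = find_imports(cell)
print(detected)
```

A loader could compare such a list against the packages it knows how to fetch and download only the ones that are missing before running the cell.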

  21. Making / Testing Simple Python REPL IDE
    • Problem 1: cannot get stdout
    • Solution: manual stdout reading pipeline
    languagePluginLoader.then(() => {
      pyodide.runPython(`
    import sys
    import io
    sys.stdout = io.StringIO()
    `);
      ...
    });
    After executing each block:
    let stdout = pyodide.runPython("sys.stdout.getvalue()");
    let stdout_console = document.createElement('div');
    stdout_console.innerText = stdout;
    resultPane.appendChild(stdout_console);
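
The capture technique can be tried in plain Python as well. The sketch below runs a code string and returns whatever it printed. It uses `contextlib.redirect_stdout` instead of assigning to `sys.stdout` directly, which is a substitution made for this example so that the redirection is undone automatically; `run_and_capture` is a hypothetical helper name.

```python
import contextlib
import io


def run_and_capture(code):
    """Execute `code` and return everything it wrote to stdout."""
    buffer = io.StringIO()
    # stdout is redirected only while the block runs, then restored.
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue()


output = run_and_capture("print('hello from Python')")
print(repr(output))
```

A REPL front end would place the returned string into its output pane, as the JavaScript on the slide does with `sys.stdout.getvalue()`.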

  22. Making / Testing Simple Python REPL IDE
    • Problem 2: Iodide dependency
    • Solution: Make an Iodide mock-up in code
    • matplotlib (and other plotting libraries) is monkeypatched to create a canvas and attach it through iodide.output
    • So let’s provide a mock-up object like this:
    globalThis.iodide = {
      output: {
        element: (tagName) => {
          let outputPane = document.createElement(tagName);
          document.querySelector("#result-pane").appendChild(outputPane);
          return outputPane;
        }
      }
    };

  23. 'More' practical problems
    • Data science
    • Size, access, speed, ease of use
    • Runtime size
    • Cannot be delivered over an internet connection
    • 150~450MB for the full stack
    • Depends on your compiled libraries

  24. Problem Solving with application
    • Electron app. with Chromium
    • Stand-alone browser environment with Node.js
    • App / Web dual mode
    • App mode
    • Complete scientific stack with local Pyodide packages
    • Web mode
    • Selectable library loading with ESNext dynamic import
    • pyodide.runPythonAsync will do the job

  25. Architecture (overview)
    Limitation:
    WebWorkers cannot modify the DOM because of their sandboxed (jailed) nature

  26. Architecture (implementation)
    Implemented with Web Components
    Custom pyodide.module for ES modules

  27. Application Building & Distributing
    • Automatic build script
    • Dockerized build environment for WASM / Pyodide
    • rollup.js for Electron app
    • Electron packager with build script
    • Distribution
    • GitHub with source code and runtime
    • (Planned) Linux/Windows/Mac AppStore for easier distribution on Windows

  28. 'More' troubles
    • Problem 3: String buffer
    • Should clean string buffers!
    • Solution: run stdout cleanup code after each execution
    pyodide.runPython(`sys.stdout.truncate(0);sys.stdout.seek(0)`);
    • Problem 4: WebWorker limitation
    • WebWorker runtime does not provide main thread DOM access
    • Solution: make a data pipeline routine with the message system
    • Limitation: multimedia outputs
    • They are generated as a part of WorkerGlobalScope
    • Still no good way to solve the problem: Any ideas?
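
The cleanup call pairs `truncate(0)` with `seek(0)` because truncating a `StringIO` does not move the stream position. Below is a minimal sketch of a reusable buffer that is drained after each executed block; `drain` is a hypothetical helper name, and the two `print` calls stand in for two executed blocks.

```python
import io


def drain(buffer):
    """Return the buffered text and reset the buffer for the next block."""
    text = buffer.getvalue()
    buffer.truncate(0)  # drop the contents
    buffer.seek(0)      # move the write position back to the start
    return text


buffer = io.StringIO()

print("first block", file=buffer)
first = drain(buffer)

print("second block", file=buffer)
second = drain(buffer)

print(repr(first), repr(second))
```

Without the reset, each read would return the output of every block executed so far instead of only the latest one.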

  29. With that work + other hidden (& tedious) stuff,
    now the app is ready to use:
    Let’s test the stand-alone IDE app!

  30. Demo: Simple data science
    • Data → Analyze → Visualize

  31. Go further
    • Connect with Web
    • pyodide.pyimport: Access a Python object from JavaScript
    • Internal module: js
    • Provides direct access from Python to the container DOM
    from js import document, window
    • Connect with local file storage
    • Convert BrowserFS to the Node.js FileSystem API
    • Enable nodeIntegration=True in the Electron newWindow option
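
As an illustration of reaching the DOM from Python through the `js` module, the sketch below appends a line of text to the page. The `js` module exists only inside Pyodide, so the import is guarded and the function does nothing under a regular interpreter. `append_text` is a hypothetical helper, and `#result-pane` is the element id used in the earlier slides.

```python
try:
    # Available only when running inside Pyodide in a browser.
    from js import document
except ImportError:
    document = None


def append_text(text, selector="#result-pane"):
    """Append `text` as a new <div> under `selector`; return True on success."""
    if document is None:
        return False  # not running in a browser
    pane = document.querySelector(selector)
    if pane is None:
        return False  # no such element on the page
    node = document.createElement("div")
    node.innerText = text
    pane.appendChild(node)
    return True


shown = append_text("Hello from Python")
print(shown)
```

Only standard DOM calls (`querySelector`, `createElement`, `appendChild`) are used, so the same logic could be written in JavaScript line for line.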

  32. Building Pyodide
    • To enable FileSystem access with NODEFS
    • Recent versions of Emscripten remove Node.js FileSystem support from the default FS support
    • https://emscripten.org/docs/api_reference/Filesystem-API.html
    • How
    • Add option to ./Makefile
    OPTFLAGS=-O3 -lnodefs.js -lworkerfs.js -s NODERAWFS=1
    LDFLAGS=-s NODERAWFS=1

  33. Building Pyodide
    • To add / update libraries
    • If you have your own scientific libraries, compile them
    • Building Pyodide package
    • Use ‘mkpkg’ in the Pyodide source code
    • Generates meta.yaml
    • bin/pyodide mkpkg [PACKAGE_NAME]
    package:
      name: numpy
      version: 1.15.4
    source:
      url: https://files.pythonhosted.org/packages/...
      sha256: 3d734559db35aa3697dadcea492a423118c5c...
      patches:
        - patches/add-emscripten-cpu.patch
        - patches/disable-maybe-uninitialized.patch
        - patches/dont-include-execinfo.patch
        - patches/fix-longdouble.patch
        - patches/fix-static-init-of-nditer-pywrap.patch
        - patches/force_malloc.patch
        - patches/init-alloc-cache.patch
        - patches/use-local-blas-lapack.patch
        - patches/fix-install-with-skip-build.patch
    build:
      skip_host: False
      cflags: -include math.h -I../../config
    test:
      imports:
        - numpy

  34. Demo: Web+Pyodide+Web+Fun

  35. Performance
    Matrix Dot (4096x4096)            0.44s     11.95s     12.85s
    Vector Dot (524288)               0.04ms    1.30ms     0.82ms
    SVD (2048x1024)                   28.00s    0.37s      12.85s
    Cholesky decomp. (2048x2048)      0.06s     3.96s      1.99s
    Eigendecomposition (2048x2048)    3.91s     149.73s    67.16s
    Tested on iMac Intel i9 9900k (8 Core / 5GHz)
    • Near native
    • Basically, it is single-threaded
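
The slide does not say how the timings were taken. As a rough sketch of how such a measurement could be reproduced, the helper below reports the best wall-clock time over a few runs. It uses a pure-Python vector dot product rather than NumPy so that it runs anywhere, which means its numbers are not comparable with the figures above; `best_time` and `vector_dot` are names invented for this example.

```python
import time


def best_time(fn, repeat=3):
    """Return the shortest wall-clock time (seconds) over `repeat` calls of fn."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def vector_dot(a, b):
    """Dot product of two equal-length sequences, single-threaded."""
    return sum(x * y for x, y in zip(a, b))


n = 524288  # same vector length as the 'Vector Dot' row
a = [1.0] * n
b = [2.0] * n

elapsed = best_time(lambda: vector_dot(a, b))
print(f"vector dot ({n}): {elapsed * 1000:.2f} ms")
```

Running the same script natively and inside a WebAssembly-based runtime gives a like-for-like comparison on one machine.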

  36. Limitations
    • Good enough, but not for production
    • With Firefox, the user may need to change the config: dom.max_script_run_time
    • Single-threaded
    • Slow when performing matrix calculations
    • Bad for heavy workload, good enough for studying

  37. Ideas
    • WASM+MicroPython
    • “Usable” Python runtime on browsers
    • Python-based SPA solution
    • Full Web-Python ecosystem with WASM
    • Micropip: (experimental) supports pure Python package installation
    • PyPI for WASM-Python
    • Dynamically loadable Python packages on the web
    • JupyterLab integration
    • Use local Pyodide runtime as IPython kernel
    • Some projects (e.g. jyve) but with security holes
    • And…

  38. Thank you for listening :)
    [email protected]
    inureyes / inureyes / jeongkyu.shin
    End!
    Source code:
    https://github.com/inureyes/pyodide-console
