
AI Community Day Bangkok 2025 - In-Browser ML/LLM Inference Ecosystem

Karn Wong

December 03, 2025

Transcript

  1. Karn Wong
     - Independent Consultant
     - Loves optimization; has too much fun cranking out benchmarks
     - HashiCorp Ambassador & AWS Community Builder
     - Website: karnwong.me
  2. Machine Learning is a Subtype of AI
     - Classical Machine Learning
     - Neural Networks (Deep Learning, Reinforcement Learning, etc.)
     - Large Language Models
  3. Do You Want to Maintain a Separate ML System?
     - Probably not: running inference in the browser means there is no separate serving infrastructure to operate
     - But if you have a dedicated team to maintain the ML system, go ahead
  4. WASM to the Rescue
     - If it can be converted into WASM, it can run in a web browser (see the sketch below)
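
     A minimal sketch of the mechanics, assuming a hypothetical model.wasm that exposes a predict export: the browser fetches the binary, instantiates it, and calls straight into it.

        // Hypothetical module: "model.wasm" and its "predict" export are
        // placeholders for whatever a given toolchain emits.
        const { instance } = await WebAssembly.instantiateStreaming(
          fetch("model.wasm")
        );
        const predict = instance.exports.predict as (x: number) => number;
        console.log(predict(42));
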
  5. 🐘 Classical ML
     Classification example (client-side encoding sketched below):

       # input
       {
         "feature1": 10.5,
         "feature2": 250,
         "feature3": 0.75,
         "feature4": 1,
         "feature5": "category_A"
       }

       # output
       { "0": 0.15, "1": 0.85 }
  6. 🐘 Neural Networks
     Object detection (YOLO) example (input preprocessing sketched below):

       # input
       {
         "input_tensor": [[[[0.23, 0.25, 0.26, ...], [0.22, 0.24, 0.26, ...], ...], ...]],
         "shape": [1, 3, 640, 640],
         "dtype": "float32",
         "normalized": true
       }

       # output
       {
         "image": "image_01.jpg",
         "detections": [
           {
             "class_id": 0,
             "class_name": "person",
             "confidence": 0.87,
             "bbox_xywh": [233.0, 328.5, 242.0, 567.0]
           }
         ]
       }
  7. 🐘 LLM
     It's complicated: you need the model's tokenizer and model config (a JS-side tokenizer sketch follows)

       # python
       from ai_edge_torch.generative.examples.gemma3 import gemma3

       // rust
       use candle_transformers::models::gemma::{Config as Config1, Model as Model1};
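
     On the JS side, one way to cover the tokenizer half is transformers.js, which can pull a model's tokenizer files from the HuggingFace Hub; the model id below is illustrative.

        import { AutoTokenizer } from "@huggingface/transformers";

        // Illustrative model id; any Hub repo with a tokenizer.json works.
        const tokenizer = await AutoTokenizer.from_pretrained("google/gemma-3-1b-it");

        const ids = tokenizer.encode("Hello from the browser"); // text -> token ids
        console.log(ids);
        console.log(tokenizer.decode(ids)); // ids -> text, for the output side
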
  8. ONNX Runtime Web
     - Most models can be converted into the ONNX format
     - Works for classical ML and neural networks (usage sketched below)
     - For LLMs: you have to construct input tensors and decode output tensors yourself
     - Not realistic for in-browser LLM inference due to model size: a 300M-parameter model is already 1.2 GB at float32
     - https://onnxruntime.ai/docs/tutorials/web/deploy.html
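
     A minimal onnxruntime-web sketch for the classical ML case; the model path and the "input"/"output" tensor names are assumptions carried over from however the model was exported.

        import * as ort from "onnxruntime-web";

        // Feature vector from the encoding step shown earlier (7 values).
        const features = Float32Array.from([10.5, 250, 0.75, 1, 1, 0, 0]);

        const session = await ort.InferenceSession.create("model.onnx");
        const results = await session.run({
          input: new ort.Tensor("float32", features, [1, 7]),
        });
        console.log(results["output"].data); // e.g. class probabilities [0.15, 0.85]
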
  9. LiteRT
     - Models can be converted from PyTorch, TensorFlow, JAX
     - Does not support classical ML
     - For LLMs: see MediaPipe
     - https://ai.google.dev/edge/litert
  10. MediaPipe (via LiteRT)
      - Plug-and-play solutions with default models that allow for customization
        - Face detection, image classification, etc.
        - Can customize the models, not only the input data
      - HuggingFace provides pre-built LLM models; you can also convert the models yourself
      - LLMs are packaged as a .task file (usage sketched below)
        - Includes LiteRT files, components, and metadata
        - The .task format already includes the tokenizer + model config
      - https://ai.google.dev/edge/mediapipe/solutions/guide
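
      A sketch of MediaPipe's LLM Inference task: FilesetResolver fetches the WASM runtime for you, and the .task bundle path below is an assumption (e.g. a Gemma bundle downloaded from HuggingFace).

        import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

        // Fetch the WASM runtime; no manual binary wiring needed.
        const genai = await FilesetResolver.forGenAiTasks(
          "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
        );

        // Assumed local path to a pre-built .task bundle.
        const llm = await LlmInference.createFromOptions(genai, {
          baseOptions: { modelAssetPath: "/models/gemma3-1b-it-int4.task" },
        });

        console.log(await llm.generateResponse("Why run LLMs in the browser?"));
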
  11. Candle
      - Pure Rust implementation that can compile into WASM
      - You have to construct the tokenizer input + tensors yourself
      - Needs both Rust and JS (JS side sketched below)
      - Focuses on NN + LLM
      - https://github.com/huggingface/candle/tree/main/candle-wasm-examples
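
      The JS half typically imports a wasm-bindgen module built from the Rust crate and feeds it raw weight and tokenizer bytes. A sketch where the package path, the Model class, and its constructor/run signatures are all hypothetical stand-ins for a given example crate's exports:

        // Hypothetical wasm-pack output; real export names vary per example crate.
        import init, { Model } from "./pkg/candle_example.js";

        await init(); // instantiate the WASM binary

        const fetchBytes = async (url: string) =>
          new Uint8Array(await (await fetch(url)).arrayBuffer());

        // You assemble the raw inputs yourself: weights + tokenizer.
        const model = new Model(
          await fetchBytes("/model.safetensors"),
          await fetchBytes("/tokenizer.json")
        );
        console.log(model.run("Hello from Candle"));
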
  12. wllama
      - WASM binding for llama.cpp
      - Needs the WASM binary for the wllama runtime plus a GGUF model file (sketched below)
      - Focuses on LLM
      - https://github.com/ngxson/wllama
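
      A wllama sketch: you wire up the runtime's WASM binaries explicitly, then point it at a GGUF file. The asset paths and model URL below are assumptions.

        import { Wllama } from "@wllama/wllama";

        // Explicit paths to the runtime's WASM binaries (assumed locations).
        const wllama = new Wllama({
          "single-thread/wllama.wasm": "/wllama/single-thread/wllama.wasm",
          "multi-thread/wllama.wasm": "/wllama/multi-thread/wllama.wasm",
        });

        // Any GGUF file works; a tiny demo model keeps the download small.
        await wllama.loadModelFromUrl(
          "https://huggingface.co/ggml-org/models/resolve/main/tinyllamas/stories260K.gguf"
        );
        console.log(await wllama.createCompletion("Once upon a time", { nPredict: 50 }));
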
  13. Runtimes Comparison

      | Runtime | GitHub Stars* | Language  | Runtime Automatically Provided** | Abstracted LLM Input/Output | Model Type           | Includes Pre-Built Models    |
      |---------|---------------|-----------|----------------------------------|-----------------------------|----------------------|------------------------------|
      | ONNX    | 18.3K         | JS        | ❌                               | ❌                          | Classical / NN / LLM | ❌                           |
      | LiteRT  | 21.9K         | JS        | ✅ (MediaPipe)                   | ✅ (MediaPipe)              | NN / LLM             | ✅ (MediaPipe + HuggingFace) |
      | Candle  | 18.5K         | Rust + JS | ❌                               | ❌                          | NN / LLM             | ✅ (HuggingFace)             |
      | wllama  | 993           | JS        | ❌                               | ✅                          | LLM                  | ✅ (HuggingFace)             |

      *GitHub stars as of 2025-11-17
      **Have to explicitly reference the WASM binary for the model runtime