
Software 2.0 With Go


Ngalam Backend Community

February 22, 2020


Transcript

  1. @ Ice House (3.5 yr): Ruby on Rails + MySQL + Redis + ElasticSearch + Salesforce + AWS; Scala + Play + Akka + Cassandra; Python + Flask; Go + PostgreSQL
     @ DANA (8 mo): Java Spring + MySQL + Aliyun
     @ OnTel AB (5 mo): Python + React + Kafka + RocksDB
     @ SpaceStock (2 mo): Go + MongoDB + ElasticSearch
  2. Software 2.0 with Go
     . Software 2.0
     . Software 1.0 -> Software 2.0
     . Lifecycle of a Software 2.0 Project
     . Infrastructure and Tooling
     . Python is the King, Go can be the Minister
     . Go Case Study: Data Processing Pipeline
  3. Programmer 2.0 vs 1.0
     2.0: curate + maintain + massage + clean + label the dataset
     1.0: maintain the tools: analytics + visualization + labeling interface + infrastructure + training code
  4. Benefits of S/W 2.0
     Works better in practice
     Computationally homogeneous
     Simple to bake into silicon
     Constant running time
     Constant memory use
     Highly portable
     Agile
     Modules can meld into an optimal whole
     Better than you
  5. S/W 1.0 vs S/W 2.0
     S/W 1.0 -> write code: express how the system achieves its goal
     S/W 2.0 -> curate training data: spec-by-example of what the system should do
     Toolchain 1.0 -> create + validate logic
     Toolchain 2.0 -> create / curate + validate data
  6. Example: Google Email Extraction System
     Learns templates for B2C email
     Uses templates to extract info (order number / travel date)
     Originally heuristic-based, with handcrafted rules
     Lesson learned -> coverage of the heuristic-based extraction system stayed flat for several months because it was too brittle to improve without introducing errors
  7. Benefits of switching to S/W 2.0
     Precision and recall quickly surpassed the S/W 1.0 results
     Google deleted 45k lines of code
     The new system is easier to maintain
     The old system was brittle: difficult to debug, difficult to improve accuracy further
     New possibility -> cross-language word embeddings to learn extraction models across several languages
  8. Result: machine-learned > heuristic-based, and easier to understand and improve
     The critical ingredient for Software 2.0 is managing training data:
     acquiring, debugging, versioning, transforming
  9. Mental Model for a S/W 2.0 Project
     High impact:
       complex parts of your pipeline
       where "cheap prediction" is valuable
       where automating a complicated manual process is valuable
     Low cost. Cost is driven by:
       data availability
       performance requirements
       problem difficulty
  10. Data Management
     . Data Sources
     . Data Labeling
     . Data Storage
     . Data Versioning
     . Data Processing
  11. Data Sources
     Supervised deep learning requires a lot of labeled data
     Labeling your own data is costly
     Some resources for data:
       Open source data (good to start with, not an advantage)
       Data augmentation (a MUST for CV, optional for NLP)
       Synthetic data (worth starting with, esp. in NLP)
  12. Data Labeling
     Requires: labeling platforms, temporary labor, and QC
     Sources of labor:
       Crowdsourcing: cheap and scalable, less reliable, needs QC
       Hiring your own annotators: less QC needed, expensive, slow to scale
       Data labeling service companies: FigureEight
  13. Data Labeling
     Labeling platforms:
       Diffgram: training data software (CV)
       Prodigy: annotation tool powered by active learning (text + image)
       HIVE: AI-as-a-Service platform for CV
       Supervisely: entire CV platform
       Labelbox: CV
       Scale: AI data platform (CV & NLP)
  14. Data Storage
     Object store: store binary data (images, sound files, compressed texts)
       Amazon S3
       Ceph Object Store
     Database: store metadata (file paths, labels, user activity, etc.)
       Postgres: right choice for most applications, best-in-class SQL and great support for unstructured JSON
  15. Data Storage
     Data lake: aggregates features that are not obtainable from a database (e.g. logs)
       Amazon Redshift
     Feature store: store, access, and share ML features
       FEAST
       Michelangelo Palette
     At training time, copy data into a local or networked filesystem (NFS)
  16. Data Versioning
     It's a MUST for deployed ML models: deployed ML models are part code, part data. No data versioning means no model versioning.
     Data versioning platforms:
       DVC: open source version control system for ML projects
       Pachyderm: version control for data
       Dolt: versioning for SQL databases
  17. Data Processing
     Training data for production models may come from different sources: data stored in databases and object stores, log processing, and the outputs of other classifiers.
     There are dependencies between tasks; each needs to be kicked off after its dependencies finish. For example, training on new log data requires a preprocessing step before training.
     Makefiles are not scalable, so workflow managers become essential here.
  18. Data Processing
     Workflow orchestration:
       Luigi by Spotify
       Airflow by Airbnb: dynamic, extensible, elegant, and scalable (the most widely used)
         DAG workflows
         Robust conditional execution: retry in case of failure
         Pusher supports Docker images with TensorFlow Serving
         Whole workflow in a single .py file
  19. Development, Training, and Evaluation
     . Software Engineering
     . Resource Management
     . Deep Learning Frameworks
     . Experiment Management
     . Hyperparameter Tuning
     . Distributed Training
  20. Software Engineering
     Winner language: Python
     Editors: Vim / Emacs / VS Code
     Notebooks: great as a starting point for projects, hard to scale
       nteract: a next-gen React-based UI for Jupyter notebooks
       Papermill: an nteract library for parameterizing, executing, and analyzing Jupyter notebooks
       Commuter: another nteract project, providing a read-only display of notebooks (e.g. from S3 buckets)
       Streamlit: interactive data science tool with applets
  21. Software Engineering
     Compute recommendations:
     For individuals or startups:
       Development: a 4x Turing-architecture PC
       Training/Evaluation: use the same 4x GPU PC. When running many experiments, either buy shared servers or use cloud instances.
     For large companies:
       Development: buy a 4x Turing-architecture PC per ML scientist, or let them use V100 instances
       Training/Evaluation: use cloud instances with proper provisioning and handling of failures
  22. Resource Management
     Allocating free resources to programs
     Resource management options:
       Old-school cluster job schedulers (e.g. Slurm workload manager)
       Docker + Kubernetes
       Kubeflow
       Polyaxon (paid features)
  23. Experiment Management
     Development, training, and evaluation strategy: always start simple.
     Train a small model on a small batch. Only if it works, scale up to larger data and models, and do hyperparameter tuning!
  24. Experiment Management
     Experiment management tools:
       TensorBoard: provides the visualization and tooling needed for ML experimentation
       Losswise: monitoring for ML
       Comet: lets you track code, experiments, and results on ML projects
       Weights & Biases: record and visualize every detail of your research, with easy collaboration
  25. Experiment Management
     MLflow Tracking: logs parameters, code versions, metrics, and output files, and visualizes the results
       Automatic experiment tracking with one line of Python code
       Side-by-side comparison of experiments
       Hyperparameter tuning
       Supports Kubernetes-based jobs
  26. Hyperparameter Tuning
     Approaches:
       Grid search
       Random search
       Bayesian optimization
       HyperBand and Asynchronous Successive Halving Algorithm (ASHA)
       Population-Based Training
  27. Distributed Training
     Data parallelism: use it when iteration time is too long (both TensorFlow and PyTorch support it)
       Ray Distributed Training
       Horovod
     Model parallelism: use it when the model does not fit on a single GPU
  28. Testing and Deployment
     . Testing and CI/CD
     . Web Deployment
     . Service Mesh and Traffic Routing
     . Monitoring
     . Deploying on Embedded and Mobile Devices
  29. Testing and CI/CD
     Unit and integration testing. Types of tests:
       Training system tests: test the training pipeline
       Validation tests: test the prediction system on the validation set
       Functionality tests: test the prediction system on a few important examples
     Continuous integration: run the tests after each new code change is pushed to the repo
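As a sketch of what such a functionality test could look like in Go: the stringPayload type and the upperCase stage below are hypothetical stand-ins for a real prediction/processing step, and the table-driven test only exercises a few important examples.

    package prediction

    import (
        "context"
        "strings"
        "testing"
    )

    // stringPayload and upperCase are hypothetical stand-ins for a real payload
    // type and a real prediction/processing stage.
    type stringPayload struct{ value string }

    func upperCase(_ context.Context, p *stringPayload) (*stringPayload, error) {
        return &stringPayload{value: strings.ToUpper(p.value)}, nil
    }

    // TestUpperCaseStage is a functionality test: it exercises the processing
    // logic on a few important examples, independent of training or serving.
    func TestUpperCaseStage(t *testing.T) {
        cases := []struct{ in, want string }{
            {"hello", "HELLO"},
            {"Go", "GO"},
            {"", ""},
        }
        for _, tc := range cases {
            got, err := upperCase(context.Background(), &stringPayload{value: tc.in})
            if err != nil {
                t.Fatalf("upperCase(%q) returned error: %v", tc.in, err)
            }
            if got.value != tc.want {
                t.Errorf("upperCase(%q) = %q, want %q", tc.in, got.value, tc.want)
            }
        }
    }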
  30. Testing and CI/CD
     SaaS for continuous integration:
       Argo: open source Kubernetes-native workflow engine for orchestrating parallel jobs (includes workflows, events, CI and CD)
       CircleCI: language-inclusive support, custom environments, flexible resource allocation; used by Instacart, Lyft, and StackShare
       Travis CI
       Buildkite: fast and stable builds; the open source agent runs on almost any machine and architecture; freedom to use your own tools and services
       Jenkins: old-school build system
  31. Web Deployment
     Consists of a prediction system and a serving system
       Prediction system: processes input data, makes predictions
       Serving system (web server): serves predictions with scale in mind; exposes a REST API for prediction HTTP requests and calls the prediction system to respond
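A minimal sketch of such a serving system in Go, using only the standard library: the predict function, the /predict route, and the request/response fields are hypothetical placeholders for a real prediction system.

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    // predict is a stand-in for the prediction system; a real service would call
    // the model in-process or over RPC to a model server.
    func predict(text string) string {
        if len(text) > 20 {
            return "long"
        }
        return "short"
    }

    type request struct {
        Text string `json:"text"`
    }

    type response struct {
        Label string `json:"label"`
    }

    // The serving system: a web server exposing the prediction system over REST.
    func main() {
        http.HandleFunc("/predict", func(w http.ResponseWriter, r *http.Request) {
            var req request
            if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
                http.Error(w, err.Error(), http.StatusBadRequest)
                return
            }
            json.NewEncoder(w).Encode(response{Label: predict(req.Text)})
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }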
  32. Web Deployment
     Serving options:
       Deploy to VMs, scale by adding instances
       Deploy as containers, scale via orchestration
         Containers: Docker
         Container orchestration: Kubernetes (the most popular now), Mesos, Marathon
       Deploy code as a "serverless function"
       Deploy via a model serving solution
  33. Web Deployment
     Model serving: specialized web deployment for ML models; batches requests for GPU inference
     Frameworks:
       TensorFlow Serving
       MXNet Model Server
       Clipper (Berkeley)
       SaaS solutions
         Seldon: serve and scale models built in any framework on Kubernetes
         Algorithmia
     Deploying Jupyter notebooks: Kubeflow Fairing is a hybrid deployment package that lets you deploy your Jupyter notebook code
  34. Web Deployment
     Decision making: CPU or GPU?
       CPU inference: preferable if it meets the requirements; scale by adding more servers or going serverless
       GPU inference: TF Serving or Clipper; adaptive batching is useful
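A minimal sketch of adaptive (micro-)batching in Go: requests arrive on a channel and are flushed to a batched infer call either when the batch is full or when a short timeout expires. The maxBatch, maxWait, and infer names are hypothetical; a real system would forward the batch to a GPU-backed model server.

    package main

    import (
        "fmt"
        "time"
    )

    // request represents one inference request; the result comes back on resp.
    type request struct {
        input string
        resp  chan string
    }

    // batcher collects requests and flushes them either when maxBatch is reached
    // or when maxWait elapses, so the GPU sees larger, more efficient batches.
    func batcher(in <-chan request, maxBatch int, maxWait time.Duration, infer func([]string) []string) {
        for {
            first, ok := <-in
            if !ok {
                return
            }
            batch := []request{first}
            timer := time.NewTimer(maxWait)
        collect:
            for len(batch) < maxBatch {
                select {
                case r, ok := <-in:
                    if !ok {
                        break collect
                    }
                    batch = append(batch, r)
                case <-timer.C:
                    break collect
                }
            }
            timer.Stop()

            inputs := make([]string, len(batch))
            for i, r := range batch {
                inputs[i] = r.input
            }
            for i, out := range infer(inputs) {
                batch[i].resp <- out
            }
        }
    }

    func main() {
        in := make(chan request)
        // infer is a stand-in for a real batched model call (e.g. TF Serving).
        go batcher(in, 8, 10*time.Millisecond, func(xs []string) []string {
            outs := make([]string, len(xs))
            for i, x := range xs {
                outs[i] = "label-for-" + x
            }
            return outs
        })

        resp := make(chan string, 1)
        in <- request{input: "example", resp: resp}
        fmt.Println(<-resp)
    }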
  35. Service Mesh and Traffic Routing
     The transition from monolithic applications toward a distributed microservice architecture can be challenging. A service mesh (consisting of a network of microservices) reduces the complexity of such deployments and eases the strain on development teams.
     Istio: a service mesh to ease creation of a network of deployed services, with load balancing, service-to-service authentication, and monitoring, with few or no changes in service code.
  36. Monitoring
     Purpose of monitoring:
       Alerts for downtime, errors, and distribution shifts
       Catching service and data regressions
     Cloud providers' solutions are decent
     Kiali: observability console for Istio with service mesh configuration capabilities; it answers: How are the microservices connected? How are they performing?
  37. Deploying on Embedded and Mobile Devices
     Main challenge: memory footprint and compute constraints
     Solutions:
       Quantization
       Reduced model size: MobileNets
       Knowledge distillation: DistilBERT (for NLP)
  38. Deploying on Embedded and Mobile Devices
     Embedded and mobile frameworks:
       TensorFlow Lite
       PyTorch Mobile
       Core ML
       ML Kit
       FRITZ
       OpenVINO
     Model conversion: Open Neural Network Exchange (ONNX), an open-source format for deep learning models
  39. Though Python is the King, Go can be the Minister
     Python is the King for S/W 2.0
     To actually run a production system at scale, you need infra that implements:
       Autoscaling -> traffic fluctuations don't break the API
       API management -> handle simultaneous API deployments
       Rolling updates -> update models while still serving users
       Logging
       Cost optimization
  40. Why Go?
     Concurrency is crucial for S/W 2.0 infrastructure
       You wrangle a few different APIs, calling them programmatically to provision clusters, launch deployments, and monitor them
       Doing that in a performant, reliable way is challenging; Go has goroutines + channels
     Building cross-platform CLIs is easier in Go
     The Go ecosystem is great for infrastructure projects
     Go is just a pleasure to work with: good for large projects, fast compilation, static typing, and great tooling
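A minimal sketch of that concurrency story: probing a few deployment health endpoints in parallel with goroutines and fanning the results back in over a channel. The endpoint names and URLs are hypothetical placeholders; real infrastructure code would call cloud-provider or Kubernetes APIs instead.

    package main

    import (
        "fmt"
        "net/http"
        "sync"
        "time"
    )

    // status pairs a monitored endpoint with the outcome of a health probe.
    type status struct {
        name string
        err  error
    }

    func main() {
        // Hypothetical deployment endpoints to monitor.
        endpoints := map[string]string{
            "model-a": "http://localhost:8080/healthz",
            "model-b": "http://localhost:8081/healthz",
        }

        client := &http.Client{Timeout: 2 * time.Second}
        results := make(chan status)
        var wg sync.WaitGroup

        // Probe every endpoint concurrently: one goroutine per API call,
        // with results fanned back in over a channel.
        for name, url := range endpoints {
            wg.Add(1)
            go func(name, url string) {
                defer wg.Done()
                resp, err := client.Get(url)
                if err == nil {
                    resp.Body.Close()
                    if resp.StatusCode != http.StatusOK {
                        err = fmt.Errorf("unexpected status %d", resp.StatusCode)
                    }
                }
                results <- status{name: name, err: err}
            }(name, url)
        }

        go func() {
            wg.Wait()
            close(results)
        }()

        for s := range results {
            if s.err != nil {
                fmt.Printf("%s: unhealthy (%v)\n", s.name, s.err)
                continue
            }
            fmt.Printf("%s: healthy\n", s.name)
        }
    }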
  41. // Processor transforms a payload; ProcessorFunc adapts a plain function to the interface.
      type Processor interface {
          Process(context.Context, Payload) (Payload, error)
      }

      type ProcessorFunc func(context.Context, Payload) (Payload, error)

      func (f ProcessorFunc) Process(ctx context.Context, p Payload) (Payload, error) {
          return f(ctx, p)
      }
  42. // StageParams carries a stage's position and its input, output, and error channels.
      type StageParams interface {
          StageIndex() int
          Input() <-chan Payload
          Output() chan<- Payload
          Error() chan<- error
      }

      type StageRunner interface {
          Run(context.Context, StageParams)
      }
  43. // Source emits payloads into the pipeline; Sink consumes the final output.
      type Source interface {
          Next(context.Context) bool
          Payload() Payload
          Error() error
      }

      type Sink interface {
          Consume(context.Context, Payload) error
      }
  44. func sourceWorker(ctx context.Context, source Source, outCh chan<- Payload, errCh chan<- error) {
          for source.Next(ctx) {
              payload := source.Payload()
              select {
              case outCh <- payload:
              case <-ctx.Done():
                  return
              }
          }
          if err := source.Error(); err != nil {
              wrappedErr := xerrors.Errorf("pipeline source: %w", err)
              maybeEmitError(wrappedErr, errCh)
          }
      }
  45. func sinkWorker(ctx context.Context, sink Sink, inCh <-chan Payload, errCh chan<- error) {
          for {
              select {
              case payload, ok := <-inCh:
                  if !ok {
                      return
                  }
                  if err := sink.Consume(ctx, payload); err != nil {
                      wrappedErr := xerrors.Errorf("pipeline sink: %w", err)
                      maybeEmitError(wrappedErr, errCh)
                      return
                  }
                  payload.MarkAsProcessed()
              case <-ctx.Done():
                  return
              }
          }
      }
  46. Pipeline

      type Pipeline struct {
          stages []StageRunner
      }

      func New(stages ...StageRunner) *Pipeline {
          return &Pipeline{stages: stages}
      }
  47. func (p *Pipeline) Process(ctx context.Context, source Source, sink Sink) error {
          var wg sync.WaitGroup
          pCtx, ctxCancelFn := context.WithCancel(ctx)

          // One channel per stage boundary plus a shared error channel.
          stageCh := make([]chan Payload, len(p.stages)+1)
          errCh := make(chan error, len(p.stages)+2)
          for i := 0; i < len(stageCh); i++ {
              stageCh[i] = make(chan Payload)
          }

          // Start a worker per stage, wiring each stage's output to the next stage's input.
          for i := 0; i < len(p.stages); i++ {
              wg.Add(1)
              go func(stageIndex int) {
                  p.stages[stageIndex].Run(pCtx, &workerParams{
                      stage: stageIndex,
                      inCh:  stageCh[stageIndex],
                      outCh: stageCh[stageIndex+1],
                      errCh: errCh,
                  })
                  close(stageCh[stageIndex+1])
                  wg.Done()
              }(i)
          }

          // Start the source and sink workers at the two ends of the pipeline.
          wg.Add(2)
          go func() {
              sourceWorker(pCtx, source, stageCh[0], errCh)
              close(stageCh[0])
              wg.Done()
          }()
          go func() {
              sinkWorker(pCtx, sink, stageCh[len(stageCh)-1], errCh)
              wg.Done()
          }()

          go func() {
              wg.Wait()
              close(errCh)
              ctxCancelFn()
          }()

          // Collect errors; any error cancels the shared context.
          var err error
          for pErr := range errCh {
              err = multierror.Append(err, pErr)
              ctxCancelFn()
          }
          return err
      }
  48. func (r fifo) Run(ctx context.Context, params StageParams) {
          for {
              select {
              case <-ctx.Done():
                  return
              case payloadIn, ok := <-params.Input():
                  if !ok {
                      return
                  }
                  payloadOut, err := r.proc.Process(ctx, payloadIn)
                  if err != nil {
                      wrappedErr := xerrors.Errorf("pipeline stage %d: %w", params.StageIndex(), err)
                      maybeEmitError(wrappedErr, params.Error())
                      return
                  }
                  // A nil output means this stage filtered the payload out.
                  if payloadOut == nil {
                      payloadIn.MarkAsProcessed()
                      continue
                  }
                  select {
                  case params.Output() <- payloadOut:
                  case <-ctx.Done():
                      return
                  }
              }
          }
      }
  49. // Fixed worker pool: numWorkers FIFO runners share the same stage channels.
      func FixedWorkerPool(proc Processor, numWorkers int) StageRunner {
          fifos := make([]StageRunner, numWorkers)
          for i := 0; i < numWorkers; i++ {
              fifos[i] = FIFO(proc)
          }
          return &fixedWorkerPool{fifos: fifos}
      }

      func (p *fixedWorkerPool) Run(ctx context.Context, params StageParams) {
          var wg sync.WaitGroup
          for i := 0; i < len(p.fifos); i++ {
              wg.Add(1)
              go func(fifoIndex int) {
                  p.fifos[fifoIndex].Run(ctx, params)
                  wg.Done()
              }(i)
          }
          wg.Wait()
      }
  50. // Worker-pool stage excerpt: each payload is processed in its own goroutine,
      // bounded by a token pool that is drained on shutdown.
                  payloadOut, err := p.proc.Process(ctx, payloadIn)
                  if err != nil {
                      wrappedErr := xerrors.Errorf("pipeline stage %d: %w", params.StageIndex(), err)
                      maybeEmitError(wrappedErr, params.Error())
                      return
                  }
                  if payloadOut == nil {
                      payloadIn.MarkAsProcessed()
                      return
                  }
                  select {
                  case params.Output() <- payloadOut:
                  case <-ctx.Done():
                  }
              }(payloadIn, token)
          }
      }

      for i := 0; i < cap(p.tokenPool); i++ {
          <-p.tokenPool
      }
  }
  51. type broadcast struct {
          fifos []StageRunner
      }

      func Broadcast(procs ...Processor) StageRunner {
          if len(procs) == 0 {
              panic("Broadcast: at least one processor must be specified")
          }
          fifos := make([]StageRunner, len(procs))
          for i, p := range procs {
              fifos[i] = FIFO(p)
          }
          return &broadcast{fifos: fifos}
      }

      func (b *broadcast) Run(ctx context.Context, params StageParams) {
          var (
              wg   sync.WaitGroup
              inCh = make([]chan Payload, len(b.fifos))
          )
          for i := 0; i < len(b.fifos); i++ {
              wg.Add(1)
              inCh[i] = make(chan Payload)
              go func(fifoIndex int) {
                  fifoParams := &workerParams{
                      stage: params.StageIndex(),
                      inCh:  inCh[fifoIndex],
                      outCh: params.Output(),
                      errCh: params.Error(),
                  }
                  b.fifos[fifoIndex].Run(ctx, fifoParams)
                  wg.Done()
              }(i)
          }
      done:
          for {
              select {
              case <-ctx.Done():
                  break done
              case payload, ok := <-params.Input():
                  if !ok {
                      break done
                  }
                  for i := len(b.fifos) - 1; i >= 0; i-- {
                      var fifoPayload = payload
                      if i != 0 {
                          fifoPayload = payload.Clone()
                      }
                      select {
                      case <-ctx.Done():
                          break done
                      case inCh[i] <- fifoPayload:
                      }
                  }
              }
          }
          for _, ch := range inCh {
              close(ch)
          }
          wg.Wait()
      }
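To tie the previous slides together, here is a hedged end-to-end usage sketch of the pipeline shown above. It assumes the code lives in an importable pipeline package (the import path example.com/pipeline is a placeholder), that the Payload interface requires only Clone() and MarkAsProcessed(), and that FIFO(proc) is the FIFO stage constructor referenced on the earlier slides.

    package main

    import (
        "context"
        "fmt"
        "strings"

        // Hypothetical import path; adjust to wherever the pipeline package lives.
        "example.com/pipeline"
    )

    // stringPayload is a minimal Payload implementation for this sketch,
    // assuming Payload requires only Clone and MarkAsProcessed.
    type stringPayload struct{ value string }

    func (p *stringPayload) Clone() pipeline.Payload { return &stringPayload{value: p.value} }
    func (p *stringPayload) MarkAsProcessed()        {}

    // sliceSource emits one payload per input string.
    type sliceSource struct {
        items []string
        next  int
        cur   pipeline.Payload
    }

    func (s *sliceSource) Next(context.Context) bool {
        if s.next >= len(s.items) {
            return false
        }
        s.cur = &stringPayload{value: s.items[s.next]}
        s.next++
        return true
    }
    func (s *sliceSource) Payload() pipeline.Payload { return s.cur }
    func (s *sliceSource) Error() error              { return nil }

    // printSink consumes the pipeline's final output.
    type printSink struct{}

    func (printSink) Consume(_ context.Context, p pipeline.Payload) error {
        fmt.Println(p.(*stringPayload).value)
        return nil
    }

    func main() {
        // Two stages, each a ProcessorFunc run as a FIFO stage: trim, then upper-case.
        trim := pipeline.ProcessorFunc(func(_ context.Context, p pipeline.Payload) (pipeline.Payload, error) {
            sp := p.(*stringPayload)
            sp.value = strings.TrimSpace(sp.value)
            return sp, nil
        })
        upper := pipeline.ProcessorFunc(func(_ context.Context, p pipeline.Payload) (pipeline.Payload, error) {
            sp := p.(*stringPayload)
            sp.value = strings.ToUpper(sp.value)
            return sp, nil
        })

        p := pipeline.New(pipeline.FIFO(trim), pipeline.FIFO(upper))
        src := &sliceSource{items: []string{"  hello ", " go "}}
        if err := p.Process(context.Background(), src, printSink{}); err != nil {
            fmt.Println("pipeline error:", err)
        }
    }

Each stage runs in its own goroutine; Process blocks until the source is drained or the context is canceled, and returns any accumulated stage errors.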