
Introduction of Rekcurd; A flexible managing system for machine learning models.

About Rekcurd, the delivery and management platform for machine learning modules in Clova
DAFT #2 (data analytics platform talk)
https://daft.connpass.com/event/124408/

LINE Developers

April 12, 2019

Transcript

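  1. Who I am
    Keigo Hattori (@keigohtr), Software Engineer
    2009: M.S. in Information Engineering, Tohoku University
    2009 to Oct 2017: Fuji Xerox
    Nov 2017 onward: LINE
    Specializes in natural language processing x machine learning. Responsible product: Clova. Founder of Apitore.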
  2. Tasks in serving an ML service
    1. High Availability
    2. Management
       i. Upload the latest ML model
       ii. Switch a model without stopping services
       iii. Versioning models
    3. Monitor
       i. Load balancing
       ii. Auto healing
       iii. Auto scaling
       iv. Performance/Results check
    4. Others
       i. Server setup (development/staging/production)
       ii. Integration with existing services
       iii. AB testing
       iv. Managing many ML services
       v. Logging
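The "Management" tasks (upload the latest model, switch without stopping the service, keep versions) can be pictured with a small in-process registry. This is a hypothetical Python sketch, not Rekcurd's implementation: `ModelRegistry`, `upload`, `switch`, and `predict` are illustrative names, and a "model" is assumed to be any callable.

```python
import threading
from typing import Any, Callable, Dict, Optional

# Hypothetical: a "model" is any callable mapping an input to a prediction.
Model = Callable[[Any], Any]


class ModelRegistry:
    """Keeps every uploaded model version and serves one active version.

    Switching only swaps a reference under a lock, so requests that already
    fetched a model finish with it and no process restart is needed.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[str, Model] = {}
        self._active: Optional[str] = None

    def upload(self, version: str, model: Model) -> None:
        # Store a new version without activating it (versioning).
        with self._lock:
            if version in self._versions:
                raise ValueError(f"version {version!r} already exists")
            self._versions[version] = model

    def switch(self, version: str) -> None:
        # Point traffic at another stored version (switch without stopping).
        with self._lock:
            if version not in self._versions:
                raise KeyError(version)
            self._active = version

    def active_version(self) -> Optional[str]:
        with self._lock:
            return self._active

    def predict(self, x: Any) -> Any:
        # Fetch the active model under the lock, then run it outside the lock
        # so a slow prediction does not block a concurrent switch.
        with self._lock:
            if self._active is None:
                raise RuntimeError("no active model")
            model = self._versions[self._active]
        return model(x)


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.upload("v1", lambda x: x * 2)
    registry.upload("v2", lambda x: x * 3)
    registry.switch("v1")
    print(registry.predict(10))  # 20
    registry.switch("v2")
    print(registry.predict(10))  # 30
```

In a Kubernetes deployment the same idea is usually applied one level up (rolling out pods with the new model), but the in-process version shows why versioning and switching are separate operations.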
  3. Rekcurd Features
    Rekcurd is a flexible managing system for ML modules
    Features
    • Kubernetes
    • Service Mesh (Istio, Envoy)
    • Developer-friendly interface (Rekcurd dashboard)
    • Django-like gRPC micro-framework (Rekcurd)
    • SDK (Rekcurd client)
  4. Architecture (v1.0)
    [Diagram: a Kubernetes cluster running several Rekcurd pods, each with an Istio-proxy sidecar; the Rekcurd dashboard; an existing service using the Rekcurd client; online storage; MySQL/sqlite; an optional internal/external service; an optional workflow. Links are labeled gRPC and REST.]
  5. Tasks in serving an ML service
    1. High Availability
    2. Management
       i. Upload the latest ML model
       ii. Switch a model without stopping services
       iii. Versioning models
    3. Monitor
       i. Load balancing
       ii. Auto healing
       iii. Auto scaling
       iv. Performance/Results check
    4. Others
       i. Server setup (development/staging/production)
       ii. Integration with existing services
       iii. AB testing
       iv. Managing many ML services
       v. Logging
    Rekcurd on Kubernetes solves all of these tasks (or at least aims to). Some challenges may still remain.
  6. Rekcurd Features (v1.0)
    Rekcurd is a flexible managing system for ML modules
    Features
    • Kubernetes
    • Service Mesh (Istio, Envoy)
    • Developer-friendly interface (Rekcurd dashboard)
    • Django-like gRPC micro-framework (Rekcurd)
    • SDK (Rekcurd client)
  7. Kubernetes
    The most widely used container orchestration system
    Features
    • Container / microservices platform
    • Auto-scaling
    • Auto-healing
    • Rolling update (automatic deployment without stopping services)
    • more...
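Auto-scaling on Kubernetes is typically handled by the Horizontal Pod Autoscaler (HPA), whose documented core rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The Python sketch below reproduces only that arithmetic. The tolerance of 0.1 matches the HPA's documented default; `min_replicas`/`max_replicas` are stand-ins for the HPA's `minReplicas`/`maxReplicas`, and the real controller applies further rules (stabilization windows, missing or unready metrics) that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count suggested by the core HPA scaling rule.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    if current_replicas <= 0 or target_metric <= 0:
        raise ValueError("current_replicas and target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 3 replicas averaging 90% CPU against a 60% target give ceil(3 * 1.5) = 5 replicas, while 3 replicas at 63% stay at 3 because the ratio 1.05 is inside the tolerance.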
  8. Service Mesh (Istio)
    Adds a traffic management layer without changing your code
    Features
    • Add traffic management (e.g. AB testing, canary rollouts, traffic splits, circuit breakers, timeouts, retries, ...)
    • Add security (e.g. authentication, authorization, encryption)
    • Add observability (e.g. tracing with Zipkin or Jaeger, querying metrics from Prometheus, Grafana dashboards)
    • more...
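Istio expresses traffic splits (the basis of AB testing and canary rollouts) as weighted destinations in a VirtualService, and the Envoy sidecar applies them without any change to application code. To illustrate what a weighted split does to each request, here is a hypothetical Python sketch; `WeightedRouter` is an illustrative name and this is not how Istio itself is implemented or configured.

```python
import bisect
import itertools
import random
from typing import Dict, List, Optional


class WeightedRouter:
    """Pick a destination with probability proportional to its weight,
    e.g. {"v1": 90, "v2": 10} sends roughly 10% of requests to v2."""

    def __init__(self, weights: Dict[str, int],
                 rng: Optional[random.Random] = None) -> None:
        if not weights or any(w < 0 for w in weights.values()) \
                or sum(weights.values()) == 0:
            raise ValueError("weights must be non-negative and sum to > 0")
        self._names: List[str] = list(weights)
        # Cumulative weights, e.g. {"v1": 90, "v2": 10} -> [90, 100]
        self._cumulative: List[int] = list(itertools.accumulate(weights.values()))
        self._rng = rng or random.Random()

    def pick(self) -> str:
        # Draw a point in [0, total) and return the first bucket covering it;
        # zero-weight destinations have empty buckets and are never chosen.
        point = self._rng.random() * self._cumulative[-1]
        return self._names[bisect.bisect_right(self._cumulative, point)]
```

Passing a seeded `random.Random` makes the split reproducible in tests; a canary rollout then amounts to raising the new version's weight step by step.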
  9. Django-like gRPC micro-framework (v1.0)
    Procedures
    1. Run commands
       $ pip install rekcurd
       $ rekcurd startapp sample
       $ cd sample
    2. Implement the contents
       $ vi app.py
    3. Boot it
       $ python app.py
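A "Django-like" micro-framework generally means the framework owns the serving loop and the user only fills in a few hooks in a generated file such as `app.py`. The sketch below illustrates that pattern in plain Python. It is hypothetical: `ServingApp`, `load_model`, `predict`, `boot`, and `handle` are illustrative names, not Rekcurd's actual API (the template created by `rekcurd startapp` defines the real one), and no gRPC server is started.

```python
from abc import ABC, abstractmethod
from typing import Any


class ServingApp(ABC):
    """Hypothetical framework base class: the framework handles serving,
    the user implements only the model-specific hooks."""

    def __init__(self) -> None:
        self.model: Any = None

    @abstractmethod
    def load_model(self, path: str) -> Any:
        """Return a model object loaded from `path`."""

    @abstractmethod
    def predict(self, model: Any, data: Any) -> Any:
        """Run inference with `model` on one input."""

    def boot(self, path: str) -> None:
        # A real framework would also start its gRPC server here.
        self.model = self.load_model(path)

    def handle(self, data: Any) -> Any:
        # Called by the framework for each incoming request.
        if self.model is None:
            raise RuntimeError("call boot() first")
        return self.predict(self.model, data)


# What a user's app.py might contain under this hypothetical design.
class SampleApp(ServingApp):
    def load_model(self, path: str) -> Any:
        # Stand-in for deserializing a trained model stored at `path`.
        return {"path": path, "bias": 1}

    def predict(self, model: Any, data: Any) -> Any:
        return data + model["bias"]


if __name__ == "__main__":
    app = SampleApp()
    app.boot("model.pkl")
    print(app.handle(41))  # 42
```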
  10. Rekcurd Roadmap
    v0.x: now
    v1.0: Apr 30th 2019
    • Dashboard renewal
    • Istio
    • AB testing
    • GitOps / ImageOps option
    • Fewer required components
    • Airflow support
    v1.x: TBD
    • ML model evaluation and visualization
    • Canary release
    • GPU support
    • Log visualization
    v2.0: TBD
    • TBD
  11. Appendix: Service level independence
    [Diagram: Node1 (label: dev), Node2 (label: stg), Node3 (label: prod), Node4 (label: prod); Pods of App: hoge labeled dev, stg, prod, and prod; one Pod of App: hoge annotated SL (service level): prod.]
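The appendix separates service levels by labeling nodes and pods. In Kubernetes this is commonly expressed with a Pod's `nodeSelector`, which restricts the pod to nodes carrying every listed label with the same value. The Python sketch below mimics only that matching rule; the label key `service-level` is a hypothetical name, and the real scheduler also considers resources, taints, and affinity.

```python
from typing import Dict, List

Labels = Dict[str, str]

# Hypothetical node labels mirroring the slide (the key name is assumed).
NODES: Dict[str, Labels] = {
    "Node1": {"service-level": "dev"},
    "Node2": {"service-level": "stg"},
    "Node3": {"service-level": "prod"},
    "Node4": {"service-level": "prod"},
}


def matches(node_labels: Labels, node_selector: Labels) -> bool:
    # A node qualifies only if it carries every key/value pair in the
    # selector; an empty selector matches every node.
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def eligible_nodes(nodes: Dict[str, Labels], node_selector: Labels) -> List[str]:
    # Node names, in insertion order, on which a pod with this selector may run.
    return [name for name, labels in nodes.items() if matches(labels, node_selector)]
```

With these labels, a prod pod of app hoge is eligible only on Node3 and Node4, so dev and stg workloads never share its nodes.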
  12.  fluentd official Daemonset  Can be log-forwarded to the

    specific server (e.g. Kibana)  Just output logs to stdout/stderr Appendix: fluentd-kubernetes-daemonset