
Pycon ZA
October 11, 2019

From data science to scalable NLU and vision cloud service by Bernardt Duvenhage

The talk will show how (and why) we've built our own natural language understanding and machine vision cloud service. The service is used mostly for intelligent dialog agents, and the production instances see 5M+ queries a month.

The core of the service is built with Python, NumPy, PyTorch, TensorFlow, OpenCV, TorchVision, scikit-learn and SQLAlchemy. We are also building a framework within which machine comprehension models can be developed in isolation, each with its own unit tests. The service and deployment related aspects (like dataset management, multi-tenancy and even the database interaction) are handled in a service layer that is well isolated from model development.

The cloud service is implemented with OpenAPI/Swagger & Connexion (Flask) to simplify development and maintenance. The Connexion Flask app is deployed using Gunicorn, and we typically use NGINX as a reverse proxy and load balancer. The model DB is a shared PostgreSQL or Google Cloud SQL DB. Everything is containerised and deployed on Kubernetes with Rancher.
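As a sketch of the front of that stack, an NGINX config that proxies and load-balances across Gunicorn instances might look like the following. This is illustrative only: the upstream addresses, port and location are assumptions, not the production values.

```nginx
# Illustrative: round-robin load balancing across Gunicorn instances.
upstream nlu_app {
    server app-0:8000;
    server app-1:8000;
}

server {
    listen 80;

    # Forward all requests to the Gunicorn-served Connexion app.
    location / {
        proxy_pass http://nlu_app;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```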


Transcript

  1. FROM DATA SCIENCE TO SCALABLE NLU & VISION CLOUD SERVICE
     BERNARDT DUVENHAGE, FEERSUM ENGINE, PRAEKELT CONSULTING
  2. Scope: Overview of the NLU and vision API. The data science & model building pipeline. The NLU & vision Python module. The multi-tenant REST service/resource layer. Swagger spec to Flask app, monitoring & deployment.
  3. The NLU & Vision API: Developed mainly for building task-oriented chatbots: navigation intents, entity extraction, natural language FAQs, emotion detection.
  4. The NLU & Vision API: Developed mainly for building task-oriented chatbots: navigation intents, entity extraction, natural language FAQs, emotion detection. Image classification, visual entity extraction, assessment/regression.
  5. The NLU & Vision API: Flexible on which algorithms we use. Local language support. Custom pre-trained vision models.
  6. The NLU & Vision API: Flexible on which algorithms we use. Local language support. Custom pre-trained vision models. Own costing model.
  7. The Data Science Pipeline: Develop & test models in isolation: Notebooks (linear please). Model unit tests.
  8. The Data Science Pipeline: Develop & test models in isolation: Notebooks (linear please). Model unit tests. In the same repo and Python environment as the service.
  9. The Python Module: To be used by chatbot and training software. Model management and a consistent API for loading and using the models.
  10. The Python Module: To be used by chatbot and training software. Model management and a consistent API for loading and using the models. Models stored in files or a SQLAlchemy DB.
  11. The Python Module: To be used by chatbot and training software. Model management and a consistent API for loading and using the models. Models stored in files or a SQLAlchemy DB. Module-level notebooks & unit tests.
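The "consistent API for loading and using the models" with interchangeable file or DB storage could be sketched as below. This is a hypothetical toy, not the actual module's API: the class and method names, and the JSON-file storage format, are all assumptions for illustration.

```python
import json
import os
import tempfile


class FileModelStore:
    """Toy model store: persists model params as JSON files keyed by name.

    Hypothetical sketch of the 'models stored in files or SQLAlchemy DB'
    idea; a DB-backed store would expose the same save/load/list interface.
    """

    def __init__(self, root_dir):
        self.root_dir = root_dir
        os.makedirs(root_dir, exist_ok=True)

    def save(self, name, params):
        # One JSON file per model keeps models easy to inspect and back up.
        path = os.path.join(self.root_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(params, f)

    def load(self, name):
        path = os.path.join(self.root_dir, name + ".json")
        with open(path) as f:
            return json.load(f)

    def list_models(self):
        return sorted(
            fn[:-len(".json")]
            for fn in os.listdir(self.root_dir)
            if fn.endswith(".json")
        )


# Usage: persist a classifier's hyper params and reload them by name.
store = FileModelStore(tempfile.mkdtemp())
store.save("example_clsfr", {"algorithm": "nearest_neighbour_l2", "threshold": 0.5})
params = store.load("example_clsfr")
```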
  12. The Python Module:
      nlpe.create_feers_language_model('feers_elmo_eng')
      training_list, testing_list = nlpe_data.load_quora_data(…)
      nlpe.train_text_clsfr("example_clsfr", training_list, testing_list, clsfr_algorithm=…, …)
      accuracy, f1, cm = nlpe.test_text_clsfr("example_clsfr", testing_list, …)
      score_labels, _ = nlpe.retrieve_text_class("example_clsfr", input_text, …)
  13. The Python Module:
      vise.create_feers_vision_model('feers_resnet152')
      training_list, testing_list = vise_data.load_cat_dog_data(…)
      vise.train_image_clsfr("example_clsfr", training_list, testing_list, clsfr_algorithm=…, …)
      accuracy, f1, cm = vise.test_image_clsfr("example_clsfr", testing_list, …)
      score_labels, _ = vise.retrieve_image_class("example_clsfr", input_image, …)
  14. The Python Module: Model workflow & life cycle management was ok. Difficulties: Performance scalability of inference.
  15. The Python Module: Model workflow & life cycle management was ok. Difficulties: Performance scalability of inference. Ownership of training & testing data and model hyper params.
  16. The Service Wrapper Layer:
      text_clsfr_wrapper.text_clsfr_create(name, auth_token, desc, …)
      text_clsfr_wrapper.text_clsfr_add_training_samples(name, auth_token, json_training_data={…})
      text_clsfr_wrapper.text_clsfr_train(name, auth_token, json_training_data={…})
      _, response_json = text_clsfr_wrapper.text_clsfr_retrieve(name, auth_token, text=text)
  17. The Service Wrapper Layer: Benefits: Multi-tenancy via API key auth & model namespaces. Training & testing data and model hyper params via CRUD.
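The "multi-tenancy via API key auth & model namespaces" idea can be sketched as a lookup that resolves an API key to a per-tenant namespace and prefixes every model name with it. This is a hypothetical illustration, not the service's actual auth scheme; the class, method names and key-hashing choice are all assumptions.

```python
import hashlib


class TenantRegistry:
    """Toy multi-tenant lookup: maps hashed API keys to model namespaces.

    Hypothetical sketch only: the real service's auth and namespacing
    mechanics are not shown in the talk beyond the idea itself.
    """

    def __init__(self):
        # sha256(api_key) -> tenant namespace; store hashes, not raw keys.
        self._tenants = {}

    def register(self, api_key, namespace):
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        self._tenants[key_hash] = namespace

    def resolve_model_name(self, api_key, model_name):
        """Qualify a model name with the caller's namespace, or raise."""
        key_hash = hashlib.sha256(api_key.encode()).hexdigest()
        try:
            namespace = self._tenants[key_hash]
        except KeyError:
            raise PermissionError("Unknown API key.")
        # Two tenants can now both own an 'example_clsfr' without clashing.
        return f"{namespace}/{model_name}"


registry = TenantRegistry()
registry.register("secret-token-a", "tenant_a")
qualified = registry.resolve_model_name("secret-token-a", "example_clsfr")
```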
  18. From Swagger Spec to Flask App: OpenAPI / Swagger spec.
      app = connexion.App(__name__, specification_dir=…, debug=…)
  19. From Swagger Spec to Flask App: OpenAPI / Swagger spec.
      app = connexion.App(__name__, specification_dir=…, debug=…)
      Connect controllers to service wrapper!
  20. From Swagger Spec to Flask App: Benefits: Don't have to write Flask code. Spec-driven development. API implementation and tests can live in the flask_server folder. Python API wrapper using codegen.
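A spec-first endpoint of the kind Connexion consumes might look like the fragment below. The path, operationId and schema here are illustrative assumptions, not the actual swagger.yaml; the key point is that each operationId names the Python controller function Connexion routes to.

```yaml
# Illustrative Swagger 2.0 fragment; the real spec's paths and schemas differ.
swagger: "2.0"
info:
  title: Example NLU service
  version: "1.0"
paths:
  /text_clsfrs/{name}/retrieve:
    post:
      # Connexion dispatches this operation to the named controller function.
      operationId: flask_server.controllers.text_clsfr_retrieve
      parameters:
        - name: name
          in: path
          required: true
          type: string
      responses:
        200:
          description: Scored class labels for the input text.
```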
  21. From Swagger Spec to Flask App: OpenAPI / Swagger spec.
      app = connexion.App(__name__, specification_dir=…, debug=…)
      Connect controllers to service wrapper!
      connexion_app.add_api(specification='swagger.yaml', arguments={…}, options={…})
  22. Monitoring: Prometheus + Grafana.
      promths_request_latency_gauge = Gauge('feersum_nlu_request_latency_seconds', 'FeersumNLU - Request Latency', ['endpoint'])
      promths_request_latency_gauge.labels(endpoint=f.__name__).set(call_duration)
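The gauge update above suggests wrapping endpoints in a timing decorator so every call reports its latency. The sketch below stands in the Prometheus client with a plain dict so it runs anywhere; a real deployment would update a prometheus_client.Gauge as on the slide, and the endpoint function here is a made-up placeholder.

```python
import functools
import time

# Stand-in for a Prometheus gauge: endpoint name -> last call latency (s).
request_latency = {}


def track_latency(f):
    """Record each call's wall-clock duration under the endpoint's name."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return f(*args, **kwargs)
        finally:
            # With prometheus_client this would be:
            # gauge.labels(endpoint=f.__name__).set(call_duration)
            request_latency[f.__name__] = time.perf_counter() - start
    return wrapper


@track_latency
def text_clsfr_retrieve(text):
    # Placeholder for real model inference.
    return [("greeting", 0.9)]


result = text_clsfr_retrieve("hello")
```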
  23. Alerting: Service /health endpoint & Pingdom. Slack webhook integration for Grafana alerts. Resource alerts from the hosting infrastructure.
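A /health endpoint for such uptime checks can be as small as a handler that reports per-dependency status. This is a hypothetical sketch, not the service's actual code: the DB check is stubbed out, and a real check would query the model DB (e.g. a SELECT 1).

```python
import json


def check_db():
    # Stub: a real probe would run a trivial query against the model DB.
    return True


def health():
    """Return (HTTP status code, JSON body) for a health probe.

    503 tells a load balancer or Pingdom to treat the instance as down.
    """
    db_ok = check_db()
    status = 200 if db_ok else 503
    body = json.dumps({"status": "ok" if db_ok else "degraded", "db": db_ok})
    return status, body


status_code, body = health()
```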
  24. Deployment: Flask app on Gunicorn. Docker containers. Rancher 2.0 on top of Kubernetes on GCP. Cloud SQL Postgres DB.
  25. Deployment: Flask app on Gunicorn. Docker containers. Rancher 2.0 on top of Kubernetes on GCP. Cloud SQL Postgres DB. NGINX load balancer.
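The containerised Gunicorn deployment might be sketched as the Dockerfile below. The base image, module path and worker count are assumptions for illustration, not the production values.

```dockerfile
# Illustrative only: versions, module path and worker count are assumptions.
FROM python:3.7-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Gunicorn serves the Connexion/Flask app; NGINX load-balances in front,
# and Kubernetes/Rancher manages the replicas.
CMD ["gunicorn", "flask_server.app:app", "--workers", "4", "--bind", "0.0.0.0:8000"]
```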
  26. Deployment:
      10 requests/s; 1.5M MAU
      20 requests/s; 3.0M MAU
      30 requests/s; 4.5M MAU
      …
      100 requests/s; 45M MAU
      (Diagram: deployment instances _0, _1, _2 sharing the DB.)