Slide 1

Slide 1 text

Django in the age of AI: the good, the bad and the ugly
Joyz, Inc.

Slide 2

Slide 2 text

Who I am
‣ Yoshiyuki Kakihara
‣ Twitter: @y15a_
‣ Founder & co-CEO at Joyz, Inc.
  ‣ Also the acting CTO; does anyone want to replace me? :)
‣ Python for ~7 years
‣ Django for ~5 years

Slide 3

Slide 3 text


Slide 4

Slide 4 text

What I do
‣ Develop & manage TerraTalk
  ‣ An AI-driven app that helps students and teachers with learning English
  ‣ Role-playing chatbots + automatic English assessment
  ‣ K12, universities, businesses
  ‣ iOS & Android; browser version on its way

Slide 5

Slide 5 text

Role-playing conversations
‣ You have a mission to clear, e.g. getting through immigration
‣ Speech recognition + speech synthesis
‣ We have a Software Engineer course too!

Slide 6

Slide 6 text

Pronunciation Coaching
‣ Checks how close your pronunciation is to the standard
  ‣ Phoneme by phoneme
‣ Generates personalised training material

Slide 7

Slide 7 text

AI, huh?
‣ It's no black magic.
‣ In our case, it's:
  ‣ (Statistical) natural language processing
  ‣ Acoustics and speech processing
  ‣ Statistical modelling of students' abilities and learning processes

Slide 8

Slide 8 text

Why Python?
‣ NLP & scientific computing
  ‣ numpy, scipy
  ‣ jupyter notebooks
  ‣ sklearn, keras & tensorflow
  ‣ nltk
...and then an excellent choice of web frameworks

Slide 9

Slide 9 text

Why Django over other frameworks?
‣ More structure, less left to personal taste
‣ Test suite
‣ Basic but usable contrib.auth module
‣ MIGRATION!
‣ Easier to hire for
Django allows a team to grow without slowing down.

Slide 10

Slide 10 text

Other dependencies
‣ djangorestframework
‣ whitenoise
‣ django-celery
‣ django-debug-toolbar
‣ nltk
‣ nose
‣ numpy
‣ pandas
... and many more

Slide 11

Slide 11 text


Slide 12

Slide 12 text

Django is the ______ ______ for ________ with _______.

Slide 13

Slide 13 text

Django is the web framework for perfectionists with deadlines.

Slide 14

Slide 14 text

Django is the ultimate tool for gorillas with bananas.

Slide 15

Slide 15 text

Django is the survived mainly for technical with the.

Slide 16

Slide 16 text

Language models
‣ Central to many tasks: speech recognition, grammar and vocabulary analysis, machine translation
‣ N-gram -> neural network
‣ KenLM
  ‣ Compact & fast N-gram implementation
  ‣ Python bridge
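To make "N-gram" concrete, here is a toy bigram model with add-one smoothing in plain Python. It is only an illustration: KenLM builds higher-order models with far better smoothing and compact data structures. The class name `BigramLM` and the training sentences are made up. Scores are log10 probabilities, the same convention used by ARPA files and KenLM.

```python
import math
from collections import Counter


class BigramLM:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()  # how often each token precedes another
        self.bigram_counts = Counter()   # how often each (previous, next) pair occurs
        vocab = {"</s>"}
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            vocab.update(tokens[1:])
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)  # number of tokens that can be predicted

    def log10_prob(self, sentence):
        """Sum of log10 P(word | previous word) over the sentence, including </s>."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log10(numerator / denominator)
        return total


lm = BigramLM(["I love Django", "I love Python", "Django loves Python"])
print(lm.log10_prob("I love Django") > lm.log10_prob("Django love I"))  # True
```

Swapping in a neural model changes how P(word | context) is computed. The interface stays the same: score a word sequence.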

Slide 17

Slide 17 text

Everyone has their own LM
‣ Suppose someone only knows three words: "I", "love" and "Django"
  ‣ Six permutations
  ‣ "I love Django.", "Django, I love."
‣ What's YOUR vocabulary like?
Best to build an LM for learners.
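The "six permutations" come from 3! = 3 × 2 × 1 = 6 orderings. The sketch below enumerates them and ranks them for one learner. The ranking uses a deliberately crude, hypothetical stand-in for a learner-specific LM: the number of adjacent word pairs the learner has already produced. A real learner LM would assign smoothed probabilities instead, and the learner history here is invented.

```python
from itertools import permutations


def candidate_sentences(vocabulary):
    """Every ordering of a tiny vocabulary: the hypotheses an LM has to rank."""
    return [" ".join(order) for order in permutations(vocabulary)]


def rank_for_learner(candidates, learner_bigrams):
    """Rank candidates by how many adjacent word pairs the learner has used.

    Hypothetical stand-in for a learner-specific language model.
    sorted() is stable, so tied candidates keep their original order.
    """
    def known_pairs(sentence):
        tokens = sentence.split()
        return sum((a, b) in learner_bigrams for a, b in zip(tokens, tokens[1:]))

    return sorted(candidates, key=known_pairs, reverse=True)


candidates = candidate_sentences(["I", "love", "Django"])
print(len(candidates))  # 6
learner_bigrams = {("I", "love"), ("love", "Django")}  # made-up learner history
print(rank_for_learner(candidates, learner_bigrams)[0])  # I love Django
```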

Slide 18

Slide 18 text

Acoustic models
‣ Speech recognition, pronunciation
‣ Features in frequency space
‣ C++
  ‣ Python through SWIG
‣ Several tens to hundreds of MB of RAM -> your phone can run it
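"Features in frequency space" typically start from a short-time magnitude spectrum. The sketch below shows that first step in pure Python. It assumes 16 kHz mono audio given as a list of floats and uses a naive O(N²) DFT for readability. Real front ends use an FFT and usually add mel filterbanks or MFCCs on top. This is not TerraTalk's actual C++ feature pipeline.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (the ragged tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Hann-windowed DFT magnitudes for bins 0..N/2 (naive DFT, for clarity)."""
    n = len(frame)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [x * w for x, w in zip(frame, window)]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=400, hop=160):
    """Frame-by-frame magnitude spectra. The defaults assume 16 kHz audio:
    25 ms windows with a 10 ms hop, a common but not universal choice."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


# A 2 kHz tone at 16 kHz lands exactly on bin 8 of a 64-sample frame (bins are 250 Hz apart).
rate, tone = 16000, 2000
samples = [math.sin(2 * math.pi * tone * t / rate) for t in range(256)]
spec = spectrogram(samples, frame_len=64, hop=32)
print({max(range(len(s)), key=s.__getitem__) for s in spec})  # {8}
```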

Slide 19

Slide 19 text

Choosing the right protocol
‣ Custom protocol over:
  ‣ HTTP
  ‣ HTTP (long poll)
  ‣ WebSocket
If we need WebSocket, we may choose Tornado instead... or Django Channels

Slide 20

Slide 20 text

Criteria
‣ Do you need real time, i.e. a full-duplex streaming protocol?
‣ Stop/resume?
‣ Who manages the state: server, client, or both?
‣ Users are learners, not native speakers

Slide 21

Slide 21 text

UX Prototype
‣ Talking to test users over Skype
  ‣ Responsiveness/need for pauses
  ‣ Fluency
  ‣ Listening comprehension
  ‣ Vocabulary
  ‣ Dialog comprehension

Slide 22

Slide 22 text

Conclusion: HTTP
‣ No need for a real-time protocol
  ‣ Learners need time to pause when speaking
‣ No need for a standing connection
  ‣ Some need references (e.g. a dictionary) when formulating what to say
The timescale of interaction is similar to that of web apps (1 s to 10 s)

Slide 23

Slide 23 text

Models
‣ States & transitions, stored as records
‣ User speech triggers a transition
  ‣ Pronunciation, grammar, semantics
‣ In the process of moving to a document-based approach
  ‣ Document & format versioning
  ‣ Need to explore serialiser options
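The model on this slide can be sketched in plain Python. It covers states and transitions, speech that triggers a transition, and a versioned document format. Everything below is hypothetical: the JSON field names, the `format_version` upgrade step, and especially the keyword matching. Keyword matching only stands in for the pronunciation, grammar and semantic checks the slide mentions. The immigration dialogue echoes the role-play example from earlier in the talk.

```python
import json
import re
from dataclasses import dataclass, field

CURRENT_FORMAT_VERSION = 2


def upgrade(doc):
    """Bring an older lesson document up to CURRENT_FORMAT_VERSION.

    Hypothetical history: version 1 stored one "keyword" string per
    transition; version 2 stores a list under "keywords".
    """
    doc = dict(doc)
    if doc.get("format_version", 1) == 1:
        upgraded = []
        for t in doc["transitions"]:
            new_t = {k: v for k, v in t.items() if k != "keyword"}
            new_t["keywords"] = [t["keyword"]]
            upgraded.append(new_t)
        doc["transitions"] = upgraded
        doc["format_version"] = 2
    if doc["format_version"] != CURRENT_FORMAT_VERSION:
        raise ValueError(f"unsupported format_version {doc['format_version']}")
    return doc


@dataclass
class Lesson:
    start: str
    transitions: list = field(default_factory=list)

    @classmethod
    def from_json(cls, text):
        doc = upgrade(json.loads(text))
        return cls(start=doc["start"], transitions=doc["transitions"])

    def next_state(self, state, utterance):
        """Follow the first transition out of `state` whose keywords all occur in
        the utterance; stay in the same state (e.g. to re-prompt) if none match."""
        words = set(re.findall(r"[a-z']+", utterance.lower()))
        for t in self.transitions:
            if t["from"] == state and all(k in words for k in t["keywords"]):
                return t["to"]
        return state


# A version-1 document, as it might have been stored before the format changed.
v1_doc = json.dumps({
    "start": "ask_purpose",
    "transitions": [
        {"from": "ask_purpose", "to": "ask_length", "keyword": "sightseeing"},
        {"from": "ask_length", "to": "done", "keyword": "days"},
    ],
})
lesson = Lesson.from_json(v1_doc)
state = lesson.next_state(lesson.start, "I'm here for sightseeing.")
print(state)  # ask_length
print(lesson.next_state(state, "For five days."))  # done
```

Upgrading old documents on load means stored lessons never need a bulk rewrite when the format changes. How the document is serialised (JSON here) is exactly the open question the slide mentions.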

Slide 24

Slide 24 text

Authoring Tool
‣ Lets non-engineers author chatbot lessons
‣ GUI only: mouse clicks, drag & drop, plain English
‣ Needs to be a web app
‣ Graph-building UI with autosave

Slide 25

Slide 25 text

Django REST Framework
‣ Class-based views
‣ Authorization framework
‣ Form-like API for serialisers & deserialisers
‣ Debug-mode GUI (the browsable API)
‣ Great for front-end developers

Slide 26

Slide 26 text

Front-end
‣ JointJS: JavaScript diagramming library with directed-graph support
‣ backbone.js
‣ Client-generated UUIDs for states and transitions
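One likely benefit of client-generated UUIDs in an autosaving editor is that saving becomes an idempotent upsert. A retried request cannot create duplicates, and a new transition can point at a state created in the same save. The sketch below shows that server-side consequence with in-memory dicts. The payload field names (`states`, `transitions`, `source`, `target`) are hypothetical. In the real app this would sit behind DRF views and a database, and the client would be backbone.js rather than Python.

```python
import uuid


class GraphStore:
    """In-memory stand-in for the server side of an autosaving graph editor."""

    def __init__(self):
        self.states = {}
        self.transitions = {}

    def autosave(self, payload):
        # Upsert keyed by the client's id: re-sending the same payload is harmless.
        for state in payload.get("states", []):
            self.states[state["id"]] = state
        for transition in payload.get("transitions", []):
            if transition["source"] not in self.states or transition["target"] not in self.states:
                raise ValueError("transition refers to an unknown state")
            self.transitions[transition["id"]] = transition


# Client side: ids are created before anything reaches the server.
greet, ask = str(uuid.uuid4()), str(uuid.uuid4())
payload = {
    "states": [{"id": greet, "text": "Hello!"}, {"id": ask, "text": "Purpose of your visit?"}],
    "transitions": [{"id": str(uuid.uuid4()), "source": greet, "target": ask}],
}
store = GraphStore()
store.autosave(payload)
store.autosave(payload)  # e.g. a retry after a timeout: nothing is duplicated
print(len(store.states), len(store.transitions))  # 2 1
```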

Slide 27

Slide 27 text

Scalability

Slide 28

Slide 28 text

Different kinds of scalability
1. Traffic
2. Storage
3. Team
4. Logic

Slide 29

Slide 29 text

Scaling for traffic
‣ Multiple web processes
‣ Auto-scaling of stateless processes
‣ S3 for static resources
‣ Cache for hot dynamic content
Peak times: morning to early afternoon for K12; evening for universities and businesses

Slide 30

Slide 30 text

Scaling for storage
‣ We haven't had too many issues (yet)
‣ Proper use of indexes + revise as needed

Slide 31

Slide 31 text

Scaling for a growing team
1. Start out with a monolithic Django + Celery app
  ‣ Avoid cyclic dependencies (not even as dirty hacks!)
2. Factor out to a microservice if:
  ‣ RAM/CPU profiles are vastly different
  ‣ Code is very stable and/or a dedicated team can be assigned
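The slide doesn't say how cyclic dependencies are kept out. One option is to check the app-level dependency graph in CI, for example with a graph built from static import analysis. Below is a minimal depth-first cycle detector for such a graph. The app names are hypothetical.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of app names, or None.

    `dependencies` maps each app to the apps it imports from. Depth-first
    search marks apps as on the current path or done; reaching an app that
    is still on the path means we have walked around a cycle.
    """
    ON_PATH, DONE = 1, 2
    state = {}
    path = []

    def visit(app):
        state[app] = ON_PATH
        path.append(app)
        for dep in dependencies.get(app, ()):
            if state.get(dep) == ON_PATH:
                return path[path.index(dep):] + [dep]
            if dep not in state:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[app] = DONE
        return None

    for app in dependencies:
        if app not in state:
            cycle = visit(app)
            if cycle:
                return cycle
    return None


apps = {
    "accounts": [],
    "lessons": ["accounts"],
    "assessment": ["lessons", "accounts"],
}
print(find_cycle(apps))  # None
apps["accounts"] = ["assessment"]  # the kind of "dirty hack" the slide warns against
print(find_cycle(apps))  # ['accounts', 'assessment', 'lessons', 'accounts']
```

An acyclic graph also makes the later step easier: an app that nothing else depends on can be factored out into a microservice without untangling anything first.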

Slide 32

Slide 32 text

NLP components
‣ Resource hogs, even for inference
  ‣ Language models (GBs of RAM)
  ‣ Parsers (GBs of RAM)
‣ Django + gunicorn needs multiple processes, each of which would load its own copy of the models
‣ Hybrid setup (multiple web/worker processes + 1 process for each NLP component) or microservice

Slide 33

Slide 33 text

Measuring English Ability
‣ Raw data points to actionable feedback
  ‣ Speech data into phoneme-wise pronunciation accuracy
  ‣ Estimating vocabulary
  ‣ Measuring fluency
‣ Data query, aggregation, analysis using NLP microservices
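As an illustration of turning raw data points into actionable feedback, here is a sketch that aggregates per-attempt phoneme results into per-phoneme accuracy. It then picks the weakest phonemes as candidates for personalised training material. The data format (`(phoneme, was_correct)` pairs), the phoneme labels and the `k` / `min_attempts` thresholds are all hypothetical. In practice the per-phoneme judgements would come from the acoustic model.

```python
from collections import Counter


def phoneme_stats(attempts):
    """(phoneme, was_correct) data points -> {phoneme: (accuracy, number_of_attempts)}."""
    correct, total = Counter(), Counter()
    for phoneme, ok in attempts:
        total[phoneme] += 1
        correct[phoneme] += int(ok)
    return {p: (correct[p] / total[p], total[p]) for p in total}


def weakest_phonemes(attempts, k=2, min_attempts=3):
    """The k least accurate phonemes, skipping ones with too few attempts to judge."""
    stats = phoneme_stats(attempts)
    eligible = [p for p, (_, n) in stats.items() if n >= min_attempts]
    return sorted(eligible, key=lambda p: stats[p][0])[:k]


attempts = [
    ("R", False), ("R", False), ("R", True),
    ("L", True), ("L", True), ("L", False),
    ("TH", False), ("TH", True), ("TH", False), ("TH", False),
    ("S", False),  # a single attempt: too little data to act on
]
print(weakest_phonemes(attempts))  # ['TH', 'R']
```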

Slide 34

Slide 34 text

Scaling for logic
‣ Managing growing complexity of logic
  ‣ Data processing
  ‣ Computational graph with multiple roots

A, B, C, D = "A", "B", "C", "D"
# A is an input node, i.e. it outputs raw data
# Node D has two roots (parent nodes), B and C: both branches must finish before D runs
edges = [(A, B), (A, C), (B, D), (C, D)]
# In some cases the nodes can't even be organised into distinct layers like in the example above
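Inside a single process, the merge itself is easy: run the nodes in topological order, and D starts only after both B and C are done. The sketch below does this with the standard-library `graphlib` module (Python 3.9+). The node functions are placeholders. It does not address the hard part this talk is about, which is doing the same thing across distributed Celery workers.

```python
from graphlib import TopologicalSorter

A, B, C, D = "A", "B", "C", "D"
edges = [(A, B), (A, C), (B, D), (C, D)]


def run_graph(edges, functions, raw_input):
    """Run each node once all of its parents have produced output."""
    parents = {}
    for upstream, downstream in edges:
        parents.setdefault(downstream, []).append(upstream)
        parents.setdefault(upstream, [])
    outputs = {}
    # TopologicalSorter takes a mapping of node -> predecessors.
    for node in TopologicalSorter(parents).static_order():
        if parents[node]:
            outputs[node] = functions[node](*(outputs[p] for p in parents[node]))
        else:
            outputs[node] = functions[node](raw_input)  # input node
    return outputs


functions = {  # placeholder computations
    A: lambda raw: raw,
    B: lambda a: a * 2,
    C: lambda a: a + 1,
    D: lambda b, c: b + c,  # the merge: needs both branches
}
print(run_graph(edges, functions, 10))
```

`TopologicalSorter` also offers `prepare()`, `get_ready()` and `done()` for dispatching ready nodes in parallel, which is closer to what a task queue needs.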

Slide 35

Slide 35 text

Celery for computation
‣ Celery: a distributed task queue
‣ Supports multiple brokers: Redis, RabbitMQ, Amazon SQS
‣ We use Redis
  ‣ Low latency
  ‣ May move to Amazon SQS for throughput

Slide 36

Slide 36 text

Celery Pros
1. Familiar API, integrates with Django very well
2. Can run on the same infrastructure as other background tasks such as
  ‣ Talking SMTP (sending emails)
  ‣ Data manipulation
3. Out-of-the-box API for parallel execution + callback

Slide 37

Slide 37 text

The catch
‣ No trivial way to handle multiple roots / merging branches
‣ Batch with completion flags + retry
  ‣ Robust but not optimal
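The "completion flags + retry" pattern works like this. Each upstream branch sets a flag when it finishes. The merge step checks the flags and, if any are missing, gives up and tries again later. The sketch below simulates that in plain Python. The batch layout and function names are hypothetical. In Celery, the retry loop would typically be the task's own `self.retry(countdown=...)` on a task declared with `bind=True`, and the flags would live in the database.

```python
import time


class NotReady(Exception):
    """Raised by the merge step while some upstream branches are unfinished."""


def merge_if_complete(batch, required_steps, merge):
    """Merge one batch only once every required step has set its completion flag.

    `batch` holds "flags" (step -> bool) and "results" (step -> value); in a real
    deployment both would be rows in the database, not an in-memory dict.
    """
    missing = [step for step in required_steps if not batch["flags"].get(step)]
    if missing:
        raise NotReady(missing)
    return merge([batch["results"][step] for step in required_steps])


def run_with_retry(task, max_retries=5, countdown=0.0):
    """Re-run `task` while it raises NotReady, sleeping `countdown` seconds between tries."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except NotReady:
            if attempt == max_retries:
                raise
            time.sleep(countdown)


# Simulation: branch B is already done; branch C finishes between the 1st and 2nd try.
batch = {"flags": {"B": True, "C": False}, "results": {"B": 20}}
attempts = []


def merge_task():
    attempts.append(1)
    if len(attempts) == 2:
        batch["flags"]["C"] = True
        batch["results"]["C"] = 11
    return merge_if_complete(batch, ["B", "C"], sum)


print(run_with_retry(merge_task), len(attempts))  # 31 2
```

This shows why the slide calls the pattern "robust but not optimal". The merge is never lost, because it keeps retrying, but each early attempt is wasted work and the countdown adds latency.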

Slide 38

Slide 38 text

Future
‣ Custom library, Celery as backend
‣ AWS Lambda?
  ‣ Easy to scale, like SQS
  ‣ Need to manage code & tests outside the Django-Celery monolith
‣ Apache Spark?

Slide 39

Slide 39 text

Tests
‣ Conventional Django tests: fixture + logic + assert
‣ Machine-learning-based components: assert on statistical validation, not on the output of a single inference
‣ Online learning algorithms need a modified strategy
  ‣ Tests should contain training as well as inference
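Here is what "assert on statistical validation" can look like. The test below trains a model and then asserts on held-out accuracy rather than on any single prediction. A toy nearest-centroid classifier on synthetic data keeps the sketch self-contained. The 0.9 threshold and the seed are arbitrary. In a Django project the class would subclass `django.test.SimpleTestCase` or `TestCase`, both of which build on `unittest.TestCase`, and would be discovered by `manage.py test`.

```python
import random
import unittest


def make_data(n, rng):
    """Two 1-D Gaussian classes: label 0 around -2.0, label 1 around +2.0."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        data.append((rng.gauss(4.0 * label - 2.0, 1.0), label))
    return data


def train_nearest_centroid(data):
    """'Training': the mean feature value of each class."""
    return {
        label: sum(x for x, y in data if y == label) / sum(1 for _, y in data if y == label)
        for label in (0, 1)
    }


def predict(centroids, x):
    return min(centroids, key=lambda label: abs(x - centroids[label]))


class NearestCentroidTest(unittest.TestCase):
    def test_accuracy_on_held_out_data(self):
        rng = random.Random(42)  # fixed seed keeps the test reproducible
        train, test = make_data(500, rng), make_data(200, rng)
        centroids = train_nearest_centroid(train)  # the test trains as well as infers
        accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
        # Assert on an aggregate statistic, not on any single prediction.
        self.assertGreaterEqual(accuracy, 0.9)
```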

Slide 40

Slide 40 text

Making tests faster
# unless you need DB-specific features...
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': ':memory:',
    }
}
# make it less secure (doesn't matter in tests), but faster
PASSWORD_HASHERS = (
    'django.contrib.auth.hashers.MD5PasswordHasher',
)
# run Celery tasks synchronously in-process
# (Celery 3.x setting name; Celery 4+ calls it task_always_eager)
CELERY_ALWAYS_EAGER = True

Slide 41

Slide 41 text

Backward compatibility
‣ Some integration tests need to be "deterministic"
  ‣ Answers accepted today should still be accepted tomorrow
‣ Running behavioural tests over user logs
  ‣ Very large fixture
  ‣ Aggressive parallelisation
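A behavioural test over user logs can be as simple as this: re-grade every answer that was accepted in production and report any that the current grader now rejects. The log schema and the `grade_v2` grader below are hypothetical. Threads keep the sketch simple. CPU-bound grading would rather use a process pool or Django's `manage.py test --parallel`.

```python
from concurrent.futures import ThreadPoolExecutor


def replay_logs(log_entries, grade, workers=8):
    """Re-grade logged answers that were accepted; return the ones now rejected."""
    accepted = [e for e in log_entries if e["accepted"]]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = list(pool.map(lambda e: grade(e["prompt"], e["answer"]), accepted))
    return [e for e, ok in zip(accepted, verdicts) if not ok]


def grade_v2(prompt, answer):
    """Hypothetical updated grader: the answer must mention the expected keyword."""
    expected = {"purpose": "sightseeing", "length": "days"}[prompt]
    return expected in answer.lower()


logs = [  # made-up log entries
    {"prompt": "purpose", "answer": "Sightseeing.", "accepted": True},
    {"prompt": "length", "answer": "Five days.", "accepted": True},
    {"prompt": "length", "answer": "A week.", "accepted": True},
    {"prompt": "purpose", "answer": "Um...", "accepted": False},
]
regressions = replay_logs(logs, grade_v2)
print([e["answer"] for e in regressions])  # ['A week.']
```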

Slide 42

Slide 42 text

Sounds difficult?

Slide 43

Slide 43 text

Performance & load test
‣ Responsiveness of a web process measured as time spent in the web process
  ‣ 95th percentile
  ‣ Distribution of response times over N queries
  ‣ Statistical!
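The 95th percentile is the response time that 95% of requests stay at or below. The sketch below measures it over N calls using only the standard library. The handler is a placeholder for a view or an HTTP request in a real load test, and the budget is hypothetical. `statistics.quantiles` needs Python 3.8+ and at least two data points.

```python
import statistics
import time


def measure(handler, requests):
    """Wall-clock duration of each call, in seconds."""
    durations = []
    for request in requests:
        start = time.perf_counter()
        handler(request)
        durations.append(time.perf_counter() - start)
    return durations


def p95(durations):
    """95th percentile; method="inclusive" treats the sample as the whole population."""
    return statistics.quantiles(durations, n=100, method="inclusive")[94]


durations = measure(lambda request: sum(range(10_000)), range(200))  # placeholder handler
budget = 0.5  # hypothetical budget in seconds
print(f"p95 = {p95(durations) * 1000:.3f} ms, within budget: {p95(durations) < budget}")
```

Asserting on the percentile rather than on individual timings keeps the test stable when a few requests are slow for reasons unrelated to the code.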

Slide 44

Slide 44 text

Summary
‣ Django is still a great all-around framework that lets you move fast
‣ Test your UX before you build anything (or pick a framework)
‣ Celery is robust, performant and versatile, allowing you to build complex logic on top
‣ Tests can be heavy, so plan your CI accordingly

Slide 45

Slide 45 text
