Django in the age of AI


For Django Congress JP 2018 https://djangocongress.jp

Yoshiyuki Kakihara

May 19, 2018


Transcript

  1. Django in the age of AI: the good, the bad and the ugly Joyz, Inc.
  2. Who I am ‣ Yoshiyuki Kakihara ‣ Twitter: @y15a_ ‣ Founder & co-CEO at Joyz, Inc. ‣ Also the acting CTO - anyone want to replace me? :) ‣ Python for ~7 years ‣ Django for ~5 years
  3. What I do ‣ Develop & manage TerraTalk ‣ An AI-driven app that helps students and teachers with learning English ‣ Role-playing chatbots + automatic English assessment ‣ K12, universities, businesses ‣ iOS & Android; browser version on its way
  4. Role-playing conversations ‣ You have a mission to clear, e.g. immigration ‣ Speech recognition + speech synthesis ‣ We have a Software Engineer course too!
  5. Pronunciation Coaching ‣ Checks how close your pronunciation is to the standard ‣ Phonemes ‣ Generates personalised training material
  6. AI, huh? ‣ It's no black magic. ‣ In our case, it's: ‣ (Statistical) natural language processing ‣ Acoustics and speech processing ‣ Statistical modelling of students' abilities and learning processes
  7. Why Python? ‣ NLP & scientific computing ‣ numpy, scipy ‣ jupyter notebooks ‣ sklearn, keras & tensorflow ‣ nltk ‣ ...and then an excellent choice of web frameworks
  8. Why Django over other frameworks? ‣ More structure, less left to personal taste ‣ Test suite ‣ Basic but usable contrib.auth module ‣ MIGRATION! ‣ Easier to hire for ‣ Django allows the team to grow without slowing down.
  9. Language models ‣ Central to many tasks - speech recognition, grammar and vocabulary analysis, machine translation ‣ N-gram -> Neural Network ‣ KenLM ‣ Compact & fast N-gram implementation ‣ Python bridge
  10. Everyone has their own LM ‣ Suppose someone only knows three words: "I", "love" and "Django" ‣ Six permutations ‣ "I love Django.", "Django, I love." ‣ What's YOUR vocabulary like? ‣ Best to build an LM for learners.
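
A minimal sketch of the counting argument above: with a three-word vocabulary, a sentence that uses each word exactly once is one of 3! = 6 orderings. The variable names are illustrative and do not come from the talk.

```python
from itertools import permutations

# The learner's entire vocabulary, as in the slide.
vocabulary = ["I", "love", "Django"]

# Every ordering that uses each word exactly once.
sentences = [" ".join(words) for words in permutations(vocabulary)]

for sentence in sentences:
    print(sentence)

print(len(sentences))  # 6
```

A learner-specific language model only has to score a small space like this one, which is far narrower than what a general-purpose model covers.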
  11. Acoustic models ‣ Speech recognition, pronunciation ‣ Features in frequency space ‣ C++ ‣ Python through SWIG ‣ Tens to hundreds of MB of RAM -> your phone can run it.
  12. Choosing the right protocol ‣ Custom protocol over: ‣ HTTP ‣ HTTP (long poll) ‣ WebSocket ‣ If we need WebSocket, we may choose tornado instead... or Django Channels
  13. Criteria ‣ Do you need a real-time (i.e. full-duplex, streaming) protocol? ‣ Stop/resume? ‣ Who manages the state: server, client, or both? ‣ Users are learners, not native speakers
  14. UX Prototype ‣ Talking to a test user over Skype ‣ Responsiveness/need for pauses ‣ Fluency ‣ Listening comprehension ‣ Vocabulary ‣ Dialog comprehension
  15. Conclusion: HTTP ‣ No need for a real-time protocol ‣ Learners need time to pause when speaking ‣ No need for a standing connection ‣ Some need references (e.g. a dictionary) when formulating what to say ‣ Timescale of interaction is similar to that of web apps (1s~10s)
  16. Models ‣ States & transitions, stored as records ‣ User speech triggers transition ‣ Pronunciation, grammar, semantics ‣ In process of moving to a document-based approach ‣ Document & format versioning ‣ Need to explore serialiser options
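
A rough sketch of "states & transitions, stored as records", assuming each transition record names a source state, a triggering utterance and a target state. The state names, the `Transition` fields and the exact-match rule are all hypothetical. Per the slide, the real system evaluates pronunciation, grammar and semantics rather than comparing strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    source: str   # state the learner is currently in
    trigger: str  # normalised utterance that fires the transition
    target: str   # state to move to


# Hypothetical records for an immigration role-play.
TRANSITIONS = [
    Transition("greeting", "hello", "passport_check"),
    Transition("passport_check", "here is my passport", "purpose_of_visit"),
]


def next_state(current: str, utterance: str) -> Optional[str]:
    """Return the target state for a recognised utterance, or None to stay put."""
    normalised = utterance.strip().lower().rstrip(".!?")
    for transition in TRANSITIONS:
        if transition.source == current and transition.trigger == normalised:
            return transition.target
    return None


print(next_state("greeting", "Hello!"))
```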
  17. Authoring Tool ‣ Non-engineers to author chatbot lessons ‣ GUI only - mouse clicks, drag & drop, pure English ‣ Need a web app ‣ Graph building UI with autosave
  18. Django REST Framework ‣ Class based views ‣ Authorization framework ‣ Form-like API for serialisers & deserialisers ‣ Debug mode GUI ‣ Great for front-end developers
  19. Front-end ‣ JointJS: JavaScript library for charts, supports digraphs ‣ backbone.js ‣ Client-generated UUID for states and transitions
  20. Scaling for traffic ‣ Multiple web processes ‣ Auto-scaling of stateless processes ‣ S3 for static resources ‣ Cache for hot dynamic content ‣ Peak times: morning to early afternoon for K12, evening for universities and businesses
  21. Scaling for storage ‣ We haven't had too many issues (yet) ‣ Proper use of indexes + revise as needed
  22. Scaling for growing team 1. Start out with a monolithic Django + Celery app ‣ Avoid cyclic dependencies (not even dirty hacks!) 2. Factor out to microservice if: ‣ RAM/CPU profiles are vastly different ‣ Code is very stable and/or a dedicated team can be assigned
  23. NLP components ‣ Resource hogs, even for inference ‣ Language models (GBs of RAM) ‣ Parsers (GBs of RAM) ‣ Django + gunicorn needs multiple processes ‣ Hybrid setup (multiple web/worker processes + 1 process for each NLP component) or microservice
  24. Measuring English Ability ‣ Raw data points to actionable feedback ‣ Speech data into phoneme-wise pronunciation accuracy ‣ Estimating vocabulary ‣ Measuring fluency ‣ Data query, aggregation, analysis using NLP microservices
  25. Scaling for logic ‣ Managing growing complexity of logic ‣ Data processing ‣ Computational graph with multiple roots

        # Node D has two roots, B and C
        # A is an input node, i.e. it outputs raw data
        edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
        # There can be cases where nodes can't even be organised
        # into distinct layers like the example above
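
One way to get a valid execution order for a graph like this is a topological sort. The sketch below uses the standard library's `graphlib` (Python 3.9+) on the edge list from the slide. It only illustrates the ordering problem and is not necessarily how the talk's pipeline schedules work.

```python
from graphlib import TopologicalSorter

# Edge list from the slide: (upstream node, downstream node).
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

# TopologicalSorter expects a mapping of node -> set of predecessors.
predecessors = {}
for parent, child in edges:
    predecessors.setdefault(parent, set())
    predecessors.setdefault(child, set()).add(parent)

# static_order() yields every node after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

A comes first and D last. B and C have no mutual dependency, so they can appear in either order or run in parallel.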
  26. Celery for computation ‣ Celery: a distributed task queue ‣ Supports multiple brokers - Redis, RabbitMQ, Amazon SQS ‣ We use Redis ‣ Low latency ‣ May move to Amazon SQS for throughput
  27. Celery Pros 1. Familiar API, integrates with Django very well 2. Can run on the same infrastructure as other background tasks such as ‣ Talking SMTP (sending emails) ‣ Data manipulation 3. Out-of-the-box API for parallel execution + callback
  28. The catch ‣ No trivial way to handle multiple roots / merging branches. ‣ Batch with completion flags + retry ‣ Robust but not optimal
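
The "completion flags + retry" idea can be sketched in a single process, assuming each node records a flag when it finishes and a node whose inputs are not all flagged is simply put back on the queue. The graph, the queue and the counters below are hypothetical stand-ins. In a Celery deployment the flags would live in shared storage and the requeue would be a task retry.

```python
from collections import deque

# Hypothetical graph: each node lists the nodes whose output it needs.
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

completed = set()  # completion flags, one per finished node
run_order = []     # order in which nodes actually ran
retries = 0        # how often a node was requeued because it was not ready

# D is deliberately queued first to exercise the retry path.
queue = deque(["D", "A", "B", "C"])

while queue:
    node = queue.popleft()
    if all(parent in completed for parent in parents[node]):
        run_order.append(node)  # stand-in for the real computation
        completed.add(node)
    else:
        retries += 1
        queue.append(node)      # inputs missing: try again later

print(run_order, retries)
```

This always terminates on an acyclic graph, which matches the slide's "robust". Every requeue is wasted work, which matches "not optimal".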
  29. Future ‣ Custom library, Celery as backend ‣ AWS Lambda? ‣ Easy to scale, like SQS ‣ Need to manage code & tests outside Django-Celery monolith ‣ Apache Spark?
  30. Tests ‣ Conventional Django tests: Fixture + Logic + Assert ‣ Machine learning based components: assert on statistical validation, not on the output of a single inference ‣ Online learning algorithms need a modified strategy ‣ Tests should contain training as well as inference
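
A sketch of what asserting on statistical validation can look like, using a toy threshold classifier in place of a real model. The model, the synthetic data and the 0.85 accuracy bar are all assumptions made for illustration. The test trains, runs inference on held-out data and asserts on aggregate accuracy rather than on any single prediction.

```python
import random
import unittest


def train_threshold(samples):
    """Toy 'model': the midpoint between the two class means."""
    zeros = [x for x, label in samples if label == 0]
    ones = [x for x, label in samples if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


def make_data(rng, n):
    """Two overlapping Gaussian classes, so individual predictions can be wrong."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 0.7), label))
    return data


class ThresholdModelTest(unittest.TestCase):
    def test_accuracy_on_held_out_set(self):
        rng = random.Random(0)  # fixed seed keeps the test repeatable
        threshold = train_threshold(make_data(rng, 500))  # training is part of the test
        held_out = make_data(rng, 500)
        correct = sum(predict(threshold, x) == label for x, label in held_out)
        accuracy = correct / len(held_out)
        # Assert on an aggregate metric, not on one inference result.
        self.assertGreaterEqual(accuracy, 0.85)
```

The threshold sits well below the accuracy this toy setup is expected to reach, so the test fails on real regressions rather than on sampling noise.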
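  31. Making tests faster

        # unless you need DB-specific features...
        DATABASES = {
            'default': {
                'ENGINE': 'django.db.backends.sqlite3',
                'NAME': ':memory:',  # SQLite's name for an in-memory database
            }
        }
        # make it less secure (doesn't matter in tests), but faster
        PASSWORD_HASHERS = (
            'django.contrib.auth.hashers.MD5PasswordHasher',
        )
        # run Celery tasks synchronously in the calling process
        # (Celery 4+ with the CELERY settings namespace calls this CELERY_TASK_ALWAYS_EAGER)
        CELERY_ALWAYS_EAGER = True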
  32. Backward compatibility ‣ Some integration tests need to be "deterministic" ‣ Accepted answers today should still be accepted tomorrow ‣ Running behavioural tests over user logs ‣ Very large fixture ‣ Aggressive parallelisation
  33. Performance & load test ‣ Responsiveness of web process measured in time spent in web process ‣ 95th percentile ‣ Distribution of response times over N queries ‣ Statistical!
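
A small sketch of reporting the 95th percentile over N queries with the standard library. The response times are made-up numbers chosen to include two slow outliers.

```python
import statistics

# Hypothetical time spent in the web process per request, in seconds.
response_times = [
    0.12, 0.15, 0.11, 0.95, 0.14, 0.13, 0.16, 0.12, 0.18, 0.14,
    0.13, 0.17, 0.11, 0.15, 0.12, 0.14, 0.16, 0.13, 0.15, 2.30,
]

# quantiles(n=100) returns the 99 percentile cut points;
# index 94 is the 95th percentile.
p95 = statistics.quantiles(response_times, n=100)[94]
median = statistics.median(response_times)

print(f"median={median:.2f}s p95={p95:.2f}s")
```

The median hides the two slow requests, while the 95th percentile exposes them. That is why the slide treats responsiveness as a distribution rather than a single number.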
  34. Summary ‣ Django is still a great, all-around framework that lets you move fast ‣ Test your UX before you build anything (or pick a framework) ‣ Celery is robust, performant and versatile, allowing you to build complex logic on top ‣ Tests can be heavy, so plan your CI accordingly