Django in the age of AI

For Django Congress JP 2018 https://djangocongress.jp

Yoshiyuki Kakihara

May 19, 2018

Transcript

  1. Django in the age of AI
    the good, the bad and the ugly
    Joyz, Inc. 1


  2. Who I am
    ‣ Yoshiyuki Kakihara
    ‣ Twitter: @y15a_
    ‣ Founder & co-CEO at Joyz, Inc.
    ‣ Also the acting CTO - anyone
    want to replace me? :)
    ‣ Python for ~7 years
    ‣ Django for ~5 years

  3. Joyz, Inc. 3


  4. What I do
    ‣ Develop & manage TerraTalk
    ‣ An AI-driven app that helps students and teachers with
    learning English
    ‣ Role playing chatbots + automatic English assessment
    ‣ K12, Universities, Businesses
    ‣ iOS & Android. Browser version on its way

  5. Role-playing
    conversations
    ‣ You have a mission to clear e.g.
    immigration
    ‣ Speech recognition + speech
    synthesis
    ‣ We have a Software Engineer
    course too!

  6. Pronunciation
    Coaching
    ‣ Checks how close your
    pronunciation is to standard
    ‣ Phonemes
    ‣ Generates personalised training
    material

  7. AI, huh?
    ‣ It's no black magic.
    ‣ In our case, it's:
    ‣ (Statistical) natural language processing
    ‣ Acoustics and speech processing
    ‣ Statistical modelling of students' abilities and learning
    processes

  8. Why Python?
    ‣ NLP & scientific computing
    ‣ numpy, scipy
    ‣ jupyter notebooks
    ‣ sklearn, keras & tensorflow
    ‣ nltk
    ...and then an excellent choice of web frameworks

  9. Why Django over other frameworks?
    ‣ More structure, less left to personal taste
    ‣ Test suite
    ‣ Basic but usable contrib.auth module
    ‣ MIGRATION!
    ‣ Easier to hire for
    Django allows a team to grow without slowing down.

  10. Other dependencies
    djangorestframework
    whitenoise
    django-celery
    django-debug-toolbar
    nltk
    nose
    numpy
    pandas
    ... and many more

  11. Joyz, Inc. 11


  12. Django is the ______ ______ for
    ________ with _______.

  13. Django is the web framework for
    perfectionists with deadlines.

  14. Django is the ultimate tool for
    gorillas with bananas.

  15. Django is the survived mainly for
    technical with the.

  16. Language models
    ‣ Central to many tasks - speech recognition, grammar and
    vocabulary analysis, machine translation
    ‣ N-gram -> Neural Network
    ‣ KenLM
    ‣ Compact & fast N-gram implementation
    ‣ Python bridge

  17. Everyone has their own LM
    ‣ Suppose: Someone only knows three words: "I", "love" and
    "Django"
    ‣ Six permutations.
    ‣ "I love Django.", "Django, I love."
    ‣ What's YOUR vocabulary like?
    Best to build LM for learners.
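
The three-word example can be made concrete with a toy bigram model. This is only a minimal sketch: the corpus is hypothetical, add-one smoothing is an assumption chosen for brevity, and it is a stand-in for illustration, not KenLM or the model used in production.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram_lm(sentences):
    """Count contexts and bigrams over tokenised sentences, padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", *tokens, "</s>"]
        unigrams.update(padded[:-1])             # every token that acts as a context
        bigrams.update(zip(padded, padded[1:]))  # adjacent pairs
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams, vocab_size):
    """Add-one smoothed log-probability of a token sequence."""
    padded = ["<s>", *tokens, "</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


# Hypothetical learner corpus (an assumption for this sketch).
corpus = [["I", "love", "Django"], ["I", "love", "Python"], ["Django", "I", "love"]]
unigrams, bigrams = train_bigram_lm(corpus)
vocab_size = len({word for sentence in corpus for word in sentence} | {"</s>"})

# Rank all six permutations of the three known words, most likely first.
ranked = sorted(
    permutations(["I", "love", "Django"]),
    key=lambda words: log_prob(words, unigrams, bigrams, vocab_size),
    reverse=True,
)
for words in ranked:
    print(" ".join(words), round(log_prob(words, unigrams, bigrams, vocab_size), 2))
```

Training on learner data rather than native text changes which permutations score well, which is the point of building the LM for learners.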

  18. Acoustic models
    ‣ Speech recognition, pronunciation
    ‣ Features in frequency space
    ‣ C++
    ‣ Python through SWIG
    ‣ Tens to hundreds of MB of RAM -> your phone can run it.

  19. Choosing the right protocol
    ‣ Custom protocol over:
    ‣ HTTP
    ‣ HTTP (long poll)
    ‣ WebSocket
    If we need WebSocket, we may choose tornado instead... or
    Django Channels

  20. Criteria
    ‣ Do you need real time, i.e. a full-duplex, streaming protocol?
    ‣ Stop/resume?
    ‣ Who manages the state, server, client, or both?
    ‣ Learners, not native speakers

  21. UX Prototype
    ‣ Talking to test user over Skype
    ‣ Responsiveness/need for pauses
    ‣ Fluency
    ‣ Listening comprehension
    ‣ Vocabulary
    ‣ Dialog comprehension

  22. Conclusion: HTTP
    ‣ No need for real-time protocol
    ‣ Learners need time to pause when speaking
    ‣ No need for standing connection
    ‣ Some need references (e.g. a dictionary) when
    formulating what to say
    Timescale of interaction is similar to that of web apps (1s~10s)

  23. Models
    ‣ States & transitions, stored as
    records
    ‣ User speech triggers transition
    ‣ Pronunciation, grammar, semantics
    ‣ In the process of moving to a
    document-based approach
    ‣ Document & format versioning
    ‣ Need to explore serialiser options
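
A minimal sketch of the states-and-transitions idea, using plain dataclasses rather than Django models so that it runs on its own. The class names, the lesson content and the exact-match rule are all assumptions for illustration; the slide says real transitions depend on pronunciation, grammar and semantics.

```python
from dataclasses import dataclass, field


@dataclass
class Transition:
    target: str    # id of the state to move to
    accepted: set  # normalised utterances that trigger the move (stand-in for real scoring)


@dataclass
class State:
    prompt: str    # what the chatbot says on entering the state
    transitions: list = field(default_factory=list)


def step(states, current_id, utterance):
    """Return the next state id, or stay in place when no transition matches."""
    normalised = utterance.strip().lower().rstrip(".!?")
    for transition in states[current_id].transitions:
        if normalised in transition.accepted:
            return transition.target
    return current_id


# Hypothetical "immigration" lesson.
states = {
    "greeting": State("Good morning. Passport, please.",
                      [Transition("purpose", {"here you are", "here it is"})]),
    "purpose": State("What is the purpose of your visit?",
                     [Transition("done", {"sightseeing", "business"})]),
    "done": State("Enjoy your stay."),
}

current = step(states, "greeting", "Here you are.")
print(current, "->", states[current].prompt)
```

Stored as records, each `State` and `Transition` would be a row; in a document-based approach the whole `states` mapping would be one versioned document.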

  24. Authoring Tool
    ‣ Non-engineers to author chatbot lessons
    ‣ GUI only - mouse clicks, drag & drop, pure English
    ‣ Need a web app
    ‣ Graph building UI with autosave

  25. Django REST Framework
    ‣ Class based views
    ‣ Authorization framework
    ‣ Form-like API for serialisers & deserialisers
    ‣ Debug mode GUI
    ‣ Great for front-end developers

  26. Front-end
    ‣ JointJS: JavaScript library for diagrams, supports directed graphs
    ‣ backbone.js
    ‣ Client-generated UUID for states and transitions
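
One plausible reason for client-generated UUIDs is that the editor can wire up a transition before either state has been saved, and an autosave retry cannot create duplicates. The sketch below illustrates that under stated assumptions: `GraphStore` is a hypothetical in-memory stand-in for the server, not the actual API.

```python
import uuid


class GraphStore:
    """Hypothetical in-memory stand-in for the server, keyed by client-supplied UUIDs."""

    def __init__(self):
        self.nodes = {}

    def upsert(self, node_id, payload):
        """Create or update a node; repeating the same call is harmless."""
        uuid.UUID(node_id)  # raises ValueError if the id is not a well-formed UUID
        created = node_id not in self.nodes
        self.nodes[node_id] = payload
        return created


# Client side: ids are minted before anything reaches the server,
# so a transition can reference both end states immediately.
state_a = str(uuid.uuid4())
state_b = str(uuid.uuid4())
transition = {"id": str(uuid.uuid4()), "source": state_a, "target": state_b}

store = GraphStore()
first = store.upsert(state_a, {"prompt": "Hello"})
retry = store.upsert(state_a, {"prompt": "Hello"})  # e.g. autosave fired twice
print(first, retry, len(store.nodes))
```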

  27. Scalability

  28. Different kinds of scalability
    1. Traffic
    2. Storage
    3. Team
    4. Logic

  29. Scaling for traffic
    ‣ Multiple web processes
    ‣ Auto-scaling of stateless processes
    ‣ S3 for static resources
    ‣ Cache for hot dynamic content
    Peak times: morning-early afternoon for K12, evening for
    universities and businesses

  30. Scaling for storage
    ‣ We haven't had too many issues (yet)
    ‣ Proper use of indexes + revise as needed

  31. Scaling for growing team
    1. Start out with a monolithic Django + Celery app
    ‣ Avoid cyclic dependencies (not even dirty hacks!)
    2. Factor out to microservice if:
    ‣ RAM/CPU profiles are vastly different
    ‣ Code is very stable and/or a dedicated team can be
    assigned

  32. NLP components
    ‣ Resource hogs, even for inference
    ‣ Language Models (GBs of RAM)
    ‣ Parsers (GBs of RAM)
    ‣ Django + gunicorn needs multiple processes
    ‣ Hybrid setup (multiple web/worker processes + 1 process
    for each NLP) or Microservice

  33. Measuring English Ability
    ‣ Raw data points to actionable feedback
    ‣ Speech data into phoneme-wise pronunciation accuracy
    ‣ Estimating vocabulary
    ‣ Measuring fluency
    ‣ Data query, aggregation, analysis using NLP microservices

  34. Scaling for logic
    ‣ Managing growing complexity of logic
    ‣ Data processing
    ‣ Computational graph with multiple roots
    # Node D has two roots (direct upstream nodes), B and C
    # A is an input node i.e. outputs raw data
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
    # There can be cases where nodes can't even be organised into distinct layers like the example above
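
One way to find a valid execution order for such a graph is a topological sort. The sketch below uses `graphlib` from the standard library (Python 3.9+), with node names as strings. It only illustrates the scheduling problem; it is not necessarily how the pipeline in this talk is implemented.

```python
from graphlib import TopologicalSorter

edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

# graphlib expects a mapping of node -> set of predecessors.
predecessors = {}
for upstream, downstream in edges:
    predecessors.setdefault(upstream, set())
    predecessors.setdefault(downstream, set()).add(upstream)

sorter = TopologicalSorter(predecessors)
sorter.prepare()

# Collect "waves" of nodes whose inputs are all available;
# nodes within a wave could run in parallel.
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

print(waves)
```

Because readiness is tracked per node, the same loop also handles graphs that do not fall into neat layers.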

  35. Celery for computation
    ‣ Celery: a distributed task queue
    ‣ Supports multiple brokers - Redis, RabbitMQ, Amazon SQS
    ‣ We use Redis
    ‣ Low latency
    ‣ May move to Amazon SQS for throughput

  36. Celery Pros
    1. Familiar API, integrates with Django very well
    2. Can run on the same infrastructure as other background
    tasks such as
    ‣ Talking SMTP (sending emails)
    ‣ Data manipulation
    3. Out-of-the-box API for parallel execution + callback

  37. The catch
    ‣ No trivial way to handle multiple roots / merging branches.
    ‣ Batch with completion flags + retry
    ‣ Robust but not optimal
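
A rough sketch of "completion flags + retry", written in plain Python so it runs without a broker. All function names are hypothetical, and the loop in `merge_with_retry` merely stands in for a task queue re-enqueuing the merge task; with Celery the retry would presumably go through the task's own retry mechanism.

```python
def run_branch(name, results, compute):
    """A branch task: storing its output doubles as the completion flag."""
    results[name] = compute()


def try_merge(results, required, merge):
    """The merge task: proceed only once every required branch has reported."""
    if not all(name in results for name in required):
        return False, None
    return True, merge(*(results[name] for name in required))


def merge_with_retry(results, required, merge, pending, max_retries=5):
    """Poll-and-retry loop standing in for a task queue's retry."""
    for attempt in range(max_retries + 1):
        done, value = try_merge(results, required, merge)
        if done:
            return value, attempt
        if pending:
            pending.pop(0)()  # simulate one more branch finishing before the next try
    raise RuntimeError("branches did not complete in time")


results = {}  # shared store; in practice a cache or database (assumption)
pending = [
    lambda: run_branch("B", results, lambda: 2),
    lambda: run_branch("C", results, lambda: 3),
]
value, attempts = merge_with_retry(results, ["B", "C"], lambda b, c: b + c, pending)
print(value, attempts)
```

The wasted attempts before both flags are set are what makes this approach robust but not optimal.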

  38. Future
    ‣ Custom library, Celery as backend
    ‣ AWS Lambda?
    ‣ Easy to scale, like SQS
    ‣ Need to manage code & tests outside Django-Celery
    monolith
    ‣ Apache Spark?

  39. Tests
    ‣ Conventional Django tests: Fixture + Logic + Assert
    ‣ Machine learning based components: assert on statistical
    validation, not on the output of a single inference
    ‣ Online learning algorithms need a modified strategy
    ‣ Tests should cover training as well as inference
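
A minimal sketch of asserting on statistical validation rather than on a single inference. `NoisyModel`, the validation set and the 0.85 threshold are hypothetical placeholders; the seed is fixed so the test stays repeatable.

```python
import random
import unittest


def make_validation_set(rng, size):
    """Hypothetical labelled data: the label is whether x is non-negative."""
    return [(x, x >= 0) for x in (rng.uniform(-1, 1) for _ in range(size))]


class NoisyModel:
    """Stand-in for an ML component that is right roughly 90% of the time."""

    def __init__(self, rng):
        self.rng = rng

    def predict(self, x):
        truth = x >= 0
        return truth if self.rng.random() < 0.9 else not truth


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model.predict(x) == label for x, label in examples) / len(examples)


class StatisticalValidationTest(unittest.TestCase):
    def test_accuracy_meets_threshold(self):
        rng = random.Random(42)  # fixed seed keeps the test deterministic
        examples = make_validation_set(rng, 2000)
        # Any single prediction may be wrong; the aggregate must clear the bar.
        self.assertGreaterEqual(accuracy(NoisyModel(rng), examples), 0.85)
```

Saved as a module, this can be run with `python -m unittest`. For an online learning algorithm, the test body would also run the training step before measuring accuracy.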

  40. Making tests faster
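    # unless you need DB specific features...
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.sqlite3',
            'NAME': ':memory:',  # in-memory SQLite database
        }
    }
    # make it less secure (doesn't matter in tests), but faster
    PASSWORD_HASHERS = (
        'django.contrib.auth.hashers.MD5PasswordHasher',
    )
    # run Celery tasks synchronously, in-process
    # (named CELERY_TASK_ALWAYS_EAGER with Celery 4+ and the CELERY settings namespace)
    CELERY_ALWAYS_EAGER = True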

  41. Backward compatibility
    ‣ Some integration tests need to be "deterministic"
    ‣ Accepted answers today should still be accepted
    tomorrow
    ‣ Running behavioural tests over user logs
    ‣ Very large fixture
    ‣ Aggressive parallelisation
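
The "accepted today, accepted tomorrow" rule can be sketched as a replay over logged answers. The log format and both `accepts_*` functions are hypothetical; a real fixture would be far larger and split across parallel workers.

```python
def find_regressions(logged, accepts):
    """Return utterances that were accepted in the logs but are rejected now."""
    return [
        utterance
        for utterance, was_accepted in logged
        if was_accepted and not accepts(utterance)
    ]


# Hypothetical user log: (utterance, whether it was accepted at the time).
logged = [("sightseeing", True), ("for sightseeing", True), ("banana", False)]


def accepts_current(utterance):
    """Hypothetical current grader: still tolerant of extra words."""
    return "sightseeing" in utterance.lower()


def accepts_stricter(utterance):
    """Hypothetical change that silently breaks a previously accepted answer."""
    return utterance.lower() == "sightseeing"


print(find_regressions(logged, accepts_current))
print(find_regressions(logged, accepts_stricter))
```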

  42. Sounds difficult?

  43. Performance & load test
    ‣ Responsiveness of web process measured in time spent in
    web process
    ‣ 95th percentile
    ‣ Distribution of response times over N queries
    ‣ Statistical!
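
A small sketch of computing the 95th percentile over N response times with the standard library (`statistics.quantiles`, Python 3.8+). The sample timings and the 200 ms budget are made-up values for illustration.

```python
import statistics


def p95(samples_ms):
    """95th percentile: with n=100 there are 99 cut points, and index 94 is the 95th."""
    return statistics.quantiles(samples_ms, n=100, method="inclusive")[94]


# Hypothetical timings: 1 ms, 2 ms, ..., 100 ms spent in the web process.
samples = [float(ms) for ms in range(1, 101)]

budget_ms = 200.0  # hypothetical target
print(p95(samples), p95(samples) <= budget_ms)
```

Asserting on a percentile of the distribution, rather than on any single request, keeps a load test from failing on one slow outlier.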

  44. Summary
    ‣ Django is still a great, all-around framework that lets you
    move fast
    ‣ Test your UX before you build anything (or pick a
    framework)
    ‣ Celery is robust, performant and versatile, allowing you to
    build complex logic on top
    ‣ Tests can be heavy, so plan your CI accordingly

  45. Joyz, Inc. 45
