Keynote: Lisa Guo & Hui Ding - Python @ Instagram

PyCon 2017

June 07, 2017

Transcript

  1. 2.
  2. 4.
  3. 8.

    “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”

    “WHY HAVEN’T YOU RE-WRITTEN EVERYTHING IN NODE.JS YET?”
  4. 11.
  5. 12.

    THINGS INSTAGRAM LOVES

    • Maturity of the language and Django framework
    • Use Django user model for supporting 3B+ registered users
  6. 13.

    WE HAVE TAKEN OUR PYTHON

    1. Sharded database support
    2. Run our stack across multiple geographically distributed data centers
    3. Disable garbage-collection to improve memory utilization
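The third item can be illustrated with the standard `gc` module. This is a minimal sketch, assuming the cyclic collector is simply switched off around a region of code; the helper name `gc_disabled` is hypothetical, and nothing here reflects how Instagram actually configured its servers.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_disabled():
    """Hypothetical helper: turn off the cyclic garbage collector temporarily."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state instead of enabling unconditionally.
        if was_enabled:
            gc.enable()


with gc_disabled():
    # Reference counting still frees most objects here;
    # only the collection of reference cycles is paused.
    inside = gc.isenabled()
```

Objects without reference cycles are still reclaimed immediately by reference counting, so disabling the collector only affects cyclic garbage.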
  7. 16.

    PYTHON IS SIMPLE AND CLEAN, AND

    1. Scope the problem, AKA, do simple things first
    2. Use proven technology
    3. User first: focus on adding value to user-facing features
  8. 18.
  9. 19.

    SCALING PYTHON TO SUPPORT USER

    [Chart: server growth vs. user growth]
  10. 20.

    PYTHON EFFICIENCY

    1. Build extensive tools to profile and understand perf bottlenecks
    2. Move stable, critical components to C/C++, e.g., memcached access
    3. Async? New Python runtime?
    4. Cythonization
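The first item, profiling to find bottlenecks, can be tried at small scale with the standard library. This is a sketch using `cProfile` and `pstats`, not the tooling described in the talk; `handle_request` and `profile_call` are hypothetical names invented for illustration.

```python
import cProfile
import io
import pstats


def handle_request(n):
    """Hypothetical stand-in for a request handler doing CPU work."""
    return sum(i * i for i in range(n))


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(handle_request, 1000)
```

Printing `report` lists call counts and cumulative times per function, which is usually enough to see where a hot path spends its time.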
  11. 21.
  12. 25.
  13. 35.

    MIGRATION OPTIONS

    Create separate branch?
    • Branch sync overhead, error prone
    • Merging back will be a risk
    • Lose the opportunity to educate
  14. 36.

    MIGRATION OPTIONS

    Create separate branch?

    One endpoint at a time?
    • Common modules across endpoints
    • Context switch for developers
    • Overhead of managing separate pools
  15. 37.

    MIGRATION OPTIONS

    Create separate branch?

    One endpoint at a time?

    Micro services?
    • Massive code restructuring
    • Higher latency
    • Deployment complexity
  16. 43.

    THIRD-PARTY PACKAGES

    Rule: No Python3, no new package

    Delete unused, incompatible packages

    twisted, django-paging, django-sentry, django-templatetag-sugar, dnspython, enum34, hiredis, httplib2, ipaddr, jsonfig, pyapns, phpserialize, python-memcached, thrift
  17. 44.

    THIRD-PARTY PACKAGES

    Rule: No Python3, no new package

    • Delete unused, incompatible packages
    • Upgraded packages
  18. 45.

    CODEMOD

        modernize -f libmodernize.fixes.fix_filter <dir> -w -n

    raise, metaclass, methodattr, next, funcattr, library renaming, range, maxint/maxsize, filter->list comprehension, long integer, itertools, tuple unpacking in method, cython, urllib parse, StringIO, context mgr/decorator, ipaddr, cmp, except, nested, dict iter fixes, mock
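The "filter->list comprehension" item on this slide reflects a Python 3 behavior change: `filter` returns a lazy iterator rather than a list. A small sketch of the difference, with `is_even` and `numbers` as made-up illustration names:

```python
def is_even(n):
    return n % 2 == 0


numbers = [1, 2, 3, 4, 5, 6]

# On Python 3, filter() returns a lazy iterator, so code that
# expects a list (len(), indexing, repeated iteration) breaks.
lazy = filter(is_even, numbers)
has_len = hasattr(lazy, "__len__")

# A list comprehension behaves the same way on Python 2 and 3.
evens = [n for n in numbers if is_even(n)]
```

The list comprehension form is a drop-in rewrite whenever the caller needs a real list.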
  19. 46.
  20. 50.
  21. 51.
  22. 52.

    UNIT TESTS: LIMITS

    • Not 100% code coverage
    • Many external services have mocks
    • Data compatibility issues typically do not show up in unit tests
  23. 57.

    CHALLENGE: UNICODE/STR/

        from __future__ import absolute_import
        from __future__ import print_function
        from __future__ import division
        from __future__ import unicode_literals
  24. 58.

  25. 59.

    CHALLENGE: UNICODE/STR/

    Error:

        mymac = hmac.new('abc')
        TypeError: key: expected bytes or bytearray, but got 'str'

    Fix:

        value = 'abc'
        if isinstance(value, six.text_type):
            value = value.encode(encoding='utf-8')
        mymac = hmac.new(value)
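The fix on this slide can be written without `six` on Python 3. A minimal sketch, assuming UTF-8 keys and SHA-256 as the digest (recent Python versions require an explicit `digestmod`); `to_bytes` and `sign` are hypothetical helper names, not functions from the talk.

```python
import hashlib
import hmac


def to_bytes(value, encoding="utf-8"):
    """Encode text to bytes; pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def sign(key, message):
    """Return a hex HMAC-SHA256 signature; accepts str or bytes inputs."""
    return hmac.new(to_bytes(key), to_bytes(message), hashlib.sha256).hexdigest()


signature = sign("abc", "payload")
```

Normalizing at the boundary means callers can pass either text or bytes and get the same signature.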
  26. 62.

    CHALLENGE: PICKLE

    Memcache is shared: "Me" on Python3 (highest pickle protocol 4), "Others" on Python2 (highest pickle protocol 2)

    Write (Python3):

        memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

    Read (Python2):

        data = pickle.loads(memcache_data)
        ValueError: unsupported pickle protocol: 4
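One way to avoid the protocol mismatch is to pin the protocol to a version both interpreters understand. This is a sketch under that assumption; `SHARED_PROTOCOL` and `dumps_shared` are hypothetical names.

```python
import pickle

# Protocol 2 is the highest protocol Python 2 can read.
SHARED_PROTOCOL = 2


def dumps_shared(data):
    """Serialize with a protocol readable by both Python 2 and Python 3."""
    return pickle.dumps(data, protocol=SHARED_PROTOCOL)


payload = dumps_shared({"a": 1})
```

A protocol 2 stream starts with the bytes `\x80\x02`, which makes it easy to check what a cache entry was written with.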
  27. 63.

    CHALLENGE: PICKLE

    Python2 writes:

        memcache_data = pickle.dumps(data, 2)
        pickle.dumps({'a': '爱'}, 2)

    Python3 reads:

        UnicodeDecodeError: 'ascii' codec can’t decode byte 0xe9 in position 0: ordinal not in range(128)
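The read failure comes from Python 3 decoding Python 2 `str` payloads as ASCII by default. The sketch below shows the `encoding` argument of `pickle.loads`; the byte string `PY2_PAYLOAD` is hand-written to mimic what Python 2 would emit for a dict holding a UTF-8 encoded `str` (an assumption, not captured output), and this is not necessarily the approach taken in the talk.

```python
import pickle

# Hand-written protocol 2 stream: a dict mapping the Python 2 str 'a'
# to the UTF-8 bytes of U+7231 (assumed to match Python 2 output).
PY2_PAYLOAD = b"\x80\x02}q\x00U\x01aq\x01U\x03\xe7\x88\xb1q\x02s."

try:
    pickle.loads(PY2_PAYLOAD)  # default encoding is ASCII
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Keep Python 2 str objects as raw bytes...
as_bytes = pickle.loads(PY2_PAYLOAD, encoding="bytes")
# ...or decode them as UTF-8 text when the writer is known to use UTF-8.
as_text = pickle.loads(PY2_PAYLOAD, encoding="utf-8")
```

With `encoding="bytes"` the dict keys also come back as bytes, so callers must be prepared for `b"a"` rather than `"a"`.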
  28. 64.

    CHALLENGE: PICKLE

    Python3 writes:

        memcache_data = pickle.dumps(data, 2)
        pickle.dumps({'a': '爱'}, 2)

    Python2 reads:

        {u'a': u'\u7231'} != {'a': '爱'}
  29. 66.
  30. 68.

    CHALLENGE: ITERATOR

        CYTHON_SOURCES = [a.pyx, b.pyx, c.pyx]
        builds = map(BuildProcess, CYTHON_SOURCES)
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
  31. 69.

  32. 70.

  33. 71.

    CHALLENGE: ITERATOR

        CYTHON_SOURCES = [a.pyx, b.pyx, c.pyx]
        builds = list(map(BuildProcess, CYTHON_SOURCES))
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
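The underlying issue is that `map` returns a one-shot iterator on Python 3, so a second pass over `builds` sees nothing. A self-contained sketch of the behavior; the `BuildProcess` class here is a hypothetical stand-in, since the slide does not show its definition.

```python
class BuildProcess:
    """Hypothetical stand-in for the build wrapper named on the slide."""

    def __init__(self, source):
        self.source = source

    def started(self):
        return False


sources = ["a.pyx", "b.pyx", "c.pyx"]

# A map object can be consumed only once on Python 3.
lazy_builds = map(BuildProcess, sources)
first_pass = [build.source for build in lazy_builds]
second_pass = [build.source for build in lazy_builds]  # already exhausted

# Materializing the iterator makes repeated passes safe.
builds = list(map(BuildProcess, sources))
pending = [build for build in builds if not build.started()]
```

In the slide's loop, the `any(...)` call consumes the iterator, so the comprehension on the next line would find no pending builds without the `list(...)` wrapper.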
  34. 72.

    CHALLENGE: DICTIONARY

        >>> testdict = {'a': 1, 'b': 2, 'c': 3}
        >>> json.dumps(testdict)

    Python2:      '{"a": 1, "c": 3, "b": 2}'
    Python3.5.1:  '{"c": 3, "b": 2, "a": 1}'
                  '{"c": 3, "a": 1, "b": 2}'
    Python3.6:    '{"a": 1, "b": 2, "c": 3}'

    Cross version:

        >>> json.dumps(testdict, sort_keys=True)
        '{"a": 1, "b": 2, "c": 3}'
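The cross-version approach on this slide can be wrapped in a small helper so every caller gets the same key order. A minimal sketch; `canonical_json` is a hypothetical name.

```python
import json


def canonical_json(data):
    """Serialize with sorted keys so output does not depend on dict order."""
    return json.dumps(data, sort_keys=True)


first = canonical_json({"a": 1, "b": 2, "c": 3})
second = canonical_json({"c": 3, "a": 1, "b": 2})
```

Both calls produce the same string, which matters when the JSON text is used as a cache key or compared across interpreter versions.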
  35. 74.

    CPU instructions per request: -12%
    Max requests per second: 0%

    Memory configuration difference? 12%
  36. 78.
  37. 80.

    Video View Notification, Save Draft, Comment Filtering, Story Viewer Ranking, First Story Notification, Self-harm Prevention, Live
  38. 81.

    MOTIVATION: TYPE HINTS

    • Type hints - 2% done

        def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:

    • MyPy and typeshed contribution
    • Tooling - collect data and suggest type hints
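The signature on this slide shows how `Optional[str]` documents that `None` is an accepted input and a possible result. The sketch below reuses the signature, but the function body is invented purely for illustration; the slide does not show the real implementation.

```python
from typing import Optional


def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    # Invented body: the original implementation is not shown on the slide.
    if max_id is None:
        return None
    return "max_id=" + max_id
```

A checker such as mypy would flag a caller that passes an `int` here, or one that uses the result as a `str` without first handling `None`.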
  39. 84.
  40. 85.