Keynote: Lisa Guo & Hui Ding - Python @ Instagram

PyCon 2017

June 07, 2017

Transcript

  1. “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”

    “WHY HAVEN’T YOU RE-WRITTEN EVERYTHING IN NODE.JS YET?”
  2. THINGS INSTAGRAM LOVES

    • Maturity of the language and Django framework
    • Use Django user model for supporting 3B+ registered users
  3. WE HAVE TAKEN OUR PYTHON

    1. Sharded database support
    2. Run our stack across multiple geographically distributed data centers
    3. Disable garbage-collection to improve memory utilization
  4. PYTHON IS SIMPLE AND CLEAN, AND

    1. Scope the problem, AKA, do simple things first
    2. Use proven technology
    3. User first: focus on adding value to user facing features
  5. SCALING PYTHON TO SUPPORT USER

    [Chart: server growth and user growth]
  6. PYTHON EFFICIENCY

    1. Build extensive tools to profile and understand perf bottlenecks
    2. Move stable, critical components to C/C++, e.g., memcached access
    3. Async? New Python runtime?
    4. Cythonization
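
The slides do not show the profiling tools Instagram built. As a rough illustration of the first item only, here is a minimal sketch using the standard library's `cProfile` and `pstats` as stand-ins; `handle_request` is a hypothetical placeholder, not a function from the talk.

```python
import cProfile
import io
import pstats


def handle_request():
    # Hypothetical stand-in for real request-handling work.
    return sum(i * i for i in range(10000))


# Collect call statistics only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Render the five most expensive entries by cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time tends to surface the call paths worth optimizing or moving to C/C++ first.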
  7. MIGRATION OPTIONS

    Create separate branch?
    • Branch sync overhead, error prone
    • Merging back will be a risk
    • Lose the opportunity to educate
  8. MIGRATION OPTIONS

    Create separate branch?
    One endpoint at a time?
    • Common modules across endpoints
    • Context switch for developers
    • Overhead of managing separate pools
  9. MIGRATION OPTIONS

    Create separate branch?
    One endpoint at a time?
    Micro services?
    • Massive code restructuring
    • Higher latency
    • Deployment complexity
  10. THIRD-PARTY PACKAGES

    Rule: No Python3, no new package
    Delete unused, incompatible packages:
    twisted, django-paging, django-sentry, django-templatetag-sugar, dnspython, enum34, hiredis, httplib2, ipaddr, jsonfig, pyapns, phpserialize, python-memcached, thrift
  11. THIRD-PARTY PACKAGES

    Rule: No Python3, no new package
    Delete unused, incompatible packages
    Upgraded packages
  12. CODEMOD

        modernize -f libmodernize.fixes.fix_filter <dir> -w -n

    raise, metaclass, methodattr, next, funcattr, library renaming, range, maxint/maxsize, filter->list comprehension, long integer, itertools, tuple unpacking in method, cython, urllib parse, StringIO, context mgr/decorator, ipaddr, cmp, except, nested, dict iter fixes, mock
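
To make the "filter->list comprehension" item concrete, here is a small sketch of why that rewrite matters. The names `numbers` and `is_even` are illustrative only, and the exact code that modernize's `fix_filter` emits may differ from the hand-written comprehension shown here.

```python
# Hypothetical data and predicate, for illustration only.
numbers = [1, 2, 3, 4, 5, 6]


def is_even(n):
    return n % 2 == 0


# Python 2 returned a list here; Python 3 returns a lazy filter object,
# so code that indexes or re-reads the result needs to change.
lazy = filter(is_even, numbers)

# A list comprehension yields a real list on both versions.
evens = [n for n in numbers if is_even(n)]

print(type(lazy).__name__)
print(evens)
```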
  13. UNIT TESTS: LIMITS

    • Not 100% code coverage
    • Many external services have mocks
    • Data compatibility issues typically do not show up in unit tests
  14. CHALLENGE: UNICODE/STR/

        from __future__ import absolute_import
        from __future__ import print_function
        from __future__ import division
        from __future__ import unicode_literals
  15. CHALLENGE: UNICODE/STR/

        from __future__ import absolute_import
        from __future__ import print_function
        from __future__ import division
        from __future__ import unicode_literals

        from __future__ import absolute_import
        from __future__ import print_function
        from __future__ import division
        from __future__ import unicode_literals
  16. CHALLENGE: UNICODE/STR/

    Error:
        mymac = hmac.new('abc')
        TypeError: key: expected bytes or bytearray, but got 'str'

    Fix:
        value = 'abc'
        if isinstance(value, six.text_type):
            value = value.encode(encoding='utf-8')
        mymac = hmac.new(value)
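
The fix on this slide can be wrapped in a small helper. The sketch below assumes Python 3 only, so it checks `str` rather than `six.text_type`; `to_bytes`, `sign`, and the choice of SHA-256 are assumptions made for illustration, not part of the talk. The digest is passed explicitly because recent Python 3 releases require `digestmod`.

```python
import hashlib
import hmac


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is text."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


def sign(key, message):
    # hmac.new() rejects a str key on Python 3, so normalize both inputs.
    return hmac.new(to_bytes(key), to_bytes(message), hashlib.sha256).hexdigest()


# Text and bytes inputs now produce the same signature.
print(sign("abc", "payload") == sign(b"abc", b"payload"))
```

Normalizing at the boundary keeps callers from needing to know whether a value arrived as text or bytes.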
  17. CHALLENGE: PICKLE

    Memcache: Me (Python3) writes, Others (Python2) read

    Write (Python3, protocol 4):
        memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

    Read (Python2, highest protocol 2):
        data = pickle.loads(memcache_data)
        ValueError: unsupported pickle protocol: 4
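
A short sketch of the protocol mismatch, run on Python 3. The payload and the constant name `PY2_COMPATIBLE_PROTOCOL` are illustrative. Protocol 2 is the newest protocol that Python 2 can read, so pinning it avoids the `ValueError` above.

```python
import pickle

data = {"a": 1}  # hypothetical payload

# Protocol 2 is the newest protocol Python 2 can read.
PY2_COMPATIBLE_PROTOCOL = 2

newest = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
pinned = pickle.dumps(data, PY2_COMPATIBLE_PROTOCOL)

# From protocol 2 onwards the stream starts with the PROTO opcode (0x80)
# followed by the protocol number, so the second byte shows the version.
print(newest[1], pinned[1])
```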
  18. CHALLENGE: PICKLE

    Write:
        memcache_data = pickle.dumps(data, 2)

    Python2 writes:
        pickle.dumps({'a': '爱'}, 2)

    Python3 reads:
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)
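
The slides do not say which remedy was chosen for this case. As one possible approach, Python 3's `pickle.loads` accepts an `encoding` argument that controls how Python 2 byte strings are decoded. In the sketch below, `PY2_PICKLE` is a hand-written protocol-2 stream that mimics what Python 2 would emit for a dict with byte-string key and value; it is an assumption for illustration, and `encoding="utf-8"` is only right if the original byte strings really held UTF-8 text.

```python
import pickle

# Hand-written protocol-2 stream mimicking a Python 2 pickle of a dict
# whose key is the byte string 'a' and whose value is three UTF-8 bytes.
PY2_PICKLE = b"\x80\x02}q\x00U\x01aq\x01U\x03\xe7\x88\xb1q\x02s."

# The default encoding is ASCII, which fails on non-ASCII bytes.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# Decode Python 2 byte strings as UTF-8 text...
as_text = pickle.loads(PY2_PICKLE, encoding="utf-8")
# ...or keep them as raw bytes and decide later.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")

print(default_failed, as_text, as_bytes)
```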
  19. CHALLENGE: PICKLE

    Write:
        memcache_data = pickle.dumps(data, 2)

    Python3 writes:
        pickle.dumps({'a': '爱'}, 2)

    Python2 reads:
        {u'a': u'\u7231'} != {'a': '爱'}
  20. CHALLENGE: ITERATOR

        CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
        builds = map(BuildProcess, CYTHON_SOURCES)
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
  21. CHALLENGE: ITERATOR

        CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
        builds = map(BuildProcess, CYTHON_SOURCES)
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
  22. CHALLENGE: ITERATOR

        CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
        builds = map(BuildProcess, CYTHON_SOURCES)
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
  23. CHALLENGE: ITERATOR

        CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
        builds = list(map(BuildProcess, CYTHON_SOURCES))
        while any(not build.done() for build in builds):
            pending = [build for build in builds if not build.started()]
            <do some work>
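
The difference between the broken and fixed versions comes down to `map` returning a one-shot iterator on Python 3. The sketch below makes that visible; the `BuildProcess` class here is a hypothetical stand-in with only the pieces needed for the demonstration.

```python
class BuildProcess:
    """Hypothetical stand-in for the build class named on the slides."""

    def __init__(self, source):
        self.source = source

    def done(self):
        return False


sources = ["a.pyx", "b.pyx", "c.pyx"]

# Lazy map: any() stops at the first truthy value, consuming one item,
# and that item is gone when the iterator is read again.
lazy_builds = map(BuildProcess, sources)
any(not build.done() for build in lazy_builds)
remaining_lazy = [build.source for build in lazy_builds]

# Materialized list: it can be iterated as many times as needed.
list_builds = list(map(BuildProcess, sources))
any(not build.done() for build in list_builds)
remaining_list = [build.source for build in list_builds]

print(remaining_lazy)
print(remaining_list)
```

In the original loop this means builds silently vanish from `pending`, which is why wrapping the `map` call in `list()` restores the Python 2 behavior.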
  24. CHALLENGE: DICTIONARY

        >>> testdict = {'a': 1, 'b': 2, 'c': 3}
        >>> json.dumps(testdict)

    Python2:
        '{"a": 1, "c": 3, "b": 2}'
    Python3.5.1:
        '{"c": 3, "b": 2, "a": 1}'
        '{"c": 3, "a": 1, "b": 2}'
    Python3.6:
        '{"a": 1, "b": 2, "c": 3}'

    Cross version:
        >>> json.dumps(testdict, sort_keys=True)
        '{"a": 1, "b": 2, "c": 3}'
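
A brief sketch of the cross-version remedy: with `sort_keys=True` the serialized form no longer depends on how the dict happens to be ordered. The second dict, `reordered`, is an illustrative addition.

```python
import json

testdict = {'a': 1, 'b': 2, 'c': 3}
reordered = {'c': 3, 'a': 1, 'b': 2}  # same items, different insertion order

# Sorting keys makes the output independent of dict ordering.
stable = json.dumps(testdict, sort_keys=True)
also_stable = json.dumps(reordered, sort_keys=True)

print(stable)
print(stable == also_stable)
```

This matters wherever the serialized string is compared, hashed, or used as a cache key across interpreter versions.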
  25. CPU instructions per request: -12%

    Max requests per second: 0%
    Memory configuration difference? 12%
  26. • Video View Notification

    • Save Draft
    • Comment Filtering
    • Story Viewer Ranking
    • First Story Notification
    • Self-harm Prevention
    • Live
  27. MOTIVATION: TYPE HINTS

    • Type hints - 2% done
        def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    • MyPy and typeshed contribution
    • Tooling - collect data and suggest type hints
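
The slide shows only the function signature. The sketch below completes it with a hypothetical body so the annotation can be run and inspected; the returned string format is an assumption, not Instagram's implementation.

```python
from typing import Optional, get_type_hints


def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    # Hypothetical body: the slide shows only the signature.
    if max_id is None:
        return None
    return "max_id=" + max_id


# The hints are available at runtime for tooling to inspect.
hints = get_type_hints(compose_from_max_id)
print(hints)
```

A checker such as mypy can use the `Optional[str]` annotation to flag callers that forget to handle `None`.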