Keynote: Lisa Guo & Hui Ding - Python @ Instagram

PyCon 2017

June 07, 2017

Transcript

  1. PYTHON@INSTAGRAM Hui Ding & Lisa Guo May 20, 2017

  2. None
  3. @TechCrunch

  4. None
  5. STORIES DIRECT LIVE EXPLORE

  6. “INSTAGRAM, WHAT THE HECK ARE YOU DOING AT PYCON?” CIRCA

    PYCON 2015
  7. “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”

  8. “HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?” “WHY HAVEN’T YOU
    REWRITTEN EVERYTHING IN NODE.JS YET?”
  9. WHY DID INSTAGRAM
 CHOOSE PYTHON?

  10. AND HERE’S WHAT THEY “Friends don’t let friends use RoR”

  11. THINGS INSTAGRAM LOVES Easy to become productive Practicality Easy to
    grow engineering team Popular language
  12. THINGS INSTAGRAM LOVES Maturity of the language and Django framework

    Use Django user model for supporting 3B+ registered users
  13. WE HAVE TAKEN OUR PYTHON
    1 Sharded database support
    2 Run our stack across multiple geographically distributed data centers
    3 Disable garbage collection to improve memory utilization
  14. WE HAVE TAKEN OUR PYTHON

  15. WE HAVE TAKEN OUR PYTHON

  16. PYTHON IS SIMPLE AND CLEAN, AND
    1 Scope the problem, AKA, do simple things first
    2 Use proven technology
    3 User first: focus on adding value to user-facing features
  17. AT INSTAGRAM, OUR BOTTLENECK IS DEVELOPMENT BUT PYTHON IS
    STILL SLOW,
  18. None
  19. SCALING PYTHON TO SUPPORT USER
    [Chart: server growth vs. user growth]
  20. PYTHON EFFICIENCY
    1 Build extensive tools to profile and understand perf bottlenecks
    2 Move stable, critical components to C/C++, e.g., memcached access
    3 Async? New Python runtime?
    4 Cythonization
  21. None
  22. ROAD TO PYTHON 3

  23. 1 Motivation 2 Strategy 3 Challenges 4 Resolution

  24. MOTIVATION

  25. None
  26. MOTIVATION: DEV VELOCITY
    def compose_from_max_id(max_id):
        ''' @param str max_id '''

  27. MOTIVATION: PERFORMANCE uWSGI/web async tier/celery media storage user/media metadata search/ranking

    Python
  28. MOTIVATION: PERFORMANCE N processes M CPU cores N >> M

    Request
  29. MOTIVATION: PERFORMANCE N processes M CPU cores N == M

    Request
  30. MOTIVATION: COMMUNITY

  31. STRATEGIES

  32. SERVICE DOWNTIME

  33. SERVICE DOWNTIME PRODUCT SLOWDOWN

  34. MASTER LIVE DEVELOP/TEST/DOGFOOD

  35. Create separate branch? MIGRATION OPTIONS • Branch sync overhead, error

    prone; • Merging back will be a risk; • Lose the opportunity to educate.
  36. MIGRATION OPTIONS One endpoint at a time? Create separate branch?

    • Common modules across end points • Context switch for developers • Overhead of managing separate pools
  37. MIGRATION OPTIONS One endpoint at a time? Create separate branch?

    Micro services? • Massive code restructuring • Higher latency • Deployment complexity
  38. MIGRATION OPTIONS One endpoint at a time? Create separate branch?

    Micro service?
  39. MAKE MASTER COMPATIBLE

  40. MASTER PYTHON3 PYTHON2

  41. Third-party packages: 3-4 months
    Codemod: 2-3 months
    Unit tests: 2 months
    Production rollout: 4 months
  42. THIRD-PARTY PACKAGES Rule: No Python3, no new package

  43. THIRD-PARTY PACKAGES Rule: No Python3, no new package Delete unused, incompatible packages
    twisted django-paging django-sentry django-templatetag-sugar dnspython enum34 hiredis httplib2 ipaddr jsonfig pyapns phpserialize python-memcached thrift
  44. Upgraded packages Rule: No Python3, no new package Delete unused,

    incompatible packages THIRD-PARTY PACKAGES
  45. CODEMOD modernize -f libmodernize.fixes.fix_filter <dir> -w -n
    raise, metaclass, methodattr, next, funcattr, library renaming, range, maxint/maxsize, filter->list comprehension, long integer, itertools, tuple unpacking in method, cython, urllib parse, StringIO, context mgr/decorator, ipaddr, cmp, except, nested, dict iter fixes, mock
  46. None
  47. Failed include_list: passed tests Passed UNIT TESTS

  48. Failed Passed UNIT TESTS

  49. exclude_list: failed tests Failed Passed UNIT TESTS
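The unit-test slides mention an include_list of passing tests and an exclude_list of failing tests. A minimal sketch of how such lists could drive test selection follows; the helper `select_tests` and the test names are hypothetical, and reading the two lists as an early and a late phase of the migration is an assumption, not something the slides state.

```python
def select_tests(all_tests, include_list=None, exclude_list=None):
    """Pick the tests to run under Python 3 (hypothetical helper).

    With an include_list, only tests known to pass are run.
    Otherwise every test runs except those named in exclude_list.
    """
    if include_list is not None:
        wanted = set(include_list)
        return [t for t in all_tests if t in wanted]
    skipped = set(exclude_list or ())
    return [t for t in all_tests if t not in skipped]


ALL_TESTS = ["test_feed", "test_login", "test_search", "test_upload"]

# Assumed early phase: few tests pass, so list the passing ones explicitly.
early = select_tests(ALL_TESTS, include_list=["test_login"])

# Assumed late phase: most tests pass, so list only the remaining failures.
late = select_tests(ALL_TESTS, exclude_list=["test_search"])
```

An exclude list has the practical advantage that newly written tests run under Python 3 by default.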

  50. None
  51. None
  52. UNIT TESTS: LIMITS
    • Not 100% code coverage
    • Many external services have mocks
    • Data compatibility issues typically do not show up in unit tests
  53. 100% 20% 0.1% EMPLOYEES DEVELOP ROLLOUT

  54. 100% 20% 0.1% EMPLOYEES DEVELOPERS ROLLOUT

  55. CHALLENGES

  56. CHALLENGES
    1 Unicode
    2 Data format incompatible
    3 Iterator
    4 Dictionary ordering
  57. CHALLENGE: UNICODE/STR/
    from __future__ import absolute_import
    from __future__ import print_function
    from __future__ import division
    from __future__ import unicode_literals
  58. CHALLENGE: UNICODE/STR/
    from __future__ import absolute_import
    from __future__ import print_function
    from __future__ import division
    from __future__ import unicode_literals
  59. CHALLENGE: UNICODE/STR/
    Error:
        mymac = hmac.new('abc')
        TypeError: key: expected bytes or bytearray, but got 'str'
    Fix:
        value = 'abc'
        if isinstance(value, six.text_type):
            value = value.encode(encoding='utf-8')
        mymac = hmac.new(value)
  60. CHALLENGE: UNICODE/STR/
    Helper functions: ensure_binary() ensure_str() ensure_text()
    Fix:
        mymac = hmac.new(ensure_binary('abc'))
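The helper names on this slide suggest small conversion functions that normalize a value to bytes or text before it reaches an API such as `hmac.new`. The sketch below is one possible Python 3 implementation, not the code used at Instagram. The function bodies are assumptions, and the `digestmod` argument is added because current Python versions require it.

```python
import hashlib
import hmac


def ensure_binary(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("expected str or bytes, got %r" % type(value))


def ensure_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %r" % type(value))


# The key is converted to bytes at the call site, whatever type it arrived as.
mymac = hmac.new(ensure_binary("abc"), digestmod=hashlib.sha256)
```

Centralizing the conversion in helpers keeps the `isinstance` checks out of every call site.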
  61. CHALLENGE: PICKLE Memcache Me: Python3 Others: Python2
    Write: memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
    Read: data = pickle.loads(memcache_data)
  62. CHALLENGE: PICKLE Memcache Me: Python3 (protocol 4) Others: Python2 (protocol 2)
    Write: memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
    Read: data = pickle.loads(memcache_data)
    ValueError: unsupported pickle protocol: 4
  63. CHALLENGE: PICKLE
    Write: memcache_data = pickle.dumps(data, 2)
    Python2 writes: pickle.dumps({'a': '爱'}, 2)
    Python3 reads: UnicodeDecodeError: 'ascii' codec can’t decode byte 0xe9 in position 0: ordinal not in range(128)
  64. CHALLENGE: PICKLE
    Write: memcache_data = pickle.dumps(data, 2)
    Python3 writes: pickle.dumps({'a': '爱'}, 2)
    Python2 reads: {u'a': u'\u7231'} != {'a': '爱'}
  65. CHALLENGE: PICKLE Memcache Python3 Python2 4 Memcache 2
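This slide appears to show Python 2 and Python 3 processes no longer sharing pickled cache entries. One way to get that separation, sketched here under stated assumptions, is to prefix every cache key with the interpreter's major version. The plain dict stands in for a memcache client, and `versioned_key`, `cache_set` and `cache_get` are hypothetical names. The slide does not say which mechanism was actually used.

```python
import pickle
import sys

PY_MAJOR = sys.version_info[0]


def versioned_key(key):
    """Namespace a cache key by the interpreter's major version."""
    return "py%d:%s" % (PY_MAJOR, key)


def cache_set(cache, key, data):
    """Pickle data with the highest local protocol under a versioned key."""
    cache[versioned_key(key)] = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)


def cache_get(cache, key):
    """Read back only entries written by the same major Python version."""
    raw = cache.get(versioned_key(key))
    return None if raw is None else pickle.loads(raw)


cache = {}  # stand-in for a real memcache client
cache_set(cache, "user:1", {"a": "爱"})
value = cache_get(cache, "user:1")
```

Because readers never see entries written by the other interpreter, each side can use its own highest pickle protocol, at the cost of a colder cache during the rollout.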

  66. None
  67. map() filter() dict.items()

  68. CHALLENGE: ITERATOR
    CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
    builds = map(BuildProcess, CYTHON_SOURCES)
    while any(not build.done() for build in builds):
        pending = [build for build in builds if not build.started()]
        <do some work>
  71. CHALLENGE: ITERATOR
    CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
    builds = list(map(BuildProcess, CYTHON_SOURCES))
    while any(not build.done() for build in builds):
        pending = [build for build in builds if not build.started()]
        <do some work>
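The iterator slides turn on one Python 3 behavior: `map()` returns a one-shot iterator, so a second pass over `builds` sees nothing. The sketch below reproduces the effect and the `list(...)` fix. `BuildProcess` here is a hypothetical stand-in with just enough behavior to run the loop.

```python
class BuildProcess:
    """Hypothetical stand-in for the build wrapper named on the slide."""

    def __init__(self, source):
        self.source = source
        self._started = False

    def started(self):
        return self._started

    def start(self):
        self._started = True

    def done(self):
        # In this toy model a build finishes as soon as it has started.
        return self._started


SOURCES = ["a.pyx", "b.pyx", "c.pyx"]

# Python 3: map() is lazy and can be consumed only once.
lazy_builds = map(BuildProcess, SOURCES)
first_pass = [b.source for b in lazy_builds]
second_pass = [b.source for b in lazy_builds]  # iterator already exhausted

# Fix: materialize the iterator so it can be traversed repeatedly.
builds = list(map(BuildProcess, SOURCES))
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]
    for build in pending:
        build.start()
```

In the unfixed version, `any(...)` consumes part of the iterator and the list comprehension sees only what is left, so some builds are never started.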
  72. CHALLENGE: DICTIONARY
    >>> testdict = {'a': 1, 'b': 2, 'c': 3}
    >>> json.dumps(testdict)
    Python2: '{"a": 1, "c": 3, "b": 2}'
    Python3.5.1: '{"c": 3, "b": 2, "a": 1}' or '{"c": 3, "a": 1, "b": 2}'
    Python3.6: '{"a": 1, "b": 2, "c": 3}'
    Cross version:
    >>> json.dumps(testdict, sort_keys=True)
    '{"a": 1, "b": 2, "c": 3}'
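The cross-version fix on this slide is `sort_keys=True`, which makes the serialized form independent of dictionary iteration order. A small sketch follows; the wrapper name `stable_dumps` is hypothetical.

```python
import json


def stable_dumps(obj):
    """Serialize to JSON with keys sorted, so output does not depend on dict order."""
    return json.dumps(obj, sort_keys=True)


# Two dicts with the same content but different insertion order.
first = {"a": 1, "b": 2, "c": 3}
second = {"c": 3, "a": 1, "b": 2}

same_output = stable_dumps(first) == stable_dumps(second)
```

This matters when the JSON string itself is compared, hashed, or used as a cache key by processes running different Python versions.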
  73. ALMOST THERE...

  74. CPU instructions per request max Requests Per Second -12% 0%

    Memory configuration difference? 12%
  75. if uwsgi.opt.get('mem_config', None) == b'True': config_mem()
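The check on this slide can fail silently under Python 3 if the option value arrives as bytes, because bytes and str never compare equal. The sketch below shows the effect with a plain dict standing in for `uwsgi.opt`. That the real option values are bytes under Python 3 is an assumption made here for illustration.

```python
# Stand-in for uwsgi.opt; assumed to hold bytes values under Python 3.
opt = {"mem_config": b"True"}

# Comparing bytes with str is simply False in Python 3, with no error raised
# (unless the interpreter is started with the -bb flag).
broken_check = opt.get("mem_config", None) == "True"

# Comparing against a bytes literal restores the intended behavior.
fixed_check = opt.get("mem_config", None) == b"True"
```

Because the broken comparison raises nothing by default, the configuration branch is skipped without any visible failure.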

  76. RESOLUTION

  77. FEB 2017 PYTHON3 PYTHON2

  78. INSTAGRAM ON PYTHON3
    CPU: saving of 12% (on uwsgi/django)
    MEMORY: saving of 30% (on celery)
  79. Python3 migration
    [Timeline chart; dates: Sept 2015, June 2016, Dec 2016, Apr 2017; user counts: 400M, 500M, 600M]
  80. Video View Notification Save Draft Comment Filtering Story Viewer Ranking

    First Story Notification Self-harm Prevention Live
  81. MOTIVATION: TYPE HINTS
    Type hints - 2% done
        def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    MyPy and typeshed contribution
    Tooling - collect data and suggest type hints
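The annotated signature on this slide replaces the earlier docstring-only `@param str max_id`. A runnable sketch follows. Only the signature comes from the slide, and the function body is a hypothetical placeholder.

```python
from typing import Optional


def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    """Signature from the slide; the body below is illustrative only."""
    if max_id is None:
        return None
    return "max_id=" + max_id


with_id = compose_from_max_id("123")
without_id = compose_from_max_id(None)
```

With the annotation in place, a checker such as mypy can flag a call like `compose_from_max_id(123)` before the code runs.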
  82. MOTIVATION: ASYNC IO Asynchronize web framework Parallel network access within

    a request
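"Parallel network access within a request" can be illustrated with `asyncio.gather`, which runs several awaitables concurrently. This is a minimal sketch, not Instagram's framework: `fetch` simulates a network call with `asyncio.sleep`, and the backend names and delays are made up.

```python
import asyncio


async def fetch(name, delay):
    """Simulate a network call that takes `delay` seconds (hypothetical)."""
    await asyncio.sleep(delay)
    return name


async def handle_request():
    """Issue three backend calls concurrently within a single request."""
    return await asyncio.gather(
        fetch("user", 0.05),
        fetch("media", 0.05),
        fetch("ranking", 0.05),
    )


# Results come back in the order the calls were passed to gather().
results = asyncio.run(handle_request())
```

Run sequentially, the three calls would take about the sum of their delays. Run with `gather`, the request takes roughly as long as the slowest call.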
  83. MOTIVATION: COMMUNITY Benchmark web workload Run time, memory profiling, etc

  84. None
  85. None