Slide 1

Slide 1 text

PYTHON@INSTAGRAM Hui Ding & Lisa Guo May 20, 2017

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

@TechCrunch

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

STORIES DIRECT LIVE EXPLORE

Slide 6

Slide 6 text

“INSTAGRAM, WHAT THE HECK ARE YOU DOING AT PYCON?” CIRCA PYCON 2015

Slide 7

Slide 7 text

“HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”

Slide 8

Slide 8 text

“HOW DO YOU RUN DJANGO+PYTHON AT THAT SCALE?”
“WHY HAVEN’T YOU REWRITTEN EVERYTHING IN NODE.JS YET?”

Slide 9

Slide 9 text

WHY DID INSTAGRAM CHOOSE PYTHON?

Slide 10

Slide 10 text

AND HERE’S WHAT THEY “Friends don’t let friends use RoR”

Slide 11

Slide 11 text

THINGS INSTAGRAM LOVES
• Easy to become productive
• Practicality
• Easy to grow engineering team
• Popular language

Slide 12

Slide 12 text

THINGS INSTAGRAM LOVES
• Maturity of the language and Django framework
• Use Django user model for supporting 3B+ registered users

Slide 13

Slide 13 text

WE HAVE TAKEN OUR PYTHON
1 Sharded database support
2 Run our stack across multiple geographically distributed data centers
3 Disable garbage collection to improve memory utilization
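The garbage-collection item can be illustrated with the standard gc module. This is a minimal sketch, not Instagram's actual configuration: the slide does not say how or where the collector was disabled, and the helper name collect_now is hypothetical.

```python
import gc

# Turn off the cyclic garbage collector. Reference counting still frees
# most objects as soon as their last reference goes away.
gc.disable()
gc_enabled_after_disable = gc.isenabled()


def collect_now() -> int:
    """Hypothetical helper: run one explicit collection at a chosen point.

    Returns the number of unreachable objects found.
    """
    return gc.collect()


unreachable = collect_now()

# Restore the default so the rest of the program is unaffected.
gc.enable()
```

With the collector off, cyclic garbage is only reclaimed when collect_now() is called, so this trade-off suits processes that create few reference cycles.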

Slide 14

Slide 14 text

WE HAVE TAKEN OUR PYTHON

Slide 15

Slide 15 text

WE HAVE TAKEN OUR PYTHON

Slide 16

Slide 16 text

PYTHON IS SIMPLE AND CLEAN, AND
1 Scope the problem, AKA, do simple things first
2 Use proven technology
3 User first: focus on adding value to user-facing features

Slide 17

Slide 17 text

BUT PYTHON IS STILL SLOW,
AT INSTAGRAM, OUR BOTTLENECK IS DEVELOPMENT

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

SCALING PYTHON TO SUPPORT USER
[Chart: server growth vs. user growth]

Slide 20

Slide 20 text

PYTHON EFFICIENCY
1 Build extensive tools to profile and understand performance bottlenecks
2 Move stable, critical components to C/C++, e.g., memcached access
3 Async? New Python runtime?
4 Cythonization
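As a rough illustration of the first item, the standard library's cProfile and pstats modules can show where time goes in a piece of code. This is only a sketch: the slides do not describe Instagram's profiling tools, and handle_request is a hypothetical stand-in for real request handling.

```python
import cProfile
import io
import pstats


def handle_request(n: int) -> int:
    # Hypothetical stand-in for the work done while serving one request.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = handle_request(10000)
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time,
# into a string instead of printing to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
```

The report string lists call counts and timings per function, which is the kind of data that helps decide which components are worth moving to C/C++ or Cython.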

Slide 21

Slide 21 text

No content

Slide 22

Slide 22 text

ROAD TO PYTHON 3

Slide 23

Slide 23 text

1 Motivation
2 Strategy
3 Challenges
4 Resolution

Slide 24

Slide 24 text

MOTIVATION

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

MOTIVATION: DEV VELOCITY
def compose_from_max_id(max_id):
    '''
    @param str max_id
    '''

Slide 27

Slide 27 text

MOTIVATION: PERFORMANCE
[Diagram labels: uWSGI/web, async tier/celery, media storage, user/media metadata, search/ranking, Python]

Slide 28

Slide 28 text

MOTIVATION: PERFORMANCE N processes M CPU cores N >> M Request

Slide 29

Slide 29 text

MOTIVATION: PERFORMANCE N processes M CPU cores N == M Request

Slide 30

Slide 30 text

MOTIVATION: COMMUNITY

Slide 31

Slide 31 text

STRATEGIES

Slide 32

Slide 32 text

SERVICE DOWNTIME

Slide 33

Slide 33 text

SERVICE DOWNTIME PRODUCT SLOWDOWN

Slide 34

Slide 34 text

MASTER LIVE DEVELOP/TEST/DOGFOOD

Slide 35

Slide 35 text

MIGRATION OPTIONS
Create separate branch?
• Branch sync overhead, error-prone
• Merging back will be a risk
• Lose the opportunity to educate

Slide 36

Slide 36 text

MIGRATION OPTIONS
Create separate branch?
One endpoint at a time?
• Common modules across endpoints
• Context switch for developers
• Overhead of managing separate pools

Slide 37

Slide 37 text

MIGRATION OPTIONS
Create separate branch?
One endpoint at a time?
Micro services?
• Massive code restructuring
• Higher latency
• Deployment complexity

Slide 38

Slide 38 text

MIGRATION OPTIONS
Create separate branch?
One endpoint at a time?
Micro services?

Slide 39

Slide 39 text

MAKE MASTER COMPATIBLE

Slide 40

Slide 40 text

MASTER PYTHON3 PYTHON2

Slide 41

Slide 41 text

Third-party packages: 3-4 months
Codemod: 2-3 months
Unit tests: 2 months
Production rollout: 4 months

Slide 42

Slide 42 text

THIRD-PARTY PACKAGES Rule: No Python3, no new package

Slide 43

Slide 43 text

THIRD-PARTY PACKAGES
Rule: No Python3, no new package
Delete unused, incompatible packages
twisted, django-paging, django-sentry, django-templatetag-sugar, dnspython, enum34, hiredis, httplib2, ipaddr, jsonfig, pyapns, phpserialize, python-memcached, thrift

Slide 44

Slide 44 text

THIRD-PARTY PACKAGES
Rule: No Python3, no new package
Delete unused, incompatible packages
Upgraded packages

Slide 45

Slide 45 text

CODEMOD
modernize -f libmodernize.fixes.fix_filter -w -n
raise, metaclass, methodattr, next, funcattr, library renaming, range, maxint/maxsize, filter->list comprehension, long integer, itertools, tuple unpacking in method, cython, urllib parse, StringIO, context mgr/decorator, ipaddr, cmp, except, nested, dict iter fixes, mock

Slide 46

Slide 46 text

No content

Slide 47

Slide 47 text

UNIT TESTS
include_list: passed tests
[Diagram: Failed, Passed]

Slide 48

Slide 48 text

UNIT TESTS
[Diagram: Failed, Passed]

Slide 49

Slide 49 text

UNIT TESTS
exclude_list: failed tests
[Diagram: Failed, Passed]
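One reading of the three UNIT TESTS slides is that the Python 3 test run first used an include_list of tests known to pass, then switched to an exclude_list of tests still failing once most tests passed. The selection logic might be sketched as follows; select_tests and the test names are hypothetical, not taken from the talk.

```python
def select_tests(all_tests, include_list=None, exclude_list=None):
    """Return the test names to run under Python 3.

    With an include_list, only the listed tests run (early phase).
    Otherwise every test runs except those in exclude_list (late phase).
    """
    if include_list is not None:
        wanted = set(include_list)
        return [name for name in all_tests if name in wanted]
    skipped = set(exclude_list or ())
    return [name for name in all_tests if name not in skipped]


# Hypothetical test names, for illustration only.
all_tests = ["test_feed", "test_login", "test_search", "test_upload"]

early_phase = select_tests(all_tests, include_list=["test_login"])
late_phase = select_tests(all_tests, exclude_list=["test_search"])
```

The switch matters because an exclude_list makes newly written tests run under Python 3 by default, while an include_list leaves them out until someone adds them.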

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

UNIT TESTS: LIMITS
• Not 100% code coverage
• Many external services have mocks
• Data compatibility issues typically do not show up in unit tests

Slide 53

Slide 53 text

ROLLOUT
DEVELOPERS, EMPLOYEES, 0.1%, 20%, 100%

Slide 54

Slide 54 text

ROLLOUT
DEVELOPERS, EMPLOYEES, 0.1%, 20%, 100%

Slide 55

Slide 55 text

CHALLENGES

Slide 56

Slide 56 text

CHALLENGES
1 Unicode
2 Data format incompatible
3 Iterator
4 Dictionary ordering

Slide 57

Slide 57 text

CHALLENGE: UNICODE/STR/
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals

Slide 58

Slide 58 text

CHALLENGE: UNICODE/STR/
from __future__ import absolute_import
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals

Slide 59

Slide 59 text

CHALLENGE: UNICODE/STR/
Error
mymac = hmac.new('abc')
TypeError: key: expected bytes or bytearray, but got 'str'
Fix
value = 'abc'
if isinstance(value, six.text_type):
    value = value.encode(encoding='utf-8')
mymac = hmac.new(value)

Slide 60

Slide 60 text

CHALLENGE: UNICODE/STR/
Helper functions
ensure_binary()
ensure_str()
ensure_text()
Fix
mymac = hmac.new(ensure_binary('abc'))
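The helper names come from the slide, but their bodies are not shown, so the following is a minimal Python 3 sketch of what such helpers could look like. The implementations are assumptions. The explicit digestmod argument is also an addition: current Python versions require it in hmac.new, whereas the slide's call omits it.

```python
import hashlib
import hmac


def ensure_binary(value, encoding="utf-8"):
    """Return value as bytes, encoding text if needed (assumed behavior)."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    raise TypeError("not expecting type %r" % type(value))


def ensure_text(value, encoding="utf-8"):
    """Return value as text, decoding bytes if needed (assumed behavior)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError("not expecting type %r" % type(value))


# On Python 3 the native str type is text, so this sketch treats the
# two helpers as the same function.
ensure_str = ensure_text

# The key may arrive as text or bytes; both produce the same MAC.
mymac = hmac.new(ensure_binary("abc"), digestmod=hashlib.sha256)
same_mac = hmac.new(ensure_binary(b"abc"), digestmod=hashlib.sha256)
```

Centralizing the conversion in one helper avoids repeating the isinstance check from the previous slide at every call site.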

Slide 61

Slide 61 text

CHALLENGE: PICKLE
[Diagram: Me (Python3) and Others (Python2) share one Memcache]
Write
memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
Read
data = pickle.loads(memcache_data)

Slide 62

Slide 62 text

CHALLENGE: PICKLE
[Diagram: Me (Python3, highest protocol 4) and Others (Python2, highest protocol 2) share one Memcache]
Write
memcache_data = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
Read
data = pickle.loads(memcache_data)
ValueError: unsupported pickle protocol: 4

Slide 63

Slide 63 text

CHALLENGE: PICKLE
Write
memcache_data = pickle.dumps(data, 2)
Python2 writes
pickle.dumps({'a': '爱'}, 2)
Python3 reads
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 0: ordinal not in range(128)

Slide 64

Slide 64 text

CHALLENGE: PICKLE
Write
memcache_data = pickle.dumps(data, 2)
Python3 writes
pickle.dumps({'a': '爱'}, 2)
Python2 reads
{u'a': u'\u7231'} != {'a': '爱'}

Slide 65

Slide 65 text

CHALLENGE: PICKLE
Python3: Memcache (pickle protocol 4)
Python2: Memcache (pickle protocol 2)
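The last PICKLE slide appears to give each Python version its own cache, so that neither reads pickles written by the other. A minimal sketch of that idea follows, assuming the separation is done with a per-version key prefix. The dictionary fake_memcache, the prefix scheme, and the function names are all hypothetical, since the slides do not show the mechanism.

```python
import pickle
import sys

# Highest pickle protocol that Python 2 can read.
PY2_COMPATIBLE_PROTOCOL = 2

# Hypothetical in-memory stand-in for a memcache client.
fake_memcache = {}


def namespaced_key(key):
    # Prefix keys with the major Python version so that Python 2 and
    # Python 3 processes never read each other's pickles.
    return "py%d:%s" % (sys.version_info[0], key)


def cache_set(key, data, protocol=pickle.HIGHEST_PROTOCOL):
    fake_memcache[namespaced_key(key)] = pickle.dumps(data, protocol)


def cache_get(key):
    raw = fake_memcache.get(namespaced_key(key))
    # pickle.loads takes bytes; pickle.load would expect a file object.
    return None if raw is None else pickle.loads(raw)


cache_set("user:1", {"a": "\u7231"})
round_trip = cache_get("user:1")

# Pinning the protocol keeps the byte format readable by Python 2, but as
# the slides show, str/unicode differences can still break cross-version reads.
proto2_blob = pickle.dumps({"a": 1}, PY2_COMPATIBLE_PROTOCOL)
```

Separate namespaces cost extra cache memory during the migration, in exchange for never having to reconcile the two versions' string types inside pickled data.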

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

map() filter() dict.items()

Slide 68

Slide 68 text

CHALLENGE: ITERATOR
CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
builds = map(BuildProcess, CYTHON_SOURCES)
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]

Slide 69

Slide 69 text

CHALLENGE: ITERATOR
CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
builds = map(BuildProcess, CYTHON_SOURCES)
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]

Slide 70

Slide 70 text

CHALLENGE: ITERATOR
CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
builds = map(BuildProcess, CYTHON_SOURCES)
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]

Slide 71

Slide 71 text

CHALLENGE: ITERATOR
CYTHON_SOURCES = ['a.pyx', 'b.pyx', 'c.pyx']
builds = list(map(BuildProcess, CYTHON_SOURCES))
while any(not build.done() for build in builds):
    pending = [build for build in builds if not build.started()]
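The reason list() fixes the loop is that map() returns a one-shot iterator in Python 3, so a second pass over it sees nothing. The sketch below reproduces that behavior. BuildProcess here is a hypothetical stand-in with only the methods needed for the demonstration.

```python
class BuildProcess:
    """Hypothetical stand-in for the build object on the slides."""

    def __init__(self, source):
        self.source = source
        self._started = False

    def started(self):
        return self._started


CYTHON_SOURCES = ["a.pyx", "b.pyx", "c.pyx"]

# Python 3: map() is lazy and can be consumed only once.
lazy_builds = map(BuildProcess, CYTHON_SOURCES)
first_pass = [b for b in lazy_builds if not b.started()]
second_pass = [b for b in lazy_builds if not b.started()]  # already exhausted

# Materializing the iterator restores the Python 2 list behavior.
builds = list(map(BuildProcess, CYTHON_SOURCES))
pending_once = [b for b in builds if not b.started()]
pending_twice = [b for b in builds if not b.started()]
```

In the slide's loop, the any() call in the while condition consumes part of the iterator before the list comprehension runs, which is why the unfixed version misbehaves under Python 3.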

Slide 72

Slide 72 text

CHALLENGE: DICTIONARY
>>> testdict = {'a': 1, 'b': 2, 'c': 3}
>>> json.dumps(testdict)
Python2: '{"a": 1, "c": 3, "b": 2}'
Python3.5.1: '{"c": 3, "b": 2, "a": 1}' or '{"c": 3, "a": 1, "b": 2}'
Python3.6: '{"a": 1, "b": 2, "c": 3}'
Cross version
>>> json.dumps(testdict, sort_keys=True)
'{"a": 1, "b": 2, "c": 3}'
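The cross-version fix can be checked directly: with sort_keys=True the serialized form no longer depends on how the dictionary happens to be ordered. A minimal sketch, with the second dictionary added only to show the effect:

```python
import json

testdict = {"c": 3, "a": 1, "b": 2}
same_items_other_order = {"b": 2, "c": 3, "a": 1}

# sort_keys=True orders keys alphabetically before serializing, so the
# output is identical regardless of insertion or hash order.
stable = json.dumps(testdict, sort_keys=True)
also_stable = json.dumps(same_items_other_order, sort_keys=True)
```

This matters wherever the JSON string itself is compared, hashed, or used as a cache key, since two processes on different Python versions must produce the same bytes.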

Slide 73

Slide 73 text

ALMOST THERE...

Slide 74

Slide 74 text

CPU instructions per request, max Requests Per Second
[Chart values: -12%, 0%, 12%]
Memory configuration difference?

Slide 75

Slide 75 text

if uwsgi.opt.get('mem_config', None) == b'True':
    config_mem()
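The stray "b" on this slide reads as the fix: a bytes prefix on the compared literal. The sketch below shows why it is needed, assuming the option value arrives as bytes under Python 3. The dictionary fake_opt is a hypothetical stand-in for uwsgi.opt, which is only available inside a uWSGI process.

```python
# Hypothetical stand-in for uwsgi.opt, assuming values arrive as bytes.
fake_opt = {"mem_config": b"True"}

# In Python 3, bytes and str never compare equal, so this check fails
# silently and the configuration code is skipped.
matches_text = fake_opt.get("mem_config", None) == "True"

# Comparing against a bytes literal matches as intended.
matches_bytes = fake_opt.get("mem_config", None) == b"True"
```

Because the failed comparison raises no error by default, this kind of bug shows up only as a behavior difference, such as the performance gap on the previous slide.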

Slide 76

Slide 76 text

RESOLUTION

Slide 77

Slide 77 text

FEB 2017 PYTHON3 PYTHON2

Slide 78

Slide 78 text

INSTAGRAM ON PYTHON3
CPU: Saving of 12% (on uwsgi/django)
MEMORY: Saving of 30% (on celery)

Slide 79

Slide 79 text

Python3 migration
[Chart: timeline Sept 2015, June 2016, Dec 2016, Apr 2017; values 400M, 500M, 600M]

Slide 80

Slide 80 text

Video View Notification Save Draft Comment Filtering Story Viewer Ranking First Story Notification Self-harm Prevention Live

Slide 81

Slide 81 text

MOTIVATION: TYPE HINTS
Type hints - 2% done
def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
MyPy and typeshed contribution
Tooling - collect data and suggest type hints
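The annotated signature replaces the docstring convention from the earlier DEV VELOCITY slide. A runnable sketch follows. Only the signature comes from the slide, and the function body is a hypothetical placeholder.

```python
from typing import Optional


def compose_from_max_id(max_id: Optional[str]) -> Optional[str]:
    # Hypothetical body: the slides show only the signature.
    if max_id is None:
        return None
    return "max_id=" + max_id


with_id = compose_from_max_id("42")
without_id = compose_from_max_id(None)
```

Unlike the docstring form, a checker such as mypy can read these annotations and flag a caller that passes, say, an int.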

Slide 82

Slide 82 text

MOTIVATION: ASYNC IO
• Asynchronous web framework
• Parallel network access within a request
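Parallel network access within a request can be sketched with asyncio.gather, which waits on several coroutines concurrently. This is an illustration only: the backend names are hypothetical and asyncio.sleep stands in for real network calls.

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for a network round trip to a backend.
    await asyncio.sleep(delay)
    return name + ":ok"


async def handle_request() -> list:
    # The three calls wait concurrently, so total latency is close to the
    # slowest call rather than the sum of all three.
    return await asyncio.gather(
        fetch("feed", 0.01),
        fetch("stories", 0.01),
        fetch("suggestions", 0.01),
    )


results = asyncio.run(handle_request())
```

gather returns results in the order the coroutines were passed, regardless of which one finishes first.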

Slide 83

Slide 83 text

MOTIVATION: COMMUNITY
• Benchmark web workload
• Run time, memory profiling, etc.

Slide 84

Slide 84 text

No content

Slide 85

Slide 85 text

No content