Slide 1

Slide 1 text

Cherry-Picking For Huge Success Armin Ronacher

Slide 2

Slide 2 text

Who Am I • Armin Ronacher • @mitsuhiko on Twitter/Github • Part of the Pocoo Team • Flask, Jinja2, Werkzeug, …

Slide 3

Slide 3 text

Preface Framework / Programming language fights are boring. Just use the best tool for the job.

Slide 4

Slide 4 text

The Problem

Slide 5

Slide 5 text

Consider Twitter • 2006: Off-the-shelf Ruby on Rails application; static HTML; basic XML API • Now: The API is the service, and the website itself is a JavaScript frontend to that API; everything is rate limited; Erlang/Java

Slide 6

Slide 6 text

Does Ruby Suck? • No it does not. • Neither does Python. • Ruby / Python are amazing for quick prototyping. • Expect your applications to change with the challenges of the problem.

Slide 7

Slide 7 text

Shifting Focus • Expect your problems and implementations to change over time • You might want to rewrite a part of your application in another language

Slide 8

Slide 8 text

Proposed Solution • Build smaller applications • Combine these apps into a larger one

Slide 9

Slide 9 text

Cross Boundaries • “Pygments is awesome” • “I need Pygments in Ruby” • A: rewrite Pygments in Ruby • B: use a different syntax highlighter • C: Just accept Python and implement a service you can use from Ruby

Slide 10

Slide 10 text

Agnostic Code

Slide 11

Slide 11 text

It only does Django • You wrote a useful library that creates thumbnails? • Don't make it depend on Django, even if you never want to switch from Django • You might want to move the thumbnailing into a queue at one point and not need Django and your DB in your queue

Slide 12

Slide 12 text

Pass “X” in :-) • Do not import “X” • Store “X” on a class instance • or pass “X” in as parameter • Make “X” as specific as possible • But not more than it has to be
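A minimal sketch of the "pass X in" idea, using the thumbnail example from the previous slide. The names `Thumbnailer`, `DictStorage` and `fake_resize` are hypothetical and not from the talk, and the resize step is a stand-in so the example runs without an imaging library.

```python
class DictStorage:
    """Hypothetical in-memory storage; anything with save() would do."""

    def __init__(self):
        self.files = {}

    def save(self, name, data):
        self.files[name] = data


class Thumbnailer:
    """Creates thumbnails without importing a framework or a database."""

    def __init__(self, storage, resize):
        # "X" is stored on the instance instead of being imported.
        self.storage = storage
        self.resize = resize

    def make_thumbnail(self, name, data, size):
        thumb_name = "%s.%dx%d" % (name, size[0], size[1])
        self.storage.save(thumb_name, self.resize(data, size))
        return thumb_name


def fake_resize(data, size):
    # Stand-in for a real image resize: just truncate the bytes.
    return data[: size[0] * size[1]]


storage = DictStorage()
thumbnailer = Thumbnailer(storage, fake_resize)
thumb = thumbnailer.make_thumbnail("cat.png", b"0123456789", (2, 2))
```

Because the storage is passed in, the same class could run inside a web view or inside a queue worker that has no Django and no database.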

Slide 13

Slide 13 text

Protocol Examples

Slide 14

Slide 14 text

Flask's Views • Views can return response objects • Response objects are WSGI applications • No typecheck! • Return any WSGI app

Slide 15

Slide 15 text

Flask + WSGI

    from flask import Flask
    app = Flask(__name__)

    def hello_world_app(environ, start_response):
        headers = [('Content-Type', 'text/plain')]
        start_response('200 OK', headers)
        return ['Hello World!']

    @app.route('/')
    def index():
        return hello_world_app

Slide 16

Slide 16 text

Difflib • Python's difflib module does not need strings, it works on anything that is iterable and contains hashable and comparable items. • “X” == hashable and comparable • As specific as possible, but not too restrictive. Bad would be “X” == String
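A short sketch of that point: `SequenceMatcher` is given lists of tuples rather than strings. The event tuples here are made up for illustration and only loosely resemble what an XML parser would emit.

```python
from difflib import SequenceMatcher

# Items only need to be hashable and comparable, so tuples work fine.
a = [("START", "p"), ("TEXT", "Foo"), ("END", "p")]
b = [("START", "p"), ("TEXT", "Bar"), ("END", "p")]

matcher = SequenceMatcher(a=a, b=b)

# Keep only the opcodes that describe a difference.
changes = [
    (op, a[i1:i2], b[j1:j2])
    for op, i1, i2, j1, j2 in matcher.get_opcodes()
    if op != "equal"
]
```

Here `changes` holds a single `replace` operation that swaps the `("TEXT", "Foo")` item for `("TEXT", "Bar")`.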

Slide 17

Slide 17 text

Consequences • This came in very helpful when I had to diff HTML documents • Parse into a stream of XML events, then diff • Render out inline HTML again with the differences wrapped in <ins>/<del>

Slide 18

Slide 18 text

Beauty in Design • Genshi's XML stream events are hashable, immutable objects • The Stream is a Python iterable • difflib can work with exactly that: hashable objects in a sequence • They go well together, but were never designed to be used together

Slide 19

Slide 19 text

Genshi's Stream

    >>> from genshi.template import MarkupTemplate
    >>> t = MarkupTemplate(''
    ... '')
    >>> g = iter(t.generate())
    >>> g.next()
    ('XML_DECL', (u'1.0', None, -1), (None, 1, 0))
    >>> g.next()
    ('START', (QName('test'), Attrs()), (None, 1, 21))
    >>> g.next()
    ('START', (QName('foo'), Attrs([(QName('bar'), u'baz')])), (None, 1, 27))
    ...

Slide 20

Slide 20 text

Diffing XML

    from genshi.template import MarkupTemplate
    from difflib import SequenceMatcher

    get_stream = lambda x: list(MarkupTemplate(x).generate())
    a = get_stream('')
    b = get_stream('')

    matcher = SequenceMatcher(a=a, b=b)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == 'replace':
            print 'del', a[i1:i2]
            print 'ins', b[j1:j2]
        elif op == 'delete':
            print 'del', a[i1:i2]
        elif op == 'insert':
            print 'ins', b[j1:j2]

Slide 21

Slide 21 text

Diff Result

    del [('START', (QName('a'), Attrs()), (None, 1, 26)),
         ('END', QName('a'), (None, 1, 30))]
    ins [('START', (QName('b'), Attrs()), (None, 1, 26)),
         ('END', QName('b'), (None, 1, 30))]

Slide 22

Slide 22 text

Inline Diffing HTML • mitsuhiko/htmldiff

    >>> from htmldiff import render_html_diff
    >>> render_html_diff('Foo bar baz', 'Foo bar baz')
    u'Foo bar baz'
    >>> render_html_diff('Foo bar baz', 'Foo baz')
    u'Foo bar baz'
    >>> render_html_diff('Foo baz', 'Foo blah baz')
    u'Foo blah baz'

Slide 23

Slide 23 text

Interface Examples

Slide 24

Slide 24 text

Serializers • pickle, phpserialize, itsdangerous, json • Within the compatible set of types, they all work as drop-in replacements for each other

Slide 25

Slide 25 text

Example

    >>> from itsdangerous import URLSafeSerializer
    >>> smod = URLSafeSerializer('secret-key')
    >>> smod.dumps([1, 2, 3])
    'WzEsMiwzXQ.ss4nn3igDDAwxiqsWvj3EQ9FdIQ'
    >>> smod.loads(_)
    [1, 2, 3]
    >>>
    >>> import pickle as smod
    >>> smod.dumps([1, 2, 3])
    '(lp0\nI1\naI2\naI3\na.'
    >>> smod.loads(_)
    [1, 2, 3]
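The same interface can be relied on by your own code. The sketch below is written for Python 3 and uses a hypothetical `Cache` class (not from the talk) that accepts anything offering `dumps()` and `loads()`; `json` and `pickle` from the standard library are swapped in without changing the class.

```python
import json
import pickle


class Cache:
    """Hypothetical cache that accepts any object with dumps() and loads()."""

    def __init__(self, serializer):
        self.serializer = serializer
        self.data = {}

    def set(self, key, value):
        self.data[key] = self.serializer.dumps(value)

    def get(self, key):
        return self.serializer.loads(self.data[key])


# Both modules satisfy the same two-function interface.
results = []
for serializer in (json, pickle):
    cache = Cache(serializer)
    cache.set("numbers", [1, 2, 3])
    results.append(cache.get("numbers"))
```

An itsdangerous serializer instance should fit the same slot, since it exposes `dumps()` and `loads()` as shown above.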

Slide 26

Slide 26 text

“What's your Point Armin?”

Slide 27

Slide 27 text

Loosely Coupled • Small, independent pieces of code (both “libraries” and “apps”) • Combine them with protocols and through interfaces • This is how you can structure applications

Slide 28

Slide 28 text

Splitting up … • … is not the problem • Combining things together is

Slide 29

Slide 29 text

Mergepoints • WSGI • HTTP • ZeroMQ • Message queues • A datastore • JavaScript

Slide 30

Slide 30 text

WSGI

Slide 31

Slide 31 text

Overview • Pros: • Every Python framework speaks it or can easily be ported to work on top of WSGI or to be able to host WSGI apps • Cons: • Only works within Python • Often insufficient

Slide 32

Slide 32 text

The WSGI Env • Apps that need request data can limit themselves to the data in the WSGI env • That way they are 100% framework independent. • Good: env['PATH_INFO'] • Bad: request.path_info
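A minimal sketch of an app that only touches the WSGI environ, written for Python 3 with the standard library. The names `path_echo_app` and `call_app` are made up for this example; `call_app` is just a small helper to invoke the app without a server.

```python
from wsgiref.util import setup_testing_defaults


def path_echo_app(environ, start_response):
    # Reads the path straight from the WSGI environ, no framework request object.
    path = environ.get("PATH_INFO", "/")
    body = ("You requested %s" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path):
    """Tiny helper: calls a WSGI app and returns (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(path_echo_app, "/hello")
```

Since the app never imports a framework, it can be mounted under any WSGI server or returned from any framework that accepts WSGI apps.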

Slide 33

Slide 33 text

Middlewares • Often overused • Sometimes helpful though: • Debugging • Profiling • Dispatching to different applications • Fixing server / browser bugs

Slide 34

Slide 34 text

WSGI as Mergepoint

    from myflaskapp import application as app1
    from mybottleapp import application as app2
    from mydjangoapp import application as app3

    app = DispatchedApplication({
        '/':      app1,
        '/api':   app2,
        '/admin': app3
    })
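The slide does not define `DispatchedApplication`. One way it could work, as a sketch under the assumption that it dispatches on the longest matching path prefix (Python 3, standard library only; `make_app` and `get` are helpers invented for the demonstration):

```python
class DispatchedApplication:
    """Hypothetical dispatcher: picks a WSGI app by the longest matching prefix."""

    def __init__(self, mounts):
        # Sort longest prefix first so '/api' wins over '/'.
        self.mounts = sorted(mounts.items(), key=lambda item: len(item[0]),
                             reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "") or "/"
        for prefix, app in self.mounts:
            stripped = prefix.rstrip("/")
            if path == stripped or path.startswith(stripped + "/"):
                # Move the matched prefix from PATH_INFO to SCRIPT_NAME.
                environ["SCRIPT_NAME"] = environ.get("SCRIPT_NAME", "") + stripped
                environ["PATH_INFO"] = path[len(stripped):]
                return app(environ, start_response)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]


def make_app(name):
    # Stand-in for a real Flask/Bottle/Django application object.
    def app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [("%s:%s" % (name, environ["PATH_INFO"])).encode("utf-8")]
    return app


app = DispatchedApplication({
    "/": make_app("site"),
    "/api": make_app("api"),
    "/admin": make_app("admin"),
})


def get(path):
    environ = {"PATH_INFO": path, "SCRIPT_NAME": ""}
    return b"".join(app(environ, lambda status, headers: None))
```

Werkzeug ships a `DispatcherMiddleware` that covers the same use case, so in practice there is little reason to write this by hand.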

Slide 35

Slide 35 text

Not merging? • Correct: these applications are independent • But what happens if we inject common information into them?

Slide 36

Slide 36 text

WSGI as Mergepoint

    class InjectCommonInformation(object):

        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            db_connection = connect_database()
            user = get_current_user(environ, db_connection)
            environ['myapplication.data'] = {
                'current_user': user,
                'db':           db_connection
            }
            return self.app(environ, start_response)

    app = InjectCommonInformation(app)
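A sketch of the receiving side, showing how an application behind the middleware could read the injected data. `connect_database`, `get_current_user` and the `myapplication.session` environ key are stand-ins invented here so the example runs on its own (Python 3); the talk does not show their implementations.

```python
def connect_database():
    # Hypothetical stand-in for a real database connection.
    return {"users": {"session-1": "armin"}}


def get_current_user(environ, db_connection):
    # Hypothetical lookup: the session id comes from a made-up environ key.
    return db_connection["users"].get(environ.get("myapplication.session"))


class InjectCommonInformation:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        db_connection = connect_database()
        environ["myapplication.data"] = {
            "current_user": get_current_user(environ, db_connection),
            "db": db_connection,
        }
        return self.app(environ, start_response)


def whoami_app(environ, start_response):
    # Any framework behind the middleware can read the shared data this way.
    user = environ["myapplication.data"]["current_user"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello %s" % (user or "anonymous")).encode("utf-8")]


app = InjectCommonInformation(whoami_app)
body = b"".join(app({"myapplication.session": "session-1"},
                    lambda status, headers: None))
anonymous_body = b"".join(app({}, lambda status, headers: None))
```

The wrapped applications stay independent; they only agree on the name and shape of one environ key.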

Slide 37

Slide 37 text

Problems with That • Cannot consume form data • Processing responses from applications is a complex matter • Cannot inject custom HTML into responses easily due to the various ways WSGI apps can be written • What if an app runs outside of the WSGI request/response cycle?

Slide 38

Slide 38 text

Libraries • Werkzeug • WebOb • Paste

Slide 39

Slide 39 text

Django & WSGI • Django used to do WSGI really badly • It is getting a documented WSGI entrypoint for applying middlewares • It is easy enough to pass out WSGI apps with the Django Response object

Slide 40

Slide 40 text

WSGI -> Django

    from werkzeug.test import run_wsgi_app
    from werkzeug.wrappers import Response as WerkzeugResponse
    from django.http import HttpResponse

    def make_response(request, app):
        # Run the WSGI app against Django's environ and wrap the result
        app_iter, status, headers = run_wsgi_app(app, request.META)
        status_code = int(status.split(None)[0])
        resp = HttpResponse(app_iter, status=status_code)
        for key, value in headers.items():
            resp[key] = value
        return resp

    def make_wsgi_app(resp):
        # A Werkzeug response object is itself a WSGI application
        return WerkzeugResponse(resp, status=resp.status_code,
                                headers=resp.items())

Slide 41

Slide 41 text

Usage

    from my_wsgi_app import application
    from wsgi_to_django import make_response

    def my_django_view(request):
        return make_response(request, application)

Slide 42

Slide 42 text

HTTP

Slide 43

Slide 43 text

Overview • Pros: • Language independent • Cacheable • Cons: • Harder to work with than WSGI • Complex specification • Same problems as WSGI

Slide 44

Slide 44 text

Proxying • Write three different apps • Let nginx do the proxying • The more HTTP you speak, the better

Slide 45

Slide 45 text

Cool Things • If all your services speak HTTP properly you can just put caching layers between them • HTTP can be debugged easily (curl) • Entirely language independent

Slide 46

Slide 46 text

Suggestion • Let your services speak HTTP. • You need syntax highlighting with Pygments but your application is written in Ruby? Write a small Flask app that exposes Pygments via HTTP
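A rough sketch of such a highlighting service. To stay runnable without dependencies it uses plain WSGI instead of Flask and a stand-in highlighter instead of Pygments; the form fields `code` and `lang`, and the names `make_highlight_app`, `plain_highlight` and `post`, are assumptions made for this example.

```python
import io
from html import escape
from urllib.parse import parse_qs, urlencode


def make_highlight_app(highlight):
    """Builds a WSGI app that highlights POSTed code with the given function."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        code = form.get("code", [""])[0]
        lang = form.get("lang", ["text"])[0]
        body = highlight(code, lang).encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def plain_highlight(code, lang):
    # Stand-in highlighter: escapes the code instead of calling Pygments.
    return '<pre class="%s">%s</pre>' % (escape(lang, quote=True), escape(code))


application = make_highlight_app(plain_highlight)


def post(app, fields):
    """Helper: sends a form-encoded POST straight to the WSGI app."""
    payload = urlencode(fields).encode("utf-8")
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_LENGTH": str(len(payload)),
        "wsgi.input": io.BytesIO(payload),
    }
    statuses = []
    body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], body


status, body = post(application, {"code": "a < b", "lang": "ruby"})
```

With Pygments installed, a function wrapping `pygments.highlight(code, get_lexer_by_name(lang), HtmlFormatter())` could be passed to `make_highlight_app` instead, and the Ruby side would only need an HTTP client.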

Slide 47

Slide 47 text

Libraries • Python-Requests • Your favorite WSGI Server (gunicorn, CherryPy, Paste etc.) • Tornado, Twisted

Slide 48

Slide 48 text

ZeroMQ

Slide 49

Slide 49 text

Not a Queue • ZeroMQ is basically sockets on steroids • Language independent • Different usage patterns: • push/pull • pub/sub

Slide 50

Slide 50 text

ZeroMQ vs HTTP • ZeroMQ is easier to use than HTTP • However, you don't get the nice caching • On the plus side, you can dispatch messages to many subscribers • ZeroMQ abstracts the bad parts of sockets and HTTP away from you (timeouts, EINTR, etc.)

Slide 51

Slide 51 text

Random Thoughts • ZeroMQ hides connection problems • Blocks on lack of connectivity • You might have to build your own broker

Slide 52

Slide 52 text

Message Queues

Slide 53

Slide 53 text

It might take a while • Move long-running tasks outside of the request handling process • Possibly dispatch them to different machines • But: entirely different code can process the queue entry, even code in a different language

Slide 54

Slide 54 text

Queues • Accessor library: Celery • AMQP (RabbitMQ) • Redis • Tokyo Tyrant

Slide 55

Slide 55 text

Various Things • Don't expect your calls to be nonblocking • Greatly simplifies testing! • Build your own queue > no queue • Redis queues are a good start
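One reading of the first two bullets: if callers never assume that enqueueing returns before the work is done, the same code can run tasks inline during tests. A sketch of a home-grown queue along those lines, using only the Python 3 standard library; `TaskQueue` and its `synchronous` flag are invented for this example.

```python
import queue
import threading


class TaskQueue:
    """Hypothetical minimal queue: runs tasks inline or on a worker thread."""

    def __init__(self, handler, synchronous=False):
        self.handler = handler
        self.synchronous = synchronous
        self._queue = queue.Queue()
        if not synchronous:
            worker = threading.Thread(target=self._work, daemon=True)
            worker.start()

    def put(self, payload):
        if self.synchronous:
            # Blocking mode: the handler runs before put() returns.
            self.handler(payload)
        else:
            self._queue.put(payload)

    def join(self):
        # Waits until every queued payload has been handled.
        self._queue.join()

    def _work(self):
        while True:
            payload = self._queue.get()
            try:
                self.handler(payload)
            finally:
                self._queue.task_done()


# Test-style usage: everything happens inline.
processed = []
tasks = TaskQueue(processed.append, synchronous=True)
tasks.put({"resize": "cat.png"})

# Production-style usage: a background worker picks the task up.
background = []
worker_tasks = TaskQueue(background.append)
worker_tasks.put({"resize": "dog.png"})
worker_tasks.join()
```

Swapping the in-process queue for Redis later only changes where the payload travels, not how callers use `put()`.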

Slide 56

Slide 56 text

A Datastore

Slide 57

Slide 57 text

The Obvious One • Use the same datastore for two different applications. • For as long as everybody plays by the rules this is simple and efficient.

Slide 58

Slide 58 text

Classical Example • Flask application • Django Admin

Slide 59

Slide 59 text

Redis • A datastore • Remote datastructures! • Can easily be used as a queue • Simple interface, bindings for every language • Python pushes, Java pulls and executes

Slide 60

Slide 60 text

Bash Queue Consumer

    #!/bin/bash
    QUEUE_NAME=my_key
    while :
    do
      args=`redis-cli -d $'\t' blpop $QUEUE_NAME 0 | cut -f2`
      ./my-script $args
    done
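The producing side could look roughly like this in Python. The sketch assumes a client object with an `rpush(name, *values)` method, which is what redis-py's `Redis` class offers; `FakeRedis` and `push_job` are invented here so the example runs without a Redis server.

```python
class FakeRedis:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.lists = {}

    def rpush(self, name, *values):
        self.lists.setdefault(name, []).extend(values)
        return len(self.lists[name])


def push_job(client, queue_name, *args):
    # Space-separated arguments, matching the unquoted $args in the Bash loop.
    return client.rpush(queue_name, " ".join(args))


client = FakeRedis()
queue_length = push_job(client, "my_key", "input.png", "128")
```

Pushing on the right while the consumer pops from the left with `blpop` gives first-in, first-out order.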

Slide 61

Slide 61 text

JavaScript

Slide 62

Slide 62 text

It's awesome • Geeks hate JavaScript • The average user does not care at all • Why do we hate JavaScript? • The language is ugly • It can be abused for things we think are harmful (user tracking)

Slide 63

Slide 63 text

Ugly Language • Accept it • Use CoffeeScript • it's the C kind of ugly, not the PHP one

Slide 64

Slide 64 text

Can be abused • So can cars, bittorrent etc. • Grow up :-)

Slide 65

Slide 65 text

Google's Bar • That Google bar on top of all their products? • You can implement that in JavaScript only • Fetch some JSON • Display current user info • Application independent

Slide 66

Slide 66 text

Is it used? • Real world example: xbox.com • Login via live.com • Your user box on xbox.com is fetched entirely with JavaScript • Login requires JavaScript, no fallback

Slide 67

Slide 67 text

DICE's Battlelog • Made by DICE/ESN for Battlefield 3 • Players join games via their Browser • The joining of games is triggered by the browser and a token is handed over to the game. • Browser plugin hands over to the game client.

Slide 68

Slide 68 text

Technologies • Python for the Battlelog service • JavaScript for the frontend • Java for the push service • C++ for the Game Client and Server • HTTP for communication

Slide 69

Slide 69 text

Other Things • JavaScript can efficiently transform the DOM • You can do things you always wanted to do on the server side but never could because of performance or scaling considerations • Instantly updating page elements! • backbone.js

Slide 70

Slide 70 text

Testing • JavaScript testing only sucks for others • You control the service, you know the API endpoints. Speak HTTP with them • HtmlUnit has pretty good JavaScript support • Selenium supports HtmlUnit

Slide 71

Slide 71 text

Processes

Slide 72

Slide 72 text

Daemons • Yes, you need to keep them running • Yes it can be annoying • systemd / supervisord help

Slide 73

Slide 73 text

systemd • Socket is managed by the OS • Your application activates on the first request to that socket • Restart applications, clients queue up in the OS • Python's socket module cannot operate on arbitrary file descriptors before Python 3 (AFAIK)
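A sketch of how a Python 3 process could pick up sockets handed over by systemd. It relies on the documented socket activation protocol (the `LISTEN_PID` and `LISTEN_FDS` environment variables, descriptors starting at 3); the function names are made up for this example.

```python
import os
import socket

SD_LISTEN_FDS_START = 3  # systemd passes activated sockets starting at fd 3


def listen_fds(environ=None, pid=None):
    """Returns the file descriptors handed over by systemd, or an empty list."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    # The variables only apply to the process they were set for.
    if environ.get("LISTEN_PID") != str(pid):
        return []
    count = int(environ.get("LISTEN_FDS", "0"))
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def activated_sockets():
    # Python 3 can wrap an inherited descriptor directly via fileno=.
    return [socket.socket(fileno=fd) for fd in listen_fds()]
```

A server would call `activated_sockets()` at startup and fall back to binding its own socket when the list is empty.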

Slide 74

Slide 74 text

Processes+ • But processes are a good idea on Unix: • Different privileges • You can shoot down individual pieces without breaking the whole system • You can performance tune individual things better • No global lock :-)

Slide 75

Slide 75 text

Python 3 • libpython2 and libpython3 have clashing symbols • You cannot run Python 2 and Python 3 in the same process • ZeroMQ / HTTP etc. are an upgrade option

Slide 76

Slide 76 text

Q&A • lucumr.pocoo.org/talks/