I am doing HTTP wrong

A fresh look at HTTP for agile languages (more importantly: Python)

Armin Ronacher

May 13, 2012

Transcript

  1. I am doing HTTP wrong
    — a presentation by Armin Ronacher
    @mitsuhiko

  2. The Web developer's Evolution

  3. echo

  4. request.send_header(…)
    request.end_headers()
    request.write(…)

  5. return Response(…)

  6. Why stop there?

  7. What do we love about HTTP?

  8. Text Based

  9. REST

  10. Cacheable

  11. Content Negotiation

  12. Well Supported

  13. Works where TCP doesn't

  14. Somewhat Simple

  15. Upgrades to custom protocols

  16. Why does my application look like HTTP?

  17. everybody does it

  18. Natural Conclusion

  19. we can do better!

  20. we're a level too low

  21. Streaming: one piece at a time, constant
    memory usage, no seeking.

  22. Buffering: have some data in memory, variable
    memory usage, seeking.
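
To make the contrast between the two slides concrete, here is a small sketch that consumes the same input both ways. `iter_chunks` and `read_buffered` are illustrative names, not part of any framework, and the 4096-byte chunk size is an arbitrary assumption.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 4096  # arbitrary chunk size chosen for illustration


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Streaming: yield one piece at a time; memory stays bounded by chunk_size."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def read_buffered(stream: BinaryIO) -> io.BytesIO:
    """Buffering: pull everything into memory so the caller can seek around."""
    return io.BytesIO(stream.read())


source = b"x" * 10000

# Streaming: each chunk is seen once, in order, and then dropped
sizes = [len(chunk) for chunk in iter_chunks(io.BytesIO(source))]
print(sizes)  # [4096, 4096, 1808]

# Buffering: memory grows with the input, but we can seek back and re-read
buffered = read_buffered(io.BytesIO(source))
buffered.seek(9998)
print(buffered.read())  # b'xx'
```

The streaming version can never look back at an earlier chunk; the buffered version can, at the price of holding the whole body in memory.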

  23. TYPICAL Request / Response Cycle
    [Diagram: User Agent → Proxy → Server → Application, with the application split into Dispatcher → View; connections are labelled "Stream" and “Buffered”]

  24. In Python Terms
    def application(environ, start_response):
        # Step 1: acquire data
        data = environ['wsgi.input'].read(...)
        # Step 2: process data
        response = process_data(data)
        # Step 3: respond
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [response]
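
A runnable version of the three steps, assuming Python 3 WSGI (PEP 3333), where request and response bodies are bytes. `process_data` is a hypothetical stand-in (here it just upper-cases the input), and reading exactly `CONTENT_LENGTH` bytes is one reasonable choice for the elided `read(...)` argument, not necessarily the one used in the talk.

```python
import io
from wsgiref.util import setup_testing_defaults


def process_data(data: bytes) -> bytes:
    # Hypothetical processing step: echo the body back upper-cased
    return data.upper()


def application(environ, start_response):
    # Step 1: acquire data (the whole body is buffered in memory here)
    try:
        length = max(int(environ.get('CONTENT_LENGTH') or 0), 0)
    except ValueError:
        length = 0
    data = environ['wsgi.input'].read(length)
    # Step 2: process data
    response = process_data(data)
    # Step 3: respond
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(response)))])
    return [response]


# Drive the app with a hand-built environ instead of a real server
environ = {'REQUEST_METHOD': 'POST', 'CONTENT_LENGTH': '5',
           'wsgi.input': io.BytesIO(b'hello')}
setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
captured = {}


def start_response(status, headers, exc_info=None):
    captured['status'] = status
    captured['headers'] = headers


body = b''.join(application(environ, start_response))
print(captured['status'], body)  # 200 OK b'HELLO'
```

Note that by the time Step 2 runs, the body has already been fully buffered, which is the "level too low" problem the following slides dig into.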

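  25. One Level Up
    # `server` is assumed to be a socket that is already bound and listening
    conn, addr = server.accept()
    f = conn.makefile('rb')
    requestline = f.readline()
    headers = []
    while True:
        headerline = f.readline()
        # A bare CRLF ends the header block; b'' means the peer hung up
        if headerline in (b'\r\n', b''):
            break
        headers.append(headerline)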
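
The same loop can be exercised without a socket by feeding it any binary file object, which is what `makefile('rb')` returns. `read_request_head` is a hypothetical helper for this sketch; it ignores obsolete header line folding and applies no size limits, so it is not production-ready.

```python
import io
from typing import BinaryIO, List, Tuple


def read_request_head(f: BinaryIO) -> Tuple[bytes, List[Tuple[str, str]]]:
    """Read the request line and headers, leaving the body unread in `f`."""
    requestline = f.readline().rstrip(b'\r\n')
    headers = []
    while True:
        headerline = f.readline()
        # A bare CRLF ends the header block; b'' means the connection closed
        if headerline in (b'\r\n', b'\n', b''):
            break
        name, _, value = headerline.decode('latin-1').partition(':')
        headers.append((name.strip(), value.strip()))
    return requestline, headers


raw = (b'POST /upload HTTP/1.1\r\n'
       b'Host: example.com\r\n'
       b'Content-Length: 5\r\n'
       b'\r\n'
       b'hello')
f = io.BytesIO(raw)
requestline, headers = read_request_head(f)
print(requestline)  # b'POST /upload HTTP/1.1'
print(headers)      # [('Host', 'example.com'), ('Content-Length', '5')]
print(f.read())     # b'hello' (the body is still waiting to be read)
```

This is exactly the split the next slide describes: headers end up buffered, while the body is still a stream.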

  26. A Weird Mixture in the App
    request.headers <- buffered
    request.form <- buffered
    request.files <- buffered to disk
    request.body <- streamed

  27. HTTP's Limited Signalling
    Strictly request / response.
    Once you have started accepting the body, the only
    thing the server can communicate to the client during
    the request is closing the connection.

  28. Bailing out early
    def application(request):
        # At this point the headers are parsed; nothing
        # else has been read yet.
        if request.content_length > TWO_MEGABYTES:
            return error_response()
        ...
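
In plain WSGI the same early check can look at `CONTENT_LENGTH` in the environ before touching `wsgi.input`. A minimal sketch, assuming a 2 MB limit and a plain-text 413 body chosen for illustration. Note that a request without a Content-Length header (for example a chunked upload) slips past this particular check.

```python
import io

TWO_MEGABYTES = 2 * 1024 * 1024  # illustrative limit, as on the slide


def application(environ, start_response):
    # Only the headers are known here; wsgi.input has not been read yet
    try:
        content_length = max(int(environ.get('CONTENT_LENGTH') or 0), 0)
    except ValueError:
        content_length = 0
    if content_length > TWO_MEGABYTES:
        start_response('413 Request Entity Too Large',
                       [('Content-Type', 'text/plain')])
        return [b'Request body too large\n']
    body = environ['wsgi.input'].read(content_length)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'got %d bytes\n' % len(body)]


statuses = []


def start_response(status, headers, exc_info=None):
    statuses.append(status)


big = {'CONTENT_LENGTH': str(3 * 1024 * 1024), 'wsgi.input': io.BytesIO()}
small = {'CONTENT_LENGTH': '3', 'wsgi.input': io.BytesIO(b'abc')}
print(application(big, start_response))    # [b'Request body too large\n']
print(application(small, start_response))  # [b'got 3 bytes\n']
print(statuses)  # ['413 Request Entity Too Large', '200 OK']
```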

  29. Bailing out a little bit later
    def application(request):
        # Read a little bit of data
        request.input.read(4096)
        # You just committed to accepting data. Now you have to
        # read everything, or the browser will be very unhappy and
        # just time out. No more responding with 413.
        ...
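
Once the body has been touched, "read everything" usually means draining it: reading and discarding the rest in small chunks before sending a response. A sketch with a hypothetical `drain` helper; whether it is acceptable to drain a large upload is a policy decision, and closing the connection is the other option the slides mention.

```python
import io
from typing import BinaryIO


def drain(stream: BinaryIO, remaining: int, chunk_size: int = 4096) -> int:
    """Read and discard up to `remaining` bytes; return how many were skipped."""
    skipped = 0
    while skipped < remaining:
        chunk = stream.read(min(chunk_size, remaining - skipped))
        if not chunk:  # the client went away before sending everything
            break
        skipped += len(chunk)
    return skipped


body = io.BytesIO(b'a' * 10000)
peek = body.read(4096)                 # we already committed to reading
rest = drain(body, 10000 - len(peek))  # discard the remainder, constant memory
print(len(peek), rest)  # 4096 5904
```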

  30. Rejecting
    Form fields -> memory
    File uploads -> disk
    What's your limit? 16MB in total? All of it could go
    to memory. Reject file sizes individually?
    Then you need an overall check as well!
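
Because per-file limits alone do not bound the total, one option is to wrap the input stream so that every consumer (form parser, file spooler) draws from a single byte budget. `LimitedInput` and `RequestTooLarge` are hypothetical names for this sketch, not an existing library API.

```python
import io
from typing import BinaryIO


class RequestTooLarge(Exception):
    """Raised once the request as a whole exceeds its byte budget."""


class LimitedInput:
    """Wrap a stream and enforce one overall limit across all readers."""

    def __init__(self, stream: BinaryIO, limit: int) -> None:
        self._stream = stream
        self._limit = limit
        self._consumed = 0

    def read(self, size: int = -1) -> bytes:
        budget = self._limit - self._consumed
        # Ask for at most one byte more than the budget so overflow is detectable
        if size < 0 or size > budget:
            size = budget + 1
        data = self._stream.read(size)
        self._consumed += len(data)
        if self._consumed > self._limit:
            raise RequestTooLarge(f'request exceeds {self._limit} bytes')
        return data


wrapped = LimitedInput(io.BytesIO(b'f' * 6 + b'u' * 6), limit=10)
form_part = wrapped.read(6)  # e.g. form fields, kept in memory
try:
    wrapped.read(6)          # e.g. a file upload that would be spooled to disk
except RequestTooLarge as exc:
    print('rejected:', exc)  # rejected: request exceeds 10 bytes
```

The design choice is that the limit lives on the shared stream rather than in each parser, so no combination of fields and files can exceed it.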

  31. The Consequences
    How much data do you accept?
    Limit the overall request size?
    Not helpful, because all of it could still end up in memory

  32. It's not just limiting
    Consider a layered system
    How many of you write code that streams?
    What happens if you pass streamed data through
    your layers?

  33. A new approach

  34. Dynamic typing made us lazy

  35. we're trying to solve both use cases at once,
    and we're not supporting either one well

  36. How we do it
    Hide HTTP from the apps
    HTTP is an implementation detail

  37. Pseudocode
    user_pagination = make_pagination_schema(User)
    @export(
        specs=[('page', types.Int32()),
               ('per_page', types.Int32())],
        returns=user_pagination,
        semantics='select',
        http_path='/users/'
    )
    def list_users(page, per_page):
        users = User.query.paginate(page, per_page)
        return users.to_dict()
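
The pseudocode does not say how `@export` works internally. Here is one possible design, sketched under assumptions: the decorator only records the declared specs and any transport hints, and a separate, equally hypothetical `call_from_query` converts raw string parameters (as they would arrive in a GET query string) before invoking the view. `Int32`, `export` and `call_from_query` are illustrative stand-ins, not the talk's actual library.

```python
from typing import Any, Callable, Dict, List, Tuple


class Int32:
    """Hypothetical type: accepts ints or numeric strings within 32-bit range."""

    def convert(self, value: Any) -> int:
        number = int(value)  # raises ValueError for non-numeric input
        if not -2**31 <= number < 2**31:
            raise ValueError(f'{number} does not fit in 32 bits')
        return number


def export(specs: List[Tuple[str, Any]], **meta: Any) -> Callable:
    """Attach the parameter specs (and transport hints) to the function."""
    def decorator(func: Callable) -> Callable:
        func.export_specs = specs
        func.export_meta = meta
        return func
    return decorator


def call_from_query(func: Callable, query: Dict[str, str]) -> Any:
    """Convert and validate every declared parameter before calling `func`."""
    kwargs = {name: type_.convert(query[name]) for name, type_ in func.export_specs}
    return func(**kwargs)


@export(specs=[('page', Int32()), ('per_page', Int32())],
        semantics='select', http_path='/users/')
def list_users(page: int, per_page: int) -> Dict[str, int]:
    # Stand-in for User.query.paginate(page, per_page).to_dict()
    return {'page': page, 'per_page': per_page}


print(call_from_query(list_users, {'page': '2', 'per_page': '20'}))
# {'page': 2, 'per_page': 20}
```

Because the view only ever sees converted Python values, the same function could be driven by a different transport without changes, which is the point of treating HTTP as an implementation detail.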

  38. Types are specific
    user_type = types.Object([
        ('username', types.String(30)),
        ('email', types.Optional(types.String(250))),
        ('password_hash', types.String(250)),
        ('is_active', types.Boolean()),
        ('registration_date', types.DateTime())
    ])
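
To show what such specific declarations buy, here is a minimal, hypothetical validator in the spirit of the slide. `String`, `Optional`, `Boolean` and `Object` are illustrative classes written for this sketch (the `DateTime` field is left out for brevity); they are not the API used in the talk.

```python
from typing import Any, Dict, List, Tuple


class String:
    def __init__(self, max_length: int) -> None:
        self.max_length = max_length

    def validate(self, value: Any) -> str:
        if not isinstance(value, str):
            raise TypeError('expected a string')
        if len(value) > self.max_length:
            raise ValueError(f'longer than {self.max_length} characters')
        return value


class Optional:
    def __init__(self, inner: Any) -> None:
        self.inner = inner

    def validate(self, value: Any) -> Any:
        return None if value is None else self.inner.validate(value)


class Boolean:
    def validate(self, value: Any) -> bool:
        if not isinstance(value, bool):
            raise TypeError('expected a boolean')
        return value


class Object:
    def __init__(self, fields: List[Tuple[str, Any]]) -> None:
        self.fields = fields

    def validate(self, data: Dict[str, Any]) -> Dict[str, Any]:
        unknown = set(data) - {name for name, _ in self.fields}
        if unknown:
            raise ValueError(f'unexpected keys: {sorted(unknown)}')
        # Output follows the declared field order, so ordering is predictable
        return {name: type_.validate(data.get(name)) for name, type_ in self.fields}


user_type = Object([
    ('username', String(30)),
    ('email', Optional(String(250))),
    ('is_active', Boolean()),
])

print(user_type.validate({'is_active': True, 'username': 'mitsuhiko'}))
# {'username': 'mitsuhiko', 'email': None, 'is_active': True}
```

Unknown keys, over-long strings and wrong types are all rejected before any application code runs, which lines up with several items on the "Comes for free" slide.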

  39. Why?
    Support for different input/output formats
    keyless transport
    support for non-HTTP
    no hash collision attacks :-)
    Predictable memory usage

  40. Comes for free
    Easier to test
    Helps documenting the public APIs
    Catches common errors early
    Handle errors without invoking code
    Predictable dictionary ordering

  41. Strict vs Lenient

  42. Rule of Thumb
    Be strict in what you send,
    but generous in what you receive
    — variant of Postel's Law

  43. Being Generous
    In order to be generous you
    need to know what to receive.
    Just accepting any input is a
    security disaster waiting to happen.
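
Knowing the expected type is what makes leniency safe. A sketch, assuming a hypothetical lenient boolean converter: it accepts the spellings different transports tend to produce (JSON `true`, form value `"1"`, checkbox `"on"`) but still rejects anything it does not recognise.

```python
from typing import Any

_TRUE = {'1', 'true', 'yes', 'on'}
_FALSE = {'0', 'false', 'no', 'off'}


def to_boolean(value: Any) -> bool:
    """Generous about spelling, strict about meaning."""
    if isinstance(value, bool):
        return value
    if isinstance(value, int) and value in (0, 1):
        return bool(value)
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in _TRUE:
            return True
        if lowered in _FALSE:
            return False
    raise ValueError(f'not a boolean: {value!r}')


print([to_boolean(v) for v in (True, 1, 'on', 'FALSE', '0')])
# [True, True, True, False, False]
```

Without the declared type there would be no way to tell whether `"0"` is meant as a number, a string or a boolean; with it, the converter can be generous without guessing.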

  44. Support unsupported types
    As JSON:
    {
        "foo": [1, 2, 3],
        "bar": {"key": "value"},
        "now": "Thu, 10 May 2012 14:16:09 GMT"
    }
    As URL-encoded form data:
    foo.0=1&
    foo.1=2&
    foo.2=3&
    bar.key=value&
    now=Thu%2C%2010%20May%202012%2014:16:09%20GMT
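
The dotted-key encoding on the slide can be produced mechanically. This sketch assumes the conventions visible in the example (list indices and dict keys joined with `.`, datetimes rendered as RFC 1123 dates in GMT) and uses only the standard library; `flatten` is a hypothetical helper. Decoding in the other direction needs the schema, since `foo.0` alone does not say whether `foo` is a list or a dict.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Any, Iterator, Tuple
from urllib.parse import quote, urlencode


def flatten(value: Any, prefix: str = '') -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, string_value) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten(item, f'{prefix}.{key}' if prefix else str(key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from flatten(item, f'{prefix}.{index}' if prefix else str(index))
    elif isinstance(value, datetime):
        yield prefix, format_datetime(value, usegmt=True)
    else:
        yield prefix, str(value)


data = {
    'foo': [1, 2, 3],
    'bar': {'key': 'value'},
    'now': datetime(2012, 5, 10, 14, 16, 9, tzinfo=timezone.utc),
}
# quote_via=quote encodes spaces as %20; safe=':' keeps the colons readable
print(urlencode(list(flatten(data)), quote_via=quote, safe=':'))
# foo.0=1&foo.1=2&foo.2=3&bar.key=value&now=Thu%2C%2010%20May%202012%2014:16:09%20GMT
```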

  45. Solves the GET issue
    GET has no body
    parameters have to be URL encoded
    inconsistency with JSON POST requests

  46. Where is the streaming?

  47. There is none

  48. there are always two sides to an API

  49. If the server has streaming endpoints —
    the client will have to support them as well

  50. For things that need actual streaming we have
    separate endpoints.

  51. streaming is different

  52. but we can stream until we need buffering

  53. Discard useless stuff
    {
        "foo": [list, of, thousands, of, items, we don't, need],
        "an_important_key": "we're actually interested in"
    }

  54. What if I don't make an API?

  55. modern web apps are APIs

  56. Dumb client?
    Move the client to the server

  57. Q&A

  58. Oh hai. We're hiring
    http://fireteam.net/careers
