I am doing HTTP wrong - Speaker Deck

Slide 1

Slide 1 text

I am doing HTTP wrong — a presentation by Armin Ronacher @mitsuhiko

Slide 2

Slide 2 text

The Web developer's Evolution

Slide 3

Slide 3 text

echo

Slide 4

Slide 4 text

request.send_header(…) request.end_headers() request.write(…)

Slide 5

Slide 5 text

return Response(…)

Slide 6

Slide 6 text

Why Stop there?

Slide 7

Slide 7 text

What do we love about HTTP?

Slide 8

Slide 8 text

Text Based

Slide 9

Slide 9 text

REST

Slide 10

Slide 10 text

Cacheable

Slide 11

Slide 11 text

Content Negotiation

Slide 12

Slide 12 text

Well Supported

Slide 13

Slide 13 text

Works where TCP doesn't

Slide 14

Slide 14 text

Somewhat Simple

Slide 15

Slide 15 text

Upgrades to custom protocols

Slide 16

Slide 16 text

Why does my application look like HTTP?

Slide 17

Slide 17 text

everybody does it

Slide 18

Slide 18 text

Natural Conclusion

Slide 19

Slide 19 text

we can do better!

Slide 20

Slide 20 text

we're a level too low

Slide 21

Slide 21 text

Streaming: one piece at a time, constant memory usage, no seeking.

Slide 22

Slide 22 text

Buffering: have some data in memory, variable memory usage, seeking.
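
The difference between the two modes can be sketched with a checksum over a byte stream. This is only an illustration: `CHUNK_SIZE`, `checksum_streaming` and `checksum_buffered` are names made up for this example and do not come from the talk.

```python
import hashlib
import io

CHUNK_SIZE = 4096  # arbitrary chunk size chosen for this sketch


def checksum_streaming(stream):
    """Hash the stream piece by piece; memory use stays near CHUNK_SIZE."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def checksum_buffered(stream):
    """Read everything first; memory use grows with the size of the input."""
    data = stream.read()
    return hashlib.sha256(data).hexdigest()


payload = b"x" * 100000
streamed = checksum_streaming(io.BytesIO(payload))
buffered = checksum_buffered(io.BytesIO(payload))
```

Both functions produce the same digest, but only the buffered one could go back and look at earlier bytes again, and only the streaming one has a memory footprint that does not depend on the payload size.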

Slide 23

Slide 23 text

Typical Request / Response Cycle
[Diagram: User Agent, Proxy, Server, Application (Dispatcher, View); connections labeled Stream and “Buffered”]

Slide 24

Slide 24 text

In Python Terms

    def application(environ, start_response):
        # Step 1: acquire data
        data = environ['wsgi.input'].read(...)
        # Step 2: process data
        response = process_data(data)
        # Step 3: respond
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [response]

Slide 25

Slide 25 text

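One Level Up

    # `socket` is the listening socket; accept() returns (connection, address)
    s, addr = socket.accept()
    f = s.makefile('rb')
    requestline = f.readline()
    headers = []
    while 1:
        headerline = f.readline()
        # binary mode yields bytes; a blank line ends the headers, b'' means EOF
        if headerline in (b'\r\n', b''):
            break
        headers.append(headerline)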

Slide 26

Slide 26 text

Weird Mixture on the app
request.headers <- buffered
request.form <- buffered
request.files <- buffered to disk
request.body <- streamed

Slide 27

Slide 27 text

HTTP's Limited Signalling
Strict Request / Response
The only communication during a request from the server to the client is closing the connection once you have started accepting the body.

Slide 28

Slide 28 text

Bailing out early

    def application(request):
        # At this point, headers are parsed, everything else
        # is not parsed yet.
        if request.content_length > TWO_MEGABYTES:
            return error_response()
        ...
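
In plain WSGI terms, the early bail-out could look roughly like the sketch below. It assumes the client sends a `Content-Length` header. The value of `TWO_MEGABYTES`, the response bodies and the reason phrase are choices made for this example, not taken from the slides.

```python
import io

TWO_MEGABYTES = 2 * 1024 * 1024  # limit assumed for this sketch


def application(environ, start_response):
    # CONTENT_LENGTH may be missing or empty; treat both as zero here.
    try:
        content_length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        content_length = 0

    # Reject before touching wsgi.input, while a 413 is still possible.
    if content_length > TWO_MEGABYTES:
        body = b'Request too large\n'
        start_response('413 Payload Too Large', [
            ('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body))),
        ])
        return [body]

    data = environ['wsgi.input'].read(content_length)
    body = ('Received %d bytes\n' % len(data)).encode('ascii')
    start_response('200 OK', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]


# Usage: call the app directly with hand-built environ dictionaries.
statuses = []


def record_status(status, headers):
    statuses.append(status)


oversized = {'CONTENT_LENGTH': str(TWO_MEGABYTES + 1),
             'wsgi.input': io.BytesIO(b'')}
small = {'CONTENT_LENGTH': '5', 'wsgi.input': io.BytesIO(b'hello')}
rejected = application(oversized, record_status)
accepted = application(small, record_status)
```

The check only uses the parsed headers, so the body is never read for an oversized request.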

Slide 29

Slide 29 text

Bailing out a little bit later

    def application(request):
        # Read a little bit of data
        request.input.read(4096)
        # You just committed to accepting data, now you have to
        # read everything or the browser will be very unhappy and
        # just time out. No more responding with 413
        ...

Slide 30

Slide 30 text

Rejecting
Form fields -> memory
File uploads -> disk
What's your limit? 16MB in total? All could go to memory.
Reject file sizes individually? Needs overall check as well!

Slide 31

Slide 31 text

The Consequences
How much data do you accept?
Limit the overall request size?
Not helpful because all of it could be in-memory
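
One way to picture the problem is a budget that tracks the in-memory part and the overall request size separately. This is a hypothetical sketch: `UploadBudget`, `RequestTooLarge` and the limits used below are invented for illustration and are not part of any framework mentioned in the talk.

```python
class RequestTooLarge(Exception):
    """Raised when one of the limits is exceeded."""


class UploadBudget:
    """Tracks separate limits for in-memory form data and the whole request."""

    def __init__(self, max_memory, max_total):
        self.max_memory = max_memory
        self.max_total = max_total
        self.memory_used = 0
        self.total_used = 0

    def consume(self, nbytes, in_memory):
        # Every byte counts against the overall limit; only form fields
        # (which are kept in memory) count against the memory limit.
        self.total_used += nbytes
        if in_memory:
            self.memory_used += nbytes
        if self.memory_used > self.max_memory:
            raise RequestTooLarge('form fields exceed the in-memory limit')
        if self.total_used > self.max_total:
            raise RequestTooLarge('request exceeds the overall limit')


budget = UploadBudget(max_memory=512 * 1024, max_total=16 * 1024 * 1024)
budget.consume(400 * 1024, in_memory=True)         # form fields
budget.consume(10 * 1024 * 1024, in_memory=False)  # a file upload, spooled to disk
try:
    budget.consume(200 * 1024, in_memory=True)     # more form fields
    outcome = 'accepted'
except RequestTooLarge as exc:
    outcome = str(exc)
```

The third call fails even though the request is far below 16MB in total, which is the case a single overall limit cannot catch.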

Slide 32

Slide 32 text

It's not just limiting
Consider a layered system
How many of you write code that streams?
What happens if you pass streamed data through your layers?

Slide 33

Slide 33 text

A new approach

Slide 34

Slide 34 text

Dynamic typing made us lazy

Slide 35

Slide 35 text

we're trying to solve both use cases in one
we're not supporting either well

Slide 36

Slide 36 text

How we do it
Hide HTTP from the apps
HTTP is an implementation detail

Slide 37

Slide 37 text

Pseudocode

    user_pagination = make_pagination_schema(User)

    @export(
        specs=[('page', types.Int32()),
               ('per_page', types.Int32())],
        returns=user_pagination,
        semantics='select',
        http_path='/users/'
    )
    def list_users(page, per_page):
        users = User.query.paginate(page, per_page)
        return users.to_dict()

Slide 38

Slide 38 text

Types are specific

    user_type = types.Object([
        ('username', types.String(30)),
        ('email', types.Optional(types.String(250))),
        ('password_hash', types.String(250)),
        ('is_active', types.Boolean()),
        ('registration_date', types.DateTime())
    ])
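
The `types` module on this slide is not shown in the deck. As a rough idea of what such specific types make possible, here is a small stand-in written for this transcript. The class names mirror the slide, but the `validate` method and all behavior are assumptions, not the author's implementation.

```python
class ValidationError(ValueError):
    pass


class String:
    def __init__(self, max_length):
        self.max_length = max_length

    def validate(self, value):
        if not isinstance(value, str):
            raise ValidationError('expected a string')
        # The length bound is what keeps memory usage predictable.
        if len(value) > self.max_length:
            raise ValidationError('string longer than %d' % self.max_length)
        return value


class Boolean:
    def validate(self, value):
        if not isinstance(value, bool):
            raise ValidationError('expected a boolean')
        return value


class Optional:
    def __init__(self, inner):
        self.inner = inner

    def validate(self, value):
        return None if value is None else self.inner.validate(value)


class Object:
    def __init__(self, fields):
        self.fields = fields  # list of (name, type) pairs; order is kept

    def validate(self, value):
        if not isinstance(value, dict):
            raise ValidationError('expected an object')
        unknown = set(value) - {name for name, _ in self.fields}
        if unknown:
            raise ValidationError('unknown keys: %s' % ', '.join(sorted(unknown)))
        # Missing keys are passed on as None, so only Optional fields accept them.
        return {name: type_.validate(value.get(name))
                for name, type_ in self.fields}


user_type = Object([
    ('username', String(30)),
    ('email', Optional(String(250))),
    ('is_active', Boolean()),
])
user = user_type.validate({'username': 'armin', 'is_active': True})
```

Because the schema is known up front, bad input is rejected before any application code runs, and the output always has its keys in the declared order.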

Slide 39

Slide 39 text

Why?
Support for different input/output formats
keyless transport
support for non-HTTP
no hash collision attacks :-)
Predictable memory usage

Slide 40

Slide 40 text

Comes for free
Easier to test
Helps documenting the public APIs
Catches common errors early
Handle errors without invoking code
Predictable dictionary ordering

Slide 41

Slide 41 text

Strict vs Lenient

Slide 42

Slide 42 text

Rule of Thumb
Be strict in what you send, but generous in what you receive
Variant of Postel's Law

Slide 43

Slide 43 text

Being Generous
In order to be generous you need to know what to receive.
Just accepting any input is a security disaster waiting to happen.
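
As a small illustration of being generous only when the expected type is known: if an endpoint declares that a value is a boolean, it can safely accept several spellings of it. `lenient_bool` and the accepted spellings are assumptions made up for this sketch.

```python
def lenient_bool(value):
    """Accept a few common spellings of a boolean, reject everything else."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ('true', '1', 'yes', 'on'):
            return True
        if lowered in ('false', '0', 'no', 'off'):
            return False
    raise ValueError('not a boolean: %r' % (value,))


accepted = [lenient_bool(v) for v in (True, 'true', ' YES ', '0', 'off')]
```

The function is lenient about spelling but still refuses anything that cannot be a boolean, such as a list or an arbitrary string.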

Slide 44

Slide 44 text

Support unsupported types

    {
      "foo": [1, 2, 3],
      "bar": {"key": "value"},
      "now": "Thu, 10 May 2012 14:16:09 GMT"
    }

    foo.0=1&
    foo.1=2&
    foo.2=3&
    bar.key=value&
    now=Thu%2C%2010%20May%202012%2014:16:09%20GMT
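
The mapping from the JSON document to dotted form keys can be sketched in a few lines. The `flatten` helper below is written for this transcript and is not the author's code. Note that the standard library's `urlencode` escapes spaces as `+` and colons as `%3A`, so the resulting string is escaped slightly differently from the one on the slide.

```python
from urllib.parse import parse_qsl, urlencode


def flatten(value, prefix=''):
    """Yield (dotted_key, string_value) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        yield prefix, str(value)
        return
    for key, item in items:
        child = '%s.%s' % (prefix, key) if prefix else str(key)
        yield from flatten(item, child)


data = {
    'foo': [1, 2, 3],
    'bar': {'key': 'value'},
    'now': 'Thu, 10 May 2012 14:16:09 GMT',
}
pairs = list(flatten(data))
query = urlencode(pairs)
```

Going the other way, the flat string pairs alone do not say whether `foo.0` is a list item or whether `1` is a number. Turning them back into lists, integers and dates requires knowing the expected types.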

Slide 45

Slide 45 text

Solves the GET issue
GET has no body
parameters have to be URL encoded
inconsistency with JSON post requests

Slide 46

Slide 46 text

Where is the streaming?

Slide 47

Slide 47 text

There is none

Slide 48

Slide 48 text

there are always two sides to an API

Slide 49

Slide 49 text

If the server has streaming endpoints — the client will have to support them as well

Slide 50

Slide 50 text

For things that need actual streaming we have separate endpoints.

Slide 51

Slide 51 text

streaming is different

Slide 52

Slide 52 text

but we can stream until we need buffering

Slide 53

Slide 53 text

Discard useless stuff

    {
      "foo": [list, of, thousands, of, items, we don't, need],
      "an_important_key": "we're actually interested in"
    }