Slide 1

Slide 1 text

riak & webmachine on github:pages

Slide 2

Slide 2 text

github.com/jnewland @jnewland

Slide 3

Slide 3 text

riak & webmachine on github:pages

Slide 4

Slide 4 text

what is pages?

Slide 5

Slide 5 text

simple static file hosting*

Slide 7

Slide 7 text

github.com/user/repo/gh-pages → http://user.github.com/repo

Slide 9

Slide 9 text

github.com/user/user.github.com → http://user.github.com
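The two repo-to-URL mappings on these slides can be sketched as one small function. This is only an illustration of the naming rule; the function name `pages_url` is hypothetical and not part of any GitHub API.

```python
def pages_url(user: str, repo: str) -> str:
    """Return the public URL a Pages repo is served from, per the slides."""
    user_site = f"{user}.github.com"
    if repo == user_site:
        # The user.github.com repo is served from the root of the user's domain.
        return f"http://{user_site}"
    # Any other repo (its gh-pages branch) is served under a path segment.
    return f"http://{user_site}/{repo}"


print(pages_url("user", "repo"))
print(pages_url("user", "user.github.com"))
```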

Slide 12

Slide 12 text

$ cat CNAME
example.com
custom domain support

Slide 13

Slide 13 text

what is wrong with pages?

Slide 14

Slide 14 text

SLOW

Slide 15

Slide 15 text

1 node
ext3
gigs and gigs of HTML

Slide 16

Slide 16 text

IO bound

Slide 17

Slide 17 text

fscking downtime

Slide 18

Slide 18 text

not HA

Slide 19

Slide 19 text

LOTS TO FIX
THROW IT ALL OUT

Slide 20

Slide 20 text

what can power pages 2.0?

Slide 21

Slide 21 text

1. grab content from git
2. run through jekyll
3. write somewhere
4. serve over HTTP

Slide 22

Slide 22 text

what currently powers pages?

Slide 23

Slide 23 text

1. ruby
2. ruby
3. ext3
4. nginx

Slide 24

Slide 24 text

$ wc -l pages_map.conf
57615 pages_map.conf

Slide 25

Slide 25 text

1. ruby
2. ruby
3. riak_kv
4. riak_kv*

Slide 26

Slide 26 text

* read-only
transactional builds
CNAMEs & redirects
index.html fallback
custom 404.html per-repo
almost

Slide 27

Slide 27 text

1. ruby
2. ruby
3. riak_kv
4. webmachine resource

Slide 28

Slide 28 text

why not x?

Slide 29

Slide 29 text

squid + nginx/apache + filesystem

Slide 30

Slide 30 text

should work fine
what happens when you need N > 1?
just shard 'em
how do you populate new partitions?
just rsync stuff around

Slide 31

Slide 31 text

building a distributed system
ASS FIRST

Slide 32

Slide 32 text

remember, I do ops
low maintenance, resilient systems make lazy sysadmins love you

Slide 34

Slide 34 text

schema design

Slide 35

Slide 35 text

2 buckets
hosts
pages

Slide 36

Slide 36 text

hosts
key: HTTP Host Header
value: redirect or repo/sha map
use: data key prefix lookup
index: user_id

Slide 37

Slide 37 text

hosts
jnewland.github.com
{ "repos": { "jnewland.github.com": "deadbeef", "pages": "beadfeed" } }

Slide 38

Slide 38 text

pages
key: sha/URI
value: HTML / other data
use: data storage
index: repo_id
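As a rough model of the two-bucket schema, the sketch below uses plain Python dicts in place of Riak buckets. The helper `pages_key` and the placeholder file contents are assumptions made for illustration; the secondary indexes (user_id, repo_id) are not modeled.

```python
import json

# Stand-ins for the two Riak buckets (assumption: plain dicts, no indexes).
hosts = {}  # key: HTTP Host header -> JSON document (redirect or repo/sha map)
pages = {}  # key: "<sha>/<uri>"   -> raw file contents


def pages_key(sha: str, uri: str) -> str:
    """Build a pages-bucket key from a build sha and a request URI."""
    return f"{sha}/{uri.lstrip('/')}"


hosts["jnewland.github.com"] = json.dumps(
    {"repos": {"jnewland.github.com": "deadbeef", "pages": "beadfeed"}}
)
pages[pages_key("deadbeef", "/raptor.gif")] = b"placeholder image bytes"

print(sorted(pages))
```

Prefixing every page key with the build sha is what lets the hosts entry act as a single pointer to the current build.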

Slide 39

Slide 39 text

Jesse Newland

Slide 40

Slide 40 text

how does data get in?

Slide 41

Slide 41 text

riak-ruby-client
read files from disk
write to riak
update sites object
gc old builds
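The ingest steps above might look roughly like the following. The talk uses riak-ruby-client; this sketch swaps in Python and in-memory dicts, and the function name `publish`, its parameters, and the reading of "sites object" as the hosts entry are all assumptions.

```python
import os
import tempfile


def publish(site_dir, host, repo, sha, hosts, pages):
    """Store a built site under a new sha, repoint the host, then gc the old build."""
    # Read files from disk and write each one under the "<sha>/" key prefix.
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            path = os.path.join(root, name)
            uri = os.path.relpath(path, site_dir).replace(os.sep, "/")
            with open(path, "rb") as fh:
                pages[f"{sha}/{uri}"] = fh.read()
    # Repoint the host entry only after every file of the new build is written.
    repos = hosts.setdefault(host, {"repos": {}})["repos"]
    old_sha = repos.get(repo)
    repos[repo] = sha
    # Garbage-collect keys that belong to the previous build.
    if old_sha and old_sha != sha:
        for key in [k for k in pages if k.startswith(old_sha + "/")]:
            del pages[key]


hosts, pages = {}, {}
with tempfile.TemporaryDirectory() as site_dir:
    index = os.path.join(site_dir, "index.html")
    with open(index, "wb") as fh:
        fh.write(b"<h1>v1</h1>")
    publish(site_dir, "jnewland.github.com", "jnewland.github.com", "deadbeef", hosts, pages)
    with open(index, "wb") as fh:
        fh.write(b"<h1>v2</h1>")
    publish(site_dir, "jnewland.github.com", "jnewland.github.com", "beadfeed", hosts, pages)

print(sorted(pages))
```

Writing the new build before switching the pointer is one way to get the "transactional builds" behavior mentioned earlier: readers see either the old build or the new one, never a mix.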

Slide 42

Slide 42 text

how does data get out?

Slide 43

Slide 43 text

curl jnewland.github.com/raptor.gif
GET /riak/hosts/jnewland.github.com
GET /riak/pages/deadbeef/raptor.gif

Slide 44

Slide 44 text

curl jnewland.github.com/pages/flow.png
GET /riak/hosts/jnewland.github.com
GET /riak/pages/beadfeed/flow.png
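The two request traces suggest a lookup rule: if the first path segment names a repo in the hosts entry, use that repo's sha and the rest of the path; otherwise use the host's own repo. The sketch below encodes that inferred rule. It is an assumption drawn from the examples, not the actual resource code, and `riak_gets` is a hypothetical name.

```python
HOSTS = {
    "jnewland.github.com": {
        "repos": {"jnewland.github.com": "deadbeef", "pages": "beadfeed"}
    }
}


def riak_gets(host: str, path: str) -> list:
    """Return the two Riak GETs a request would trigger, per the traces above."""
    tokens = [t for t in path.split("/") if t]
    repos = HOSTS[host]["repos"]
    if tokens and tokens[0] in repos:
        # First path segment names a project repo: use its sha, drop the segment.
        sha, rest = repos[tokens[0]], tokens[1:]
    else:
        # Otherwise serve from the host's own repo.
        sha, rest = repos[host], tokens
    return [f"/riak/hosts/{host}", "/riak/pages/" + "/".join([sha] + rest)]


print(riak_gets("jnewland.github.com", "/raptor.gif"))
print(riak_gets("jnewland.github.com", "/pages/flow.png"))
```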

Slide 45

Slide 45 text

what about CNAMEs?

Slide 46

Slide 46 text

hosts
example.github.com
{ "redirect": "example.com" }

Slide 47

Slide 47 text

hosts
example.com
{ "repos": { "example.com": "deadbeef" } }

Slide 48

Slide 48 text

curl example.github.com/raptor.gif
GET /riak/hosts/example.github.com
301 example.com/raptor.gif
GET /riak/hosts/example.com
GET /riak/pages/deadbeef/raptor.gif
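The CNAME flow above can be modeled as a single decision on the hosts entry: redirect, serve, or 404. This is a simplified sketch over in-memory data; the function `handle` and its return shape are assumptions for illustration.

```python
HOSTS = {
    "example.github.com": {"redirect": "example.com"},
    "example.com": {"repos": {"example.com": "deadbeef"}},
}


def handle(host: str, path: str):
    """Return (status, target) for a request, following the trace above."""
    entry = HOSTS.get(host)
    if entry is None:
        # No hosts data at all: nothing to serve.
        return (404, None)
    if "redirect" in entry:
        # The github.com name only points at the custom domain.
        return (301, entry["redirect"] + path)
    sha = entry["repos"][host]
    return (200, f"/riak/pages/{sha}{path}")


print(handle("example.github.com", "/raptor.gif"))
print(handle("example.com", "/raptor.gif"))
```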

Slide 49

Slide 49 text

pages_wm_resource.erl

Slide 50

Slide 50 text

webmachine is SO DAMN COOL

Slide 52

Slide 52 text

%% webmachine resource exports
-export([
    init/1,
    service_available/2,
    malformed_request/2,
    content_types_provided/2,
    resource_exists/2,
    previously_existed/2,
    moved_permanently/2,
    last_modified/2,
    generate_etag/2,
    produce_doc_body/2
]).

Slide 53

Slide 53 text

service_available/2
grab local riak client
check app config var

Slide 54

Slide 54 text

service_available(RD, Ctx=#ctx{riak=RiakProps,req_id=ReqId}) ->
    IdRD = wrq:set_resp_header("X-Request-Id", ReqId, RD),
    BrandedRD = wrq:set_resp_header(
        "X-GitHub-Pages-Version",
        release_handler_util:app_version(pages),
        IdRD),
    case application:get_env(pages, disabled) of
        {ok, true} ->
            {false, BrandedRD, Ctx};
        _ ->
            case riak_kv_wm_utils:get_riak_client(
                    RiakProps, riak_kv_wm_utils:get_client_id(RD)) of
                {ok, C} ->
                    {true, BrandedRD, Ctx#ctx{client=C}};
                _Error ->
                    {false, BrandedRD, Ctx}
            end
    end.

Slide 55

Slide 55 text

malformed_request/2
parse Host header, URI

Slide 56

Slide 56 text

malformed_request(RD, Ctx) ->
    try
        Host = wrq:get_req_header("Host", RD),
        HostWithoutPort = re:replace(
            Host, "\:.*", "", [{return,list}]),
        Tokens = [ riak_kv_wm_utils:maybe_decode_uri(RD, X)
                   || X <- wrq:path_tokens(RD)],
        ParsedCtx = Ctx#ctx{tokens=Tokens,host=HostWithoutPort},
        {false, RD, ParsedCtx}
    catch
        Exception:Reason ->
            log_error({exception, Exception, Reason}, RD, Ctx),
            {true, RD, Ctx}
    end.
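For readers less familiar with Erlang, the parsing step can be approximated in Python as below. It strips everything from the first colon in the Host header and splits the path into decoded tokens. The function name `parse_request` is hypothetical, and unconditional URL-decoding is an assumption (the Erlang helper `maybe_decode_uri` decides based on the request).

```python
import re
from urllib.parse import unquote


def parse_request(host_header: str, path: str):
    """Return (host without port, decoded path tokens)."""
    # Same idea as the re:replace call: drop ":<port>" from the Host header.
    host = re.sub(r":.*", "", host_header)
    # Split the path on "/" and URL-decode each non-empty segment.
    tokens = [unquote(t) for t in path.split("/") if t]
    return host, tokens


print(parse_request("jnewland.github.com:8098", "/pages/flow%20chart.png"))
```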

Slide 57

Slide 57 text

content_types_provided/2
guess mime type from path
assume text/html for / URIs

Slide 58

Slide 58 text

content_types_provided(RD, Ctx) ->
    Filename = lists:last(Ctx#ctx.tokens),
    Extension = filename:extension(Filename),
    case mochiweb_mime:from_extension(Extension) of
        undefined ->
            {[{"text/html", produce_doc_body}], RD, Ctx};
        Mime ->
            {[{Mime, produce_doc_body}], RD, Ctx}
    end.
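The same "guess from extension, else text/html" rule can be sketched with Python's standard `mimetypes` module standing in for `mochiweb_mime`. The function name `content_type_for` and the explicit empty-path guard are assumptions added for this illustration.

```python
import mimetypes


def content_type_for(tokens: list) -> str:
    """Guess a MIME type from the last path token, defaulting to text/html."""
    if not tokens:
        # A bare "/" request has no filename; assume an HTML page.
        return "text/html"
    mime, _encoding = mimetypes.guess_type(tokens[-1])
    # Unknown or missing extension: fall back to text/html, as on the slide.
    return mime or "text/html"


print(content_type_for(["raptor.gif"]))
print(content_type_for([]))
```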

Slide 59

Slide 59 text

resource_exists/2
hit hosts bucket (r=1)
stash redirect or sha
404 if no hosts data

Slide 60

Slide 60 text

resource_exists(RD, Ctx) ->
    RedirectOrSha = redirect_or_sha(Ctx),
    case RedirectOrSha of
        {redirect, Redirect} ->
            {true, RD, Ctx#ctx{redirect={redirect, Redirect}}};
        {sha, Sha} ->
            page_data_exists(RD, Ctx#ctx{sha={sha, Sha}});
        _ ->
            {false, RD, Ctx}
    end.

Slide 61

Slide 61 text

previously_existed/2
moved_permanently/2

Slide 62

Slide 62 text

previously_existed(RD, Ctx) ->
    case Ctx#ctx.redirect of
        {redirect, _} ->
            {true, RD, Ctx};
        _ ->
            {false, RD, Ctx}
    end.

moved_permanently(RD, Ctx) ->
    case Ctx#ctx.redirect of
        {redirect, RedirectHost} ->
            MovedURI = list_join(lists:append(
                [RedirectHost], Ctx#ctx.tokens), "/"),
            {{true, MovedURI}, RD, Ctx};
        _ ->
            {false, RD, Ctx}
    end.

Slide 63

Slide 63 text

page_data_exists/2
hit pages bucket (r=1)
fallback for index.html
fallback for 404.html

Slide 64

Slide 64 text

curl foo.github.com/
GET /riak/hosts/foo.github.com
GET /riak/pages/f0f0f0f0/
GET /riak/pages/f0f0f0f0/index.html
GET /riak/pages/f0f0f0f0/index.htm
GET /riak/pages/f0f0f0f0/index.xhtml
GET /riak/pages/f0f0f0f0/index.xml
GET /riak/pages/f0f0f0f0/404.html
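The fallback chain in this trace can be sketched as an ordered list of candidate keys that are tried until one exists. The names `candidate_keys` and `first_existing` are hypothetical, the pages bucket is modeled as a dict, and the HTTP status returned when 404.html is served is not modeled.

```python
INDEX_NAMES = ["index.html", "index.htm", "index.xhtml", "index.xml"]


def candidate_keys(sha: str, path: str) -> list:
    """List pages-bucket keys to try, in the order shown in the trace above."""
    base = f"{sha}/{path.lstrip('/')}"
    prefix = base if base.endswith("/") else base + "/"
    keys = [base]
    # Directory-style fallbacks: try each index file name in turn.
    keys += [prefix + name for name in INDEX_NAMES]
    # Last resort: the repo's custom 404 page.
    keys.append(f"{sha}/404.html")
    return keys


def first_existing(keys: list, pages: dict):
    """Return the first candidate key present in the pages store, else None."""
    for key in keys:
        if key in pages:
            return key
    return None


pages = {"f0f0f0f0/index.htm": b"<h1>hi</h1>", "f0f0f0f0/404.html": b"nope"}
print(first_existing(candidate_keys("f0f0f0f0", "/"), pages))
```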

Slide 65

Slide 65 text

< 300 lines of erlang
simple

Slide 66

Slide 66 text

{webmachine, [
    {dispatch_list, [
        %% riak_kv stuff
        {["pages",'*'],pages_wm_resource,[]},
        {["pages"],pages_wm_resource,[]}
    ]}
]}
nginx proxies / to /pages

Slide 67

Slide 67 text

remember, I do ops
one system service
data store and api
predictable performance
busy ops best friend

Slide 68

Slide 68 text

what’s next

Slide 69

Slide 69 text

metrics with folsom
graphite / gaug.es
logging with lager
HTTP caching
???

Slide 70

Slide 70 text

private beta soon
turn on / off with DNS
repo access
@jnewland

Slide 71

Slide 71 text

erlang ruby ops c
work with me

Slide 72

Slide 72 text

thanks