GitHub Pages on Riak and Webmachine

GitHub Pages, a feature that lets users publish content to the web by simply pushing to one of their GitHub-hosted repositories, has had lackluster performance and uptime in recent years. In this talk, Jesse will discuss the core requirements of the GitHub Pages application, why Erlang, Riak, and Webmachine were chosen to build it, and how they fulfill those requirements, now and for years to come, with minimal development and operational maintenance.

Jesse Newland

March 30, 2012

Transcript

  1. riak &
    webmachine
    on
    github:pages

  2. github.com/jnewland
    @jnewland

  3. riak &
    webmachine
    on
    github:pages

  4. what
    is
    pages?

  5. simple
    static
    file
    hosting*

  6. github.com/user/repo/gh-pages

    http://user.github.com/repo

  7. github.com/user/user.github.com

    http://user.github.com

  8. $ cat CNAME
    example.com
    custom domain support

  9. what
    is wrong with
    pages?

  10. 1 node
    ext3
    gigs and gigs of HTML

  11. fscking downtime

  12. LOTS
    TO
    FIX
    THROW
    IT ALL
    OUT

  13. what
    can power
    pages 2.0?

  14. 1. grab content from git
    2. run through jekyll
    3. write somewhere
    4. serve over HTTP

  15. what
    currently powers
    pages?

  16. 1. ruby
    2. ruby
    3. ext3
    4. nginx

  17. $ wc -l pages_map.conf
    57615 pages_map.conf

  18. 1. ruby
    2. ruby
    3. riak_kv
    4. riak_kv*

  19. *
    read-only
    transactional builds
    CNAMEs & redirects
    index.html fallback
    custom 404.html per-repo
    almost

  20. 1. ruby
    2. ruby
    3. riak_kv
    4. webmachine resource

  21. squid
    +
    nginx/apache
    +
    filesystem

  22. should work fine
    what happens when you need N > 1?
    just shard ‘em
    how do you populate new partitions?
    just rsync stuff around

  23. building a
    distributed
    system
    ASS
    FIRST

  24. remember, I do ops
    low maintenance, resilient
    systems make lazy
    sysadmins love you

  25. schema
    design

  26. 2 buckets
    hosts
    pages

  27. hosts
    key: HTTP Host Header
    value: redirect or repo/sha map
    use: data key prefix lookup
    index: user_id

  28. hosts
    jnewland.github.com
    {
      "repos": {
        "jnewland.github.com": "deadbeef",
        "pages": "beadfeed"
      }
    }

  29. pages
    key: sha/URI
    value: HTML / other data
    use: data storage
    index: repo_id
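
The two-bucket layout above can be modeled in a few lines of Ruby. This is only a sketch of the shapes described on the schema slides: plain Hashes stand in for Riak buckets, the helper names `hosts_entry` and `pages_entry` are made up for illustration, and the index values (`user_id: 1`, `repo_id: 42`) are hypothetical.

```ruby
require "json"

# hosts bucket: keyed by HTTP Host header; the value is a JSON repo -> sha map.
# The user_id secondary index is carried alongside the value.
def hosts_entry(repos, user_id:)
  { value: JSON.generate("repos" => repos), indexes: { "user_id" => user_id } }
end

# pages bucket: keyed by "sha/URI"; the value is the file body.
# Returns [key, entry] so the caller can store it in a Hash.
def pages_entry(sha, uri, body, repo_id:)
  key = "#{sha}/#{uri.sub(%r{\A/+}, "")}"
  [key, { value: body, indexes: { "repo_id" => repo_id } }]
end

hosts = {
  "jnewland.github.com" => hosts_entry(
    { "jnewland.github.com" => "deadbeef", "pages" => "beadfeed" },
    user_id: 1 # hypothetical id
  )
}
key, entry = pages_entry("deadbeef", "/index.html", "<html>...</html>", repo_id: 42)
pages = { key => entry }
```

Keying page data by build SHA means a new build never overwrites the files of the build currently being served; only the small hosts object changes when a build goes live.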

  30. content="text/html; charset=utf-8" />
    Jesse Newland

  31. how does
    data get in?

  32. riak-ruby-client
    read files from disk
    write to riak
    update sites object
    gc old builds
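
The ingest steps above might look roughly like this in Ruby. It is a sketch under stated assumptions, not the actual builder: plain Hashes stand in for the Riak buckets (the talk uses riak-ruby-client, which is not called here), `publish` is a hypothetical name, and the GC rule (delete every key under the previous SHA for that repo) is a guess at what "gc old builds" means.

```ruby
require "find"
require "json"

# Publish one built site: write each file under "sha/path", point the host's
# repo entry at the new sha, then remove keys left from the repo's old build.
def publish(hosts, pages, host:, repo:, sha:, build_dir:)
  # 1. read files from disk, 2. write them to the pages store
  Find.find(build_dir) do |file|
    next unless File.file?(file)
    rel = file.delete_prefix(build_dir).sub(%r{\A/+}, "")
    pages["#{sha}/#{rel}"] = File.binread(file)
  end

  # 3. update the hosts object so requests start resolving to the new build
  entry = JSON.parse(hosts.fetch(host, '{"repos":{}}'))
  old_sha = entry["repos"][repo]
  entry["repos"][repo] = sha
  hosts[host] = JSON.generate(entry)

  # 4. gc: drop the previous build's keys (assumed policy)
  if old_sha && old_sha != sha
    pages.delete_if { |key, _| key.start_with?("#{old_sha}/") }
  end
  sha
end
```

Because the hosts object is updated only after every file has been written, readers see either the old build or the new one, which is one way to get the "transactional builds" property mentioned earlier.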

  33. how does
    data get out?

  34. curl jnewland.github.com/raptor.gif
    GET /riak/hosts/jnewland.github.com
    GET /riak/pages/deadbeef/raptor.gif

  35. curl jnewland.github.com/pages/flow.png
    GET /riak/hosts/jnewland.github.com
    GET /riak/pages/beadfeed/flow.png
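
The two lookups above (host first, then page) can be sketched as a small Ruby function. The rule used here for choosing between the user site and a project site (treat the first path segment as a repo name if the hosts object lists it) is an assumption inferred from the two examples, and `resolve` is a hypothetical name; a Hash stands in for the hosts bucket.

```ruby
require "json"

# Map a Host header and request path to a pages-bucket key ("sha/path").
# Returns nil when the host is unknown or no repo matches.
def resolve(hosts, host, path)
  raw = hosts[host]
  return nil unless raw

  repos = JSON.parse(raw).fetch("repos", {})
  first, *rest = path.split("/").reject(&:empty?)

  if first && first != host && repos.key?(first)
    # Project site: the first segment names the repo, the rest is the file path.
    "#{repos[first]}/#{rest.join("/")}"
  elsif repos.key?(host)
    # User site: the whole path belongs to the repo named after the host.
    "#{repos[host]}/#{[first, *rest].compact.join("/")}"
  end
end

hosts = {
  "jnewland.github.com" => JSON.generate(
    "repos" => { "jnewland.github.com" => "deadbeef", "pages" => "beadfeed" }
  )
}
resolve(hosts, "jnewland.github.com", "/raptor.gif")     # "deadbeef/raptor.gif"
resolve(hosts, "jnewland.github.com", "/pages/flow.png") # "beadfeed/flow.png"
```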

  36. what
    about
    CNAMEs?

  37. hosts
    example.github.com
    {
      "redirect": "example.com"
    }

  38. hosts
    example.com
    {
      "repos": {
        "example.com": "deadbeef"
      }
    }

  39. curl example.github.com/raptor.gif
    GET /riak/hosts/example.github.com
    301 example.com/raptor.gif
    GET /riak/hosts/example.com
    GET /riak/pages/deadbeef/raptor.gif
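
The CNAME flow above boils down to one extra branch on the hosts lookup. A minimal Ruby sketch, assuming a Hash in place of the hosts bucket and a hypothetical `route` function that returns a status and either a redirect target or a pages key (project-site paths are left out to keep it short):

```ruby
require "json"

# Returns [301, location] for a redirect entry, [200, pages_key] for a
# served host, and [404, nil] when nothing is known about the host.
def route(hosts, host, path)
  raw = hosts[host]
  return [404, nil] unless raw

  entry = JSON.parse(raw)
  target = entry["redirect"]
  return [301, "#{target}#{path}"] if target

  sha = entry.fetch("repos", {})[host]
  sha ? [200, "#{sha}#{path}"] : [404, nil]
end

hosts = {
  "example.github.com" => JSON.generate("redirect" => "example.com"),
  "example.com"        => JSON.generate("repos" => { "example.com" => "deadbeef" })
}
route(hosts, "example.github.com", "/raptor.gif") # [301, "example.com/raptor.gif"]
route(hosts, "example.com", "/raptor.gif")        # [200, "deadbeef/raptor.gif"]
```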

  40. pages_wm_resource.erl

  41. webmachine
    is
    SO DAMN COOL

  42. %% webmachine resource exports
    -export([
        init/1,
        service_available/2,
        malformed_request/2,
        content_types_provided/2,
        resource_exists/2,
        previously_existed/2,
        moved_permanently/2,
        last_modified/2,
        generate_etag/2,
        produce_doc_body/2
    ]).

  43. grab local riak client
    check app config var
    service_available/2

  44. service_available(RD, Ctx=#ctx{riak=RiakProps,req_id=ReqId}) ->
        IdRD = wrq:set_resp_header("X-Request-Id", ReqId, RD),
        BrandedRD = wrq:set_resp_header(
            "X-GitHub-Pages-Version",
            release_handler_util:app_version(pages),
            IdRD),
        case application:get_env(pages, disabled) of
            {ok, true} ->
                {false, BrandedRD, Ctx};
            _ ->
                case riak_kv_wm_utils:get_riak_client(
                        RiakProps,
                        riak_kv_wm_utils:get_client_id(RD)) of
                    {ok, C} ->
                        {true, BrandedRD, Ctx#ctx{client=C}};
                    _Error ->
                        {false, BrandedRD, Ctx}
                end
        end.

  45. parse Host header, URI
    malformed_request/2

  46. malformed_request(RD, Ctx) ->
        try
            Host = wrq:get_req_header("Host", RD),
            HostWithoutPort = re:replace(
                Host,
                "\:.*",
                "",
                [{return,list}]),
            Tokens = [
                riak_kv_wm_utils:maybe_decode_uri(RD, X) ||
                X <- wrq:path_tokens(RD)],
            ParsedCtx = Ctx#ctx{tokens=Tokens,host=HostWithoutPort},
            {false, RD, ParsedCtx}
        catch
            Exception:Reason ->
                log_error({exception, Exception, Reason}, RD, Ctx),
                {true, RD, Ctx}
        end.

  47. guess mime type from path
    assume text/html for / URIs
    content_types_provided/2

  48. content_types_provided(RD, Ctx) ->
        %% "/" has no path tokens; treat it as an extensionless file
        Filename = case Ctx#ctx.tokens of
            [] -> "";
            Tokens -> lists:last(Tokens)
        end,
        Extension = filename:extension(Filename),
        case mochiweb_mime:from_extension(Extension) of
            undefined ->
                {[{"text/html", produce_doc_body}], RD, Ctx};
            Mime ->
                {[{Mime, produce_doc_body}], RD, Ctx}
        end.
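
For readers less familiar with Erlang, the same rule (guess from the extension, fall back to text/html) is modeled below in Ruby. The tiny extension table is an assumption standing in for mochiweb_mime, and `content_type_for` is a hypothetical name.

```ruby
# Pick a content type from the last path token; anything without a known
# extension, including the root path with no tokens, is served as text/html.
def content_type_for(tokens)
  # Stand-in for mochiweb_mime (assumption: only a handful of types listed).
  mime_types = {
    ".html" => "text/html",
    ".css"  => "text/css",
    ".js"   => "application/javascript",
    ".gif"  => "image/gif",
    ".png"  => "image/png"
  }
  filename = tokens.last.to_s # "" when the path is "/"
  mime_types.fetch(File.extname(filename), "text/html")
end

content_type_for(["raptor.gif"])    # "image/gif"
content_type_for([])                # "text/html"
content_type_for(["docs", "about"]) # "text/html"
```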

  49. hit hosts bucket (r=1)
    stash redirect or sha
    404 if no hosts data
    resource_exists/2

  50. resource_exists(RD, Ctx) ->
        RedirectOrSha = redirect_or_sha(Ctx),
        case RedirectOrSha of
            {redirect, Redirect} ->
                {true, RD, Ctx#ctx{redirect={redirect, Redirect}}};
            {sha, Sha} ->
                page_data_exists(RD, Ctx#ctx{sha={sha, Sha}});
            _ ->
                {false, RD, Ctx}
        end.

  51. previously_existed/2
    moved_permanently/2

  52. previously_existed(RD, Ctx) ->
        case Ctx#ctx.redirect of
            {redirect, _} ->
                {true, RD, Ctx};
            _ ->
                {false, RD, Ctx}
        end.
    moved_permanently(RD, Ctx) ->
        case Ctx#ctx.redirect of
            {redirect, RedirectHost} ->
                MovedURI = list_join(lists:append(
                    [RedirectHost],
                    Ctx#ctx.tokens),
                    "/"),
                %% webmachine expects {true, URI} as the callback result
                {{true, MovedURI}, RD, Ctx};
            _ ->
                {false, RD, Ctx}
        end.

  53. hit pages bucket (r=1)
    fallback for index.html
    fallback for 404.html
    page_data_exists/2

  54. curl foo.github.com/
    GET /riak/hosts/foo.github.com
    GET /riak/pages/f0f0f0f0/
    GET /riak/pages/f0f0f0f0/index.html
    GET /riak/pages/f0f0f0f0/index.htm
    GET /riak/pages/f0f0f0f0/index.xhtml
    GET /riak/pages/f0f0f0f0/index.xml
    GET /riak/pages/f0f0f0f0/404.html
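
The fallback order in the request trace above can be written as a candidate list that is probed in order. A Ruby sketch, assuming a Hash in place of the pages bucket and a hypothetical `find_page` function; whether the real resource also tries index files for non-directory paths is not stated in the slides, so that behavior here is an assumption.

```ruby
# Probe the exact key, then each index file, then the repo's custom 404 page.
# Returns [status, key]; key is nil when not even a 404.html exists.
def find_page(pages, sha, path)
  base = "#{sha}/#{path.sub(%r{\A/+}, "")}"
  dir = base.end_with?("/") ? base : "#{base}/"
  index_names = %w[index.html index.htm index.xhtml index.xml]
  candidates = [base] + index_names.map { |name| dir + name }

  hit = candidates.find { |key| pages.key?(key) }
  return [200, hit] if hit

  not_found = "#{sha}/404.html"
  pages.key?(not_found) ? [404, not_found] : [404, nil]
end

pages = { "f0f0f0f0/index.htm" => "<h1>home</h1>", "f0f0f0f0/404.html" => "<h1>nope</h1>" }
find_page(pages, "f0f0f0f0", "/")            # [200, "f0f0f0f0/index.htm"]
find_page(pages, "f0f0f0f0", "/missing.png") # [404, "f0f0f0f0/404.html"]
```

In the worst case this costs one read per candidate, which is why the slides pair it with r=1 reads against the pages bucket.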

  55. < 300 lines of erlang
    simple

  56. {webmachine, [
        {dispatch_list, [
            %% riak_kv stuff
            {["pages",'*'],pages_wm_resource,[]},
            {["pages"],pages_wm_resource,[]}
        ]}
    ]}
    nginx proxies / to /pages

  57. remember, I do ops
    one system service
    data store and api
    predictable performance
    busy ops best friend

  58. what’s
    next

  59. metrics with folsom
    graphite / gaug.es
    logging with lager
    HTTP caching
    ???

  60. private beta soon
    turn on / off with DNS
    repo access
    @jnewland

  61. erlang
    ruby
    ops
    c
    work with me
