Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building distributed applications with riak_core

Building distributed applications with riak_core

Presents https://github.com/mrallen1/udon - an example application which demonstrates how to write a distributed application using riak_core, including handoff implementation. From Erlang Factory SF 2015

Jade Allen

March 26, 2015
Tweet

More Decks by Jade Allen

Other Decks in Programming

Transcript

  1. About  me • Sys  admin  and  software  developer  for  almost

      20  years  now. • Work  at  Basho  Technologies  (purveyors  of  fine   databasses) • Coding  Erlang for  about  3  years  now. • Very  interested  in  distributed  systems.
  2. What  is  riak_core? • A  modular  distributed  systems  library •

    Provides  consistent  hashing  functions • Focus  on  your  application  not  on  the  plumbing   of  a  distributed  application.
  3. Concepts:  Consistent  Hash • Limits  reshuffling  of  keys  when  hash

     table   data  structure  is  rebalanced. • riak_core uses  consistent  hashing  to   determine  where  to  store  data  on  a   primary  replica  as  well  as  fallbacks  if  the   primary  is  offline.
  4. Concepts:  Virtual  Node • AKA  "vnode" • An  Erlang process

     which  represents  a  slice  of   the  ring  hash  space. • Number  of  data  replicas  is  tunable  (by  default,   there  are  3)
  5. What  you  get  "out  of  the  box" • "Physical"  cluster

     state  management,   • Ring  state  management,   • Vnode placement,   • Vnode replication,   • Cluster  and  ring  state  gossip  protocols, • Consistent  hashing  utilities   • Handoff  activities • Covering  set  callbacks  (optional)
  6. Contrast  Distributed  Erlang • Sits  at  a  lower  level.  Necessary

     but  not   sufficient  to  build  robust  distributed  apps. • Have  to  manage  a  lot  of  tricky  things  with  a  lot   of  non-­‐intuitive  failure  modes. – Node  failure – Vnode placement  updates
  7. Is  it  a  good  fit? Riak_core makes  certain  assumptions  about

     the   application  you  want  to  build  on  top  it. – It  expects  you  to  have  a  "key"  that  links  to  a  blob   of  data. – The  key  (or  rather  its  chash)  determines  its   primary  vnode and  adjacent  replicas. – The  data  itself  is  opaque  and  has  application   context. – Presumably  you  want  behavior  semantics  that  are   different  than  riak_kv with  your  data.
  8. riak_core gaps • riak_core has  a  lot  of  capabilities  but

     also   some  gaps  you  should  know  about. – Cluster  membership  is  controlled  by  a  human – Yes,  even  when  a  node  failure  has  been  (correctly)   detected  by  the  cluster  manager. – vnode distribution  around  the  ring  is  sometimes   suboptimal  (this  is  not  always  easy  to  fix) • You  should  know  that  Basho  uses/tests/ships   its  own  OTP  (but  17.X  works  fine;  YMMV)
  9. Example  App:  Udon • Distributed  static  file  web  server •

    Why? • Key  =  URL-­‐like  path • Data  =  static  file  at  that  URL
  10. Getting  started $ git clone https://github.com/basho/rebar_riak_core $ cd rebar_riak_core; make

    install; cd .. $ mkdir foo; cd foo $ rebar create template=riak_core appid=foo
  11. udon/ !"" LICENSE !"" Makefile !"" README.md !"" rebar !""

    rebar.config !"" rel # !"" files # # !"" app.config # # !"" erl # # !"" nodetool # # !"" udon # # !"" udon-admin # # $"" vm.args # !"" gen_dev # !"" reltool.config # !"" vars # # $"" dev_vars.config.src # $"" vars.config $"" src !"" udon.app.src !"" udon.erl !"" udon.hrl !"" udon_app.erl !"" udon_console.erl !"" udon_node_event_handler.erl !"" udon_ring_event_handler.erl !"" udon_sup.erl $"" udon_vnode.erl
  12. Write  an  API • ping/0  – you  get  this  for

     free! • store/2  (Path,  Data) • fetch/1  (Path) • redirect/2  (OldPath,  NewPath)  – not   implemented • serve_file_version/2  (Path,  Version)  – also  not   implemented
  13. How  does  this  even  work? • Take  the  path  and

     hash  it  using  MD5  (object   name),  make  some  metadata • Object  name  is  fed  into  the  CHASH  function   for  vnode placement • In  the  vnode,  update  some  metadata  about   the  object  (file  version) • Store  the  metadata  as  (object.meta)  and  the   data  as  (object.version)  on  disk
  14. !"" 0 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta !"" 1004782375664995756265033322492444576013453623296

    !"" 1027618338748291114361965898003636498195577569280 ... !"" 1415829711164312202009819681693899175291684651008 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta !"" 1438665674247607560106752257205091097473808596992 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta ... !"" 91343852333181432387730302044767688728495783936 !"" 913438523331814323877303020447676887284957839360 # !"" 5a105e8b9d40e1329780d62ea2265d8a.1 # $"" 5a105e8b9d40e1329780d62ea2265d8a.meta !"" 936274486415109681974235595958868809467081785344 !"" 959110449498405040071168171470060731649205731328 $"" 981946412581700398168100746981252653831329677312 Udon on-­‐disk  layout
  15. Fetch  implementation fetch(Path) -> PHash = path_to_hash(Path), Idx = riak_core_util:chash_key(?KEY(PHash)),

    %% TODO: Get results from more than one node [{Node, _Type}] = riak_core_apl:get_primary_apl( Idx, 1, udon), riak_core_vnode_master:sync_spawn_command( Node, {fetch, PHash}, udon_vnode_master).
  16. Fetch  Vnode Implementation handle_command({fetch, PHash}, _Sender, State) -> MetaPath =

    make_metadata_path(State, PHash), Res = case filelib:is_regular(MetaPath) of true -> MD = get_metadata(State, PHash), get_data(State, MD); % returns {ok, Data} false -> not_found end, {reply, {Res, filename:join([ make_base_path(State), make_filename(PHash)])}, State};
  17. Store  implementation store(Path, Data) -> N = 3, % number

    of nodes to contact W = 3, % number of writes Timeout = 5000, % millisecs PHash = path_to_hash(Path), PRec = #file{ request_path = Path, path_md5 = PHash, csum = erlang:adler32(Data) }, {ok, ReqId} = udon_op_fsm:op( N, W, {store, PRec, Data}, ?KEY(PHash)), wait_for_reqid(ReqId, Timeout).
  18. Store  Vnode Implementation handle_command({RequestId, {store, R, Data}}, _Sender, State) ->

    MetaPath = make_metadata_path(State, R), NewVersion = case filelib:is_regular(MetaPath) of true -> OldMD = get_metadata(State, R), OldMD#file.version + 1; false -> 1 end, {MetaResult, DataResult, Loc} = store(State, R#file{version=NewVersion}, Data), {reply, {RequestId, {MetaResult, DataResult, Loc}}, State};
  19. Handoffs • There  are  4  distinct  types  of  handoffs  in

      riak_core:   – ownership,   – hinted,   – repair  and, – resize
  20. Implementing  handoff • At  a  high  level  what  are  the

     goals  of  a  handoff   implementation? – Per  vnode,  get  the  objects  associated  with  that   vnode. – Per  object,  find/compute  the  data  needed  to   transfer  object  state  to  another  node – Serialize  the  object  state  and  send  it – At  the  receiving  end,  deserialize and  store  state
  21. Implementing  udon handoff Per  vnode,  get  objects: get_all_objects(State) -> [

    strip_meta(F) || F <- filelib:wildcard("*.meta", make_base_path(State)) ]. Reminder:  Base  path  is  “udon_data/$partition_id”
  22. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  23. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  24. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  25. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  26. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  27. Implementing  udon handoff handle_handoff_command( ?FOLD_REQ{foldfun=VisitFun, acc0=Acc0}, _Sender, State) -> AllObjects

    = get_all_objects(State), Base = make_base_path(State), Do = fun(Object, AccIn) -> %% Already seen this... end, Final = lists:foldl(Do, Acc0, AllObjects), {reply, Final, State};
  28. The  mysterious  VisitFun • riak_core_handoff_sender:visit_ item/3 • Does  the  work

     of  calling  your  serialization   callback  (encode_handoff_item/2)  with  the   {Bucket,  Key},  Data  as  parameters • Sends  the  serialized  data  to  the  receiver
  29. Receiving  handoff  data handle_handoff_data(Data, State) -> {Meta, Blob} = binary_to_term(Data),

    R = case Meta#file.csum =:= erlang:adler32(Blob) of true -> store(State, Meta, Blob), ok; false -> {error, file_checksum_differs} end, {reply, R, State}.
  30. Recent  riak_core resources • Riak_core in  small  bytes:   http://marianoguerra.github.io/presentations/riak-­‐core-­‐

    small-­‐bytes-­‐berlin-­‐efl-­‐2014.html • YouTube  video  of  above  talk:  https://youtu.be/rJXM33EieGQ • FlavioDB project:  https://github.com/marianoguerra/flaviodb • Udon project:  https://github.com/mrallen1/udon • Older  resources  are  still  useful  from  a  conceptual   understanding  but  often  are  “bitrotted”  from  a  running  code   perspective.