
A perfect Storm for legacy migration
EuroClojure 2013 - Berlin

ryan lemmer
October 21, 2013

Transcript

  1. The situation: it's not broken, just Immutable. It's valuable vintage,
     still generating revenue. We don't need to "replace" it; we need to
     "make the Legacy Problem go away".
  2. Vintage migration: we chose to migrate the "financial" parts first,
     because they posed the highest risk to the business.
  3. Feeding off vintage: read legacy clients and invoices, transform each
     old client and write the new client, transform each old invoice and
     write the new invoice.
  4. The migration bridge between vintage and statements: a big run every
     night plus an incremental run every 10 minutes. The bridge is
     one-directional and statements is read-only. Imperative, sequential code.
  5. The migration bridge in detail: legacy clients, invoices and contacts
     flow through transform-client, transform-invoice, index-field and
     index-entity steps into write-client and write-invoice, feeding search
     and statements.
  6. The batched bridge feeding search and statements: about 10 million rows,
     several hours to migrate sequentially.
  7. First-pass solution: batched data migration. It was the easiest thing to
     do, but it is not performant, not fault tolerant, and fragile because of
     data dependencies. What next? Go parallel and distributed, have fault
     tolerance, go real-time. The first pass served as scaffolding for the
     next solution.
  8. Storm data model: a TUPLE is a named list of values, e.g. ["seekoei" 7]
     and ["panda" 10] (word, frequency), or [147 {:name "John" ...}] and
     [253 {:name "Mary" ...}] (ID, client).
  9. Storm spouts: a SPOUT emits TUPLES continuously over time; a SPOUT is an
     UNBOUNDED STREAM of TUPLES.
  10. Storm client spout: the CLIENT SPOUT periodically emits a CLIENT TUPLE
      of [entity values], e.g. ["client" {:id 147, ...}].
  11. A Clojure spout (it creates a pulse):

      (defspout client-spout ["entity" "values"]
        [conf context collector]
        (let [next-client (next-legacy-client)
              tuple       ["client" next-client]]
          (spout
            (nextTuple []
              (Thread/sleep 100)
              (emit-spout! collector tuple))
            (ack [id]))))
  12. (The same client-spout code, shown again.)
  13. The same spout, annotated: ["entity" "values"] is the TUPLE SCHEMA, and
      the emitted CLIENT TUPLE is ["client" {:id 147, ...}].
  14. The client SPOUT packages input and emits TUPLES continuously over
      time: ["client" {:id 147, ...}], ["client" {:id 201, ...}],
      ["client" {:id 407, ...}], ["client" {:id 101, ...}], ...
  15. Storm bolts:

      (defbolt transform-client-bolt ["client"]
        {:prepare true}
        [conf context collector]
        (bolt
          (execute [tuple]
            (let [h (.getValue tuple 1)]
              (emit-bolt! collector [(transform-tuple h)])
              (ack! collector tuple)))))
  16. The same bolt, annotated: the INCOMING TUPLE is ["client" {:id 147, ...}]
      and the OUTGOING TUPLE is [{:id 147, ...}].
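     Aside: transform-tuple is also left out of the deck. A hypothetical
     sketch of such a transform (all field names assumed): a pure function
     from the legacy client map to the new shape.

      ;; Hypothetical, assumed field names: reshape a legacy client map into
      ;; the schema expected by the new system.
      (defn transform-tuple [legacy-client]
        {:id        (:id legacy-client)
         :full-name (str (:first_name legacy-client) " "
                         (:last_name legacy-client))
         :email     (:email_address legacy-client)})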
  17. A Storm topology wires spouts to bolts (spout "1" feeds bolt "2"):

      (topology
        {"1" (spout-spec (client-spout)
                         :p 1)}
        {"2" (bolt-spec {"1" :shuffle}
                        transform-client-bolt
                        :p 1)})
  18. (The same topology, shown again as a two-node graph: 1 feeds 2.)
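     Aside: a minimal sketch of running such a topology in-process with the
     Storm Clojure DSL of the time; the namespace layout, topology name and
     sleep are assumptions, not from the deck.

      (ns migration.topology
        (:use [backtype.storm clojure config])
        (:import [backtype.storm LocalCluster]))

      ;; assumes client-spout and transform-client-bolt from the earlier
      ;; slides are defined in this namespace
      (defn mk-topology []
        (topology
          {"1" (spout-spec (client-spout) :p 1)}
          {"2" (bolt-spec {"1" :shuffle} transform-client-bolt :p 1)}))

      (defn run-local! []
        ;; run the topology inside the JVM for a short while, then shut down
        (let [cluster (LocalCluster.)]
          (.submitTopology cluster "migration-bridge"
                           {TOPOLOGY-DEBUG true}
                           (mk-topology))
          (Thread/sleep 10000)
          (.shutdown cluster)))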
  19. Bolt tasks: the :p 1 parallelism hint on bolt "2" gives
      transform-client-bolt a single task.
  20. Changing the bolt's parallelism hint to :p 3 runs three
      transform-client-bolt tasks:

      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 3)}
  21. Which task? With three tasks for bolt "2", which one receives a given
      tuple from spout "1"?
  22. Grouping "shuffle": the {"1" :shuffle} grouping distributes tuples from
      spout "1" randomly across the bolt's tasks.
  23. Grouping by "field": tuples with TUPLE SCHEMA ["client-id" "invoice-vals"],
      e.g. ["active" {:id 147, ...}] and [12 {:inv-id 147, ...}]; the task is
      to count invoices per client (in memory).
  24. Group by field "client-id": the stream of ["client-id" "invoice-vals"]
      tuples, e.g. ["active" {:id 147, ...}], [12 {:inv-id 147, ...}],
      [401 {:inv-id 32, ...}], [232 {:inv-id 45, ...}], is partitioned by the
      "client-id" value.
  25. Grouping by "field" in the topology: replace :shuffle with a fields
      grouping on "client-id":

      (topology
        {"1" (spout-spec (client-spout)
                         :p 1)}
        {"2" (bolt-spec {"1" ["client-id"]}
                        transform-client-bolt
                        :p 3)})
  26. With the fields grouping, tuples carrying the same "client-id" value
      always go to the same bolt task.
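     Aside: a sketch of the "count invoices per client (in memory)" bolt from
     slide 23 (the bolt name and output schema are assumed). Because of the
     fields grouping on "client-id", each task only ever sees its own subset
     of client ids, so a local in-memory map is enough.

      ;; Sketch, assumed name and output schema: count invoices per client.
      ;; The ["client-id"] fields grouping guarantees all invoices for a given
      ;; client id arrive at the same task, so a plain atom suffices.
      (defbolt count-invoices-bolt ["client-id" "invoice-count"]
        {:prepare true}
        [conf context collector]
        (let [counts (atom {})]
          (bolt
            (execute [tuple]
              (let [client-id (.getValue tuple 0)
                    n         (get (swap! counts update-in [client-id] (fnil inc 0))
                                   client-id)]
                (emit-bolt! collector [client-id n] :anchor tuple)
                (ack! collector tuple))))))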
  27. The bridge as a topology: spouts for clients, invoices and contacts feed
      transform-client, transform-invoice and index-fields bolts, which feed
      write-client and write-invoice bolts.
  28. Storm reliability: build a tree of tuples so that Storm knows which
      tuples are related; spouts and bolts ack or fail tuples (anchoring is
      sketched below).
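     Aside: to build that tuple tree, a bolt anchors its outgoing tuples to
     the incoming one before acking it, and a spout emits with a message id
     so failed trees can be replayed. A minimal sketch, reusing the transform
     bolt from slide 15 (:anchor and :id are standard storm-clojure options;
     how the message id is chosen is an assumption):

      ;; Anchor the outgoing tuple to the incoming one so Storm can track the
      ;; tuple tree, then ack the incoming tuple.
      (defbolt transform-client-bolt ["client"]
        {:prepare true}
        [conf context collector]
        (bolt
          (execute [tuple]
            (let [h (.getValue tuple 1)]
              (emit-bolt! collector [(transform-tuple h)] :anchor tuple)
              (ack! collector tuple)))))

      ;; In the spout, emit with an :id so Storm can later call (ack [id]) or
      ;; (fail [id]) for that tuple, e.g.
      ;;   (emit-spout! collector tuple :id (:id next-client))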
  29. Storm guarantees: Storm will re-process the entire tuple tree on
      failure; if the first attempt fails, Storm retries the tuple tree until
      it succeeds.
  30. Transactional topologies (Storm Trident): run-once semantics
      (transform client x1, write client x1) and strong ordering on data
      processing.
  31. Grouping + parallelism: every edge of the bridge topology (transform
      client, transform invoice, index-fields, write client, write invoice)
      carries a grouping (:shuffle) and every node a parallelism hint
      (:p 1, :p 3, :p 10); tune the runtime by annotating the graph edges
      (see the sketch below).
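     Aside: roughly, this corresponds to a topology definition like the
     following sketch (the node names, the exact wiring and which node gets
     which :p value are assumptions; only client-spout and
     transform-client-bolt appear in the deck):

      ;; Sketch with assumed wiring: the bridge as one topology, a grouping
      ;; annotating every edge and a parallelism hint on every node.
      (topology
        {"clients"  (spout-spec (client-spout)  :p 1)
         "invoices" (spout-spec (invoice-spout) :p 1)}
        {"transform-client"  (bolt-spec {"clients" :shuffle}
                                        transform-client-bolt
                                        :p 3)
         "transform-invoice" (bolt-spec {"invoices" :shuffle}
                                        transform-invoice-bolt
                                        :p 3)
         "write-client"      (bolt-spec {"transform-client" :shuffle}
                                        write-client-bolt
                                        :p 10)
         "write-invoice"     (bolt-spec {"transform-invoice" :shuffle}
                                        write-invoice-bolt
                                        :p 10)})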
  32. Topology tuple schemas: each edge carries its own schema, e.g.
      ["entity" "values"], ["client"], ["invoice"], ["key_val_pairs"],
      ["key_val"]. We are actually processing streams of tuples continuously.
  33. Vintage now feeds search and statements through Storm topologies, not
      sequential code. Refactor if you can! (but only if it's worth the
      effort)
  34. Not a picnic: we're still replacing code, and now we've added
      replication.
  35. But worth it. Big Replace / smaller replacements / in-situ changes /
      augment: new alongside old. Replace / evolve the new / kill / starve
      (until irrelevant).