A perfect Storm for legacy migration

Ryan Lemmer
October 21, 2013


EuroClojure 2013 - Berlin


Transcript

  1. @ryanlemmer a perfect storm for legacy migration CAPE TOWN @clj_ug_ct

  2. legacy monolith Customer Accounting Billing Product Catalog CRM ... MySQL

    Ruby on Rails
  3. legacy Billing Run Customer Accounting Billing Product Catalog CRM ...

    Bank Recon MySQL Ruby Ruby
  4. legacy backlog bugs

  5. legacy replacement replace this

  6. legacy replacement replace substitute something that is broken, old or

    inoperative
  7. the “legacy problem” can’t fix bugs can’t add features not

    performant
  8. a “legacy solution” immutable It’s just too risky to do

    in-situ changes
  9. a “legacy solution” vintage the grapes or wine produced in

    a particular season
  10. The situation: It’s not broken, just Immutable. It’s valuable vintage

    - still generating revenue. We don’t need to “replace”; we need to “make the Legacy Problem go away”
  11. vintage migration vintage ?

  12. vintage migration vintage We chose to migrate “financial” parts first

    because it posed the highest risk to the business ?
  13. vintage migration vintage statements MySQL Mongo & Redis

  14. feeding off vintage vintage clients invoices ... ...

  15. feeding off vintage statements clients invoices ? ... ...

  16. feeding off vintage clients invoices transform old client write new

    client write new invoice transform old invoice ... ...
  17. ... ... migration bridge statements vintage Big Run every night

    + incremental run every 10 mins. Bridge is one-directional, Statements is read-only. Imperative, sequential code
  18. ... ... new migration ? full text search statements vintage

    bridge
  19. migration bridge: search clients invoices index-entity index-field index-field index-field

    index-field index-field contacts ... ... ...
  20. migration bridge clients invoices index-field index-field index-field index-field index-field write

    client write invoice contacts index-entity search statements transform client transform invoice ... ... ... clients invoices ... ...
  21. ... ... ... statements vintage search statements (batched) bridge search

    About 10 million rows; several hours to migrate sequentially
  22. first pass solution Batched data migration BUT WHAT NEXT? It

    was the easiest thing to do. It is not performant, not fault tolerant, and fragile because of data dependencies. Next: go parallel and distributed, have fault tolerance, go real-time. It served as scaffolding for the next solution
  23. storm Ingredients: Apache Thrift, Nimbus, Zookeeper, Clojure (> 50%)

    * suitable for polyglots
  24. ... storm - spouts clients index-field index-field index-field index-field index-field

    write client index-entity transform client ... clients
  25. ... storm - spout SPOUT TUPLE

  26. storm - data model TUPLE named list of values [“seekoei”

    7] [“panda” 10] [147 {:name “John” ...}] [253 {:name “Mary” ...}] word frequency ID client
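
    The named-list idea maps directly onto plain Clojure data. A minimal sketch (not from the deck, outside Storm) pairing a tuple’s declared schema with its values:

    ```clojure
    ;; A Storm tuple is a named list of values: the declared schema
    ;; gives each position a field name.
    (def tuple-schema ["entity" "values"])            ; declared by the spout
    (def client-tuple ["client" {:id 147 :name "John"}])

    ;; Looking a value up "by name" is just positional pairing:
    (zipmap tuple-schema client-tuple)
    ;; => {"entity" "client", "values" {:id 147, :name "John"}}
    ```
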
  27. ... storm - spout a SPOUT emits TUPLES continuously over time;

    a SPOUT is an UNBOUNDED STREAM of TUPLES
  28. ... storm - client spout [“client” {:id 147, ...}] CLIENT

    SPOUT periodically emits a CLIENT TUPLE of [entity values]
  29. clojure spout

    (defspout client-spout ["entity" "values"]
      [conf context collector]
      (let [next-client (next-legacy-client)
            tuple       ["client" next-client]]
        (spout
          (nextTuple []
            (Thread/sleep 100)
            (emit-spout! collector tuple))
          (ack [id]))))

    creates a pulse
  30. clojure spout

    (defspout client-spout ["entity" "values"]
      [conf context collector]
      (let [next-client (next-legacy-client)
            tuple       ["client" next-client]]
        (spout
          (nextTuple []
            (Thread/sleep 100)
            (emit-spout! collector tuple))
          (ack [id]))))
  31. clojure spout [“client” {:id 147, ...}] CLIENT TUPLE

    (defspout client-spout ["entity" "values"]
      [conf context collector]
      (let [next-client (next-legacy-client)
            tuple       ["client" next-client]]
        (spout
          (nextTuple []
            (Thread/sleep 100)
            (emit-spout! collector tuple))
          (ack [id]))))

    TUPLE SCHEMA
  32. ... storm - spout [“client” {:id 147, ...}] [“client” {:id

    201, ...}] [“client” {:id 407, ...}] [“client” {:id 101, ...}] The client SPOUT packages input and emits TUPLES continuously over time
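
    The spout leans on a next-legacy-client helper the deck never shows. A minimal sketch under assumptions of my own: the MySQL paging is elided, and an atom over pre-fetched rows plays the cursor:

    ```clojure
    ;; Hypothetical cursor over legacy clients; in the real bridge this
    ;; would page through the legacy MySQL clients table instead.
    (def legacy-clients (atom [{:id 147 :name "John"}
                               {:id 201 :name "Mary"}]))

    (defn next-legacy-client
      "Pop and return the next legacy client row, or nil when drained."
      []
      (let [[old _] (swap-vals! legacy-clients rest)]
        (first old)))
    ```
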
  33. ... storm - bolts transform client CLIENT SPOUT BOLT

  34. storm - bolts

    (defbolt transform-client-bolt ["client"]
      {:prepare true}
      [conf context collector]
      (bolt
        (execute [tuple]
          (let [h (.getValue tuple 1)]
            (emit-bolt! collector [(transform-tuple h)])
            (ack! collector tuple)))))
  35. storm - bolts [{:id 147, ...}] OUTGOING TUPLE [“client” {:id 147, ...}] INCOMING TUPLE

    (defbolt transform-client-bolt ["client"]
      {:prepare true}
      [conf context collector]
      (bolt
        (execute [tuple]
          (let [h (.getValue tuple 1)]
            (emit-bolt! collector [(transform-tuple h)])
            (ack! collector tuple)))))
  36. storm - topology

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 1)})
  37. storm - topology

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 1)})
  38. bolt tasks

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 1)})
  39. bolt tasks

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 3)})
  40. which task?

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 3)})
  41. grouping - “shuffle”

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 3)})
  42. grouping - “field” [“active” {:id 147, ...}] [12 {:inv-id 147,

    ...}] TUPLE SCHEMA ["client-id" "invoice-vals"] count invoices per client (in memory)
  43. grouping - “field” [“active” {:id 147, ...}] [12 {:inv-id 147,

    ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [401 {:inv-id 32, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [232 {:inv-id 45, ...}] TUPLE SCHEMA ["client-id" "invoice-vals"] group by field “client-id”
  44. grouping - “field”

    (topology
      {"1" (spout-spec (client-spout) :p 1)}
      {"2" (bolt-spec {"1" ["client-id"]}
                      transform-client-bolt
                      :p 3)})
  45. grouping - “field” [“active” {:id 147, ...}] [12 {:inv-id 147,

    ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [401 {:inv-id 32, ...}] [“active” {:id 147, ...}] [“active” {:id 147, ...}] [232 {:inv-id 45, ...}] similar “client-id” vals go to the same Bolt Task
  46. grouping - “field” ... compute an aggregation per field
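
    The counting bolt itself isn’t shown in the deck. A sketch of what “count invoices per client (in memory)” could look like, assuming the ["client-id" "invoice-vals"] schema above (the output schema name is my own):

    ```clojure
    ;; Sketch: because of the ["client-id"] field grouping, every invoice
    ;; tuple for a given client reaches the same bolt task, so a
    ;; task-local atom of counts stays consistent without coordination.
    (defbolt count-invoices-bolt ["client-id" "invoice-count"]
      {:prepare true}
      [conf context collector]
      (let [counts (atom {})]
        (bolt
          (execute [tuple]
            (let [client-id (.getValue tuple 0)
                  n         (get (swap! counts update client-id (fnil inc 0))
                                 client-id)]
              (emit-bolt! collector [client-id n] :anchor tuple)
              (ack! collector tuple))))))
    ```
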

  47. bridge - topology index-field write client write invoice index-fields

    transform client transform invoice ... ... ... clients invoices contacts
  48. storm - failure success! oops! a failure! ...

  49. storm reliability Build a tree of tuples so that Storm

    knows which tuples are related ack/fail Spouts + Bolts
  50. storm guarantees Storm will re-process the entire tuple tree on

    failure First attempt fails Storm retries the tuple tree until it succeeds
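
    Building that tuple tree is explicit in the Clojure DSL: a bolt anchors its outgoing tuples to the incoming one, then acks (or fails) it. A sketch, re-using the transform-client-bolt from the earlier slides with anchoring added:

    ```clojure
    ;; Anchoring (:anchor tuple) links the outgoing tuple into the
    ;; incoming tuple's tree; if anything downstream fails, Storm
    ;; replays from the spout tuple that rooted the tree.
    (defbolt transform-client-bolt ["client"]
      {:prepare true}
      [conf context collector]
      (bolt
        (execute [tuple]
          (let [h (.getValue tuple 1)]
            (emit-bolt! collector [(transform-tuple h)] :anchor tuple)
            (ack! collector tuple)))))
    ```
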
  51. failure + idempotency write client transform client x2 x2 side-effects!

    ...
  52. transactional topologies write client transform client x1 x1 run-once semantics

    ... strong ordering on data processing Storm Trident
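
    Short of Trident’s run-once semantics, replays can also be made safe by making the writes idempotent. A sketch of an upsert keyed on the legacy id, so the “x2” side effects collapse to a single row (write-client! is my own name, and an atom stands in for the Mongo collection):

    ```clojure
    ;; Idempotent write: keyed on the legacy client id, so re-processing
    ;; the same tuple overwrites rather than duplicates.
    (def new-clients (atom {}))   ; stand-in for the Mongo collection

    (defn write-client!
      "Upsert a transformed client; safe to call any number of times."
      [{:keys [id] :as client}]
      (swap! new-clients assoc id client))
    ```
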
  53. search statements storm topologies real-time bridge vintage
  54. topology design ... ... ...

  55. topology design ... ... ... design the (directed) graph

  56. grouping + parallelism index-field write client write invoice index-fields

    transform client transform invoice :shuffle :shuffle :shuffle :shuffle :shuffle :shuffle :p 1 :p 1 :p 1 :p 10 :p 3 :p 3 ... ... ... tune the runtime by annotating the graph edges
  57. topology - tuple schema [“client”] [“entity” “values”] [“invoice”] [“entity” “values”]

    [“entity” “values”] [“client”] [“invoice”] [“key_val_pairs”] [“key_val”] We are actually processing streams of tuples continuously
  58. vintage topology design clients context sales context billing context (queue)

    (queue) .. .. .. .. .. ..
  59. storm “real-time, distributed, fault-tolerant, computation system” stream processing realtime analytics

    continuous computation distributed RPC ...
  60. reflections

  61. search statements vintage storm topologies vintage is first-class
  62. search statements vintage storm topologies transform data
  63. search statements vintage storm topologies not code refactor if you

    can! (but only if it’s worth the effort)
  64. search statements vintage storm topologies not a picnic because we’re

    still replacing code and now we’ve added replication
  65. but worth it Big Replace Smaller replacements In-situ changes Augment:

    new alongside old Replace Evolve new Kill Starve (until irrelevant)
  66. EUROCLOJURE Berlin 2013 thanks