A perfect Storm for legacy migration
ryan lemmer
October 21, 2013
EuroClojure 2013 - Berlin
Transcript
@ryanlemmer a perfect storm for legacy migration CAPE TOWN @clj_ug_ct
legacy monolith: Customer, Accounting, Billing, Product Catalog, CRM, ... (MySQL, Ruby on Rails)
legacy: Billing Run (Ruby) and Bank Recon (Ruby) alongside Customer, Accounting, Billing, Product Catalog, CRM, ... (MySQL)
legacy: backlog, bugs
legacy replacement: replace this
legacy replacement: to replace is to substitute something that is broken, old or inoperative
the “legacy problem”: can’t fix bugs, can’t add features, not performant
a “legacy solution”: immutable. It’s just too risky to do in-situ changes.
a “legacy solution”: vintage, the grapes or wine produced in a particular season
The situation: It’s not broken, just immutable. It’s valuable vintage, still generating revenue. We don’t need to “replace”. We need to “make the Legacy Problem go away”.
vintage migration: vintage to ?
vintage migration: We chose to migrate the “financial” parts first because they posed the highest risk to the business.
vintage migration: vintage (MySQL) to statements (Mongo & Redis)
feeding off vintage: vintage holds clients, invoices, ...
feeding off vintage: how do clients, invoices, ... get from vintage into statements?
feeding off vintage: transform old client, write new client; transform old invoice, write new invoice; ...
migration bridge (vintage to statements): Big Run every night + incremental run every 10 mins. Bridge is one-directional, Statements is read-only. Imperative, sequential code.
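The incremental run on this slide can be pictured as a watermark-based copy: each pass transforms only the rows changed since the previous pass and writes them to the new store. The following is a minimal sketch in plain Python, not the bridge's actual code. It assumes rows carry an `updated_at` counter, uses in-memory lists and dicts as stand-ins for MySQL and the statements store, and `transform_client` and `incremental_run` are hypothetical names.

```python
from typing import Dict, List


def transform_client(old: dict) -> dict:
    """Hypothetical transform: reshape a legacy row for the new store."""
    return {"id": old["id"], "name": old["name"].strip().title()}


def incremental_run(legacy_rows: List[dict], new_store: Dict[int, dict],
                    watermark: int) -> int:
    """Copy rows changed after `watermark`; return the new watermark.

    One-directional: reads from legacy_rows, writes only to new_store.
    """
    changed = [r for r in legacy_rows if r["updated_at"] > watermark]
    # Sequential, imperative loop, in the spirit of the first bridge.
    for row in sorted(changed, key=lambda r: r["updated_at"]):
        new_store[row["id"]] = transform_client(row)
        watermark = row["updated_at"]
    return watermark


# Illustrative data only (assumption, not from the talk).
legacy = [
    {"id": 147, "name": " john ", "updated_at": 1},
    {"id": 253, "name": "mary", "updated_at": 2},
]
store: Dict[int, dict] = {}
mark = incremental_run(legacy, store, watermark=0)     # first pass copies both rows
legacy.append({"id": 407, "name": "sipho", "updated_at": 3})
mark = incremental_run(legacy, store, watermark=mark)  # later pass copies only the new row
```

Because the loop is sequential, throughput is bounded by a single worker, which is the limitation the later slides address.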
new migration? full text search (vintage, bridge, statements)
migration bridge: search. clients, invoices, contacts, ... feed index-entity, which fans out to index-field (x5)
migration bridge: clients, invoices, contacts, ... feed statements (transform client, write client; transform invoice, write invoice; ...) and search (index-entity, index-field x5)
bridge (batched): vintage feeds statements and search
About 10 million rows: several hours to migrate sequentially
first pass solution: batched data migration. It was the easiest thing to do, but it is not performant, not fault tolerant, and fragile because of data dependencies. It served as scaffolding for the next solution. BUT WHAT NEXT? Go parallel and distributed, have fault tolerance, go real-time.
storm ingredients: Apache Thrift + Nimbus, Zookeeper, Clojure (> 50%)
* suitable for polyglots
storm - spouts: clients; transform client, write client; index-entity, index-field (x5)
storm - spout: SPOUT, TUPLE
storm - data model: a TUPLE is a named list of values. Fields “word”, “frequency”: [“seekoei” 7], [“panda” 10]. Fields “ID”, “client”: [147 {:name ‘John’ ...}], [253 {:name ‘Mary’ ...}].
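The "named list of values" idea can be sketched as a schema of field names zipped with one tuple's positional values. This is plain Python for illustration, not Storm's tuple class; `as_named` is a hypothetical helper.

```python
from typing import Any, Dict, Sequence


def as_named(fields: Sequence[str], values: Sequence[Any]) -> Dict[str, Any]:
    """Pair a declared field schema with one tuple's positional values."""
    if len(fields) != len(values):
        raise ValueError("tuple does not match its schema")
    return dict(zip(fields, values))


# The two example streams from the slide.
word_tuple = as_named(["word", "frequency"], ["seekoei", 7])
client_tuple = as_named(["ID", "client"], [147, {"name": "John"}])
```

The schema is declared once per stream, while each tuple carries only the values, which is why the slides show tuples as bare vectors.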
storm - spout: a SPOUT emits TUPLES continuously over time. A SPOUT is an UNBOUNDED STREAM of TUPLES.
storm - client spout: the CLIENT SPOUT periodically emits a CLIENT TUPLE with fields “entity” and “values”, e.g. [“client” {:id 147, ...}]
clojure spout
    (defspout client-spout ["entity" "values"]
      [conf context collector]
      (let [next-client (next-legacy-client)
            tuple ["client" next-client]]
        (spout
          (nextTuple []
            (Thread/sleep 100)               ; creates a pulse
            (emit-spout! collector tuple))
          (ack [id]))))
clojure spout
    (defspout client-spout ["entity" "values"]   ; TUPLE SCHEMA
      [conf context collector]
      (let [next-client (next-legacy-client)
            tuple ["client" next-client]]        ; CLIENT TUPLE, e.g. ["client" {:id 147, ...}]
        (spout
          (nextTuple []
            (Thread/sleep 100)
            (emit-spout! collector tuple))
          (ack [id]))))
storm - spout: [“client” {:id 147, ...}], [“client” {:id 201, ...}], [“client” {:id 407, ...}], [“client” {:id 101, ...}]. The client SPOUT packages input and emits TUPLES continuously over time.
storm - bolts: the CLIENT SPOUT feeds the “transform client” BOLT
storm - bolts
    (defbolt transform-client-bolt ["client"]
      {:prepare true}
      [conf context collector]
      (bolt
        (execute [tuple]
          (let [h (.getValue tuple 1)]
            (emit-bolt! collector [(transform-tuple h)])
            (ack! collector tuple)))))
storm - bolts: INCOMING TUPLE [“client” {:id 147, ...}], OUTGOING TUPLE [{:id 147, ...}]
    (defbolt transform-client-bolt ["client"]
      {:prepare true}
      [conf context collector]
      (bolt
        (execute [tuple]
          (let [h (.getValue tuple 1)]                     ; value at index 1 of the incoming tuple
            (emit-bolt! collector [(transform-tuple h)])   ; outgoing tuple
            (ack! collector tuple)))))
storm - topology: spout "1" feeds bolt "2", ...
    (topology
      {"1" (spout-spec (client-spout)
                       :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 1)})
bolt tasks: with :p 1, bolt "2" runs as one task
    (topology
      {"1" (spout-spec (client-spout)
                       :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 1)})
bolt tasks: with :p 3, bolt "2" runs as three tasks
    (topology
      {"1" (spout-spec (client-spout)
                       :p 1)}
      {"2" (bolt-spec {"1" :shuffle}
                      transform-client-bolt
                      :p 3)})
which task? (same topology, :p 3): which task of bolt "2" receives a given tuple from spout "1"?
grouping - “shuffle” (same topology, :p 3): the grouping {"1" :shuffle} on the bolt-spec decides it
grouping - “field”: TUPLE SCHEMA ["client-id" "invoice-vals"], e.g. [12 {:inv-id 147, ...}] (the diagram also shows [“active” {:id 147, ...}]). Count invoices per client (in memory).
grouping - “field”: group by field “client-id”. Tuples shown: [12 {:inv-id 147, ...}], [401 {:inv-id 32, ...}], [232 {:inv-id 45, ...}], with [“active” {:id 147, ...}] repeated between them.
grouping - “field”: spout "1" feeds bolt "2", ...
    (topology
      {"1" (spout-spec (client-spout)
                       :p 1)}
      {"2" (bolt-spec {"1" ["client-id"]}
                      transform-client-bolt
                      :p 3)})
grouping - “field”: tuples with the same “client-id” value go to the same Bolt Task. Tuples shown: [12 {:inv-id 147, ...}], [401 {:inv-id 32, ...}], [232 {:inv-id 45, ...}], with [“active” {:id 147, ...}] repeated between them.
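The effect of field grouping on the in-memory count can be sketched by routing each tuple to a task chosen from a stable hash of its "client-id" value. This is a plain-Python illustration under assumptions: hash-modulo routing is a common way to get this behaviour and is not taken from the slides, `fields_grouping_task` and `shuffle_grouping_task` are hypothetical names, and the invoice ids 148 and 149 are made-up sample data.

```python
import random
import zlib
from collections import defaultdict


def fields_grouping_task(key, num_tasks: int) -> int:
    """Pick a task from a stable hash of the grouping field's value."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_tasks


def shuffle_grouping_task(num_tasks: int, rng: random.Random) -> int:
    """Pick a task at random, ignoring the tuple's contents."""
    return rng.randrange(num_tasks)


NUM_TASKS = 3  # corresponds to :p 3 on the bolt

# (client-id, invoice-vals) tuples; ids 148 and 149 are illustrative only.
tuples = [(12, {"inv-id": 147}), (401, {"inv-id": 32}),
          (12, {"inv-id": 148}), (232, {"inv-id": 45}),
          (12, {"inv-id": 149})]

# Each task keeps its own in-memory invoice count per client.
counts = [defaultdict(int) for _ in range(NUM_TASKS)]
for client_id, _invoice in tuples:
    task = fields_grouping_task(client_id, NUM_TASKS)
    counts[task][client_id] += 1

# Every tuple for client 12 lands on one task, so that task sees the full count.
task_for_12 = fields_grouping_task(12, NUM_TASKS)
invoices_for_12 = counts[task_for_12][12]
```

With shuffle grouping the three tuples for client 12 could be spread over different tasks, and no single task would hold the complete count.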
grouping - “field”: ... field grouping, then compute aggregation
bridge - topology: clients, invoices, contacts, ...; transform client, write client; transform invoice, write invoice; index-fields, index-field
storm - failure: success! oops! a failure! ...
storm reliability: Build a tree of tuples so that Storm knows which tuples are related. Spouts + Bolts ack/fail.
storm guarantees: Storm will re-process the entire tuple tree on failure. First attempt fails; Storm retries the tuple tree until it succeeds.
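The replay behaviour described here can be sketched as a loop that re-runs every step of a tuple tree until one whole pass succeeds. This is a plain-Python simulation, not Storm's acker. `process_with_replay`, `flaky_process` and the `max_attempts` cap are assumptions added so the example terminates.

```python
def process_with_replay(tuple_tree, process, max_attempts=5):
    """Re-run every step of the tuple tree until one whole pass succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            for step in tuple_tree:
                process(step)
            return attempt  # whole tree acked
        except RuntimeError:
            continue  # a fail anywhere replays the tree from the start
    raise RuntimeError("gave up after %d attempts" % max_attempts)


calls = []
failures_left = [1]  # make "write client" fail exactly once


def flaky_process(step):
    """Record each call and fail the first write, to trigger a replay."""
    calls.append(step)
    if step == "write client" and failures_left[0] > 0:
        failures_left[0] -= 1
        raise RuntimeError("oops! a failure!")


attempts = process_with_replay(["transform client", "write client"], flaky_process)
```

In this run the first pass fails at the write, so both steps execute twice before the tree succeeds.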
failure + idempotency: after a replay, transform client runs x2 and write client runs x2: side-effects!
...
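Why a replayed write matters can be sketched by comparing an append with a keyed upsert when the same tuple is processed twice. This is a plain-Python illustration; `append_write` and `upsert_write` are hypothetical names and the stores are in-memory stand-ins.

```python
def append_write(store: list, client: dict) -> None:
    """Non-idempotent: a replay leaves a duplicate row behind."""
    store.append(client)


def upsert_write(store: dict, client: dict) -> None:
    """Idempotent: keyed by id, so a replay overwrites with the same value."""
    store[client["id"]] = client


client = {"id": 147, "name": "John"}

appended: list = []
upserted: dict = {}
for _ in range(2):  # the x2 on the slide: the same tuple processed twice
    append_write(appended, client)
    upsert_write(upserted, client)
```

The appended store ends up with two rows for one client, while the keyed store still holds one, so writes designed like the latter tolerate replays.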
transactional topologies: transform client runs x1 and write client runs x1: run-once semantics
... strong ordering on data processing. Storm Trident
real-time bridge: vintage feeds statements and search through storm topologies
topology design: design the (directed) graph
grouping + parallelism: transform client, write client, transform invoice, write invoice, index-fields, index-field. Edge groupings: :shuffle (x6). Parallelism: :p 1, :p 1, :p 1, :p 10, :p 3, :p 3. Tune the runtime by annotating the graph edges.
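The idea of a topology as an annotated directed graph can be sketched as plain data: each component has a parallelism hint and a map of upstream components to groupings. This is plain Python, not Storm's topology API. The wiring and the assignment of :p values to components below are assumptions, since the transcript does not say which annotation belongs to which node.

```python
# Component name -> parallelism hint ("p") and upstream inputs with their grouping.
# Wiring and numbers are illustrative assumptions, not read off the slide.
topology = {
    "clients": {"p": 1, "inputs": {}},
    "transform client": {"p": 3, "inputs": {"clients": "shuffle"}},
    "write client": {"p": 3, "inputs": {"transform client": "shuffle"}},
    "index-fields": {"p": 1, "inputs": {"clients": "shuffle"}},
    "index-field": {"p": 10, "inputs": {"index-fields": "shuffle"}},
}


def total_tasks(topo: dict) -> int:
    """Sum the parallelism hints over all components."""
    return sum(component["p"] for component in topo.values())


def edges(topo: dict) -> list:
    """List (source, destination, grouping) for every annotated edge."""
    return [(src, dst, grouping)
            for dst, component in topo.items()
            for src, grouping in component["inputs"].items()]


task_count = total_tasks(topology)
edge_list = edges(topology)
```

Changing a single `"p"` value or grouping retunes the runtime without touching the processing code, which is the point the slide makes about annotating the graph.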
topology - tuple schema: [“entity” “values”] (x3), [“client”] (x2), [“invoice”] (x2), [“key_val_pairs”], [“key_val”]. We are actually processing streams of tuples continuously.
topology design: vintage, clients context, sales context, billing context, (queue), (queue), ...
storm: “real-time, distributed, fault-tolerant, computation system”. Stream processing, realtime analytics, continuous computation, distributed RPC, ...
reflections
(diagram on each slide: vintage, storm topologies, statements, search)
vintage is first-class
transform data, not code
refactor if you can! (but only if it’s worth the effort)
not a picnic, because we’re still replacing code and now we’ve added replication
but worth it: Big Replace, Smaller replacements, In-situ changes, Augment (new alongside old), Replace, Evolve new, Kill, Starve (until irrelevant)
EUROCLOJURE Berlin 2013 thanks