Slide 1

Slide 1 text

Patterns and Pains of Migrating Legacy Applications to Kubernetes
Josef Adersberger & Michael Frank, QAware
Robert Bichler, Allianz Germany
@adersberger @qaware

Slide 2

Slide 2 text

Michael Frank, Lead Developer, QAware
Robert Bichler, Project Manager, Allianz Germany
Josef Adersberger, Architect, QAware

Slide 3

Slide 3 text

CIO: “Let’s bring all our web applications onto a cloud native platform.”

Slide 4

Slide 4 text

Digitalization => Agile => Cloud Native Platforms (costs, availability, productivity)

Slide 5

Slide 5 text

Priorities: (1) Time (1.5 years) (2) Ops cost savings (3) Migration costs

Slide 6

Slide 6 text

WE WERE BRAVE

Slide 7

Slide 7 text

WE FELT PAIN

Slide 8

Slide 8 text

WE DISCOVERED PATTERNS

Slide 9

Slide 9 text

WE WERE SUCCESSFUL
❏ All 152 legacy applications migrated and in production within 17 months
❏ All security-hardened and modernized to containerized 12-factor apps
❏ Benefits leveraged: strong business case, higher availability, more agile teams

Slide 10

Slide 10 text

The Architect’s Point of View

Slide 11

Slide 11 text

Patterns for success

Slide 12

Slide 12 text

Visibility

Slide 13

Slide 13 text

The Cloudalyzer: inputs from questionnaires, JIRA, XLS, the EAM tool and static analysis (QAvalidator, SonarQube, jQAssistant, OWASP scanner, IBM migration tool, …) feed a migration database, analyzed with Tableau; outputs include migration tasks, system properties and the basic tour-de-migration.

Slide 14

Slide 14 text

Questionnaire: Typical questions
• Technology stack (e.g. OS, app server, JVM)
• Required resources (memory, CPU cores)
• Writes to storage (local/remote storage, write mode, volume)
• Special requirements (native libs, special hardware)
• Inbound and outbound protocols (protocol stack, TLS, multicast, dynamic ports)
• Ability to execute (regression/load tests, business owner, dev know-how, release cycle, end of life)
• Client authentication (e.g. SSO, login, certificates)

Slide 15

Slide 15 text

Emergent design of cloud native software landscapes

Slide 16

Slide 16 text

Architecting hundreds of applications
• Application Blueprint: describes the target architecture and some rules & principles
• Migration Cookbook: guidance on how to migrate the applications based on the application blueprint; single source of truth & know-how externalization
• Tour-de-Migration: visiting all applications and collecting open issues
• GoLive Readiness Checklist: criteria to be checked before GoLive
(Timeline Q1/17-Q2/18: cloud platform setup, then application migration, accompanied by application blueprint, migration cookbook, tour-de-migration and GoLive readiness checklist.)

Slide 17

Slide 17 text

The Blueprint. Before: applications on an HTTPD web layer, J2EE 1.4 app server and JVM 1.6 in the on-prem data center; clients via TLS 1.0+; backends (DB, MQ, host, batch, FS, RACF, ESB) reached via TCP binary, WS, REST, C:D, LDAP, CORBA, SMTP, FTP, NAS, … After: inner and outer applications as Docker containers on Kubernetes/OpenShift (JEE 7 app server, JVM 8) behind an AWS web layer and security gateway; clients via TLS 1.2; all connections TLS 1.2, with 2-way TLS 1.2 & OIDC identity token; backends remain in the on-prem data center; only data in transit.

Slide 18

Slide 18 text

Before: monolith with backend connections and clients. After: inner and outer applications behind the security gateway, still connected to backends and clients. Questions: 1) How to enhance cloud nativeness? 2) How to cut the monolith? 3) How to obtain an identity token?

Slide 19

Slide 19 text

Before: monolith with backend connections and clients. After: inner and outer applications behind the security gateway, still connected to backends and clients. Questions: 1) How to enhance cloud nativeness? 2) How to cut the monolith? 3) How to obtain an identity token?

Slide 20

Slide 20 text

A sweet spot for legacy apps: Cloud Friendly Apps. Put the monolith into a container (do not cut it, do not enhance it with features in parallel) … and enhance the application according to the 12 factors.

Slide 21

Slide 21 text

Sidecars to the rescue

Slide 22

Slide 22 text

Container patterns applied (pod = application container + pattern container + other containers):
• Sidecar: enhance container behaviour
• Ambassador: proxy communication
• Adapter: provide a standardized interface
Applied for: log extraction, task scheduling, configuration (ConfigMaps & Secrets to files), mTLS tunnel, circuit breaking, request monitoring.
“Design patterns for container-based distributed systems”, Brendan Burns, David Oppenheimer, 2016.
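To make the sidecar idea concrete, here is a minimal, self-contained sketch (not from the talk) of a log extraction sidecar in Java: the legacy application keeps writing its log file to a volume shared within the pod, and the sidecar tails that file and relays it to STDOUT, where the platform's log collection picks it up. The file path is an assumption for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public final class LogExtractionSidecar {

    public static void main(String[] args) throws IOException, InterruptedException {
        // Path on a volume shared with the application container (illustrative).
        Path logFile = Paths.get(args.length > 0 ? args[0] : "/var/log/app/application.log");

        // Wait until the application container has created the log file.
        while (!Files.exists(logFile)) {
            Thread.sleep(1_000);
        }

        // Poor man's `tail -f`: keep the read position and poll for new lines.
        // (No log rotation handling; single-byte charset assumed for the sketch.)
        try (RandomAccessFile file = new RandomAccessFile(logFile.toFile(), "r")) {
            while (true) {
                String line = file.readLine();
                if (line == null) {
                    Thread.sleep(500);          // nothing new yet, poll again
                } else {
                    System.out.println(line);   // relay to STDOUT for the platform to collect
                }
            }
        }
    }
}
```

The same shared-volume mechanism is what the pattern container in the pod diagram relies on; the ambassador and adapter variants differ only in whether the extra container proxies traffic or exposes a standardized interface.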

Slide 23

Slide 23 text

Before: monolith with backend connections and clients. After: inner and outer applications behind the security gateway, still connected to backends and clients. Questions: 1) How to enhance cloud nativeness? 2) How to cut the monolith? 3) How to obtain an identity token?

Slide 24

Slide 24 text

Anti-pain rule: Don’t cut the monolith

Slide 25

Slide 25 text

Anti-pain rule: Don’t cut the monolith. Before: monolith with backends and clients. After: the same monolith plus some magic sauce behind the security gateway, still connected to backends and clients.

Slide 26

Slide 26 text

Before: monolith with backend connections and clients. After: inner and outer applications behind the security gateway, still connected to backends and clients. Questions: 1) How to enhance cloud nativeness? 2) How to cut the monolith? 3) How to obtain an identity token?

Slide 27

Slide 27 text

Security service to the rescue. Before: monolith with backends and clients. After: the monolith behind the security gateway, with a security service (backed by a token provider and the IAM systems) adapting multiple authentication mechanisms to a uniform OIDC token.
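The slides do not show how the identity token check is implemented inside the applications. As a hedged sketch, verifying the uniform OIDC token could look like this with the Nimbus JOSE+JWT library (the JWKS endpoint, the RS256 algorithm and all names are assumptions, not the project's actual code):

```java
import java.net.URL;

import com.nimbusds.jose.JWSAlgorithm;
import com.nimbusds.jose.jwk.source.JWKSource;
import com.nimbusds.jose.jwk.source.RemoteJWKSet;
import com.nimbusds.jose.proc.JWSVerificationKeySelector;
import com.nimbusds.jose.proc.SecurityContext;
import com.nimbusds.jwt.JWTClaimsSet;
import com.nimbusds.jwt.proc.DefaultJWTProcessor;

/**
 * Sketch of the "uniform OIDC identity token" idea: whatever authentication
 * mechanism the client used, the security service issues an OIDC token, and
 * inner applications only have to verify that token against the provider's
 * published JWKS keys.
 */
public final class IdentityTokenVerifier {

    private final DefaultJWTProcessor<SecurityContext> processor = new DefaultJWTProcessor<>();

    public IdentityTokenVerifier(URL jwksEndpoint) {
        JWKSource<SecurityContext> keySource = new RemoteJWKSet<>(jwksEndpoint);
        // Accept only RS256-signed tokens whose signing key is published via JWKS.
        processor.setJWSKeySelector(new JWSVerificationKeySelector<>(JWSAlgorithm.RS256, keySource));
    }

    /** Returns the verified claims (subject, issuer, expiry, ...) or throws. */
    public JWTClaimsSet verify(String identityToken) throws Exception {
        return processor.process(identityToken, null);
    }
}
```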

Slide 28

Slide 28 text

Kubernetes constraints
Initially we thought we’d run into k8s restrictions on our infrastructure, such as:
‣ no support for multicast
‣ no RWX PVC available
We did. But all required refactorings were moderate effort and led to a better architecture.

Slide 29

Slide 29 text

Pain

Slide 30

Slide 30 text

The Lead Developer’s Point of View

Slide 31

Slide 31 text

The almighty legacy framework
• “Worry-free package” framework from the early 2000s with about 500 kLOC, 0% test coverage and multiple forks; the applications sit on top of it, on a J2EE 1.4 app server and JVM 1.6
• Strategies:
  – the hard way: consolidate forks, migrate manually and increase coverage
  – decorate with ambassadors, sidecars and adapters
  – do not migrate some parts and replace their API within the applications
• Required modifications:
  – from J2EE 1.4 to JEE 7 and Java 6 to 8
  – add identity token check and relay
  – modify session handling (synchronization)
  – modify logging (to STDOUT)
  – modify configuration (overwrite from ConfigMap)
  – enforce TLS 1.2
  – place circuit breakers
  – predefined liveness and readiness probes
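As one concrete example of the modifications above, a minimal sketch (assumed, not from the deck) of "overwrite configuration from ConfigMap": defaults packaged with the application are overlaid with a properties file mounted from a ConfigMap; the mount path and file names are illustrative.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/**
 * Layered configuration: values from a ConfigMap-mounted properties file
 * override the defaults shipped inside the legacy application archive.
 */
public final class LayeredConfig {

    private static final Path CONFIG_MAP_FILE = Paths.get("/etc/config/app.properties");

    public static Properties load() throws Exception {
        Properties props = new Properties();

        // 1. Defaults packaged with the application.
        try (InputStream in = LayeredConfig.class.getResourceAsStream("/app-defaults.properties")) {
            if (in != null) {
                props.load(in);
            }
        }

        // 2. Overrides from the ConfigMap volume, if mounted.
        if (Files.exists(CONFIG_MAP_FILE)) {
            try (InputStream in = Files.newInputStream(CONFIG_MAP_FILE)) {
                props.load(in); // later load() calls overwrite earlier keys
            }
        }

        // 3. 12-factor style: report to STDOUT which keys are effective.
        System.out.println("config keys loaded: " + props.stringPropertyNames());
        return props;
    }
}
```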

Slide 32

Slide 32 text

TIMEOUTS

Slide 33

Slide 33 text

Timeouts: The pain
• Kinds: getConnection (from the connection pool), connect, read (socket), connection TTL/keepAlive
• Timeouts often too high. This ...
  – causes bad user experience
  – hurts the stability of your entire cloud
• Unable to distinguish errors from legitimate waits
• Diminishes self-healing capabilities
• Promotes cascading failures

Slide 34

Slide 34 text

Timeouts: The pain
• Kinds: getConnection (from the connection pool), connect, read (socket), connection TTL/keepAlive
• Timeouts often too high. This ...
  – causes bad user experience
  – hurts the stability of your entire cloud
• Unable to distinguish errors from legitimate waits
• Diminishes self-healing capabilities
• Promotes cascading failures

Slide 35

Slide 35 text

Timeouts: Recommendations
• Keep timeouts within the following ranges:
  – 1-3 s for getConnection & connect
  – 3-60 s for socket/read (aim as low as possible)
  – 1-3 min for TTL/keepAlive of pooled connections
• Allow for dynamic DNS changes and dynamic scaling of backend services
• Trade-off between reaction time and performance
• Cascade timeouts: outer layer highest, inner layer lowest (e.g. 60 s → 57 s → 54 s → 51 s)
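The deck does not name a specific HTTP client; as an illustration of these ranges, with Apache HttpClient 4.x the four timeout kinds could be configured roughly like this (the concrete values are picked from the recommended ranges, not taken from the project):

```java
import java.util.concurrent.TimeUnit;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public final class TimeoutTunedClient {

    public static CloseableHttpClient create() {
        // The three request-level timeouts (values from the recommended ranges).
        RequestConfig timeouts = RequestConfig.custom()
                .setConnectionRequestTimeout(2_000) // getConnection from the pool: 1-3 s
                .setConnectTimeout(2_000)           // TCP connect: 1-3 s
                .setSocketTimeout(10_000)           // socket/read: 3-60 s, as low as the backend allows
                .build();

        return HttpClients.custom()
                .setDefaultRequestConfig(timeouts)
                .setMaxConnTotal(50)
                .setMaxConnPerRoute(20)
                // TTL of pooled connections: 1-3 min, so DNS changes and newly
                // scaled backend instances are picked up without a client restart.
                .setConnectionTimeToLive(2, TimeUnit.MINUTES)
                .build();
    }
}
```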

Slide 36

Slide 36 text

LATENCY

Slide 37

Slide 37 text

Latency
• Pain: dramatic increase in latency. You can't scale away latency!
  – Every layer and new infrastructure component adds processing time
  – Everything secured with TLS 1.2 adds processing time
  – Physical distance: cloud -> on-prem
• Heaviest impact on n+1 patterns in applications
  – Adjust batch/fetch size
  – Parallel fetch
  – Ultima ratio: an on-prem (lightweight) service layer close to the DB
• General
  – Performance experts in the support team
  – Caching
  – Use diagnosability tools...

Slide 38

Slide 38 text

Latency
• Pain: dramatic increase in latency. You can't scale away latency!
  – Every layer and new infrastructure component adds processing time
  – Everything secured with TLS 1.2 adds processing time
  – Physical distance: cloud -> on-prem
• Heaviest impact on n+1 patterns in applications
  – Adjust batch/fetch size
  – Parallel fetch
  – Ultima ratio: an on-prem (lightweight) service layer close to the DB
• General
  – Performance experts in the support team
  – Caching
  – Use diagnosability tools...
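A minimal sketch of the "parallel fetch" mitigation for n+1 patterns (class, interface and pool size are illustrative, not from the deck): instead of paying the cloud-to-on-prem round trip once per item sequentially, the calls are issued concurrently, so the total latency approaches a single round trip.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public final class ParallelFetch {

    /** One remote call per id, i.e. one cloud-to-on-prem round trip. */
    public interface DetailsClient {
        String fetchDetails(String id);
    }

    private final DetailsClient client;
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    public ParallelFetch(DetailsClient client) {
        this.client = client;
    }

    public List<String> fetchAll(List<String> ids) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> client.fetchDetails(id), pool))
                .collect(Collectors.toList());
        // Overall latency is roughly one round trip (the slowest call),
        // instead of the sum of n sequential round trips.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```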

Slide 39

Slide 39 text

DIAGNOSABILITY

Slide 40

Slide 40 text

Diagnosability
1. Early on - diagnose cloud platform issues upfront
2. Holistic - monitor and correlate everything (infrastructure & apps, multiple levels, metrics & logs & traces)
3. Mandatory - everyone has to use it
4. Automatic - auto-instrumentation not involving devs

Slide 41

Slide 41 text

Application diagnosability?
• Metrics: high effort to instrument for valuable insights; scalability unclear for hundreds of applications; applications have no time to run their own Prometheus instance
• Traces: scalability unclear for hundreds of applications (Jaeger & Zipkin); applications have no time to run their own instance
• Events/Logs: scalability unclear (a lot of events lost); applications have no time to run their own EFK instance; non-standardized log format requires a custom log rewrite adapter rather than just a fluentd DaemonSet

Slide 42

Slide 42 text

Application diagnosability: for metrics, events/logs and traces … use APM tools like Dynatrace and Instana. Want to move fast? Buy first, reduce cost later.

Slide 43

Slide 43 text

SESSION STATE

Slide 44

Slide 44 text

Session state
1. Session stickiness: not within the cloud!
2. Session persistence
   • Existing DB: performance impact too high ☹
   • Redis: no TLS out of the box, and additional infrastructure required ☹
3. Session synchronization
   • App server: no dynamic peer lookup within k8s ☹
   • Hazelcast: TLS only in the paid enterprise edition ☹
   • ...

Slide 45

Slide 45 text

Session synchronization with Ignite
• Apache Ignite as in-memory data grid
  – embedded within the application or standalone (in a sidecar)
  – cumbersome but working k8s peer lookup (see the sketch below)
• Look out for ...
  – Java serialization
  – legacy frameworks with custom session handling
  – preventing session creation for e.g. health check requests
  – applications putting large objects into the “session”, misusing it as a cache
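The exact Ignite setup is not shown in the deck. A rough sketch of the k8s peer lookup uses the ignite-kubernetes module's TcpDiscoveryKubernetesIpFinder, which resolves cluster peers through the endpoints of a (headless) Kubernetes service; service, namespace and cache names are illustrative, and newer Ignite versions configure the finder via KubernetesConnectionConfiguration instead of the setters used here.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
import org.apache.ignite.spi.discovery.tcp.ipfinder.kubernetes.TcpDiscoveryKubernetesIpFinder;

public final class SessionGrid {

    public static Ignite start() {
        // Peers are discovered via a Kubernetes service instead of multicast,
        // which is not available on the platform.
        TcpDiscoveryKubernetesIpFinder ipFinder = new TcpDiscoveryKubernetesIpFinder();
        ipFinder.setNamespace("legacy-apps");
        ipFinder.setServiceName("session-grid");

        TcpDiscoverySpi discovery = new TcpDiscoverySpi();
        discovery.setIpFinder(ipFinder);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(discovery);

        return Ignition.start(cfg);
    }

    public static void main(String[] args) {
        Ignite ignite = start();
        // A shared cache can then back the HTTP session store of the app server.
        IgniteCache<String, byte[]> sessions = ignite.getOrCreateCache("http-sessions");
        sessions.put("session-id", new byte[0]);
    }
}
```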

Slide 46

Slide 46 text

#@!!#@$

Slide 47

Slide 47 text

Other technical pain points (pain → pattern):
• Legacy crypto without TLS 1.2 and SNI support (e.g. Java 1.6) → find matching cipher suites; add a security proxy
• Legacy apps violating HTTP standards → refactor
• Redirect loops when accessing source URLs (e.g. IDP login) → use X-Forwarded-* headers and provide a corresponding filter (see the sketch below)
• No automated test suites → automated high-level tests; test generation (e.g. EvoSuite)?
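For the redirect-loop pain, a minimal sketch (assumed, not the project's actual filter) of an X-Forwarded-* servlet filter: behind the security gateway the application only sees the internal scheme and host, which can break redirects such as the IDP login; the wrapper restores the client-facing values from the forwarded headers.

```java
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class ForwardedHeaderFilter implements Filter {

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest http = (HttpServletRequest) request;
        final String proto = http.getHeader("X-Forwarded-Proto");
        final String host = http.getHeader("X-Forwarded-Host");

        if (proto == null && host == null) {
            chain.doFilter(request, response);  // not behind the gateway, pass through
            return;
        }

        chain.doFilter(new HttpServletRequestWrapper(http) {
            @Override
            public String getScheme() {
                return proto != null ? proto : super.getScheme();
            }

            @Override
            public boolean isSecure() {
                return proto != null ? "https".equalsIgnoreCase(proto) : super.isSecure();
            }

            @Override
            public String getServerName() {
                return host != null ? host.split(":")[0] : super.getServerName();
            }

            @Override
            public int getServerPort() {
                if (host != null && host.contains(":")) {
                    return Integer.parseInt(host.substring(host.indexOf(':') + 1));
                }
                // No explicit port forwarded: derive the default from the scheme.
                return "https".equalsIgnoreCase(getScheme()) ? 443 : 80;
            }
        }, response);
    }

    @Override
    public void init(FilterConfig filterConfig) { /* no configuration needed */ }

    @Override
    public void destroy() { /* nothing to clean up */ }
}
```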

Slide 48

Slide 48 text

The Project Manager’s Point of View

Slide 49

Slide 49 text

Patterns for success

Slide 50

Slide 50 text

Management support
❏ Strong management support
❏ Clear scope
❏ Courage to drive the change to cloud native development

Slide 51

Slide 51 text

Project Marketing & Motivation Identification & Celebration

Slide 52

Slide 52 text

Co-location space: one LEAP area
❏ Support & industrialization team
❏ In case of required support: migration team

Slide 53

Slide 53 text

Industrialization

Slide 54

Slide 54 text

Dozens of migration projects running in parallel (organized in release trains), backed by three teams:
‣ Architecture team: application blueprint, migration database
‣ Support team: training sessions, support sessions, co-location & remote, guidance / best-practice sharing (cookbook, sample application)
‣ Industrialization team: unified development environment (via GitHub), standard base images, pre-migrated frameworks, solutions (security service, ambassadors)
‣ Feedback flows back from the migration projects to the teams

Slide 55

Slide 55 text

Transparency & information radiators: app support, activities & milestones, quality, GoLive planning, operational.

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

No content