Slide 1

Re-architecting in GANMA! 2020-10-17 ScalaMatsuri 2020, Day 1. Naoki Aoyama (@aoiroaoino)

Slide 2

$ whoami ❖ Naoki Aoyama ❖ Twitter/GitHub: @aoiroaoino ❖ Working at:

Slide 3

And team members


Slide 9

Agenda
➢ Introduction
  ○ Why did we perform re-architecting?
➢ How did we perform re-architecting?
  ○ Improving the infrastructure and the backend application
➢ To avoid adding to the technical debt
  ○ Development team initiatives
➢ Conclusion

Slide 10

> Introduction

Slide 11

Why did we perform re-architecting? Seven years had passed since the project started, and a lot of technical debt had accumulated.

Slide 12

Why did we perform re-architecting? The negative effects of the technical debt, which could affect the business, had become impossible to ignore.

Slide 13

Why did we perform re-architecting? e.g. If a system failure occurs during a high-traffic period, users can't read the comics.

Slide 14

Why did we perform re-architecting? e.g. Exhausted engineers quit after days spent battling technical debt and troubleshooting system failures.

Slide 15

We want to work on it. But...
➔ We can't stop feature development
➔ We can't stop the backend system, which runs 24/7
➔ There's no time to pay off the technical debt
➔ Almost no one involved in the initial development of the system is still around

Slide 16

Therefore... There is no choice but to keep making steady improvements, little by little.

Slide 17

> How did we perform re-architecting?

Slide 18

Objectives: Increase system availability and maintainability. Create a new implementation policy and de facto standard.

Slide 19

What we'll talk about: the scope of the re-architecture (diagram: User → Web Browser / iOS App / Android App → Backend Application → Infrastructure)


Slide 21

How did we perform re-architecting?
❖ There were issues with both the infrastructure and the backend application.
➢ Difficult to solve by refactoring the application alone
❖ We decided to improve the infrastructure first, and then the application.

Slide 22

>> Improving the Infrastructure

Slide 23

Legacy infrastructure and release flow
➔ Most of the environment was built on AWS by hand, the rest with Chef
➔ Artifacts were created and deployed with Fabric
➔ The backend application ran on Amazon EC2 instances

Slide 24

Legacy infrastructure and release flow (diagram of the old system: a Manager Server (EC2 instance), App Servers (EC2 instances), and a Load Balancer)

Slide 25

Legacy infrastructure and release flow: settings were changed by applying Chef to the manually built infrastructure.

Slide 26

Legacy infrastructure and release flow: after a git push to the remote repository, CI runs on the manager server and produces a JAR file.

Slide 27

Legacy infrastructure and release flow: the JAR is deployed to the EC2 instances with Fabric.

Slide 28

Legacy infrastructure and release flow
➔ Auto Scaling was not possible
➔ The Chef and Fabric setups had become "secret sauce" and were practically unmaintainable
➔ The existing infrastructure was not reproducible

Slide 29

Decided to use Amazon EKS: moving to a Kubernetes cluster.

Slide 30

Decided to use Amazon EKS
❖ We were already using a lot of AWS services
➢ After testing, we determined that a managed cluster was practical and something we could operate ourselves
❖ The team had engineers familiar with Cloud Native technology
➢ Kubernetes itself (though not EKS) had already been introduced elsewhere in the company

Slide 31

Preparing for the infrastructure migration
❖ Re-examined the configuration
➢ Added support for Auto Scaling
❖ Used Terraform for configuration management
➢ The configuration is managed with declarative definitions
❖ Prepared load-test scenarios with Gatling
➢ Added the ability to simulate the load on the system in production
➢ To verify that the newly built environment meets the required specifications

Slide 32

How we migrated the infrastructure (diagram: the Old System serves example.com; a git push to GitLab triggers CI, which creates a JAR file). Note: the database and other shared resources are omitted.

Slide 33

The application build now generates Docker images (diagram: a New Cluster is prepared alongside the Old System; a git push to GitLab triggers CI, which pushes images to ECR). Note: the database and other shared resources are omitted.

Slide 34

App launch, load testing, and client integration testing (diagram: load tests with Gatling and integration tests are run against the newly built cluster while the Old System still serves example.com). Note: the database and other shared resources are omitted.

Slide 35

Canary release while monitoring the system (diagram: DNS is switched gradually, shifting traffic to the New Cluster while watching for problems). Note: the database and other shared resources are omitted.

Slide 36

Traffic is fully switched over by DNS and the migration is complete (diagram: all traffic for example.com now reaches the New Cluster). Note: the database and other shared resources are omitted.

Slide 37

Modern infrastructure and release flow (diagram of the system after migration: Kubernetes Cluster, GitLab, Amazon ECR, Load Balancer)

Slide 38

Modern infrastructure and release flow: a git push to GitLab triggers CI, which pushes a Docker image to ECR.

Slide 39

Modern infrastructure and release flow: GitLab's CI/CD runs helmfile apply to update the manifests.

Slide 40

Modern infrastructure and release flow: the cluster pulls the image from ECR and performs a rolling update, completing the deployment.

Slide 41

Infrastructure improvements, before/after (the technology stack used to build, configure, and operate the infrastructure):

                         Before          After
  Infrastructure         EC2 instances   Kubernetes
  Infrastructure as Code Manual (Chef)   Terraform
  Deploy                 Fabric          Helmfile (Helm)
  Artifact               JAR file        Docker image

Slide 42

Results of improving the infrastructure
❖ Splitting the backend application is now more flexible
➢ Terraform allows us to rebuild our systems from the infrastructure up
➢ The current team members now have experience building the infrastructure from scratch
❖ It led to reduced operational costs
➢ All application engineers can now manage the infrastructure as well

Slide 43

>> Improving the Application

Slide 44

Our monolithic application's issues
➔ A single sbt project
➔ A roughly layered architecture
➔ Tightly coupled to the Play framework

build.sbt:

  name := "API Server"
  version := "1.0.0"
  scalaVersion := "2.11.8"

  lazy val root = (project in file("."))
    .enablePlugins(PlayScala)

  libraryDependencies ++= Seq(
    // ...
  )
  scalacOptions ++= Seq( /* ... */ )
  javaOptions ++= Seq( /* ... */ )

Slide 45

Our monolithic application's issues (package structure: UI (JSON) → Application → Domain → Infrastructure)
➔ A single sbt project
➔ A roughly layered architecture
➔ Tightly coupled to the Play framework

Slide 46

Our monolithic application's issues
➔ The domain model depended on the infrastructure layer
➔ Odd DI patterns (like Service Locator) everywhere
➔ A complex multi-stage cache
➔ Limited release windows
➔ etc...

Slide 47

Our monolithic application's issues: from Monolith to Modular Monolith.

Slide 48

Why didn't we go with microservices?
❖ It was difficult given the team structure.
❖ The aggregates had not been analyzed sufficiently.
➢ We decided the first step was to better define the application's context boundaries.
❖ There were many other considerations
➢ The first priority is feature development; next, getting used to developing and operating on the new infrastructure.
➢ Technology verification is still underway.

Slide 49

Analyzing the application's context boundaries
❖ We re-analyzed what features our monolithic Backend Application (a single sbt project) actually provides

Slide 50

Analyzing the application's context boundaries
❖ For each feature of the Backend Application (Manga Management, Manga Distribution, Foo, Bar, ...), we decided whether or not to split it out, based on its independence and release cycle.

Slide 51

Make it an independent repository
❖ A feature that can stand alone, i.e. can have its own release cycle (like the Foo feature), is carved out of the Backend Application (sbt project) into its own Foo Application (sbt project)
➢ In a separate git repository
➢ With a different release cycle

Slide 52

Make it an independent sub-project
❖ As a result, we were left with features that are independent as features but that we want to release together (Manga Management, Manga Distribution, Bar).

Slide 53

Make it an independent sub-project
❖ What remained was split into sbt sub-projects (Manga Management, Manga Distribution, Bar), for dependency control and independence.

Slide 54

Make it an independent sub-project
❖ Easy to manage dependencies between features
❖ Localized settings
❖ Easier to test and verify behavior

build.sbt:

  lazy val root = project
    .aggregate(
      mangaManagement,
      mangaDistribution,
      bar
    )

  lazy val mangaManagement = project
  lazy val mangaDistribution = project
  lazy val bar = project
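To illustrate the "localized settings" and dependency-management points, here is a hedged build.sbt sketch (module names from the slides; the settings and the dependsOn edge are hypothetical, not the actual GANMA! build):

```scala
// build.sbt sketch: each sub-project carries only its own settings
lazy val root = project
  .aggregate(mangaManagement, mangaDistribution, bar)

lazy val mangaManagement = project
  .settings(
    // dependencies only this feature needs stay local to it
    libraryDependencies ++= Seq(/* management-only deps */)
  )

lazy val mangaDistribution = project
  .dependsOn(bar) // hypothetical: inter-feature dependencies are explicit and checked by sbt
  .settings(
    scalacOptions += "-Xfatal-warnings" // this module can be stricter than the others
  )

lazy val bar = project
```

With this layout, `sbt mangaDistribution/test` compiles and tests only that sub-project and the projects it depends on, which is what makes testing and verification per feature cheaper.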

Slide 55

More detailed re-architecting: each separated module was then re-architected further.

Slide 56

Updates are roughly completed within one aggregate

Backend Application:

  case class Manga(
    id: MangaId,
    title: String,
    // ...
  )

  trait MangaRepository {
    def store(manga: Manga): Future[Unit]
    def resolveBy(id: MangaId): Future[Option[Manga]]
    // ...
  }

Request:

  case class UpdateRequest(
    title: Option[String],
    subtitle: Option[String],
    // ...
  )

Response: OK

On the write side, an update is completed within a single aggregate: fetch the entity, modify it, persist it.
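A minimal, self-contained sketch of this fetch/modify/persist flow, using an in-memory repository (all names are hypothetical stand-ins, not the actual GANMA! code):

```scala
import scala.collection.mutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

final case class MangaId(value: String)
final case class Manga(id: MangaId, title: String)

trait MangaRepository {
  def store(manga: Manga): Future[Unit]
  def resolveBy(id: MangaId): Future[Option[Manga]]
}

// In-memory stand-in for the real persistence adapter
final class InMemoryMangaRepository extends MangaRepository {
  private val db = mutable.Map.empty[MangaId, Manga]
  def store(manga: Manga): Future[Unit] = Future.successful(db.update(manga.id, manga))
  def resolveBy(id: MangaId): Future[Option[Manga]] = Future.successful(db.get(id))
}

// The whole update is confined to one aggregate: fetch, modify, persist
final class UpdateMangaTitleUseCase(repo: MangaRepository) {
  def run(id: MangaId, newTitle: String): Future[Unit] =
    repo.resolveBy(id).flatMap {
      case Some(manga) => repo.store(manga.copy(title = newTitle))
      case None        => Future.failed(new NoSuchElementException(s"Manga not found: $id"))
    }
}

object UpdateDemo extends App {
  val repo = new InMemoryMangaRepository
  val useCase = new UpdateMangaTitleUseCase(repo)
  val id = MangaId("m-1")
  Await.result(repo.store(Manga(id, "Old Title")), 5.seconds)
  Await.result(useCase.run(id, "New Title"), 5.seconds)
  println(Await.result(repo.resolveBy(id), 5.seconds).map(_.title)) // prints Some(New Title)
}
```

Because the use case touches exactly one aggregate through one repository, no cross-aggregate consistency is needed on the write path.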

Slide 57

Slide 57 text

final case class Manga(id: MangaId, /* ... */) trait MangaRepository { // ... } final case class Author(id: AuthorId, /* ... */) trait AuthorRepository { // ... } final case class Page(id: PageId, /* ... */) trait PageRepository { // ... } Inefficient data acquisition from multiple aggregates Backend Application case class MangaResponse( title: String, authorName: String, authorProfile: String, pages: Seq[PageResponse], // ... ) Request: MagazineId=xxx レスポンスを組み立てる為に複数の集約から結果整合でデータを取得するので効率が悪い

Slide 58

Slide 58 text

final case class Manga(id: MangaId, /* ... */) trait MangaRepository { // ... } final case class Author(id: AuthorId, /* ... */) trait AuthorRepository { // ... } final case class Page(id: PageId, /* ... */) trait PageRepository { // ... } It was caching per entity Backend Application case class MangaResponse( title: String, authorName: String, authorProfile: String, pages: Seq[PageResponse], // ... ) Request: MagazineId=xxx Cache エンティティ毎にキャッシュをしていた

Slide 59

Slide 59 text

Separate models for reading and writing ➔ Use a common domain model ➔ API response consisting of multiple aggregates ➔ Will request multiple queries from the DB. Difficult to JOIN in SQL ➔ Discrepancies in data structures required by read/write are a factor 読み込み/書き込み、それぞれの場面で求められるドメインモデルは構造が異なる
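The read-side idea can be sketched like this (all names hypothetical): instead of composing three aggregates, the query side defines a flat read model shaped like the response and fetches it in one call.

```scala
// Read model shaped exactly like the API response, not like the write-side aggregates
final case class MangaDetailRecord(
  title: String,
  authorName: String,
  authorProfile: String,
  pageUrls: Seq[String]
)

// Query-side port: one call returns everything the response needs
trait MangaDetailDao {
  def findByMangaId(id: String): Option[MangaDetailRecord]
}

// In-memory stand-in for an adapter that would issue a single denormalized SQL query
final class InMemoryMangaDetailDao(rows: Map[String, MangaDetailRecord]) extends MangaDetailDao {
  def findByMangaId(id: String): Option[MangaDetailRecord] = rows.get(id)
}

object QueryDemo extends App {
  val dao = new InMemoryMangaDetailDao(Map(
    "m-1" -> MangaDetailRecord("Some Manga", "Some Author", "Profile...", Seq("p1.jpg", "p2.jpg"))
  ))
  println(dao.findByMangaId("m-1").map(_.title)) // prints Some(Some Manga)
}
```

The read model never feeds back into the write side, so it is free to match the response structure one-to-one.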

Slide 60

Slide 60 text

Separate models for reading and writing CQRS コマンドクエリ責務分離の導入

Slide 61

Slide 61 text

Our policy on CQRS ❖ Database is shared ❖ We want to keep the release cycle the same, so we'll separate it in build.sbt ❖ The sbt project setting is split based on the port/adapter pattern データベースは共有とした。ヘキサゴナルアーキテクチャをベースに sbt-project を分割

Slide 62

Slide 62 text

Our policy on CQRS - Command lazy val commandModel = project lazy val commandUseCase = project .dependsOn(commandModel) lazy val commandAdapterRDB = project .dependsOn(commandModel, commandUseCase) lazy val commandAdapterHTTP = project .dependsOn(commandModel, commandUseCase) lazy val commandMain = project .dependsOn(commandModel, commandUseCase, commandAdapterRDB, commandAdapterHTTP) commandAdapter(s) commandModel commandUseCase commandMain Command の sbt-project 定義と構成の概要

Slide 63

Slide 63 text

Our policy on CQRS - Query lazy val queryUseCase = project lazy val queryAdapterRDB = project .dependsOn(queryUseCase) lazy val queryAdapterHTTP = project .dependsOn(queryUseCase, queryAdapterRDB) lazy val queryMain = project .dependsOn(queryUseCase, queryAdapterRDB, queryAdapterHTTP) queryUseCase queryAdapter(s) queryMain Query の sbt-project 定義と構成の概要
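Separate from the build layout, the ports-and-adapters idea those sbt projects encode can be sketched in a few lines (hypothetical names): the use case depends only on a port trait, and Main wires in a concrete adapter.

```scala
// Port, owned by the use-case side
trait AccountReader {
  def findName(id: String): Option[String]
}

// Use case depends only on the port, never on a concrete adapter
final class GetAccountNameUseCase(reader: AccountReader) {
  def run(id: String): String = reader.findName(id).getOrElse("(unknown)")
}

// Adapter; in production this might be backed by an RDB, here an in-memory map
final class InMemoryAccountReader(rows: Map[String, String]) extends AccountReader {
  def findName(id: String): Option[String] = rows.get(id)
}

// "Main" is the only place that knows which adapter fills which port
object WiringDemo extends App {
  val useCase = new GetAccountNameUseCase(new InMemoryAccountReader(Map("a-1" -> "Aoyama")))
  println(useCase.run("a-1")) // prints Aoyama
}
```

In the build above, the port and use case would live in the useCase project, the implementations in the adapter projects, and the wiring in the Main project, so dependencies can only point inward.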

Slide 64

Slide 64 text

For migration, root depends on all sbt-projects lazy val root = project .dependsOn( // Manga Management mangaManagement, // Manga Distribution commandModel, commandUseCase, commandAdapterRDB, commandAdapterHTTP, commandMain, queryUseCase, queryAdapterRDB, queryAdapterHTTP, queryMain, // other bar ) .aggregate( /* ... */ ) 移行のため root は全ての sbt-project に依存させる

Slide 65

Slide 65 text

commandAdapter(s) For migration, root depends on all sbt-projects Domain Layer commandModel commandUseCase root queryUseCase queryAdapter(s) Service Layer Infra Layer Command Query Backend Application (sbt-project) 移行のため root は全ての sbt-project に依存させる

Slide 66

Slide 66 text

Migration to the new architecture ❖ We've made old projects dependent on new projects ❖ The API implementation was moved (re-implemented), categorized by command / query コマンド/クエリの分類をしながら古い実装を移動 (再実装)していった

Slide 67

Slide 67 text

Migration to the new architecture The rest is just a matter of time. Everything is fine ... あとは粛々と進めるだけ。全て順調 ...

Slide 68

Slide 68 text

Unspecified API responses ➔ There was no DTO and the response JSON was dynamically assembled ➔ The structure cannot be ignored in order to migrate the implementation without changing the behavior of the API ➔ There was a demand for this in the development of new API ➔ It's hard to documentation with OpenAPI レスポンス JSON が動的に組み立てられていた。構造をドキュメント化したいが OpenAPI は辛い

Slide 69

Slide 69 text

Original API Specification Language We developed Outer DSL to define the API specifications API 仕様を定義する独自の DSL を開発

Slide 70

Slide 70 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters. } response 200 { body { success: true data: AccountResponse } } response 404 {} } API 仕様記述言語の例

Slide 71

Slide 71 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters } response 200 { body { success: true data: AccountResponse } } response 404 {} } URL, Overview, and other API information API 仕様記述言語: URL や概要などの情報を記述

Slide 72

Slide 72 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters } response 200 { body { success: true data: AccountResponse } } response 404 {} } Request parameters. Headers, forms, query strings, etc. API 仕様記述言語: ヘッダーやフォーム、クエリ文字列などリクエストを定義

Slide 73

Slide 73 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters } response 200 { body { success: true data: AccountResponse } } response 404 {} } Response data. Status, Headers, body, etc API 仕様記述言語: ステータスやヘッダー、ボディの構造などレスポンスを定義

Slide 74

Slide 74 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters. } response 200 { body { success: true data: AccountResponse } } response 404 {} } Generate openapi: 3.0.0 ... paths: /api/v1/accounts/{id}: get: operationId: getAccount parameters: ... responses: '200': content: application/json: schema: properties: data: $ref: '#/components/schemas/AccountResponse' success: enum: - 'true' type: boolean required: - success - data type: object description: '' '404': ... API 仕様記述言語から OpenAPI の YAML を生成する

Slide 75

Slide 75 text

Original API Specification Language endpoint getAccount { GET /api/v1/accounts/{id} summary "Get Account Information" tags "account" request { // Some request parameters. } response 200 { body { success: true data: AccountResponse } } response 404 {} } Any user-defined type 任意のユーザー定義型を定義できる

Slide 76

Slide 76 text

Original API Specification Language type EmailAddress = String type AccountResponse = { name: String emailAddress: EmailAddress age?: Int32 } DSL - API Specification Language API 仕様記述言語で定義されたデータ構造から直接 Scala のコードを生成し、DTO として利用

Slide 77

Slide 77 text

Original API Specification Language type EmailAddress = String type AccountResponse = { name: String emailAddress: EmailAddress age?: Int32 } object generated { type EmailAddress = String final case class AccountResponse( name: String, emailAddress: EmailAddress, age: Option[Int] ) } Generate DSL - API Specification Language Generated Scala Code (DTO) API 仕様記述言語で定義されたデータ構造から直接 Scala のコードを生成し、DTO として利用
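A hedged sketch of why the generated DTO helps (the real project presumably serializes with a JSON library; the tiny hand-rolled toJson here is a hypothetical stand-in): once responses are built from the generated case class, the JSON structure can no longer drift from the spec, and the optional field is handled uniformly.

```scala
// The DTO as generated from the DSL (copied from the slide)
object generated {
  type EmailAddress = String
  final case class AccountResponse(
    name: String,
    emailAddress: EmailAddress,
    age: Option[Int]
  )
}

object JsonDemo {
  import generated._

  // Hypothetical stand-in for the project's real JSON serializer
  def toJson(r: AccountResponse): String = {
    val fields = Seq(
      Some("\"name\":\"" + r.name + "\""),
      Some("\"emailAddress\":\"" + r.emailAddress + "\""),
      r.age.map(a => "\"age\":" + a) // optional field is simply omitted when None
    ).flatten
    fields.mkString("{", ",", "}")
  }
}

object DtoDemo extends App {
  println(JsonDemo.toJson(generated.AccountResponse("Naoki", "a@example.com", age = None)))
  // prints {"name":"Naoki","emailAddress":"a@example.com"}
}
```

Since the DTO is generated, a change to the DSL file changes this case class, and any handler that no longer matches the spec fails to compile instead of silently emitting the wrong JSON.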

Slide 78

Slide 78 text

Original API Specification Language ❖ Save time on specification definition ❖ No need to test the JSON structure, so less testing time is required ❖ The generated DTO makes it easier to implement new features DSL を開発したことで仕様定義やテスト、 API の移行や新機能の開発について工数を削減できた

Slide 79

Slide 79 text

Result of Application Improving ❖ Dividing it up by feature has made it easier to estimate the work ➢ Reduced CI and release time ❖ Open to expand and close to modify compared to before re-architecture ➢ It also led to a system for inter-service communication in the k8s cluster 分割したことで作業見積もりのしやすさや CI/CD の時間削減、機能の追加変更に強くなった

Slide 80

Slide 80 text

> To avoid adding to Technical debt

Slide 81

Slide 81 text

Prevention is also important Even if you do your best to repay your technical debt, there is no point in adding it faster than that. 技術的負債を増やさないよう予防する

Slide 82

Slide 82 text

Focus on design and documentation ❖ Flexible development flow based on the size of the development item ➢ Design is always necessary, but you can flexibly switch between them depending on the scale of the project and the time it takes to work. ➢ If it is complicated, implement user story mapping etc ❖ Incorporated the definition of communication format by DSL into the flow ➢ Developers can now seamlessly define specifications between server and client 規模感に応じた設計/開発の流れを柔軟に。API 仕様定義 DSL を開発フローに組み込んだ

Slide 83

Slide 83 text

An effort called “Camp” Happy “Camp” time 楽しい “キャンプ” の時間


Slide 86

Slide 86 text

An effort called “Camp” ❖ It’s like pre-season training for a baseball team. ➢ It is not a Camping ➢ Held once a quarter, two weeks ❖ Do “Not Urgent but Important” tasks. ➢ No feature development tickets will be implemented. ❖ Contributes not only to repayment of technical debt, but also to elimination of events that could become debt in the future 四半期に一度「重要だけど緊急でない」作業に二週間がっつり取り組む通称「キャンプ」を実施

Slide 87

> Conclusion

Slide 88

Conclusion: we chose to re-architect and to keep making steady improvements.

Slide 89

Slide 89 text

Result of Infrastructure Improving (re-post) ❖ The division of the Backend Application is now more flexible ➢ Terraform allows us to rebuild our own systems from the infrastructure ➢ Existing members of the team are now experienced in building initial infrastructure ❖ It led to reduced operational costs ➢ All application engineers can now manage the infrastructure as well アプリケーション分割に自由度が生まれ、再構築が可能になり、コスト削減にもつながった

Slide 90

Slide 90 text

Result of Application Improving (re-post) ❖ Dividing it up by feature has made it easier to estimate the work ➢ Reduced CI and release time ❖ Open to expand and close to modify compared to before re-architecture ➢ It also led to a system for inter-service communication in the k8s cluster 分割したことで作業見積もりのしやすさや CI/CD の時間削減、機能の追加変更に強くなった

Slide 91

Slide 91 text

Achieve our objectives Increase system availability and maintainability Create a new implementation policy and de facto standard 可用性と保守性を高める事に成功し、新しい実装の方針、デファクトスタンダードを確立した

Slide 92

Slide 92 text

Conclusion ❖ As a result of accumulating small improvements, we have achieved great results ➢ We didn't choose to system replace or big-rewrite ➢ It took a while, but I was able to see and feel the changes ❖ Re-architecture given us a flexible system ➢ The groundwork has been laid for the introduction of advanced technology. ➢ This is passage. To be continued… ❖ We have an ongoing system in place to confront technical debt ➢ An approach from both a technical debt repayment and prevention perspective. 小さな改善を積み重ね、柔軟なシステムを得た。技術的負債と向き合う体制も整備。改善はつづく