History of Falcon, the way to production release

Junichi Kato (@j5ik2o)
The history of failures and successes on the way to releasing "Falcon", ChatWork's Scala-based product

Self Introduction
• Approximately 6 years of Scala experience.
• A backend software engineer developing the business chat service ChatWork.
• Responsible for architecting and developing the ChatWork backends.

Agenda
• The history of the "Falcon" project, which was released at the end of 2016
  ◦ The "ChatWork" chat service
  ◦ History of the Falcon project
    ▪ Phase-1: Live Migration Project
    ▪ Rebooting Falcon
    ▪ POC (Proof of Concept)
    ▪ Phase-2: Production Development
    ▪ DevOps
    ▪ Finally Released
  ◦ Conclusion

About ChatWork
• ChatWork is a business chat service that replaces email and personal chat.
• Number of clients
  ◦ 124,000 companies (as of the end of January 2017)
• Countries / regions
  ◦ 205
• Best of Business Chat
• Supports iOS, Android, and Web
• ISO 27001 (ISMS) and ISO 27018 certified
• Functions
  ◦ Group Messaging
  ◦ Task Management
  ◦ File Sharing
  ◦ Video Conferencing

Scale of User Generated Data
Rapid increase in messages!
Number of     5th Anniversary    6th Anniversary
Chat Rooms    2.4 million        4.2 million
Messages      1 billion          1.8 billion
Tasks         37 million         60 million
Files         64 million         133 million

Background of Developing ChatWork
• In 2010, ChatWork was developed as an internal product, built on a PHP framework.
• Development driven by business opportunities led to technical debt.
• The system could no longer support the increasing data and load.

Way to re-implementation
• Incidents caused by the technical debt: delayed deliveries, system outages caused by a SPoF, increasing workloads, etc.
• After that, the technical debt was partially reduced, but those were only stopgap countermeasures.
• Eventually, we decided to re-implement the system, because extending it any further had become too difficult.
• Of course, that is not easy.

We chose Scala
• Scala won at our training camp.
• The reasons:
  ◦ High maintainability and performance
  ◦ Success stories of moving from dynamic typing to static typing
  ◦ The AWS SDK for Java is the most complete.
  ◦ Scala is a good fit for the real-time processing that chat requires.
  ◦ Even PHP engineers were able to start coding in Scala quickly.

I joined ChatWork
• In July 2014, I joined ChatWork for the migration to Scala.
• Approximately 6 years of Scala experience.
  ◦ REST API server with Play2, for a VOD service
  ◦ Chat server with Finagle and Akka
• After that, we started the server-side project that adopted Scala at ChatWork.

Phase1: Live Migration Project

Phase1: Strategies for Migrating Architecture
• Minimize the impact on the stable legacy system.
  ◦ Avoid modifying existing code as much as possible.
  ◦ Migrate to the new system without maintenance downtime.
  ◦ Don't migrate existing data.
• Include rooms, messages, tasks, files, and contacts in the function scope.

Phase1: Our Project Team Structure
• Since 07/2014
• Team structure (19 members in total)
  ◦ Falcon Team (new server side in Scala)
    ▪ 8 members (I belong to this team.)
  ◦ Phoenix Team (legacy service side in PHP)
    ▪ 5 members
  ◦ iOS Team (new-version iOS application)
    ▪ 6 members
• Note: These are the final head counts. The teams grew as Scala engineers were hired.

Phase1: Function Scope
• Chat Room (a collection of messages)
  ◦ Creating a chat room or updating its metadata
  ◦ Posting, updating, and deleting messages
  ◦ Adding and removing members, and modifying their roles
  ◦ Uploading and deleting files
  ◦ Adding, updating, and deleting tasks
• Contact (a connection between users)
  ◦ Requesting, rejecting, and approving contacts
• FalconID (a 64-bit ID generated by distributed id-workers)
  ◦ Generating 64-bit IDs with distributed id-workers
  ◦ Mapping old IDs to the new IDs
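The slides do not say how a FalconID is laid out, so the following is only a sketch of one common way to build a 64-bit ID from distributed workers (a Snowflake-style scheme), written in Python for brevity rather than in Falcon's Scala. The bit widths, the epoch, and the names `IdWorker` and `to_falcon_id` are assumptions made for illustration.

```python
import threading
import time

# Assumed layout (not from the slides): milliseconds since a custom epoch,
# then 10 bits of worker id, then 12 bits of per-millisecond sequence.
WORKER_BITS = 10
SEQUENCE_BITS = 12
EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch


class IdWorker:
    """Generates roughly time-ordered 64-bit IDs without central coordination."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


def to_falcon_id(old_id, mapping, worker):
    """Map a legacy ID to a new ID, allocating one the first time it is seen."""
    if old_id not in mapping:
        mapping[old_id] = worker.next_id()
    return mapping[old_id]
```

Because each worker embeds its own id, workers never need to talk to each other to avoid collisions. The mapping table, on the other hand, has to be stored and looked up for every legacy ID, which is where a mapping cost would come from.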

Phase1: Architecture Overview
• The existing "master data" remained persisted in the RDS of the legacy system. Access to the master data goes through the Phoenix API.
• Falcon receives IOEvents that occur in the legacy system. With an IOEvent as the trigger, the system constructs the event to be delivered by the stream and the model cache in DynamoDB.
• The new client uses the Falcon external API and stream API.
• internal-api performs ID generation and ID mapping.

Phase1: Context Map of DDD
• The downstream customer depends on the upstream supplier.
• At planning time, the downstream behaves as the customer of the upstream. At run time, the upstream behaves as the interface supplier.
• In practice, communication among our teams was very complicated. It was a difficult problem on top of the technical issues.
Diagram: Falcon (as Customer, Supplier); iOS Team (as Customer); ChatWork Web (as Supplier); Phoenix (as Customer/Supplier)

Phase1: Various Problems that occurred
• Specification and implementation side
  ◦ Missing specifications surfaced one after another.
  ◦ Excessive DynamoDB I/O cost due to overused secondary indexes.
  ◦ The Phoenix API server was more heavily loaded than expected.
  ◦ High ID mapping cost.
  ◦ Performance limits of the managed services.
• Project side
  ◦ The project definition was ambiguous.
  ◦ The review of each sprint was insufficient.
  ◦ Integration testing between subsystems was delayed.
  ◦ Exhaustion due to long-term development.
• Scala itself caused no major problems. The real problems were project management, function scope, and performance.

Phase1: Make a Tough Decision
• The project was rescheduled repeatedly around 2015 and was eventually suspended in January 2016.
• We reviewed why we failed.
• There were many problems, but good results were also obtained.
  ◦ The size and complexity of our challenge were reconfirmed concretely.
  ◦ A strong team was formed to solve complex issues.
  ◦ Our practice with Akka and DDD deepened. In particular, we wanted to apply Akka's capabilities more effectively to our applications.

Rebooting the Project
• We welcomed a new leader, and project management and strategy were completely revised.
• New project strategy
  ◦ Build a robust architecture suitable for an infrastructure system.
  ◦ Clarify the business and technology issues to be solved.
  ◦ A POC is a MUST.
  ◦ Clarify the final non-functional requirements.
    ▪ Decrease infrastructure cost by 30%
    ▪ 15 billion messages / month
    ▪ 500k writes/s, 5000k reads/s (100 times the legacy system)
  ◦ Data migration with downtime was accepted instead of live migration, to cope with the rapidly increasing data volume.

POC: Objective of "Proof of Concept"
• POC Bootcamp (2016/1)
  ◦ Each member prototyped and reviewed their own "My Best Falcon Application".
• Properties that the system should satisfy
  ◦ Scalability (high throughput, low latency)
  ◦ Resiliency (no SPoF, backoff recovery)
  ◦ Twice the number of concurrent connections and R/W throughput
  ◦ Low cost
  ◦ Functionality (based on DDD)
• Requirements
  ◦ AWS
  ◦ CQRS + Event Sourcing
  ◦ Reactive Systems

POC: Verification for Risk Hedging
• Since 2/2016
• The target scope was the messaging function, including chat rooms and members.
• CQRS+ES was adopted as the architecture because, by the nature of chat, read requests outnumber write requests.
  ◦ akka-http, akka-actor, akka-stream, akka-persistence(-query)
  ◦ Our components are write-api, read-api, and read-model-updater.
  ◦ The layered architecture of our applications is Hexagonal Architecture.
• Infrastructure and middleware
  ◦ AWS EC2, ELB
  ◦ The deployment tool is Lightbend ConductR.
  ◦ The write DB is Cassandra; the read DB is Aurora.
    ▪ These DBs were selected as a temporary option because they are easy to use with Akka. Other options were chosen for production.

POC: Architecture Overview
• The Write API uses ClusterSharding and PersistentActor as the Aggregate.
• The Aggregate generates domain events from the received commands, then appends them to the write DB.
• ReadModelUpdater consumes domain events and constructs read models asynchronously.
• The Read API is non-clustered and stateless, and has functions that return flattened read models.
• The application has the multiple layers (Interface, UseCase, Domain, etc.) of "Hexagonal Architecture", and the layers are composed with the stream DSL of akka-stream.
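The command-to-event-to-read-model flow described on this slide can be sketched without any framework. The following is a minimal, in-memory illustration in Python, not the Akka-based implementation from the POC; the names `PostMessage`, `MessagePosted`, `RoomAggregate`, `write_api`, and the list used as a journal are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PostMessage:
    """Command sent to the write side."""
    room_id: str
    message_id: int
    body: str


@dataclass(frozen=True)
class MessagePosted:
    """Domain event recorded in the append-only journal."""
    room_id: str
    message_id: int
    body: str


@dataclass
class RoomAggregate:
    """Write side: validates commands and turns them into domain events."""
    room_id: str
    seen: set = field(default_factory=set)

    def handle(self, cmd):
        if cmd.message_id in self.seen:
            raise ValueError("duplicate message id")
        return [MessagePosted(cmd.room_id, cmd.message_id, cmd.body)]

    def apply(self, event):
        self.seen.add(event.message_id)


class ReadModelUpdater:
    """Read side: consumes the journal from its own offset, independently of writes."""

    def __init__(self):
        self.offset = 0
        self.messages_by_room = {}

    def catch_up(self, journal):
        for event in journal[self.offset:]:
            self.messages_by_room.setdefault(event.room_id, []).append(event.body)
        self.offset = len(journal)


def write_api(aggregates, journal, cmd):
    """Route a command to its aggregate and append the resulting events."""
    aggregate = aggregates.setdefault(cmd.room_id, RoomAggregate(cmd.room_id))
    events = aggregate.handle(cmd)
    journal.extend(events)  # the write DB only ever receives appends
    for event in events:
        aggregate.apply(event)
```

In this sketch the read model lags behind the journal until `catch_up` runs, which mirrors the asynchronous update described above: writes stay cheap appends, and reads are served from a structure shaped for queries.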

POC: Result of POC (1/2)
• Instance types
  ◦ c3.2xlarge (vCPU = 8, Mem = 15 GB)
  ◦ Cassandra (m3.xlarge x 3)
  ◦ Aurora (db.r3.2xlarge, writer x 1, replica x 2)
• Throughput (from write to read)
  ◦ Random requests
  ◦ About 5,000 concurrent users
  ◦ Almost linear; scale-out is possible.
  ◦ Zero KOs.
• Posting messages
  ◦ 3 nodes, 2,000 concurrent users, 2,000 rps (120k rpm); 90th-percentile response time is at most 30 ms!

POC: Result of POC (2/2)
• Our adoption of Akka Cluster had too many operational problems to bring it to production service level within a short period of time.
  ◦ How do we solve the split-brain problem across 2 AZs? It's impossible.
  ◦ For our requirements, stateful actors were overkill and had a high operational cost.
    ▪ Stateful actors are not effective because old data is rarely retrieved.
    ▪ Stateful actors require ClusterSharding.
  ◦ Other methods with lower operational costs could also satisfy our requirements.
• Cassandra
  ◦ Re-creating a failed node was estimated to take 24 hours.
  ◦ Data distribution by DHT and virtual nodes is not intuitive and is difficult to understand.
• Aurora
  ◦ Write performance does not scale well with a single master. Sharding can solve this but requires expensive development and operation.

Phase2: Production Development

Phase2: Re-Architecture from POC
• akka-cluster was not adopted, in order to reduce operational cost; the APIs use stateless actors instead.
• For the write DB, Kafka replaced Cassandra.
  ◦ Straightforward append-only storage for domain events, with great produce/consume throughput
• For the read DB, HBase replaced Aurora.
  ◦ Automatic sharding by row key at the storage level; the master/slave configuration is intuitive and easy to understand.
  ◦ The underlying HDFS is fault tolerant and easy to manage.
• Focused only on the messaging system
  ◦ The core function on which many features depend (e.g. tasks, files)
  ◦ The highest business risk
  ◦ The largest business opportunity

Phase2: Our Project Structure
• Since 7/2016
• Team structure (11 members in total)
  ◦ Falcon Team
    ▪ 4 members (I belong to this team.)
  ◦ Data Migration Team
    ▪ 1 member
  ◦ Sparrow Team (legacy service side in PHP)
    ▪ 3 members
  ◦ Infrastructure Team
    ▪ 3 members
• Note: These have been the members since the early stages.

Phase2: Architecture Overview
• Concept
  ◦ A backend service providing the messaging function to the legacy system.
  ◦ The storage selection was changed, but CQRS+ES was kept.
• Components
  ◦ ReadModelUpdater uses Kafka Streams.
  ◦ Sparrow is a mediator system bridging Falcon and the legacy system.
  ◦ SparrowForwarder (sparrow-forwarder) propagates the domain events destined for the legacy system to Sparrow.

Phase2: Context Map of DDD
• A simpler context map than in Phase 1.
• The inter-team communication structure became simpler as well.
Diagram: Web, iOS, Android (as Existing Customer); Falcon (as Supplier); ChatWork (contains Sparrow) (as Customer/Supplier)

Phase2: Results of Stress Test
• System configuration
  ◦ c3.xlarge (vCPU = 4, Mem = 7.5 GB) * 7
    ▪ Write API * 2, Read API * 4
    ▪ ReadModelUpdater * 2, SparrowForwarder * 2
• Post Message API
  ◦ 3,000 concurrent users, mean throughput 2.6K req/s (95th-percentile latency 104 ms)
    ▪ Max 70 req/s on the existing system (37 times the throughput)
• Get Message API
  ◦ 1,340 concurrent users, mean throughput 1.2K req/s (95th-percentile latency 62.9 ms)
    ▪ Max 1.3K req/s on the existing system
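The "37 times" figure follows directly from the numbers on this slide. As a quick check, in Python, using only the slide's values (the variable names are ours, and the comparison is Falcon's mean throughput against the existing system's maximum, as stated on the slide):

```python
# Throughput figures from the stress-test slide, in requests per second.
post_new_rps = 2600     # Falcon, Post Message API, mean
post_legacy_rps = 70    # existing system, Post Message API, max
get_new_rps = 1200      # Falcon, Get Message API, mean
get_legacy_rps = 1300   # existing system, Get Message API, max

post_ratio = post_new_rps / post_legacy_rps
get_ratio = get_new_rps / get_legacy_rps

print(f"Post Message: {post_ratio:.1f}x")  # prints "Post Message: 37.1x"
print(f"Get Message:  {get_ratio:.2f}x")   # prints "Get Message:  0.92x"
```

The gain measured in this test is concentrated on the write path; measured read throughput is roughly on par with the existing system's maximum.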

Phase2: Data Migration (1/2)
• The Data Migration project aimed to migrate message data from Aurora to HBase. Minimizing service downtime was the most important mission.
• With that in mind, the migration strategy was decided as follows:
  ◦ Basic Migration
    ▪ All data except the 4 days before the final maintenance.
  ◦ Incremental Migration
    ▪ For INSERTs, the difference is based on the ID increase since the previous migration.
    ▪ For UPDATEs, the difference is based on the binlog since the previous migration.
  ◦ Verification After Migration
    ▪ Column data on HBase is checked against column data on Aurora.
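The three steps above can be sketched with plain dictionaries standing in for the two stores. This is an illustration in Python, not the Spark job used in the project: the function names are hypothetical, the basic migration's time-based cutoff is approximated by an ID cutoff, and a list of `(row_id, column, value)` tuples stands in for the binlog.

```python
def basic_migration(source, target, cutoff_id):
    """Copy every row whose ID is at or below the cutoff; return the cutoff."""
    for row_id, row in source.items():
        if row_id <= cutoff_id:
            target[row_id] = dict(row)
    return cutoff_id


def incremental_migration(source, target, last_id, update_log):
    """Copy rows inserted after last_id, then replay logged updates."""
    new_ids = [row_id for row_id in source if row_id > last_id]
    for row_id in new_ids:
        target[row_id] = dict(source[row_id])
    for row_id, column, value in update_log:  # stands in for the binlog
        if row_id in target:
            target[row_id][column] = value
    return max(new_ids, default=last_id)


def verify(source, target):
    """Return the (row_id, column) pairs whose values differ between the stores."""
    mismatches = []
    for row_id, row in source.items():
        copied = target.get(row_id, {})
        for column, value in row.items():
            if copied.get(column) != value:
                mismatches.append((row_id, column))
    return mismatches
```

A run would copy the bulk of the data first, apply the incremental step during the maintenance window, and expect `verify` to return an empty list. Keeping the bulk copy outside the window is what keeps the downtime short.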

Phase2: Data Migration (2/2)
• Data migration engine
  ◦ Spark
• Performance
  ◦ Execution of Basic Migration
    ▪ 3.5 hours (1.6 billion messages, 60 million chat rooms)
  ◦ Verification of Basic Migration
    ▪ 7.5 hours
  ◦ Execution of Incremental Migration
    ▪ 1 hour
  ◦ Verification of Incremental Migration
    ▪ 1 hour

DevOps: Improving Development Efficiency
• Existing issues
  ◦ It isn't easy for developers to flexibly construct infrastructure for application development, because it requires collaboration with infrastructure personnel. That collaboration needs to be made more efficient, and the design of deployment, provisioning, scaling, etc. needs to be flexible.
• Countermeasures
  ◦ coreos/kube-aws was adopted.
    ▪ kube-aws is a tool, with installation artifacts, for Kubernetes on AWS, developed by CoreOS.
      • Create, update, and destroy Kubernetes clusters on AWS
      • Highly available and scalable Kubernetes clusters backed by multi-AZ deployment and Node Pools
      • Powered by various AWS services including CloudFormation, KMS, Auto Scaling, Spot Fleet, EC2, ELB, S3, etc.
  ◦ concourse/concourse was adopted.
    ▪ Concourse is a pipeline-based CI system written in Go, developed by Pivotal. It treats build pipelines and artifacts as first-class citizens.
    ▪ In the ThoughtWorks Technology Radar of 11/2015, Concourse CI is listed among the tools in the 'ASSESS' category.

DevOps: Falcon Infrastructure by kube-aws
• kubelet is the primary "node agent" that runs on each node.
• kube-proxy runs on each node.
• The API server validates and configures data for the API objects, which include pods, services, replication controllers, and others.
• A Pod is a group of one or more containers, the shared storage for those containers, and options about how to run the containers.
• Falcon applications are deployed as Pods via Helm (the package manager for k8s).

DevOps: Concourse CI (1/2)
• Core concepts
  ◦ The end goal of Concourse is to provide an expressive system with as few distinct moving parts as possible.
• Resources
  ◦ A resource is any entity that can be checked for new versions, pulled down at a specific version, and/or pushed up to idempotently create new versions.
• Jobs
  ◦ At a high level, a job describes some actions to perform when dependent resources change (or when manually triggered).
Diagram: Build Job, Git Resource, Deploy Job

DevOps: Concourse CI (2/2)
• Tasks
  ◦ A task is the execution of a script in an isolated environment with dependent resources available to it.
Diagram: Build Task, Notification Task

Finally Released
• The final release started at midnight on December 29th, 2016, and finished 7 hours later. It succeeded!
• We are grateful for the cheering messages from the Scala community. Thank you very much!
• Performance after release
  ◦ As expected, Falcon achieves high throughput, low latency, and resiliency.
  ◦ Improvements toward the final goal will continue.

Conclusion
• Falcon was released after many twists and turns.
• Success factors
  ◦ Clarification of the project strategy
    ▪ The technical methods for achieving the project's goal were clarified.
  ◦ Risk hedging by POC
    ▪ Verifying the potential of CQRS + ES with Akka
  ◦ Re-architecture from the POC
    ▪ Review with consideration of operational costs
    ▪ Limiting the function scope
  ◦ Data migration accepting downtime
  ◦ Improving development efficiency with k8s and concourse-ci
• As a result, we succeeded in adopting an excellent architecture (CQRS+ES, Akka, Kafka, HBase) based on the verification.

Thank you for listening!
