Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

Self Introduction › Server-side software engineer on LINE Official Account Manager › Previously developed Smart Channel, LINE Ads, and LINE Points Ads › Joined LINE in 2015 as a new graduate

Slide 3

Slide 3 text

National Survey for COVID-19

Slide 4

Slide 4 text

National Survey for COVID-19 › Conducted 5 surveys › Provided the answer data to the Ministry of Health, Labour and Welfare › Surveyed respondents' physical condition and awareness of infection prevention

Slide 5

Slide 5 text

Numbers of the Survey › Message recipients: 83M › Preparation days: 6 › Answers: 25M › As of March 2020

Slide 6

Slide 6 text

Background › In March, the number of positive cases was increasing in Japan › Data was needed › [Chart: number of positives in Japan, 2/1 to 4/4, from the Open Data by the Ministry of Health, Labour and Welfare (https://www.mhlw.go.jp/stf/covid-19/open-data.html). Marked events: March 25, the project was started; Tokyo residents were asked to stay home on weekends; April 7, the state of emergency was declared]

Slide 7

Slide 7 text

What Can We Do? › Collect a huge amount of data in a short period › Take the initiative in the survey

Slide 8

Slide 8 text

Options › Collaborate with other companies: needs a lot of time to communicate with partners › Use our existing service (LINE Research): does not meet the purpose › Develop a new system from scratch: can be designed specifically for the survey

Slide 9

Slide 9 text

Timeline › Day 1: Start project › Day 2: Start development › Day 4: Finish development (only 3 days) › Day 5: Start QA › Day 6: Finish QA › Day 7: Release

Slide 10

Slide 10 text

Development Team › Core team: server-side, front-end, planner › Supported by: infrastructure engineers, security engineers, data scientists, DBAs, engineers of collaborating services, and more

Slide 11

Slide 11 text

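Related Articles (in Japanese) › Front-end development: "How Do You Develop a Survey for 83 Million LINE Users in One Week? LINE's Initiatives During the COVID-19 Pandemic and Front-end Development" › https://logmi.jp/tech/articles/322999 › Product management: "Six Days to Release: Behind the Scenes of LINE's 'National Survey for COVID-19 Countermeasures'" › https://careerhack.en-japan.com/report/detail/1422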

Slide 12

Slide 12 text

Development Challenges › Develop rapidly › Handle high traffic › Never let trouble stop the system

Slide 13

Slide 13 text

Policy to Develop in 3 Days › Develop a simple system with minimal specifications › Develop the system step by step › Avoid system trouble by design

Slide 14

Slide 14 text

First Step: Survey Page / Answer Store

Slide 15

Slide 15 text

User Experience › Message → Survey Page → Thanks Page › The message appears in the chat room (Flex Message) › The survey page and the thanks page open in the in-app browser (LIFF platform)

Slide 16

Slide 16 text

First Idea (Not Adopted) › [Diagram: Client → LB → App Server running a web app with master data, which serves the survey page and receives the answers to write to the answer store] › Concerns › Time to implement › Stability under high traffic › To address them › Reduce the amount of code › Reduce the implementation time › Reduce unexpected behaviors

Slide 17

Slide 17 text

First Step: Survey Page / Answer Store › [Diagram: Client → LB → App Server (nginx), with the answer store; when the client opens the survey page, nginx serves it as a static file]

Slide 18

Slide 18 text

Sending Answers › Message → Answer Page → Thanks Page › Entry event: sends the answer to the first question › Submit event: sends all answers

Slide 19

Slide 19 text

First Step: Survey Page / Answer Store › [Diagram: Client → LB → App Server (nginx); the client opens the survey page and sends answers, and nginx stores the answers in its access log, which serves as the answer store]

Slide 20

Slide 20 text

Access Log Format › Nginx config:

```nginx
log_format main "request_id:$request_id" "\t"
                "remote_addr:$remote_addr" "\t"
                "real_ip:$http_x_true_userip" "\t"
                "msec:$msec" "\t"
                "server_protocol:$server_protocol" "\t"
                "method:$request_method" "\t"
                "scheme:$scheme" "\t"
                "host:$host" "\t"
                "path:$request_uri" "\t"
                "status:$status"
                # ...
                "\t" "request_body:$request_body";
```

Actual log entry (the answer JSON is a field in LTSV):

```
request_id:b58cbf9c66a55a4a7b79e7f18906badd	remote_addr:…	real_ip:…	msec:1585441263.563	server_protocol:HTTP/1.1	method:POST	scheme:https	host:covid19.line-apps.com	path:/api/survey	status:200	…	request_body:{\x22userId\x22:\x22ue216a4b17f4946f19e8f47889830f275\x22,…,\x22body\x22:{\x22a4\x22:\x221\x22,\x22a5\x22:\x223\x22,\x22a6\x22:[\x221\x22],\x22a7\x22:[\x222\x22],…}}
```
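To make the log format concrete, here is a minimal Python sketch of reading one such line back into an answer. It assumes nginx's default log escaping, which writes `"`, `\`, and bytes outside 32..126 as `\xHH`, so reversing those sequences and decoding as UTF-8 restores the original JSON. The function names are hypothetical and the field labels follow the config above; this is not the project's actual parser.

```python
import json
import re
from typing import Dict

# nginx's default log escaping writes '"', '\' and bytes outside 32..126 as \xHH.
_NGINX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def unescape_nginx(value: str) -> str:
    """Turn \\xHH sequences back into bytes, then decode the bytes as UTF-8."""
    raw = value.encode("latin-1")
    raw = _NGINX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return raw.decode("utf-8")


def parse_ltsv(line: str) -> Dict[str, str]:
    """Split an LTSV line into label/value pairs (tab between fields, first ':' separates)."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        label, sep, value = field.partition(":")
        if sep:
            record[label] = value
    return record


def parse_answer_log(line: str) -> dict:
    """Parse one access-log line and decode the JSON answer held in request_body."""
    record = parse_ltsv(line)
    body = record.get("request_body", "-")
    # nginx logs "-" when there is no request body.
    record["answer"] = None if body == "-" else json.loads(unescape_nginx(body))
    return record
```

Because nginx also escapes a literal backslash (as `\x5C`), every `\xHH` in the log is an escape, so the reversal is unambiguous.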

Slide 21

Slide 21 text

Second Step: Aggregate Answer Logs

Slide 22

Slide 22 text

Need for Log Aggregation › The access logs are spread across the app servers and must be aggregated into one answer store › Original data for the final result › Monitoring › Maximize the number of answers

Slide 23

Slide 23 text

Second Step: Aggregate Answer Logs › [Diagram: Client → LB → App Server; nginx writes the access log, and fluentd transfers the access logs to MySQL every second (asynchronously)]

Slide 24

Slide 24 text

Pipeline in fluentd › Source (@type tail): read access logs and parse LTSV › Filter (@type record_transformer): unescape request_body › Filter (@type parser): parse request_body › Match (@type mysql_bulk): insert access logs into MySQL
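The last stage of the pipeline, buffering parsed records and inserting them in bulk, can be illustrated with a small Python sketch. This is not fluentd or its mysql_bulk plugin: `sqlite3` stands in for MySQL so the example runs anywhere, and the `answer_logs` table, its columns, and the record keys (`request_id`, `user_id`, `body`) are hypothetical.

```python
import json
import sqlite3
import time
from typing import Callable, List, Tuple


class BulkInserter:
    """Buffer parsed answer logs and write them in batches (a stand-in for a buffered
    fluentd output; sqlite3 replaces MySQL so the sketch runs anywhere)."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 500,
                 flush_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.flush_interval = flush_interval
        self.clock = clock
        self.buffer: List[Tuple[str, str, str]] = []
        self.last_flush = clock()

    def add(self, record: dict) -> None:
        self.buffer.append((record["request_id"], record["user_id"],
                            json.dumps(record["body"])))
        # Flush when the batch is full or when flush_interval seconds have passed.
        if (len(self.buffer) >= self.batch_size
                or self.clock() - self.last_flush >= self.flush_interval):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            with self.conn:  # one transaction per batch; commits on success
                self.conn.executemany(
                    "INSERT INTO answer_logs (request_id, user_id, body) VALUES (?, ?, ?)",
                    self.buffer)
            self.buffer = []
        self.last_flush = self.clock()
```

Batching keeps the number of round trips to the database low even at thousands of answers per second. This sketch only checks the interval when a record arrives; a real buffered output also flushes on a timer, so an idle period cannot leave records stranded in the buffer.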

Slide 25

Slide 25 text

Third Step: Verify Answer Logs

Slide 26

Slide 26 text

Need for Log Verification › (Unverified) answer store: the aggregated data › Filter out unauthorized answer logs › Filter out duplicate answers › Verified answer store: the original data for the final result
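A minimal Python sketch of the two filters follows. The `verify` callback, the `msec` ordering, and the rule "keep the earliest answer per user" are assumptions for illustration; the slides do not say which duplicate was kept. The point it shows is that deduplication should key on the user ID proven by the ID token, not on a user ID the client wrote into the request body.

```python
from typing import Callable, Iterable, List, Optional


def filter_answers(logs: Iterable[dict],
                   verify: Callable[[dict], Optional[str]]) -> List[dict]:
    """Drop logs whose ID token fails verification, then keep one answer per user.

    `verify` returns the user ID proven by the ID token, or None if the token is invalid.
    Deduplication uses that verified ID, never a user ID the client put in the body.
    """
    seen = set()
    verified = []
    for log in sorted(logs, key=lambda r: r["msec"]):  # oldest first
        user_id = verify(log)
        if user_id is None:  # unauthorized answer log
            continue
        if user_id in seen:  # duplicate answer: keep the earliest one (assumed policy)
            continue
        seen.add(user_id)
        verified.append(dict(log, user_id=user_id))
    return verified
```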

Slide 27

Slide 27 text

Third Step: Verify Answer Logs › [Diagram: LINE Login issues an ID token to the client; the client sends answers with the ID token through the LB to nginx on the app server; fluentd transfers the access log to MySQL; the verifier runs on a batch server]

Slide 28

Slide 28 text

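ID Token › LINE Login issues an ID token to the client › The ID token contains user data, an expiration date, …, and a signature › The client sends the ID token with its answers, and the verifier verifies the ID token read from the answer store › The login can be verified locally and asynchronously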

Slide 29

Slide 29 text

Third Step: Verify Answer Logs › [Diagram: LINE Login issues an ID token; the client sends answers with the ID token through the LB to nginx on the app server; fluentd transfers the access log to MySQL] › The verifier on the batch server: 1. Fetches access logs 2. Verifies ID tokens 3. Writes the results
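The fetch, verify, write loop can be sketched as one incremental batch in Python. Everything concrete here is hypothetical: `sqlite3` stands in for MySQL, the `access_logs` and `verification_results` tables are invented, and `verify` is any callable returning the verified user ID or None. The batch reads rows after a checkpoint (`last_id`) and returns the new checkpoint, so repeated runs make steady progress through the log.

```python
import sqlite3
from typing import Callable, Optional


def run_verifier_batch(conn: sqlite3.Connection,
                       verify: Callable[[str], Optional[str]],
                       last_id: int, limit: int = 1000) -> int:
    """Process one batch of access logs after `last_id`; return the new checkpoint."""
    # 1. Fetch access logs newer than the checkpoint, oldest first.
    rows = conn.execute(
        "SELECT id, id_token FROM access_logs WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit)).fetchall()
    # 2. Verify each ID token; None marks an invalid token.
    results = [(row_id, verify(token)) for row_id, token in rows]
    # 3. Write the results in a single transaction.
    with conn:
        conn.executemany(
            "INSERT INTO verification_results (log_id, user_id) VALUES (?, ?)", results)
    return rows[-1][0] if rows else last_id
```

Storing the checkpoint in the same transaction as the results would let a crashed batch restart without skipping or repeating rows; the sketch leaves that to the caller.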

Slide 30

Slide 30 text

Preparing for the Survey: Delivering Messages / Performance Test / Monitoring

Slide 31

Slide 31 text

Message Delivery Spikes › [Charts: messages sent and answer rate over time, for fast delivery and slow delivery] › Fast delivery: higher peak traffic › Slow delivery: lower peak traffic
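The relationship between delivery speed and peak traffic can be reproduced with a toy Python model. Its assumptions are invented for illustration: a fixed share of recipients answers (`answer_prob`), a constant fraction of the not-yet-answered backlog answers each minute, and recipients are normalized to 1.0. Spreading the same delivery over more minutes leaves the total number of answers unchanged but lowers the peak answer rate.

```python
import math
from typing import List


def even_delivery(total: float, minutes: int, horizon: int) -> List[float]:
    """Messages sent per minute when `total` messages go out evenly over `minutes`."""
    return [total / minutes if t < minutes else 0.0 for t in range(horizon)]


def answer_rate(sent: List[float], answer_prob: float = 0.3,
                time_constant_min: float = 20.0) -> List[float]:
    """Answers per minute under a toy model: each recipient answers with probability
    `answer_prob`, and a fixed fraction of the not-yet-answered backlog answers each minute."""
    per_minute = 1.0 - math.exp(-1.0 / time_constant_min)  # share of backlog answering per minute
    backlog = 0.0
    rates = []
    for messages in sent:
        backlog += messages * answer_prob
        answered = backlog * per_minute
        backlog -= answered
        rates.append(answered)
    return rates
```

For example, comparing `max(answer_rate(even_delivery(1.0, 10, 600)))` with `max(answer_rate(even_delivery(1.0, 120, 600)))` shows a much lower peak for the 120-minute delivery, while both sums approach 0.3.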

Slide 32

Slide 32 text

Pseudo Slow Delivery by Manual Control › Control the delivery pace manually › [Chart: messages sent over time, with the answer rate for fast delivery (baseline), slow delivery, and pseudo slow delivery] › Pseudo slow delivery reduced peak traffic compared with the baseline
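One hypothetical way to express a manually controlled pace is to split the recipients into near-equal batches and start one batch at a fixed interval. The slides do not give the batch count or interval actually used, so the parameters in this Python sketch are placeholders.

```python
from typing import List, Tuple


def plan_batches(total: int, batches: int, interval_min: int) -> List[Tuple[int, int]]:
    """Split `total` recipients into `batches` near-equal batches.

    Returns (start_minute, recipient_count) pairs, one batch every `interval_min` minutes.
    The first `total % batches` batches get one extra recipient so nothing is dropped.
    """
    base, extra = divmod(total, batches)
    return [(i * interval_min, base + (1 if i < extra else 0)) for i in range(batches)]
```

Each batch still goes out at full speed, but the gaps between batches let the answer traffic from one batch subside before the next begins, which approximates slow delivery without changing the delivery system itself.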

Slide 33

Slide 33 text

Performance Test › A single instance handles 1K answers/sec › Past campaigns reached 25K requests/sec › Prepared 50 instances for 50K (= 25K × 2) requests/sec › [Diagram: app server with nginx, access log, fluentd, and MySQL]
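The sizing rule on this slide is a single line of arithmetic: the past peak times a safety factor of 2, divided by the measured per-instance throughput. The Python sketch below treats one answer as one request, which is an assumption the slide implies but does not state.

```python
import math


def required_instances(peak_rps: float, per_instance_rps: float,
                       safety_factor: float = 2.0) -> int:
    """Instances needed to absorb `peak_rps` with headroom, rounded up."""
    return math.ceil(peak_rps * safety_factor / per_instance_rps)
```

With the slide's numbers, `required_instances(25_000, 1_000)` gives 50 instances for 50K requests/sec.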

Slide 34

Slide 34 text

Monitoring › App server: node_exporter, nginx_exporter, fluent-plugin-prometheus › Batch server: node_exporter, verifier (via Push Gateway) › Monitoring server: Prometheus, Push Gateway, Alert Manager, Grafana › Notify server (since the 4th survey): notify app reading MySQL (replica) › Notifications are sent through LINE Notify
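The slide does not show how the verifier's metrics reach Prometheus. Assuming the verifier pushes them to the Push Gateway, a common pattern for batch jobs, a standard-library Python sketch could build the request as below. It uses the Pushgateway's `PUT /metrics/job/<job>` endpoint and the Prometheus text exposition format; the gateway URL, job name, and metric name are hypothetical.

```python
import urllib.request
from typing import Dict


def build_push_request(gateway_url: str, job: str,
                       gauges: Dict[str, float]) -> urllib.request.Request:
    """Build a Pushgateway PUT request carrying gauges in the Prometheus text format."""
    lines = []
    for name, value in gauges.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = ("\n".join(lines) + "\n").encode("utf-8")  # the text format ends with a newline
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job}"
    # PUT replaces every metric previously pushed under this job's grouping key.
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "text/plain; version=0.0.4"})
```

After each batch, the verifier would send it with `urllib.request.urlopen(req, timeout=5)`; Prometheus then scrapes the gateway, and Alert Manager can fire a LINE Notify alert if the pending count keeps growing.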

Slide 35

Slide 35 text

Monitoring Dashboard › [Screenshots: Grafana dashboard and LINE Notify notifications]

Slide 36

Slide 36 text

Delivery Result

Slide 37

Slide 37 text

Changes in the Number of Events › [Chart: cumulative events, answer events, and event rate (events/sec), 3/31 9:00 to 4/1 21:00] › 66M events › 25M answers › 5K events/sec

Slide 38

Slide 38 text

ID Token Verification Delay › [Chart: event rate (events/sec) and the number of events awaiting ID token verification, 3/31 09:00 to 4/1 00:00] › Verification lagged by up to 3.5M events
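A verification backlog like this forms whenever events arrive faster than the verifier can process them, and it drains once the arrival rate falls below that throughput. The Python sketch below models the queue in discrete time steps; the arrival series and the fixed `capacity` are hypothetical inputs, not measurements from the survey.

```python
from typing import List


def backlog_over_time(arrivals: List[int], capacity: int) -> List[int]:
    """Unprocessed events after each step, given arrivals and a fixed per-step capacity."""
    backlog = 0
    history = []
    for arrived in arrivals:
        # Work that cannot be done this step carries over; the backlog never goes negative.
        backlog = max(0, backlog + arrived - capacity)
        history.append(backlog)
    return history
```

Because answers were already stored in the access log, a growing backlog only postponed verification; it did not block users from submitting.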

Slide 39

Slide 39 text

Retrospective › Develop a simple system with minimal specifications: ensured performance and stability › Develop the system step by step: felt comfortable with the development › Avoid system trouble by design: some background processing was delayed, but the delays did not affect users

Slide 40

Slide 40 text

Conclusion › The survey was very large in scale, but the system was very simple › Handled high traffic stably and was prepared in a short period › Conducted the National Survey for COVID-19 with only 3 days of development

Slide 41

Slide 41 text

Thank you