Slide 1

Slide 1 text

BigQuery for DDDD 2017-11-16
 In-House GCPUG presented by @yuzutas0
 https://www.pexels.com/photo/close-up-of-computer-keyboard-257949/
 https://www.pexels.com/photo/technology-computer-lines-board-50711/
 https://www.pexels.com/photo/black-and-white-business-chart-computer-241544/

Slide 2

Slide 2 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 3

Slide 3 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 4

Slide 4 text

@yuzutas0
 Certified Scrum Product Owner
 Previously: founder at a VC-backed company
 “D4C”: Data Management Team
 Dating Data Driven Development Center
 Using BigQuery as Team Director
 (not only as a Software Engineer)

Slide 5

Slide 5 text

Weekend Challenge
 I created a web app “in one day” with GCP (GAE + Cloud SQL) last Saturday

Slide 6

Slide 6 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 7

Slide 7 text

Dating
 Data
 Driven
 Development

Slide 8

Slide 8 text

Dating
 Data
 Driven
 Development

Slide 9

Slide 9 text

Online Dating / Love Tech
 https://www.standard.co.uk/lifestyle/london-life/everything-you-need-to-know-about-using-dating-apps-in-…

Slide 10

Slide 10 text

Enmusubi, Koimusubi
 Product introduction (internal only)

Slide 11

Slide 11 text

Way to use
 Message, Like, Match
 Monthly Subscriptions
 (except for female Koimusubi users)
 How to use the product (internal only)

Slide 12

Slide 12 text

Dating
 Data
 Driven
 Development

Slide 13

Slide 13 text

Dating
 Data
 Driven
 Development

Slide 14

Slide 14 text

Action in Many Teams
 Teams: Design, Security, Legal, Infrastructure, System Management (like SRE), App Arch, Marketing, Customer Support, Data Science, Machine Learning, Direction, Public Relations
 Feature teams: Product Dev & Ops Team A, Team B, Team C (each with a PO)

Slide 15

Slide 15 text

1. Business metrics
 Revenue, churn rate, DAU…
 Daily notification, Spreadsheet, Dashboard

Slide 16

Slide 16 text

2. Customer action
 From sign-up to graduation (= partner found)
 Funnel, Journey

Slide 17

Slide 17 text

3. Measure
 Analysis of split (A/B) testing
 Utilization of new features

Slide 18

Slide 18 text

4. Recommendation
 Recommendation screen (internal only)

Slide 19

Slide 19 text

5. Spam Filter

Slide 20

Slide 20 text

6. Customer Support

Slide 21

Slide 21 text

7. Advertisement

Slide 22

Slide 22 text

8. Press Release
 http://www.recruit-mp.co.jp/news/release/…
 Press release example (internal only)

Slide 23

Slide 23 text

9. System Monitoring
 Crash Rate, Response Time…
 http://yuzutas0.hatenablog.com/entry/…

Slide 24

Slide 24 text

10. System failure investigation

Slide 25

Slide 25 text

11. Team capacity
 Velocity for each sprint, lead time to complete tickets…

Slide 26

Slide 26 text

Teams use the “same data store”
 Data Warehouse

Slide 27

Slide 27 text

Matching: data sources and teams
 Data Warehouse, like a ribbon

Slide 28

Slide 28 text

Architecture (internal only, lol)

Slide 29

Slide 29 text

Architecture (internal only, lol)

Slide 30

Slide 30 text

Based on BigQuery
 BigQuery - Google Cloud Platform
 Source Layer, Warehouse Layer, App Layer

Slide 31

Slide 31 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 32

Slide 32 text

Why BigQuery?
 Make everything easier
 It enables us to…
 ● skip building & maintaining infrastructure
 ● input & output every type of data
 ● export data to another DB whenever we like

Slide 33

Slide 33 text

Price
 ● Economies of scale
 ● Price revisions
 http://yapcasia.org/…/talk/show/…

Slide 34

Slide 34 text

Performance
 ● Columnar storage
 ● Parallel disk I/O
 http://yapcasia.org/…/talk/show/…

Slide 35

Slide 35 text

Psychological safety
 Account management: by GCP / G Suite; every member can use it
 SQL interface: just run a query without regard for infrastructure
 Standard SQL: easy to port, easy to study, easy to test
 Ecosystem: supported by well-known BI tools; pandas.io.gbq → debuggable in Jupyter

Slide 36

Slide 36 text

Benefit for Business
 1. Focus on using data (not on managing data)
 2. Attention to optimizing cost for Dev & Ops

Slide 37

Slide 37 text

Benefit for Business
 1. Focus on using data (not on managing data)
 2. Attention to optimizing cost for Dev & Ops

Slide 38

Slide 38 text

Matching: data sources and teams
 Data Warehouse, like a ribbon

Slide 39

Slide 39 text

Technologies & Seeds Oriented?
 Data Warehouse

Slide 40

Slide 40 text

Anti-Pattern
 - Dashboard nobody watches
 - Chatbot nobody talks with

Slide 41

Slide 41 text

Users & Needs Oriented!
 Data Warehouse

Slide 42

Slide 42 text

Focus on using data (not on managing data)

Slide 43

Slide 43 text

BigQuery is one of the best solutions
 Make data management easier

Slide 44

Slide 44 text

Benefit for Business
 1. Focus on using data (not on managing data)
 2. Attention to optimizing cost for Dev & Ops

Slide 45

Slide 45 text

No revenue by itself
 Cost Center / Profit Center
 Data Warehouse, Value, Market

Slide 46

Slide 46 text

Cost Limit
 in the future
 https://www.pexels.com/photo/bills-capital-cash-cent-…

Slide 47

Slide 47 text

Recession Risk
 Budget depends on the company’s overall sales
 Sales in many business areas depend on the macro economy
 
 e.g. Human Resources, Housing, Bridal, Automobiles, Education, Lifestyle
 (This logic doesn’t apply to companies dealing in utilities and necessities)
 https://www.pexels.com/photo/black-and-white-business-chart-computer-241544/

Slide 48

Slide 48 text

Trough of Disillusionment
 http://news.mynavi.jp/news/…

Slide 49

Slide 49 text

Data democratization
 Data management becomes a “staff function”,
 interested in cost reduction
 (not growth)

Slide 50

Slide 50 text

Needs for
 Cost Optimization
 https://www.pexels.com/photo/bills-capital-cash-cent-…

Slide 51

Slide 51 text

BigQuery is one of the best solutions
 Make data management easier

Slide 52

Slide 52 text

Benefit for Business
 1. Focus on using data (not on managing data)
 2. Attention to optimizing cost for Dev & Ops

Slide 53

Slide 53 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 54

Slide 54 text

3 layers
 BigQuery - Google Cloud Platform
 Source: copy of the original data
 Warehouse: key indicators, intermediate tables
 App: interface for tools
 The pipeline prevents data from flowing backward and becoming impossible to correct
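
A minimal sketch of the one-way rule on this slide: a job may read from an earlier layer and write to a later one, never the reverse. The helper names `flows_forward` and `check_jobs` are hypothetical, and allowing a job to stay within one layer (for intermediate tables) is an assumption, not something the talk states.

```python
# Layer order taken from the slide; data may only move left to right.
LAYERS = ("source", "warehouse", "app")


def flows_forward(upstream_layer: str, downstream_layer: str) -> bool:
    """Return True when a job reads from an earlier (or the same) layer."""
    return LAYERS.index(upstream_layer) <= LAYERS.index(downstream_layer)


def check_jobs(jobs):
    """Return the (upstream, downstream) pairs that would flow backward."""
    return [(up, down) for up, down in jobs if not flows_forward(up, down)]
```

Running `check_jobs` over the list of planned jobs before deployment would flag any job that writes app-layer results back into the source or warehouse layer.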

Slide 55

Slide 55 text

Naming rule for datasets
 Product name / source, warehouse, app / Production DB, Apache Log, Adobe Analytics
 e.g. enmusubi__source__db
 Two underscores `__` separate the elements above, like BEM
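
The naming rule can be sketched as a pair of helpers that build and parse dataset names. The function names and the third field name `element` are hypothetical; only the `__` separator, the three layer names, and the example `enmusubi__source__db` come from the slide.

```python
from typing import NamedTuple

LAYERS = ("source", "warehouse", "app")
SEPARATOR = "__"  # two underscores, as on the slide


class DatasetName(NamedTuple):
    product: str
    layer: str
    element: str  # e.g. "db" for the production DB (field name is an assumption)


def build_dataset_name(product: str, layer: str, element: str) -> str:
    """Join the three parts with '__', rejecting unknown layers."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    for part in (product, element):
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid name part: {part!r}")
    return SEPARATOR.join((product, layer, element))


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name back into product, layer, and element."""
    parts = name.split(SEPARATOR)
    if len(parts) != 3 or parts[1] not in LAYERS:
        raise ValueError(f"not a valid dataset name: {name!r}")
    return DatasetName(*parts)
```

For example, `build_dataset_name("enmusubi", "source", "db")` yields the name shown on the slide, and `parse_dataset_name` recovers the layer so tooling can filter datasets by layer.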

Slide 56

Slide 56 text

Privacy protection
 on-premise / BigQuery
 Service Level: personal information >>> accounting, billing info > others

Slide 57

Slide 57 text

Privacy protection
 on-premise / EC2 / BigQuery
 user, production database, masking view, masking
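
The talk does not say how the masking step works. As one possible sketch, personal fields could be replaced with a salted hash before records leave the on-premise side. The column names, the salt handling, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical column names; the talk does not list which fields are masked.
PERSONAL_COLUMNS = {"email", "name"}


def mask_value(value: str, salt: str) -> str:
    """Replace a personal value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def mask_record(record: dict, salt: str) -> dict:
    """Return a copy of the record with personal columns masked."""
    return {
        key: mask_value(str(value), salt) if key in PERSONAL_COLUMNS else value
        for key, value in record.items()
    }
```

Because the same input and salt always give the same digest, masked values can still be used as join keys in analysis without exposing the original text.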

Slide 58

Slide 58 text

Frequent connection errors
 Design to make retries easier
 ● Jobs divided
 ● Partitions divided
 ● Records divided
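
Dividing records so that a connection error costs only one small retry might look like the sketch below. The `send` callable stands in for whatever actually uploads a chunk. It is hypothetical and assumed to be safe to call again with the same chunk.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(records: Sequence[dict], size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_with_retry(chunk, send: Callable, max_attempts: int = 3,
                    wait_seconds: float = 0.0) -> int:
    """Send one chunk, retrying on ConnectionError; return attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(chunk)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            time.sleep(wait_seconds)
    return max_attempts


def transfer(records: Sequence[dict], send: Callable, chunk_size: int = 500) -> int:
    """Send all records chunk by chunk; return the number of chunks sent."""
    count = 0
    for chunk in chunked(records, chunk_size):
        send_with_retry(chunk, send)
        count += 1
    return count
```

Only the failed chunk is resent, so one dropped connection does not force the whole job to start over.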

Slide 59

Slide 59 text

Type of data; way to transfer
 State data
 example: last sign-in date (updated)
 usage: display view for customer, “this user signed in yesterday”
 characteristics: performance tuning, denormalization to reduce join queries; the same SQL cannot return the same results if the system updates when a user signs in
 transfer data: replace all data
 Event data
 example: sign-in history (created)
 usage: analysis for provider, how frequently users sign in
 characteristics: same SQL, same results
 transfer data: add only the “diff”
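
State data is overwritten in place, while event data is append-only, so the two call for different transfers: replace everything versus add only the diff. The sketch below uses in-memory containers in place of real tables. The field names `user_id` and `created` are hypothetical, and timestamps are assumed to be ISO-formatted strings so that plain string comparison orders them correctly.

```python
def transfer_state(destination: dict, snapshot: list) -> None:
    """State data: drop the old copy and load the latest snapshot."""
    destination.clear()
    destination.update({row["user_id"]: row for row in snapshot})


def transfer_events(destination: list, events: list, last_created: str) -> str:
    """Event data: append rows created after the last transferred timestamp.

    Returns the new high-water mark to use for the next run.
    """
    diff = [event for event in events if event["created"] > last_created]
    destination.extend(diff)
    return max([last_created] + [event["created"] for event in diff])
```

Keeping the returned high-water mark between runs is what makes the event transfer incremental; the state transfer needs no such bookkeeping because it always reloads everything.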

Slide 60

Slide 60 text

ER Migration
 product growth → new feature needs ER change → BQ follows

Slide 61

Slide 61 text

Jupyter notebook to debug
 1. write the script in a local Jupyter notebook
 2. test: migrate only 1 day of data
 3. export a .py file → deploy
 https://www.shareicon.net/document-file-py-…
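
For the "only 1 day of data" test step, a small query builder run from a notebook cell is one way to limit the range. This is a sketch: the table and column names are hypothetical, and the values are interpolated directly into the SQL string, so it is only suitable for trusted, internally chosen names.

```python
from datetime import date, timedelta


def one_day_query(table: str, day: date, column: str = "created") -> str:
    """Build a BigQuery Standard SQL query covering a single calendar day."""
    next_day = day + timedelta(days=1)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE {column} >= TIMESTAMP('{day.isoformat()}') "
        f"AND {column} < TIMESTAMP('{next_day.isoformat()}')"
    )
```

The half-open range (from the start of the day up to, but not including, the start of the next day) avoids counting rows at midnight twice when the script is later run day by day.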

Slide 62

Slide 62 text

Deploy to GCP
 1. Network bottleneck
 ● leased lines
 ● same region (U.S.)
 2. Memory bottleneck: use servers on GCP
 ● Dataflow: auto-scaled servers
 ● Datalab: paste Jupyter code
 → *needs conversion to Python 2.7

Slide 63

Slide 63 text

We are looking for new tips
 for managing data with BQ; please give us your advice

Slide 64

Slide 64 text

Outline: 1. Profile 2. Use Case 3. Benefits for Business 4. Tips to manage Data

Slide 65

Slide 65 text

Product
 x
 Technology
 ↓
 Pleasure
 https://www.pexels.com/photo/adults-beach-beach-wedding-couple-…

Slide 66

Slide 66 text

We're hiring for part-time jobs
 https://www.pexels.com/photo/black-and-white-connected-hands-love-…

Slide 67

Slide 67 text

Thank you for
 your kind attention
 presented by @yuzutas0
 https://www.pexels.com/photo/close-up-of-computer-keyboard-257949/
 https://www.pexels.com/photo/technology-computer-lines-board-50711/
 https://www.pexels.com/photo/black-and-white-business-chart-computer-241544/