BigQuery for DDDD #gcpug / 20171116

yuzutas0
November 16, 2017

Slides presented at the internal (in-company) GCPUG.
refs. http://yuzutas0.hatenablog.com/entry/2018/06/21/083000

Transcript

  1. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data
  2. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data
  3. @yuzutas0
    Certified Scrum Product Owner
    Previously: founder at a VC-backed company
    “D4C”: Data Management Team (Dating Data Driven Development Center)
    Using BigQuery as a team director (not only as a software engineer)
  4. Weekend Challenge
    I created a web app “in one day” with GCP (GAE + Cloud SQL) last Saturday
  5. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data
  6. Way to use
    Monthly subscriptions (except for female Koimusubi users)
    Message, Like, Match
    How to use the product (internal only)
  7. Action in Many Teams
    Design, Security, Legal, Infrastructure, System Management (like SRE), App Arch
    Marketing, Customer Support, Data Science, Machine Learning
    Feature: Product Dev & Ops Team A, Team B, Team C (PO, PO, PO)
    Direction, Public Relations
  8. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data
  9. Why BigQuery?
    It makes everything easier. It enables us to:
    • skip building and maintaining infrastructure
    • input and output every type of data
    • export data to another DB whenever we like
  10. Psychological safety
    Account management by GCP and G Suite: every member can use it
    SQL interface: just run a query without regard for infrastructure
    Standard SQL: easy to port, easy to learn, easy to test
    Ecosystem: supported by well-known BI tools; pandas.io.gbq → debuggable in Jupyter
  11. Benefit for Business
    1. Focus on using data (not managing data)
    2. Attention to optimizing cost for Dev & Ops
  12. Benefit for Business
    1. Focus on using data (not managing data)
    2. Attention to optimizing cost for Dev & Ops
  13. Benefit for Business
    1. Focus on using data (not managing data)
    2. Attention to optimizing cost for Dev & Ops
  14. Recession Risk
    Budget depends on the company’s overall sales.
    Sales in many business areas depend on the macro economy,
    e.g. Human Resources, Housing, Bridal, Automobiles, Education, Life Style.
    (This logic doesn’t apply to companies that deal in utilities and necessities.)
    Image: pexels.com (black and white business chart computer)
  15. Data democratization
    Data management becomes a “staff function”,
    interested in cost reduction (not growth)
  16. Benefit for Business
    1. Focus on using data (not managing data)
    2. Attention to optimizing cost for Dev & Ops
  17. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data
  18. 3 layers (BigQuery - Google Cloud Platform)
    A one-way pipeline prevents data from flowing backward and becoming uncorrectable.
    Source: copy of the original data
    Warehouse: key indicators, intermediate tables
    App: interface for tools
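
The one-way rule between the three layers can be checked mechanically. The following is a minimal sketch, not the author's tooling: the function names are hypothetical, and it assumes that reads within the same layer (for example an intermediate table feeding a key-indicator table) are allowed.

```python
# Layer order follows the slide: source -> warehouse -> app.
LAYERS = ("source", "warehouse", "app")


def flows_forward(upstream: str, target: str) -> bool:
    """Return True if a job writing to `target` may read from `upstream`.

    Assumption: same-layer reads are allowed; only reads from a later
    layer count as data flowing backward.
    """
    return LAYERS.index(upstream) <= LAYERS.index(target)


def backward_edges(dependencies):
    """Return the (upstream, target) pairs that flow backward."""
    return [(u, t) for u, t in dependencies if not flows_forward(u, t)]


# Hypothetical dependency list: the last pair reads an app table while
# building a warehouse table, which the rule forbids.
deps = [("source", "warehouse"), ("warehouse", "app"), ("app", "warehouse")]
print(backward_edges(deps))
```

A check like this could run before deploying a new query, so that a backward dependency is rejected instead of silently entering the pipeline.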
  19. Naming rule for datasets
    e.g. enmusubi__source__db
    Two underscores (`__`) separate the elements, as in BEM:
    Product Name / source, warehouse, or app / Production DB, Apache Log, Adobe Analytics
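
A small helper can build and parse names that follow this rule. This is a sketch under assumptions: the field name `origin`, the helper names, and the identifier `apache_log` are hypothetical; only `enmusubi__source__db` and the three layer names come from the slide.

```python
from typing import NamedTuple

SEPARATOR = "__"
LAYERS = ("source", "warehouse", "app")


class DatasetName(NamedTuple):
    product: str  # e.g. "enmusubi"
    layer: str    # one of LAYERS
    origin: str   # hypothetical field name, e.g. "db" for the production DB


def build_dataset_name(product: str, layer: str, origin: str) -> str:
    """Join the three elements with a double underscore."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    for part in (product, layer, origin):
        # An element must not be empty, contain the separator, or start
        # or end with an underscore (that would blur the boundaries).
        if (not part or SEPARATOR in part
                or part.startswith("_") or part.endswith("_")):
            raise ValueError(f"invalid element: {part!r}")
    return SEPARATOR.join((product, layer, origin))


def parse_dataset_name(name: str) -> DatasetName:
    """Split a dataset name back into its three elements."""
    parts = name.split(SEPARATOR)
    if len(parts) != 3:
        raise ValueError(f"expected 3 elements, got {len(parts)}: {name!r}")
    parsed = DatasetName(*parts)
    build_dataset_name(*parsed)  # reuse the builder's validation
    return parsed


print(build_dataset_name("enmusubi", "source", "db"))
print(parse_dataset_name("enmusubi__warehouse__apache_log"))
```

Because single underscores stay legal inside an element, a name such as `apache_log` still parses unambiguously.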
  20. Privacy protection
    Service level: personal information >>> accounting, billing info > others
    On-premise / BigQuery
  21. Frequent connection errors
    Design to make retries easier:
    • divide jobs
    • divide partitions
    • divide records
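
The three kinds of division above can be sketched as small, independently retryable units. This is an illustration only: the helper names, the daily partition size, and the retry policy are assumptions, and `ConnectionError` stands in for whatever error the real transfer raises.

```python
import time
from datetime import date, timedelta


def daily_partitions(start: date, end: date):
    """Yield each day in [start, end]; one day is one retryable unit."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def chunked(records, size):
    """Split records into lists of at most `size` items."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def with_retry(task, attempts=3, wait_seconds=0.0):
    """Run `task`, retrying on ConnectionError up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(wait_seconds * attempt)  # simple linear backoff
```

A failed chunk can then be retried on its own, for example `with_retry(lambda: send(chunk))` with a hypothetical `send` function, without re-running the days and chunks that already succeeded.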
  22. Type of data; way to transfer
    State data
      example: last sign-in date (updated)
      usage: display view for customer, “this user signed in yesterday”
      characteristics: performance tuning, denormalization to reduce join queries;
        the same SQL cannot return the same results if the system updates the row when users sign in
      transfer data: replace all data
    Event data
      example: sign-in history (created)
      usage: analysis for provider, how frequently users sign in
      characteristics: same SQL, same results
      transfer data: add only the “diff”
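
The two transfer strategies can be illustrated with in-memory rows. This is a sketch, not the deck's implementation: the column names and the assumption that every event carries a unique `id` are hypothetical.

```python
def replace_all(source_rows):
    """State data: rows are updated in place, so take a fresh full copy."""
    return [dict(row) for row in source_rows]


def append_diff(destination, source_events):
    """Event data: rows are only created, so append those not yet copied.

    Assumption: each event has a unique "id".
    """
    known_ids = {event["id"] for event in destination}
    diff = [dict(e) for e in source_events if e["id"] not in known_ids]
    return destination + diff


# State: the source row was overwritten, so the copy is replaced wholesale.
state_source = [{"user_id": 1, "last_sign_in_date": "2017-11-15"}]
state_copy = replace_all(state_source)

# Events: only the new sign-in (id 2) is appended.
events_copy = [{"id": 1, "user_id": 1, "created": "2017-11-14"}]
events_source = events_copy + [{"id": 2, "user_id": 1, "created": "2017-11-15"}]
events_copy = append_diff(events_copy, events_source)
print(len(events_copy))
```

Running `append_diff` again with the same source adds nothing, which is what keeps a retried event transfer safe.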
  23. Jupyter notebook to debug
    1. Write the script in a local Jupyter notebook
    2. Test: migrate only one day of data
    3. Export a .py file → deploy
    Icon: shareicon.net (document file py)
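
Step 2 is easier when the script takes the target day as a parameter. Here is a minimal sketch that builds a Standard SQL query limited to one day; the table name `enmusubi__source__db.sign_in_history` and the column name `created` are hypothetical, and the values are interpolated directly, so it assumes trusted inputs.

```python
from datetime import date, timedelta


def build_one_day_query(table: str, day: date,
                        timestamp_column: str = "created") -> str:
    """Return a query that selects only the rows of a single day."""
    next_day = day + timedelta(days=1)
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE {timestamp_column} >= TIMESTAMP('{day.isoformat()}') "
        f"AND {timestamp_column} < TIMESTAMP('{next_day.isoformat()}')"
    )


# In the notebook: try one day first, then loop over more days once it works.
query = build_one_day_query("enmusubi__source__db.sign_in_history",
                            date(2017, 11, 15))
print(query)
```

The half-open range (`>=` the day, `<` the next day) avoids counting a row at midnight twice when consecutive days are migrated.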
  24. Deploy to GCP
    Bottlenecks: 1. network 2. memory
    Use a server on GCP
    • leased lines
    • same region (U.S.)
    • Dataflow: auto-scaled server → *needs conversion to Python 2.7
    • Datalab: paste Jupyter code
  25. Outline: 1. Profile 2. Use Case 3. Benefits for Business
    4. Tips to manage Data