Data Engineering

Surasit Liangpornrattana

February 27, 2019
Transcript

  1. WHERE?
  2. USE-CASE: PAGES? WHO?

    INTEREST? EVENTS? - MORE ...
  3. CENTRALIZED LOGGING

    FLUENTD: INPUT - FILTER
  4. FLUENTD

    ACCESS LOG: IP - TIMESTAMP - REQUEST - USER-AGENT. INPUT. FILTER: TRANSFORMS - FILTER - ENRICH. BUFFER: PERFORMANCE - RELIABILITY - THREAD SAFETY, [CHUNK] [CHUNK] [CHUNK]. JSON: IP: ..., TIMESTAMP: ..., REQUEST: ..., USER-AGENT: ... OUTPUT: WRITE OR SEND LOGS - SYNC OR ASYNC
  5. BUFFER

    KAFKA: HIGH THROUGHPUT - RE-PLAY - CENTRALIZE - FAULT TOLERANCE
  6. KAFKA

    BYTES OF SERIALIZED JSON - TEAM - RE-PLAY
  7. PROTOCOLS & SERIALIZATION

    LOGSTASH: INPUT - FILTER - OUTPUT. KAFKA. STANDARDIZED: PROTOBUF
  8. LOGSTASH

    INPUT. BUFFER: BYTES OF SERIALIZED JSON - PERFORMANCE - RELIABILITY - THREAD SAFETY, [PAGE] ← HEAD, [PAGE] ← TAIL, [PAGE] ← TAIL. FILTER: TRANSFORM - FILTER - ENRICH. OUTPUT: CODEC - WRITE OR SEND LOGS. PROTOBUF
  9. PROTOBUF

    SMALL → FAST - SIMPLE, KEY-VALUE - STRUCTURED DATA - SUPPORT MANY LANGUAGES. LOGSTASH
  10. SCHEDULING JOBS - FOR BATCH DATA PROCESSING

    AIRFLOW: WORKFLOW - SCHEDULER - MONITORING. PROTOBUF
  11. TASKS

    TASK 3 - TASK 3.2 - TASK 3.3 - TASK 4 - SPARK. IDEMPOTENT - STATELESS - PREFER INSERT TO UPDATE - PARTITION BY TIME
  12. LOG SERVER

    KAFKA - ELASTICSEARCH - KIBANA - PROTOBUF - HADOOP
  13. LOG SERVER

    INPUT - STREAMING - SERVING - OUTPUT. PROTOBUF: DEFINED PROTOBUF SCHEMA. STORAGE: HDFS + HIVE (SERVING)
  14. DATA PIPELINE

    BATCH. ACQUISITION - INGESTION - PROCESSING - STORAGE - ACCESS. STREAMING PROCESSING. PROTOBUF
  15. DATA LAKE

    GATEWAY - USER
  16. Q&A
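
The Fluentd slide (slide 4) describes turning an access log line into a JSON record with IP, timestamp, request and user-agent fields. A minimal Python sketch of that transformation follows. It is not Fluentd configuration: the Combined Log Format pattern, the `parse_access_log` helper and the sample line are assumptions made for illustration, since the slide names only the four fields.

```python
import json
import re

# Assumed input layout: Combined Log Format. The slide lists only the
# fields (IP, timestamp, request, user-agent), not the exact log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<user_agent>[^"]*)"'
)


def parse_access_log(line):
    """Return the four slide fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "timestamp": match.group("timestamp"),
        "request": match.group("request"),
        "user-agent": match.group("user_agent"),
    }


# Hypothetical sample line, made up for this sketch.
sample = (
    '203.0.113.7 - - [27/Feb/2019:10:15:32 +0700] "GET /decks/1 HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"'
)
record = parse_access_log(sample)
print(json.dumps(record))
```

Lines that do not match return `None` rather than raising, which leaves the caller free to drop or route them, roughly the role the slide gives to the filter stage.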
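
The tasks slide (slide 11) lists four properties for batch tasks: idempotent, stateless, prefer INSERT to UPDATE, and partition by time. One way to read them together is sketched below in plain Python. It is not taken from the deck and uses neither Airflow nor Spark; the `run_daily_task` function, the `date=YYYY-MM-DD` directory layout and the sample events are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_daily_task(events, day, root):
    """Write every event for `day` into that day's partition.

    Stateless: the result depends only on the arguments.
    Idempotent: a re-run replaces the partition file with identical content.
    Insert over update: the whole partition is rewritten, no row is edited in place.
    """
    partition = Path(root) / f"date={day}"  # partition by time (assumed layout)
    partition.mkdir(parents=True, exist_ok=True)
    rows = [e for e in events if e["timestamp"].startswith(day)]
    tmp = partition / "part-0000.json.tmp"
    tmp.write_text("\n".join(json.dumps(r) for r in rows))
    tmp.replace(partition / "part-0000.json")  # swap the finished file into place
    return len(rows)


# Hypothetical events, made up for this sketch.
events = [
    {"timestamp": "2019-02-26T23:59:10", "page": "/a"},
    {"timestamp": "2019-02-27T00:00:05", "page": "/b"},
    {"timestamp": "2019-02-27T08:30:00", "page": "/c"},
]

with tempfile.TemporaryDirectory() as root:
    first = run_daily_task(events, "2019-02-27", root)
    second = run_daily_task(events, "2019-02-27", root)  # simulated retry
    out = (Path(root) / "date=2019-02-27" / "part-0000.json").read_text()
    print(first, second, len(out.splitlines()))
```

Because the retry overwrites the same partition instead of appending to it, a scheduler can safely re-run a failed day without producing duplicate rows.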