Scalable Big Data Architecture

Calvin Canh Tran

July 31, 2020

Transcript

  1. About me
    - 1st Data Engineer @ Grab Finance Group.
    - 2+ years of experience as a Data Scientist, 3+ years as a Data Engineer.
    - MTech in Knowledge Engineering (National University of Singapore).
    - Blogger at http://www.dataguystory.com
  2. Agenda
    • Hadoop Big Data Architecture.
    • Cloud Computing Solution.
    • “Scalable” Architecture with AWS.
    • Use Case #1: Machine Learning Toolbox.
    • Use Case #2: Process Transaction Data.
  3. Hadoop Big Data Architecture (SELF-HOSTED)
    - Physical server rack (Dell PowerEdge R730, 5-node rack / 4TB, ~80K USD).
    - Engineering effort to set up and maintain the physical rack as well as the platform.
    - Service licenses for Cloudera, MapR, Hortonworks (10,000 USD per node per annum).
    - Scaling depends on hardware.
    - Data security (can get certified for data governance).
    @calvincanhtran
  4. Cloud Computing Solution - advantages
    - Scalable (pay as you go).
    - Reduces the cost of infrastructure and the engineering effort to maintain the system.
    - Suitable for tech companies whose products are deployed on the cloud.
    - Work from home (COVID season).
    - Requires infrastructure effort to get certified for data governance.
  5. Cloud Computing Solution - troubles
    ❏ Dependence on the cloud provider's services.
    ❏ Can be expensive at large scale (e.g. Spark on EMR, highly available EC2 instances, EFS file systems…).
    ❏ Hard to customise the components.
    ❏ Engineers' future careers.
    ❏ Engineering effort to run hybrid clouds.
  6. “Scalable” solution
    • Containerize with Docker, deploy to Kubernetes, and utilize cloud services.
    • Scale on different layers.
  7. Use case #1: Machine Learning Toolbox
    • Data is too big to aggregate on a local machine.
    • Security concerns.
  8. Use case #2: Process transaction data
    - Multiple BI and analytics teams: Product Analytics, Marketing Analytics, Finance Analytics…
    - Different use cases and different data marts / data warehouses.
    - Weekly and monthly reports.
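
The self-hosted figures on slide 3 can be combined into a rough running cost. The sketch below uses only the two numbers quoted there (about 80K USD for the rack and 10,000 USD per node per annum for licenses). It assumes that all five nodes are licensed, which the slide does not state, and it leaves out the engineering effort, which the slide does not quantify. The function name `self_hosted_cost` is made up for illustration.

```python
# Figures quoted on slide 3, in USD.
RACK_COST_USD = 80_000              # one-off price of the 5-node rack
NODES = 5                           # nodes in the rack
LICENSE_PER_NODE_YEAR_USD = 10_000  # Cloudera / MapR / Hortonworks license


def self_hosted_cost(years: int, nodes: int = NODES) -> int:
    """Hardware plus license cost after `years` years.

    Assumption (not from the slides): every node in the rack is licensed.
    Engineering effort for setup and maintenance is not included.
    """
    if years < 0:
        raise ValueError("years must be non-negative")
    return RACK_COST_USD + nodes * LICENSE_PER_NODE_YEAR_USD * years


if __name__ == "__main__":
    for years in (1, 3):
        print(f"{years} year(s): {self_hosted_cost(years):,} USD")
```

Under these assumptions the rack costs 130,000 USD after one year and 230,000 USD after three, before any engineering time is counted.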
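
Slide 6 proposes containerizing with Docker, deploying to Kubernetes, and scaling on different layers, without saying which layers. One possible reading is sketched below: a Deployment describes the containerized service, and a HorizontalPodAutoscaler scales its pods on CPU load, while node-level scaling is left to the cloud provider. The service name, image, and thresholds are hypothetical placeholders and do not come from the talk.

```python
import json


def deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Build a Kubernetes Deployment manifest for one containerized service."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        # CPU requests are needed for utilization-based scaling.
                        "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}},
                    }]
                },
            },
        },
    }


def autoscaler(name: str, min_replicas: int, max_replicas: int,
               cpu_percent: int = 70) -> dict:
    """Build a HorizontalPodAutoscaler that scales the Deployment's pods."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    items = [
        deployment("ml-toolbox", "example/ml-toolbox:latest"),
        autoscaler("ml-toolbox", min_replicas=2, max_replicas=10),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Building the manifests as plain dictionaries keeps the sketch free of third-party dependencies. Kubernetes accepts manifests in JSON as well as YAML.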
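
For use case #2, the weekly and monthly reports amount to grouping the same transaction data by data mart and reporting period. The slides do not show how this is done, so the following is only a minimal single-machine sketch of the grouping logic. The sample transactions, the data mart names, and the helper functions are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Hypothetical sample data: (transaction date, data mart, amount).
TRANSACTIONS = [
    (date(2020, 7, 27), "product", Decimal("12.50")),
    (date(2020, 7, 31), "product", Decimal("7.50")),
    (date(2020, 7, 31), "finance", Decimal("100.00")),
    (date(2020, 8, 3), "product", Decimal("5.00")),
]


def week_key(day: date) -> str:
    """ISO year and week number, e.g. '2020-W31'."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def month_key(day: date) -> str:
    """Calendar year and month, e.g. '2020-07'."""
    return f"{day.year}-{day.month:02d}"


def totals(transactions, period_key) -> dict:
    """Sum amounts per (data mart, period) using the given period function."""
    out = defaultdict(Decimal)
    for day, mart, amount in transactions:
        out[(mart, period_key(day))] += amount
    return dict(out)


if __name__ == "__main__":
    for label, key in (("weekly", week_key), ("monthly", month_key)):
        print(label)
        for (mart, period), amount in sorted(totals(TRANSACTIONS, key).items()):
            print(f"  {mart:8} {period}: {amount}")
```

Passing the period function as an argument lets the weekly and monthly reports share one aggregation routine, so each team's data mart only differs in the key it groups by.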