Slide 1

Slide 1 text

Reactive Realtime Big Data with Open Source Lambda Architecture
Make Big Data as simple as possible, but not simpler

Slide 2

Slide 2 text

About the speakers
Nguyễn Tấn Triều from FPT Online
Personal blog: http://nguyentantrieu.info/blog
Big Data blog: http://www.mc2ads.com
Lê Kiến Trúc from InfoNam

Slide 3

Slide 3 text

Last year, 2013: λ (lambda) architecture

Slide 4

Slide 4 text

This year, 2014: 1) Human 2) Big Data 3) Flappy Bird

Slide 5

Slide 5 text

Contents
1. Big Data, we will see it in 1 picture
2. Demands → Realtime
3. Solutions → Reactive
4. Dreams in a data-driven world in the 21st century (yes, the Matrix movies)

Slide 6

Slide 6 text

1) BIG DATA is ...?

Slide 7

Slide 7 text

this point is useful data! This is Big DATA

Slide 8

Slide 8 text

2) Demands from Business

Slide 9

Slide 9 text

Can Big Data solve these problems?
1. Predicting future disasters?
2. Understanding our customers better?
3. Optimizing marketing campaigns in realtime?
Let’s see 3 pictures

Slide 10

Slide 10 text

Weather forecast: “many provinces in the Mekong Delta will be flooded by the year 2030” → Disaster Response System
Source:
http://en.wikipedia.org/wiki/Mekong_Delta
http://www.wired.co.uk/news/archive/2013-10/28/predicting-disasters

Slide 11

Slide 11 text

Connecting an "in-house database" with social data? e.g. MySQL + Facebook Social Graph

Slide 12

Slide 12 text

data-driven marketing system

Slide 13

Slide 13 text

Can Big Data solve these problems? NO. Big Data is just a buzzword.
You need (3R):
1. Solve the right problems
2. Build the right team
3. Use the right tools

Slide 14

Slide 14 text

Next problem space

Slide 15

Slide 15 text

3) Solutions: architecture and tools

Slide 16

Slide 16 text

Big Data Ecosystem
● Frameworks: Hadoop Ecosystem, Apache Spark, Apache Storm, Facebook Presto, ...
● Patterns: MapReduce, Actor Model, Data Pipeline, ...
● Platforms: Amazon Redshift, Cloudera, Pivotal, Hortonworks, IBM, Google Compute Engine, ...
● Best Practices:
○ How Heineken Interacts With Customers Using Big Data
○ How Nestlé Understands Brand Sentiment Of 2.000 Brands In Real-time
Source:
http://azadparinda.wordpress.com/2013/10/11/projects-other-than-hadoop/
http://www.bigdata-startups.com/best-practices

Slide 17

Slide 17 text

Is Hadoop the best solution?
Top 4 limitations of MapReduce: it is a poor fit when
1. Computation depends on previously computed values
2. Full-text indexing or ad hoc searching is needed
3. Algorithms depend on shared global state
4. Online learning, aka stream mining, is needed (Reactive Functor will fix this issue)
Source: http://csci8980-2.blogspot.com/2012/10/limitations-of-mapreduce-where-not-to.html
It’s not {Realtime, Responsive} → Let’s find a new creative idea

Slide 18

Slide 18 text

Existing idea

Slide 19

Slide 19 text

No content

Slide 20

Slide 20 text

+ Lambda

Slide 21

Slide 21 text

Lambda Architecture: data → System → useful data, where query = function(all data)
Reactive Lambda Architecture: data + context + metadata → System → useful (data + relationship)
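The formula query = function(all data) can be read quite literally. The following is a minimal Python sketch, not code from the talk: the event records and the page_views function are hypothetical and only illustrate that a query is a pure function over an append-only dataset.

```python
from collections import Counter

# Hypothetical master dataset: an append-only list of immutable events.
ALL_DATA = [
    {"user": "a", "url": "/news/1"},
    {"user": "b", "url": "/news/1"},
    {"user": "a", "url": "/news/2"},
]

def page_views(all_data):
    """A query written as a pure function of all data (no hidden state)."""
    return Counter(event["url"] for event in all_data)

views = page_views(ALL_DATA)
print(views["/news/1"], views["/news/2"])
```

Because the function has no side effects, it can be re-run over the whole dataset at any time and always gives the same answer for the same input.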

Slide 22

Slide 22 text

Core concepts of Reactive Lambda Architecture
● Reactive Functor: a functional actor that receives data and responds to it reactively, according to the event source and context (just like a neuron in your brain)
○ The original idea came from my advisor in 2007
Source: http://activefunctor.blogspot.com
● Lambda Architecture: the hybrid model, named by Nathan Marz, a software engineer at Twitter, for designing Big Data systems with 3 core layers
○ Speed layer: query stream data (realtime processing)
○ Serving layer: query analyzer
○ Batch layer: query all data (batch processing)
Source: http://www.manning.com/marz
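How the three layers cooperate can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the implementation from the talk: the class names, the "topic" field and the serve function are hypothetical, and the serving step is reduced to merging a batch view with a realtime view, which is one common reading of Marz's design.

```python
from collections import Counter

class BatchLayer:
    """Recomputes its view from the complete master dataset (slow, exact)."""
    def __init__(self):
        self.master = []       # append-only master dataset
        self.view = Counter()  # result of the last batch run

    def append(self, event):
        self.master.append(event)

    def recompute(self):
        self.view = Counter(e["topic"] for e in self.master)

class SpeedLayer:
    """Keeps incremental counts for events not yet covered by a batch run."""
    def __init__(self):
        self.view = Counter()

    def update(self, event):
        self.view[event["topic"]] += 1

    def reset(self):
        self.view.clear()

def serve(batch, speed, topic):
    """Serving step: merge the batch view with the realtime view."""
    return batch.view[topic] + speed.view[topic]

batch, speed = BatchLayer(), SpeedLayer()

def ingest(event):
    # Every event goes to both layers.
    batch.append(event)
    speed.update(event)

for t in ["flood", "flood", "storm"]:
    ingest({"topic": t})
before = serve(batch, speed, "flood")  # answered by the speed layer only

batch.recompute()  # the batch run catches up ...
speed.reset()      # ... so the realtime view can be discarded
ingest({"topic": "flood"})
after = serve(batch, speed, "flood")   # batch view plus one fresh event
```

The sketch resets the speed layer immediately after a batch run; a real system would have to handle events that arrive while the batch job is still running.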

Slide 23

Slide 23 text

The architecture of Disaster Response System

Slide 24

Slide 24 text

Why reactive?
It’s the philosophy and pattern for designing large applications at Internet scale. Focus on:
1. event-driven
2. scalable
3. resilient
4. responsive

Slide 25

Slide 25 text

No content

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

All talk ("chém gió", literally "slashing wind")? Enough. Show me your demo and code

Slide 28

Slide 28 text

User story and Demo
Problem: Social Data Processing
User story:
1. The user goes to the Chrome Web Store and downloads the extension called #save2mycloud
2. The user selects text, clicks save, and pushes the data to the system
3. The user gets responses from the system:
● Realtime trending (hot news)
● Personalized trending (hot news for you)
● Geolocation trending (hot news with context filter)
→ the solution must be realtime and responsive
Let’s test at http://bit.ly/save2mycloud
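The "realtime trending" response can be illustrated with a sliding-window counter. The Python sketch below is an in-memory stand-in, not the demo's code: the TrendingWindow class, its method names and the sample topics are hypothetical, and it assumes timestamps arrive in non-decreasing order.

```python
from collections import Counter, deque

class TrendingWindow:
    """Counts topics seen during the last `window_seconds` seconds."""
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, topic), oldest first
        self.counts = Counter()

    def add(self, timestamp, topic):
        self.events.append((timestamp, topic))
        self.counts[topic] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_topic = self.events.popleft()
            self.counts[old_topic] -= 1
            if self.counts[old_topic] == 0:
                del self.counts[old_topic]

    def top(self, n, now):
        self._expire(now)
        return [topic for topic, _ in self.counts.most_common(n)]

window = TrendingWindow(window_seconds=60)
window.add(0, "flappy bird")
window.add(5, "flappy bird")
window.add(30, "big data")
window.add(40, "big data")
window.add(50, "big data")

hot_now = window.top(2, now=50)    # both topics are still inside the window
hot_later = window.top(2, now=70)  # the "flappy bird" events have expired
```

Personalized or geolocation trending could reuse the same structure with one window per user or per region, at the cost of more memory.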

Slide 29

Slide 29 text

Reactive Lambda Architecture for Social Data Processing
● RxSQL Query Parser (RxGroovy + SQL)
● Data Collector (Netty)
● Data Crawler (Crawling Actor)
● Realtime Database (Redis)
● Batch Database (HDFS + HBase)
● Reactive Functor Graph Engine (Actor + OrientDB)
● Messaging (Kafka)
● Stream Topology (Storm API + Akka Actor)
● Intelligent Algorithms: Spark + Hive
● Text Indexing: Elasticsearch + Kibana
● Client side: HTML5, D3, JavaScript
● Server side: Netty, Groovy

Slide 30

Slide 30 text

Open Source Technologies and Keywords

Slide 31

Slide 31 text

4) My Dreams in a Data-Driven World

Slide 32

Slide 32 text

http://bigthink.com/praxis/dont-want-to-die-just-upload-your-brain

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

“You may say I'm a dreamer
But I'm not the only one
I hope someday you'll join us
And the world will live as one”
John Lennon
Join us at http://mc2ads.com