Develop powerful Big Data Applications easily with Spring XD

Introducing Spring XD
Mark Pollack, Sr. Software Engineer, Pivotal
© 2014 Pivotal

2 Where I’m coming from…
• Big Physics -> Big Data
  • ‘90s @ BNL/FNAL/CERN/INFN
• Finance
  • TIBCO, Reuters, CodeStreet
• OpenSource
  • SpringSource -> VMware -> Pivotal
  • Spring Framework & .NET – 2004
  • Spring Data – 2010
  • Co-author O’Reilly Spring Data Book
  • Spring XD – 2012

3 Spring XD
XD = eXtreme Data

4 Spring XD
“One stop shop for developing and deploying Big Data Applications”

5 What is a Big Data Application?

6 Big Data Architecture
[Diagram labels: FILES, SOCIAL, SENSORS, MOBILE; Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export); XD>; MASTER DATASET; Predictive Modeling; BATCH VIEWS; REALTIME VIEWS; three Spring Boot applications]

7 Lambda Architecture
[Diagram labels: FILES, SOCIAL, SENSORS, MOBILE; Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export); XD>; MASTER DATASET; Predictive Modeling; BATCH VIEWS; REALTIME VIEWS; three Spring Boot applications; SPEED LAYER, BATCH LAYER, SERVING LAYER]

8 [Diagram labels: FILES, SOCIAL, SENSORS, MOBILE; Spring XD (Ingest, Stream Processing, Analytics, Workflow Orchestration, Export); XD>; MASTER DATASET; Predictive Modeling; BATCH VIEWS; REALTIME VIEWS; three Spring Boot applications; GemFire XD (shown twice); SPEED LAYER, BATCH LAYER, SERVING LAYER]

9 Spring IO Platform

10 Spring XD 10,000 ft view
[Diagram labels: FILES, SENSORS, SOCIAL, MOBILE]

11 Streams
• Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
• Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator
• Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Dynamic Router, Counters

12 Streams
How can we make this easier?
http | filter | file
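
The pipe notation reads like a Unix pipeline: a source feeds a processor, which feeds a sink. The following is a minimal Python sketch of that composition idea only; it is not how Spring XD implements streams, and every function name in it (`http_source`, `keep_matching`, `file_sink`, `split_definition`) is a hypothetical stand-in.

```python
from typing import Callable, Iterable, Iterator, List


def http_source(messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical source: treat each item as the payload of an HTTP POST."""
    yield from messages


def keep_matching(predicate: Callable[[str], bool]):
    """Hypothetical filter processor: pass only payloads the predicate accepts."""
    def stage(upstream: Iterator[str]) -> Iterator[str]:
        return (m for m in upstream if predicate(m))
    return stage


def file_sink(upstream: Iterator[str], out: List[str]) -> None:
    """Hypothetical sink: collect payloads in a list instead of a real file."""
    out.extend(upstream)


def split_definition(definition: str) -> List[str]:
    """Split 'a | b | c' into module names, in pipeline order."""
    return [part.strip() for part in definition.split("|")]


written: List[str] = []
modules = split_definition("http | filter | file")
payloads = http_source(["#java rocks", "lunch time", "more #java"])
file_sink(keep_matching(lambda m: "#java" in m)(payloads), written)
```

Each stage only sees the output of the stage before it, which is what lets a definition be written as a single line of module names.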

13 Taps
• “Listen” to data on another stream

14 Analytics
• Counters and Gauges
  • Simple & Field Value Counter
    • How many tweets for #java
  • Aggregate Counter
    • How many tweets for #java in the week/day/hour
  • Gauge & Rich Gauge
    • How many requests per minute?
• Abstract API. Implemented in
  • In-Memory
  • Redis
• Predictive Models
  • Is this transaction fraudulent?
• Based on JPMML Evaluator
  • Wide range of model types
• Interoperable with R, Rattle, KNIME, RapidMiner
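
To make the counter types concrete, here is a small in-memory Python sketch. It assumes a field value counter keeps one total per distinct value and an aggregate counter keeps totals per time bucket; the class and attribute names are hypothetical and do not mirror the Spring XD analytics API.

```python
from collections import Counter
from datetime import datetime


class FieldValueCounter:
    """Counts how often each distinct value of a field is seen."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, value: str) -> None:
        self.counts[value] += 1


class AggregateCounter:
    """Keeps running totals per hour bucket and per day bucket."""

    def __init__(self) -> None:
        self.by_hour: Counter = Counter()
        self.by_day: Counter = Counter()

    def record(self, when: datetime, amount: int = 1) -> None:
        self.by_hour[when.strftime("%Y-%m-%d %H")] += amount
        self.by_day[when.strftime("%Y-%m-%d")] += amount


hashtags = FieldValueCounter()
java_tweets = AggregateCounter()
tweets = [
    ("java", datetime(2014, 5, 1, 9, 15)),
    ("scala", datetime(2014, 5, 1, 9, 40)),
    ("java", datetime(2014, 5, 1, 10, 5)),
]
for tag, when in tweets:
    hashtags.record(tag)
    if tag == "java":
        java_tweets.record(when)
```

After the loop, `hashtags.counts["java"]` answers "how many tweets for #java", while `java_tweets.by_hour` and `java_tweets.by_day` answer the same question per hour and per day.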

15 Jobs
• CSV to JDBC
• FTP to HDFS
• JDBC to HDFS
• HDFS to JDBC
• HDFS to MongoDB

16 Spring XD Runtime
[Diagram labels: XD Shell; HTTP POST /streams/aStream “M1 | M2”; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; two XD Containers]

17 [Diagram labels: as on slide 16, plus Spring App Context and module M1]

18 [Diagram labels: as on slide 17, plus module M2]

19 SPRING XD Demo Streams & Taps

20 Predictive Models

21 Concepts
• Model
  • Parameterized algorithm
• Model Building
  • Derive a parameterized algorithm from the data
  • Slow process. Done offline, as a batch process, due to amount of data involved
• Model Scoring
  • Use the model to predict new information
  • Fast process. Can be done as part of stream processing
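
The split between building and scoring can be illustrated with a deliberately tiny model. This Python sketch uses a one-variable least-squares line as the "parameterized algorithm"; that choice, and the names `build_model` and `score`, are assumptions made for illustration rather than anything the slides prescribe.

```python
from typing import Iterable, Iterator, List, Tuple

Model = Tuple[float, float]  # (slope, intercept)


def build_model(history: List[Tuple[float, float]]) -> Model:
    """Offline step: fit y = slope * x + intercept by ordinary least squares."""
    n = len(history)
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in history)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(model: Model, stream: Iterable[float]) -> Iterator[float]:
    """Online step: apply the fitted parameters to each incoming value."""
    slope, intercept = model
    for x in stream:
        yield slope * x + intercept


# Building scans all historical data once; scoring is one multiply-add per event.
model = build_model([(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)])
predictions = list(score(model, [4.0, 10.0]))
```

Building touches the whole data set, so it suits a batch job; scoring needs only the stored parameters, so it fits inside a stream processor.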

22 PMML
• Predictive Model Markup Language
• XML interchange format for analytical models
• From the Data Mining Group http://www.dmg.org
• Processing + models
• Supported by statistics and data mining tools
  • R/Rattle, SAS Enterprise Miner, SPSS, Weka
• Java Evaluator API
  • JPMML-Evaluator project
  • Provides model scoring

23 SPRING XD Demo Predictive Models

24 SPRING XD Demo Jobs

25 Distributed, Fault Tolerant Runtime

26 Spring XD – Runtime – Fault Tolerance
[Diagram labels: XD Shell; HTTP POST /streams/aStream “M1 | M2”; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; two XD Containers; Spring App Context; modules M1 and M2]

27 Spring XD – Runtime – Fault Tolerance
[Diagram labels: XD Shell; HTTP POST /streams/aStream “M1 | M2”; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; one XD Container; module M2]

28 [Diagram labels: XD Shell; HTTP POST /streams/aStream “M1 | M2”; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; modules M2 and M1]

29 [Diagram labels: XD Shell; XD Admin and XD Admin (leader); ZooKeeper (Container State); Data Transport; modules M2 and M1]

30 [Diagram labels: XD Shell; XD Admin and XD Admin (leader); ZooKeeper (Container State); Data Transport; one XD Container; modules M2 and M1]

31 [Diagram labels: XD Shell; XD Admin and XD Admin (leader); ZooKeeper (Container State); Data Transport; three XD Containers; modules M2 and M1]

32 [Diagram labels: XD Shell; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; three XD Containers; modules M2 and M1]

33 [Diagram labels: XD Shell; HTTP POST /streams/aStream “M3| M4”; XD Admin (leader) and two further XD Admin instances; ZooKeeper (Container State); Data Transport; two XD Containers; modules M3, M4, M2, and M1]
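
The fault-tolerance diagrams show module M1 reappearing next to M2 after a container disappears. The Python sketch below illustrates one way such a reassignment could work, assuming modules from a lost container go to the least-loaded survivor. That placement rule, the `redeploy` function, and the container names are hypothetical; the slides do not describe the actual algorithm.

```python
from typing import Dict, List


def redeploy(assignments: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Move every module from the failed container to the least-loaded survivor."""
    survivors = {c: list(mods) for c, mods in assignments.items() if c != failed}
    if not survivors:
        raise RuntimeError("no containers left to host modules")
    for module in assignments.get(failed, []):
        # Ties are broken by container name so the result is deterministic.
        target = min(survivors, key=lambda c: (len(survivors[c]), c))
        survivors[target].append(module)
    return survivors


before = {"container-1": ["M1"], "container-2": ["M2"]}
after = redeploy(before, "container-1")
```

With the two-container layout above, losing `container-1` leaves `container-2` hosting both M2 and M1, which matches the module order shown in the diagram labels.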

34 Deployment Manifest

35 Deployment Manifest
• The stream/job definition defines the logical view of processing
• The deployment manifest defines the physical view of processing
• Important properties relate to module count and data partitioning

xd:>stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log"
xd:>stream deploy --name test1 --properties "module.transform.count=3"
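
The logical-versus-physical distinction can be sketched in a few lines of Python. The example expands the stream definition from the slide into one entry per module instance using the `module.<name>.count` property. The default of one instance when no count is given, the `name#index` labels, and the function names are assumptions for illustration; this is not the Spring XD deployer.

```python
from typing import Dict, List


def parse_properties(properties: str) -> Dict[str, str]:
    """Turn 'k1=v1,k2=v2' into a dictionary (values containing commas are not handled)."""
    return dict(item.split("=", 1) for item in properties.split(",") if item)


def physical_plan(definition: str, properties: str) -> List[str]:
    """Expand a logical 'a | b | c' definition into one entry per module instance."""
    props = parse_properties(properties)
    plan: List[str] = []
    for part in definition.split(" | "):
        name = part.split()[0]  # drop module options such as --expression=...
        count = int(props.get(f"module.{name}.count", "1"))  # assumed default: 1
        plan.extend(f"{name}#{i}" for i in range(count))
    return plan


plan = physical_plan(
    "http | transform --expression=payload.toUpperCase() | log",
    "module.transform.count=3",
)
```

The definition string stays the same no matter how many instances run; only the properties passed at deploy time change the resulting plan.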

36 Deployment Manifest – Data Partitioning

stream create words --definition "http | splitter --expression=payload.split(' ') | log"
stream deploy words --properties module.splitter.producer.partitionKeyExpression=payload,module.log.count=2
http post --data "How much wood would a woodchuck chuck if a woodchuck could chuck wood"

37 Deployment Manifest – Data Partitioning
In one container log you will see

16:33:27,486 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - How
16:33:27,507 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck
16:33:27,508 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck

and in the other

16:33:27,503 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - much
16:33:27,512 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
16:33:27,513 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - would
16:33:27,514 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
16:33:27,520 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
16:33:27,522 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - if
16:33:27,523 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
16:33:27,524 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
16:33:27,526 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - could
16:33:27,528 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
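
The key point of the log output is that equal words always land in the same container. A Python sketch of key-based partitioning follows, assuming the partition index is the Java `String.hashCode()` of the key modulo the number of consumers. That selection rule is an assumption here, not something the slides state, and the helper names are hypothetical.

```python
from typing import Dict, List


def java_string_hash(s: str) -> int:
    """Reproduce java.lang.String.hashCode(), including 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key: str, count: int) -> int:
    """Assumed rule: absolute hash of the key modulo the number of consumers."""
    return abs(java_string_hash(key)) % count


sentence = "How much wood would a woodchuck chuck if a woodchuck could chuck wood"
partitions: Dict[int, List[str]] = {0: [], 1: []}
for word in sentence.split(" "):
    partitions[partition_for(word, 2)].append(word)
```

Under that assumption, partition 0 receives "How", "chuck", "chuck" and partition 1 receives the remaining ten words, the same grouping as in the two container logs.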

38 Learn More…
• Project: http://projects.spring.io/spring-xd/
• GitHub: https://github.com/spring-projects/spring-xd/
• Issues: https://jira.springsource.org/browse/XD
• Wiki: https://github.com/spring-projects/spring-xd/wiki
• Samples: https://github.com/spring-projects/spring-xd-samples
• EC2 Support: https://github.com/spring-projects/spring-xd-ec2
