Develop powerful Big Data Applications easily with Spring XD

Spring XD aims to provide a one-stop shop for writing and deploying Big Data applications. It provides a scalable, fault-tolerant, distributed runtime for data ingestion, analytics, and workflow orchestration using a single programming, configuration, and extensibility model. Because developers no longer have to integrate all of this themselves across the many different solutions available today, Spring XD greatly reduces the inherent complexity of Big Data development. It is built on proven projects such as Spring Integration and Spring Batch. You'll see for yourself how this heritage combines to provide a scalable runtime environment that is easily configured and assembled via a simple DSL.

Mark Pollack

June 04, 2014

Transcript

  1. Where I’m coming from…

    • Big Physics -> Big Data
      • ’90s @ BNL/FNAL/CERN/INFN
    • Finance
      • TIBCO, Reuters, CodeStreet
    • Open Source
      • SpringSource -> VMware -> Pivotal
      • Spring Framework & .NET – 2004
      • Spring Data – 2010
      • Co-author, O’Reilly Spring Data book
      • Spring XD – 2012
  2. Big Data Architecture

    [Diagram: sources (files, social, sensors, mobile) feed Spring XD (XD> shell), which covers ingest, stream processing, analytics, workflow orchestration and export; other components are the master dataset, batch views, realtime views, predictive modeling and three Spring Boot applications.]
  3. Lambda Architecture

    [Diagram: the same Big Data architecture (Spring XD ingest, stream processing, analytics, workflow orchestration and export; master dataset; batch views; realtime views; predictive modeling; Spring Boot applications) grouped into the Lambda Architecture's speed layer, batch layer and serving layer.]
  4. Lambda Architecture with GemFire XD

    [Diagram: the same speed, batch and serving layers built around Spring XD, with GemFire XD added in two places.]
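The Lambda Architecture diagrams split work into a batch layer that recomputes views from the master dataset and a speed layer that covers only the most recent data, with the serving layer merging the two at query time. The sketch below illustrates that merge with hashtag counts; the data, class and function names are hypothetical, and it omits the step where realtime counts are discarded once a new batch run has absorbed them.

```python
from collections import Counter

# Hypothetical master dataset: hashtags seen in previously stored tweets.
master_dataset = ["#java", "#java", "#spring", "#java"]

def batch_view(events):
    """Batch layer: recompute the view from the full master dataset (slow, periodic)."""
    return Counter(events)

class RealtimeView:
    """Speed layer: incrementally count events that arrived after the last batch run."""
    def __init__(self):
        self.counts = Counter()

    def on_event(self, tag):
        self.counts[tag] += 1

def query(tag, batch, realtime):
    """Serving layer: merge the precomputed batch view with the realtime view."""
    return batch[tag] + realtime.counts[tag]

batch = batch_view(master_dataset)
speed = RealtimeView()
for tag in ["#java", "#spring"]:  # events newer than the last batch run
    speed.on_event(tag)

print(query("#java", batch, speed))  # 3 from the batch view + 1 from the speed layer
```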
  5. Streams

    • Sources: HTTP, Tail, File, Mail, Twitter, GemFire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Trigger, Reactor TCP/UDP
    • Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator
    • Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, GemFire, Splunk, MQTT, Dynamic Router, Counters
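A stream is a source, zero or more processors and a sink chained with `|`. As a rough, in-process analogy only (Spring XD runs each module in its own application context and connects them over a message transport), the chain can be modeled as functions from one payload stream to the next. `filter_by`, `transform`, `compose` and `log_sink` are hypothetical names, and a fixed list stands in for the HTTP source.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List

# A processor maps a stream of payloads to a new stream of payloads.
Processor = Callable[[Iterable[str]], Iterator[str]]

def transform(fn: Callable[[str], str]) -> Processor:
    """Processor: apply fn to every payload (cf. transform --expression=...)."""
    return lambda stream: (fn(p) for p in stream)

def filter_by(pred: Callable[[str], bool]) -> Processor:
    """Processor: keep only payloads for which pred is true (cf. filter)."""
    return lambda stream: (p for p in stream if pred(p))

def compose(source: Iterable[str], *processors: Processor) -> Iterator[str]:
    """Chain a source through processors left to right, like 'source | p1 | p2'."""
    return reduce(lambda stream, proc: proc(stream), processors, iter(source))

def log_sink(stream: Iterable[str]) -> List[str]:
    """Sink: print each payload and return them (stand-in for the log sink)."""
    received = []
    for payload in stream:
        print(payload)
        received.append(payload)
    return received

# Roughly "http | filter | transform --expression=payload.toUpperCase() | log".
received = log_sink(compose(["hello", "", "spring xd"],
                            filter_by(lambda p: p != ""),
                            transform(str.upper)))
```

Because the stages are lazy generators, each payload flows through the whole chain before the next one is read, which loosely mirrors message-at-a-time stream processing.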
  6. Analytics

    • Counters and Gauges
      • Simple & Field Value Counter: How many tweets for #java?
      • Aggregate Counter: How many tweets for #java in the week/day/hour?
      • Gauge & Rich Gauge: How many requests per minute?
    • Abstract API, implemented in:
      • In-Memory
      • Redis
    • Predictive Models
      • Is this transaction fraudulent?
    • Based on JPMML Evaluator
      • Wide range of model types
    • Interoperable with R, Rattle, KNIME, RapidMiner
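The counter and gauge types differ mainly in what state they keep. The in-memory sketch below is an assumption-laden illustration, not the Spring XD analytics API: a field value counter keeps one count per distinct value, an aggregate counter keeps counts per time bucket (hour and day here; week is omitted), and a rich gauge keeps the latest value plus min, max and mean. All class and method names are hypothetical.

```python
from collections import Counter
from datetime import datetime

class FieldValueCounter:
    """Counts occurrences of each distinct value of a field (e.g. hashtags)."""
    def __init__(self):
        self.counts = Counter()

    def increment(self, value):
        self.counts[value] += 1

class AggregateCounter:
    """Counts events per time bucket so totals can be read per hour or per day."""
    def __init__(self):
        self.by_hour = Counter()
        self.by_day = Counter()

    def increment(self, ts: datetime):
        self.by_hour[ts.strftime("%Y-%m-%d %H")] += 1
        self.by_day[ts.strftime("%Y-%m-%d")] += 1

class RichGauge:
    """Keeps the latest value plus its min, max and running mean."""
    def __init__(self):
        self.value = None
        self.min = float("inf")
        self.max = float("-inf")
        self.count = 0
        self.mean = 0.0

    def set(self, v: float):
        self.value = v
        self.min = min(self.min, v)
        self.max = max(self.max, v)
        self.count += 1
        self.mean += (v - self.mean) / self.count  # incremental mean, no history kept

tags = FieldValueCounter()
for t in ["#java", "#spring", "#java"]:
    tags.increment(t)

tweets = AggregateCounter()
for ts in [datetime(2014, 6, 4, 10, 5), datetime(2014, 6, 4, 10, 40), datetime(2014, 6, 4, 11, 0)]:
    tweets.increment(ts)

req_rate = RichGauge()
for v in [10, 20, 30]:
    req_rate.set(v)
```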
  7. Jobs

    • CSV to JDBC
    • FTP to HDFS
    • JDBC to HDFS
    • HDFS to JDBC
    • HDFS to MongoDB
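The jobs above build on Spring Batch, whose chunk-oriented style reads records, writes them in groups and commits once per group. The sketch below imitates that pattern for the "CSV to JDBC" case using Python's `csv` and `sqlite3` modules; it is not the Spring XD job itself, and `csv_to_jdbc`, the `people` table and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

def csv_to_jdbc(csv_text, conn, table, chunk_size=2):
    """Read CSV rows and insert them in chunks, committing once per chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    cols = reader.fieldnames
    placeholders = ", ".join("?" for _ in cols)
    # Table and column names come from trusted input in this sketch.
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    chunk, written = [], 0
    for row in reader:
        chunk.append(tuple(row[c] for c in cols))
        if len(chunk) == chunk_size:
            conn.executemany(sql, chunk)
            conn.commit()  # one commit per chunk, like a commit interval
            written += len(chunk)
            chunk = []
    if chunk:  # flush the final, partially filled chunk
        conn.executemany(sql, chunk)
        conn.commit()
        written += len(chunk)
    return written

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
data = "name,city\nAda,London\nGrace,New York\nLinus,Helsinki\n"
n = csv_to_jdbc(data, conn, "people")
```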
  8. Spring XD Runtime

    [Diagram: the XD Shell sends HTTP POST /streams/aStream with the definition "M1 | M2" to the XD Admin (leader); two more XD Admins stand by; ZooKeeper holds container state; two XD Containers are connected by the Data Transport.]
  9. Spring XD Runtime

    [Diagram: the same runtime as the previous slide; module M1 is now deployed in a Spring application context on one XD Container.]
  10. Spring XD Runtime

    [Diagram: the same runtime; M1 runs on one XD Container and M2 is now deployed on the other.]
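The runtime slides show the admin leader receiving a stream definition and placing its modules on containers, with deployment state kept in ZooKeeper. A minimal sketch of that bookkeeping follows, assuming a simple round-robin placement (Spring XD's actual container-selection strategy is not described on the slides) and using a plain dict in place of ZooKeeper. `parse_stream` and `deploy` are hypothetical, and the parser ignores module options and quoting.

```python
from itertools import cycle

def parse_stream(definition: str) -> list:
    """Split a stream definition such as 'M1 | M2' into module names (no quoting support)."""
    return [m.strip() for m in definition.split("|")]

def deploy(stream_name, definition, containers, state):
    """Assign each module to a container round-robin and record it in 'state'.

    'state' is a plain dict standing in for the deployment data kept in ZooKeeper.
    """
    targets = cycle(containers)
    for module in parse_stream(definition):
        container = next(targets)
        state.setdefault(container, []).append((stream_name, module))
    return state

state = deploy("aStream", "M1 | M2", ["container-1", "container-2"], {})
```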
  11. Concepts

    • Model
      • A parameterized algorithm
    • Model Building
      • Derive a parameterized algorithm from the data
      • Slow process, done offline as a batch process because of the amount of data involved
    • Model Scoring
      • Use the model to predict new information
      • Fast process; can be done as part of stream processing
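The split between slow model building and fast model scoring can be seen with the simplest parameterized algorithm, a line y = a + b·x. In the sketch below, fitting by ordinary least squares stands in for the offline batch step and `score` is the cheap per-message step; the function names and data are illustrative only.

```python
def build_model(xs, ys):
    """Model building (offline, batch): fit y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return {"intercept": a, "slope": b}

def score(model, x):
    """Model scoring (online): evaluate the fitted parameters for one new value."""
    return model["intercept"] + model["slope"] * x

model = build_model([1, 2, 3, 4], [3, 5, 7, 9])  # data lies exactly on y = 1 + 2x
predictions = [score(model, x) for x in [5, 10]]  # e.g. applied to each message in a stream
```

The fit touches every training point, while scoring is a handful of arithmetic operations, which is why only the latter fits naturally inside a stream.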
  12. PMML

    • Predictive Model Markup Language
    • XML interchange format for analytical models
    • From the Data Mining Group: http://www.dmg.org
    • Processing + models
    • Supported by statistics and data mining tools
      • R/Rattle, SAS Enterprise Miner, SPSS, Weka
    • Java Evaluator API
      • JPMML-Evaluator project
      • Provides model scoring
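Because PMML is plain XML, the structure of a simple model is easy to inspect. The sketch below hand-writes a tiny PMML 4.1 `RegressionModel` (y = 1 + 2·x) and scores it with the standard-library `xml.etree.ElementTree`. It is not JPMML-Evaluator and handles only a single `RegressionTable` with `NumericPredictor` elements (any `exponent` attribute is ignored); the document contents and function names are illustrative.

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written PMML 4.1 regression model: y = 1 + 2*x.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="toy linear model"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

def load_regression(doc: str):
    """Extract the intercept and coefficients from the first RegressionTable."""
    root = ET.fromstring(doc)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {p.get("name"): float(p.get("coefficient"))
                    for p in table.findall("pmml:NumericPredictor", NS)}
    return intercept, coefficients

def evaluate(model, inputs: dict) -> float:
    """Score one record: intercept + sum(coefficient * input value)."""
    intercept, coefficients = model
    return intercept + sum(c * inputs[name] for name, c in coefficients.items())

model = load_regression(PMML_DOC)
```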
  13. Spring XD – Runtime – Fault Tolerance

    [Diagram: starting state: the XD Admin (leader) with two standby XD Admins, ZooKeeper holding container state, and two XD Containers running M1 and M2 of aStream ("M1 | M2"), connected by the Data Transport.]
  14. Spring XD – Runtime – Fault Tolerance

    [Diagram: the XD Container running M1 is gone; one XD Container remains, running M2.]
  15. Spring XD – Runtime – Fault Tolerance

    [Diagram: M1 has been redeployed to the remaining XD Container, which now runs both M2 and M1.]
  16. Spring XD – Runtime – Fault Tolerance

    [Diagram: the original XD Admin leader is gone; one of the two remaining XD Admins is now the leader. The single XD Container still runs M2 and M1.]
  17. Spring XD – Runtime – Fault Tolerance

    [Diagram: a second XD Container joins the cluster.]
  18. Spring XD – Runtime – Fault Tolerance

    [Diagram: a third XD Container joins; M2 and M1 are still running.]
  19. Spring XD – Runtime – Fault Tolerance

    [Diagram: a third XD Admin joins as a standby alongside the current leader.]
  20. Spring XD – Runtime – Fault Tolerance

    [Diagram: the XD Shell sends HTTP POST /streams/aStream with the definition "M3 | M4"; M3 and M4 are deployed to XD Containers while M2 and M1 keep running.]
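The fault-tolerance slides show two recoveries: modules from a lost container are redeployed on a survivor, and a standby admin takes over when the leader disappears. The sketch below models both under stated assumptions: redeployment picks the least-loaded live container, and leadership follows the classic ZooKeeper recipe where the lowest sequence number wins. Neither rule is confirmed by the slides as Spring XD's exact behavior; all names are hypothetical and a dict again stands in for ZooKeeper.

```python
def handle_container_departure(failed, state, live_containers):
    """Move the failed container's modules to the least-loaded live container.

    Assumes at least one live container remains.
    """
    orphaned = state.pop(failed, [])
    for deployment in orphaned:
        target = min(live_containers, key=lambda c: len(state.get(c, [])))
        state.setdefault(target, []).append(deployment)
    return state

def elect_leader(admin_sequence_numbers):
    """ZooKeeper-style election: the admin with the lowest sequence number leads."""
    return min(admin_sequence_numbers, key=admin_sequence_numbers.get)

# Container failure: container-1 (running M1) goes away, as in the slides.
state = {"container-1": [("aStream", "M1")], "container-2": [("aStream", "M2")]}
handle_container_departure("container-1", state, ["container-2"])

# Admin failure: the leader goes away and the next admin in line takes over.
admins = {"admin-a": 1, "admin-b": 2, "admin-c": 3}
first_leader = elect_leader(admins)
del admins[first_leader]
second_leader = elect_leader(admins)
```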
  21. Deployment Manifest

    • The stream/job definition defines the logical view of processing
    • The deployment manifest defines the physical view of processing
    • Important properties relate to module count and data partitioning

    xd:>stream create test1 --definition "http | transform --expression=payload.toUpperCase() | log"
    xd:>stream deploy --name test1 --properties "module.transform.count=3"
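The `--properties` string addresses modules by name using keys of the form `module.<name>.<property>`. The sketch below groups such pairs per module so the physical settings can be read next to the logical definition. `parse_manifest` and `instance_count` are hypothetical helpers; splitting on commas is a simplification that breaks if a value itself contains a comma, and treating an unmentioned module as having one instance is an assumption.

```python
def parse_manifest(properties: str) -> dict:
    """Group 'module.<name>.<property>=<value>' pairs by module name."""
    manifest = {}
    for pair in properties.split(","):
        key, _, value = pair.strip().partition("=")
        prefix, module, prop = key.split(".", 2)  # keep dotted property names intact
        if prefix != "module":
            raise ValueError(f"unexpected property: {key}")
        manifest.setdefault(module, {})[prop] = value
    return manifest

def instance_count(manifest: dict, module: str) -> int:
    """Instances to deploy; a module not mentioned is assumed to get one."""
    return int(manifest.get(module, {}).get("count", 1))

m = parse_manifest("module.transform.count=3")
```

With this manifest, `instance_count(m, "transform")` is 3, while `http` and `log` keep a single instance.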
  22. Deployment Manifest – Data Partitioning

    xd:>stream create words --definition "http | splitter --expression=payload.split(' ') | log"
    xd:>stream deploy words --properties module.splitter.producer.partitionKeyExpression=payload,module.log.count=2
    xd:>http post --data "How much wood would a woodchuck chuck if a woodchuck could chuck wood"
  23. Deployment Manifest – Data Partitioning

    In one container log you will see:

    16:33:27,486 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - How
    16:33:27,507 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck
    16:33:27,508 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - chuck

    and in the other:

    16:33:27,503 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - much
    16:33:27,512 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
    16:33:27,513 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - would
    16:33:27,514 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
    16:33:27,520 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
    16:33:27,522 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - if
    16:33:27,523 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - a
    16:33:27,524 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - woodchuck
    16:33:27,526 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - could
    16:33:27,528 INFO SimpleAsyncTaskExecutor-1 sink.words:155 - wood
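With `partitionKeyExpression=payload`, every occurrence of the same word must land on the same `log` instance. The sketch below assumes a hash-modulo selector built on Java's `String.hashCode()` (an assumption; the slides do not name the selector). Under that assumption it reproduces the grouping shown in the two logs, though which partition maps to which container is not shown. `java_string_hash` and `select_partition` are hypothetical names.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode(): h = 31*h + char, with 32-bit signed overflow.

    ord() matches Java's UTF-16 code units for characters in the Basic Multilingual Plane.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def select_partition(key: str, partition_count: int) -> int:
    """Assumed scheme: absolute value of the key's hash code modulo the partition count."""
    return abs(java_string_hash(key)) % partition_count

words = "How much wood would a woodchuck chuck if a woodchuck could chuck wood".split(" ")
partitions = {}
for w in words:
    partitions.setdefault(select_partition(w, 2), []).append(w)
```

Equal keys always hash to the same partition, so both occurrences of "chuck" (and of "woodchuck") end up together, which is what makes stateful downstream processing such as per-word counting safe to scale out.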
  24. Learn More…

    • Project: http://projects.spring.io/spring-xd/
    • GitHub: https://github.com/spring-projects/spring-xd/
    • Issues: https://jira.springsource.org/browse/XD
    • Wiki: https://github.com/spring-projects/spring-xd/wiki
    • Samples: https://github.com/spring-projects/spring-xd-samples
    • EC2 Support: https://github.com/spring-projects/spring-xd-ec2