Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)

Apache Zeppelin Meetup (2016): http://bit.ly/2yO5ynW

Christian Tzolov

January 21, 2016

Transcript

  1. Unified Data Analytics Platform
    (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)
    by Christian Tzolov
    @christzolov

  2. Whoami
    Christian Tzolov
    Technical Architect at Pivotal
    BigData, Hadoop, SpringXD
    Apache Committer, Crunch PMC member
    [email protected]
    blog.tzolov.net
    @christzolov

  3. Contents
    • DEMO
    • Zeppelin Interpreters
    • PSQL (to become JDBC in 0.6.x)
    • Geode
    • SpringXD
    • Apache Ambari
    • Zeppelin Service
    • Geode, HAWQ and Spring XD services
    • Webpage Embedder View

  4. Demo: Twitter Streams with
    SpringXD, Geode and HAWQ

  5. Technical Stack
    • Apache HDFS: Data Lake (PHD or HDP Hadoop)
    • Apache HAWQ: SQL on Hadoop (OLAP)
    • Apache Geode: In-memory data grid (OLTP)
    • Spring XD: Integration and Streaming Runtime
    • Apache Ambari: Manages All Clusters
    • Apache Zeppelin: Web UI for interaction with Data Systems
    [Diagram: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin]

  6. Spring XD
    Orchestrates and automates all steps across multiple
    data stream pipelines
    • Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP,
      JMS, RabbitMQ, MQTT, Kafka, Reactor TCP/UDP
    • Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple,
      Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code,
      JPMML Evaluator, Spark Streaming
    • Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk,
      MQTT, Kafka, Dynamic Router, Counters
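
A stream pipeline of this kind chains an input module, zero or more processing steps, and an output module. The Python sketch below only illustrates that composition idea with generators; it is not Spring XD code, and the function names and the sample payloads are hypothetical.

```python
import json
from typing import Callable, Iterable, Iterator, List

def source(lines: Iterable[str]) -> Iterator[dict]:
    """Input step: emit one parsed JSON payload per input line."""
    for line in lines:
        yield json.loads(line)

def filter_step(payloads: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Processing step: keep only payloads that satisfy the predicate."""
    return (p for p in payloads if predicate(p))

def transform_step(payloads: Iterable[dict], fn: Callable[[dict], str]) -> Iterator[str]:
    """Processing step: map each payload to a new value."""
    return (fn(p) for p in payloads)

def sink(values: Iterable[str]) -> List[str]:
    """Output step: collect everything that reaches the end of the pipeline."""
    return list(values)

# Hypothetical sample messages standing in for a live feed.
raw = [
    '{"id_str": "1", "lang": "en"}',
    '{"id_str": "2", "lang": "nl"}',
    '{"id_str": "3", "lang": "en"}',
]

result = sink(
    transform_step(
        filter_step(source(raw), lambda p: p["lang"] == "en"),
        lambda p: p["id_str"],
    )
)
print(result)  # ['1', '3']
```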

  7. Apache Geode
    • Cache - Performance / Consistency / Resiliency
    • Region - Highly available, redundant, distributed Map
    China Railway Corporation
    • 5,700 train stations
    • 4.5 million tickets per day
    • 20 million daily users
    • 1.4 billion page views per day
    • 40,000 visits per second
    Indian Railways
    • 7,000 stations
    • 72,000 miles of track
    • 23 million passengers daily
    • 120,000 concurrent users
    • 10,000 transactions per minute

  8. Apache HAWQ
    • Built around a Greenplum MPP DB
    • 100% ANSI SQL compliant: SQL-92/99/2003…
    • ODBC and JDBC
    • Hadoop Native: Parquet, HDFS and YARN
    • Extensible - Web Tables, PXF
    • Outperforms Impala on TPC-DS by 454% overall

  9. Demo
    tweets = twittersearch --query= | hdfs --directory=/user/zeppelin/xd/tweets
    geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
    hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
    tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
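
The four definitions above form one primary stream (`tweets`) and three taps that read from it. As a reading aid, the Python sketch below splits each definition into its name, the stream it taps (if any), and its module chain. It is a naive illustration, not the Spring XD parser: `StreamDef` and `parse_definition` are hypothetical names, and splitting on " | " would break on a pipe character inside a quoted argument.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class StreamDef:
    name: str
    tapped: Optional[str]   # name of the tapped stream, or None for a primary stream
    modules: List[str]      # module names in pipeline order

def parse_definition(line: str) -> StreamDef:
    """Split 'name = [tap:stream:x >] mod1 ... | mod2 ...' into its parts."""
    name, _, body = line.partition(" = ")
    tapped = None
    if body.startswith("tap:stream:"):
        tap, _, body = body.partition(" > ")
        tapped = tap[len("tap:stream:"):]
    # The first token of each pipe-separated segment is the module name.
    modules = [segment.split()[0] for segment in body.split(" | ")]
    return StreamDef(name.strip(), tapped, modules)

# The definitions exactly as they appear on the slide (the --query value is elided there).
DEMO = """
tweets = twittersearch --query= | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
"""

defs = [parse_definition(line.strip()) for line in DEMO.strip().splitlines()]
for d in defs:
    origin = f"tap on '{d.tapped}'" if d.tapped else "primary stream"
    print(f"{d.name}: {origin}: {' -> '.join(d.modules)}")
```

Read this way, the tweets land in HDFS once, while the taps copy the same payloads into a Geode region, a HAWQ table (through gpfdist), and a counter without altering the primary stream.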

  10. SpringXD Interpreter(s)
    • %xd.stream and %xd.job
    • Multiple streams or jobs in a paragraph.
    • Special Deploy/Launch Semantics
    • Zeppelin Dynamic Forms (${…})
    • Comprehensive Stream and Job DSL auto-completion (Ctrl+.)

  11. SpringXD Conf

  12. PSQL Interpreter
    • Prefix: %psql.sql
    • PostgreSQL, HAWQ/PXF, Greenplum … JDBC
    • PSQL command line shell (via %sh)
    • Zeppelin Dynamic Forms (${…})
    • Comprehensive SQL/JDBC autocompletion (Ctrl+.)
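
Zeppelin dynamic forms let a paragraph carry `${name}` or `${name=default}` placeholders that are filled from input fields before the text is handed to the interpreter. The Python sketch below mimics only that text-input substitution to show the effect on a query; it is not Zeppelin's implementation, it ignores select-style forms, and the `render` helper and the `tweets` table are hypothetical.

```python
import re
from typing import Dict

# Matches ${name} and ${name=default}.
FORM = re.compile(r"\$\{([^=}]+)(?:=([^}]*))?\}")

def render(paragraph: str, values: Dict[str, str]) -> str:
    """Replace each placeholder with the supplied value, else its default."""
    def substitute(match: "re.Match[str]") -> str:
        name, default = match.group(1), match.group(2) or ""
        return values.get(name, default)
    return FORM.sub(substitute, paragraph)

sql = "SELECT * FROM tweets WHERE lang = '${lang=en}' LIMIT ${maxRows=10}"
rendered = render(sql, {"maxRows": "5"})
print(rendered)  # SELECT * FROM tweets WHERE lang = 'en' LIMIT 5
```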

  13. PSQL Configuration

  14. PSQL Doc
    https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html

  15. PSQL/HAWQ Demo
    • http://10.68.58.121:9995/#/notebook/2B2ZYS18Y

  16. Geode Interpreter
    • Prefix: %geode.oql
    • OQL and PDX nested access (user.name)
    • Geode command line shell (via %sh)
    • Zeppelin Dynamic Forms (${…})
    • Basic OQL auto-completion (Ctrl+.)
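
Nested access means a dotted path such as `user.name` reaches into a field of an embedded object rather than a top-level column. The Python sketch below shows the same idea on a plain dictionary; it does not use Geode or OQL, and the `nested_get` helper and the sample document are hypothetical.

```python
from functools import reduce
from typing import Any

def nested_get(document: dict, path: str) -> Any:
    """Follow a dotted path (for example 'user.name') through nested dictionaries."""
    return reduce(lambda node, key: node[key], path.split("."), document)

# Hypothetical stored entry with an embedded 'user' object.
tweet = {"id_str": "42", "user": {"name": "alice", "lang": "en"}}

print(nested_get(tweet, "user.name"))  # alice
```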

  17. Geode Configuration

  18. Geode Doc
    https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html

  19. Geode Tutorial
    • http://10.68.58.121:9995/#/notebook/2AW57BUN4

  20. Apache Ambari
    Zeppelin, Geode, HAWQ, SpringXD Services …

  21. Ambari Services

  22. Ambari Services
    • Ambari Zeppelin Service: github, rpm, blog
    • Ambari Geode Service: github, rpm
    • Ambari SpringXD Service: github
    • Ambari HAWQ Service (Pivotal BDS dist)

  23. Ambari Blueprint
    http://:8080/api/v1/clusters/mv10?format=blueprint
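
The URL above exports the running cluster `mv10` as a blueprint through the Ambari REST API. The Python sketch below only builds such a request with the standard library and does not send it. The host name and the credentials are placeholders (the slide leaves the host out), and `blueprint_request` is a hypothetical helper; HTTP basic authentication is assumed.

```python
import base64
from urllib.request import Request

def blueprint_request(host: str, cluster: str, user: str, password: str) -> Request:
    """Build (but do not send) a GET request that exports a cluster as a blueprint."""
    url = f"http://{host}:8080/api/v1/clusters/{cluster}?format=blueprint"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(url, headers={"Authorization": f"Basic {token}"})

# Placeholder host and credentials, for illustration only.
req = blueprint_request("ambari.example.com", "mv10", "admin", "admin")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would return the blueprint as JSON, provided the server is reachable and the credentials are valid.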

  24. Webpage Embedder
    https://github.com/tzolov/ambari-webpage-embedder-view

  25. Stay in touch
    [email protected]
    blog.tzolov.net
    @christzolov
    https://nl.linkedin.com/in/tzolov
