Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)

Apache Zeppelin Meetup (2016): http://bit.ly/2yO5ynW

Christian Tzolov

January 21, 2016
Transcript

  1. Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ) by Christian Tzolov @christzolov

  2. Whoami • Christian Tzolov • Technical Architect at Pivotal: BigData, Hadoop, SpringXD • Apache Committer, Crunch PMC member • ctzolov@pivotal.io • blog.tzolov.net • @christzolov

  3. Contents • DEMO • Zeppelin Interpreters • PSQL (to become JDBC in 0.6.x) • Geode • SpringXD • Apache Ambari • Zeppelin Service • Geode, HAWQ and Spring XD services • Webpage Embedder View

  4. Demo: Twitter Streams with SpringXD, Geode and HAWQ

  5. Technical Stack • Apache HDFS: Data Lake (PHD or HDP Hadoop) • Apache HAWQ: SQL on Hadoop (OLAP) • Apache Geode: In-memory data grid (OLTP) • Spring XD: Integration and Streaming Runtime • Apache Ambari: Manages all clusters • Apache Zeppelin: Web UI for interaction with data systems

  6. Spring XD: orchestrates and automates all steps across multiple data stream pipelines • Sources: HTTP, Tail, File, Mail, Twitter, Gemfire, Syslog, TCP, UDP, JMS, RabbitMQ, MQTT, Kafka, Reactor TCP/UDP • Processors: Filter, Transformer, Object-to-JSON, JSON-to-Tuple, Splitter, Aggregator, HTTP Client, Groovy Scripts, Java Code, JPMML Evaluator, Spark Streaming • Sinks: File, HDFS, JDBC, TCP, Log, Mail, RabbitMQ, Gemfire, Splunk, MQTT, Kafka, Dynamic Router, Counters

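Slide 6 describes every Spring XD stream as a source piped through processors into a sink, written in a Unix-pipe style DSL such as `http | filter | hdfs`. The Python sketch below mimics that left-to-right composition with generators. It is only an analogy for the data flow: Spring XD itself deploys each module to a container and connects the modules over a message bus. The helper names (`file_source`, `filter_proc`, `transform_proc`, `log_sink`, `stream`) are hypothetical.

```python
from typing import Callable, Iterable, Iterator

# A processor turns one message stream into another.
Processor = Callable[[Iterable[str]], Iterator[str]]


def file_source(lines: Iterable[str]) -> Iterator[str]:
    """Source: emit each input line as a message (stands in for tail/file/http)."""
    for line in lines:
        yield line


def filter_proc(predicate: Callable[[str], bool]) -> Processor:
    """Processor: drop messages that do not satisfy the predicate."""
    def run(messages: Iterable[str]) -> Iterator[str]:
        return (m for m in messages if predicate(m))
    return run


def transform_proc(fn: Callable[[str], str]) -> Processor:
    """Processor: apply a function to every message."""
    def run(messages: Iterable[str]) -> Iterator[str]:
        return (fn(m) for m in messages)
    return run


def log_sink(messages: Iterable[str], out: list) -> None:
    """Sink: collect the messages (stands in for log/hdfs/jdbc)."""
    for m in messages:
        out.append(m)


def stream(source: Iterator[str], *processors: Processor) -> Iterator[str]:
    """Compose 'source | p1 | p2 ...' from left to right."""
    messages: Iterable[str] = source
    for p in processors:
        messages = p(messages)
    return iter(messages)


# Roughly: file | filter | transform | log
collected: list = []
log_sink(
    stream(
        file_source(["error: disk full", "info: ok", "error: timeout"]),
        filter_proc(lambda m: m.startswith("error")),
        transform_proc(str.upper),
    ),
    collected,
)
print(collected)
```

Generators keep the sketch lazy, so each message flows through the whole pipeline one at a time, which is closer in spirit to a stream than building intermediate lists.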
  7. Apache Geode • Cache: performance, consistency, resiliency • Region: highly available, redundant, distributed Map • China Railway Corporation: 5,700 train stations, 4.5 million tickets per day, 20 million daily users, 1.4 billion page views per day, 40,000 visits per second • Indian Railways: 7,000 stations, 72,000 miles of track, 23 million passengers daily, 120,000 concurrent users, 10,000 transactions per minute

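Slide 7 calls a Geode region a highly available, redundant, distributed Map. A minimal Python sketch of that idea follows, assuming a partitioned region with one redundant copy: keys hash to one of 113 buckets (Geode's default `total-num-buckets`), and each bucket is held by a primary member plus a backup on a different member. The round-robin placement and the member names are simplifications for illustration, not Geode's actual bucket-assignment or rebalancing logic.

```python
import zlib
from typing import Dict, List, Optional, Set

TOTAL_BUCKETS = 113  # Geode's default total-num-buckets for partitioned regions


class PartitionedRegion:
    """Toy model of a partitioned region with redundant copies (not Geode's code)."""

    def __init__(self, members: List[str], redundant_copies: int = 1):
        if redundant_copies >= len(members):
            raise ValueError("need more members than redundant copies")
        self.members = members
        self.redundant_copies = redundant_copies
        # Each member keeps its own local store: member -> {key: value}
        self.stores: Dict[str, Dict[str, str]] = {m: {} for m in members}

    def _bucket(self, key: str) -> int:
        # Stable hash, so the same key always lands in the same bucket.
        return zlib.crc32(key.encode("utf-8")) % TOTAL_BUCKETS

    def owners(self, key: str) -> List[str]:
        """Primary first, then redundant copies on distinct members (round-robin)."""
        b = self._bucket(key)
        n = len(self.members)
        return [self.members[(b + i) % n] for i in range(self.redundant_copies + 1)]

    def put(self, key: str, value: str) -> None:
        # Write to the primary and every redundant copy.
        for m in self.owners(key):
            self.stores[m][key] = value

    def get(self, key: str, failed: Optional[Set[str]] = None) -> Optional[str]:
        """Read from the first owner that is still alive."""
        failed = failed or set()
        for m in self.owners(key):
            if m not in failed:
                return self.stores[m].get(key)
        return None


region = PartitionedRegion(["server1", "server2", "server3"], redundant_copies=1)
region.put("tweet-42", '{"text": "hello"}')
primary = region.owners("tweet-42")[0]
# Simulate losing the primary: the value is still served by the redundant copy.
print(region.get("tweet-42", failed={primary}))
```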
  8. Apache HAWQ • Built around a Greenplum MPP DB • 100% ANSI SQL compliant: SQL-92/99/2003… • ODBC and JDBC • Hadoop native: Parquet, HDFS and YARN • Extensible: Web Tables, PXF • TPC-DS: outperforms Impala by 454% overall

  9. Demo

    tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
    geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
    hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
    tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
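The demo streams on slide 9 fan out through taps: `tweets` writes raw JSON to HDFS, while `geodeTap`, `hawqTap` and `tweetsCount` each receive a copy of every tweet. The in-memory Python model below sketches that fan-out. The tweet fields, the choice of `id_str` as the Geode key and the TSV column layout are assumptions, since the `tweetJsonToTsv.groovy` script itself is not shown on the slide.

```python
import json
from typing import Callable, Dict, List


class Stream:
    """In-memory stand-in for a Spring XD stream whose messages can be tapped."""

    def __init__(self, sink: Callable[[str], None]):
        self.sink = sink
        self.taps: List[Callable[[str], None]] = []

    def tap(self, handler: Callable[[str], None]) -> None:
        # A tap receives a copy of every message; the main stream is unaffected.
        self.taps.append(handler)

    def publish(self, message: str) -> None:
        self.sink(message)
        for handler in self.taps:
            handler(message)


hdfs_files: List[str] = []          # tweets: ... | hdfs --directory=/user/zeppelin/xd/tweets
geode_region: Dict[str, dict] = {}  # geodeTap: > gemfire-json-server --regionName=regionTweet
hawq_rows: List[str] = []           # hawqTap: > transform ... | gpfdist --table=xdsink
counter = {"tweetsCount": 0}        # tweetsCount: > json-to-tuple | transform | counter


def to_geode(message: str) -> None:
    tweet = json.loads(message)
    geode_region[tweet["id_str"]] = tweet  # keying by id_str is an assumption


def tweet_json_to_tsv(message: str) -> str:
    """Hypothetical stand-in for tweetJsonToTsv.groovy: selected fields, tab-separated."""
    tweet = json.loads(message)
    # Tabs or newlines inside the text would break the delimited format.
    text = tweet.get("text", "").replace("\t", " ").replace("\n", " ")
    return "\t".join([tweet["id_str"], tweet.get("lang", ""), text])


def count(message: str) -> None:
    # The real tap extracts payload.id_str first; the counter increments once per message.
    counter["tweetsCount"] += 1


tweets = Stream(sink=hdfs_files.append)
tweets.tap(to_geode)
tweets.tap(lambda m: hawq_rows.append(tweet_json_to_tsv(m)))
tweets.tap(count)

for raw in ['{"id_str": "1", "lang": "en", "text": "hello\\tzeppelin"}',
            '{"id_str": "2", "lang": "nl", "text": "hallo"}']:
    tweets.publish(raw)

print(len(hdfs_files), sorted(geode_region), hawq_rows, counter)
```

Modelling taps as extra subscribers shows why adding `geodeTap` or `hawqTap` never changes what the primary `tweets` stream writes to HDFS.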
  10. SpringXD Interpreter(s) • %xd.stream and %xd.job • Multiple streams or jobs in a paragraph • Special Deploy/Launch semantics • Zeppelin Dynamic Forms (${…}) • Comprehensive Stream and Job DSL auto-completion (Ctrl+.)

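Zeppelin Dynamic Forms (`${…}`) let a paragraph declare its inputs inline, e.g. `--query=${keyword=spring}`; Zeppelin renders a form field and substitutes the value before the interpreter sees the text. The Python sketch below illustrates only that substitution step for the `${name}`, `${name=default}` and `${name=default,opt1|opt2}` forms. It is a simplified stand-in, not Zeppelin's parser: it ignores option display labels, and rejecting values outside the option list is a choice made for this sketch. The `dir` form is hypothetical.

```python
import re
from typing import Dict

# Matches ${name}, ${name=default} and ${name=default,opt1|opt2}
FORM = re.compile(r"\$\{([^=}]+)(?:=([^,}]*)(?:,([^}]*))?)?\}")


def render(paragraph: str, params: Dict[str, str]) -> str:
    """Replace each dynamic-form placeholder with the submitted value or its default."""
    def substitute(match: "re.Match[str]") -> str:
        name, default, options = match.group(1).strip(), match.group(2), match.group(3)
        value = params.get(name, default if default is not None else "")
        # This sketch rejects values outside a select form's option list.
        if options is not None and value not in options.split("|"):
            raise ValueError(f"{value!r} is not an option of form {name!r}")
        return value

    return FORM.sub(substitute, paragraph)


xd = ("tweets = twittersearch --query=${keyword=spring} | "
      "hdfs --directory=/user/zeppelin/xd/${dir=tweets,tweets|archive}")
print(render(xd, {"keyword": "zeppelin"}))
```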
  11. SpringXD Conf

  12. PSQL Interpreter • Prefix: %psql.sql • PostgreSQL, HAWQ/PXF, Greenplum … JDBC • PSQL command line shell (via %sh) • Zeppelin Dynamic Forms (${…}) • Comprehensive SQL/JDBC auto-completion (Ctrl+.)

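A Zeppelin interpreter typically returns tabular results in the `%table` display format: a header row followed by data rows, with tab-separated columns and newline-separated rows. The Python sketch below builds that text from a SQL result set. The standard-library `sqlite3` module stands in for a HAWQ/PostgreSQL connection so the example runs anywhere, and the `xdsink` schema is an assumption based on the `gpfdist --table=xdsink` sink from the demo.

```python
import sqlite3
from typing import Any, Iterable, Sequence


def to_zeppelin_table(columns: Sequence[str], rows: Iterable[Sequence[Any]]) -> str:
    """Format a result set as %table text: tab-separated columns, one row per line."""
    def clean(value: Any) -> str:
        # Tabs or newlines inside a value would break the grid, so replace them.
        text = "" if value is None else str(value)
        return text.replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(c) for c in columns)]
    lines += ["\t".join(clean(v) for v in row) for row in rows]
    return "%table " + "\n".join(lines)


# sqlite3 stands in for the HAWQ table loaded through gpfdist (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xdsink (id_str TEXT, lang TEXT, text TEXT)")
conn.executemany("INSERT INTO xdsink VALUES (?, ?, ?)",
                 [("1", "en", "hello zeppelin"), ("2", "nl", "hallo")])
cur = conn.execute("SELECT lang, COUNT(*) AS tweets FROM xdsink GROUP BY lang ORDER BY lang")
output = to_zeppelin_table([d[0] for d in cur.description], cur.fetchall())
print(output)
```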
  13. PSQL Configuration

  14. PSQL Doc https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html

  15. PSQL/HAWQ Demo • http://10.68.58.121:9995/#/notebook/2B2ZYS18Y

  16. Geode Interpreter • Prefix: %geode.oql • OQL and PDX nested access (user.name) • Geode command line shell (via %sh) • Zeppelin Dynamic Forms (${…}) • Basic OQL auto-completion (Ctrl+.)

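Because the demo stores tweets in Geode as JSON (PDX) documents, OQL can reach into nested fields with dot notation, e.g. `SELECT t.user.name FROM /regionTweet t WHERE t.lang = 'en'`. The Python sketch below mimics only the path-resolution part of such a query over nested dicts; it is not an OQL engine, and the tweet fields are assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List


def resolve(obj: Any, path: str) -> Any:
    """Follow a dotted path such as 'user.name' through nested dicts; None if missing."""
    for part in path.split("."):
        if not isinstance(obj, dict) or part not in obj:
            return None
        obj = obj[part]
    return obj


def select(region: Iterable[Dict[str, Any]], projection: str,
           where: Callable[[Dict[str, Any]], bool]) -> List[Any]:
    """Rough analogue of: SELECT t.<projection> FROM /region t WHERE <where>."""
    return [resolve(t, projection) for t in region if where(t)]


region_tweet = [
    {"id_str": "1", "lang": "en", "user": {"name": "alice", "followers_count": 10}},
    {"id_str": "2", "lang": "nl", "user": {"name": "bob", "followers_count": 3}},
]
# SELECT t.user.name FROM /regionTweet t WHERE t.lang = 'en'
print(select(region_tweet, "user.name", lambda t: resolve(t, "lang") == "en"))
```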
  17. Geode Configuration

  18. Geode Doc https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html

  19. Geode Tutorial • http://10.68.58.121:9995/#/notebook/2AW57BUN4

  20. Apache Ambari: Zeppelin, Geode, HAWQ, SpringXD Services …

  21. Ambari Services

  22. Ambari Services • Ambari Zeppelin Service: github, rpm, blog • Ambari Geode Service: github, rpm • Ambari SpringXD Service: github • Ambari HAWQ Service (Pivotal BDS dist)

  23. Ambari Blueprint http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
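The blueprint URL above exports an existing cluster's layout and configuration as JSON through the Ambari REST API. A minimal standard-library Python sketch follows. The host name and the `admin`/`admin` credentials (Ambari's out-of-the-box default) are assumptions to adapt; `mv10` is the cluster name from the slide. Ambari requires the `X-Requested-By` header on modifying requests, and sending it on a GET is harmless.

```python
import base64
import json
import urllib.request


def blueprint_request(ambari_host: str, cluster: str,
                      user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build a GET for http://<ambari>:8080/api/v1/clusters/<cluster>?format=blueprint."""
    url = f"http://{ambari_host}:8080/api/v1/clusters/{cluster}?format=blueprint"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={
        "Authorization": f"Basic {token}",  # HTTP basic auth (assumed auth setup)
        "X-Requested-By": "ambari",         # required by Ambari on POST/PUT/DELETE
    })


def export_blueprint(ambari_host: str, cluster: str) -> dict:
    """Fetch and parse the blueprint JSON (needs a reachable Ambari server)."""
    with urllib.request.urlopen(blueprint_request(ambari_host, cluster), timeout=30) as resp:
        return json.load(resp)


req = blueprint_request("ambari.example.com", "mv10")
print(req.full_url)
# blueprint = export_blueprint("ambari.example.com", "mv10")
# print(sorted(blueprint))
```

Only `blueprint_request` runs offline; calling `export_blueprint` needs a reachable Ambari server.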

  24. Webpage Embedder https://github.com/tzolov/ambari-webpage-embedder-view

  25. stay in touch ctzolov@pivotal.io blog.tzolov.net @christzolov https://nl.linkedin.com/in/tzolov