Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and HAWQ)

Unified Data Analytics Platform (with Zeppelin, Ambari, Geode, SpringXD and
HAWQ) by Christian Tzolov @christzolov

Whoami
Christian Tzolov
Technical Architect at Pivotal, BigData, Hadoop, SpringXD, Apache Committer, Crunch PMC member
blog.tzolov.net @christzolov

Contents
• DEMO
• Zeppelin Interpreters
• PSQL (to become JDBC in 0.6.x)
• Geode
• SpringXD
• Apache Ambari
• Zeppelin Service
• Geode, HAWQ and Spring XD services
• Webpage Embedder View

Demo: Twitter Streams with SpringXD, Geode and HAWQ

Technical Stack
• Apache Hadoop/HDFS: Data Lake - PHD or HDP
• Apache HAWQ: SQL on Hadoop (OLAP)
• Apache Geode: In-memory data grid (OLTP)
• Spring XD: Integration and Streaming Runtime
• Apache Ambari: Manages All Clusters
• Apache Zeppelin: Web UI for interaction with Data Systems
Diagram: Hadoop/HDFS, Geode, HAWQ, SpringXD, Ambari, Zeppelin

Spring XD
Orchestrates and automates all steps across multiple data stream pipelines.
Sources: • HTTP • Tail • File • Mail • Twitter • Gemfire • Syslog • TCP • UDP • JMS • RabbitMQ • MQTT • Kafka • Reactor TCP/UDP
Processors: • Filter • Transformer • Object-to-JSON • JSON-to-Tuple • Splitter • Aggregator • HTTP Client • Groovy Scripts • Java Code • JPMML Evaluator • Spark Streaming
Sinks: • File • HDFS • JDBC • TCP • Log • Mail • RabbitMQ • Gemfire • Splunk • MQTT • Kafka • Dynamic Router • Counters
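The source | processor | sink shape of a stream can be sketched in plain Python. This is only an illustration of the pipeline idea, not Spring XD code; the names `source`, `keep_if`, `transform` and `run_pipeline` are made up for this example.

```python
from typing import Callable, Iterable, Iterator, List

Stage = Callable[[Iterator[str]], Iterator[str]]


def source(messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical source: emits each message in order."""
    for message in messages:
        yield message


def keep_if(predicate: Callable[[str], bool]) -> Stage:
    """Hypothetical filter processor: passes only matching messages."""
    def stage(stream: Iterator[str]) -> Iterator[str]:
        return (m for m in stream if predicate(m))
    return stage


def transform(fn: Callable[[str], str]) -> Stage:
    """Hypothetical transformer processor: applies fn to every message."""
    def stage(stream: Iterator[str]) -> Iterator[str]:
        return (fn(m) for m in stream)
    return stage


def run_pipeline(src: Iterator[str], processors: List[Stage],
                 sink: Callable[[str], None]) -> None:
    """Chain the processors behind the source and push results to the sink."""
    stream = src
    for processor in processors:
        stream = processor(stream)
    for message in stream:
        sink(message)


collected: List[str] = []
run_pipeline(
    source(["spring xd", "noise", "spring data"]),
    [keep_if(lambda m: m.startswith("spring")), transform(str.upper)],
    collected.append,  # the sink here just collects messages in a list
)
```

In Spring XD the same chaining is written with the `|` operator of the stream DSL, and each stage is a deployed module rather than a local function.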

Apache Geode
• Cache - Performance / Consistency / Resiliency
• Region - Highly available, redundant, distributed Map
China Railway Corporation
• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second
Indian Railways
• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute
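The idea of a region as a redundant, distributed map can be sketched with a toy model. This is not the Geode API: `ToyRegion`, its hash-based placement and its `fail` method are assumptions made for illustration, and real Geode handles membership, rebalancing and consistency very differently.

```python
import zlib
from typing import Any, Dict, List, Optional


class ToyRegion:
    """Toy partitioned key-value map with redundant copies (not the Geode API)."""

    def __init__(self, members: int, redundant_copies: int = 1) -> None:
        if members < 1 or redundant_copies >= members:
            raise ValueError("need more members than redundant copies")
        self.stores: List[Dict[str, Any]] = [dict() for _ in range(members)]
        self.alive: List[bool] = [True] * members
        self.copies = redundant_copies + 1  # primary plus redundant copies

    def _owners(self, key: str) -> List[int]:
        # Deterministic placement: hash picks the primary, neighbours hold copies.
        primary = zlib.crc32(key.encode("utf-8")) % len(self.stores)
        return [(primary + i) % len(self.stores) for i in range(self.copies)]

    def put(self, key: str, value: Any) -> None:
        for index in self._owners(key):
            if self.alive[index]:
                self.stores[index][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Read from the first live owner that holds the key.
        for index in self._owners(key):
            if self.alive[index] and key in self.stores[index]:
                return self.stores[index][key]
        return None

    def fail(self, member: int) -> None:
        """Simulate the loss of one member."""
        self.alive[member] = False
```

With one redundant copy, a value stays readable after the member holding its primary copy is lost, which is the availability property the slide refers to.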

Apache HAWQ
• Built around a Greenplum MPP DB
• 100% ANSI SQL compliant: SQL-92/99/2003…
• ODBC and JDBC
• Hadoop Native: Parquet, HDFS and YARN
• Extensible - Web Tables, PXF
• TPC-DS: outperforms Impala by 454% overall

Demo
tweets = twittersearch --query=<keyword> | hdfs --directory=/user/zeppelin/xd/tweets
geodeTap = tap:stream:tweets > gemfire-json-server --regionName=regionTweet
hawqTap = tap:stream:tweets > transform --script=tweetJsonToTsv.groovy | gpfdist --table=xdsink
tweetsCount = tap:stream:tweets > json-to-tuple | transform --expression='payload.id_str' | counter
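The hawqTap stream converts each tweet from JSON to a tab-separated row before loading it into HAWQ. The deck does not show `tweetJsonToTsv.groovy`, so the following Python sketch only illustrates what such a transform might do; the function name, the chosen fields (`id_str`, `created_at`, `text`, `user.screen_name`) and their order are assumptions.

```python
import json

# Assumed column order; the real tweetJsonToTsv.groovy script is not shown in the deck.
COLUMNS = ("id_str", "created_at", "text")


def _clean(value) -> str:
    """Replace characters that would break a tab-separated row."""
    return str(value).replace("\t", " ").replace("\n", " ").replace("\r", " ")


def tweet_json_to_tsv(raw: str) -> str:
    """Turn one tweet JSON document into a single TSV line (hypothetical layout)."""
    tweet = json.loads(raw)
    fields = [_clean(tweet.get(name, "")) for name in COLUMNS]
    user = tweet.get("user") or {}  # tolerate a missing or null user object
    fields.append(_clean(user.get("screen_name", "")))
    return "\t".join(fields)
```

Stripping tabs and newlines from the tweet text matters here because a bulk loader would otherwise misread them as column or row delimiters.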

SpringXD Interpreter(s)
• %xd.stream and %xd.job
• Multiple streams or jobs in a paragraph
• Special Deploy/Launch Semantics
• Zeppelin Dynamic Forms (${…})
• Comprehensive Stream and Job DSL autocompletion (Ctrl+.)
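Zeppelin dynamic forms let a paragraph contain `${name=default}` placeholders that are filled from input fields before the text reaches the interpreter. The sketch below imitates only that text-input substitution in plain Python; `fill_forms` is a made-up helper, not Zeppelin code, and it ignores select and checkbox forms.

```python
import re
from typing import Dict, Optional

# Matches ${name} and ${name=default}; select-style forms are not handled.
_FORM = re.compile(r"\$\{([^=}]+)(?:=([^}]*))?\}")


def fill_forms(paragraph: str, values: Optional[Dict[str, str]] = None) -> str:
    """Replace each placeholder with the supplied value or its default."""
    supplied = values or {}

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        default = match.group(2) or ""
        return supplied.get(name, default)

    return _FORM.sub(replace, paragraph)


template = "tweets = twittersearch --query=${keyword=zeppelin} | log"
```

Calling `fill_forms(template, {"keyword": "geode"})` yields the stream definition with `--query=geode`, while calling it without values falls back to the default `zeppelin`.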

SpringXD Conf

PSQL Interpreter
• Prefix: %psql.sql
• PostgreSQL, HAWQ/PXF, Greenplum … JDBC
• PSQL command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Comprehensive SQL/JDBC autocompletion (Ctrl+.)

PSQL Configuration

PSQL Doc https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/postgresql.html

PSQL/HAWQ Demo • http://10.68.58.121:9995/#/notebook/2B2ZYS18Y

Geode Interpreter
• Prefix: %geode.oql
• OQL and PDX nested access (user.name)
• Geode command line shell (via %sh)
• Zeppelin Dynamic Forms (${…})
• Basic OQL auto-completion (Ctrl+.)
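Nested access such as `user.name` means a query can reach into fields of a stored object by a dotted path. As a rough illustration in plain Python (not OQL and not the PDX API), the hypothetical helper `resolve_path` below walks a dotted path through nested dictionaries.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_path(record: Any, path: str) -> Optional[Any]:
    """Follow a dotted path like 'user.name' through nested mappings.

    Returns None when any step along the path is missing.
    """
    current = record
    for part in path.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return None
        current = current[part]
    return current


# Sample record shaped like a stored tweet; the field values are made up.
tweet = {"id_str": "1", "user": {"name": "Christian", "lang": "en"}}
```

Here `resolve_path(tweet, "user.name")` returns the nested name, and a path with a missing step returns `None` instead of raising.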

Geode Configuration

Geode Doc https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/geode.html

Geode Tutorial • http://10.68.58.121:9995/#/notebook/2AW57BUN4

Apache Ambari Zeppelin, Geode, HAWQ, SpringXD Services …

Ambari Services

Ambari Services
• Ambari Zeppelin Service: github, rpm, blog
• Ambari Geode Service: github, rpm
• Ambari SpringXD Service: github
• Ambari HAWQ Service (Pivotal BDS dist)

Ambari Blueprint http://<ambari>:8080/api/v1/clusters/mv10?format=blueprint
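The blueprint URL above can be called from a script to export a cluster definition. The sketch below only builds the request with the Python standard library; the host name, credentials and the helper name `blueprint_request` are placeholders, and it assumes the Ambari server accepts HTTP basic authentication on port 8080.

```python
import base64
import urllib.request
from urllib.parse import quote


def blueprint_request(ambari_host: str, cluster: str, user: str,
                      password: str, port: int = 8080) -> urllib.request.Request:
    """Build a GET request for a cluster's blueprint export (nothing is sent)."""
    url = (
        f"http://{ambari_host}:{port}/api/v1/clusters/"
        f"{quote(cluster, safe='')}?format=blueprint"
    )
    # HTTP basic auth header; credentials here are placeholders.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Example with made-up host and credentials; mv10 is the cluster name from the slide.
example = blueprint_request("ambari.example.com", "mv10", "admin", "admin")
```

Passing the request to `urllib.request.urlopen` and decoding the response body as JSON would give the blueprint document, which can then be saved and reused to create a similar cluster.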

Webpage Embedder https://github.com/tzolov/ambari-webpage-embedder-view

Stay in touch: blog.tzolov.net, @christzolov, https://nl.linkedin.com/in/tzolov
