Apache Kylin

What Is Apache Kylin?
• An analytics data warehouse
• For big data / Apache 2.0 license
• Open source / written in Java
• Kylin is an OLAP engine with a SQL interface
• For huge tables (e.g., >100 million rows)
• Answers queries within seconds on TB to PB scale data

How does Kylin work?
• Kylin runs on a Hadoop cluster
• It needs these services
  – HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper
• State information is stored in HBase
• Historic data / star schema stored in Hive
• Access Kylin at http://<hostname>:7070/kylin
• Uses Lambda architecture for real time streaming
  – layers: batch, speed and serving
  – batch / near real-time processing
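The web address above also serves Kylin's REST API, so a query can be sent from a script. The following is a minimal sketch that only builds the request. It assumes the `/kylin/api/query` endpoint with HTTP basic authentication; the host name `kylin-host.example`, the project `learn_kylin`, the table `kylin_sales` and the `ADMIN` / `KYLIN` credentials are placeholder assumptions that do not come from this deck and should be replaced with real values.

```python
import base64
import json
import urllib.request


def build_kylin_query(host, sql, project, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    # Port 7070 and the /kylin prefix match the access URL on the slide.
    url = f"http://{host}:7070/kylin/api/query"
    payload = {
        "sql": sql,
        "project": project,   # assumed project name
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host, project, table and credentials (assumptions).
request = build_kylin_query(
    "kylin-host.example",
    "SELECT COUNT(*) FROM kylin_sales",
    "learn_kylin",
    "ADMIN",
    "KYLIN",
)
# Against a live server: urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the sketch runnable without a cluster and makes the URL and payload easy to inspect.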

Kylin Software Requirements
• Requirements as of release v3.0.1
  – Hadoop: 2.7+, 3.1+ (since v2.5)
  – Hive: 0.13 - 1.2.1+
  – HBase: 1.1+, 2.0 (since v2.5)
  – Spark (optional): 2.3.0+
  – Kafka (optional): 1.0.0+ (since v2.5)
  – JDK: 1.8+ (since v2.5)
  – OS: Linux only, CentOS 6.5+ or Ubuntu 16.04+
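The minimum versions listed above can be checked with a small script before installing. This is a sketch only: the helper names are hypothetical, the installed versions are supplied by hand rather than detected, and Hive is left out because the slide gives a range rather than a single minimum.

```python
import re

# Minimum versions copied from the requirements list (Kylin v3.0.1).
MINIMUMS = {
    "hadoop": (2, 7),
    "hbase": (1, 1),
    "spark": (2, 3, 0),   # optional component
    "kafka": (1, 0, 0),   # optional component
    "jdk": (1, 8),
}


def parse_version(text):
    """Turn a string such as '1.8.0_242' into a tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets(found, minimum):
    """Compare two version tuples after padding them to equal length."""
    width = max(len(found), len(minimum))
    pad = lambda version: version + (0,) * (width - len(version))
    return pad(found) >= pad(minimum)


def unmet_requirements(installed):
    """Return the names of components older than the listed minimum."""
    problems = []
    for name, minimum in MINIMUMS.items():
        if name not in installed:
            continue  # component not installed or not checked
        if not meets(parse_version(installed[name]), minimum):
            problems.append(name)
    return problems


# Hypothetical installed versions, for illustration only.
installed = {"hadoop": "2.6.5", "hbase": "1.1.2", "jdk": "1.8.0_242", "spark": "2.3"}
too_old = unmet_requirements(installed)
```

With these illustrative inputs only Hadoop falls below its minimum, so `too_old` contains the single entry `"hadoop"`.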

Kylin In Cluster Mode

Kylin Real Time Streaming Architecture

Kylin Real Time Streaming Architecture
• Streaming Receiver – ingests data from stream data sources
• Streaming Coordinator – coordinates workloads
• Metadata Store – stores streaming related metadata
• Query Engine – queries real-time data from the streaming receiver
• Build Engine – builds cubes from the real-time data
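The Lambda idea behind these components (a batch layer, a speed layer and a serving layer that combines them) can be shown with a toy example. This is an illustration of the general pattern only, not Kylin's implementation; the function name, the event layout and the cutoff timestamp are all assumptions made for the sketch.

```python
from collections import Counter


def serve_query(batch_view, realtime_events, batch_cutoff):
    """Combine precomputed batch totals with events newer than the cutoff.

    batch_view      -- totals per key already built by the batch layer
    realtime_events -- (timestamp, key, amount) tuples from the speed layer
    batch_cutoff    -- last timestamp already covered by the batch view
    """
    totals = Counter(batch_view)
    for timestamp, key, amount in realtime_events:
        # Events at or before the cutoff are already in the batch view.
        if timestamp > batch_cutoff:
            totals[key] += amount
    return dict(totals)


# Illustrative data: sales totals per country.
batch_view = {"US": 120, "DE": 45}
realtime_events = [(99, "US", 5), (101, "US", 3), (102, "FR", 7)]
result = serve_query(batch_view, realtime_events, batch_cutoff=100)
```

The event with timestamp 99 is skipped because the batch view already covers it, so the result counts each sale exactly once.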

Kylin Vs Druid
• Druid is more suitable for real time analysis. Kylin is more focused on the OLAP case.
• Druid has good integration with Kafka for real time streaming analysis. The real time capability of Kylin (v3) is for real time OLAP.
• Druid uses bitmap indexes for internal data structures. Kylin uses bitmap indexes for real time data and MOLAP cubes for historical data.
• Kylin provides ANSI SQL. Druid provides its own query language.
• Druid has limitations on table joins. Kylin supports star schemas.
• Kylin has good integration with BI tools, such as Tableau or Excel. Druid has limited integration with existing BI tools.
• Since Kylin supports MOLAP cubes, it has very good performance for complex queries on data sets with billions of rows.
• Since Druid needs to scan the full index, performance may suffer if the data set and query range are too large.
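The performance point about MOLAP cubes comes from precomputing aggregates for combinations of dimensions, so that a group-by query becomes a lookup instead of a scan. The toy sketch below shows that idea in memory; it is not how Kylin stores or builds cubes, and the function name and sample rows are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute the summed measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


# Illustrative fact rows.
rows = [
    {"country": "US", "year": 2019, "sales": 10},
    {"country": "US", "year": 2020, "sales": 20},
    {"country": "DE", "year": 2020, "sales": 5},
]
cube = build_cube(rows, ("country", "year"), "sales")

# "SELECT country, SUM(sales) ... GROUP BY country" is now a dictionary lookup.
by_country = cube[("country",)]
grand_total = cube[()][()]
```

The trade-off is build time and storage: with n dimensions there are 2^n such combinations, which is why a cube has to be built ahead of query time.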

Some Kylin Users

Kylin Ecosystem

Kylin Ecosystem
• Kylin Core: the fundamental framework of the Kylin OLAP engine comprises the Metadata Engine, Query Engine, Job Engine and Storage Engine to run the entire stack. It also includes a REST server to service client requests.
• Extensions: plugins to support additional functions and features.
• Integration: lifecycle management support to integrate with job scheduler, ETL, monitoring and alerting systems.
• User Interface: allows third party users to build a customized user interface atop the Kylin core.
• Drivers: ODBC and JDBC drivers to support different tools and products, such as Tableau.

Available Books
• See “Big Data Made Easy” – Apress Jan 2015
• See “Mastering Apache Spark” – Packt Oct 2015
• See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
• Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
• Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020

Connect
• Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020
• See my open source blog at – open-source-systems.blogspot.com/
• I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration

Mike Frampton
