• For big data / Apache 2.0 license
• Open source / written in Java
• Kylin is an OLAP engine with a SQL interface
• For huge tables (e.g., >100 million rows)
• Provides query responses within seconds at TB to PB scale
Hadoop cluster
• It needs these services
  – HDFS, YARN, MapReduce, Hive, HBase, ZooKeeper
• State information is stored in HBase
• Historical data / star schema is stored in Hive
• Access Kylin at http://<hostname>:7070/kylin
• Uses a Lambda architecture for real-time streaming
  – Layers: batch, speed and serving
  – Batch / near-real-time processing
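As a minimal sketch of how a client might use the SQL interface over HTTP, the Python below builds a POST request for Kylin's REST query endpoint (`/kylin/api/query`) on the port shown above. The host name, project name, table, and credentials are placeholders assumed for illustration; they do not come from the slides, and the payload fields should be checked against the REST API documentation for the Kylin version in use.

```python
import base64
import json
import urllib.request


def build_query_request(host, sql, project, user, password, limit=50):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    # Port 7070 and the /kylin prefix match the access URL on the slide.
    url = f"http://{host}:7070/kylin/api/query"
    payload = {"sql": sql, "project": project, "offset": 0, "limit": limit}
    # The REST server is assumed here to accept HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `run_query(build_query_request("my-kylin-host", "SELECT COUNT(*) FROM my_fact_table", "my_project", "my_user", "my_password"))` would submit the query; every argument in that call is hypothetical.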
data from stream data sources
• Streaming Coordinator – coordinates workloads
• Metadata Store – stores streaming-related metadata
• Query Engine – queries real-time data from the streaming receiver
• Build Engine – builds cubes from the real-time data
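To make the split between batch and real-time data concrete, here is a toy Python sketch of a Lambda-style design: a speed layer aggregates recent events, and a serving layer merges those totals with a pre-built batch view at query time. This is an illustration only, not Kylin's implementation; the function names and the sample data are assumptions.

```python
from collections import Counter


def speed_layer(events):
    """Speed layer: aggregate events that arrived after the last batch build."""
    totals = Counter()
    for key, amount in events:
        totals[key] += amount
    return dict(totals)


def serve(batch_view, speed_view):
    """Serving layer: combine pre-built batch totals with recent real-time totals."""
    merged = Counter(batch_view)
    merged.update(speed_view)  # adds the counts key by key
    return dict(merged)


# Hypothetical data: sales per country from the last batch build,
# plus events received since then.
batch_view = {"US": 100, "UK": 40}
recent_view = speed_layer([("US", 5), ("DE", 7), ("US", 1)])
combined = serve(batch_view, recent_view)
```

The query side sees one merged result, so it does not need to know which rows came from the batch build and which came from the stream.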
time analysis. Kylin is more focused on the OLAP case.
• Druid has good integration with Kafka for real-time streaming analysis. The real-time capability of Kylin (v3) is for real-time OLAP.
• Druid uses bitmap indexes for its internal data structures. Kylin uses bitmap indexes for real-time data and MOLAP cubes for historical data.
• Kylin provides ANSI SQL; Druid provides its own query language.
• Druid has limitations on table joins; Kylin supports star schemas.
• Kylin has good integration with BI tools, such as Tableau or Excel. Druid has limited integration with existing BI tools.
• Because Kylin supports MOLAP cubes, it performs very well for complex queries on data sets with billions of rows.
• Because Druid needs to scan the full index, performance may suffer if the data set and query range are too large.
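The MOLAP idea behind that performance claim can be sketched in a few lines of Python: aggregates are precomputed for every combination of dimensions (each combination is a cuboid), so a GROUP BY query becomes a lookup instead of a scan. This is a toy model under simplifying assumptions (one SUM measure, in-memory dictionaries); it does not reflect how Kylin actually builds or stores cubes, and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of dimensions (every cuboid)."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += row[measure]
            cube[dims] = dict(cuboid)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY with a lookup; group_by must follow the dimension order."""
    return cube[tuple(group_by)]


# Hypothetical fact rows with two dimensions and one measure.
rows = [
    {"country": "US", "year": 2019, "sales": 10},
    {"country": "US", "year": 2020, "sales": 5},
    {"country": "UK", "year": 2020, "sales": 7},
]
cube = build_cube(rows, ["country", "year"], "sales")
sales_by_country = query(cube, ["country"])
```

The trade-off is visible even in this sketch: n dimensions give 2^n cuboids, so build time and storage grow quickly while query time stays small.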
Engine comprises the Metadata Engine, Query Engine, Job Engine and Storage Engine to run the entire stack. It also includes a REST server to service client requests
• Extensions – plugins to support additional functions and features
• Integration – lifecycle management support to integrate with job schedulers, ETL, monitoring and alerting systems
• User Interface – allows third-party users to build a customized user interface atop the Kylin core
• Drivers – ODBC and JDBC drivers to support different tools and products, such as Tableau
Jan 2015
• See “Mastering Apache Spark” – Packt Oct 2015
• See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
• Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
• Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020
• See my open source blog at – open-source-systems.blogspot.com/
• I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration