Apache Phoenix - Speaker Deck

Slide 1

What Is Apache Phoenix?
● Massively parallel relational database engine
● Supports OLTP for Hadoop
● Uses Apache HBase as its backing store
● Open source / Apache 2.0 license
● Written in Java, SQL
● ACID (atomicity, consistency, isolation, durability)
  – Via Apache Tephra integration

Slide 2

Phoenix SQL Support
● Accepts SQL queries
● Compiles them to HBase scans
● Orchestrates the running of those scans
● Produces regular JDBC result sets
● Gains performance by using
  – The HBase API, coprocessors and custom filters
● Query response times of
  – Milliseconds for small queries
  – Seconds for tens of millions of rows
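
A minimal sketch of the JDBC side of this, assuming the Phoenix client jar is on the classpath, a ZooKeeper quorum reachable at the host you pass in, and a hypothetical EXAMPLE table with columns MY_PK, FIRST_NAME and LAST_NAME (the class, method and column names are illustrative, not taken from the slides). The application only writes SQL and reads an ordinary java.sql.ResultSet; the translation into HBase scans happens inside the driver.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch: query a (hypothetical) EXAMPLE table through the Phoenix JDBC driver.
// Assumes the Phoenix client jar is on the classpath when main() connects.
final class PhoenixQuerySketch {

    // Phoenix JDBC URLs name the ZooKeeper quorum of the HBase cluster.
    static String jdbcUrl(String zkQuorum) {
        return "jdbc:phoenix:" + zkQuorum;
    }

    // Hypothetical query; Phoenix turns it into HBase scans behind the JDBC API.
    static final String QUERY =
        "SELECT MY_PK, FIRST_NAME, LAST_NAME FROM EXAMPLE WHERE MY_PK > ? LIMIT 10";

    // Runs the query and returns one comma-joined line per ResultSet row.
    static List<String> fetchRows(Connection conn, long minPk) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setLong(1, minPk);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getLong("MY_PK") + "," + rs.getString("FIRST_NAME")
                        + "," + rs.getString("LAST_NAME"));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            // No quorum given: only show the URL and query that would be used.
            System.out.println("URL:   " + jdbcUrl("localhost"));
            System.out.println("Query: " + QUERY);
            return;
        }
        try (Connection conn = DriverManager.getConnection(jdbcUrl(args[0]))) {
            for (String row : fetchRows(conn, 0L)) {
                System.out.println(row);
            }
        }
    }
}
```

Run without arguments, it only prints the URL and query; pass a quorum (for example `localhost`) with the Phoenix client jar on the classpath to query a live cluster.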

Slide 3

Phoenix SQL Support
● See phoenix.apache.org for the full supported syntax

Slide 4

Phoenix Environment

Slide 5

Phoenix Bulk Loading
● Bulk load data via:
● Single-threaded psql for CSV, e.g.
  – bin/psql.py -t EXAMPLE localhost data.csv
  – Loads data.csv into the EXAMPLE table
  – Against HBase on the local machine
● MapReduce for CSV and JSON
  – See next slide
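
The psql.py command above reads a plain CSV file. The sketch below writes one for a hypothetical EXAMPLE table (MY_PK, FIRST_NAME, LAST_NAME) and prints the slide's command. It assumes the CSV columns follow the table's column order and that the loader accepts standard double-quote quoting for fields that contain commas or quotes; the row values and class name are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: write a CSV file for the (hypothetical) EXAMPLE table and print the
// single-threaded psql.py command from the slide that would load it.
final class PsqlCsvSketch {

    // Quote a field only when needed (comma, quote or newline inside it),
    // doubling any embedded quotes. Assumes double-quote CSV quoting.
    static String csvField(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // One CSV line per row; fields are assumed to follow the table's column order.
    static String csvLine(long pk, String firstName, String lastName) {
        return pk + "," + csvField(firstName) + "," + csvField(lastName);
    }

    // The slide's command: table name, ZooKeeper host, CSV file.
    static List<String> psqlCommand(String table, String zkHost, String csvFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bin/psql.py");
        cmd.add("-t");
        cmd.add(table);
        cmd.add(zkHost);
        cmd.add(csvFile);
        return cmd;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = List.of(
            csvLine(12345L, "John", "Doe"),
            csvLine(67890L, "Mary", "Smith, Jr."));   // needs quoting
        Path csv = Paths.get("data.csv");
        Files.write(csv, lines, StandardCharsets.UTF_8);
        System.out.println("Wrote " + lines.size() + " rows to " + csv);
        System.out.println(String.join(" ", psqlCommand("EXAMPLE", "localhost", "data.csv")));
    }
}
```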

Slide 6

Phoenix Bulk Loading
● Bulk load example using MapReduce
  – For CSV and JSON loads
  – Using the Phoenix MapReduce library
  – Against the EXAMPLE table
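
The slide's MapReduce example itself did not survive extraction. As a sketch, assuming the invocation described in Phoenix's bulk-loading documentation, the loaders are launched with `hadoop jar` against the Phoenix client jar, using CsvBulkLoadTool for CSV and JsonBulkLoadTool for JSON. The Java below only assembles that command line; `<version>` stays a placeholder for your Phoenix release, and the HDFS input paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: assemble the hadoop command line for Phoenix's MapReduce bulk loaders.
// "<version>" is a placeholder for the Phoenix client jar actually installed.
final class MapReduceBulkLoadSketch {

    enum Format { CSV, JSON }

    // Tool class names as we understand them from the Phoenix bulk-loading docs.
    static String toolClass(Format format) {
        return format == Format.CSV
            ? "org.apache.phoenix.mapreduce.CsvBulkLoadTool"
            : "org.apache.phoenix.mapreduce.JsonBulkLoadTool";
    }

    // Builds: hadoop jar <client jar> <tool class> --table T --input PATH
    static List<String> command(Format format, String table, String hdfsInput) {
        List<String> cmd = new ArrayList<>(List.of(
            "hadoop", "jar", "phoenix-<version>-client.jar", toolClass(format)));
        cmd.add("--table");
        cmd.add(table);
        cmd.add("--input");
        cmd.add(hdfsInput);
        return cmd;
    }

    public static void main(String[] args) {
        // Hypothetical HDFS input files for the EXAMPLE table.
        System.out.println(String.join(" ", command(Format.CSV, "EXAMPLE", "/data/example.csv")));
        System.out.println(String.join(" ", command(Format.JSON, "EXAMPLE", "/data/example.json")));
    }
}
```

Unlike psql.py, these tools run as a MapReduce job across the cluster, which is why the input is read from HDFS rather than the local file system.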

Slide 7

Phoenix Performance

Slide 8

Phoenix User-Defined Functions (UDFs)
● Create temporary or permanent UDFs
  – Temporary UDFs last for the session only
● Use UDFs in SQL queries and indexes
● Permanent UDFs are stored in SYSTEM.FUNCTION
● Tenant-specific UDF usage is supported
● UDF jar files must be placed on HDFS
● Updating a UDF jar is not currently possible
  – (without bouncing the cluster)
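
A sketch of registering and calling a permanent UDF over JDBC. It assumes the `CREATE [TEMPORARY] FUNCTION name(argType) RETURNS type AS '<class>' USING JAR '<hdfs path>'` form from the Phoenix UDF documentation and the client property `phoenix.functions.allowUserDefinedFunctions=true`; the function name MY_REVERSE, the class com.example.udf.MyReverseFunction, the jar path and the EXAMPLE columns are all hypothetical. The UDF class itself (which, as we understand it, extends Phoenix's ScalarFunction) is not shown because it needs the Phoenix jars to compile.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Sketch: register and use a Phoenix UDF over JDBC. The function name, class
// name and HDFS jar path are hypothetical placeholders.
final class PhoenixUdfSketch {

    // Builds CREATE [TEMPORARY] FUNCTION name(arg) RETURNS type AS '<class>' USING JAR '<path>'.
    // temporary=true gives a session-only UDF; false gives a permanent one.
    static String createFunction(boolean temporary, String name, String argType,
                                 String returnType, String className, String jarPath) {
        return "CREATE " + (temporary ? "TEMPORARY " : "") + "FUNCTION " + name
            + "(" + argType + ") RETURNS " + returnType
            + " AS '" + className + "' USING JAR '" + jarPath + "'";
    }

    // Client-side switch that allows UDFs on this connection (assumed property name).
    static Properties udfProperties() {
        Properties props = new Properties();
        props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");
        return props;
    }

    public static void main(String[] args) throws SQLException {
        String ddl = createFunction(false, "MY_REVERSE", "VARCHAR", "VARCHAR",
            "com.example.udf.MyReverseFunction",            // hypothetical UDF class
            "hdfs://namenode:8020/hbase/lib/my-udfs.jar");  // hypothetical jar on HDFS
        System.out.println(ddl);
        if (args.length == 0) {
            return; // pass a ZooKeeper quorum to run against a live cluster
        }
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + args[0], udfProperties());
             Statement stmt = conn.createStatement()) {
            stmt.execute(ddl); // a permanent UDF is recorded in SYSTEM.FUNCTION
            // Once registered, the UDF is called like a built-in function.
            try (ResultSet rs = stmt.executeQuery("SELECT MY_REVERSE(LAST_NAME) FROM EXAMPLE LIMIT 5")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```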

Slide 9

Phoenix Transactions
● Cross-row/cross-table ACID support using Apache Tephra
● Transactional functionality is currently in beta
● Enable transactions and set the snapshot directory in hbase-site.xml
● Also set a transaction timeout value
● Start Tephra
● Create tables with the flag TRANSACTIONAL=true
● Transactions then behave as follows
  – Start with a statement against a transactional table
  – End with a commit or rollback
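
A sketch of the commit/rollback flow described above, assuming hbase-site.xml already enables transactions and the Tephra transaction manager is running. The property names in the comments (phoenix.transactions.enabled, data.tx.snapshot.dir, data.tx.timeout) are as we understand the Phoenix transactions documentation and should be checked against your version; the TX_EXAMPLE table and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of the transaction flow on the slide. Assumes hbase-site.xml already
// sets phoenix.transactions.enabled=true, a Tephra snapshot dir
// (data.tx.snapshot.dir) and a timeout (data.tx.timeout), and that Tephra is
// running. The TX_EXAMPLE table and its columns are hypothetical.
final class PhoenixTxSketch {

    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS TX_EXAMPLE (ID BIGINT NOT NULL PRIMARY KEY, V VARCHAR) "
        + "TRANSACTIONAL=true";

    static final String UPSERT = "UPSERT INTO TX_EXAMPLE (ID, V) VALUES (?, ?)";

    // Writes two rows in one transaction: committed together or rolled back together.
    static void writePair(Connection conn, long id1, long id2, String value) throws SQLException {
        conn.setAutoCommit(false); // the transaction starts with the first statement
        try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            ps.setLong(1, id1);
            ps.setString(2, value);
            ps.executeUpdate();
            ps.setLong(1, id2);
            ps.setString(2, value);
            ps.executeUpdate();
            conn.commit();   // end the transaction with a commit...
        } catch (SQLException e) {
            conn.rollback(); // ...or with a rollback if a statement fails
            throw e;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) {
            System.out.println(CREATE_TABLE);
            System.out.println(UPSERT);
            return; // pass a ZooKeeper quorum to run against a live cluster
        }
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + args[0])) {
            try (Statement stmt = conn.createStatement()) {
                stmt.execute(CREATE_TABLE);
            }
            writePair(conn, 1L, 2L, "hello");
        }
    }
}
```

Because both upserts sit in one transaction, a failure before the commit leaves neither row visible to other readers.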

Slide 10

Available Books
● See “Big Data Made Easy”
  – Apress, Jan 2015
● See “Mastering Apache Spark”
  – Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress, Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020

Slide 11

Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology-based issues
  – Big data integration