Apache Phoenix

This presentation gives an overview of the Apache Phoenix project. It explains Phoenix in terms of its architecture, environment, ETL, SQL, UDFs and transactions.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

May 24, 2020

Transcript

  1. What Is Apache Phoenix?
     • Massively parallel, relational database engine
     • Supports OLTP for Hadoop
     • Uses Apache HBase as its backing store
     • Open source / Apache 2.0 license
     • Written in Java, SQL
     • ACID (atomicity, consistency, isolation, durability)
       – Via Apache Tephra integration
  2. Phoenix SQL Support
     • Accepts SQL queries
     • Compiles them to HBase scans
     • Orchestrates running of scans
     • Produces regular JDBC result sets
     • Creates performance gains by using
       – HBase API/coprocessors/custom filters
     • Results in query response times
       – Milliseconds for small queries
       – Seconds for tens of millions of rows
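
Since Phoenix returns regular JDBC result sets, a query can be issued with plain `java.sql` calls. The following is a minimal sketch, assuming the Phoenix client jar is on the classpath and ZooKeeper runs on `localhost`; the column names `MY_PK` and `FIRST_NAME` are illustrative assumptions and do not come from the slides.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class PhoenixQuerySketch {

    // Phoenix JDBC URLs take the form jdbc:phoenix:<ZooKeeper quorum>.
    static String jdbcUrl(String zookeeperQuorum) {
        return "jdbc:phoenix:" + zookeeperQuorum;
    }

    // Hypothetical query against the EXAMPLE table; column names are assumed.
    static final String QUERY =
        "SELECT MY_PK, FIRST_NAME FROM EXAMPLE WHERE MY_PK > ?";

    // Runs the query, prints each row of the JDBC result set, returns the row count.
    static int printRows(Connection conn, long minKey) throws SQLException {
        int rows = 0;
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setLong(1, minKey);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getString(2));
                    rows++;
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String url = jdbcUrl(args.length > 0 ? args[0] : "localhost");
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println(printRows(conn, 0L) + " row(s)");
        } catch (SQLException e) {
            // Reached when no Phoenix driver or cluster is available.
            System.err.println("Could not query Phoenix: " + e.getMessage());
        }
    }
}
```
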
  3. Phoenix Bulk Loading
     • Bulk load data via:
     • Single-threaded for CSV via psql, i.e.
       – bin/psql.py -t EXAMPLE localhost data.csv
       – Load for EXAMPLE table
       – For HBase on local machine
     • MapReduce-based for CSV and JSON
       – See next slide
  4. Phoenix Bulk Loading
     • Bulk load example for MapReduce
       – For CSV and JSON loads
       – Using Phoenix MapReduce library
       – Against the EXAMPLE table
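
The example shown on the original slide did not survive in this transcript. As a hedged stand-in, the sketch below assembles the kind of `hadoop jar` command that invokes Phoenix's `CsvBulkLoadTool` or `JsonBulkLoadTool`. The jar name `phoenix-client.jar` and the input path are placeholders, not values from the slides.

```java
import java.util.ArrayList;
import java.util.List;

class PhoenixBulkLoadSketch {

    // Builds: hadoop jar <clientJar> <toolClass> --table <table> --input <path>
    static List<String> bulkLoadCommand(String clientJar, boolean json,
                                        String table, String inputPath) {
        String tool = json
            ? "org.apache.phoenix.mapreduce.JsonBulkLoadTool"
            : "org.apache.phoenix.mapreduce.CsvBulkLoadTool";
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(clientJar);
        cmd.add(tool);
        cmd.add("--table");
        cmd.add(table);
        cmd.add("--input");
        cmd.add(inputPath);
        return cmd;
    }

    public static void main(String[] args) {
        // Placeholder jar name and HDFS path; adjust to the local installation.
        List<String> cmd = bulkLoadCommand(
            "phoenix-client.jar", false, "EXAMPLE", "/data/example.csv");
        // Only prints the command. On a machine with Hadoop installed it could be
        // launched with: new ProcessBuilder(cmd).inheritIO().start()
        System.out.println(String.join(" ", cmd));
    }
}
```
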
  5. Phoenix User-defined functions (UDFs)
     • Create temporary/permanent UDFs
       – Temporary for session only
     • Use UDFs in SQL and Indexes
     • Permanent UDFs stored in SYSTEM.FUNCTION
     • Tenant specific UDF usage supported
     • UDF jar files must be placed on HDFS
     • UDF jar updates not currently possible
       – (without cluster bounce)
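
A rough sketch of registering and calling a UDF over JDBC follows. It assumes the client property `phoenix.functions.allowUserDefinedFunctions` is set to `true`, which the Phoenix documentation describes as a prerequisite. The function name `my_reverse`, the class `com.example.MyReverseFunction`, the jar path and the `FIRST_NAME` column are all hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class PhoenixUdfSketch {

    // Builds a CREATE FUNCTION statement; temporary functions last for the session only.
    static String createFunctionSql(boolean temporary, String name, String argType,
                                    String returnType, String className, String jarPath) {
        return "CREATE " + (temporary ? "TEMPORARY " : "") + "FUNCTION " + name
            + "(" + argType + ") RETURNS " + returnType
            + " AS '" + className + "' USING JAR '" + jarPath + "'";
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // UDF support has to be switched on for the client connection.
        props.setProperty("phoenix.functions.allowUserDefinedFunctions", "true");

        // Hypothetical class name and HDFS jar location.
        String ddl = createFunctionSql(false, "my_reverse", "varchar", "varchar",
            "com.example.MyReverseFunction",
            "hdfs:/localhost:8020/hbase/lib/my-udfs.jar");
        System.out.println(ddl);

        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost", props);
             Statement st = conn.createStatement()) {
            st.execute(ddl);
            // Use the UDF like a built-in function; FIRST_NAME is an assumed column.
            try (ResultSet rs = st.executeQuery("SELECT my_reverse(FIRST_NAME) FROM EXAMPLE")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        } catch (SQLException e) {
            // Reached when no Phoenix driver or cluster is available.
            System.err.println("Could not reach Phoenix: " + e.getMessage());
        }
    }
}
```
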
  6. Phoenix Transactions
     • Cross row/table/ACID support using Apache Tephra
     • Transactional functionality currently beta
     • Enable transactions and snapshot dir in hbase-site.xml
     • Also set a transactional timeout value
     • Start Tephra
     • Create tables with flag TRANSACTIONAL=true
     • Then transactions act as follows
       – Start with statement against table
       – End with commit or rollback
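
The transaction flow above might look like the following over JDBC. This is a sketch under assumptions: Tephra is running, and hbase-site.xml already carries the settings the Phoenix documentation names for this (`phoenix.transactions.enabled`, `data.tx.snapshot.dir`, `data.tx.timeout`). The table `TX_EXAMPLE` and its columns are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

class PhoenixTransactionSketch {

    // Hypothetical table; TRANSACTIONAL=true is the flag named on the slide.
    static final String CREATE_TABLE =
        "CREATE TABLE IF NOT EXISTS TX_EXAMPLE (K BIGINT PRIMARY KEY, V VARCHAR) "
        + "TRANSACTIONAL=true";
    static final String UPSERT = "UPSERT INTO TX_EXAMPLE VALUES (?, ?)";

    // Writes two rows as one unit: both are committed, or both are rolled back.
    static void writePair(Connection conn) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(UPSERT)) {
            ps.setLong(1, 1L);
            ps.setString(2, "a");
            ps.executeUpdate();   // first statement against the table starts the transaction
            ps.setLong(1, 2L);
            ps.setString(2, "b");
            ps.executeUpdate();
            conn.commit();        // end with commit ...
        } catch (SQLException e) {
            conn.rollback();      // ... or rollback on failure
            throw e;
        }
    }

    public static void main(String[] args) {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            try (Statement st = conn.createStatement()) {
                st.execute(CREATE_TABLE);
            }
            writePair(conn);
            System.out.println("Committed two rows");
        } catch (SQLException e) {
            // Reached when no Phoenix driver or cluster is available.
            System.err.println("Could not reach Phoenix: " + e.getMessage());
        }
    }
}
```
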
  7. Available Books
     • See “Big Data Made Easy” – Apress Jan 2015
     • See “Mastering Apache Spark” – Packt Oct 2015
     • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
     • Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     • Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
  8. Connect
     • Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     • See my open source blog at
       – open-source-systems.blogspot.com/
     • I am always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration