Apache Trafodion

This presentation gives an overview of the Apache Trafodion project. It explains the Trafodion architecture in relation to Hadoop/HBase, and its process structure.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

May 18, 2020

Transcript

  1. What Is Apache Trafodion?
     • A relational database management system (RDBMS)
     • Open sourced / Apache 2.0 license
     • Running on Apache Hadoop
     • Written in Java and C++
     • Originally developed by HP Labs
     • Uses Apache HBase
  2. Trafodion Overview
     • Uses Hadoop HBase / HDFS for storage
     • ANSI SQL support
     • ACID support (Atomicity, Consistency, Isolation, Durability)
     • Supports big data sets
     • Parallel processing in terms of
       – Query optimization
       – Query execution
  3. Trafodion Process Architecture
     • CMP ~ Second instance of the compiler code
     • DCS = Database connectivity services
     • DTM = Database transaction manager
     • ISV = Independent software vendor
     • ESP = Executor server process
     • JDBC = Java database connectivity
     • ODBC = Open database connectivity
  4. Trafodion Process Architecture
     • Clients connect via JDBC or ODBC
     • DCS Master / Server manage connections
     • Master Executor processes SQL
     • DTM manages transactions
     • CMP manages DDL/utilities requiring compiler code
     • ESPs manage execution time parallelism
     • Storage engine uses HBase and Hadoop
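The division of responsibilities on slide 4 can be sketched as a lookup from the kind of work to the process that handles it. This is only an illustration of the roles listed on the slide: the `Request` enum, the `HANDLERS` table, and `handler_for` are hypothetical names and not part of any real API.

```python
from enum import Enum, auto


class Request(Enum):
    """Hypothetical categories of work, one per role named on the slide."""
    CONNECT = auto()            # a client opening a JDBC/ODBC connection
    SQL_STATEMENT = auto()      # ordinary SQL processing
    TRANSACTION = auto()        # transaction management
    DDL_OR_UTILITY = auto()     # DDL/utilities requiring compiler code
    PARALLEL_FRAGMENT = auto()  # execution time parallelism


# Which process the slide says is responsible for each kind of work.
HANDLERS = {
    Request.CONNECT: "DCS Master / Server",
    Request.SQL_STATEMENT: "Master Executor",
    Request.TRANSACTION: "DTM",
    Request.DDL_OR_UTILITY: "CMP",
    Request.PARALLEL_FRAGMENT: "ESP",
}


def handler_for(request: Request) -> str:
    """Return the name of the process responsible for this kind of work."""
    return HANDLERS[request]
```

For example, `handler_for(Request.DDL_OR_UTILITY)` returns `"CMP"`, matching the slide's statement that CMP manages DDL and utilities that need compiler code.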
  5. Trafodion Logs
     • Uses log4j and log4cpp
     • Log level set to ERROR by default
     • Master executor logs stored on the local node
     • All logs can be searched via SQL UDF
       – select * from udf(event_log_reader( [options] ));
     • Searches all node log files
     • Returns time stamped log data
  6. Trafodion Logs
     • Returned columns:
       – log_ts timestamp(6)
       – severity char(10 bytes) character set utf8
       – component char(24 bytes) character set utf8
       – node_number integer
       – cpu integer
       – pin integer
       – process_name char(12 bytes) character set utf8
       – sql_code integer
       – query_id varchar(200 bytes) character set utf8
       – message varchar(4000 bytes) character set utf8
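As a rough sketch of how a client might handle the rows returned by the log UDF, the columns from slide 6 can be mapped onto a small record type. The `LogEvent` class and the `from_row` and `by_severity` helpers are hypothetical names introduced for illustration. The sketch assumes that rows arrive as 10-value sequences in the slide's column order and that fixed-length CHAR values come back blank-padded, which depends on the driver in use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Sequence

# Query text as shown on slide 5; [options] is the slide's own placeholder.
EVENT_LOG_QUERY = "select * from udf(event_log_reader( [options] ));"


@dataclass
class LogEvent:
    """One returned row, with fields in the column order listed on slide 6."""
    log_ts: Optional[datetime]
    severity: Optional[str]
    component: Optional[str]
    node_number: Optional[int]
    cpu: Optional[int]
    pin: Optional[int]
    process_name: Optional[str]
    sql_code: Optional[int]
    query_id: Optional[str]
    message: Optional[str]


def from_row(row: Sequence) -> LogEvent:
    """Build a LogEvent from a 10-column row, trimming trailing blanks.

    Assumption: CHAR columns may be blank-padded to their declared width.
    """
    values = [v.rstrip() if isinstance(v, str) else v for v in row]
    return LogEvent(*values)


def by_severity(events: Iterable[LogEvent], severity: str = "ERROR") -> List[LogEvent]:
    """Keep only the events whose severity matches exactly."""
    return [e for e in events if e.severity == severity]
```

The rows themselves would come from whatever JDBC or ODBC bridge the client uses to run `EVENT_LOG_QUERY`; that part is deliberately left out because it depends on the installation.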
  7. Trafodion Repository
     • Repository contained in the REPOS schema, with tables:
     • METRIC_QUERY_AGGR_TABLE
       – Statistics for short running queries (aggregated)
     • METRIC_QUERY_TABLE
       – Query statistics information
     • METRIC_SESSION_TABLE
       – ODBC and JDBC session statistics
     • METRIC_TEXT_TABLE
       – Reserved for future use
  8. Trafodion Configuration
     • Files stored in the install conf directory
       – dcs-site.xml: site specific information
       – dcs-default.xml: default configuration
       – dcs-env.sh: environment specific
       – log4j.properties: log control
       – master: identifies master host
       – backup-masters: identifies master backup hosts
       – servers: identifies server hosts
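The relationship between dcs-default.xml and dcs-site.xml can be sketched as a defaults-plus-overrides merge. This is a minimal sketch under two assumptions that the slides do not state: that both files use the Hadoop-style `<configuration><property><name>…</name><value>…</value></property></configuration>` layout, and that site values take precedence over defaults. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_properties(xml_text: str) -> Dict[str, str]:
    """Parse <property><name>/<value> pairs from configuration XML text.

    Assumes the Hadoop-style layout described in the lead-in.
    """
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def effective_config(default_xml: str, site_xml: str) -> Dict[str, str]:
    """Start from the defaults, then let site-specific values override them."""
    merged = read_properties(default_xml)
    merged.update(read_properties(site_xml))
    return merged
```

In practice the two arguments would be the contents of the files in the install conf directory, read with something like `Path(...).read_text()`.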
  9. Available Books
     • See “Big Data Made Easy” – Apress Jan 2015
     • See “Mastering Apache Spark” – Packt Oct 2015
     • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
     • Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     • Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
  10. Connect
     • Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     • See my open source blog at
       – open-source-systems.blogspot.com/
     • I am always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration