
Apache Trafodion

This presentation gives an overview of the Apache Trafodion project. It explains the Trafodion architecture in relation to Hadoop/HBase and its process structure.

Links for further information and for connecting with the author

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

May 18, 2020

Transcript

  1. What Is Apache Trafodion?
    • A relational database management system (RDBMS)
    • Open source, Apache 2.0 license
    • Runs on Apache Hadoop
    • Written in Java and C++
    • Originally developed by HP Labs
    • Uses Apache HBase
  2. Trafodion Overview
    • Uses Apache HBase / HDFS for storage
    • ANSI SQL support
    • ACID support (Atomicity, Consistency, Isolation, Durability)
    • Supports big data sets
    • Parallel processing in terms of:
      – Query optimization
      – Query execution
  3. Trafodion Process Architecture
    • CMP = Second instance of the compiler code
    • DCS = Database connectivity services
    • DTM = Database transaction manager
    • ISV = Independent software vendor
    • ESP = Executor server process
    • JDBC = Java database connectivity
    • ODBC = Open database connectivity
  4. Trafodion Process Architecture
    • Clients connect via JDBC or ODBC
    • DCS Master / Server manage connections
    • Master Executor processes SQL
    • DTM manages transactions
    • CMP manages DDL and utilities that require compiler code
    • ESPs manage execution-time parallelism
    • Storage engine uses HBase and Hadoop
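To make the split between the master executor and the ESPs more concrete, here is a toy Python model of execution-time parallelism. It is not Trafodion code: the partitions, the `esp_partial_sum` and `master_executor_avg` names, and the use of threads are all assumptions for illustration. Each worker aggregates only its own partition (loosely like a table split into HBase regions), and the coordinating step combines the partial results into the final answer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Toy model only: a "row" is (row_key, amount). The table is assumed to be
# pre-split into partitions, loosely like HBase regions.
Row = Tuple[str, int]


def esp_partial_sum(partition: List[Row]) -> Tuple[int, int]:
    """ESP-like worker: scan one partition and return (row_count, amount_total)."""
    count = 0
    total = 0
    for _row_key, amount in partition:
        count += 1
        total += amount
    return count, total


def master_executor_avg(partitions: List[List[Row]]) -> float:
    """Master-executor-like step: fan the scan out, then combine partial results."""
    # One worker per partition; max(1, ...) keeps the pool valid for an empty input.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(esp_partial_sum, partitions))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else 0.0


if __name__ == "__main__":
    partitions = [
        [("a", 10), ("b", 20)],
        [("c", 30)],
        [("d", 40), ("e", 50), ("f", 60)],
    ]
    print(master_executor_avg(partitions))  # 35.0
```

The workers return (count, sum) pairs rather than partial averages. Averaging the three partial averages (15, 30 and 50) would give about 31.7, because it weights every partition equally regardless of how many rows it holds.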
  5. Trafodion Logs
    • Uses log4j and log4cpp
    • Log level set to ERROR by default
    • Master executor logs stored on the local node
    • All logs can be searched via a SQL UDF:
      – select * from udf(event_log_reader( [options] ));
    • Searches all node log files
    • Returns time-stamped log data
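The slide says the UDF searches the log files on every node and returns time-stamped rows. How event_log_reader does this internally is not described here, so the following is only a sketch of the idea, assuming each node's log is already in time order: a k-way merge with Python's `heapq.merge` produces one stream ordered by timestamp. The `merge_node_logs` name and the (timestamp, node, message) tuple layout are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterator, List, Tuple

# (timestamp, node_number, message); a hypothetical, simplified log line.
LogLine = Tuple[datetime, int, str]


def merge_node_logs(per_node: Dict[int, List[Tuple[datetime, str]]]) -> Iterator[LogLine]:
    """Merge per-node logs into a single stream ordered by timestamp.

    Assumes each node's list is already in time order, as an append-only
    log file would be, so a merge is enough and no full re-sort is needed.
    """
    streams = [
        [(ts, node, msg) for ts, msg in lines]
        for node, lines in per_node.items()
    ]
    return heapq.merge(*streams, key=lambda line: line[0])


if __name__ == "__main__":
    logs = {
        1: [(datetime(2020, 5, 18, 10, 0, 0), "executor started"),
            (datetime(2020, 5, 18, 10, 0, 5), "query failed")],
        2: [(datetime(2020, 5, 18, 10, 0, 2), "executor started")],
    }
    for ts, node, msg in merge_node_logs(logs):
        print(ts.isoformat(), node, msg)
```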
  6. Trafodion Logs
    • Returned columns:
      – log_ts timestamp(6)
      – severity char(10 bytes) character set utf8
      – component char(24 bytes) character set utf8
      – node_number integer
      – cpu integer
      – pin integer
      – process_name char(12 bytes) character set utf8
      – sql_code integer
      – query_id varchar(200 bytes) character set utf8
      – message varchar(4000 bytes) character set utf8
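If the UDF output is fetched into a client program (for example over JDBC or ODBC), it can help to model each row explicitly. The dataclass below mirrors the column list on this slide; the `EventLogRow` and `errors_for_node` names are hypothetical. Two details follow from the column types. First, SQL CHAR values are blank-padded to their declared length, so the filter strips `severity` before comparing. Second, the lengths are limits in bytes, so the check encodes to UTF-8 before measuring. In SQL, the same filter would presumably be a WHERE clause on the UDF's output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

# Byte limits taken from the character columns on the slide.
BYTE_LIMITS = {"severity": 10, "component": 24, "process_name": 12,
               "query_id": 200, "message": 4000}


@dataclass(frozen=True)
class EventLogRow:
    """Client-side model of one event_log_reader row (hypothetical class)."""
    log_ts: datetime      # timestamp(6)
    severity: str         # char(10 bytes) character set utf8
    component: str        # char(24 bytes) character set utf8
    node_number: int      # integer
    cpu: int              # integer
    pin: int              # integer
    process_name: str     # char(12 bytes) character set utf8
    sql_code: int         # integer
    query_id: str         # varchar(200 bytes) character set utf8
    message: str          # varchar(4000 bytes) character set utf8

    def __post_init__(self) -> None:
        # Limits are in bytes, so measure the UTF-8 encoding, not len(str).
        for name, max_bytes in BYTE_LIMITS.items():
            if len(getattr(self, name).encode("utf-8")) > max_bytes:
                raise ValueError(f"{name} exceeds {max_bytes} bytes")


def errors_for_node(rows: Iterable[EventLogRow], node: int) -> List[EventLogRow]:
    """Keep ERROR rows from one node; strip() drops CHAR blank padding."""
    return [r for r in rows
            if r.node_number == node and r.severity.strip() == "ERROR"]
```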
  7. Trafodion Repository
    • Repository contained in the REPOS schema; tables:
      – METRIC_QUERY_AGGR_TABLE: statistics for short-running queries (aggregated)
      – METRIC_QUERY_TABLE: query statistics information
      – METRIC_SESSION_TABLE: ODBC and JDBC session statistics
      – METRIC_TEXT_TABLE: reserved for future use
  8. Trafodion Configuration
    • Files stored in the install conf directory:
      – dcs-site.xml: site-specific information
      – dcs-default.xml: default configuration
      – dcs-env.sh: environment-specific settings
      – log4j.properties: log control
      – master: identifies the master host
      – backup-masters: identifies the backup master hosts
      – servers: identifies the server hosts
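The slide lists dcs-default.xml next to dcs-site.xml, which suggests the usual Hadoop/HBase pattern of shipped defaults plus site-level overrides. Assuming both files use the Hadoop-style layout, where a `<configuration>` root holds `<property>` elements each with a `<name>` and a `<value>` (an assumption; check the files in your install), the effective settings could be computed as sketched below. The function names and the `example.*` property names are hypothetical, not real DCS settings.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def load_properties(xml_text: str) -> Dict[str, str]:
    """Collect name/value pairs from <property> children of the root element."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries without a <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_config(default_xml: str, site_xml: str) -> Dict[str, str]:
    """Start from the defaults, then let site values override them key by key."""
    config = load_properties(default_xml)
    config.update(load_properties(site_xml))
    return config


if __name__ == "__main__":
    # Hypothetical property names, not real DCS settings.
    defaults = """<configuration>
      <property><name>example.port</name><value>1000</value></property>
      <property><name>example.host</name><value>localhost</value></property>
    </configuration>"""
    site = """<configuration>
      <property><name>example.host</name><value>node1</value></property>
    </configuration>"""
    print(effective_config(defaults, site))
```

Keeping the site file small and letting it override only what differs from the defaults is the main benefit of this layering: upgrades can replace the default file without touching local changes.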
  9. Available Books
    • See “Big Data Made Easy”, Apress, Jan 2015
    • See “Mastering Apache Spark”, Packt, Oct 2015
    • See “Complete Guide to Open Source Big Data Stack”, Apress, Jan 2018
    • Find the author on Amazon: www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
    • Connect on LinkedIn: www.linkedin.com/in/mike-frampton-38563020
  10. Connect
    • Feel free to connect on LinkedIn: www.linkedin.com/in/mike-frampton-38563020
    • See my open source blog at open-source-systems.blogspot.com/
    • I am always interested in:
      – New technology
      – Opportunities
      – Technology-based issues
      – Big data integration