• Open source / Apache 2.0 license
• Stores data on HDFS and others
• For low latency big data queries / ETL
• Supports SQL
• No release since May 2016
following locations
  – HDFS
  – Amazon S3
  – OpenStack Swift
  – HBase
  – RDBMS
• It is also possible to register user-defined storage
  – Place the user-defined JAR file in tajo/extlib
  – Copy the modified conf/storage-site.json.template to conf/storage-site.json
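The two registration steps above can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `install_storage_plugin` and its parameters are hypothetical, the only paths taken from the slide are `extlib`, `conf/storage-site.json.template`, and `conf/storage-site.json`, and the template still has to be edited by hand to describe the new storage.

```python
import shutil
from pathlib import Path


def install_storage_plugin(tajo_home: Path, plugin_jar: Path) -> Path:
    """Copy a storage plugin JAR into extlib and create storage-site.json.

    Hypothetical helper: it only automates the file copies named on the
    slide. Editing the JSON for the new storage is left to the user.
    """
    extlib = tajo_home / "extlib"
    conf = tajo_home / "conf"
    extlib.mkdir(parents=True, exist_ok=True)

    # Step 1: place the user-defined JAR file in tajo/extlib.
    shutil.copy2(plugin_jar, extlib / plugin_jar.name)

    # Step 2: copy the template to conf/storage-site.json.
    site = conf / "storage-site.json"
    shutil.copy2(conf / "storage-site.json.template", site)
    return site
```

Calling `install_storage_plugin(Path("/opt/tajo"), Path("my-storage.jar"))` would return the path of the new `storage-site.json`; both arguments here are made-up values.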
– Issue meta commands, e.g. \l (list databases)
– Issue HDFS commands
– Use \set to set session variables
– Issue \admin administration commands
– Issue commands interactively or in batch
– Run as a background process
or more TajoMaster servers
– One or more TajoWorker servers
• TajoMaster coordinates TajoWorkers
• TajoWorkers carry out processing
• More TajoWorkers mean more processing capacity
• Capacity scales linearly
QueryCoordinator
  • Decides whether each query should be executed in a distributed way or immediately in TajoMaster
– Resource Tracker
  • Manages membership of cluster nodes
– Client Service Provider
  • Routes client API calls to the proper QueryCoordinator or ResourceTracker
NodeResourceManager
  • Manages the resources of the worker node
– TaskManager
  • Launches tasks on the TaskExecutor
  • Uses as many threads as there are CPU cores
– TaskExecutor
  • Creates TaskContainers for the workload
– NodeStatusUpdater
  • Updates the current status when resources change
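The "one thread per CPU core" sizing described for the TaskManager can be illustrated with a generic thread pool. This is a rough analogy in Python, not Tajo's own code: the function `run_tasks` and its parameters are hypothetical names introduced only for this sketch.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_tasks(tasks, workers=None):
    """Run callables on a pool sized to the CPU core count by default."""
    # os.cpu_count() can return None, so fall back to a single thread.
    pool_size = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        futures = [pool.submit(task) for task in tasks]
        # Collect results in submission order.
        return [f.result() for f in futures]
```

Sizing the pool to the core count keeps every core busy without oversubscribing the node, which is the usual reason for this choice in worker processes.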
may be stored in multiple locations
– e.g. HDFS, S3, or HBase
– It might be stored in multiple formats
– e.g. CSV, Parquet, or ORC
• Tablespaces provide a way to
  – Easily handle data stored on different storage types
  – In various file formats
source
• A tablespace contains multiple tables while a table has only one tablespace
• External tables don't have any tablespaces because they have their own storage information
• A database can contain tables of different tablespaces
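The relationships listed above can be written down as a small data model. This is a sketch for illustration only: the class names, fields, and the example URIs are assumptions made here and do not reflect Tajo's internal catalog classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Tablespace:
    name: str
    uri: str  # storage location, e.g. an HDFS or S3 URI (hypothetical)


@dataclass(frozen=True)
class Table:
    name: str
    tablespace: Optional[Tablespace] = None   # managed table: one tablespace
    external_location: Optional[str] = None   # external table: own storage info

    def __post_init__(self):
        # A table has either one tablespace or its own location, never both.
        if (self.tablespace is None) == (self.external_location is None):
            raise ValueError("set exactly one of tablespace or external_location")

    @property
    def is_external(self) -> bool:
        return self.external_location is not None


@dataclass
class Database:
    name: str
    tables: Dict[str, Table] = field(default_factory=dict)

    def add(self, table: Table) -> None:
        self.tables[table.name] = table

    def tablespaces(self) -> Set[str]:
        # A database may mix tables from different tablespaces.
        return {t.tablespace.name for t in self.tables.values() if t.tablespace}
```

For example, a database holding one table in an HDFS-backed tablespace, one in an S3-backed tablespace, and one external table would report two tablespace names, since the external table contributes none.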
Jan 2015
• See “Mastering Apache Spark” – Packt Oct 2015
• See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
• Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
• Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
• See my open source blog at
  – open-source-systems.blogspot.com/
• I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration