Apache AsterixDB

This presentation gives an overview of the Apache AsterixDB project. It explains the AsterixDB database in terms of its functionality and capabilities.

Links for further information and connecting

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

https://open-source-systems.blogspot.com/

Mike Frampton

May 28, 2020

Transcript

  1. What Is Apache AsterixDB?
     • A Big Data Management System (BDMS)
     • Open source / Apache 2.0 license
     • Manages semi-structured data
     • Has a NoSQL-style data model (ADM)
     • Has an expressive and declarative query language (AQL)
     • Uses a runtime query execution engine, Apache Hyracks
     • Supports querying and indexing external data (e.g. HDFS)
  2. What Is Apache AsterixDB?
     • Has two query languages (SQL++ and AQL)
     • Scale-tested on up to 1000+ cores and 500+ disks
     • Basic transactional (concurrency and recovery) capabilities
     • Partitioned LSM-based data storage and indexing
     • Supports efficient data ingestion
     • Exploits internal data partitioning and indexes
       – To avoid scanning data sets
       – When processing queries
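The "partitioned LSM-based storage" bullet can be pictured with a small toy model. This is only an illustrative sketch, not AsterixDB's implementation: the class names, the CRC32 hash routing and the tiny flush threshold are all assumptions made for the example. Each partition buffers writes in memory, flushes them as immutable sorted runs, and answers a lookup by checking the memory component first and then the runs from newest to oldest, so a key lookup touches one partition instead of scanning the whole data set.

```python
import bisect
import zlib


class LsmPartition:
    """Toy LSM partition: an in-memory component plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # newest writes, not yet flushed
        self.runs = []       # flushed runs, oldest first; each is a sorted list
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the in-memory component out as one immutable sorted run.
        if self.memtable:
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs newest-first so the latest value for a key wins.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


class PartitionedStore:
    """Routes each key to one partition by hashing it (an assumed scheme)."""

    def __init__(self, num_partitions=4):
        self.partitions = [LsmPartition() for _ in range(num_partitions)]

    def _partition_for(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)].put(key, value)

    def get(self, key):
        # Only the owning partition is consulted; the others are never scanned.
        return self.partitions[self._partition_for(key)].get(key)
```

A real LSM store would also merge runs in the background and persist them to disk; both are left out here to keep the sketch short.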
  3. Asterix Built-In Functions
     • Numeric Functions
     • String Functions
     • Binary Functions
     • Spatial Functions
     • Similarity Functions
     • Tokenizing Functions
     • Temporal Functions
     • Object Functions
     • Aggregate Functions
     • Comparison Functions
     • Type Functions
     • Conditional Functions
     • Miscellaneous Functions
  4. AsterixDB External Data
     • Built-in adapters for external data sets
       – localfs
       – hdfs
       – socket
       – socket_client
       – twitter_push
       – twitter_pull
       – rss
  5. AsterixDB User Defined Functions
     • UDFs are written in Java and stored in libraries
     • Use the managix command to
       – Stop the Asterix instance
       – Install the UDF library
       – Start the Asterix instance
     • UDFs in the library can now be executed
     • See the simplified example on the next slide
     • The example uses the testlib library against a tweet feed
  6. AsterixDB User Defined Functions

         use dataverse feeds;

         drop feed ProcessedTwitterFeed if exists;
         create secondary feed ProcessedTwitterFeed from feed TwitterFeed
         apply function testlib#addHashTags;

         connect feed ProcessedTwitterFeed to dataset ProcessedTweets;

         use dataverse feeds;
         for $i in dataset ProcessedTweets limit 10 return $i;
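The final query on this slide could also be submitted from a client program. The sketch below assumes an AsterixDB instance whose HTTP query service is reachable at `http://localhost:19002/query/service` (the URL is an assumption about a local default setup), and it rewrites the AQL query in SQL++, the other query language named earlier in the deck. The helper names `build_query_request` and `run_query` are hypothetical, not part of any AsterixDB client library.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of a local AsterixDB query service; adjust for your cluster.
QUERY_SERVICE_URL = "http://localhost:19002/query/service"

# Assumed SQL++ equivalent of the slide's AQL query:
#   for $i in dataset ProcessedTweets limit 10 return $i;
STATEMENT = "USE feeds; SELECT VALUE t FROM ProcessedTweets t LIMIT 10;"


def build_query_request(statement, url=QUERY_SERVICE_URL):
    """Build (but do not send) a form-encoded POST carrying the statement."""
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def run_query(statement, url=QUERY_SERVICE_URL):
    """Send the statement and return the 'results' list from the JSON reply."""
    with urllib.request.urlopen(build_query_request(statement, url)) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload.get("results", [])
```

With a running instance and a populated ProcessedTweets dataset, `run_query(STATEMENT)` would return up to ten processed tweet records; without a server, only `build_query_request` can be exercised.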
  7. Available Books
     • See “Big Data Made Easy” – Apress, Jan 2015
     • See “Mastering Apache Spark” – Packt, Oct 2015
     • See “Complete Guide to Open Source Big Data Stack” – Apress, Jan 2018
     • Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     • Connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
  8. Connect
     • Feel free to connect on LinkedIn
       – www.linkedin.com/in/mike-frampton-38563020
     • See my open source blog at
       – open-source-systems.blogspot.com/
     • I am always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration