Apache NiFi

This presentation gives an overview of the Apache NiFi project. I had intended to focus on the NiFi Registry but found that there was more to say about NiFi itself. It still covers the Registry project, as well as extensions and a proposed registry for them.

Links for further information and connecting

http://www.semtech-solutions.co.nz

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

Mike Frampton

May 25, 2020

Transcript

  1. What Is Apache NiFi?
    • An Apache Software Foundation project for data flow automation, with Cloudera as a major contributor
    • Written in Java
    • Open source / Apache License 2.0
    • Cluster based and scalable
    • Has a web-based user interface
    • Highly extensible
    • Offers data flow monitoring
  2. How Does NiFi Work?
    • NiFi runs in a JVM on each server in the cluster
    • Uses ZooKeeper for configuration/coordination
      – One node as the Cluster Coordinator
      – One node as the Primary Node
    • The JVM encapsulates
      – Web server
      – Processors / Extensions
      – Repositories for FlowFile / Content / Data Provenance
  3. NiFi Architecture
    • Web Server for monitoring and administration
    • Flow Controller manages extensions and resources
    • FlowFile processors 1 .. N
      – The actual data flow workers
      – Each processor supports NiFi data flow
    • Extensions allow remote system connectivity
      – Can be user defined
    • FlowFile Repo
      – Tracks and maintains current flows
    • Content Repo
      – Maintains data in transit
    • Provenance Repo
      – Historic data flow information
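The relationship between the three repositories can be pictured with a small Python sketch. It is a toy in-memory model, not NiFi's implementation: `ContentRepo`, `FlowFile`, `ProvenanceRepo`, `claim_id`, `record_event` and `uppercase_processor` are hypothetical names chosen for illustration. It assumes that a FlowFile carries attributes plus a pointer to content held in the Content Repo, and that each step is logged as a provenance event.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class ContentRepo:
    """Toy content store: holds the bytes of data in transit."""

    def __init__(self) -> None:
        self._claims: Dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim_id = next(self._ids)
        self._claims[claim_id] = data
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


@dataclass
class FlowFile:
    """Attributes plus a pointer to content, not the content itself."""
    claim_id: int
    attributes: Dict[str, str] = field(default_factory=dict)


class ProvenanceRepo:
    """Toy history log of what happened to each FlowFile."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, int]] = []

    def record_event(self, event_type: str, flow_file: FlowFile) -> None:
        self.events.append((event_type, flow_file.claim_id))


def uppercase_processor(flow_file: FlowFile, content: ContentRepo,
                        provenance: ProvenanceRepo) -> FlowFile:
    """Hypothetical processor: writes new content instead of mutating the old."""
    new_claim = content.write(content.read(flow_file.claim_id).upper())
    out = FlowFile(new_claim, dict(flow_file.attributes, transformed="true"))
    provenance.record_event("CONTENT_MODIFIED", out)
    return out


content = ContentRepo()
provenance = ProvenanceRepo()

original = FlowFile(content.write(b"hello nifi"), {"filename": "a.txt"})
provenance.record_event("CREATE", original)

result = uppercase_processor(original, content, provenance)
flowfile_repo = [result]  # toy FlowFile Repo: the FlowFiles currently in flight
```

Because the processor writes a new content claim rather than overwriting the old one, the original bytes stay readable, which is what makes a replayable history possible in this sketch.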
  4. NiFi Flow Management
    • Guaranteed data delivery
    • Uses write-ahead logs and content repositories
    • Queue buffering / back pressure
    • Queue priority configuration
    • Flow configuration (latency / throughput)
    • UI-based data flow builds
    • UI-based data flow monitoring
    • UI-based data provenance
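Queue buffering, back pressure and priority configuration can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not NiFi code: `Connection`, `object_threshold`, `is_full`, `offer` and `poll` are hypothetical names, and the prioritizer here simply orders items by a numeric priority (lower value first). In NiFi itself the framework stops scheduling the upstream component once a threshold is reached rather than rejecting items, so the `False` return below is a simplification.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processors with a back pressure threshold."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order

    def is_full(self) -> bool:
        return len(self._heap) >= self.object_threshold

    def offer(self, priority: int, payload: str) -> bool:
        """Queue an item, or return False so the upstream side backs off."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), payload))
        return True

    def poll(self) -> str:
        """Hand the highest-priority item to the downstream processor."""
        _, _, payload = heapq.heappop(self._heap)
        return payload


conn = Connection(object_threshold=3)

# The fourth offer hits the threshold and is refused.
accepted = [conn.offer(p, name) for p, name in
            [(5, "bulk-1"), (1, "urgent"), (5, "bulk-2"), (1, "late")]]

first = conn.poll()             # downstream drains one item
retry = conn.offer(1, "late")   # there is room again, so the retry succeeds
drained = [conn.poll() for _ in range(3)]
```

The heap gives priority ordering, while the counter preserves arrival order among items of equal priority.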
  5. NiFi Cluster
    • NiFi can act in cluster mode, coordinated through ZooKeeper
    • Each node works on a different set of data
    • ZooKeeper
      – Elects a single Cluster Coordinator node
      – Handles node failover
    • Cluster Coordinator manages cluster membership
    • ZooKeeper also elects one node as the Primary Node (the DataFlow Manager is a user role, not an elected node)
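The election and failover behaviour can be sketched in Python. This is a toy, in-memory stand-in for ZooKeeper, and `ElectionRegistry`, `join`, `leave` and `leader` are hypothetical names. It assumes the common ZooKeeper leader-election recipe in which each node registers with an increasing sequence number and the lowest live sequence holds the role; the slides do not say how NiFi drives the election internally.

```python
import itertools
from typing import Dict, Optional


class ElectionRegistry:
    """Toy stand-in for ZooKeeper: the lowest live sequence number leads."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self._members: Dict[str, int] = {}

    def join(self, node: str) -> None:
        self._members[node] = next(self._seq)

    def leave(self, node: str) -> None:
        # Models a node failing or disconnecting from the cluster.
        self._members.pop(node, None)

    def leader(self) -> Optional[str]:
        if not self._members:
            return None
        return min(self._members, key=self._members.get)


coordinator_election = ElectionRegistry()
for name in ["node-a", "node-b", "node-c"]:
    coordinator_election.join(name)

before = coordinator_election.leader()
coordinator_election.leave("node-a")   # the current coordinator fails
after = coordinator_election.leader()  # the next node in line takes over
```

A second `ElectionRegistry` instance could model the primary node role in the same way.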
  6. NiFi Repository Storage
    • All repository storage is pluggable
    • Storage can be changed through user-developed implementations
    • The default is file system storage, which can use
      – Multiple file system locations
      – Multiple physical partitions
      – RAID configurations to optimize I/O
    • Archiving is available for the content repository
      – Deletion is automatic and configurable
  7. NiFi Extensions
    • Extensions are packaged in NiFi Archives (NARs)
    • Points of extension include
      – Processors, Controller Services, Reporting Tasks, Prioritizers, and Custom User Interfaces
    • See these example NARs by Frank Sauer
      – For InfluxDB access
      – JSON transformation
      – https://github.com/fsauer65/NiFi-Extensions
  8. What Is Apache NiFi Registry?
    • A subproject of Apache NiFi
    • For storage and management of shared resources
    • Across one or more instances of NiFi and/or MiNiFi
    • Offers version control for flows
    • Define users, groups and policies for flows
    • Support for Linux, Unix and Mac OS X
  9. NiFi Extension Registry
    • There was also an extension registry proposal in 2016
    • Prototyped by Puspendu Banerjee
    • Created on GitHub at https://github.com/PuspenduBanerjee/nifi/tree/NIFI-ExtRegistry
    • Seems like a good idea
    • A central location for extensions
    • But no update since 2016
      – For proposal or prototype
  10. Available Books
    • See “Big Data Made Easy” – Apress Jan 2015
    • See “Mastering Apache Spark” – Packt Oct 2015
    • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
    • Find the author on Amazon
      – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
    • Connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
  11. Connect
    • Feel free to connect on LinkedIn
      – www.linkedin.com/in/mike-frampton-38563020
    • See my open source blog at
      – open-source-systems.blogspot.com/
    • I am always interested in
      – New technology
      – Opportunities
      – Technology based issues
      – Big data integration