Slide 1

Slide 1 text

What Is Apache NiFi?
● A data flow automation system maintained under the Apache Software Foundation, with Cloudera among its contributors
● Written in Java
● Open source, under the Apache License 2.0
● Cluster based and scalable
● Has a web-based user interface
● Widely extensible
● Offers data flow monitoring

Slide 2

Slide 2 text

How Does NiFi Work?
● NiFi runs in a JVM on each server in the cluster
● Uses ZooKeeper for configuration/coordination
– One node as the Cluster Coordinator
– One node as the Primary Node
● The JVM encapsulates
– Web server
– Processors / extensions
– Repositories for FlowFile / content / data provenance

Slide 3

Slide 3 text

NiFi Architecture

Slide 4

Slide 4 text

NiFi Architecture
● Web server for monitoring and administration
● Flow controller manages extensions and resources
● FlowFile processors 1 .. N
– The actual data flow workers
– Each processor performs one step of a NiFi data flow
● Extensions allow remote system connectivity
– Can be user defined
● FlowFile Repo – tracks the state of FlowFiles currently active in the flow
● Content Repo – holds the content of data in transit
● Provenance Repo – stores historic data flow information
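
The split between the three repositories can be illustrated with a small toy model. This is only a sketch and not NiFi code: the class and method names (FlowFile, ToyRepositories, create, update_attribute, drop) are hypothetical, the repositories are plain in-memory Python containers, and the event names are borrowed from NiFi's provenance event types purely as labels.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class FlowFile:
    """Toy FlowFile: attributes plus a pointer (claim) to content stored elsewhere."""
    attributes: dict
    content_claim: str
    flowfile_id: str = field(default_factory=lambda: str(uuid4()))


class ToyRepositories:
    """Hypothetical in-memory stand-ins for the three NiFi repositories."""

    def __init__(self):
        self.flowfile_repo = {}    # flowfile_id -> FlowFile (state of active FlowFiles)
        self.content_repo = {}     # claim id -> bytes (content of data in transit)
        self.provenance_repo = []  # append-only history of (event_type, flowfile_id)

    def create(self, data: bytes, attributes: dict) -> FlowFile:
        # Content is written once; the FlowFile only carries a reference to it.
        claim = f"claim-{len(self.content_repo) + 1}"
        self.content_repo[claim] = data
        ff = FlowFile(dict(attributes), claim)
        self.flowfile_repo[ff.flowfile_id] = ff
        self.provenance_repo.append(("CREATE", ff.flowfile_id))
        return ff

    def update_attribute(self, ff: FlowFile, key: str, value: str) -> None:
        # Changing an attribute touches the FlowFile record, not the content.
        ff.attributes[key] = value
        self.provenance_repo.append(("ATTRIBUTES_MODIFIED", ff.flowfile_id))

    def drop(self, ff: FlowFile) -> None:
        # The FlowFile leaves the active flow; its history stays in provenance.
        del self.flowfile_repo[ff.flowfile_id]
        self.provenance_repo.append(("DROP", ff.flowfile_id))
```

The point of the sketch is that attribute changes and routing only touch the small FlowFile record, while the content is stored separately and the provenance history outlives the FlowFile itself.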

Slide 5

Slide 5 text

NiFi Flow Management
● Guaranteed data delivery
● Uses write-ahead logs and content repositories
● Queue buffering / back pressure
● Queue priority configuration
● Flow configuration (latency / throughput)
● UI based data flow builds
● UI based data flow monitoring
● UI based data provenance
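
Queue buffering, back pressure and queue priority can be sketched together in a few lines. This is a toy model under stated assumptions, not NiFi's implementation: BackPressureQueue, object_threshold and priority_of are hypothetical names, and where this sketch refuses an item, a real NiFi connection under back pressure instead stops scheduling the upstream component.

```python
import heapq
import itertools


class BackPressureQueue:
    """Toy connection queue: bounded by an object threshold, ordered by priority."""

    def __init__(self, object_threshold, priority_of):
        self.object_threshold = object_threshold  # max queued items before back pressure
        self.priority_of = priority_of            # function: item -> sortable priority
        self._heap = []
        self._order = itertools.count()           # tie-breaker keeps FIFO order per priority

    def is_full(self):
        return len(self._heap) >= self.object_threshold

    def offer(self, item):
        """Queue the item, or return False so the upstream side knows to wait."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (self.priority_of(item), next(self._order), item))
        return True

    def poll(self):
        """Return the queued item with the lowest priority value, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BackPressureQueue(object_threshold=2, priority_of=lambda ff: ff["priority"])
queue.offer({"name": "low", "priority": 5})
queue.offer({"name": "high", "priority": 1})
accepted = queue.offer({"name": "late", "priority": 0})  # refused: threshold reached
first = queue.poll()                                     # the "high" item comes out first
```

Once the downstream side polls an item, the queue drops below its threshold and the upstream side can offer again.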

Slide 6

Slide 6 text

NiFi Cluster

Slide 7

Slide 7 text

NiFi Cluster
● NiFi can run in cluster mode, coordinated through ZooKeeper
● Each node performs the same tasks, but works on a different set of data
● ZooKeeper
– Elects a single Cluster Coordinator node
– Handles node failover
● The Cluster Coordinator manages cluster membership
● ZooKeeper also elects a Primary Node, which runs isolated processors

Slide 8

Slide 8 text

NiFi Repository Storage
● All repository storage is pluggable
● Storage can be changed through user-defined development
● The default is file system storage, which can use
– Multiple file system locations
– Multiple physical partitions
– RAID configurations to optimize I/O
● Archiving is available for the content repository
– Deletion is automatic and configurable
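
In nifi.properties, archive clean-up is controlled by settings such as nifi.content.repository.archive.max.retention.period and nifi.content.repository.archive.max.usage.percentage. The sketch below shows one plausible way such a policy could pick archived content for deletion. It is an illustration only and not NiFi's actual algorithm; ArchivedClaim, select_for_deletion and all parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArchivedClaim:
    name: str
    size_bytes: int
    archived_at: float  # seconds since epoch


def select_for_deletion(archive, now, max_retention_s, max_usage_fraction,
                        used_bytes, capacity_bytes):
    """Return names of archived claims to delete, oldest first.

    Rule 1: anything archived longer ago than max_retention_s is deleted.
    Rule 2: if disk usage is still above max_usage_fraction, keep deleting
            the oldest remaining claims until usage is at or below it.
    """
    to_delete = []
    remaining = []
    for claim in sorted(archive, key=lambda c: c.archived_at):
        if now - claim.archived_at > max_retention_s:
            to_delete.append(claim.name)
            used_bytes -= claim.size_bytes
        else:
            remaining.append(claim)
    for claim in remaining:
        if used_bytes / capacity_bytes <= max_usage_fraction:
            break
        to_delete.append(claim.name)
        used_bytes -= claim.size_bytes
    return to_delete


archive = [
    ArchivedClaim("a", size_bytes=10, archived_at=0),
    ArchivedClaim("b", size_bytes=30, archived_at=50),
    ArchivedClaim("c", size_bytes=30, archived_at=90),
]
doomed = select_for_deletion(archive, now=100, max_retention_s=60,
                             max_usage_fraction=0.5,
                             used_bytes=80, capacity_bytes=100)
```

Here "a" goes because it is older than the retention period, and "b" goes because the disk is still above the usage limit after "a" is removed; "c" survives.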

Slide 9

Slide 9 text

NiFi Extensions
● Extensions are packaged in NiFi Archives (NARs)
● Points of extension include
– Processors, Controller Services, Reporting Tasks, Prioritizers, and Custom User Interfaces
● See these example NARs by Frank Sauer
– For InfluxDB access
– JSON transformation
– https://github.com/fsauer65/NiFi-Extensions

Slide 10

Slide 10 text

What Is Apache NiFi Registry?
● A subproject of Apache NiFi
● For storage and management of shared resources
– Across one or more instances of NiFi and/or MiNiFi
● Offers version control for flows
● Lets you define users, groups, and policies for flows
● Supports Linux, Unix, and Mac OS X
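
The idea of version control for flows can be sketched as a store that keeps every committed snapshot of a flow under an increasing version number. This is a toy model only: ToyFlowRegistry and its methods are hypothetical and do not reflect the NiFi Registry API, and a flow is represented here as a plain dictionary.

```python
import copy


class ToyFlowRegistry:
    """Hypothetical in-memory store of versioned flow snapshots."""

    def __init__(self):
        self._versions = {}  # flow name -> list of (comment, snapshot)

    def commit(self, flow_name, snapshot, comment=""):
        """Save a copy of the flow and return its new version number (1-based)."""
        history = self._versions.setdefault(flow_name, [])
        history.append((comment, copy.deepcopy(snapshot)))
        return len(history)

    def get(self, flow_name, version=None):
        """Return a copy of the given version, or of the latest if none is given."""
        history = self._versions[flow_name]
        if version is None:
            version = len(history)
        if not 1 <= version <= len(history):
            raise KeyError(f"{flow_name} has no version {version}")
        return copy.deepcopy(history[version - 1][1])


registry = ToyFlowRegistry()
flow = {"processors": ["GetFile"]}
v1 = registry.commit("ingest", flow, "initial flow")
flow["processors"].append("PutFile")
v2 = registry.commit("ingest", flow, "add a sink")
```

Because each commit stores a deep copy, later edits to the working flow do not alter earlier versions, so any NiFi instance sharing the store could roll back to version 1.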

Slide 11

Slide 11 text

NiFi Extension Registry
● There was also an extension registry proposal in 2016
● Prototyped by Puspendu Banerjee
● Created on GitHub at
– https://github.com/PuspenduBanerjee/nifi/tree/NIFI-ExtRegistry
● Seems like a good idea
● A central location for extensions
● But no update since 2016
– For proposal or prototype

Slide 12

Slide 12 text

Available Books
● See “Big Data Made Easy” – Apress, Jan 2015
● See “Mastering Apache Spark” – Packt, Oct 2015
● See “Complete Guide to Open Source Big Data Stack” – Apress, Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020

Slide 13

Slide 13 text

Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration