NiFi

This presentation gives an overview of Apache NiFi, the flow management system currently included in Cloudera's CDF product.

Links for further information and connecting

http://www.semtech-solutions.co.nz

http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/

https://nz.linkedin.com/pub/mike-frampton/20/630/385

Music by

"Little Planet", composed and performed by Bensound from http://www.bensound.com/

Mike Frampton

June 29, 2019

Transcript

  1. What Is NiFi?
     • A data flow automation system maintained by Cloudera
     • Written in Java
     • Apache 2 License
     • Cluster based and scalable
     • Has a web-based user interface
     • Widely extendable
     • Offers data flow monitoring
  2. NiFi History
     • Based on NiagaraFiles, developed by the NSA
     • Open sourced by the NSA in 2014
     • Commercialised by Onyara Inc
     • Purchased by Hortonworks in 2015
     • Hortonworks merged into Cloudera in 2018
     • Cloudera plans a full open source path
  3. How Does NiFi Work?
     • NiFi runs in a JVM on each server in the cluster
     • Uses ZooKeeper for configuration/coordination
       – One node as the Cluster Coordinator
       – One node as the primary node
     • JVM encapsulates
       – Web server
       – Processor / Extensions
       – Repositories for
         • FlowFile / Content / Data Provenance
  4. NiFi Architecture 2
     • Web server for monitoring and administration
     • Flow controller manages extensions and resources
     • FlowFile processor 1 .. N
       – Actual data flow worker
       – Each processor supports NiFi data flow
     • Extensions allow remote system connectivity
       – Can be user defined
     • FlowFile Repo – tracks and maintains current flows
     • Content Repo – maintains data in transit
     • Provenance Repo – historic data flow information
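
The split between the three repositories on this slide can be illustrated with a toy in-memory model. This is only a conceptual sketch, not NiFi's API: the class `ToyRepositories`, its methods, the claim naming, and the event labels are all hypothetical and chosen for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FlowFile:
    """Metadata record: attributes plus a pointer (claim) into the content store."""
    attributes: Dict[str, str]
    content_claim: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyRepositories:
    """Hypothetical stand-in for the FlowFile, Content and Provenance repos."""

    def __init__(self) -> None:
        self.flowfile_repo: Dict[str, FlowFile] = {}  # state of current flows
        self.content_repo: Dict[str, bytes] = {}      # data in transit
        self.provenance_repo: List[dict] = []         # append-only history

    def _store(self, data: bytes) -> str:
        claim = f"claim-{len(self.content_repo)}"
        self.content_repo[claim] = data
        return claim

    def create(self, data: bytes, attributes: Dict[str, str]) -> FlowFile:
        ff = FlowFile(attributes=dict(attributes), content_claim=self._store(data))
        self.flowfile_repo[ff.id] = ff
        self.provenance_repo.append({"event": "CREATE", "flowfile": ff.id})
        return ff

    def transform(self, ff: FlowFile, fn: Callable[[bytes], bytes]) -> None:
        # Content is written to a new claim; the old claim is kept so the
        # history of the data can still be inspected later.
        old_claim = ff.content_claim
        ff.content_claim = self._store(fn(self.content_repo[old_claim]))
        self.provenance_repo.append(
            {"event": "CONTENT_MODIFIED", "flowfile": ff.id, "previous": old_claim}
        )


repos = ToyRepositories()
ff = repos.create(b"hello", {"filename": "a.txt"})
repos.transform(ff, bytes.upper)
print(repos.content_repo[ff.content_claim], len(repos.provenance_repo))
```

The point of the sketch is the separation of concerns: the FlowFile record stays small because it only points at content, while the provenance list keeps growing as a history of what happened.
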
  5. NiFi Performance
     • NiFi server RAM limited by JVM memory settings
     • Garbage collection rate important
     • nifi.properties file for performance config, e.g.
       – nifi.ui.autorefresh.interval (browser performance)
       – nifi.queue.swap.threshold (use of swap)
       – nifi.provenance.repository.index.threads
         • Change for high-volume flows
       – nifi.provenance.repository.implementation
         • WriteAheadProvenance might cause Java garbage collection issues
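
As a quick way to review the settings named on this slide, the sketch below reads a properties-style text and reports those four keys. The property names come from the slide; the sample values and the helper names `parse_properties` and `tuning_report` are assumptions for illustration only, so check them against your own nifi.properties.

```python
from typing import Dict

# Illustrative sample only; the values are assumptions, not recommendations.
SAMPLE = """\
# performance-related entries
nifi.ui.autorefresh.interval=30 sec
nifi.queue.swap.threshold=20000
nifi.provenance.repository.index.threads=2
"""

TUNING_KEYS = (
    "nifi.ui.autorefresh.interval",
    "nifi.queue.swap.threshold",
    "nifi.provenance.repository.index.threads",
    "nifi.provenance.repository.implementation",
)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def tuning_report(props: Dict[str, str]) -> Dict[str, str]:
    """Return the tuning keys from the slide, marking any that are absent."""
    return {key: props.get(key, "<not set>") for key in TUNING_KEYS}


for key, value in tuning_report(parse_properties(SAMPLE)).items():
    print(f"{key} = {value}")
```

To inspect a real installation, the same functions could be fed the text of the actual file instead of `SAMPLE`.
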
  6. NiFi Flow Management
     • Guaranteed data delivery
     • Uses write-ahead logs and content repositories
     • Queue buffering / back pressure
     • Queue priority configuration
     • Flow configuration (latency / throughput)
     • UI based data flow builds
     • UI based data flow monitoring
     • UI based data provenance
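
Queue buffering, back pressure and queue priority can be sketched together as a bounded priority queue. This is a conceptual model under stated assumptions, not NiFi code: the class `BackPressureQueue` and its `object_threshold` parameter are hypothetical, and a refused `offer` here stands in for the upstream component being held back while the queue is full.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class BackPressureQueue:
    """Bounded queue: refuses new items once the object threshold is reached."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._heap: List[Tuple[int, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def is_full(self) -> bool:
        return len(self._heap) >= self.object_threshold

    def offer(self, item: Any, priority: int = 0) -> bool:
        """Queue an item; a False return models back pressure on the producer."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), item))
        return True

    def poll(self) -> Optional[Any]:
        """Return the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = BackPressureQueue(object_threshold=2)
queue.offer("bulk", priority=5)
queue.offer("urgent", priority=1)
print(queue.offer("late", priority=0))  # queue is full, so this is refused
print(queue.poll())                     # lowest priority number leaves first
```

A lower threshold favours latency and protects memory; a higher one favours throughput, which is the trade-off the slide refers to as flow configuration.
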
  7. NiFi Ease Of Use 1
     • Visually create dataflows in real time
     • Changes take immediate effect
     • Use flow templates for existing flow types
     • Data provenance for
       – Problem tracking
       – Data compliance issues
       – Step through historic data transforms
     • Fine-grained data investigation using UI & repositories
  8. NiFi Security
     • DataFlow based encryption / decryption
     • 2-way SSL
     • User access control
     • Pluggable / extendable authorization possible
     • DataFlow level authorization supports
       – Flow level component access
       – Multi-tenant access / sharing
       – Even multi-tenant support within a flow
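
The idea behind 2-way SSL is that the server also demands a certificate from the client. The generic sketch below shows that idea with Python's standard `ssl` module; it is not how NiFi itself is configured, and the function name `make_mutual_tls_server_context` is hypothetical. A real server would also need to load its own certificate and a trust store.

```python
import ssl


def make_mutual_tls_server_context() -> ssl.SSLContext:
    """Build a server-side TLS context that requires a client certificate."""
    # Purpose.CLIENT_AUTH yields a context intended for the server side.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # One-way TLS only authenticates the server; CERT_REQUIRED makes the
    # handshake fail unless the client presents a certificate that verifies.
    context.verify_mode = ssl.CERT_REQUIRED
    # A real deployment would also call context.load_cert_chain(...) and
    # context.load_verify_locations(...) with its own key and CA files.
    return context


context = make_mutual_tls_server_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
```
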
  9. NiFi Extensible / Scalable
     • Many NiFi points of extension
       – Processors, Controller Services, Reporting Tasks
       – Prioritizers, Custom User Interfaces
     • NiFi Site-to-Site (S2S) interface for distributed communication
     • Extension conflicts avoided using NiFi Archives
     • Scale out NiFi cluster instances
     • Scale NiFi concurrent tasks up and down
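
Scaling concurrent tasks up and down can be pictured as changing the size of a worker pool for one processing step. The sketch below is a rough analogy using Python threads, not NiFi's scheduler; `run_processor` and its `concurrent_tasks` parameter are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_processor(
    process: Callable[[str], str],
    flowfiles: Iterable[str],
    concurrent_tasks: int = 1,
) -> List[str]:
    """Apply one processing step to each item with a bounded number of workers."""
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        # map() returns results in input order regardless of completion order.
        return list(pool.map(process, flowfiles))


print(run_processor(str.upper, ["a", "b", "c"], concurrent_tasks=2))
```

Raising `concurrent_tasks` lets more items be worked on at once, at the cost of more threads competing for the same machine.
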
  10. NiFi Further Information
     • For further information see
       – https://nifi.apache.org
       – https://en.wikipedia.org/wiki/Apache_NiFi
       – http://vision.cloudera.com/cloudera-dataflow/
     I included the Cloudera link because CDF now uses NiFi for edge data and flow management.
  11. Available Books
     • See “Big Data Made Easy” – Apress Jan 2015
     • See “Mastering Apache Spark” – Packt Oct 2015
     • See “Complete Guide to Open Source Big Data Stack” – Apress Jan 2018
     • Find the author on Amazon
       – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
     • Connect on LinkedIn
       – nz.linkedin.com/pub/mike-frampton/20/630/385
  12. Contact Us
     • Feel free to contact at
       – [email protected]
     • Or connect on LinkedIn
     • I'm always interested in
       – New technology
       – Opportunities
       – Technology based issues
       – Big data integration