Connecting the data infrastructure with the DataFlow (Apache NiFi)


The need to integrate a swarm of systems has always been present in the history of IT; however, with the advent of big data and the Internet of Things, it has simply exploded. Through several real-life use cases from companies of all sizes, this talk will introduce you to Apache NiFi, a powerful and scalable system to process, transform and distribute data.

NiFi is an open source project from the Apache Software Foundation that works well as mediation logic between systems and can cover most of your ETL requirements. This talk will show how people in BI, Data Science, Development and Operations teams can use NiFi to easily fulfill their data movement requirements.

After this talk you will know where you can leverage NiFi, but also where you should not use it. In a nutshell, you will add another tool to your belt for data integration problems.


Pere Urbón

June 13, 2018

Transcript

  1. @purbon Connecting the data infrastructure with the DataFlow

  2. Pere Urbon-Bayes Software Architect pere.urbon@{gmail.com, acm.org}

  3. Topics for Today
    • Integration patterns for the enterprise startup.
    • What is Apache NiFi?
    • Examples.
    • NiFi in operation (best practices).

  4. Integrate all the things!

  5. Enterprise integration is the task of making separate applications work together to produce a unified set of functionality. The applications probably run on multiple computers, which may be geographically dispersed.

  6. Some applications might need to be integrated even though they were not designed for integration and cannot be changed. These issues, and others, are what make application integration difficult.

  7. Each integration faces different needs and criteria; we can group them as:
    • Application coupling
    • Integration simplicity
    • Data formats and timeliness
    • Data or functionality
    • Communication

  8. There is only a limited set of integration options

  9. File transfer

  10. Shared database

  11. RPC invoke

  12. Messaging

  13. Enterprise Integration Patterns

  14. What is Apache NiFi?

  15. An easy-to-use, powerful, and reliable system to process and distribute data:
    • Web-based interface
    • Highly configurable
    • Data provenance
    • Designed for extension
    • Secure

  16. NiFi was built to automate the flow of data between systems. But what is a dataflow? An automated and managed flow of information between systems.

  17. What Apache NiFi looks like

  18. Concepts behind Apache NiFi

  19. A FlowFile

  20. The FlowFile Processor

  21. A Connection

  22. A Process Group

  23. Apache NiFi Architecture: distributed using Apache ZooKeeper

  24. Let’s take a closer look…

  25. Apache NiFi Operations

  26. Maximum file handles, in /etc/security/limits.conf:
    hard nofile 50000
    soft nofile 50000

  27. Maximum forked processes, in /etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
    hard nproc 10000
    soft nproc 10000
  28. Increase the number of available TCP sockets (local port range):
    sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"
  29. Shorten how long sockets stay in the TIME_WAIT state:
    sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"

  30. Never swap: set vm.swappiness = 0 in /etc/sysctl.conf. Also mount data partitions with noatime in /etc/fstab, for example:
    /dev/sda7 /chroot ext2 defaults,noatime 1 2
  31. Thanks a lot! Questions? Disagreements? Threads?
    Pere Urbon-Bayes, Data Wrangler, pere.urbon@{gmail.com, acm.org}
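The `sysctl -w` commands on slides 28 and 29 only change the running kernel. Entries in the limits files only apply to new login sessions. Below is a minimal sketch of persisting the settings from slides 26 to 30, assuming a Linux host that uses pam_limits and /etc/sysctl.conf.

The `*` domain column (meaning every user) is an assumption, because the slides show only the type, item and value. A dedicated service account name would work as well.

The conntrack key from slide 29 exists only when the connection-tracking module is loaded. Newer kernels may expose it as `net.netfilter.nf_conntrack_tcp_timeout_time_wait` instead, so that line may need renaming or dropping on your host.

```text
# /etc/security/limits.conf (slides 26 and 27)
# "*" = every user; this domain column is an assumption, not shown on the slides.
*  hard  nofile  50000
*  soft  nofile  50000
*  hard  nproc   10000
*  soft  nproc   10000

# /etc/security/limits.d/90-nproc.conf (slide 27), only on distributions that
# ship this file; it is read after limits.conf, so its nproc value wins.
*  soft  nproc   10000

# /etc/sysctl.conf (slides 28 to 30); applied at boot or with "sudo sysctl -p".
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 1
vm.swappiness = 0
```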
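To confirm the settings took effect, a read-only check can compare the live values with those on slides 26 to 30. The script below is a hypothetical sketch, not a NiFi tool, and its function names are made up for illustration. It makes these assumptions:

- It runs on a Linux host with /proc.
- It reads open files with `ulimit -n` and processes from /proc/self/limits.
- It skips the TIME_WAIT timeout, because that key's name varies by kernel.

Each check prints OK or WARN. Unreadable values show as n/a and are reported as WARN.

```shell
#!/bin/sh
# Hypothetical read-only check (not part of NiFi): compares this session's
# limits and kernel settings with the values recommended on slides 26 to 30.

# at_least ACTUAL MIN: succeeds when ACTUAL >= MIN ("unlimited" always passes).
at_least() {
  [ "$1" = "unlimited" ] && return 0
  [ "$1" -ge "$2" ] 2>/dev/null
}

# at_most ACTUAL MAX: succeeds when ACTUAL <= MAX.
at_most() {
  [ "$1" -le "$2" ] 2>/dev/null
}

# report NAME ACTUAL CHECK...: runs CHECK and prints OK or WARN for it.
report() {
  name=$1; actual=$2; shift 2
  if "$@"; then status=OK; else status=WARN; fi
  printf '%-20s %-12s %s\n' "$name" "$actual" "$status"
}

check_host() {
  nofile=$(ulimit -n)
  # Third field of the "Max processes" row is the soft limit.
  nproc=$(awk '/^Max processes/ {print $3}' /proc/self/limits 2>/dev/null)
  swap=$(cat /proc/sys/vm/swappiness 2>/dev/null)
  # The file holds two numbers: the lowest and highest local port.
  set -- $(cat /proc/sys/net/ipv4/ip_local_port_range 2>/dev/null)
  low=${1:-n/a}; high=${2:-n/a}
  nofile=${nofile:-n/a}; nproc=${nproc:-n/a}; swap=${swap:-n/a}

  report "open files (soft)" "$nofile" at_least "$nofile" 50000
  report "processes (soft)"  "$nproc"  at_least "$nproc" 10000
  report "port range low"    "$low"    at_most  "$low" 10000
  report "port range high"   "$high"   at_least "$high" 65000
  report "vm.swappiness"     "$swap"   at_most  "$swap" 0
}

check_host
```

Limits are per session, so run the script as the account that starts NiFi. For example, use `sudo -u nifi sh check.sh`, where `nifi` is a hypothetical service account name.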