Slide 1

Slide 1 text

@purbon Connecting the data infrastructure with the DataFlow

Slide 2

Slide 2 text

@purbon Pere Urbon-Bayes Software Architect pere.urbon@{gmail.com, acm.org}

Slide 3

Slide 3 text

Topics for Today • Integration patterns for the enterprise startup. • What is Apache NiFi? • Examples. • NiFi in operation (best practices).

Slide 4

Slide 4 text

Integrate all the things!

Slide 5

Slide 5 text

Enterprise integration is the task of making separate applications work together to produce a unified set of functionality. The applications probably run on multiple computers, which may be geographically dispersed.

Slide 6

Slide 6 text

Some applications might need to be integrated even though they were not designed for integration and cannot be changed. These issues, and others, are what make application integration difficult.

Slide 7

Slide 7 text

Each integration faces different needs and criteria, which we can group as: • Application coupling • Integration simplicity • Data formats and timeliness • Data or functionality • Communication

Slide 8

Slide 8 text

There is only a limited set of integration options

Slide 9

Slide 9 text

File transfer

Slide 10

Slide 10 text

Shared database

Slide 11

Slide 11 text

Remote procedure invocation (RPC)

Slide 12

Slide 12 text

Messaging
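
To make the messaging style concrete, here is a minimal sketch in which two "applications" exchange data only through a channel and never call each other directly. It uses Python's standard `queue` and `threading` modules as a stand-in for a real message broker; the names `producer`, `consumer`, and `run_demo` are hypothetical and do not come from the talk.

```python
import queue
import threading


def producer(channel, orders):
    """Sending application: publishes messages and moves on."""
    for order in orders:
        channel.put(order)
    channel.put(None)  # sentinel: tells the consumer there is nothing more


def consumer(channel, received):
    """Receiving application: handles messages at its own pace."""
    while True:
        message = channel.get()
        if message is None:
            break
        received.append(message.upper())


def run_demo(orders):
    """Wire both sides to one channel and return what the consumer handled."""
    channel = queue.Queue()  # in-process stand-in for a message channel
    received = []
    worker = threading.Thread(target=consumer, args=(channel, received))
    worker.start()
    producer(channel, orders)
    worker.join()
    return received
```

The sender only knows about the channel, which is the loose coupling that distinguishes messaging from file transfer, a shared database, or a remote call.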

Slide 13

Slide 13 text

Enterprise Integration Patterns

Slide 14

Slide 14 text

What is Apache NiFi?

Slide 15

Slide 15 text

An easy to use, powerful, and reliable system to process and distribute data. • Web-based interface • Highly configurable • Data provenance • Designed for extension • Secure

Slide 16

Slide 16 text

NiFi was built to automate the flow of data between systems. But what is dataflow? An automated and managed flow of information between systems.

Slide 17

Slide 17 text

What Apache NiFi looks like

Slide 18

Slide 18 text

Concepts behind Apache NiFi

Slide 19

Slide 19 text

A FlowFile

Slide 20

Slide 20 text

The FlowFile Processor

Slide 21

Slide 21 text

A Connection

Slide 22

Slide 22 text

A Process Group
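
The four concepts above can be illustrated with a toy model. This is only a sketch for intuition and not the NiFi API (real NiFi processors are Java components): a FlowFile is modelled as content plus attributes, a Processor as a function from FlowFile to FlowFile, a Connection as a queue between processors, and a Process Group as a chain of processors. All class and function names here are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """A piece of data moving through the flow: content plus attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """A queue that links the output of one processor to the next."""

    def __init__(self):
        self._queue = deque()

    def put(self, flow_file):
        self._queue.append(flow_file)

    def poll(self):
        # Returns None when the queue is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


def uppercase_processor(flow_file):
    """A processor: takes a FlowFile, emits a new, transformed one."""
    attributes = {**flow_file.attributes, "transformed": "true"}
    return FlowFile(flow_file.content.upper(), attributes)


def run_process_group(processors, inbound):
    """A process group: processors chained together by connections."""
    current = inbound
    for processor in processors:
        outbound = Connection()
        while (flow_file := current.poll()) is not None:
            outbound.put(processor(flow_file))
        current = outbound
    return current
```

A usage example under the same assumptions: put `FlowFile(b"hello", {"filename": "a.txt"})` on a `Connection`, pass it to `run_process_group([uppercase_processor], inbound)`, and poll the returned connection for the transformed FlowFile.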

Slide 23

Slide 23 text

Apache NiFi Architecture: distributed using Apache ZooKeeper

Slide 24

Slide 24 text

Let’s take a closer look…

Slide 25

Slide 25 text

Apache NiFi Operations

Slide 26

Slide 26 text

Maximum file handles: hard nofile 50000, soft nofile 50000 (in /etc/security/limits.conf)

Slide 27

Slide 27 text

Maximum forked processes: hard nproc 10000, soft nproc 10000 (files: /etc/security/limits.conf, /etc/security/limits.d/90-nproc.conf)
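
One way to verify that the file-handle and process limits above took effect is to read the limits of the current process. The sketch below assumes a Unix host and uses Python's standard `resource` module; the thresholds are the values from the slides, while `REQUIRED`, `limit_ok`, and `current_limits` are hypothetical names.

```python
import resource

# Thresholds taken from the slides.
REQUIRED = {"nofile": 50000, "nproc": 10000}


def limit_ok(soft, required):
    """True if the soft limit is unlimited or at least the required value."""
    return soft == resource.RLIM_INFINITY or soft >= required


def current_limits():
    """Soft limits of the current process (first element of getrlimit)."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE)[0],
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC)[0],
    }


if __name__ == "__main__":
    for name, soft in current_limits().items():
        status = "ok" if limit_ok(soft, REQUIRED[name]) else "too low"
        print(f"{name}: soft={soft} required={REQUIRED[name]} -> {status}")
```

Limits are per user session, so the check is only meaningful when run as the user that starts NiFi.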

Slide 28

Slide 28 text

Increase the number of TCP sockets: sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"
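
To see what the port-range setting buys, the following sketch parses the kernel's current range and counts the available local ports. It assumes a Linux host where the value is exposed at `/proc/sys/net/ipv4/ip_local_port_range`; the function names are hypothetical.

```python
from pathlib import Path


def parse_port_range(text):
    """Parse the two whitespace-separated integers the kernel reports."""
    low, high = (int(part) for part in text.split())
    return low, high


def ephemeral_port_count(low, high):
    """Number of local ports available for outgoing connections."""
    return high - low + 1


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the live setting (Linux only)."""
    return parse_port_range(Path(path).read_text())
```

With the range from the slide, `ephemeral_port_count(10000, 65000)` gives 55001 local ports.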

Slide 29

Slide 29 text

Time out sockets in the TIME_WAIT state: sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"

Slide 30

Slide 30 text

Never swap: vm.swappiness = 0 (in /etc/sysctl.conf). Mount with noatime: /dev/sda7 /chroot ext2 defaults,noatime 1 2 (in /etc/fstab)
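
Both settings on this slide can be checked with a few lines of code. The sketch below assumes a Linux host that exposes the swappiness value at `/proc/sys/vm/swappiness` and an fstab line in the usual six-field layout; `has_noatime` and `read_swappiness` are hypothetical helper names.

```python
from pathlib import Path


def has_noatime(fstab_line):
    """True if the mount options (4th field) of an fstab line include noatime."""
    fields = fstab_line.split()
    return len(fields) >= 4 and "noatime" in fields[3].split(",")


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the live vm.swappiness value (Linux only)."""
    return int(Path(path).read_text().strip())
```

For the fstab entry from the slide, `has_noatime("/dev/sda7 /chroot ext2 defaults,noatime 1 2")` returns `True`. Note that the options field must not contain spaces, so `defaults,noatime` is written without one.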

Slide 31

Slide 31 text

@purbon Thanks a lot! Questions? Disagreements? Threats? Pere Urbon-Bayes Data Wrangler pere.urbon@{gmail.com, acm.org}