Pere Urbón
June 13, 2018
UrbonBayesPere_BerlinBuzzwords18_ConnectingDataInfraWithDataFlow.pdf
Transcript
@purbon Connecting the data infrastructure with the DataFlow
@purbon Pere Urbon-Bayes Software Architect pere.urbon@{gmail.com, acm.org}
Topics for Today
• Integration patterns for the enterprise startup.
• What is Apache NiFi?
• Examples.
• NiFi in operation (best practices).
@purbon Integrate all the things!
@purbon Enterprise integration is the task of making separate applications work together to produce a unified set of functionality. The applications probably run on multiple computers, which may be geographically dispersed.
@purbon Some applications might need to be integrated even though they were not designed for integration and cannot be changed. These issues, and others, are what make application integration difficult.
@purbon Each integration faces different needs and criteria. We can group them as:
• Application coupling
• Integration simplicity
• Data formats and timeliness
• Data or functionality
• Communication
@purbon There is only a limited set of integration options
@purbon File transfer
@purbon Shared database
@purbon Remote procedure invocation (RPC)
@purbon Messaging
@purbon Enterprise Integration Patterns
@purbon What is Apache NiFi?
@purbon An easy-to-use, powerful, and reliable system to process and distribute data.
• Web-based interface
• Highly configurable
• Data provenance
• Designed for extension
• Secure
@purbon NiFi was built to automate the flow of data between systems. But what is dataflow? An automated and managed flow of information between systems.
@purbon What Apache NiFi looks like
@purbon Concepts behind Apache NiFi
@purbon A FlowFile
@purbon The FlowFile Processor
@purbon A Connection
@purbon A Process Group
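The four concepts above can be pictured with a toy model. This is only a sketch in plain Python, not the NiFi API: the class names mirror the slide titles, while `on_trigger`, `run_once`, `to_upper`, and `tag_size` are hypothetical names chosen for illustration. It assumes a FlowFile is content plus attributes, a Connection is a queue between processors, and a Process Group bundles processors with their connections.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class FlowFile:
    """A unit of data: a content payload plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """A FIFO queue that links one processor's output to the next input."""

    def __init__(self) -> None:
        self.queue: Deque[FlowFile] = deque()

    def put(self, flow_file: FlowFile) -> None:
        self.queue.append(flow_file)

    def drain(self) -> List[FlowFile]:
        items = list(self.queue)
        self.queue.clear()
        return items


class Processor:
    """Applies a transform to every queued FlowFile and forwards the result."""

    def __init__(self, name: str,
                 transform: Callable[[FlowFile], FlowFile],
                 output: Connection) -> None:
        self.name = name
        self.transform = transform
        self.output = output

    def on_trigger(self, incoming: Connection) -> None:
        for flow_file in incoming.drain():
            self.output.put(self.transform(flow_file))


class ProcessGroup:
    """A named bundle of processors, each paired with its input connection."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps: List[Tuple[Connection, Processor]] = []

    def add(self, incoming: Connection, processor: Processor) -> None:
        self.steps.append((incoming, processor))

    def run_once(self) -> None:
        # Trigger each processor once, in the order it was added.
        for incoming, processor in self.steps:
            processor.on_trigger(incoming)


def to_upper(ff: FlowFile) -> FlowFile:
    return FlowFile(ff.content.upper(), dict(ff.attributes))


def tag_size(ff: FlowFile) -> FlowFile:
    return FlowFile(ff.content, {**ff.attributes, "size": str(len(ff.content))})


source, middle, sink = Connection(), Connection(), Connection()
group = ProcessGroup("demo")
group.add(source, Processor("upper", to_upper, middle))
group.add(middle, Processor("size", tag_size, sink))

source.put(FlowFile(b"hello", {"filename": "a.txt"}))
group.run_once()
result = sink.drain()
print(result[0].content, result[0].attributes)
```

The real system adds scheduling, back pressure, and provenance on top of this shape; the sketch only shows how data and metadata travel together through queued steps.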
@purbon Apache NiFi architecture: distributed using Apache ZooKeeper
@purbon Let’s take a closer look…
@purbon Apache NiFi Operations
@purbon Maximum file handles
/etc/security/limits.conf:
* hard nofile 50000
* soft nofile 50000
@purbon Maximum forked processes
/etc/security/limits.conf and /etc/security/limits.d/90-nproc.conf:
* hard nproc 10000
* soft nproc 10000
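One way to confirm that the limits took effect is to read them back from a running process. The sketch below uses Python's standard `resource` module (Unix only) and compares the current soft and hard limits with the values from the slides. The helper names `limit_ok` and `check_limits` are made up for this example, and the result only reflects the user and session the script runs in, so it would need to run as the same user as NiFi to be meaningful.

```python
import resource

# Minimum values taken from the slides.
RECOMMENDED = {"nofile": 50000, "nproc": 10000}

# Map the limits.conf item names to the matching resource constants.
_RLIMITS = {
    "nofile": resource.RLIMIT_NOFILE,
    "nproc": resource.RLIMIT_NPROC,
}


def limit_ok(current: int, wanted: int) -> bool:
    """True if the limit is unlimited or at least the wanted value."""
    return current == resource.RLIM_INFINITY or current >= wanted


def check_limits(wanted=RECOMMENDED) -> dict:
    """Report soft/hard limits of the current process against the minimums."""
    report = {}
    for name, minimum in wanted.items():
        soft, hard = resource.getrlimit(_RLIMITS[name])
        report[name] = {
            "soft": soft,
            "hard": hard,
            "ok": limit_ok(soft, minimum) and limit_ok(hard, minimum),
        }
    return report


for name, info in check_limits().items():
    status = "ok" if info["ok"] else "too low"
    print(f"{name}: soft={info['soft']} hard={info['hard']} ({status})")
```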
@purbon Increase the number of TCP sockets
sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"
@purbon Time out sockets in TIME_WAIT state
sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"
@purbon Never swap
/etc/sysctl.conf:
vm.swappiness = 0
/etc/fstab:
/dev/sda7 /chroot ext2 defaults,noatime 1 2
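A quick way to audit these two files is to parse them and look for the expected entries. The following is a minimal sketch that works on file contents passed in as strings, so it can be tried without touching a real system. The helpers `parse_sysctl` and `fstab_options` are hypothetical names, and the sample text simply repeats the values from the slide.

```python
def parse_sysctl(text: str) -> dict:
    """Parse 'key = value' lines, skipping blanks and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def fstab_options(text: str, mount_point: str) -> list:
    """Return the mount options listed in fstab for the given mount point."""
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()
        # fstab columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return []


sysctl_text = "# tuning from the slides\nvm.swappiness = 0\n"
fstab_text = "/dev/sda7 /chroot ext2 defaults,noatime 1 2\n"

swappiness = parse_sysctl(sysctl_text).get("vm.swappiness")
options = fstab_options(fstab_text, "/chroot")

print("swappiness is 0:", swappiness == "0")
print("noatime set on /chroot:", "noatime" in options)
```

On a real host the strings would come from reading /etc/sysctl.conf and /etc/fstab. Values set in other files or at runtime with sysctl -w would not show up in this check.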
@purbon Thanks a lot! Questions? Disagreements? Threads? Pere Urbon-Bayes, Data Wrangler, pere.urbon@{gmail.com, acm.org}