UrbonBayesPere_BerlinBuzzwords18_ConnectingDataInfraWithDataFlow.pdf

Pere Urbón
June 13, 2018

Transcript

  1. @purbon
    Connecting the data
    infrastructure with the
    DataFlow

  2. @purbon
    Pere Urbon-Bayes
    Software Architect
    [email protected]{gmail.com, acm.org}

  3. Topics for Today
    • Integration patterns for the enterprise startup.
    • What is Apache NiFi?
    • Examples.
    • NiFi in operation (best practices).

  4. @purbon
    Integrate all the
    things!

  5. @purbon
    Enterprise integration is the task of making
    separate applications work together to produce
    a unified set of functionality.
    The applications probably run on multiple
    computers, which may be geographically
    dispersed.

  6. @purbon
    Some applications might need to be integrated
    even though they were not designed for
    integration and cannot be changed.
    These issues, and others, are what make
    application integration difficult.

  7. @purbon
    Each integration faces different needs and
    criteria. We can group them as:
    • Application coupling
    • Integration simplicity
    • Data formats and timeliness
    • Data or functionality
    • Communication

  8. @purbon
    There is only a limited set of integration
    options

  9. @purbon
    File transfer

  10. @purbon
    Shared database

  11. @purbon
    Remote procedure invocation (RPC)

  12. @purbon
    Messaging
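
Of the four integration options listed on these slides, messaging is the one that decouples sender and receiver the most. The following is a minimal in-process sketch of that idea, not a real message broker: `queue.Queue` stands in for the message channel, and the names `producer`, `consumer`, and `run_demo` are hypothetical.

```python
import queue
import threading


def producer(channel, messages):
    """Publish each message to the channel, then a sentinel to mark the end."""
    for message in messages:
        channel.put(message)
    channel.put(None)  # sentinel: no more messages


def consumer(channel, received):
    """Read messages from the channel until the sentinel arrives."""
    while True:
        message = channel.get()
        if message is None:
            break
        received.append(message)


def run_demo():
    """Wire one producer and one consumer together through a shared channel."""
    channel = queue.Queue()  # stand-in for a message channel (assumption)
    received = []
    worker = threading.Thread(target=consumer, args=(channel, received))
    worker.start()
    producer(channel, [{"id": 1}, {"id": 2}])
    worker.join()
    return received


if __name__ == "__main__":
    print(run_demo())
```

The producer never calls the consumer directly. Both sides only know the channel, which is what lets the applications evolve independently.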

  13. @purbon
    Enterprise Integration Patterns

  14. @purbon
    What is Apache NiFi?

  15. @purbon
    An easy to use, powerful, and reliable system to
    process and distribute data.
    Web-based interface
    Highly configurable
    Data Provenance
    Designed for extension
    Secure

  16. @purbon
    NiFi was built to automate the flow of data
    between systems.
    But what is dataflow?
    An automated and managed flow of information
    between systems.

  17. @purbon
    What Apache NiFi looks like

  18. @purbon
    Concepts behind Apache NiFi

  19. @purbon
    A FlowFile

  20. @purbon
    The FlowFile Processor

  21. @purbon
    A Connection

  22. @purbon
    A Process Group
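
The four concepts named on the last slides can be pictured with a toy model. This is only an illustrative sketch and not NiFi's API: the classes below are hypothetical, and it assumes a FlowFile is content plus attributes, a processor transforms FlowFiles, a connection is a queue between processors, and a process group bundles processors with their connections.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """A unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """A queue that links one processor's output to the next one's input."""

    def __init__(self):
        self._queue = deque()

    def put(self, flow_file):
        self._queue.append(flow_file)

    def get(self):
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)


class UppercaseProcessor:
    """A processor: receives a FlowFile and emits a transformed copy."""

    def process(self, flow_file):
        attributes = dict(flow_file.attributes, processed_by="UppercaseProcessor")
        return FlowFile(flow_file.content.upper(), attributes)


class ProcessGroup:
    """A named bundle of processors and the connections between them."""

    def __init__(self, name, processors):
        self.name = name
        self.processors = processors
        # One connection in front of each processor, plus one for the output.
        self.connections = [Connection() for _ in range(len(processors) + 1)]

    def submit(self, flow_file):
        self.connections[0].put(flow_file)

    def run_once(self):
        """Drain each connection through its downstream processor, in order."""
        for index, processor in enumerate(self.processors):
            source = self.connections[index]
            sink = self.connections[index + 1]
            while len(source) > 0:
                sink.put(processor.process(source.get()))


if __name__ == "__main__":
    group = ProcessGroup("demo", [UppercaseProcessor()])
    group.submit(FlowFile(b"hello", {"filename": "a.txt"}))
    group.run_once()
    print(group.connections[-1].get())
```

In the toy model the queues sit between the processors, so a slow processor only causes its inbound connection to fill up rather than blocking the one upstream.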

  23. @purbon
    Apache NiFi Architecture
    Distributed using Apache ZooKeeper

  24. @purbon
    Let’s take a closer
    look…

  25. @purbon
    Apache NiFi
    Operations

  26. @purbon
    Maximum file handles
    In /etc/security/limits.conf:
    * hard nofile 50000
    * soft nofile 50000

  27. @purbon
    Maximum forked processes
    In /etc/security/limits.conf (some distributions
    also need /etc/security/limits.d/90-nproc.conf):
    * hard nproc 10000
    * soft nproc 10000
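
One way to confirm that the file-handle and process limits took effect is to read them back from inside a process. The sketch below uses Python's standard `resource` module; the helper names `meets`, `check_limit`, and `report` are hypothetical, and the thresholds are the values from the slides. It reports the limits of the process that runs it, so it would need to run as the NiFi user in a fresh login session to be meaningful.

```python
import resource


def meets(value, wanted):
    """True if a limit is unlimited or at least the wanted value."""
    return value == resource.RLIM_INFINITY or value >= wanted


def check_limit(kind, wanted):
    """Read the soft and hard limit for one resource and compare to wanted."""
    soft, hard = resource.getrlimit(kind)
    return {"soft": soft, "hard": hard, "ok": meets(soft, wanted) and meets(hard, wanted)}


def report():
    """Check the two limits discussed on the slides (thresholds from the talk)."""
    results = {"nofile": check_limit(resource.RLIMIT_NOFILE, 50000)}
    # RLIMIT_NPROC is not defined on every platform, so look it up defensively.
    nproc = getattr(resource, "RLIMIT_NPROC", None)
    if nproc is not None:
        results["nproc"] = check_limit(nproc, 10000)
    return results


if __name__ == "__main__":
    for name, result in report().items():
        print(name, result)
```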

  28. @purbon
    Increase the number of TCP socket ports available
    sudo sysctl -w net.ipv4.ip_local_port_range="10000 65000"
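
To see what this setting buys, it helps to count the ports in the range. The sketch below parses the two-number format that Linux exposes under `/proc/sys/net/ipv4/ip_local_port_range`; the function names are hypothetical.

```python
def parse_port_range(text):
    """Parse 'low high' (whitespace separated) into a pair of integers."""
    low, high = (int(part) for part in text.split())
    return low, high


def port_count(low, high):
    """Number of local ports in the inclusive range low..high."""
    return high - low + 1


def read_port_range(path="/proc/sys/net/ipv4/ip_local_port_range"):
    """Read the current range from the running Linux kernel."""
    with open(path) as handle:
        return parse_port_range(handle.read())


if __name__ == "__main__":
    low, high = parse_port_range("10000 65000")  # the value from the slide
    print(port_count(low, high))
```

With the slide's value of 10000 to 65000 this gives 55001 local ports for outgoing connections.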

  29. @purbon
    Set how long sockets stay in the TIME_WAIT state
    sudo sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait="1"
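
Whether lingering sockets are actually a problem on a given host can be checked by counting them. The sketch below assumes the Linux `/proc/net/tcp` table, where the fourth column holds the connection state as a hex code and `06` means TIME_WAIT. The function name is hypothetical, and only IPv4 sockets are covered (IPv6 ones are listed in `/proc/net/tcp6`).

```python
TIME_WAIT = "06"  # hex state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(proc_net_tcp_text):
    """Count rows whose state column is TIME_WAIT, skipping the header row."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        # fields: slot, local address, remote address, state, ...
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


if __name__ == "__main__":
    with open("/proc/net/tcp") as handle:
        print(count_time_wait(handle.read()))
```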

  30. @purbon
    Never swap
    In /etc/sysctl.conf:
    vm.swappiness = 0
    Disable atime updates in /etc/fstab:
    /dev/sda7 /chroot ext2 defaults,noatime 1 2
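
Both settings on this slide can be read back after a reboot. The sketch below checks an fstab-style text for the `noatime` option on a mount point and reads the current swappiness from `/proc/sys/vm/swappiness`; the function names are hypothetical, and the mount point is the one from the slide.

```python
def has_noatime(fstab_text, mount_point):
    """True if the fstab entry for mount_point lists the noatime option."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return "noatime" in fields[3].split(",")
    return False


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the current vm.swappiness value from the running Linux kernel."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    with open("/etc/fstab") as handle:
        print("noatime on /chroot:", has_noatime(handle.read(), "/chroot"))
    print("swappiness:", read_swappiness())
```

Because fstab fields are separated by whitespace, the options must be written as one comma-separated token with no spaces; a space after the comma turns `noatime` into a separate field, so it is no longer read as a mount option.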

  31. @purbon
    Thanks a lot!
    Questions?
    disagreements? threads?
    Pere Urbon-Bayes
    Data Wrangler
    [email protected]{gmail.com, acm.org}
