SETCON'19 - Siarhei Berdachuk - Using Apache NiFi for Big Data work without reinventing the wheel

Maksim

May 10, 2019

Transcript

  1. Apache NiFi for Big Data without reinventing the wheel
    Siarhei Berdachuk, Software Engineering Team Leader
    Powered by EPAM
  2. About myself
    Main expertise:
    • Java-related technologies
    • Business analysis
    • Database modeling and design
    More than 20 years in IT
  3. Apache NiFi as a solution
    Main concepts are:
    • Service-oriented architecture (SOA)
    • Flow-based programming (FBP)
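Flow-based programming treats an application as a network of black-box components that pass data packets to one another over connections; in NiFi, FlowFiles, processors and connections play those roles. The plain-Java sketch below illustrates only the idea and is not NiFi's API: `Packet` and `LinearFlow` are hypothetical names, the flow is a single linear chain run on one thread, and the queues are unbounded, whereas NiFi connections apply back pressure and processors run concurrently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

/** Hypothetical stand-in for a FlowFile: a map of attributes plus raw content bytes. */
final class Packet {
    final Map<String, String> attributes;
    final byte[] content;

    Packet(Map<String, String> attributes, byte[] content) {
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.content = content.clone();
    }

    String text() {
        return new String(content, StandardCharsets.UTF_8);
    }
}

/** A linear flow of black-box components; each hop between components is a FIFO queue. */
final class LinearFlow {
    private final List<Function<Packet, Packet>> components = new ArrayList<>();

    LinearFlow then(Function<Packet, Packet> component) {
        components.add(component);
        return this;
    }

    List<Packet> run(List<Packet> input) {
        Queue<Packet> queue = new ArrayDeque<>(input);
        for (Function<Packet, Packet> component : components) {
            Queue<Packet> connection = new ArrayDeque<>(); // the "connection" feeding the next component
            while (!queue.isEmpty()) {
                connection.add(component.apply(queue.poll()));
            }
            queue = connection;
        }
        return new ArrayList<>(queue);
    }
}
```

Each component only sees packets arriving on its input queue, so components can be added, swapped or reordered without changing the others.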
  4. 200+ ready-to-use processors
    • Data Transformation
    • Routing and Mediation
    • Database Access
    • Attribute Extraction
    • System Interaction
    • Data Ingestion
    • Data Egress / Sending Data
    • Splitting and Aggregation
    • HTTP
    • Amazon Web Services
  5. Data Transformation
    • Compress or decompress
    • Convert characters from one character set to another
    • Encrypt or decrypt
    • Use regular expressions to modify textual content
    • Apply an XSLT transform to XML content
    • Transform JSON content
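Two of the transformations on this slide, compression and character-set conversion, can be illustrated with the JDK alone. This is a minimal sketch using `java.util.zip` GZIP streams and `String` re-encoding; `ContentTransforms` is a hypothetical helper, not a NiFi class, and it holds the whole content in memory, which is only reasonable for small payloads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ContentTransforms {
    private ContentTransforms() {}

    /** GZIP-compresses the given bytes; closing the stream writes the GZIP trailer. */
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses gzip(). */
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Decodes text with one charset and re-encodes it with another.
     * Malformed input and unmappable characters are silently replaced, not reported.
     */
    static byte[] convertCharset(byte[] data, Charset from, Charset to) {
        return new String(data, from).getBytes(to);
    }
}
```

For example, `convertCharset(bytes, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1)` turns the two-byte UTF-8 encoding of "é" into a single Latin-1 byte.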
  6. Routing and Mediation
    • Route based on attributes or content
    • Detect duplicates
    • Monitor activity
    • Distribute load
    • Scan attributes or content for terms in a user-defined dictionary
    • Validate XML content against an XML Schema
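Routing on attributes and duplicate detection can be sketched in plain Java as follows. The sketch assumes first-match routing to a named relationship with `unmatched` as the fallback, and detects duplicates by remembering SHA-256 digests of content in an in-memory set; `AttributeRouter`, `DuplicateDetector` and the relationship and attribute names are hypothetical. NiFi's own processors offer more routing strategies and can keep duplicate-detection state in a distributed cache, so this only illustrates the idea.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Routes attribute maps to the first relationship whose rule matches (insertion order). */
final class AttributeRouter {
    private final Map<String, Predicate<Map<String, String>>> routes = new LinkedHashMap<>();

    AttributeRouter route(String relationship, Predicate<Map<String, String>> rule) {
        routes.put(relationship, rule);
        return this;
    }

    String routeOf(Map<String, String> attributes) {
        for (Map.Entry<String, Predicate<Map<String, String>>> route : routes.entrySet()) {
            if (route.getValue().test(attributes)) {
                return route.getKey();
            }
        }
        return "unmatched";
    }
}

/** Flags content whose SHA-256 digest has been seen before (state is in memory only). */
final class DuplicateDetector {
    private final Set<String> seen = new HashSet<>();

    boolean isDuplicate(byte[] content) {
        return !seen.add(sha256Hex(content)); // add() returns false if the digest was already present
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }
}
```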
  7. Database Access
    • Convert a JSON document into a SQL INSERT or UPDATE
    • Execute a SQL SELECT
    • Update a database by executing SQL
    • Execute a HiveQL SELECT (for Apache Hadoop)
    • Update a Hive database by executing HiveQL
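Turning a JSON document into a SQL INSERT is essentially a mapping from fields to columns. To stay dependency-free (the JDK has no JSON parser), this sketch assumes the document has already been parsed into a flat `Map`, emits a parameterized statement with `?` placeholders instead of inlining values, and rejects table or column names that are not simple identifiers. `JsonRecordToSql` and `InsertStatement` are hypothetical names, not NiFi classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** SQL text plus the values to bind to its placeholders, in order. */
final class InsertStatement {
    final String sql;
    final List<Object> parameters;

    InsertStatement(String sql, List<Object> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }
}

final class JsonRecordToSql {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    private JsonRecordToSql() {}

    /** Builds a parameterized INSERT from a flat, already-parsed record; the map's iteration order sets column order. */
    static InsertStatement toInsert(String table, Map<String, Object> record) {
        requireIdentifier(table);
        if (record.isEmpty()) {
            throw new IllegalArgumentException("Record has no fields");
        }
        List<String> columns = new ArrayList<>();
        List<String> placeholders = new ArrayList<>();
        List<Object> parameters = new ArrayList<>();
        for (Map.Entry<String, Object> field : record.entrySet()) {
            requireIdentifier(field.getKey());
            columns.add(field.getKey());
            placeholders.add("?");
            parameters.add(field.getValue());
        }
        String sql = "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + String.join(", ", placeholders) + ")";
        return new InsertStatement(sql, parameters);
    }

    private static void requireIdentifier(String name) {
        if (!IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a simple SQL identifier: " + name);
        }
    }
}
```

The parameters would then be bound with `PreparedStatement.setObject(index, value)` in the same order as the placeholders.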
  8. Attribute Extraction
    • Evaluate JsonPath
    • Evaluate XPath
    • Evaluate XQuery
    • Extract text
    • Hash attribute or content
    • Identify MIME type
    • Update attribute
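Extracting text into attributes and hashing content can be illustrated as below. The sketch assumes each rule is a regular expression whose first capture group becomes the attribute value (rules with no match add nothing) and that the hash is SHA-256 in lowercase hex; `AttributeExtractor` and the attribute names used with it are hypothetical, not NiFi's processor properties.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeExtractor {
    private AttributeExtractor() {}

    /** For each named rule, stores capture group 1 of the first match as an attribute. */
    static Map<String, String> extractText(String content, Map<String, Pattern> rules) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(content);
            if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
                attributes.put(rule.getKey(), m.group(1));
            }
        }
        return attributes;
    }

    /** Lowercase hex SHA-256 of the content, suitable for storing as an attribute. */
    static String hashContent(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```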
  9. System Interaction
    • Execute a process (operating system command)
    • Execute a stream command
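A stream-command step pipes content into an external program and takes its standard output as the new content. The sketch below does that with `ProcessBuilder`; it illustrates the idea and is not NiFi's implementation. `StreamCommand` is a hypothetical name, stderr is simply discarded, and stdin is written on a separate thread so that a command producing a lot of output cannot deadlock the caller. The usage assumes a Unix-like system where `tr` is on the `PATH`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

final class StreamCommand {
    private StreamCommand() {}

    /** Pipes input to the command's stdin and returns its stdout; blocks until the command exits. */
    static byte[] run(List<String> command, byte[] input) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.DISCARD) // stderr is dropped in this sketch
                    .start();
            // Write stdin on its own thread so a full stdout pipe cannot block us mid-write.
            Thread writer = new Thread(() -> {
                try (OutputStream stdin = process.getOutputStream()) {
                    stdin.write(input);
                } catch (IOException ignored) {
                    // The command may exit before reading all of its input; ignored here.
                }
            });
            writer.start();
            byte[] output;
            try (InputStream stdout = process.getInputStream()) {
                output = stdout.readAllBytes();
            }
            int exitCode = process.waitFor();
            writer.join();
            if (exitCode != 0) {
                throw new IllegalStateException("Command exited with code " + exitCode + ": " + command);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```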
  10. Data Ingestion
    • Get File
    • Get FTP / SFTP
    • Get JMS Queue / Topic
    • Get HTTP
    • Listen HTTP / UDP
    • Get HDFS / List HDFS
    • FetchS3Object (Amazon Web Services)
    • Get Kafka
    • Get Mongo
    • Get Twitter
  11. Data Egress / Sending Data
    • Put Email
    • Put File
    • Put FTP / SFTP
    • Put JMS
    • Put SQL
    • Put Kafka
    • Put Mongo
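Get File and Put File are the simplest ingestion and egress pair. Here is a minimal plain-Java sketch with `java.nio.file`, assuming the files are small enough to read fully into memory and that each source file should be deleted once delivered; `FileTransfer` is a hypothetical name, not a NiFi class. Writing under a hidden temporary name and then renaming means that, on file systems supporting atomic moves, a consumer polling the output directory never picks up a half-written file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FileTransfer {
    private FileTransfer() {}

    /** Moves every regular file from inputDir to outputDir and returns the names, sorted. */
    static List<String> transfer(Path inputDir, Path outputDir) {
        try {
            List<Path> sources;
            try (Stream<Path> entries = Files.list(inputDir)) {
                sources = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
            Files.createDirectories(outputDir);
            List<String> transferred = new ArrayList<>();
            for (Path source : sources) {
                String name = source.getFileName().toString();
                byte[] content = Files.readAllBytes(source);             // ingestion side
                Path temp = outputDir.resolve("." + name + ".tmp");
                Files.write(temp, content);                               // egress under a temporary name
                Files.move(temp, outputDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
                Files.delete(source);                                     // pick the file up, do not copy it
                transferred.add(name);
            }
            return transferred;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```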
  12. Splitting and Aggregation
    • Split Text
    • Split JSON
    • Split XML
    • Unpack Content
    • Merge Content
    • Segment Content
    • Split Content
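Splitting and merging are inverse operations on content. This sketch shows a line-based split, fixed-size segmentation of binary content, and a merge with header, delimiter and footer. `SplitMerge` is a hypothetical helper; split processors such as Split Text also record fragment attributes (identifier, index, count) so a later merge can restore the original order, which the sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class SplitMerge {
    private SplitMerge() {}

    /** Splits text into chunks of at most linesPerSplit lines; line terminators are not kept. */
    static List<String> splitLines(String text, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<String> lines = text.lines().collect(Collectors.toList());
        List<String> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(String.join("\n", lines.subList(start, end)));
        }
        return splits;
    }

    /** Cuts binary content into fixed-size segments; the last segment may be shorter. */
    static List<byte[]> segment(byte[] content, int segmentSize) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        List<byte[]> segments = new ArrayList<>();
        for (int start = 0; start < content.length; start += segmentSize) {
            segments.add(Arrays.copyOfRange(content, start, Math.min(start + segmentSize, content.length)));
        }
        return segments;
    }

    /** Joins parts into one piece of content, wrapped in a header and footer. */
    static String merge(List<String> parts, String header, String delimiter, String footer) {
        return header + String.join(delimiter, parts) + footer;
    }
}
```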
  13. HTTP
    • Get HTTP
    • Listen HTTP
    • Invoke HTTP
    • Post HTTP
    • Handle HTTP Request
    • Handle HTTP Response
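The listening and invoking sides of HTTP can both be tried locally with JDK classes: `com.sun.net.httpserver.HttpServer` acts as the listener and `java.net.http.HttpClient` (Java 11+) as the invoker. This is a sketch of the interaction, not NiFi code; `ListenSketch`, `InvokeSketch` and the `/ingest` path are hypothetical, the server binds to an ephemeral loopback port, and the client is pinned to HTTP/1.1 for simplicity.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

/** Listener: accepts requests on /ingest and replies with the handler's transformation of the body. */
final class ListenSketch implements AutoCloseable {
    private final HttpServer server;

    ListenSketch(UnaryOperator<String> handler) {
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0); // port 0 = any free port
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/ingest", exchange -> {
            String body;
            try (InputStream in = exchange.getRequestBody()) {
                body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            byte[] reply = handler.apply(body).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
    }

    URI uri() {
        return URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ingest");
    }

    @Override
    public void close() {
        server.stop(0);
    }
}

/** Invoker: POSTs a body and returns the response text. */
final class InvokeSketch {
    private static final HttpClient CLIENT =
            HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();

    private InvokeSketch() {}

    static String post(URI uri, String body) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        try {
            return CLIENT.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```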
  14. Amazon Web Services
    • Fetch S3 Object
    • Put S3 Object
    • Put SNS
    • Get SQS
    • Put SQS
    • Delete SQS