Java One 2017 - Streaming solutions for real time problems

Abhishek Gupta

October 30, 2017

Transcript

  1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

    | Streaming solutions for real time problems
      Abhishek Gupta (@abhi_tweeter), Senior Product Manager, Oracle
      Oct 2, 2017
  2. Safe Harbor Statement

      The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. Before we dive in…

      • Goal
        – Using a practical example, familiarize you with a tech stack for dealing with fast/real-time/streaming data
      • Agenda
        – 101s: Kafka, Kafka Streams & Redis
        – Sample app & implementation (using Oracle Cloud)
        – Q & A
      • Content
        – Slideshare
        – GitHub
  4. (Traditional) Batch solution

      [Diagram labels: EVENTS, DWH, Aggregate, Batch processing, Static view of insights]
  5. (Traditional) Messaging-based solution

      [Diagram labels: EVENTS, Message Broker, Consumer, App, DB, Polling etc.]

      1. Designed for in-memory
      2. Consume and delete

      Stream processing at scale? DIY!
  6. Stream processing to the rescue!

      • Streams
        – Unbounded/infinite data set
        – Has volume and velocity: not just big data, but fast data
      • Stream processing
        – Crunching/processing streams of data as soon as possible
        – Request-response, streaming, batch
        – Time, ordering, state etc.

      Image: http://www.capturearkansas.com/photos/550197
  7. Use case: data center monitoring application

      • Collect (simulate) metrics from multiple machines
      • Crunch statistics (moving average)
      • Monitor using a dashboard

      data: {"machine":"machine-1","metrics":["8","20","36","65","2","20","73","67"]}
      data: {"machine":"machine-2","metrics":["1","54","42","61","40","35","26","78"]}
      . . . .
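
The "crunch statistics (moving average)" step can be sketched in plain Java. This is a minimal illustration, not the code from the talk: the class name `MovingAverage`, the window size of 3, and the reading of the `metrics` array as successive readings from one machine are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Simple moving average over the most recent readings (illustrative sketch). */
final class MovingAverage {
    private final int window;
    private final Deque<Integer> readings = new ArrayDeque<>();
    private long sum = 0;

    MovingAverage(int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
    }

    /** Adds one reading and returns the average of the last `window` readings seen so far. */
    double add(int reading) {
        readings.addLast(reading);
        sum += reading;
        if (readings.size() > window) {
            sum -= readings.removeFirst(); // drop the oldest reading
        }
        return (double) sum / readings.size();
    }

    public static void main(String[] args) {
        // Metrics for machine-1, copied from the sample payload on the slide.
        String[] metrics = {"8", "20", "36", "65", "2", "20", "73", "67"};
        MovingAverage average = new MovingAverage(3); // window size is an assumption
        for (String metric : metrics) {
            System.out.printf("%s -> %.2f%n", metric, average.add(Integer.parseInt(metric)));
        }
    }
}
```

Keeping a running sum makes each update O(1) regardless of window size, which matters when readings arrive one record at a time.
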
  8. Tech stack for a streaming solution

      [Architecture diagram labels]
      • Simulated Producer
      • Kafka – Event Store (Partitions)
      • Kafka Streams – Processor
      • Redis – State Store (Lists, Sorted Set)
      • Dashboard (Service App, UI; <polls>, SSE)
  9. Apache Kafka: the Event Store

      [Architecture diagram from slide 8 repeated]
  10. Apache Kafka: 50,000-foot view

      History
      • Originally built at LinkedIn
      • Open-sourced in early 2011
      • Late 2012: ASF top-level project
  11. Topics

      [Diagram: topic "cpu-metrics" with records machine1-59, machine3-23, machine5-42, machine6-43, machine2-17, ….]
  12. Partitions

      [Diagram: partitions, on disk. Source: https://kafka.apache.org]
  13. Replication (and partitioning) in action

      Humble beginning: single node
  14. Replication (and partitioning) in action

      Scale out…

      [Diagram includes ZooKeeper]
      Sources:
      • https://simplydistributed.wordpress.com/2016/12/13/kafka-partitioning/
      • https://svn.apache.org/repos/asf/zookeeper/logo/zookeeper.jpg
  15. Producers

      What goes where?

      [Diagram source: https://kafka.apache.org]
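
One way to picture "what goes where": for records that carry a key, the producer derives the partition from a hash of the key, so all records for the same key land in the same partition. The sketch below is a stand-in, not Kafka's partitioner. Kafka's default partitioner hashes the key bytes with murmur2, whereas this example uses `Arrays.hashCode` to stay dependency-free. The class name `KeyPartitioner` and the partition count of 4 are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Key-based partition selection (illustrative stand-in, not Kafka's partitioner). */
final class KeyPartitioner {

    /** Maps a record key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Stand-in hash; Kafka's default partitioner uses murmur2 over the key bytes.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4; // hypothetical partition count for the cpu-metrics topic
        for (String machine : new String[] {"machine-1", "machine-2", "machine-3"}) {
            System.out.println(machine + " -> partition " + partitionFor(machine, partitions));
        }
    }
}
```

Because the mapping depends only on the key and the partition count, ordering is preserved per key, but changing the partition count changes where existing keys go.
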
  16. Consumers

      [Diagram labels: Pub-sub, Queue, Kafka. Source: https://kafka.apache.org]
  17. Event producer: Oracle Application Container Cloud
  18. So… what is Kafka?

      • At its core: a distributed commit log
      • Messaging system (pub-sub + queue)
      • Reactive (& sharded) key-value store
      • Database – read this and check out KSQL (a streaming SQL engine for Kafka)
      • Data pipeline – thanks to Kafka Connect
      • Streaming platform – stay awake to learn more on this!
  19. Kafka Streams: processing engine

      [Architecture diagram from slide 8 repeated]
  20. Kafka Streams: what is it?

      • Streams API: no need to deal with the Kafka Consumer and Producer APIs explicitly
      • Use cases: big data, fast data, microservices, monoliths etc.
      • Piggybacks on Kafka for scalability & fault tolerance
      • One-record-at-a-time processing (no micro-batching)
      • Separate infra isn’t mandatory (think about Spark, Storm etc.): deploy (and scale) anywhere, it's just a Java app after all!
      • Programming styles: high-level (fluent DSL) and low-level (Processor) APIs
      • Stateful processing support + interactive queries
      • Windowing, aggregations, joins etc.
  21. Kafka Streams: APIs

      • (High level) Fluent DSL API
      • (Low level) Processor API
  22. Kafka Streams: Topology

      [Diagram: topology, conceptually and at runtime. Source: https://kafka.apache.org]
  23. Scaling a Kafka Streams app

      [Diagram: topic "my-topic" with stream partitions p1–p4. Instance-1 (Thread-1) runs Task 1 to Task 4; after scale out, Instance-2 (Thread-1) runs Task 3 and Task 4.]
  24. Scaling out is not the only option

      • Techniques
        – Scale OUT: more instances
        – Scale UP: more threads
      • Max parallelism
        – No. of topic partitions / no. of threads per instance, e.g. 50 partitions / 5 threads per instance = 10 instances

      See: https://issues.apache.org/jira/browse/KAFKA-5683
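
The sizing rule above can be checked with a few lines of arithmetic. This is a rough sketch under two assumptions that are not stated on the slide: the app reads a single input topic (so there is one task per partition), and tasks are spread across threads before any thread gets a second one. The class and method names are hypothetical.

```java
/** Back-of-the-envelope sizing for a Kafka Streams app (illustrative arithmetic only). */
final class StreamsParallelism {

    /** Number of instances beyond which additional instances would only hold idle threads. */
    static int maxUsefulInstances(int partitions, int threadsPerInstance) {
        if (partitions < 0 || threadsPerInstance <= 0) {
            throw new IllegalArgumentException("partitions must be >= 0 and threadsPerInstance > 0");
        }
        // Ceiling division: a final, partly busy instance still does useful work.
        return (partitions + threadsPerInstance - 1) / threadsPerInstance;
    }

    /** Minimum number of threads left without a task when threads outnumber partitions. */
    static int idleThreads(int partitions, int instances, int threadsPerInstance) {
        return Math.max(0, instances * threadsPerInstance - partitions);
    }

    public static void main(String[] args) {
        // The example from the slide: 50 partitions, 5 threads per instance.
        System.out.println("max useful instances: " + maxUsefulInstances(50, 5));
        // Running 12 such instances gives 60 threads for 50 tasks.
        System.out.println("idle threads with 12 instances: " + idleThreads(50, 12, 5));
    }
}
```
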
  25. Stateful stream processing with Kafka Streams

      • State stores
        – Conceptually: a lightweight embedded database within your stream processing layer that stores ‘intermediate’ processing state (state is local to each app instance)
        – Options: in-memory, persistent (RocksDB), custom store (e.g. external DB)
        – State stores expose their internals using Interactive Queries
      • Interactive queries
        – No additional data store: just ask your app!
        – Needs some dev work to make your app (interactively) queryable
  26. Stateful processing & (interactive) querying

      [Diagram: Kafka; App Instance 1 (machine1:8080) and App Instance 2 (machine2:8080), each with local state stores; an external app calling a custom RPC layer (e.g. REST API)]

      • application.server config + StreamsMetadata API
      • Query and get back the ‘complete’ state using a custom API
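
Since each instance only holds the state for its own partitions, getting the "complete" state means asking every instance and merging the answers. The sketch below models that scatter-gather step with in-memory maps. It does not use the Kafka Streams API: the `Instance` class stands in for an app instance discovered through the StreamsMetadata API, the `putAll` call stands in for a remote call through the custom RPC layer, and the sample values are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Scatter-gather over per-instance local state (in-memory model, hypothetical names). */
final class ScatterGatherQuery {

    /** Stand-in for one app instance: its advertised endpoint plus its local state store. */
    static final class Instance {
        final String endpoint; // e.g. "machine1:8080", as set via application.server
        final Map<String, Double> localStore = new HashMap<>();

        Instance(String endpoint) {
            this.endpoint = endpoint;
        }
    }

    /** Merges the local state of every instance into one sorted view. */
    static Map<String, Double> completeState(List<Instance> instances) {
        Map<String, Double> merged = new TreeMap<>();
        for (Instance instance : instances) {
            // A real app would make a remote call to instance.endpoint here.
            merged.putAll(instance.localStore);
        }
        return merged;
    }

    public static void main(String[] args) {
        Instance one = new Instance("machine1:8080");
        one.localStore.put("machine-1", 36.4); // invented sample value
        Instance two = new Instance("machine2:8080");
        two.localStore.put("machine-2", 42.1); // invented sample value
        System.out.println(completeState(Arrays.asList(one, two)));
    }
}
```
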
  27. Interactive queries in action

      Blog: http://bit.ly/2fK1Io5
  28. Fault tolerance – for stateless and stateful apps

      [Diagram: Kafka with the (app-specific) data topic and an (internal) compacted topic; App Instance 1 and App Instance 2, each with local state stores; sample records k1-v1, k2-v2, k3-v3, k4-v4]
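
The role of the compacted topic can be illustrated by replaying a changelog into an empty store: later records for a key overwrite earlier ones, and a record with a null value (a tombstone) deletes the key. This is a simplified in-memory model, not Kafka Streams' restore code, and the `Record` and `ChangelogRestore` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Rebuilds a key-value store by replaying a changelog (in-memory model). */
final class ChangelogRestore {

    /** One changelog entry; a null value marks the key as deleted (tombstone). */
    static final class Record {
        final String key;
        final String value;

        Record(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Replays records in offset order so the latest value per key wins. */
    static Map<String, String> restore(List<Record> changelog) {
        Map<String, String> store = new LinkedHashMap<>();
        for (Record record : changelog) {
            if (record.value == null) {
                store.remove(record.key);
            } else {
                store.put(record.key, record.value);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Record> changelog = new ArrayList<>();
        changelog.add(new Record("k1", "v1"));
        changelog.add(new Record("k2", "v2"));
        changelog.add(new Record("k1", "v1-updated"));
        System.out.println(restore(changelog));
    }
}
```

Compaction keeps the changelog from growing without bound: only the latest record per key has to survive for this replay to produce the same store.
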
  29. Kafka Streams processing app: Oracle Application Container Cloud

      Let’s not forget about scale out!

      [Diagram: two Metrics Processor instances, Kafka]
  30. Redis: the State Store

      [Architecture diagram from slide 8 repeated]
  31. Hello

      • Stands for: RE(mote) DI(ctionary) S(erver)
      • Versatile data structure server (written in C)
      • Focus on in-memory with (tunable) persistence
      • Not just any KV store
      • Keys
        – From a simple string to binary
        – Max 512 MB (same for values)
        – Can be expired
      • Values: any of the following
        – String, List, Hash
        – Set, Sorted Set
        – Geospatial, HyperLogLog
        – etc.
  32. Redis data structures

      • Sorted Sets
        – Each element has an associated score (basis for sort)
        – Basic ops: ZADD, ZINCRBY, ZREM, ZSCORE, ZCARD
        – View: ZRANGEBYSCORE, ZREVRANGEBYSCORE
        – Ranking: ZRANK, ZREVRANK
      • Lists
        – To be specific: a linked list
        – Operations at head (LPUSH) & tail (RPUSH) are O(1), search by index is O(N)
        – LRANGE, RPOP, LPOP to extract data & LTRIM to cap the size
        – Blocking ops: BLPOP, BRPOP

      Reference: https://redis.io/commands
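
The "LTRIM to cap the size" idea can be modelled without a Redis server. The sketch below mimics the effect of `LPUSH key value` followed by `LTRIM key 0 maxSize-1` using a `Deque`. It is an in-memory model only, not a Redis client call, and the `CappedList` name and the cap of 3 are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory model of a size-capped Redis list (LPUSH + LTRIM). */
final class CappedList {
    private final Deque<String> items = new ArrayDeque<>();
    private final int maxSize;

    CappedList(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
    }

    /** Models LPUSH followed by LTRIM 0 maxSize-1: newest at the head, oldest dropped from the tail. */
    void push(String value) {
        items.addFirst(value);
        while (items.size() > maxSize) {
            items.removeLast();
        }
    }

    /** Models LRANGE 0 -1: every element from head to tail. */
    List<String> range() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        CappedList latest = new CappedList(3); // keep only the 3 newest readings
        for (String reading : new String[] {"8", "20", "36", "65"}) {
            latest.push(reading);
        }
        System.out.println(latest.range());
    }
}
```
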
  33. etc…

      • Good stuff
        – Redis Sentinel (HA), master-slave replication, Redis Cluster for partitioning, Pub Sub, transactions, Lua scripting
      • Use cases
        – Messaging, cache, job queue, live leaderboard, counting stuff (efficiently), analytics, location-based (Geospatial) etc.
      • Client libraries
        – Java, Scala, Go, Python, C++…
        – https://redis.io/clients
  34. State store FAQs

      • Redis vs Kafka Streams state store
        – Horses for courses!
      • Can we combine both?
        – Depending on the use case, yes!
      • Oh, and you can also use the Cache which comes with Oracle Application Container Cloud!
        – (Yet another) blog: http://bit.ly/2yEN35q
  35. Monitoring Dashboard

      [Architecture diagram from slide 8 repeated]
  36. Dashboard app: Oracle Application Container Cloud

      • JAX-RS & (Jersey) Server-Sent Events
      • CDI: Jedis (Redis) client via a producer method (@Produces)
      • EJB: TimerService and @Asynchronous
      • Others: Jackson

      Note: SSE and JSON-B are available in Java EE 8
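
The `data: {...}` lines shown in the use case slide follow the Server-Sent Events wire format: each line of the payload is prefixed with `data: ` and a blank line ends the event. The helper below builds such a frame by hand for illustration. In the dashboard app the JAX-RS/Jersey SSE support would do this, so the `SseFrame` class is purely a hypothetical stand-in.

```java
/** Builds a Server-Sent Events frame by hand (illustrative stand-in for a framework's SSE support). */
final class SseFrame {

    /** Prefixes every payload line with "data: " and terminates the event with a blank line. */
    static String format(String payload) {
        StringBuilder frame = new StringBuilder();
        for (String line : payload.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    public static void main(String[] args) {
        // Payload shape copied from the sample on the use case slide.
        String json = "{\"machine\":\"machine-1\",\"metrics\":[\"8\",\"20\",\"36\"]}";
        System.out.print(format(json));
    }
}
```
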
  37. (Oracle) Cloud-based streaming solution

      [Architecture diagram from slide 8, annotated with Oracle services: Oracle Application Container Cloud (three times), Oracle Event Hub Cloud, Oracle Compute Cloud]
  38. Resources

      • Oracle Application Container Cloud tutorials
      • Oracle Stack Manager: infrastructure-as-code
      • Oracle PSM CLI: the CLI-of-everything (in Oracle PaaS!)
      • Oracle Devs on Medium (blog) and Twitter
      • Try Oracle Cloud!