Java One 2017 - Streaming solutions for real time problems

Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
| Streaming solutions for real time problems Abhishek Gupta @abhi_tweeter Senior Product Manager, Oracle Oct 2, 2017

| Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


| Before we dive in…
• Goal
  – Using a practical example, familiarize you with a tech stack for dealing with fast/real time/streaming data
• Agenda
  – 101s - Kafka, Kafka Streams & Redis
  – Sample app & implementation (using Oracle Cloud)
  – Q & A
• Content
  – Slideshare
  – Github

| Real time

| (traditional) Batch solution
[Diagram: EVENTS, DWH, Aggregate, Batch processing, Static view of insights]

| (traditional) Messaging based solution
[Diagram: EVENTS, Message Broker, App Consumer, DB, Polling etc.]
1. Designed for in-memory
2. Consume and delete
Stream Processing @ scale ?? DIY !

| Stream processing to the rescue!
• Streams
  – Unbounded/infinite data set
  – Has volume and velocity. Not just Big, but fast data
• Stream Processing
  – Crunching/processing streams of data.. asap!
  – Req-response - Streaming - Batch
  – Time, ordering, state etc.
http://www.capturearkansas.com/photos/550197

| Use Case: Data center monitoring application
• Collect (simulate) metrics from multiple machines
• Crunch statistics (moving average)
• Monitor using a dashboard
data: {"machine":"machine-1","metrics":["8","20","36","65","2","20","73","67"]}
data: {"machine":"machine-2","metrics":["1","54","42","61","40","35","26","78"]}
. . . .
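The "crunch statistics (moving average)" step can be sketched in plain Java. This is a minimal illustration, not the deck's actual processor: the class name `MovingAverage`, the window size, and the choice to average over the most recent readings of a single machine are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: simple moving average over the last `window` readings. */
final class MovingAverage {
    private final int window;
    private final Deque<Double> recent = new ArrayDeque<>();
    private double sum;

    MovingAverage(int window) {
        if (window <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        this.window = window;
    }

    /** Adds one reading and returns the average of the readings currently in the window. */
    double add(double reading) {
        recent.addLast(reading);
        sum += reading;
        if (recent.size() > window) {
            sum -= recent.removeFirst(); // evict the oldest reading
        }
        return sum / recent.size();
    }

    /** Metrics arrive as strings in the sample events, so parse before adding. */
    double add(String reading) {
        return add(Double.parseDouble(reading));
    }
}
```

Feeding the machine-1 metrics one by one into `new MovingAverage(4)` gives a rolling view over the last four readings; a larger window smooths more but reacts more slowly.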

| Tech stack for a Streaming solution
[Diagram: Simulated Producer; Kafka - Event Store (Partitions); Kafka Streams - Processor; Redis – State Store (Lists, Sorted Set); Service App <polls> the state store and feeds the Dashboard UI over SSE]

| Apache Kafka: the Event Store

| Apache Kafka
History
• Originally built @ LinkedIn
• OSS in early 2011
• Late 2012 – ASF top level
50,000 foot view

| Topics
[Diagram: topic cpu-metrics holding records machine1-59, machine3-23, machine5-42, machine6-43, machine2-17, ….]

| Partitions https://kafka.apache.org On disk

| Replication (and partitioning) in action Humble beginning – single node

| Replication (and partitioning) in action
Scale out… (with Zookeeper)
https://simplydistributed.wordpress.com/2016/12/13/kafka-partitioning/
https://svn.apache.org/repos/asf/zookeeper/logo/zookeeper.jpg

| Producers https://kafka.apache.org What goes where ??
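One way to picture "what goes where" for keyed records: hash the key, then take it modulo the partition count, so every record for the same machine lands on the same partition. The sketch below is an assumption-laden stand-in: Kafka's default partitioner uses a murmur2 hash of the serialized key, whereas this version swaps in `Arrays.hashCode` to stay dependency-free. `KeyPartitioner` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for key-based partitioning (not Kafka's real hash function). */
final class KeyPartitioner {
    /** Maps a record key to a partition in the range [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        // Assumption: Arrays.hashCode replaces murmur2 for illustration only.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Mask the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Because the mapping depends only on the key and the partition count, per-machine ordering is preserved within a partition; changing the partition count changes where keys land.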

| Consumers https://kafka.apache.org Pub-sub Queue Kafka

| Managed Kafka: Oracle Event Hub Cloud

| Metrics Topic: Oracle Event Hub Cloud

| Event producer: Oracle Application Container Cloud

| So… What is Kafka ??
• At its core: a distributed commit log
• Messaging system (Pub Sub + Queue)
• Reactive (& sharded) key-value store
• Database – read this and check out KSQL (a streaming SQL engine for Kafka)
• Data pipeline – thanks to Kafka Connect
• Streaming platform – stay awake to learn more on this !

| Kafka Streams: processing engine

| Kafka Streams: what is it ?
• Streams API: no need to deal with the Kafka Consumer, Producer API explicitly
• Use cases – big data, fast data, microservices, monoliths etc.
• Piggy backs on Kafka for scalability & fault-tolerance
• One-record-at-a-time processing (no micro batching)
• Separate infra isn't mandatory – think about Spark, Storm etc. – deploy (and scale) anywhere – it's just a Java app after all!
• Programming styles: High (fluent DSL) and low level (Processor) APIs
• Stateful processing support + Interactive queries
• Windowing, aggregations, joins etc.

| Kafka Streams: APIs (High level) Fluent DSL API (Low level) Processor API

| Kafka Streams: Topology https://kafka.apache.org conceptually At runtime

| Scaling a Kafka Streams app
[Diagram: my-topic with stream partitions p1 p2 p3 p4. Instance-1 (Thread-1) runs Task 1, Task 2, Task 3, Task 4. After scale out, Instance-2 (Thread-1) takes Task 3 and Task 4.]

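| Scaling out is not the only option
• Techniques
  – Scale OUT – more instances
  – Scale UP – more threads
• Max parallelism
  – For a single input topic there is one stream task per partition, so threads beyond the partition count sit idle
  – Max useful instances = [No. of topic partitions / no. of threads per instance] e.g. 50 / 5 = 10
https://issues.apache.org/jira/browse/KAFKA-5683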
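The arithmetic on this slide can be checked with a small sketch. It assumes one task per partition of a single input topic and spreads tasks round-robin across threads; the real Kafka Streams assignor is more involved, so treat `StreamsCapacity` and its method names as hypothetical.

```java
/** Hypothetical capacity helper for a Kafka Streams app reading one topic. */
final class StreamsCapacity {
    /** Instances beyond this count would have no task left to run. */
    static int maxUsefulInstances(int partitions, int threadsPerInstance) {
        // Ceiling division: a partially filled instance still counts.
        return (partitions + threadsPerInstance - 1) / threadsPerInstance;
    }

    /** Tasks per thread when tasks are dealt out round-robin (an assumption). */
    static int[] tasksPerThread(int partitions, int instances, int threadsPerInstance) {
        int threads = instances * threadsPerInstance;
        int[] counts = new int[threads];
        for (int task = 0; task < partitions; task++) {
            counts[task % threads]++;
        }
        return counts;
    }
}
```

With 50 partitions and 5 threads per instance, `maxUsefulInstances` returns 10. With 4 partitions on 2 single-threaded instances, each thread gets 2 tasks; a fifth single-threaded instance would get none.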

| Stateful stream processing with Kafka Streams
State stores
• Conceptually: lightweight embedded database within your stream processing layer to store 'intermediate' processing state (state is local to each app instance)
• Options: in-memory, persistent (RocksDB), custom store (e.g. external DB)
• State stores expose their internals using Interactive Queries
Interactive queries
• No additional data store.. Just ask your app !
• Needs some dev work to make your app (interactively) query-able

| Stateful processing & (interactive) querying Kafka External app Custom RPC layer (e.g. REST API) machine1:8080 machine2:8080 Local state stores App Instance 1 App instance 2 application.server config + StreamsMetadata API Query and get back the ‘complete’ state using custom API

| Interactive queries in action Blog - http://bit.ly/2fK1Io5

| Fault tolerance – for stateless and stateful apps (internal) Compacted topic k1-v1 k2-v2 Local state stores App Instance 1 App Instance 2 (app specific) Data topic Kafka k3-v3 k4-v4 k1-v1 k2-v2
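The compacted-topic idea can be sketched without Kafka: a local state store is rebuilt by replaying its changelog in order, where the latest value per key wins and a null value acts as a delete marker. `ChangelogRestore` and the `String[]{key, value}` record shape are assumptions made for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rebuild a key-value state store from changelog records. */
final class ChangelogRestore {
    /** Replays records in offset order; each record is {key, value}, null value = delete. */
    static Map<String, String> restore(List<String[]> changelog) {
        Map<String, String> store = new LinkedHashMap<>();
        for (String[] record : changelog) {
            String key = record[0];
            String value = record[1];
            if (value == null) {
                store.remove(key);     // tombstone: key is gone
            } else {
                store.put(key, value); // newer value overwrites the older one
            }
        }
        return store;
    }
}
```

Compaction only needs to retain the last record per key for this replay to produce the same store, which is why a compacted topic is enough to bring a replacement app instance back to the state of the failed one.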

| Kafka Streams processing app: Oracle Application Container Cloud Let’s not forget about scale out ! Metrics Processor Metrics Processor Kafka

| Redis: the State Store

| Hello
• Stands for: RE(mote) DI(ctionary) S(erver)
• Versatile data structure server (written in C)
• Focus on in-memory with (tunable) persistence
• Not just any KV store
• Keys
  – From a simple string to binary
  – Max 512 MB (same for values)
  – Can be expired
• Values – any of the following
  – String, List, Hash
  – Set, Sorted Set
  – Geospatial, HyperLogLog
  – etc.

| Redis data structures
• Sorted Sets
  – Each element has an associated score (basis for sort)
  – Basic Ops: ZADD, ZINCRBY, ZREM, ZSCORE, ZCARD
  – View: ZRANGEBYSCORE, ZREVRANGEBYSCORE
  – Ranking: ZRANK, ZREVRANK
• Lists
  – To be specific: a Linked List
  – Operations at head (LPUSH) & tail (RPUSH) are O(1), search by index is O(N)
  – LRANGE, RPOP, LPOP to extract data & LTRIM to cap the size
  – Blocking ops: BLPOP, BRPOP
https://redis.io/commands
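To make the semantics concrete without a running Redis server, here is an in-memory Java analogue of two patterns from this slide: a capped list (LPUSH followed by LTRIM) and a "top N by score" view over a sorted set. It does not talk to Redis at all; `RedisLikeStructures` and its methods are hypothetical names, and a real app would issue the equivalent commands through a client such as Jedis.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory analogue of a capped Redis list and a sorted-set ranking. */
final class RedisLikeStructures {
    /** Like LPUSH then LTRIM to `cap` entries: newest at the head, oldest dropped. */
    static void pushCapped(Deque<String> list, String value, int cap) {
        list.addFirst(value);
        while (list.size() > cap) {
            list.removeLast();
        }
    }

    /** Like reading the first n members in descending score order. */
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        // Order by score, break ties by member name, then reverse for highest-first.
        entries.sort(Map.Entry.<String, Double>comparingByValue()
                .thenComparing(Map.Entry.<String, Double>comparingByKey())
                .reversed());
        List<String> members = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            members.add(entries.get(i).getKey());
        }
        return members;
    }
}
```

In the monitoring use case, a capped list suits "last N readings per machine", while a sorted set keyed by machine with the moving average as the score suits a "busiest machines" view.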

| etc……
• Good stuff – Redis Sentinel (HA), Master-Slave replication, Redis Cluster for partitioning, Pub Sub, Transactions, Lua scripting
• Use cases: Messaging, Cache, Job Queue, Live leader board, counting stuff (efficiently), analytics, location based (Geospatial) etc.
• Client libraries
  – Java, Scala, Go, Python, C++….
  – https://redis.io/clients

| State store FAQs
• Redis vs Kafka Streams state store
  – Horses for courses!
• Can we combine both ?
  – Depending on the use case, yes!
• Oh and you can also use the Cache which comes with Oracle Application Container Cloud !
  – (Yet another) Blog - http://bit.ly/2yEN35q

| Redis: Oracle Cloud Infrastructure 1 2

| Monitoring Dashboard

| Dashboard app: Oracle Application Container Cloud
• JAX-RS & (Jersey) Server Sent Events
• CDI: Jedis (Redis) client via @Produces
• EJB: TimerService and @Asynchronous
• Others: Jackson
Note: SSE and JSON-B are available in Java EE 8
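The `data: {...}` lines shown in the use case slide are the Server-Sent Events wire format. In the app, the JAX-RS/Jersey SSE support produces this framing; the sketch below only illustrates what goes over the wire, and `SseFormat` is a hypothetical helper rather than part of any library.

```java
/** Hypothetical helper showing the text/event-stream framing of one event. */
final class SseFormat {
    /** Prefixes every payload line with "data: " and ends the event with a blank line. */
    static String event(String payload) {
        StringBuilder sb = new StringBuilder();
        // The -1 limit keeps trailing empty lines instead of silently dropping them.
        for (String line : payload.split("\n", -1)) {
            sb.append("data: ").append(line).append('\n');
        }
        return sb.append('\n').toString();
    }
}
```

A browser `EventSource` treats the blank line as the end of one event and hands the joined `data` lines to the dashboard's message handler, which is what lets the UI update without polling.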

| (Oracle) Cloud based Streaming solution
• Simulated Producer: Oracle Application Container Cloud
• Kafka - Event Store: Oracle Event Hub Cloud
• Kafka Streams - Processor: Oracle Application Container Cloud
• Redis – State Store: Oracle Compute Cloud
• Dashboard (Service App + UI, SSE): Oracle Application Container Cloud

| Demo

| Resources
• Oracle Application Container Cloud tutorials
• Oracle Stack Manager – Infrastructure-as-code
• Oracle PSM CLI – the cli-of-everything (in Oracle PaaS!)
• Oracle Devs on Medium (blog) and Twitter
• Try Oracle Cloud !

| Sessions which you should check out!
