| Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
| Before we dive in… • Goal – Using a practical example, familiarize you with a tech stack for dealing with fast/real time/streaming data • Agenda – 101s - Kafka, Kafka Streams & Redis – Sample app & implementation (using Oracle Cloud) – Q & A • Content – Slideshare – Github
| (Traditional) messaging-based solution [Diagram: EVENTS, Message Broker, Consumer App, DB, polling etc.] • Message brokers are typically 1. Designed for in-memory 2. Consume and delete • Stream processing @ scale? DIY!
| Stream processing to the rescue! • Streams – Unbounded/infinite data set – Has volume and velocity: not just big, but fast data • Stream Processing – Crunching/processing streams of data ASAP! – Request-response – Streaming – Batch – Time, ordering, state etc. http://www.capturearkansas.com/photos/550197
| Tech stack for a streaming solution • Simulated Producer • Kafka – Event Store (partitions) • Kafka Streams – Processor • Redis – State Store (Lists, Sorted Set) • Dashboard – Service App <polls> the state store and pushes to the UI via SSE
| Apache Kafka: the Event Store
| So… What is Kafka ?? • At its core: a distributed commit log • Messaging system (Pub Sub + Queue) • Reactive (& sharded) key-value store • Database – read this and check out KSQL (a streaming SQL engine for Kafka) • Data pipeline – thanks to Kafka Connect • Streaming platform – stay awake to learn more on this !
| Kafka Streams: what is it? • Streams API: no need to deal with the Kafka Consumer and Producer APIs explicitly • Use cases – big data, fast data, microservices, monoliths etc. • Piggybacks on Kafka for scalability & fault tolerance • One-record-at-a-time processing (no micro-batching) • Separate infra isn’t mandatory (unlike Spark, Storm etc.) – deploy (and scale) anywhere – it’s just a Java app after all! • Programming styles: high-level (fluent DSL) and low-level (Processor) APIs • Stateful processing support + Interactive Queries • Windowing, aggregations, joins etc.
| Scaling out is not the only option • Techniques – Scale OUT – more instances – Scale UP – more threads • Max parallelism – capped by the no. of topic partitions, so useful instances = [no. of topic partitions / no. of threads per instance], e.g. 50 partitions / 5 threads per instance = 10 instances https://issues.apache.org/jira/browse/KAFKA-5683
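The sizing arithmetic on this slide can be sketched in a few lines of Java. This is only a back-of-the-envelope helper under stated assumptions: `StreamsSizing`, `instancesNeeded` and `idleThreads` are hypothetical names (not part of any Kafka API), and the model assumes a single input topic where each partition is handled by at most one stream thread, so threads beyond the partition count have nothing to do.

```java
/** Back-of-the-envelope sizing for a Kafka Streams app (hypothetical helper, not a Kafka API). */
final class StreamsSizing {

    /** Instances needed so every partition can get its own stream thread (ceiling division). */
    static int instancesNeeded(int partitions, int threadsPerInstance) {
        if (partitions <= 0 || threadsPerInstance <= 0) {
            throw new IllegalArgumentException("partitions and threadsPerInstance must be positive");
        }
        return (partitions + threadsPerInstance - 1) / threadsPerInstance;
    }

    /** Threads left without work once every partition already has a thread (assumed model). */
    static int idleThreads(int partitions, int instances, int threadsPerInstance) {
        return Math.max(0, instances * threadsPerInstance - partitions);
    }

    public static void main(String[] args) {
        System.out.println(instancesNeeded(50, 5)); // 10, the slide's example
        System.out.println(idleThreads(50, 12, 5)); // 10 of the 60 threads would sit idle
    }
}
```

Under this model, adding instances or threads past the partition count buys nothing, which is why the partition count is usually chosen with future scale in mind.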
| Stateful stream processing with Kafka Streams State stores • Conceptually: lightweight embedded database within your stream processing layer to store ‘intermediate’ processing state (state is local to each app instance) • Options: in-memory, persistent (RocksDB), custom store (e.g. external DB) • State stores expose their internals using Interactive Queries Interactive queries • No additional data store.. Just ask your app ! • Needs some dev work to make your app (interactively) query-able
| Stateful processing & (interactive) querying • Kafka feeds App Instance 1 (machine1:8080) and App Instance 2 (machine2:8080), each with its own local state stores • Instances are discovered via the application.server config + StreamsMetadata API • An external app calls a custom RPC layer (e.g. REST API) to query and get back the ‘complete’ state
| Redis: the State Store
| Hello Redis • Stands for: RE(mote) DI(ctionary) S(erver) • Versatile data structure server (written in C) • Focus on in-memory with (tunable) persistence • Not just any KV store • Keys – From a simple string to binary – Max 512 MB (same for values) – Can be expired • Values – any of the following – String, List, Hash – Set, Sorted Set – Geospatial, HyperLogLog – etc.
| Redis data structures • Sorted Sets – Each element has an associated score (basis for sort) – Basic Ops: ZADD, ZINCRBY, ZREM, ZSCORE, ZCARD – View: ZRANGEBYSCORE, ZREVRANGEBYSCORE – Ranking: ZRANK, ZREVRANK • Lists – To be specific: a Linked List – Operations at head (LPUSH) & tail (RPUSH) are O(1), search by index is O(N) – LRANGE, RPOP, LPOP to extract data & LTRIM to cap the size – Blocking ops: BLPOP, BRPOP https://redis.io/commands
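To make the sorted-set semantics above concrete, here is a minimal in-memory sketch in plain Java. It is not a Redis client and talks to no server: `SortedSetSketch` and its methods are hypothetical names that only mimic what ZINCRBY, ZSCORE, ZCARD and ZREVRANK return, including Redis's rule that members with equal scores are ordered lexicographically. A real app would issue these commands through a client such as Jedis.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis sorted set (hypothetical class, NOT a Redis client). */
final class SortedSetSketch {
    private final Map<String, Double> scores = new HashMap<>();

    /** Mimics ZINCRBY: adds delta to the member's score, starting from 0 if absent. */
    double incrBy(String member, double delta) {
        return scores.merge(member, delta, Double::sum);
    }

    /** Mimics ZSCORE: the member's score, or null if the member is absent. */
    Double score(String member) {
        return scores.get(member);
    }

    /** Mimics ZCARD: the number of members. */
    int card() {
        return scores.size();
    }

    /**
     * Mimics ZREVRANK: 0-based position with the highest score first.
     * Returns -1 for an absent member (Redis itself returns nil).
     * This scan is O(N); it models the result only, not Redis's internals.
     */
    int revRank(String member) {
        Double own = scores.get(member);
        if (own == null) {
            return -1;
        }
        int rank = 0;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            int byScore = Double.compare(e.getValue(), own);
            // On equal scores, reverse order puts the lexicographically greater member first.
            if (byScore > 0 || (byScore == 0 && e.getKey().compareTo(member) > 0)) {
                rank++;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        SortedSetSketch hashtags = new SortedSetSketch();
        hashtags.incrBy("java", 1);
        hashtags.incrBy("kafka", 1);
        hashtags.incrBy("kafka", 2);
        System.out.println(hashtags.score("kafka"));   // 3.0
        System.out.println(hashtags.revRank("kafka")); // 0 (top of the leaderboard)
        System.out.println(hashtags.revRank("java"));  // 1
    }
}
```

Incrementing a per-member score and reading back the ranking is the pattern that makes sorted sets a natural fit for a live "top N" dashboard.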
| State store FAQs • Redis vs Kafka Streams state store – Horses for courses! • Can we combine both ? – Depending on the use case, yes! • Oh and you can also use the Cache which comes with Oracle Application Container Cloud ! – (Yet another) Blog - http://bit.ly/2yEN35q
| Dashboard app: Oracle Application Container Cloud • JAX-RS & (Jersey) Server Sent Events • CDI: Jedis (Redis) client via @Produces • EJB: TimerService and @Asynchronous • Others: Jackson Note: SSE and JSON-B are available in Java EE 8