Data services: Processing big data the microservice way

Big data processing, microservices, and cloud-native technology are a match made in computing heaven: microservices can be used to build a flexible, scalable, distributed system of loosely coupled data processing tasks, called data services.

Mario-Leander Reimer explores key JEE technologies that can be used to build JEE-powered data services and walks you through implementing the individual data processing tasks of a simplified showcase application. You'll then deploy and orchestrate the individual data services using OpenShift, illustrating the scalability of the overall processing pipeline. The context and content are taken from a real-world project for a major German car manufacturer, implementing a microservices-based processing pipeline that uses car-related event data (sensor data, traffic events, and other real-time data) for a traffic information management and route optimization system. #CloudNativeNerd @OReillySACon

M.-Leander Reimer

February 27, 2018

Transcript

  1. Mario-Leander Reimer, Chief Technologist, QAware GmbH. 27.02.18

    Contact details: Mail: mario-[email protected], Twitter: @LeanderReimer, Github: https://github.com/lreimer/data-services-javaee7

    Developer && Architect, 20+ years of experience, #CloudNativeNerd, Open Source Enthusiast
  2. BIG DATA. All things distributed: distributed processing, distributed databases.

    FAST DATA. Low latency and high throughput: stream processing, messaging, event-driven.

    SMART DATA. Data to information: machine (deep) learning, advanced statistics, natural language processing.
  4. Components All Along the Software Lifecycle.

    DESIGN (Design Components): complexity unit, data integrity unit, coherent and cohesive features unit, decoupled unit.

    BUILD (Dev Components): planning unit, team assignment unit, knowledge unit, development unit, integration unit.

    RUN (Ops Components): release unit, deployment unit, runtime unit (crash, slow-down, access), scaling unit.

    Mappings between the component types as labeled on the slide: 1:1 and n:1 (NEW!).
  5. Dev Components to Ops Components: ?:1. System, Subsystems, Components, Services. Good starting point.

    Decomposition Trade-Offs: Microservices, Nanoservices, Macroservices, Monolith.

    + More flexible to scale
    + Runtime isolation (crash, slow-down, …)
    + Independent releases, deployments, teams
    + Higher utilization possible
    - Distribution debt: Latency
    - Increasing infrastructure complexity
    - Increasing troubleshooting complexity
    - Increasing integration complexity
  6. The basic idea: Input – Processing – Output. Data processing using a graph of microservices.

    Sources (I1), Processors (P1 … Pn), Sinks (O1). Each node is a microservice (aka dataservice); the nodes are connected by message queues.
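The Input – Processing – Output idea can be sketched in plain Java. This is only an in-memory illustration under stated assumptions: `BlockingQueue` stands in for a real message queue, and the class name `PipelineSketch`, the `END` marker, and all method names are hypothetical, not taken from the showcase project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

/** In-memory stand-in for: source -> queue -> processor -> queue -> sink. */
final class PipelineSketch {

    /** Hypothetical end-of-stream marker so downstream stages know when to stop. */
    static final String END = "__END__";

    /** A pipeline stage that may block on a queue. */
    interface Stage {
        void run() throws InterruptedException;
    }

    /** Source: publishes every input message to its outbound queue. */
    static void source(List<String> input, BlockingQueue<String> out) throws InterruptedException {
        for (String message : input) {
            out.put(message);
        }
        out.put(END);
    }

    /** Processor: consumes from one queue, transforms, publishes to the next queue. */
    static void processor(BlockingQueue<String> in, BlockingQueue<String> out,
                          Function<String, String> step) throws InterruptedException {
        for (String message = in.take(); !END.equals(message); message = in.take()) {
            out.put(step.apply(message));
        }
        out.put(END);
    }

    /** Sink: drains the last queue into a list (a file or database in a real pipeline). */
    static List<String> sink(BlockingQueue<String> in) throws InterruptedException {
        List<String> result = new ArrayList<>();
        for (String message = in.take(); !END.equals(message); message = in.take()) {
            result.add(message);
        }
        return result;
    }

    /** Runs a stage on its own daemon thread, as an independent service would run. */
    static void start(Stage stage) {
        Thread thread = new Thread(() -> {
            try {
                stage.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        thread.setDaemon(true);
        thread.start();
    }

    /** Wires source, processor and sink together through two queues. */
    static List<String> run(List<String> input, Function<String, String> step) {
        BlockingQueue<String> q1 = new LinkedBlockingQueue<>();
        BlockingQueue<String> q2 = new LinkedBlockingQueue<>();
        start(() -> source(input, q1));
        start(() -> processor(q1, q2, step));
        try {
            return sink(q2);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining the sink", e);
        }
    }
}
```

Because the stages only share queues, each one could be replaced, scaled, or restarted without touching the others, which is the property the slide is after.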
  7. Possible messaging patterns applied for reliable and flexible communication between dataservices.

    Message Passing: P1, Q1, C1.
    Work Queue: P1, Q1, C1 … Cn.
    Publish/Subscribe: P1, Q1 … Qn, C1 … Cn.
    Remote Procedure Call: P1, C1, Q1, Q2.
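The difference between the work queue and publish/subscribe patterns comes down to who receives each message. The following is a minimal in-memory sketch, not broker code: the class and method names are hypothetical, and the round-robin distribution in `workQueue` is an assumption for illustration (a real broker hands messages to whichever competing consumer is ready).

```java
import java.util.ArrayList;
import java.util.List;

/** Contrasts delivery semantics of a work queue and publish/subscribe. */
final class MessagingPatternsSketch {

    /** Work queue: every message is delivered to exactly one of the consumers. */
    static List<List<String>> workQueue(List<String> messages, int consumers) {
        List<List<String>> inboxes = emptyInboxes(consumers);
        for (int i = 0; i < messages.size(); i++) {
            // Assumption: round-robin stands in for "next free consumer".
            inboxes.get(i % consumers).add(messages.get(i));
        }
        return inboxes;
    }

    /** Publish/subscribe: every subscriber receives its own copy of every message. */
    static List<List<String>> publishSubscribe(List<String> messages, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (String message : messages) {
            for (List<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
        return inboxes;
    }

    /** Creates one empty inbox per consumer. */
    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }
}
```

In JMS terms, the first method behaves like several consumers on one queue and the second like several subscribers on one topic.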
  8. The basic idea: Cloud-native platform for micro- and dataservices.

    Building blocks shown: cluster operating system, microservice platform, dataservice platform, dataservices, microservices, messaging, IMDG.
  9. Some Open Source Dataservice Platforms.

    Java EE 7 / 8: Standardized API with several open source implementations. Microservices: JavaEE micro container. Messaging: JMS, MQTT, Kafka, SQS. Platforms: Docker, Kubernetes, OpenShift, DC/OS.

    Kafka Streams: Stream processing tightly integrated with Kafka. Microservices: main(). Messaging: Kafka, Kafka Streams. Platforms: any that Kafka runs on.

    Lagom Framework: Open source by Lightbend. Microservices: Lagom, Play. Messaging: Akka. Platforms: ConductR, ???

    Spring Cloud Data Flow: Open source project based on the Spring stack. Microservices: Spring Boot, Spring Cloud Stream & Task. Messaging: Kafka, RabbitMQ. Platforms: PCF, Kubernetes, YARN, Mesos.
  10. Overview of Java EE 7 APIs suited for Dataservices.

    CDI Extensions, Web Fragments, Bean Validation 1.1, CDI 1.1, Managed Beans 1.0, JCA 1.7, JPA 2.2, JMS 2.0, JSP 2.3, EL 3.0, EJB 3.2, Batch 1.0, JSF 2.2, Interceptors 1.2, Mail 1.5, Common Annotations 1.3, JTA 1.2, JAX-WS 1.4, JAX-RS 2.0, Concurrency 1.0, JSON-P 1.0, WebSocket 1.1, JASPIC 1.1, JACC 1.5, Servlet 3.1, JCache 1.0
  11. Simple Message Driven Beans to receive messages. This also works for MQTT, Kafka, Amazon SQS, … For other JCA adapters visit https://github.com/payara/Cloud-Connectors

        @MessageDriven(activationConfig = {
            @ActivationConfigProperty(propertyName = "serverURIs", propertyValue = "tcp://eclipse-mosquitto:1883"),
            @ActivationConfigProperty(propertyName = "cleanSession", propertyValue = "false"),
            @ActivationConfigProperty(propertyName = "automaticReconnect", propertyValue = "true"),
            @ActivationConfigProperty(propertyName = "filePersistence", propertyValue = "false"),
            @ActivationConfigProperty(propertyName = "connectionTimeout", propertyValue = "30"),
            @ActivationConfigProperty(propertyName = "maxInflight", propertyValue = "3"),
            @ActivationConfigProperty(propertyName = "keepAliveInterval", propertyValue = "5"),
            @ActivationConfigProperty(propertyName = "topicFilter", propertyValue = "de/qaware/oss/cloud/mqtt"),
            @ActivationConfigProperty(propertyName = "qos", propertyValue = "1")
        })
        public class MqttSourceMDB implements MQTTListener {

            @OnMQTTMessage
            @TransactionAttribute(value = TransactionAttributeType.REQUIRED)
            @Transactional(Transactional.TxType.REQUIRED)
            public void onMQTTMessage(String topic, MqttMessage message) {
                JsonReader reader = Json.createReader(new ByteArrayInputStream(message.getPayload()));
                JsonObject jsonObject = reader.readObject();
                // TODO do stuff with the JSON payload
            }
        }
  12. Use JSON as payload format for loose coupling. Use JSON-P to build your JsonObject and JsonArray instances. Use JSON-P to read JSON payloads. Use JSON-P to traverse and access JSON objects and arrays. Use JSON-P to implement tolerant reader pattern. Upcoming in Java EE 8: JSON Pointers and JSON Patch add even more flexibility. Use Mime-Type versioning for your JSON messages if required. Use JMS message selectors to filter on JMS type and content type. Alternatively use flexible binary protocols like ProtoBuf.

    Building and sending the message:

        JsonObject currentWeather = Json.createObjectBuilder()
            .add("city", "London")
            .add("weather", "Drizzle")
            .build();

        StringWriter payload = new StringWriter();
        JsonWriter jsonWriter = Json.createWriter(payload);
        jsonWriter.writeObject(currentWeather);

        TextMessage msg = session.createTextMessage(payload.toString());
        msg.setJMSType("CurrentWeather");
        msg.setStringProperty("contentType", "application/vnd.weather.v1+json");

    Filtering on the consumer side:

        @ActivationConfigProperty(propertyName = "messageSelector",
            propertyValue = "(JMSType = 'CurrentWeather') AND (contentType = 'application/vnd.weather.v1+json')")

    Reading the payload:

        JsonReader reader = Json.createReader(new StringReader(body));
        JsonObject jsonObject = reader.readObject();
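The Mime-Type versioning and selector advice can be illustrated without a JMS provider. The sketch below uses only the JDK: `version` extracts the version from a vendor content type such as `application/vnd.weather.v1+json`, and `accepts` mirrors in plain Java what the message selector from the slide expresses. The class name, method names, and the regular expression are assumptions made for this illustration; in a real deployment the broker evaluates the selector.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java illustration of content type versioning and selector-style filtering. */
final class ContentTypeVersionSketch {

    /** Matches vendor types of the form application/vnd.<name>.v<number>+json. */
    private static final Pattern VERSIONED =
            Pattern.compile("application/vnd\\.([a-z]+)\\.v(\\d+)\\+json");

    /** Returns the version encoded in the content type, or empty if it is unversioned. */
    static OptionalInt version(String contentType) {
        Matcher matcher = VERSIONED.matcher(contentType);
        return matcher.matches()
                ? OptionalInt.of(Integer.parseInt(matcher.group(2)))
                : OptionalInt.empty();
    }

    /**
     * Same condition as the selector
     * (JMSType = 'CurrentWeather') AND (contentType = 'application/vnd.weather.v1+json').
     */
    static boolean accepts(String jmsType, Map<String, String> properties) {
        return "CurrentWeather".equals(jmsType)
                && "application/vnd.weather.v1+json".equals(properties.get("contentType"));
    }
}
```

A consumer written against v1 then ignores v2 messages instead of failing on them, which lets producers introduce a new payload version while older consumers keep running.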
  13. Overview of the demo showcase. https://github.com/lreimer/data-services-javaee7

    Sources: JDBC Source, REST Source, MQTT Source, Kafka Source, CSV Source.
    Processors: Weather Processor, Location Processor.
    Sinks: Weather File Sink, Weather DB Sink.
    Technologies annotated on the components: JAX-RS, JSON-P, JMS, JBatch, JCache, JPA, CSV.
    Infrastructure: In-Memory Datagrid, Topic, Queue, Topic.
  14. Most important Kubernetes concepts.

    Services are an abstraction for a logical collection of pods.
    Pods are the smallest unit of compute in Kubernetes.
    Deployments are an abstraction used to declare and update pods, RCs, …
    Replica Sets ensure that the desired number of pod replicas is running.
    Labels are key/value pairs used to identify Kubernetes resources.
  15. Example K8s Deployment Definition.

        apiVersion: extensions/v1beta1
        kind: Deployment
        metadata:
          name: location-processor
        spec:
          replicas: 2
          strategy:
            type: RollingUpdate
          template:
            metadata:
              labels:
                io.kompose.service: location-processor
            spec:
              containers:
              - name: location-processor
                image: lreimer/location-processor:1.0
                ports:
                - containerPort: 8080
                - containerPort: 5701
  16. Resource Constraints Definition.

        resources:
          # Define resources to help K8S scheduler
          # CPU is specified in units of cores
          # Memory is specified in units of bytes
          # required resources for a Pod to be started
          requests:
            memory: "196Mi"
            cpu: "250m"
          # a container exceeding its memory limit is killed and restarted;
          # CPU usage above the limit is throttled
          limits:
            memory: "512Mi"
            cpu: "500m"
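The units in the resource definition are easy to misread: `250m` means 250 millicores (a quarter of a core) and `196Mi` means 196 × 2^20 bytes. The following sketch converts such values; it is a hypothetical helper that covers only a small subset of the Kubernetes quantity syntax (the `m` suffix for CPU and the binary suffixes `Ki`, `Mi`, `Gi` for memory), not a replacement for the parser in a Kubernetes client library.

```java
/** Converts a small subset of Kubernetes resource quantities to base units. */
final class QuantitySketch {

    /** CPU: "250m" is 250 millicores, "0.5" is half a core, "2" is two cores. */
    static long cpuMillicores(String quantity) {
        if (quantity.endsWith("m")) {
            return Long.parseLong(quantity.substring(0, quantity.length() - 1));
        }
        return Math.round(Double.parseDouble(quantity) * 1000);
    }

    /** Memory: supports the binary suffixes Ki, Mi, Gi and plain byte counts. */
    static long memoryBytes(String quantity) {
        if (quantity.endsWith("Ki")) {
            return leadingNumber(quantity) * 1024L;
        }
        if (quantity.endsWith("Mi")) {
            return leadingNumber(quantity) * 1024L * 1024L;
        }
        if (quantity.endsWith("Gi")) {
            return leadingNumber(quantity) * 1024L * 1024L * 1024L;
        }
        return Long.parseLong(quantity);
    }

    /** Strips the two-character suffix and parses the remaining integer. */
    private static long leadingNumber(String quantity) {
        return Long.parseLong(quantity.substring(0, quantity.length() - 2));
    }
}
```

With the values from the slide, the container requests a quarter of a core and may use at most half a core, and its memory limit is a little more than 2.6 times its memory request.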
  17. Liveness and Readiness Probes for Antifragility.

        # container will receive requests if probe succeeds
        readinessProbe:
          httpGet:
            path: /api/application.wadl
            port: 8080
          initialDelaySeconds: 30
          timeoutSeconds: 5
        # container will be killed if probe fails
        livenessProbe:
          httpGet:
            path: /admin/health
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
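An `httpGet` probe counts as successful when the endpoint answers with a status code of at least 200 and below 400. The sketch below shows the contract from the service side using the HTTP server bundled with the JDK; it is not how the showcase application implements `/admin/health` (there the application server provides the endpoint). The class name, the `healthy` flag, and the `probe` helper are assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

/** Minimal health endpoint plus a client that behaves like an httpGet probe. */
final class HealthEndpointSketch {

    /** Flipped to false by the application when it can no longer do its work. */
    final AtomicBoolean healthy = new AtomicBoolean(true);
    private final HttpServer server;

    private HealthEndpointSketch(HttpServer server) {
        this.server = server;
    }

    /** Starts the endpoint; port 0 lets the operating system pick a free port. */
    static HealthEndpointSketch start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            HealthEndpointSketch endpoint = new HealthEndpointSketch(server);
            server.createContext("/admin/health", exchange -> {
                boolean up = endpoint.healthy.get();
                byte[] body = (up ? "UP" : "DOWN").getBytes(StandardCharsets.UTF_8);
                // 200 passes the probe, 503 fails it.
                exchange.sendResponseHeaders(up ? 200 : 503, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return endpoint;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    /** Issues the same kind of GET request a probe would and returns the status code. */
    static int probe(int port) {
        try {
            URL url = URI.create("http://127.0.0.1:" + port + "/admin/health").toURL();
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setConnectTimeout(5000); // mirrors timeoutSeconds: 5
            connection.setReadTimeout(5000);
            try {
                return connection.getResponseCode();
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the liveness check cheap and independent of downstream systems is a common design choice, since a failing liveness probe restarts the container even when the actual fault lies elsewhere.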
  18. Example K8s Service Definition.

        apiVersion: v1
        kind: Service
        metadata:
          labels:
            io.kompose.service: location-processor
          name: location-processor
        spec:
          type: NodePort
          ports:
          - name: "http"
            port: 8080
            targetPort: 8080
          selector:
            io.kompose.service: location-processor
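The `selector` is what ties the Service to the pods created by the Deployment: a pod is selected when its labels contain every key/value pair of the selector, and extra labels on the pod do not matter. A minimal sketch of that matching rule, with a hypothetical class and method name:

```java
import java.util.Map;

/** Equality-based label selection as used by a Service selector. */
final class LabelSelectorSketch {

    /** True if the pod carries every key/value pair required by the selector. */
    static boolean matches(Map<String, String> selector, Map<String, String> podLabels) {
        return podLabels.entrySet().containsAll(selector.entrySet());
    }
}
```

For the definitions above, a pod labeled `io.kompose.service: location-processor` is selected by the `location-processor` Service, whereas a pod labeled `io.kompose.service: weather-processor` (a hypothetical value) is not.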
  19. Java EE powered Dataservices on Kubernetes in Action.

    Programmable MIDI Controller. Visualizes Deployments and Pods. Scales Deployments. Supports K8s, OpenShift, DC/OS. http://github.com/qaware/kubepad/