
London Apache Kafka Meetup (Jan 2017)

Landoop
January 18, 2017


Landoop presents how to simplify your ETL process using Kafka Connect for the (E) extract and (L) load stages. We introduce KCQL, the Kafka Connect Query Language, and show how it can simplify fast-data (ingress and egress) pipelines: how KCQL can be used to set up Kafka connectors for popular in-memory and analytical systems, with live demos using Hazelcast, Redis and InfluxDB; how to get started with a fast-data Docker Kafka development environment; and how to enhance your existing Cloudera (Hadoop) clusters with fast-data capabilities.

http://landoop.com
https://github.com/landoop/
http://schema-registry-ui.landoop.com
http://kafka-topics-ui.landoop.com
http://kafka-connect-ui.landoop.com
https://fast-data-dev.demo.landoop.com/

Cloudera CSD documentation: http://docs.landoop.com

Transcript

  1. @chalkiopoulos. Open Source contributor: Big Data projects in Media, Betting, Retail and Investment Banks in London. Book author, "Programming MapReduce with Scalding". Founder of Landoop.
  2. Data is produced from a source and consumed to a sink.
     [Diagram: Data Source -> Kafka Connect -> KAFKA -> Kafka Connect -> Data Sink, with stream processing around Kafka]
  3. Developers don’t care about:
     - moving data to/from sink/source
     - supporting delivery semantics
     - offset management
     - serialization / de-serialization
     - partitioning / scalability
     - fault tolerance / fail-over
     - Schema Registry integration
     Developers care about:
     - domain-specific transformations
  4. CONNECTORS. Kafka Connect’s framework allows developers to create connectors that copy data to/from other systems just by writing configuration files and submitting them to Connect, with no code necessary.
  5. Connector configurations are key-value mappings:
     name             the connector’s unique name
     connector.class  the connector’s Java class
     tasks.max        the maximum number of tasks to create
     topics           the list of topics (to source or sink data)
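Put together, such a configuration is just a small properties file. A minimal sketch of a sink configuration in this key-value form (the connector class shown is illustrative, taken from the Stream Reactor naming scheme):

```properties
# Hypothetical sink connector configuration
name=fx-redis-sink
connector.class=com.datamountaineer.streamreactor.connect.redis.sink.RedisSinkConnector
tasks.max=1
topics=yahooFX-topic
```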
  6. Introducing a query language for the connectors:
     name             the connector’s unique name
     connector.class  the connector’s Java class
     tasks.max        the maximum number of tasks to create
     topics           the list of topics (to source or sink data)
     query            KCQL query specifying fields/actions for the target system
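The extra query entry carries the KCQL statement alongside the standard keys. A sketch, assuming the generic `query` key from the slide (the Stream Reactor connectors actually use connector-specific property names, e.g. a `connect.redis.*` prefix):

```properties
# Hypothetical configuration with a KCQL query attached
name=fx-redis-sink
connector.class=com.datamountaineer.streamreactor.connect.redis.sink.RedisSinkConnector
tasks.max=1
topics=yahooFX-topic
query=INSERT INTO FXSortedSet SELECT symbol, price FROM yahooFX-topic STOREAS SortedSet(score=ts)
```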
  7. KCQL, the Kafka Connect Query Language, is a SQL-like syntax allowing streamlined configuration of Kafka sink connectors, and then some more. Example: project fields, rename or ignore them, and further customise in plain text:
     INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic;
     INSERT INTO audits SELECT * FROM AuditsTopic;
     INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
     INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
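Since a KCQL statement is just a string inside the connector configuration, the connector parses it to learn the target, the projected fields and the source topic. A minimal sketch of that idea (hypothetical, not the real KCQL parser, and covering only the simple INSERT form):

```python
# Hypothetical sketch of KCQL parsing: pull the target, the projected
# fields and the source topic out of a simple INSERT statement.
import re

KCQL = re.compile(
    r"INSERT\s+INTO\s+(?P<target>\S+)\s+"
    r"SELECT\s+(?P<fields>.+?)\s+"
    r"FROM\s+(?P<topic>[^\s;]+)",
    re.IGNORECASE,
)

def parse(statement: str):
    """Return (target, fields, topic) for a simple KCQL INSERT statement."""
    m = KCQL.match(statement.strip())
    if m is None:
        raise ValueError(f"not a KCQL INSERT statement: {statement!r}")
    fields = [f.strip() for f in m.group("fields").split(",")]
    return m.group("target"), fields, m.group("topic")

target, fields, topic = parse(
    "INSERT INTO transactions "
    "SELECT field1 AS column1, field2 AS column2, field3 "
    "FROM TransactionTopic;"
)
```

Running `parse` on the first example above yields target `transactions`, fields `field1 AS column1`, `field2 AS column2`, `field3`, and topic `TransactionTopic`.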
  8. KCQL: what does it look like? While integrating Kafka with in-memory data grids, key-value stores, document stores, NoSQL, search and other systems:
     INSERT INTO $TARGET
     SELECT *|columns (i.e. col1,col2 | col1 AS column1,col2)
     FROM $TOPIC_NAME
     [ IGNORE columns ]
     [ AUTOCREATE ]
     [ PK columns ]
     [ AUTOEVOLVE ]
     [ BATCH = N ]
     [ CAPITALIZE ]
     [ INITIALIZE ]
     [ PARTITIONBY cola[,colb] ]
     [ DISTRIBUTEBY cola[,colb] ]
     [ CLUSTERBY cola[,colb] ]
     [ TIMESTAMP cola|sys_current ]
     [ STOREAS $YOUR_TYPE([key=value, .....]) ]
     [ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ]
  9. Why KCQL?
     - Topic to target mapping
     - Field selection
     - Auto creation
     - Auto evolution
     - Error policies
     - Multiple KCQLs / topic
     - Field extraction
     - Access to Key & Metadata
  10. KCQL | Example Kafka topic with IoT data:
     { "sensor_id": "01", "temperature": 52.7943, "ts": 1484648810 }
     { "sensor_id": "02", "temperature": 28.8597, "ts": 1484648810 }

     INSERT INTO sensor_reliabletopic
     SELECT sensor_id, temperature, ts
     FROM coap_sensor_topic
     WITHFORMAT AVRO
     STOREAS RELIABLE_TOPIC

     INSERT INTO sensor_ringbuffer
     SELECT sensor_id, temperature, ts
     FROM coap_sensor_topic
     WITHFORMAT JSON
     STOREAS RING_BUFFER
  11. KCQL | Example Kafka topic with FX data:
     { "symbol": "USDGBP", "price": 0.7943, "ts": 1484648810 }
     { "symbol": "EURGBP", "price": 0.8597, "ts": 1484648810 }

     Sorted Set -> { value : score }, e.g. B:1 A:2 D:3 C:20

     INSERT INTO FXSortedSet
     SELECT symbol, price
     FROM yahooFX-topic
     STOREAS SortedSet(score=ts)

     SELECT price
     FROM yahooFX-topic
     PK symbol
     STOREAS SortedSet(score=ts)
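The mapping that STOREAS SortedSet(score=ts) implies can be sketched in plain Python (no Redis involved, just illustrating the { value : score } shape): each message is projected down to the selected fields, and the ts field becomes the member's score.

```python
# Plain-Python sketch of STOREAS SortedSet(score=ts): each Kafka message
# becomes a { member : score } entry, with the 'ts' field as the score.
import json

messages = [
    '{"symbol": "USDGBP", "price": 0.7943, "ts": 1484648810}',
    '{"symbol": "EURGBP", "price": 0.8597, "ts": 1484648810}',
]

sorted_set = {}  # member -> score, like a Redis sorted set
for raw in messages:
    record = json.loads(raw)
    # SELECT symbol, price: project only the chosen fields into the member
    member = json.dumps({"symbol": record["symbol"], "price": record["price"]})
    sorted_set[member] = record["ts"]  # score=ts

# Members ordered by score, as ZRANGE would list them
ordered = sorted(sorted_set, key=sorted_set.get)
```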
  12. Stream Reactor connectors support KCQL: kafka-connect-blockchain, kafka-connect-bloomberg, kafka-connect-cassandra, kafka-connect-coap, kafka-connect-druid, kafka-connect-elastic, kafka-connect-ftp, kafka-connect-hazelcast, kafka-connect-hbase, kafka-connect-influxdb, kafka-connect-jms, kafka-connect-kudu, kafka-connect-mongodb, kafka-connect-mqtt, kafka-connect-redis, kafka-connect-rethink, kafka-connect-voltdb, kafka-connect-yahoo.
     Source: https://github.com/datamountaineer/stream-reactor
     Integration tests: http://coyote.landoop.com/connect/
  13. DEMO: Kafka Connect & InfluxDB. We’ll need: Zookeeper, Kafka Broker, Schema Registry, Kafka Connect Distributed, Kafka REST Proxy. We’ll also use: Stream Reactor connectors, Landoop Fast Data web tools.

     docker run --rm -it \
       -p 2181:2181 -p 3030:3030 -p 8081:8081 \
       -p 8082:8082 -p 8083:8083 -p 9092:9092 \
       -e ADV_HOST=192.168.99.100 \
       landoop/fast-data-dev

     We’ll generate some Avro messages:
     case class DeviceMeasurements(
       deviceId: Int, temperature: Int, moreData: String, timestamp: Long)
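For the demo, records of that shape just need plausible values. A hypothetical generator mirroring the DeviceMeasurements fields (in the demo such records would be serialised as Avro and published to a Kafka topic, e.g. through the REST Proxy):

```python
# Hypothetical generator of sample records matching the DeviceMeasurements
# case class: deviceId, temperature, moreData, timestamp.
import random
import time

def device_measurement(device_id: int) -> dict:
    return {
        "deviceId": device_id,
        "temperature": random.randint(-10, 45),  # assumed plausible range
        "moreData": f"payload-{device_id}",
        "timestamp": int(time.time() * 1000),    # epoch millis
    }

records = [device_measurement(i) for i in range(3)]
```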
  14. How do I deploy it?
     - apps
     - containers (Mesos, Kubernetes)
     - Hadoop integration *
     * state-less apps (schema registry, kafka connect) = container-friendly
     Available features: Kafka ecosystem, Stream Reactor connectors, Landoop web tools, monitoring & alerting, security features.
  15. Wrap up:
     - KCQL
     - Connectors
     - Kafka Web Tools
     - Automation & Integrations