Slide 1

Slide 1 text

Data Serialization Using Apache Avro
Presented By: Moh. Ropiyudin
KMK’s Tech Talk – 10 Nov 2017

Slide 2

Slide 2 text

Problem Statement (case: timeline backup, restore, feeds)

Key factors?
● Ability to grow
● Efficient
● Performance
● Simple: easy to use and maintain

Protocol to use?
● Native
● JSON
● Protobuf
● Avro → the chosen one

Slide 3

Slide 3 text

Important Consideration
[Comparison chart: JSON vs. Binary on Protocol Size (labelled "3x"), Protocol Time, and Protocol Ease of Use]

Slide 4

Slide 4 text

What is Avro?

Avro is a data serialization system. Avro provides:
● Rich data structures.
● A compact, fast, binary data format.
● A container file, to store persistent data.
● Simple integration with dynamic languages.

Developer(s): Apache Software Foundation

Slide 5

Slide 5 text

Schema Definition

A schema is represented in JSON:
● Primitive types: null, boolean, int, long, float, double, bytes, string
● Complex types: records, enums, arrays, maps, unions, and fixed

    {
      "namespace": "com.puter.avro",
      "type": "record",
      "name": "Employee",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "dob", "type": {"type": "long", "logicalType": "timestamp-millis"}},
        {"name": "height", "type": "int"},
        {"name": "previosCompany", "type": "string"},
        {"name": "favoriteColor", "type": ["string", "null"]}
      ]
    }
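Because a schema is plain JSON, it can be inspected with nothing more than a JSON parser. The sketch below uses a trimmed version of the Employee schema and sorts each field's type into the categories listed above. The helper `describe_type` is a hypothetical name introduced for illustration and is not part of any Avro library.

```ruby
require 'json'

# Primitive type names as listed on the slide.
PRIMITIVES = %w[null boolean int long float double bytes string].freeze

# Hypothetical helper: describe a field type as it appears in the schema JSON.
def describe_type(type)
  case type
  when String
    PRIMITIVES.include?(type) ? "primitive #{type}" : "named type #{type}"
  when Array
    # A JSON array in type position denotes a union.
    'union of ' + type.map { |t| describe_type(t) }.join(' | ')
  when Hash
    "complex #{type['type']}"
  end
end

# Trimmed Employee schema (assumption: only three of the fields are kept).
SCHEMA_JSON = <<~JSON
  {
    "namespace": "com.puter.avro",
    "type": "record",
    "name": "Employee",
    "fields": [
      {"name": "name", "type": "string"},
      {"name": "height", "type": "int"},
      {"name": "favoriteColor", "type": ["string", "null"]}
    ]
  }
JSON

schema = JSON.parse(SCHEMA_JSON)
summary = schema['fields'].map { |f| [f['name'], describe_type(f['type'])] }.to_h
summary.each { |name, description| puts "#{name}: #{description}" }
```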

Slide 6

Slide 6 text

Compact Binary Representation

Octal representation
Size: 352 Byte
RealData: 32 Byte

    {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favouriteNumber", "type": ["null", "long"]},
        {"name": "interests", "type": {"type": "array", "items": "string"}}
      ]
    }
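To see where the compactness comes from, here is a minimal hand-rolled sketch of Avro's binary encoding rules for the Person schema. It uses only the Ruby standard library, not the avro gem. The helper names (`encode_long`, `encode_string`, `encode_person`) and the sample values are assumptions chosen for illustration. With these particular values the encoded record comes to 32 bytes, against 82 bytes for the equivalent compact JSON.

```ruby
require 'json'

# Avro writes int and long values zigzag-encoded, then as a base-128 varint
# (7 bits per byte, low-order group first, high bit set on all but the last byte).
def encode_long(n)
  z = (n << 1) ^ (n >> 63)
  bytes = []
  loop do
    byte = z & 0x7f
    z >>= 7
    if z.zero?
      bytes << byte
      break
    end
    bytes << (byte | 0x80)
  end
  bytes.pack('C*')
end

# A string is its byte length (as a long) followed by the UTF-8 bytes.
def encode_string(s)
  data = s.encode('UTF-8').b
  encode_long(data.bytesize) + data
end

# A record is the concatenation of its fields in schema order: no names, no tags.
def encode_person(user_name, favourite_number, interests)
  out = ''.b
  out << encode_string(user_name)
  if favourite_number.nil?
    out << encode_long(0) # union branch 0 ("null") has no payload
  else
    out << encode_long(1) << encode_long(favourite_number) # branch 1 ("long")
  end
  unless interests.empty?
    out << encode_long(interests.size) # one block holding every item
    interests.each { |item| out << encode_string(item) }
  end
  out << encode_long(0) # a zero count ends the array
  out
end

# Sample values are assumptions, not taken from the slide.
encoded = encode_person('Martin', 1337, %w[daydreaming hacking])
json = JSON.generate('userName' => 'Martin', 'favouriteNumber' => 1337,
                     'interests' => %w[daydreaming hacking])

puts "Avro binary: #{encoded.bytesize} bytes"
puts "JSON:        #{json.bytesize} bytes"
```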

Slide 7

Slide 7 text

Top 3 Features of Avro
* Schema evolution
* Untagged data
* Dynamic typing

Slide 8

Slide 8 text

Schema Evolution
● Avro requires schemas when data is written or read.
● We can use different schemas for serialization and deserialization.
● Avro will handle the missing/extra/modified fields.
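The handling of missing and extra fields can be sketched in a few lines. This is a simplified illustration of the resolution rules, not the avro gem's implementation. Fields are matched by name: a field the reader does not know is dropped, and a field the writer did not have takes the reader's default. The function `resolve` and the sample schemas are hypothetical. Real Avro resolution also covers type promotion and aliases, which are left out here.

```ruby
# Hypothetical, simplified resolver: maps a record written with writer_fields
# onto the shape described by reader_fields.
def resolve(record, writer_fields, reader_fields)
  written = writer_fields.map { |f| f['name'] }
  reader_fields.each_with_object({}) do |field, result|
    name = field['name']
    if written.include?(name)
      result[name] = record[name]       # present in both schemas: keep the value
    elsif field.key?('default')
      result[name] = field['default']   # missing in writer: use reader's default
    else
      raise ArgumentError, "no value or default for field #{name}"
    end
  end
  # Writer fields the reader does not declare are never copied, so they are dropped.
end

# Old (writer) schema has "age"; new (reader) schema replaces it with "verified".
writer_fields = [
  { 'name' => 'username', 'type' => 'string' },
  { 'name' => 'age', 'type' => 'int' }
]
reader_fields = [
  { 'name' => 'username', 'type' => 'string' },
  { 'name' => 'verified', 'type' => 'boolean', 'default' => false }
]

old_record = { 'username' => 'john', 'age' => 25 }
resolved = resolve(old_record, writer_fields, reader_fields)
puts resolved.inspect
```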

Slide 9

Slide 9 text

Untagged Data
● Because a schema accompanies the binary data, each datum can be written without per-field overhead such as field names or type tags.
● The result is a more compact data encoding and faster data processing.
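The flip side of untagged data is that the bytes can only be interpreted with the schema in hand. The sketch below decodes a six-byte record by walking the schema's field list in order. It is a minimal illustration with the Ruby standard library only. The helpers `read_long`, `read_string`, and `read_record` are hypothetical names, and only the `string`, `int`, and `long` types are handled.

```ruby
require 'stringio'

# Read a base-128 varint and undo the zigzag encoding Avro uses for int and long.
def read_long(io)
  result = 0
  shift = 0
  loop do
    byte = io.readbyte
    result |= (byte & 0x7f) << shift
    break if (byte & 0x80).zero?
    shift += 7
  end
  (result >> 1) ^ -(result & 1)
end

# A string is a length (long) followed by that many UTF-8 bytes.
def read_string(io)
  io.read(read_long(io)).force_encoding('UTF-8')
end

# The schema alone says what each run of bytes means and where it ends.
def read_record(io, fields)
  fields.each_with_object({}) do |field, record|
    record[field['name']] =
      case field['type']
      when 'string' then read_string(io)
      when 'int', 'long' then read_long(io)
      else raise ArgumentError, "unsupported type #{field['type']}"
      end
  end
end

fields = [
  { 'name' => 'username', 'type' => 'string' },
  { 'name' => 'age', 'type' => 'int' }
]

# Length 4 (zigzag 8), the bytes of "john", then 25 (zigzag 50 = 0x32).
bytes = "\x08john\x32".b
record = read_record(StringIO.new(bytes), fields)
puts record.inspect
```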

Slide 10

Slide 10 text

Dynamic Typing
● Serialization and deserialization work without code generation.
● This suits dynamically typed languages such as Ruby and Python.
● Code generation is still available in Avro for statically typed languages as an optional optimization.

    require 'avro'

    SCHEMA = <<-JSON
    { "type": "record",
      "name": "User",
      "fields" : [
        {"name": "username", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "verified", "type": "boolean", "default": false}
      ]}
    JSON

    # Write two records to an Avro container file; no generated classes needed.
    file = File.open('data.avr', 'wb')
    schema = Avro::Schema.parse(SCHEMA)
    writer = Avro::IO::DatumWriter.new(schema)
    dw = Avro::DataFile::Writer.new(file, writer, schema)
    dw << {"username" => "john", "age" => 25, "verified" => true}
    dw << {"username" => "ryan", "age" => 23, "verified" => false}
    dw.close

Slide 11

Slide 11 text

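Serialize – Deserialize (example)

    import java.io.ByteArrayInputStream
    import java.io.ByteArrayOutputStream
    import org.apache.avro.io.DecoderFactory
    import org.apache.avro.io.EncoderFactory
    import org.apache.avro.specific.SpecificDatumReader
    import org.apache.avro.specific.SpecificDatumWriter
    import org.joda.time.DateTime

    // Employee is the class Avro generates from the Employee schema.
    val employee = Employee.newBuilder().apply {
        name = "name1"
        dob = DateTime.parse("2017-10-26T18:00:00Z")
        previosCompany = "previousCompany1"
        favoriteColor = "favoriteColor1"
        height = 10
    }.build()

    // Serialize to a byte array.
    val out = ByteArrayOutputStream()
    out.use {
        val encoder = EncoderFactory.get().directBinaryEncoder(out, null)
        val writer = SpecificDatumWriter<Employee>(Employee.getClassSchema())
        writer.write(employee, encoder)
        encoder.flush()
    }
    val employeeByteData = out.toByteArray()

    // Deserialize from the same bytes.
    val input = ByteArrayInputStream(employeeByteData)
    val decoder = DecoderFactory.get().directBinaryDecoder(input, null)
    val reader = SpecificDatumReader<Employee>(Employee.getClassSchema())
    val afterDeserializeEmployee = reader.read(null, decoder)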

Slide 12

Slide 12 text

It’s DEMO time!!
