                     no           recomputed
MEMORY_AND_DISK      no (memory)  store on disk
MEMORY_ONLY_SER      yes          recomputed
MEMORY_AND_DISK_SER  yes          store on disk
DISK_ONLY            yes          store on disk
using Java’s ObjectOutputStream framework
• Kryo serialization: Spark can also use the Kryo library (version 2) to serialize objects more quickly.
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
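For contrast with Kryo, here is a minimal sketch of what the Java ObjectOutputStream path does, using only the JDK. The class name JavaSerializationSketch, the User type, and its fields are hypothetical and exist only for this illustration; the sketch does not touch Spark itself. It writes one small object to a byte array, reads it back, and prints the serialized size. The stream carries class and field metadata along with the data, which is one reason the output is larger than the raw values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class JavaSerializationSketch {

    // Hypothetical record type, used only for this illustration.
    public static final class User implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String name;
        public final int age;

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Serialize any Serializable value to bytes with ObjectOutputStream.
    public static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed (and flushed) before the bytes are read out.
        return buffer.toByteArray();
    }

    // Rebuild an object from bytes produced by serialize().
    public static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        User user = new User("Ada", 36);
        byte[] bytes = serialize(user);
        User copy = (User) deserialize(bytes);

        System.out.println("round trip: " + copy.name + ", " + copy.age);
        // The raw payload is a 3-character name plus a 4-byte int; the
        // serialized form also includes the class name and field descriptors.
        System.out.println("serialized size in bytes: " + bytes.length);
    }
}
```

Running main prints the restored fields and the byte count, so the metadata overhead can be compared against the few bytes of actual data.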
default serializer for generic Records, which will include a lot of unneeded data in each record.
• If the user registers the schemas, then the schema's fingerprint will be sent
user = new GenericData.Record(schema);
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
  .record("testRecord").fields()
  .requiredString("data")
  .endRecord()
conf.registerAvroSchemas(schema)
val record = new Record(schema)
record.put("data", "test data")