CS + SWE Computer science: a branch of mathematics. Concerned with algorithms, data structures, and measuring computation. Software engineering: the work of developing and operating software. Concerned with quality, agility, planning, mentorship, and collaboration.
Brazen • Java already has a perfectly good I/O library, java.io • Java already has a perfectly good java.io replacement, java.nio • A blocking library in the era of non-blocking • Switched to Kotlin in 2018!
abstract class InputStream { /** * Consumes bytes from this stream and copy them to [sink]. * Returns the number of bytes that were read, or -1 if this * input stream is exhausted. */ abstract fun read(sink: ByteArray): Int } abstract class OutputStream { /** * Copies all the data in [source] to this. */ abstract fun write(source: ByteArray) }
OkHttp’s Job Was Easy 1. Encode an HTTP request as a ByteArray 2. Write that ByteArray to a socket’s OutputStream 3. Read a ByteArray from a socket’s InputStream 4. Decode that ByteArray as an HTTP response
Adding HTTP/2 • HTTP/2 is multiplexed: 1. Chop each HTTP request into frames 2. Write each frame to the socket’s OutputStream 3. Read frames from the socket’s InputStream 4. Assemble frames into an HTTP response • Frames from different responses are interleaved!
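The four steps above can be sketched with plain byte arrays. This is a minimal illustration, not OkHttp's code: `Frame`, `chop`, and `assemble` are hypothetical names, and real HTTP/2 frames also carry a type, flags, and a length header.

```kotlin
// Hypothetical frame: just a stream ID, a payload, and an end-of-stream flag.
class Frame(val streamId: Int, val payload: ByteArray, val endOfStream: Boolean)

/** Chops [message] into frames carrying at most [maxPayload] bytes each. */
fun chop(streamId: Int, message: ByteArray, maxPayload: Int): List<Frame> {
  require(maxPayload > 0)
  val frames = mutableListOf<Frame>()
  var pos = 0
  do {
    val end = minOf(pos + maxPayload, message.size)
    frames += Frame(streamId, message.copyOfRange(pos, end), endOfStream = end == message.size)
    pos = end
  } while (pos < message.size)
  return frames
}

/** Reassembles interleaved frames into one message per stream ID. */
fun assemble(frames: List<Frame>): Map<Int, ByteArray> {
  val messages = mutableMapOf<Int, ByteArray>()
  for (frame in frames) {
    // Each append copies everything received so far: simple, but wasteful.
    messages[frame.streamId] = (messages[frame.streamId] ?: ByteArray(0)) + frame.payload
  }
  return messages
}
```

Every append in `assemble` copies the bytes accumulated so far, which is the kind of cost a purpose-built buffer is meant to avoid.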
class Buffer { private val buffer = mutableListOf<Byte>() fun write(source: ByteArray) { for (b in source) buffer += b } fun read(sink: ByteArray): Int { if (buffer.isEmpty()) return -1 val byteCount = minOf(sink.size, buffer.size) for (i in 0 until byteCount) { sink[i] = buffer.removeFirst() } return byteCount } }
A List of Bytes • Easy to get right! • Extremely slow • Autoboxing converts from JVM byte primitive type to JVM java.lang.Byte object type • Byte-at-a-time requires too many instructions and too many function calls
A Slice of a ByteArray • More difficult to get right • Getting Faster • Need to defend against worst-case access patterns • Copies to shift the data within the buffer
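A minimal sketch of the slice strategy, assuming a single backing array with `pos` and `limit` indexes. `SliceBuffer` and `makeRoom` are hypothetical names; the shifting copy in `makeRoom` is the cost the slide refers to.

```kotlin
class SliceBuffer(initialCapacity: Int = 16) {
  private var data = ByteArray(initialCapacity)
  private var pos = 0    // index of the next byte to read
  private var limit = 0  // index one past the last byte written

  val size: Int get() = limit - pos

  fun write(source: ByteArray) {
    if (data.size - limit < source.size) makeRoom(source.size)
    source.copyInto(data, destinationOffset = limit)
    limit += source.size
  }

  fun read(sink: ByteArray): Int {
    if (size == 0) return -1
    val byteCount = minOf(sink.size, size)
    data.copyInto(sink, destinationOffset = 0, startIndex = pos, endIndex = pos + byteCount)
    pos += byteCount
    return byteCount
  }

  /** Shifts unread bytes to index 0, growing the array only if shifting isn't enough. */
  private fun makeRoom(needed: Int) {
    val unread = size
    val target =
      if (unread + needed <= data.size) data
      else ByteArray(maxOf(data.size * 2, unread + needed))
    data.copyInto(target, destinationOffset = 0, startIndex = pos, endIndex = limit)
    data = target
    pos = 0
    limit = unread
  }
}
```

Bulk array copies replace the per-byte boxing of the list version, but a caller that alternates small reads with writes can trigger a shift on nearly every write.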
Circular Slice • Even more difficult to get right • Faster still • Every byte is copied once on the way in, once on the way out • Buffers never shrink their memory use
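A minimal sketch of the circular variant. To stay short it assumes a fixed capacity and a `write` that accepts only as many bytes as fit; `RingBuffer` is a hypothetical name. No shifting is needed, but each byte is still copied once on the way in and once on the way out.

```kotlin
class RingBuffer(capacity: Int) {
  init { require(capacity > 0) }

  private val data = ByteArray(capacity)
  private var head = 0  // index of the next byte to read
  var size = 0          // number of unread bytes
    private set

  /** Writes as many bytes of [source] as fit and returns that count. */
  fun write(source: ByteArray): Int {
    val byteCount = minOf(source.size, data.size - size)
    val tail = (head + size) % data.size
    val firstRun = minOf(byteCount, data.size - tail)
    source.copyInto(data, tail, 0, firstRun)
    source.copyInto(data, 0, firstRun, byteCount)  // the part that wraps to the front
    size += byteCount
    return byteCount
  }

  fun read(sink: ByteArray): Int {
    if (size == 0) return -1
    val byteCount = minOf(sink.size, size)
    val firstRun = minOf(byteCount, data.size - head)
    data.copyInto(sink, 0, head, head + firstRun)
    data.copyInto(sink, firstRun, 0, byteCount - firstRun)  // the wrapped part
    head = (head + byteCount) % data.size
    size -= byteCount
    return byteCount
  }
}
```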
Java I/O Streams Gotta Copy abstract class InputStream { /** * Consumes bytes from this stream and copy them to [sink]. * Returns the number of bytes that were read, or -1 if this * input stream is exhausted. */ abstract fun read(sink: ByteArray): Int }
class Buffer { /** * Transfers all bytes from [source] to this. */ fun write(source: Buffer) /** * Transfers all bytes from this to [sink]. */ fun read(sink: Buffer): Int } I/O Without Copies
class Buffer { private var segments = mutableListOf<ByteArray>() var size: Int = 0 /** ... */ fun write(source: Buffer) { size += source.size segments += source.segments source.size = 0 source.segments.clear() } /** ... */ fun read(sink: Buffer): Int { val result = size sink.write(this) return result } }
Transferring Ownership • A departure from java.io APIs • Fast? • Writing part of a Buffer requires copies to split arrays • Worst-case performance is bad! Things behave like the first implementation (List) if the arrays are small
class OkBuffer { private class Segment( val data: ByteArray, val pos: Int, val limit: Int, ) private var segments = mutableListOf<Segment>() private var size: Int = 0 fun write(source: Buffer, byteCount: Int) { ... } fun read(sink: Buffer, byteCount: Int): Int { ... } }
OkBuffer • Borrows from transfer ownership + array slice strategies • All arrays are the same size – 8 KiB – which we call a segment • Three ways to move data between buffers: • Transfer ownership of a segment • Copy data between segments • Split a segment so both halves share a ByteArray, but maintain independent pos and limit
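A minimal sketch of two of the three moves: transferring whole segments, and splitting the boundary segment so both halves share one array. `SketchSegment` and `SegmentedBuffer` are hypothetical names. For brevity this sketch uses arbitrary array sizes rather than fixed 8 KiB segments and omits the copy-between-segments path.

```kotlin
class SketchSegment(val data: ByteArray, var pos: Int, var limit: Int) {
  val byteCount: Int get() = limit - pos

  /** Splits off the first [count] bytes as a new segment that shares [data]. */
  fun split(count: Int): SketchSegment {
    require(count in 1..byteCount)
    val prefix = SketchSegment(data, pos, pos + count)
    pos += count
    return prefix
  }
}

class SegmentedBuffer {
  val segments = ArrayDeque<SketchSegment>()
  val size: Int get() = segments.sumOf { it.byteCount }

  fun write(bytes: ByteArray) {
    if (bytes.isEmpty()) return
    segments.addLast(SketchSegment(bytes.copyOf(), 0, bytes.size))
  }

  /** Moves [byteCount] bytes from [source] to this buffer without copying byte data. */
  fun write(source: SegmentedBuffer, byteCount: Int) {
    require(byteCount in 0..source.size)
    var remaining = byteCount
    while (remaining > 0) {
      val head = source.segments.first()
      val moved =
        if (head.byteCount <= remaining) source.segments.removeFirst()  // transfer ownership
        else head.split(remaining)                                      // share the array
      segments.addLast(moved)
      remaining -= moved.byteCount
    }
  }

  /** Copies the current contents out; only used here to inspect the result. */
  fun snapshot(): ByteArray {
    val result = ByteArray(size)
    var offset = 0
    for (s in segments) {
      s.data.copyInto(result, offset, s.pos, s.limit)
      offset += s.byteCount
    }
    return result
  }
}
```

After a partial write, the last segment of the sink and the first segment of the source point at the same `ByteArray`, each with its own `pos` and `limit`.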
Fresh New Arrays • What does ByteArray(8192) do? • Asks the memory manager for some memory (8192 + 16 bytes) • Writes an object header (16 bytes) • Writes 0 to each of the remaining 8192 bytes • Calling ByteArray(8192) takes 8x longer than ByteArray(1024) https://publicobject.com/2020/07/26/optimizing-new-byte/
Segment Pooling • When a Buffer is done with a Segment, Okio ‘recycles’ it in a private shared List • This makes writing data faster • It also saves work for the garbage collector
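A minimal sketch of the pooling idea, assuming a small bounded list of spare arrays guarded by a lock. `SegmentPoolSketch`, `take`, `recycle`, and the cap of 8 are hypothetical and are not Okio's actual pool.

```kotlin
object SegmentPoolSketch {
  val SEGMENT_SIZE = 8192
  private val MAX_POOLED = 8  // hypothetical cap so the pool can't grow without bound
  private val pool = ArrayDeque<ByteArray>()

  /** Returns a recycled array if one is available, otherwise allocates a fresh one. */
  @Synchronized fun take(): ByteArray =
    pool.removeLastOrNull() ?: ByteArray(SEGMENT_SIZE)

  /** Offers [segment] back to the pool; it is dropped if the pool is full. */
  @Synchronized fun recycle(segment: ByteArray) {
    require(segment.size == SEGMENT_SIZE)
    if (pool.size < MAX_POOLED) pool.addLast(segment)
  }
}
```

A recycled array is not zeroed in this sketch. That is safe only if, as with `pos` and `limit` in a segment, the caller tracks which bytes are valid.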
Reading is Destructive • Because buffers transfer data rather than copying it, once you read a byte it’s gone! • Mitigate with Buffer.clone() • But how to make clone fast?
class Buffer { private class Segment( val pos: Int, val limit: Int, val data: ByteArray, /** True if other segments use the same byte array. */ val shared: Boolean, ) ... }
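A minimal sketch of how a `shared` flag can make clone cheap: the clone gets new segment objects that point at the same arrays, and each buffer advances its own `pos`. `SharedSegment` and `CloneableBuffer` are hypothetical names, and the fields are `var` here for brevity.

```kotlin
import java.io.ByteArrayOutputStream

class SharedSegment(val data: ByteArray, var pos: Int, var limit: Int, var shared: Boolean)

class CloneableBuffer {
  val segments = mutableListOf<SharedSegment>()

  fun write(bytes: ByteArray) {
    segments += SharedSegment(bytes.copyOf(), 0, bytes.size, shared = false)
  }

  /** Reads and removes up to [byteCount] bytes. Reading is destructive. */
  fun read(byteCount: Int): ByteArray {
    val out = ByteArrayOutputStream()
    var remaining = byteCount
    while (remaining > 0 && segments.isNotEmpty()) {
      val head = segments.first()
      val n = minOf(remaining, head.limit - head.pos)
      out.write(head.data, head.pos, n)
      head.pos += n
      remaining -= n
      if (head.pos == head.limit) segments.removeAt(0)
    }
    return out.toByteArray()
  }

  /** Copies only segment metadata; the byte arrays are shared, so no data is copied. */
  fun clone(): CloneableBuffer {
    val result = CloneableBuffer()
    for (s in segments) {
      s.shared = true  // a shared array must not be overwritten in place
      result.segments += SharedSegment(s.data, s.pos, s.limit, shared = true)
    }
    return result
  }
}
```

Reading from the original leaves the clone's view untouched, because only the original's `pos` moves.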
interface Buffer { fun write(ByteArray) fun writeByte(Int) fun writeShort(Int) fun writeInt(Int) fun writeLong(Long) fun writeDecimalLong(Long) fun writeHexadecimalUnsignedLong(Long) fun writeString(String, Charset) fun writeUtf8(String) fun writeUtf8CodePoint(Int) fun writeAll(Source): Long } fun write(ByteArray, Int, Int) fun writeShortLe(Int) fun writeIntLe(Int) fun writeLongLe(Long) fun writeString(String, Int, Int, Charset) fun writeUtf8(String, Int, Int) fun write(Source, Long)
interface Buffer { fun readByteArray() fun readByteArray(Long) fun readByte() fun readShort() fun readShortLe() fun readInt() fun readIntLe() fun readLong() fun readLongLe() fun readDecimalLong() fun readHexadecimalUnsignedLong() fun readString(Charset) fun readString(Long, Charset) fun readUtf8() fun readUtf8(Long) fun readUtf8CodePoint() fun readAll(Sink) } /** * Reads until the next `\r\n`, `\n`, or the * end of the file. Returns null at the end. */ fun readUtf8Line(): String? /** * Reads until the next `\r\n` or `\n`. Use * this for machine-generated text. */ fun readUtf8LineStrict(): String /** * Like readUtf8LineStrict() but throws if * no newline is within [limit] bytes. */ fun readUtf8LineStrict(limit: Long): String
interface BufferedSink : Sink { override fun write(Buffer, Long) fun write(ByteArray) fun writeByte(Int) fun writeShort(Int) fun writeInt(Int) ... } interface BufferedSource : Source { override fun read(Buffer, Long): Long fun readByteArray(): ByteArray fun readByte(): Byte fun readShort(): Short fun readInt(): Int ... } interface Buffer : BufferedSource, BufferedSink { ... }
Buffering Streams • Better usability • Friendly methods like writeDecimalLong(), readUtf8Line() • Better performance • Moves data 8 KiB at a time • ~ Zero overhead • Buffers don’t add copying!
// True if the stream has at least 100 more bytes. if (source.request(100)) { // ... } // Like request() but throws if there isn't enough data. source.require(100) // True once there's nothing left. Like !request(1). if (source.exhausted()) { // ... } END OF STREAM HANDLING #1
/** * Call [BufferedSource.peek] to do an arbitrarily-long * lookahead. It uses the same segment sharing stuff as * clone to keep things fast! * * Moshi's JSON uses this when polymorphic decoding to * look ahead at the type. */ fun readCelestial(source: BufferedSource): Celestial { val peek = source.peek() val type = findType(peek) peek.close() return decode(source, type) } PEEK IS LIKE A STREAMING CLONE #2
fun connectThreads(): Long { val pipe = Pipe(maxBufferSize = 1024) Thread { pipe.sink.buffer().use { sink -> for (i in 0L until 1000L) { sink.writeLong(i) } } }.start() var total = 0L pipe.source.buffer().use { source -> while (!source.exhausted()) { total += source.readLong() } } return total } PIPE CONNECTS A READER & A WRITER #5
/** * This uses [BufferedSource.readByteString] to read an entire stream * into a single immutable value. ByteString is a great container for * encoded data like protobufs, messages, and snapshots of files. */ private fun handleResponse(response: Response): HandledResponse { if (!response.isSuccessful) { val source = response.body.source() return HandledResponse.UnexpectedStatus( response.code, response.headers, source.readByteString(), ) } ... } BYTESTRING IS A VALUE #9
/** * This uses [ByteString.hmacSha256] to take an HMAC of a request * body to authenticate a webhook call. Okio includes SHA-1 and * SHA-256 hashes for byte strings, buffers, and streams. */ fun webHookSignatureCheck( headers: Headers, requestBody: ByteString, ) { val hmacSha256 = requestBody.hmacSha256(secret).hex() if (headers["X-Hub-Signature-256"] != "sha256=$hmacSha256") { throw IOException("signature check failed") } } HASHING CAN BE EASY #10
Challenges • Multiplatform is difficult when the platforms are very different! • Deliberately not supporting everything! No volume management, permissions, watches, or locking • Testing real implementations was tough
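For comparison, here is a sketch of the same computation using only the JDK's `javax.crypto.Mac`. The helper names `hmacSha256Hex` and `signatureMatches` are hypothetical; this illustrates what an HMAC-SHA256 hex digest is, not how Okio implements it.

```kotlin
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

/** Returns the HMAC-SHA256 of [message] under [key] as lowercase hex. */
fun hmacSha256Hex(key: ByteArray, message: ByteArray): String {
  val mac = Mac.getInstance("HmacSHA256")
  mac.init(SecretKeySpec(key, "HmacSHA256"))
  return mac.doFinal(message).joinToString("") { "%02x".format(it) }
}

/** True if [header] has the form "sha256=<hex HMAC of body>". */
fun signatureMatches(header: String?, key: ByteArray, body: ByteArray): Boolean =
  header == "sha256=" + hmacSha256Hex(key, body)
```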
fun writeSequence(fileSystem: FileSystem, path: Path) { fileSystem.write(path, mustCreate = true) { for (i in 0L until 1000L) { writeDecimalLong(i) writeByte('\n'.code) } } } fun readSequence(fileSystem: FileSystem, path: Path): Long { fileSystem.read(path) { var total = 0L while (!exhausted()) { total += readDecimalLong() readByte() } return total } }
BufferedSource is a Bad Name • We have two interfaces: • Source is the easy-to-implement one • BufferedSource is the easy-to-call one • We should have saved the good name (Source) for the interface you use all the time • Similarly for Sink and BufferedSink
Timeout vs. Cancel • Every Source and Sink in Okio comes with a Timeout • A cancel() method would have been better! https://github.com/python-trio/trio
Blocking vs. Non-Blocking • Non-blocking lets you service N concurrent callers with fewer than N threads • Non-blocking is not otherwise faster • Overhead of abstractions that move work between threads, plus cost of context-switching
Loom is Coming! • Rather than making async better, why not make threads cheaper? • Virtual threads are coming soon to the JVM • Currently in preview! https://openjdk.java.net/jeps/425
Controversy 2: Kotlin Switch • In 2018 we pressed ⌥⇧⌘K and converted Okio from Java to Kotlin, introducing a dependency on the Kotlin standard library • Java programmers are suspicious of alternative JVM languages https://speakerdeck.com/swankjesse/ok-multiplatform-droidcon-nyc-2018
No Regrets on Kotlin • Kotlin’s been really good to us • We’re doing exciting things with multiplatform • Kotlin maintainers’ devotion to compatibility means none of the feared problems have materialized