
Nerding Out on Okio (Android Worldwide)

Video: https://youtu.be/Du7YXPAV1M8

Quirks and features of the I/O library that powers OkHttp.

Jesse Wilson

April 19, 2022

Transcript

  1. Okio is Fun • Computer Science • Software Engineering •

    Widely Deployed & Consequential • Brazen!
  2. CS + SWE Computer science: a branch of mathematics. Concerned

    with algorithms, data structures, and measuring computation. Software engineering: the work of developing and operating software. Concerned with quality, agility, planning, mentorship, and collaboration.
  3. Widely Deployed In Android OS since 2014 Used by Retrofit,

    OkHttp, Coil, Apollo GraphQL, Moshi, Wire
  4. Brazen • Java already has a perfectly good I/O

    library, java.io • Java already has a perfectly good java.io replacement, java.nio • A blocking library in the era of non-blocking • Switched to Kotlin in 2018!
  5. I/O

  6. abstract class InputStream { /** * Consumes bytes from this

    stream and copy them to [sink]. * Returns the number of bytes that were read, or -1 if this * input stream is exhausted. */ abstract fun read(sink: ByteArray): Int } abstract class OutputStream { /** * Copies all the data in [source] to this. */ abstract fun write(source: ByteArray) }
  7. OkHttp’s Job Was Easy 1. Encode an HTTP request as

    a ByteArray 2. Write that ByteArray to a socket’s OutputStream 3. Read a ByteArray from a socket’s InputStream 4. Decode that ByteArray as an HTTP response
  8. Adding HTTP/2 • HTTP/2 is multiplexed: 1. Chop each HTTP

    request into frames 2. Write each frame to the socket’s OutputStream 3. Read frames from the socket’s InputStream 4. Assemble frames into an HTTP response • Frames from different responses are interleaved!
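
The four steps above can be sketched in plain Kotlin. This is an illustrative model only: `Frame`, `chop`, `interleave`, and `assemble` are hypothetical names, and real HTTP/2 frames carry a type, flags, and a length prefix that this sketch leaves out.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical frame type for illustration; real HTTP/2 frames carry more fields.
class Frame(val streamId: Int, val data: ByteArray)

/** Chops [payload] into frames of at most [maxFrameSize] bytes. */
fun chop(streamId: Int, payload: ByteArray, maxFrameSize: Int): List<Frame> {
  require(maxFrameSize > 0) { "maxFrameSize must be positive" }
  val frames = mutableListOf<Frame>()
  var offset = 0
  while (offset < payload.size) {
    val end = minOf(offset + maxFrameSize, payload.size)
    frames += Frame(streamId, payload.copyOfRange(offset, end))
    offset = end
  }
  return frames
}

/** Interleaves two frame lists round-robin, as if two messages shared one socket. */
fun interleave(a: List<Frame>, b: List<Frame>): List<Frame> {
  val result = mutableListOf<Frame>()
  for (i in 0 until maxOf(a.size, b.size)) {
    if (i < a.size) result += a[i]
    if (i < b.size) result += b[i]
  }
  return result
}

/** Reassembles each stream's payload from frames that arrive interleaved. */
fun assemble(frames: List<Frame>): Map<Int, ByteArray> {
  val byStream = mutableMapOf<Int, ByteArrayOutputStream>()
  for (frame in frames) {
    byStream.getOrPut(frame.streamId) { ByteArrayOutputStream() }.write(frame.data)
  }
  return byStream.mapValues { it.value.toByteArray() }
}
```

Because the receiver cannot know which stream the next frame belongs to, each stream needs its own buffer that one thread fills and another drains, which is the problem the following slides work through.
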
  9. class Http2Connection { private val streams = mutableMapOf<StreamId, Stream>() private

    fun processNextFrame(in: InputStream) { when (val frame = readFrame(in)) { is Frame.DataFrame -> { streams[frame.streamId]!!.receive(frame.data) } ... } } }
  10. class Stream : InputStream() { internal fun receive(data: ByteArray) {

    ... } override fun read(sink: ByteArray): Int { ... } }
  11. class Buffer { private val buffer = mutableListOf<Byte>() fun write(source:

    ByteArray) { for (b in source) buffer += b } fun read(sink: ByteArray): Int { if (buffer.isEmpty()) return -1 val byteCount = minOf(sink.size, buffer.size) for (i in 0 until byteCount) { sink[i] = buffer.removeFirst() } return byteCount } }
  12. A List of Bytes • Easy to get right! •

    Extremely slow • Autoboxing converts from JVM byte primitive type to JVM java.lang.Byte object type • Byte-at-a-time requires too many instructions and too many function calls
  13. class Buffer { private var buffer = ByteArray(0) fun write(source:

    ByteArray) { val newBuffer = ByteArray(buffer.size + source.size) buffer.copyInto(newBuffer, destinationOffset = 0) source.copyInto(newBuffer, destinationOffset = buffer.size) buffer = newBuffer } fun read(sink: ByteArray): Int { if (buffer.isEmpty()) return -1 val byteCount = minOf(sink.size, buffer.size) val newBuffer = ByteArray(buffer.size - byteCount) buffer.copyInto(sink, endIndex = byteCount) buffer.copyInto(newBuffer, startIndex = byteCount, endIndex = buffer.size) buffer = newBuffer return byteCount } }
  14. A Simple ByteArray • Easy to get right • Slow

    • Lots of allocations • Every byte gets copied around a lot
  15. class Buffer { private var buffer = ByteArray(0) private var

    pos = 0 private var limit = 0 fun write(source: ByteArray) { val requiredSize = limit - pos + source.size if (requiredSize > buffer.size) { val newBuffer = ByteArray(size = maxOf(requiredSize, buffer.size * 2)) buffer.copyInto(newBuffer, startIndex = pos, endIndex = limit) buffer = newBuffer limit -= pos pos = 0 } else if (limit + source.size > buffer.size) { buffer.copyInto(buffer, startIndex = pos, endIndex = limit) limit -= pos pos = 0 } source.copyInto(buffer, destinationOffset = limit) limit += source.size } ... }
  16. A Slice of a ByteArray • More difficult to get

    right • Getting Faster • Need to defend against worst-case access patterns • Copies to shift the data within the buffer
  17. class Buffer { private var buffer = ByteArray(0) private var

    pos = 0 private var byteCount = 0 fun write(source: ByteArray) { val requiredSize = byteCount + source.size if (requiredSize > buffer.size) { val newBuffer = ByteArray(size = maxOf(requiredSize, buffer.size * 2)) if (pos + byteCount > buffer.size) { buffer.copyInto( newBuffer, startIndex = pos, ) buffer.copyInto( newBuffer, destinationOffset = buffer.size - pos, endIndex = byteCount - (buffer.size - pos), ) } else { buffer.copyInto( newBuffer,
  18. source.copyInto( buffer, destinationOffset = offset ) } else { source.copyInto(

    buffer, destinationOffset = offset, endIndex = buffer.size - offset, ) source.copyInto( buffer, destinationOffset = 0, startIndex = buffer.size - offset, ) byteCount += source.size } } ... }
  19. Circular Slice • Even more difficult to get right •

    Faster still • Every byte is copied once on the way in, once on the way out • Buffers never shrink their memory use
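
Since the circular-slice code on the two slides above is truncated, here is a complete minimal sketch of the same idea. It is not the code from the talk: `RingBuffer` and `ensureCapacity` are hypothetical names, and it assumes a positive initial capacity and single-threaded use.

```kotlin
/** A growable ring buffer. A sketch of the circular-slice idea, not Okio's code. */
class RingBuffer(initialCapacity: Int = 16) {
  init {
    require(initialCapacity > 0) { "initialCapacity must be positive" }
  }

  private var buffer = ByteArray(initialCapacity)
  private var pos = 0        // index of the next byte to read
  private var byteCount = 0  // number of readable bytes

  val size: Int get() = byteCount

  fun write(source: ByteArray) {
    ensureCapacity(byteCount + source.size)
    val offset = (pos + byteCount) % buffer.size  // first free index
    val firstPart = minOf(source.size, buffer.size - offset)
    source.copyInto(buffer, destinationOffset = offset, startIndex = 0, endIndex = firstPart)
    // Whatever didn't fit before the end of the array wraps around to the front.
    source.copyInto(buffer, destinationOffset = 0, startIndex = firstPart, endIndex = source.size)
    byteCount += source.size
  }

  fun read(sink: ByteArray): Int {
    if (byteCount == 0) return -1
    val result = minOf(sink.size, byteCount)
    val firstPart = minOf(result, buffer.size - pos)
    buffer.copyInto(sink, destinationOffset = 0, startIndex = pos, endIndex = pos + firstPart)
    buffer.copyInto(sink, destinationOffset = firstPart, startIndex = 0, endIndex = result - firstPart)
    pos = (pos + result) % buffer.size
    byteCount -= result
    return result
  }

  /** Grows the array if needed, unwrapping the readable bytes to start at index 0. */
  private fun ensureCapacity(required: Int) {
    if (required <= buffer.size) return
    val newBuffer = ByteArray(maxOf(required, buffer.size * 2))
    val firstPart = minOf(byteCount, buffer.size - pos)
    buffer.copyInto(newBuffer, destinationOffset = 0, startIndex = pos, endIndex = pos + firstPart)
    buffer.copyInto(newBuffer, destinationOffset = firstPart, startIndex = 0, endIndex = byteCount - firstPart)
    buffer = newBuffer
    pos = 0
  }
}
```

Each byte is copied once on the way in and once on the way out, except when the array grows. As the slide notes, nothing here ever shrinks the array.
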
  20. Java I/O Streams Gotta Copy abstract class InputStream { /**

    * Consumes bytes from this stream and copy them to [sink]. * Returns the number of bytes that were read, or -1 if this * input stream is exhausted. */ abstract fun read(sink: ByteArray): Int }
  21. class Buffer { /** * Transfers all bytes from [source]

    to this. */ fun write(source: Buffer) /** * Transfers all bytes from this to [sink]. */ fun read(sink: Buffer): Int } I/O Without Copies
  24. class Buffer { private var segments = mutableListOf<ByteArray>() var size:

    Int = 0 /** ... */ fun write(source: Buffer) { size += source.size segments += source.segments source.size = 0 source.segments.clear() } /** ... */ fun read(sink: Buffer): Int { val result = size sink.write(this) return result } }
  25. Transferring Ownership • A departure from java.io APIs •

    Fast? • Writing part of a Buffer requires copies to split arrays • Worst-case performance is bad! Things behave like the first implementation (List<Byte>) if the arrays are small
  26. class OkBuffer { private class Segment( val data: ByteArray, val

    pos: Int, val limit: Int, ) private var segments = mutableListOf<Segment>() private var size: Int = 0 fun write(source: Buffer, byteCount: Int) { ... } fun read(sink: Buffer, byteCount: Int): Int { ... } }
  27. OkBuffer • Borrows from transfer ownership + array slice strategies

    • All arrays are the same size – 8 KiB – which we call a segment • Three ways to move data between buffers: • Transfer ownership of a segment • Copy data between segments • Split a segment so both halves share a ByteArray, but maintain independent pos and limit
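
Two of the three moves listed above, transferring a whole segment and splitting one, can be sketched as follows. This is a simplified model rather than Okio's implementation: `SegmentedBuffer`, the `owner` flag, and `readAll` are assumptions made for illustration, and the sketch skips the compaction and pooling a real implementation would need.

```kotlin
const val SEGMENT_SIZE = 8192

/** A window onto a byte array. Only an owner may append past [limit]. */
class Segment(val data: ByteArray, var pos: Int, var limit: Int, val owner: Boolean) {
  val byteCount: Int get() = limit - pos
}

class SegmentedBuffer {
  private val segments = ArrayDeque<Segment>()
  var size = 0L
    private set

  /** Copies [source] in, filling the tail segment before allocating a new one. */
  fun write(source: ByteArray) {
    var offset = 0
    while (offset < source.size) {
      val last = segments.lastOrNull()
      val tail = if (last != null && last.owner && last.limit < SEGMENT_SIZE) {
        last
      } else {
        Segment(ByteArray(SEGMENT_SIZE), pos = 0, limit = 0, owner = true)
          .also { segments.addLast(it) }
      }
      val n = minOf(source.size - offset, SEGMENT_SIZE - tail.limit)
      source.copyInto(tail.data, tail.limit, offset, offset + n)
      tail.limit += n
      offset += n
      size += n
    }
  }

  /** Moves [byteCount] bytes from the front of [source] to the end of this buffer. */
  fun write(source: SegmentedBuffer, byteCount: Long) {
    require(source !== this && byteCount in 0L..source.size)
    var remaining = byteCount
    while (remaining > 0) {
      val head = source.segments.first()
      val moved: Int
      if (head.byteCount <= remaining) {
        // Transfer ownership: relink the whole segment, no bytes copied.
        moved = head.byteCount
        source.segments.removeFirst()
        segments.addLast(head)
      } else {
        // Split: a new non-owner segment shares the array with its own pos and limit.
        moved = remaining.toInt()
        segments.addLast(Segment(head.data, head.pos, head.pos + moved, owner = false))
        head.pos += moved
      }
      source.size -= moved
      size += moved
      remaining -= moved
    }
  }

  /** Copies out and consumes everything. Only here so results can be inspected. */
  fun readAll(): ByteArray {
    val result = ByteArray(size.toInt())
    var offset = 0
    for (s in segments) {
      s.data.copyInto(result, offset, s.pos, s.limit)
      offset += s.byteCount
    }
    segments.clear()
    size = 0
    return result
  }
}
```

Moving 10,000 bytes out of a 20,000-byte buffer relinks one full segment and splits the second, so no payload bytes are copied during the transfer.
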
  28. class Stream { val buffer = OkBuffer() fun receive(source: OkBuffer,

    byteCount: Long) { synchronized(this) { buffer.write(source, byteCount) } } fun read(sink: OkBuffer, byteCount: Long): Long { synchronized(this) { if (buffer.size == 0L) return -1L val result = minOf(byteCount, buffer.size) sink.write(buffer, result) return result } } }
  29. interface Source : Closeable { fun read(sink: Buffer, byteCount: Long):

    Long fun timeout(): Timeout } interface Sink : Closeable { fun write(source: Buffer, byteCount: Long) fun flush() fun timeout(): Timeout }
  30. Fresh New Arrays • What does ByteArray(8192) do? • Asks

    the memory manager for some memory (8192 + 16 bytes) • Writes an object header (16 bytes) • Writes 0 to each of the remaining 8192 bytes • Calling ByteArray(8192) takes 8x longer than ByteArray(1024) https://publicobject.com/2020/07/26/optimizing-new-byte/
  31. Segment Pooling • When a Buffer is done with a

    Segment, Okio ‘recycles’ it in a private shared List<Segment> • This makes writing data faster • It also saves work for the garbage collector
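
A pool like the one described above might look roughly like this. The sketch is an assumption-laden illustration, not Okio's pool: `ArrayPool`, `take`, `recycle`, and the `MAX_POOLED` cap are hypothetical, and a single lock stands in for whatever concurrency strategy the real pool uses.

```kotlin
/** A bounded pool of fixed-size arrays. Illustrative only. */
object ArrayPool {
  const val ARRAY_SIZE = 8192
  private const val MAX_POOLED = 8  // assumed cap so the pool cannot grow without bound
  private val pool = ArrayDeque<ByteArray>()

  /** Returns a recycled array when one is available, otherwise allocates a fresh one. */
  fun take(): ByteArray =
    synchronized(pool) { pool.removeLastOrNull() } ?: ByteArray(ARRAY_SIZE)

  /**
   * Offers [array] for reuse. Contents are not cleared, which skips the zeroing
   * cost; callers must track their own pos and limit.
   */
  fun recycle(array: ByteArray) {
    require(array.size == ARRAY_SIZE) { "unexpected array size" }
    synchronized(pool) {
      if (pool.size < MAX_POOLED) pool.addLast(array)
    }
  }
}
```

Reusing an array avoids both the allocation and the zero-fill described on the previous slide, and it gives the garbage collector fewer short-lived objects to trace.
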
  32. Reading is Destructive • Because buffers transfer data rather than

    copying it, once you read a byte it’s gone! • Mitigate with Buffer.clone() • But how to make clone fast?
  33. class Buffer { private class Segment( val pos: Int, val

    limit: Int, val data: ByteArray, /** True if other segments use the same byte array. */ val shared: Boolean, ) ... }
  34. Copy Metadata, Not Data • Buffer.clone() creates new Segment metadata

    objects • No bytes are copied! • There are implications for pooling
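
The metadata-only clone and its pooling implication can be sketched like this. The class and method names (`SharedSegment`, `CloneableBuffer`, `cloneBuffer`, `recyclableArrays`, `dataArrays`) are hypothetical, and giving each write its own array is a shortcut to keep the example small.

```kotlin
import java.io.ByteArrayOutputStream

class SharedSegment(val data: ByteArray, val pos: Int, val limit: Int, var shared: Boolean)

class CloneableBuffer {
  private val segments = mutableListOf<SharedSegment>()

  fun write(source: ByteArray) {
    // Each write gets a private array here to keep the sketch short.
    segments += SharedSegment(source.copyOf(), 0, source.size, shared = false)
  }

  /** Copies segment metadata only; both buffers then reference the same arrays. */
  fun cloneBuffer(): CloneableBuffer {
    val result = CloneableBuffer()
    for (s in segments) {
      s.shared = true
      result.segments += SharedSegment(s.data, s.pos, s.limit, shared = true)
    }
    return result
  }

  /** Destructive read: consumes everything in this buffer. */
  fun readAll(): ByteArray {
    val out = ByteArrayOutputStream()
    for (s in segments) out.write(s.data, s.pos, s.limit - s.pos)
    segments.clear()
    return out.toByteArray()
  }

  /** Arrays that are safe to recycle. A shared array must never go back to a pool. */
  fun recyclableArrays(): List<ByteArray> = segments.filter { !it.shared }.map { it.data }

  /** Exposed only so the sharing can be observed from outside. */
  fun dataArrays(): List<ByteArray> = segments.map { it.data }
}
```

After a clone, neither buffer may hand its arrays to a pool, because the other buffer may still be reading them. That is the pooling implication the slide alludes to.
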
  35. interface Buffer { fun write(ByteArray) fun writeByte(Int) fun writeShort(Int) fun

    writeInt(Int) fun writeLong(Long) fun writeDecimalLong(Long) fun writeHexadecimalUnsignedLong(Long) fun writeString(String, Charset) fun writeUtf8(String) fun writeUtf8CodePoint(Int) fun writeAll(Source): Long } fun write(ByteArray, Int, Int) fun writeShortLe(Int) fun writeIntLe(Int) fun writeLongLe(Long) fun writeString(String, Int, Int, Charset) fun writeUtf8(String, Int, Int) fun write(Source, Long)
  41. interface Buffer { fun readByteArray() fun readByteArray(Long) fun readByte() fun

    readShort() fun readShortLe() fun readInt() fun readIntLe() fun readLong() fun readLongLe() fun readDecimalLong() fun readHexadecimalUnsignedLong() fun readString(Charset) fun readString(Long, Charset) fun readUtf8() fun readUtf8(Long) fun readUtf8CodePoint() fun readAll(Sink) } /** * Reads until the next `\r\n`, `\n`, or the * end of the file. Returns null at the end. */ fun readUtf8Line(): String? /** * Reads until the next `\r\n` or `\n`. Use * this for machine-generated text. */ fun readUtf8LineStrict(): String /** * Like readUtf8LineStrict() but throws if * no newline is within [limit] bytes. */ fun readUtf8LineStrict(limit: Long): String
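
The difference between the lenient and strict line readers is easiest to see side by side. The sketch below models the documented behaviour over a plain `ByteArray`. `LineReader` and its methods are hypothetical stand-ins, not Okio's API, and the `limit` handling is an approximation of the contract described in the comments above.

```kotlin
import java.io.EOFException

class LineReader(private val bytes: ByteArray) {
  private var pos = 0

  /** Lenient: the final line may lack a newline. Returns null once input is exhausted. */
  fun readLine(): String? {
    if (pos == bytes.size) return null
    val newline = indexOfNewline()
    if (newline != -1) return takeLine(newline)
    return bytes.decodeToString(pos, bytes.size).also { pos = bytes.size }
  }

  /** Strict: every line must end in `\n` or `\r\n` within [limit] bytes, or this throws. */
  fun readLineStrict(limit: Int = Int.MAX_VALUE): String {
    val newline = indexOfNewline()
    if (newline == -1 || contentEnd(newline) - pos > limit) {
      throw EOFException("no newline within limit")
    }
    return takeLine(newline)
  }

  private fun indexOfNewline(): Int {
    for (i in pos until bytes.size) {
      if (bytes[i] == '\n'.code.toByte()) return i
    }
    return -1
  }

  /** Index just past the line's content, excluding a '\r' right before the '\n'. */
  private fun contentEnd(newline: Int): Int =
    if (newline > pos && bytes[newline - 1] == '\r'.code.toByte()) newline - 1 else newline

  private fun takeLine(newline: Int): String {
    val line = bytes.decodeToString(pos, contentEnd(newline))
    pos = newline + 1
    return line
  }
}
```

The strict variant turns a truncated or oversized line into an error instead of silently returning a partial value, which is usually what a protocol parser wants.
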
  45. interface BufferedSink : Sink { override fun write(Buffer, Long) fun

    write(ByteArray) fun writeByte(Int) fun writeShort(Int) fun writeInt(Int) ... } interface BufferedSource : Source { override fun read(Buffer, Long): Long fun readByteArray(): ByteArray fun readByte(): Byte fun readShort(): Short fun readInt(): Int ... } interface Buffer : BufferedSource, BufferedSink { ... }
  46. Buffering Streams • Better usability • Friendly methods like writeDecimalLong(),

    readUtf8Line() • Better performance • Moves data 8 KiB at a time • ~ Zero overhead • Buffers don’t add copying!
  47. // True if the stream has at least 100 more

    bytes. if (source.request(100)) { // ... } // Like request() but throws if there isn't enough data. source.require(100) // True once there's nothing left. Like !request(1). if (source.exhausted()) { // ... } END OF STREAM HANDLING #1
  48. /** * Call [BufferedSource.peek] to do an arbitrarily-long * lookahead.

    It uses the same segment sharing stuff as * clone to keep things fast! * * Moshi's JSON uses this when polymorphic decoding to * look ahead at the type. */ fun readCelestial(source: BufferedSource): Celestial { val peek = source.peek() val type = findType(peek) peek.close() return decode(source, type) } PEEK IS LIKE A STREAMING CLONE #2
  49. private val celestialTypes = Options.of( "star".encodeUtf8(), "planet".encodeUtf8(), "moon".encodeUtf8(), ) fun

    readCelestialType(source: BufferedSource): KClass<out Celestial>? { return when (source.select(celestialTypes)) { 0 -> Celestial.Star::class 1 -> Celestial.Planet::class 2 -> Celestial.Moon::class else -> null } } SELECT USES a TRIE FOR FAST READING #3 https://speakerdeck.com/swankjesse/json-explained-chicago-roboto-2019
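
To show why a trie makes this kind of selection fast, here is a minimal byte trie in plain Kotlin. It is a sketch of the general technique, not Okio's `Options` implementation: `ByteTrie` is a hypothetical class, it does not consume the matched bytes, and it returns the lowest-indexed option that prefixes the input.

```kotlin
class ByteTrie(options: List<String>) {
  private class Node {
    val children = HashMap<Byte, Node>()
    var optionIndex = -1  // index of the option that ends at this node, or -1
  }

  private val root = Node()

  init {
    options.forEachIndexed { index, option ->
      var node = root
      for (b in option.encodeToByteArray()) {
        node = node.children.getOrPut(b) { Node() }
      }
      // Keep the earliest index if the same option is listed twice.
      if (node.optionIndex == -1) node.optionIndex = index
    }
  }

  /**
   * Returns the index of the first listed option that is a prefix of [input]
   * starting at [offset], or -1 if none matches.
   */
  fun select(input: ByteArray, offset: Int = 0): Int {
    var node = root
    var best = node.optionIndex
    for (i in offset until input.size) {
      node = node.children[input[i]] ?: break
      val candidate = node.optionIndex
      if (candidate != -1 && (best == -1 || candidate < best)) best = candidate
    }
    return best
  }
}
```

Each input byte is examined at most once no matter how many options there are, and no `String` is allocated to make the comparison.
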
  50. /** * Create input and output streams from Okio. Buffer

    can replace * both [ByteArrayOutputStream] and [ByteArrayInputStream] ! */ fun interopWithJavaIo(file: File) { val source = file.source().buffer() val bitmap = source.use { BitmapFactory.decodeStream(source.inputStream()) } addFunnyMoustaches(bitmap) val sink = file.sink().buffer() sink.use { bitmap.compress(JPEG, 100, sink.outputStream()) } } READ & WRITE AS JAVA.IO STREAMS #4
  51. fun connectThreads(): Long { val pipe = Pipe(maxBufferSize = 1024)

    Thread { pipe.sink.buffer().use { sink -> for (i in 0L until 1000L) { sink.writeLong(i) } } }.start() var total = 0L pipe.source.buffer().use { source -> while (!source.exhausted()) { total += source.readLong() } } return total } PIPE CONNECTS A READER & A WRITER #5
  52. CURSORS OFFER BYTEARRAY ACCESS #7 /** * Connect Okio's cursor

    to Guava's Murmur3F hash function. This uses * Buffer.UnsafeCursor to access the buffer's byte arrays. */ fun Buffer.murmur3(): HashCode { val hasher = Hashing.murmur3_128().newHasher() readUnsafe().use { cursor -> while (cursor.next() != -1) { hasher.putBytes( cursor.data!!, cursor.start, cursor.end - cursor.start ) } } return hasher.hash() }
  53. fun runProcess() { val process = ProcessBuilder() .command("find", "/", "-name",

    "README.md") .start() val timeout = object : AsyncTimeout() { override fun timedOut() { process.destroyForcibly() } } timeout.deadline(5, TimeUnit.SECONDS) timeout.withTimeout { val source = process.inputStream.source().buffer() while (true) { println(source.readUtf8Line() ?: break) } } } TIMEOUTS WORK EVERYWHERE #8
  54. /** * This uses [BufferedSource.readByteString] to read an entire stream

    * into a single immutable value. ByteString is a great container for * encoded data like protobufs, messages, and snapshots of files. */ private fun handleResponse(response: Response): HandledResponse<*> { if (!response.isSuccessful) { val source = response.body.source() return HandledResponse.UnexpectedStatus( response.code, response.headers, source.readByteString(), ) } ... } BYTESTRING IS A VALUE #9
  55. /** * This uses [ByteString.hmacSha256] to take an HMAC of

    a request * body to authenticate a webhook call. Okio includes SHA-1 and * SHA-256 hashes for byte strings, buffers, and streams. */ fun webHookSignatureCheck( headers: Headers, requestBody: ByteString, ) { val hmacSha256 = requestBody.hmacSha256(secret).hex() if (headers["X-Hub-Signature-256"] != "sha256=$hmacSha256") { throw IOException("signature check failed") } } HASHING CAN BE EASY #10
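
For comparison, the same webhook check can be written against the JDK's own crypto classes. This sketch shows what the one-line Okio call stands in for. `hmacSha256Hex` and `signatureMatches` are hypothetical helper names, and the `sha256=` prefix follows the header format used on the slide above.

```kotlin
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

/** Computes HMAC-SHA256 of [body] under [key] and returns it as lowercase hex. */
fun hmacSha256Hex(key: ByteArray, body: ByteArray): String {
  val mac = Mac.getInstance("HmacSHA256")
  mac.init(SecretKeySpec(key, "HmacSHA256"))
  return mac.doFinal(body).joinToString("") { "%02x".format(it) }
}

/** Returns true if [header] carries the expected signature for [body]. */
fun signatureMatches(header: String?, key: ByteArray, body: ByteArray): Boolean {
  if (header == null) return false
  val expected = "sha256=" + hmacSha256Hex(key, body)
  // MessageDigest.isEqual avoids the early exit of a plain string comparison.
  return MessageDigest.isEqual(expected.encodeToByteArray(), header.encodeToByteArray())
}
```

The slide's version compares with `!=`; whether that matters depends on the threat model, so treat the comparison used here as one reasonable choice rather than a requirement.
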
  56. 10 COOL THINGS 1. REQUIRE 2. PEEK 3. SELECT 4. JAVA.IO

    5. PIPE 6. THROTTLER 7. CURSORS 8. TIMEOUTS 9. BYTESTRING 10. HASHING
  57. Why? • Kotlin Multiplatform needs a file system! • JVM

    file APIs fight you if you try to write tests • We thought we could do better
  58. Challenges • Multiplatform is difficult when the platforms are very

    different! • Deliberately not supporting everything! No volume management, permissions, watches, or locking • Testing real implementations was tough
  59. fun writeSequence(fileSystem: FileSystem, path: Path) { fileSystem.write(path, mustCreate = true)

    { for (i in 0L until 1000L) { writeDecimalLong(i) writeByte('\n'.code) } } } fun readSequence(fileSystem: FileSystem, path: Path): Long { fileSystem.read(path) { var total = 0L while (!exhausted()) { total += readDecimalLong() readByte() } return total } }
  60. BufferedSource is a Bad Name • We have two interfaces:

    • Source is the easy-to-implement one • BufferedSource is the easy-to-call one • We should have saved the good name (Source) for the interface you use all the time • Similarly for Sink and BufferedSink
  61. Timeout vs. Cancel • Every Source and Sink in Okio

    comes with a Timeout • A cancel() method would have been better! https://github.com/python-trio/trio
  62. Controversy 1: It’s Blocking • Java server I/O trend: everything

    asynchronous with Futures, callbacks, and event loops • Okio: everything is blocking
  63. Blocking vs. Non-Blocking • Non-blocking lets you service N

    concurrent callers with fewer than N threads • Non-blocking is not otherwise faster • Overhead of abstractions that move work between threads, plus cost of context-switching
  64. Loom is Coming! • Rather than making async better, why

    not make threads cheaper? • Virtual threads are coming soon to the JVM • Currently in preview! https://openjdk.java.net/jeps/425
  65. Controversy 2: Kotlin Switch • In 2018 we pressed ⌥⇧⌘K

    and converted Okio from Java to Kotlin, introducing a dependency on the Kotlin standard library • Java programmers are suspicious of alternative JVM languages https://speakerdeck.com/swankjesse/ok-multiplatform-droidcon-nyc-2018
  66. No Regrets on Kotlin • Kotlin’s been really good to

    us • We’re doing exciting things with multiplatform • Kotlin maintainers’ devotion to compatibility means none of the feared problems have materialized