I/O?
• Akka has had an I/O library since 2.0
• The old one has been deprecated as of 2.2
• With 2.2 (released in July ’13), Akka introduced a new I/O layer created in collaboration with the Spray.IO team
• Based on the internal networking code written to achieve Spray’s impressive performance metrics¹
• Highly scalable, pipeline-based I/O, integrated into the Actor System
¹ http://spray.io/blog/2013-05-24-benchmarking-spray/
break down?
• Why Non-Blocking I/O?
  • How it works, and why it is advantageous over blocking I/O
• Network data representation in Akka I/O
• Composing a TCP Client in Akka I/O
  • Basics of Akka I/O with TCP
  • The Pipeline system: plugging in extras (SSL, Backpressure, etc.)
• Closing Thoughts (Composable APIs)
makes the kernel polling saner
[Diagram legend: selector thread; network task; network event notices]
Only one thread blocks polling the kernel for network events. Tasks can now be decoupled from threads, and threads can be reused while network events complete.
• NIO (aka ‘New’ IO) introduced in Java 1.4
• The good:
  • First availability of non-blocking I/O on the JVM
  • Fast & powerful
  • Acts as the underpinning of all other non-blocking frameworks on the JVM
• The bad:
  • Very low level; requires a ton of manual coding of selector loops, where it is easy to make mistakes
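To give a feel for the "manual coding of selector loops" mentioned above, here is a minimal sketch of a single-threaded echo server written directly against java.nio. The object name `SelectorEcho` and its two methods are made up for this illustration, and the write path is deliberately simplistic (real code would register `OP_WRITE` and handle partial writes instead of spinning).

```scala
import java.net.InetSocketAddress
import java.nio.ByteBuffer
import java.nio.channels.{SelectionKey, Selector, ServerSocketChannel, SocketChannel}

// Hypothetical illustration, not taken from the talk.
object SelectorEcho {
  /** Opens a non-blocking server socket on an ephemeral loopback port. */
  def open(): (Selector, ServerSocketChannel) = {
    val selector = Selector.open()
    val server = ServerSocketChannel.open()
    server.configureBlocking(false)
    server.bind(new InetSocketAddress("127.0.0.1", 0))
    server.register(selector, SelectionKey.OP_ACCEPT)
    (selector, server)
  }

  /** One pass of the selector loop: block until events arrive, then handle each one. */
  def pollOnce(selector: Selector): Unit = {
    selector.select() // the only place where this thread blocks
    val keys = selector.selectedKeys().iterator()
    while (keys.hasNext) {
      val key = keys.next()
      keys.remove() // forgetting this is a classic selector-loop mistake
      if (key.isAcceptable) {
        val client = key.channel().asInstanceOf[ServerSocketChannel].accept()
        if (client != null) {
          client.configureBlocking(false)
          client.register(selector, SelectionKey.OP_READ)
        }
      } else if (key.isReadable) {
        val client = key.channel().asInstanceOf[SocketChannel]
        val buf = ByteBuffer.allocate(256)
        if (client.read(buf) < 0) {
          key.cancel() // peer closed the connection
          client.close()
        } else {
          buf.flip()
          while (buf.hasRemaining) client.write(buf) // simplistic: spins on partial writes
        }
      }
    }
  }
}
```

A real server would call `pollOnce` in a loop on one dedicated thread; even this toy version shows how much bookkeeping NIO leaves to the programmer.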
• Netty (there's also Apache MINA, with similar goals + limitations)
• Incredibly powerful, incredibly rich APIs on top of NIO, making it much easier to work with
• The good:
  • Pipelines & other tools for easy composition of complex network protocols
• The bad:
  • Java APIs are tiresome and unwieldy from Scala
  • Lots of mutability
  • Lots of unmanageable thread pools, especially cumbersome when you want fine-grained control of your concurrency
Representation
• NIO offers a powerful data structure: ByteBuffer¹
• Working with raw Array[Byte] sucks... especially for “larger-than-byte” operations
  • Who here knows how to decode an int64 (aka long) from a raw byte array? How about IEEE-754 doubles? What about doing it with Big Endian vs Little Endian?
  • The fact is, doing these correctly is non-trivial, and it shouldn’t be.
• ByteBuffer provides simple operations for doing these things
  • getLong(), getDouble(), etc.
  • Big Endian vs. Little Endian? order(bo: java.nio.ByteOrder)
• A great step for simplifying the work of the network programmer on the JVM
¹ For a more complete tour of ByteBuffer, see http://www.kdgregory.com/?page=java.byteBuffer
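As a rough illustration of the point above, the sketch below decodes the same eight bytes by hand and via ByteBuffer. The object and method names (`ByteBufferDemo`, `longLEManual`, and so on) are invented for this example; only the `java.nio` calls are real API.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper names; only the java.nio calls are standard API.
object ByteBufferDemo {
  /** Little-endian int64 by hand: mask each byte, shift it into place, OR it in. */
  def longLEManual(bytes: Array[Byte]): Long =
    (0 until 8).foldLeft(0L) { (acc, i) => acc | ((bytes(i) & 0xffL) << (8 * i)) }

  /** The same decode, delegated to ByteBuffer. */
  def longLE(bytes: Array[Byte]): Long =
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong()

  /** Switching byte order is a single call. */
  def longBE(bytes: Array[Byte]): Long =
    ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getLong()

  /** IEEE-754 doubles come for free as well. */
  def doubleLE(bytes: Array[Byte]): Double =
    ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getDouble()
}
```

The hand-written version is easy to get subtly wrong (forgetting the `& 0xffL` mask sign-extends negative bytes), which is exactly the kind of mistake ByteBuffer removes.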
Mutable data structures are always the curse of working with Java APIs
• ByteString provides a much better immutable structure over ByteBuffer & Array[Byte] for manipulating network data
• ByteString is really just an IndexedSeq[Byte]: no Byte-aware read or write operations exist on it
• Want to read? ByteIterator (available via ByteString.iterator) gives you ops like getDouble and getLong.
  • A moving view over your data, no indices needed.
  • Bonus: java.nio.ByteOrder is passed implicitly
• Want to write? Follow Scala collection guidelines!
  • ByteString.newBuilder provides a ByteStringBuilder with write operations (the converse of ByteIterator’s)
server
• Our time protocol is relatively simple (and, incidentally, Little Endian)
• Client
  {
    short messageType = 0 /* type 0 = GET_TIME */
  }
• Server
  {
    short messageType = 1 /* type 1 = SET_TIME */
    long unixTimestamp /* Unix Epoch */
    string timezone /* e.g. EST, PDT, etc */
  }
implicit val byteOrder = java.nio.ByteOrder.LITTLE_ENDIAN /* passed to put methods implicitly */

val bsB: ByteStringBuilder = ByteString.newBuilder
bsB.putShort(0) // message type 0 = GET_TIME
val bs: ByteString = bsB.result()
/* now we can write our message out to the network */

// companion object for internal messaging
case object GetTime extends TimeMessage
implicit val byteOrder = java.nio.ByteOrder.LITTLE_ENDIAN /* passed to get methods implicitly */

val b: ByteString = getADataFrameFromNetwork()
/* ByteString itself isn't very useful, we need our iterator */
val frame: ByteIterator = b.iterator
val msgType = frame.getShort
assert(msgType == 1, s"Invalid server reply. Expected: message type 1, got $msgType")
val timestamp = frame.getLong
/* getString is assumed to be a helper supplied elsewhere; the wire encoding of the string is not shown here */
val timezone = frame.getString
/* now we could instantiate a Date, etc */

// Companion class for internal messaging
case class SetTime(timestamp: Long, timezone: String) extends TimeMessage
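For experimenting with this frame layout without Akka on the classpath, the round trip can be sketched with plain `java.nio.ByteBuffer`. This is an illustration only: the `TimeWire` object is invented here, and because the slides do not say how the `string` field is encoded, the sketch assumes a little-endian `short` length prefix followed by ASCII bytes.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets.US_ASCII

// Hypothetical stand-in for the talk's TimeMessage types, using only the JDK.
object TimeWire {
  sealed trait TimeMessage
  case object GetTime extends TimeMessage
  final case class SetTime(timestamp: Long, timezone: String) extends TimeMessage

  def encode(msg: TimeMessage): Array[Byte] = msg match {
    case GetTime =>
      ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(0.toShort).array()
    case SetTime(ts, tz) =>
      val tzBytes = tz.getBytes(US_ASCII)
      // ASSUMPTION: the string is a short length prefix plus ASCII bytes
      ByteBuffer.allocate(2 + 8 + 2 + tzBytes.length).order(ByteOrder.LITTLE_ENDIAN)
        .putShort(1.toShort)
        .putLong(ts)
        .putShort(tzBytes.length.toShort)
        .put(tzBytes)
        .array()
  }

  def decode(bytes: Array[Byte]): TimeMessage = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    buf.getShort().toInt match {
      case 0 => GetTime
      case 1 =>
        val ts = buf.getLong()
        val tzBytes = new Array[Byte](buf.getShort().toInt)
        buf.get(tzBytes)
        SetTime(ts, new String(tzBytes, US_ASCII))
      case other =>
        throw new IllegalArgumentException(s"Unknown message type $other")
    }
  }
}
```

Note that the buffer's position plays the same role as ByteIterator's moving view: each `get` advances past the field it just read, so no index arithmetic is needed.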
• ByteString is a rope: it can present a view over multiple underlying Array[Byte] or ByteBuffers
• One downside of ropes is memory locality: our view over different underlying structures may be fragmented in memory
  • Inefficiencies in reading due to RAM location hopping
• ByteString can be compacted!
  • compact() returns a new ByteString containing a single, memory-contiguous byte array
  • isCompact() tells you if a ByteString is compacted or not
Akka IO
• Akka IO provides a manager for IO work: the IO actors
• This means rather than invoking methods, we communicate with the IO manager via message passing
  • It also means that our core code has to be an Actor itself, to communicate
• Centrally, you access the actor for individual protocols (TCP, UDP) via an Akka extension singleton (IO)
• It is actually surprisingly easy to wire your own custom protocols: see Spray 1.2’s Http implementation
basic, barebones client with the core Actor API

import java.net.InetSocketAddress
import akka.actor._
import akka.io.{IO => IOFaçade, _}
import akka.io.Tcp._ // Connect, Connected, CommandFailed, Register, SO

class TimeClient(val serverAddress: InetSocketAddress) extends Actor with ActorLogging {
  import context.system // the IO extension needs an implicit ActorSystem

  // Get the TCP manager actor for IO
  val tcpManager = IOFaçade(Tcp)

  /** Send a connection request to the manager.
    * Replies are either [[Tcp.Connected]] or [[Tcp.CommandFailed]].
    */
  tcpManager ! Connect(serverAddress,
                       options = List(SO.KeepAlive(true), SO.TcpNoDelay(true)))

  /* Our initial behavior */
  def receive: Actor.Receive = {
    // Connection failed
    case CommandFailed(_: Connect) =>
      log.error("Network Connection Failed")
      context stop self

    case c @ Connected(remote, local) =>
      // sender is our connection actor
      val connection = sender
      // the connection stays inactive until a handler is registered for incoming data
      connection ! Register(self)
      // now we can swap akka behaviors to a 'connected' state
      context.become(connectedBehavior(connection))
  }

  // ... connectedBehavior is defined on the next slide
}
receiving replies

def connectedBehavior(connection: ActorRef): Actor.Receive = {
  /* a message from outside our actor asking to write data */
  case g @ GetTime =>
    /* we have to translate to the ByteString ourselves */
    connection ! Tcp.Write(g.toByteString()) // toByteString: a method representing our encoding

  case Tcp.Received(data) =>
    /* data received from the socket: a ByteString we must decode */
    val time = SetTime.fromByteString(data) // decode ByteString to SetTime
    // do some work here to send back to a registered callback listener, etc
}

What if we want to separate things like parsing logic from our network handler?
better, saner, awesomer way to do it
• With our existing implementation, much of our code is colocated and complex: wire protocol decoding is mixed with core logic
• What if we want to decouple our wire protocol? Or add SSL support? Or handle backpressure?
• It is surprisingly easy to do this with Akka IO
• Pipelines provide a way to register stages of encoding/decoding and complex network handling downstream from our “core” actor
• Most of the changes happen once our connection is established
• Commands go from our code => remote, Events from remote => our code
we set up a protocol-aware pipeline downstream

case c @ Connected(remote, local) =>
  val pipeline = TcpPipelineHandler.withLogger(log,
    new TimeProtocolHandler >>
    new TcpReadWriteAdapter)

  // this is the actual connection context now
  val handler = context.actorOf(
    TcpPipelineHandler.props(pipeline, sender, self)
      .withDeploy(Deploy.local) /* ensure it can't be remoted */)

  // Set up deathwatch so we get notified if the pipeline actor dies
  context watch handler

  /* Change the TCP connection so that all data from either direction
     (sent by us or received on network) goes through our pipeline */
  sender ! Tcp.Register(handler)

  // now we can swap akka behaviors to a 'connected' state
  // our connection actor is now 'handler', which represents the pipeline
  // We also must pass the pipeline, which has special containers for send/recv
  context.become(connectedBehavior(pipeline, handler))
decoding of ByteString <-> Case Class is now in the Pipeline

def connectedBehavior(pipeline: Init[WithinActorContext, TimeMessage, TimeMessage],
                      connection: ActorRef): Actor.Receive = {
  /* a message from outside our actor asking to write data */
  case g @ GetTime =>
    /* no longer need to encode, just wrap in a Command */
    connection ! pipeline.Command(g)

  case pipeline.Event(time) =>
    /* Event from the pipeline, containing a SetTime object already decoded */
}

What does our new Protocol Handler look like?
a ‘Command’ type and ‘Event’ type

class TimeProtocolHandler
  extends SymmetricPipelineStage[PipelineContext, TimeMessage, ByteString] {

  override def apply(ctx: PipelineContext) =
    new SymmetricPipePair[TimeMessage, ByteString] {
      implicit val byteOrder = java.nio.ByteOrder.LITTLE_ENDIAN

      /* Message from us, being converted out to ByteString */
      override val commandPipeline = { msg: TimeMessage =>
        ctx.singleCommand(msg.toByteString())
      }

      /* Message from network, being converted to a SetTime */
      override val eventPipeline = { bs: ByteString =>
        ctx.singleEvent(SetTime.fromByteString(bs))
      }
    }
}
we wanted to use SSL?

val pipeline = TcpPipelineHandler.withLogger(log,
  new TimeProtocolHandler >>
  new TcpReadWriteAdapter >>
  new SslTlsSupport(sslEngine(remote, client = true)))

The rest is handled in the Pipeline, transparently...
code Handling Network Backpressure
• Downstream of all client network code are buffers in the kernel that place queued-up packets onto the network
• These buffers are not bottomless: they can get backed up when the network is congested, which means they may be unable to accept more writes
• In these circumstances, code that is unaware of congestion may see its writes fail
• The answer? Backpressure. Be aware of congestion and back off for a while
• There is a prerolled Pipeline handler for Backpressure that makes it very easy...
class TimeClient /* ... */ with Stash { /* stash allows us to hold messages back for later processing */
  // ...
  val pipeline = TcpPipelineHandler.withLogger(log,
    new TimeProtocolHandler >>
    new TcpReadWriteAdapter >>
    new BackpressureBuffer(lowBytes = 100, highBytes = 1000, maxBytes = 1000000))
  // ...
  def connectedBehavior /* .. */ = {
    case BackpressureBuffer.HighWatermarkReached =>
      context.become(backpressureBehavior(pipeline, connection))
    // ...
  }

  def backpressureBehavior /* ... */ = {
    case BackpressureBuffer.LowWatermarkReached =>
      unstashAll() // dequeue all waiting messages, resume normal behavior
      context.become(connectedBehavior(pipeline, connection))
    case _ =>
      /* put the message away until we are back at low water */
      stash()
  }
}
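The high/low watermark idea itself is independent of Akka. The sketch below models it with a small immutable buffer; it is not how BackpressureBuffer is implemented, and every name in it (`Watermarks`, `Buffer`, `enqueue`, `acknowledge`) is made up for illustration. Writes are queued until the network acknowledges them; crossing the high mark emits one signal, and draining below the low mark emits the other.

```scala
import scala.collection.immutable.Queue

// Hypothetical model of watermark signalling; not Akka's implementation.
object Watermarks {
  sealed trait Signal
  case object HighWatermarkReached extends Signal
  case object LowWatermarkReached extends Signal

  final case class Buffer(low: Int, high: Int,
                          pending: Queue[Array[Byte]] = Queue.empty,
                          bytes: Int = 0,
                          suspended: Boolean = false) {

    /** Queue a write; signals once when the total first exceeds `high`. */
    def enqueue(chunk: Array[Byte]): (Buffer, Option[Signal]) = {
      val total = bytes + chunk.length
      val crossed = !suspended && total > high
      val next = copy(pending = pending.enqueue(chunk), bytes = total,
                      suspended = suspended || crossed)
      (next, if (crossed) Some(HighWatermarkReached) else None)
    }

    /** The oldest write was flushed; signals once the total drops below `low`. */
    def acknowledge(): (Buffer, Option[Signal]) = pending.dequeueOption match {
      case None => (this, None)
      case Some((chunk, rest)) =>
        val total = bytes - chunk.length
        val resumed = suspended && total < low
        val next = copy(pending = rest, bytes = total,
                        suspended = suspended && !resumed)
        (next, if (resumed) Some(LowWatermarkReached) else None)
    }
  }
}
```

The gap between the two marks is the design point: resuming only after dropping below `low`, rather than as soon as the total dips under `high`, keeps the sender from flapping between the two behaviors.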
on Composable APIs
• I’ve been playing a lot with ways to build more functional, composable APIs that abstract away the actors
• Some conclusions
  • The SIP-14 Futures API provides powerful tools, especially using Promises, where you can return the Future and complete the Promise later
  • A dispatching model is very effective with the Promise approach
  • Two good ways to represent buffers of larger data like DB Cursors
    • Iteratees: powerful & flexible but potentially hard for users to comprehend
    • Reactive Streams¹: much of the power of Iteratees, with a more friendly interface
¹ See Netflix’s RxJava + its Scala adapter https://github.com/Netflix/RxJava/tree/master/language-adaptors/rxjava-scala
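One possible reading of the "dispatching model" with Promises is sketched below, using only the Scala standard library. The class name `ReplyDispatcher` and the idea of correlating replies by a numeric request id are assumptions made for this example; the talk does not show its own implementation. The caller gets a Future immediately, and whatever handles network replies (an actor, for instance) completes the matching Promise later.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Future, Promise}

// Hypothetical sketch: request ids are an assumption, not part of the talk's protocol.
final class ReplyDispatcher[A] {
  private val pending = TrieMap.empty[Long, Promise[A]]
  private val ids = new AtomicLong(0)

  /** Called by the public API: hand back a Future now, remember the Promise. */
  def register(): (Long, Future[A]) = {
    val id = ids.incrementAndGet()
    val promise = Promise[A]()
    pending.put(id, promise)
    (id, promise.future)
  }

  /** Called when a reply arrives: complete the Promise registered under `id`, if any. */
  def complete(id: Long, reply: A): Boolean =
    pending.remove(id).exists(_.trySuccess(reply))

  /** Called on connection failure: fail the waiting caller instead of leaving it hanging. */
  def fail(id: Long, cause: Throwable): Boolean =
    pending.remove(id).exists(_.tryFailure(cause))
}
```

Users of such an API only ever see `Future[A]`, so they can compose calls with `map` and `flatMap` without knowing that actors sit underneath.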