LegoSDN HotNets 2014

Bala
October 28, 2014

Tolerating SDN Application Failures with LegoSDN, HotNets 2014.

A redesign of the SDN controller architecture, centered on a set of abstractions that eliminate the fate-sharing relationships between SDN applications and the controller, and among the SDN applications themselves.

Transcript

  1. Quality of Code

    “In C, I never learned to use the debugger, so I used to never make mistakes …” “I went millions and millions of hours with no problems—probably tens of millions of hours with no problems.”
    Arthur Whitney, creator of A, K and Q. ACM Queue, Feb 2009.
    October 28, 2014 | HotNets 2014 | LegoSDN
  2. Bugs are endemic in software!

    - Bugs can be deterministic or non-deterministic
    - [STS] POX Premature PacketIn: the l2_multi routing module failed unexpectedly with a KeyError.
  3. Cascading Crashes

    [Diagram: a controller hosting App1, App2, …, with messages flowing in and out]
  6. LegoSDN

    - Availability is of utmost importance
      - Second only to security
  7. Fate-sharing

    - Fate-sharing relationships between
      - the SDN controller and the SDN application(s) (also between SDN applications)
      - the SDN application and the network
    - Failure in any one SDN application brings down the other applications, and the SDN controller.
  8. Three-pronged approach

    Prong 1: Contain crash
    [Diagram: a controller hosting App1, App2, …, with messages flowing in and out]
  9. Three-pronged approach

    Prong 2: Undo changes
    [Diagram: a controller hosting App1, App2, …, with messages flowing in and out]
  10. Three-pronged approach

    Prong 3: Handle message
    [Diagram: a controller hosting App1, App2, …, with messages flowing in and out]
  11. Isolate SDN-Apps from the controller

    [Diagram: App1 and App2 each run in their own sandbox, separate from the controller]
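The containment idea on this slide can be sketched in a few lines of Python. This is only an illustration, not LegoSDN's implementation: the `Sandbox` and `Controller` classes and their methods are hypothetical names, and a `try`/`except` boundary stands in for the real isolation between an SDN-App and the controller.

```python
from typing import Callable, Dict, List

Message = dict
Handler = Callable[[Message], None]


class Sandbox:
    """Hypothetical stand-in for a per-app sandbox (here just a try/except boundary)."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self.handler = handler
        self.alive = True

    def deliver(self, msg: Message) -> bool:
        """Run the app on msg; return False if the app is down or crashes."""
        if not self.alive:
            return False
        try:
            self.handler(msg)
            return True
        except Exception:
            # The crash is contained here: only this app is marked dead.
            self.alive = False
            return False


class Controller:
    """Toy controller that dispatches every message to every sandboxed app."""

    def __init__(self) -> None:
        self.sandboxes: List[Sandbox] = []

    def register(self, name: str, handler: Handler) -> None:
        self.sandboxes.append(Sandbox(name, handler))

    def dispatch(self, msg: Message) -> Dict[str, bool]:
        # A failure in one app does not stop delivery to the others.
        return {s.name: s.deliver(msg) for s in self.sandboxes}
```

With one app that raises a `KeyError` and one that works, `dispatch` reports the first as failed and still delivers the message to the second, so the crash no longer cascades.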
  14. Isolate SDN-Apps from the network

    [Diagram: App1 in a sandbox, connected to the controller]
  16. LegoSDN

    - AppVisor Stub: lightweight wrapper
    - AppVisor Proxy: message dispatcher
    - SDN-App is treated as a black-box. Stub and proxy allow SDN-Apps to talk to controller.
    - NetLog: transactional support
    [Diagram: sandbox with App1, controller, AppVisor Stub, AppVisor Proxy, NetLog]
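The slide only says that NetLog provides transactional support. One common way to get that behavior is an undo log, sketched below under stated assumptions: the flow table is modeled as a plain dictionary, and the `NetLog` class here, with `begin`, `install`, `remove`, `commit` and `rollback`, is a hypothetical interface rather than the actual LegoSDN API.

```python
from typing import Dict, List, Optional, Tuple

FlowTable = Dict[str, str]  # match -> action; a toy model of one switch's rules


class NetLog:
    """Hypothetical undo log: records the previous value of every rule it changes."""

    def __init__(self, table: FlowTable) -> None:
        self.table = table
        self._undo: List[Tuple[str, Optional[str]]] = []

    def begin(self) -> None:
        self._undo.clear()

    def install(self, match: str, action: str) -> None:
        # Remember what was there before (None means "no rule").
        self._undo.append((match, self.table.get(match)))
        self.table[match] = action

    def remove(self, match: str) -> None:
        self._undo.append((match, self.table.get(match)))
        self.table.pop(match, None)

    def commit(self) -> None:
        self._undo.clear()

    def rollback(self) -> None:
        # Undo in reverse order so earlier values win.
        while self._undo:
            match, old = self._undo.pop()
            if old is None:
                self.table.pop(match, None)
            else:
                self.table[match] = old
```

If an SDN-App crashes partway through handling a message, calling `rollback()` returns the table to the state it had at `begin()`; calling `commit()` after a successful run discards the log.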
  17. LegoSDN

    - Built on top of Floodlight
    - Ported three applications bundled with Floodlight to LegoSDN
    [Diagram: sandbox with App1, controller, AppVisor Stub, AppVisor Proxy, NetLog]
  18. Three-pronged approach

    Prong 3: Handle message
    [Diagram: a controller hosting App1, App2, …, with messages flowing in and out]
  19. 1. Crash and burn

    - Halt the application
      - SDN-App cannot continue processing
      - Other SDN-Apps can continue unaffected
    - No Compromise
      - Think of security related SDN-Apps
    Correctness: SDN-App’s ability to implement its functionality without change, according to the specification.
  20. 2. Induce amnesia

    - Ignore or drop the crash-inducing message
      - SDN-App will not see the message again
    - Complete Compromise
  21. 3. Apply transformations

    - Transform the offending message into another one that the application can handle
      - application will continue with a modified input
    - Equivalence Compromise
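The three options for handling the offending message (crash and burn, induce amnesia, apply transformations) can be compared side by side in a small sketch. Everything named here is an assumption made for illustration: the `Policy` enum, the `deliver` function, and in particular the example transformation that rewrites one `SwitchDown` message into per-port `PortDown` messages. The slides do not say which transformations LegoSDN applies.

```python
from enum import Enum
from typing import Callable, List

Message = dict
App = Callable[[Message], None]


class Policy(Enum):
    CRASH_AND_BURN = "no compromise"
    INDUCE_AMNESIA = "complete compromise"
    TRANSFORM = "equivalence compromise"


def switch_down_to_port_downs(msg: Message) -> List[Message]:
    """Hypothetical transformation: one SwitchDown becomes one PortDown per port."""
    if msg.get("type") != "SwitchDown":
        return []
    return [{"type": "PortDown", "switch": msg["switch"], "port": p}
            for p in msg["ports"]]


def deliver(policy: Policy, app: App, msg: Message) -> bool:
    """Deliver msg to app; return True if the app keeps running afterwards."""
    try:
        app(msg)
        return True
    except Exception:
        pass  # the app crashed on msg; apply the chosen policy below

    if policy is Policy.CRASH_AND_BURN:
        return False  # halt the app; correctness is never compromised
    if policy is Policy.INDUCE_AMNESIA:
        return True   # drop msg; the app never sees it again

    # TRANSFORM: retry with replacement messages the app may be able to handle.
    replacements = switch_down_to_port_downs(msg)
    if not replacements:
        return False  # no known transformation; fall back to halting
    try:
        for replacement in replacements:
            app(replacement)
    except Exception:
        return False
    return True
```

The return value captures the trade-off on the slides: halting keeps the app correct but unavailable, dropping keeps it available at the cost of a missed event, and transforming tries to keep both by feeding the app a different input.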
  22. Related work

    - Fault tolerance
      - via reboots
      - applying Paxos for leader selection
    - Debugging SDN-Apps or the controller
  23. Message equivalence

    - How do you determine two messages are equivalent?
  24. Rollbacks are non-trivial

    - Rolling back one or more installed rules changes the controller’s view of the network state
      - Might induce crashes of other SDN applications that rely on a consistent view of network state
  25. Error propagation

    - Last message received by the SDN-App prior to the crash need not be the culprit!
      - How far back in history should we go to find the root cause of the crash?
      - Recovery from an earlier checkpoint: how many checkpoints should we maintain?
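The two questions on this slide (how far back to look, and how many checkpoints to keep) can be made concrete with a toy model. The sketch below is one possible search heuristic, not the method described in the talk: `CheckpointedApp`, its `step` callback and the `max_checkpoints` bound are all hypothetical. It snapshots the app state before every message, keeps only the most recent snapshots, and after a crash looks for an earlier message whose removal lets the remaining history replay cleanly.

```python
import copy
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Message = dict
Step = Callable[[dict, Message], None]


class CheckpointedApp:
    """Toy app wrapper: step(state, msg) mutates state and may raise."""

    def __init__(self, step: Step, max_checkpoints: int = 3) -> None:
        self.step = step
        self.state: dict = {}
        self.history: List[Message] = []
        # Each entry: (index of the message about to be handled, state before it).
        self.checkpoints: Deque[Tuple[int, dict]] = deque(maxlen=max_checkpoints)

    def handle(self, msg: Message) -> None:
        self.checkpoints.append((len(self.history), copy.deepcopy(self.state)))
        self.history.append(msg)
        self.step(self.state, msg)  # may raise

    def recover(self) -> Optional[int]:
        """Call after handle() raised; return the history index of the dropped message."""
        last = len(self.history) - 1
        # Prefer an earlier suspect, so the crashing message can still be
        # processed; fall back to dropping the crashing message itself.
        candidates = [c for c in reversed(self.checkpoints) if c[0] != last]
        candidates += [c for c in self.checkpoints if c[0] == last]
        for idx, snapshot in candidates:
            state = copy.deepcopy(snapshot)
            try:
                for later in self.history[idx + 1:]:
                    self.step(state, later)
            except Exception:
                continue  # dropping history[idx] does not avoid the crash
            self.state = state
            del self.history[idx]
            self.checkpoints.clear()  # saved indices are stale after the drop
            return idx
        return None
```

The `max_checkpoints` bound is where the slide's second question shows up: if the message that corrupted the state is older than the oldest retained checkpoint, this search cannot reach it and can only drop the message that triggered the crash.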
  26. Road ahead

    - Rethink controller architecture
      - LegoSDN is only the tip of the iceberg.
    - Resilient controllers can catalyze adoption
    - Failures need to be a first-class citizen