LegoSDN HotNets 2014

Bala
October 28, 2014

Tolerating SDN Application Failures with LegoSDN, HotNets 2014.

A redesign of the SDN controller architecture, centered on a set of abstractions that eliminate the fate-sharing relationships between SDN applications and the controller, and among the SDN applications themselves.

Transcript

  1. Quality of Code

    “In C, I never learned to use the debugger, so I used to never make mistakes …”
    “I went millions and millions of hours with no problems, probably tens of millions of hours with no problems.”
    Arthur Whitney, creator of A, K and Q. ACM Queue, Feb 2009.
    October 28, 2014 HotNets 2014 | LegoSDN
  2. Bugs are endemic in software!

    § Bugs can be deterministic or non-deterministic
    § [STS] Pox Premature PacketIn
      – l2_multi routing module failed unexpectedly with a KeyError.
  3. Cascading Crashes

    [Diagram: Controller, App1, App2, …; in, out]
  4. Cascading Crashes

    [Diagram: Controller, App1, App2, …; in, out]
  5. Cascading Crashes

    [Diagram: Controller, App1, App2, …; in, out]
  6. LegoSDN

    § Availability is of utmost importance
      – Second only to security
  7. Fate-sharing

    § Fate-sharing relationships between
      – the SDN controller and the SDN application(s) (also between SDN applications)
      – the SDN application and the network
    § Failure in any one SDN application brings down the other applications, and the SDN controller.
  8. Three-pronged approach

    [Diagram: Controller, App1, App2, …; in, out]
    Step 1: Contain crash
  9. Three-pronged approach

    [Diagram: Controller, App1, App2, …; in, out]
    Step 2: Undo changes
  10. Three-pronged approach

    [Diagram: Controller, App1, App2, …; in, out]
    Step 3: Handle message
  11. Isolate SDN-Apps from the controller

    [Diagram: Sandbox (App1), Sandbox (App2), Controller]
  12. Isolate SDN-Apps from the controller

    [Diagram: Sandbox (App1), Sandbox (App2), Controller]
  13. Isolate SDN-Apps from the controller

    [Diagram: Sandbox (App1), Sandbox (App2), Controller]
  14. Isolate SDN-Apps from the network

    [Diagram: Sandbox (App1), Controller]
  15. Isolate SDN-Apps from the network

    [Diagram: Sandbox (App1), Controller]
  16. LegoSDN

    AppVisor Stub: Lightweight wrapper
    AppVisor Proxy: Message dispatcher
    SDN-App is treated as a black-box. Stub and proxy allow SDN-Apps to talk to controller.
    NetLog: Transactional support
    [Diagram: Sandbox, App1, Controller, AppVisor Stub, AppVisor Proxy, NetLog]
  17. LegoSDN

    Built on top of FloodLight
    Ported three applications bundled with FloodLight to LegoSDN
    [Diagram: Sandbox, App1, Controller, AppVisor Stub, AppVisor Proxy, NetLog]
  18. Three-pronged approach

    [Diagram: Controller, App1, App2, …; in, out]
    Step 3: Handle message
  19. 1. Crash and burn

    § Halt the application
      – SDN-App cannot continue processing
      – Other SDN-Apps can continue unaffected
    § No Compromise
      – Think of security related SDN-Apps
    Correctness: SDN-App’s ability to implement its functionality without change, according to the specification.
  20. 2. Induce amnesia

    § Ignore or drop the crash inducing message
      – SDN-App will not see the message again
    § Complete Compromise
  21. 3. Apply transformations

    § Transform the offending message into another one that the application can handle
      – application will continue with a modified input
    § Equivalence Compromise
  22. Related work

    § Fault tolerance
      – via reboots
      – applying Paxos for leader selection
    § Debugging SDN-Apps or the controller
  23. Message equivalence

    § How do you determine two messages are equivalent?
  24. Rollbacks are non-trivial

    § Rollback of one or more rules installed changes controller’s view of the state of network
      – Might induce crashes of other SDN applications that rely on a consistent view of network state
  25. Error propagation

    § Last message received by the SDN-App prior to the crash need not be the culprit!
      – How far along should we go back in history to find the root cause of the crash?
      – Recovery from an earlier checkpoint; How many checkpoints should we maintain?
  26. Road ahead

    § Rethink controller architecture
      – LegoSDN is only the tip of the iceberg.
    § Resilient controllers can catalyze adoption
    § Failures need to be a first-class citizen
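
The three-pronged approach from the slides (contain the crash, undo changes, handle the message) and the three recovery policies (crash and burn, induce amnesia, apply transformations) can be illustrated with a short Python sketch. This is only a toy model under stated assumptions, not LegoSDN's implementation: `Policy`, `RuleLog`, `SandboxedApp`, `dispatch`, and the two sample handlers are hypothetical names, a `try`/`except` stands in for real sandboxing, and an in-memory list stands in for NetLog's transactional support.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Policy(Enum):
    CRASH_AND_BURN = 1        # halt the failed app; other apps keep running
    INDUCE_AMNESIA = 2        # drop the crash-inducing message
    APPLY_TRANSFORMATION = 3  # retry with a transformed message


@dataclass
class RuleLog:
    """Hypothetical stand-in for a transactional rule log."""
    installed: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def install(self, rule):
        self.pending.append(rule)

    def commit(self):
        self.installed.extend(self.pending)
        self.pending.clear()

    def rollback(self):
        self.pending.clear()


@dataclass
class SandboxedApp:
    name: str
    handler: Callable  # called as handler(message, log)
    policy: Policy
    transform: Optional[Callable] = None
    halted: bool = False

    def _attempt(self, message, log):
        """Run the handler; contain any crash and undo its pending rules."""
        try:
            self.handler(message, log)
            log.commit()
            return True
        except Exception:
            log.rollback()
            return False

    def deliver(self, message, log):
        """Return True if the message (or its transformed form) was handled."""
        if self.halted:
            return False
        if self._attempt(message, log):
            return True
        if self.policy is Policy.INDUCE_AMNESIA:
            return False  # message dropped, app stays alive
        if self.policy is Policy.APPLY_TRANSFORMATION and self.transform:
            if self._attempt(self.transform(message), log):
                return True
        self.halted = True  # crash and burn, or no usable transformation
        return False


def dispatch(apps, message, log):
    """Deliver one message to every app; one app's crash does not stop the rest."""
    return {app.name: app.deliver(message, log) for app in apps}


KNOWN_SWITCHES = {"s1": 1}


def flaky_handler(message, log):
    # Installs a rule, then fails with KeyError on an unknown switch.
    log.install(("flaky", message["switch"]))
    KNOWN_SWITCHES[message["switch"]]


def stable_handler(message, log):
    log.install(("stable", message["switch"]))
```

With a `flaky` app under `Policy.INDUCE_AMNESIA` and a `stable` app alongside it, `dispatch(apps, {"switch": "s9"}, RuleLog())` leaves only the stable app's rule in `installed`: the flaky app's pending rule is rolled back and the message is dropped for that app alone. Choosing a sensible `transform` is the open "message equivalence" question raised in the slides; the sketch leaves it to the caller.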