
CALM and Disorderly Programming in Bloom


Distinguished Lecture, UC Santa Cruz Computer Science

Joe Hellerstein

June 05, 2017


Transcript

  1. CALM and Disorderly Programming in Bloom
    Joe Hellerstein
    UC Berkeley
    joint work with Peter Alvaro, Neil Conway, David Maier, and William Marczak


  2. Distributed software is
    ➔ UBIQUITOUS
    ➔ HARD
    Programming Distributed Systems: It’s Time to Talk


  3. Distributed programming is
    ➔ UBIQUITOUS
    ➔ HARD
    An academic imperative!
    Minimal activity in industry
    Programming Distributed Systems: It’s Time to Talk


  4. Distributed programming is
    ➔ UBIQUITOUS
    ➔ HARD
    An academic imperative!
    Minimal activity in industry
    Today: one academic group’s take
    Lessons in theory and practice

    Initial impact in industry.
    Programming Distributed Systems: It’s Time to Talk


  5. Outline
    Software Mismatch: Order and State in the Cloud
    An Ideal: Disorderly Programming for Distributed Systems
    A Realization: <~ bloom
    Implications: CALM Theorem


  6. Outline
    Software Mismatch: Order and State in the Cloud
    An Ideal: Disorderly Programming for Distributed Systems
    A Realization: <~ bloom
    Implications: CALM Theorem


  7. The State of Programming Is in Disorder


  8. ORDER
    ➔ a list of instructions
    ➔ an array of memory
    THE STATE
    ➔ mutation in time
    Von Neumann “Physics”



  10. DISORDERED TIME
    ➔ multiple clocks
    ➔ parallel computation
    ➔ unordered and 

    interleaved
    SHATTERED STATE
    ➔ local variables
    ➔ sharded tables
    ➔ message passing
    Cloud “Physics”
    v
    x
    z
    q r
    w
    y
    n


  11. What if… our programming model fit our physical reality?
    perhaps we could…
    ➔ scale up easily
    ➔ ignore race conditions
    ➔ tolerate faults reliably
    ➔ debug naturally
    ➔ test intelligently
    …and better understand our fundamentals.


  12. Outline
    Software Mismatch: Order and State in the Cloud
    An Ideal: Disorderly Programming for Distributed Systems
    A Realization: <~ bloom
    Implications: CALM Theorem


  13. Let’s write code that commutes!
    ➔ atomic updates can be disaggregated
    ➔ replicas can update in different orders
    ➔ add idempotence: get retry fault tolerance
    OK. How?
    ➔ Appealing, but never actionable.
    Background: Kitchen Wisdom


  14. STATE
    ✔ mergeable types: e.g. sets, relations
    ❌ mutable variables, ordered structures (lists, dense arrays)
    TIME
    ✔ logical clocks with application semantics
    ❌ instruction ordering
    DISTRIBUTION (STATE + TIME)
    ✔ unification of storage and communication
    ❌ messaging libraries
    Disorderly-by-Default Programming


  15. ➔ Textbook example: 2-party
    communication
    Let Me Show You What I Mean
    ➔ The Bloom language: model
    and syntax
    ➔ More sophisticated examples


  16. High Noon in the Land of Two Mountains



  18. Let’s implement that in Bloom


  19. Let’s implement that in Bloom
    Interfaces akin to data tables
    Speakers write messages into the speak interface
    Listeners insert themselves into the listen interface


  20. Let’s implement that in Bloom
    Interfaces akin to data tables
    Speakers write messages into the speak interface
    Listeners insert themselves into the listen interface
    The hear interface is dumped to stdio for debugging


  21. Let’s implement that in Bloom


  22. Let’s implement that in Bloom
    We include the RendezvousAPI verbatim.


  23. Let’s implement that in Bloom
    We include the RendezvousAPI verbatim.
    Rendezvous is a join of speak and listen


  24. Let’s implement that in Bloom
    We include the RendezvousAPI verbatim.
    Rendezvous is a join of speak and listen
    We have turned communication into query processing.
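The rendezvous-as-join idea can be sketched in plain Ruby (illustrative only, not actual Bud code; the tuple values are made up for the example):

```ruby
# Plain-Ruby sketch of rendezvous-as-join (illustrative; not actual Bud code).
# speak and listen are collections of tuples; hear is their join on subject,
# exactly the relational query the slide describes.
speak  = [["weather", "sunny"], ["stocks", "up"]]
listen = [["alice", "weather"], ["bob", "sports"]]

# A Bloom rule like `hear <= (speak * listen).pairs(...)` computes, in effect:
hear = speak.flat_map do |subj, val|
  listen.select { |_, want| want == subj }
        .map    { |who, _| [who, subj, val] }
end
# hear is [["alice", "weather", "sunny"]]: alice hears the weather report
```

Because the join is just a query over two collections, there is no notion of "sender goes first" baked into the code.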


  25. Outline
    Software Mismatch: Order and State in the Cloud
    An Ideal: Disorderly Programming for Distributed Systems
    A Realization: <~ bloom
    Implications: CALM Theorem


  26. STATE
    ✔ mergeable types: tables and (other) lattices
    TIME
    ✔ fixpoint-per-tick logical clocks
    DISTRIBUTION (STATE + TIME)
    ✔ async “shuffled” tables
    A Disorderly Language
    <~ bloom
    Encourages asynchronous monotonic programming, merging in new information opportunistically over time.


  27. Operational Model
    <~ bloom
    • Independent agents (“nodes”)
    • Local state & logic (can be SPMD or MIMD)
    • Event-driven loop: one clock “tick” per iteration
    One Bloom “Tick”: local updates and NW/OS events feed an atomic local
    fixpoint of instantaneous merges (now <=), which emits deferred local
    updates (next <+) and async NW/OS msgs (async <~).
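The tick discipline can be sketched in plain Ruby (a toy model, not Bud itself; variable names are invented):

```ruby
require 'set'

# Illustrative plain-Ruby model of Bloom ticks (not Bud itself):
# <= merges visibly within the current tick; <+ defers to the next tick.
table    = Set.new          # persistent collection
deferred = Set.new          # staging area for <+ merges

# --- tick 1 ---
table    |= Set[[1, "a"]]   # table <= [[1,"a"]]  : visible now
deferred |= Set[[2, "b"]]   # table <+ [[2,"b"]]  : visible next tick
end_of_tick_1 = table.dup   # fixpoint reached; only [1,"a"] is present

# --- tick boundary: apply deferred updates atomically ---
table |= deferred
deferred = Set.new

# --- tick 2: both tuples are now visible ---
```

The point of the boundary is atomicity: within a tick, logic runs to a local fixpoint over a stable snapshot; `<+` and `<~` effects only surface afterward.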


  28. Statements
    <~ bloom


  29. Statements
    <~ bloom

    <= now
    <+ next
    <~ async
    <- del_next


  30. Statements
    <~ bloom
    <= now
    <+ next
    <~ async
    <- del_next
    persistent: table; lmax, lbool, lmap, …
    transient: scratch, interface
    networked transient: channel

  31. Statements
    <~ bloom
    <= now
    <+ next
    <~ async
    <- del_next
    persistent: table; lmax, lbool, lmap, …
    transient: scratch, interface
    networked transient: channel
    relational operations: map, flat_map; reduce, group, argmin/max; (r * s).pairs
    lattice functions: empty?, include?; count, max, min, …; >, <, >=, <=


  32. The Land of Two Mountains


  33. Recall Synchronous Rendezvous
    Rendezvous is a join of speak and listen
    We have turned communication into query processing.
    But what of time? This depends on perfect synchrony (luck).


  34. Recall Synchronous Rendezvous
    Rendezvous is a join of speak and listen
    We have turned communication into query processing.
    But what of time? This depends on perfect synchrony (luck).
    Asynchronous communication requires persistence.


  35. Persistence
    Transience + gap-free sequential refresh.



  37. Sender Persists (Signal Fire)


  38. Let’s implement that in Bloom



  41. Let’s implement that in Bloom
    The spoken table stores all messages. 




  43. Let’s implement that in Bloom
    The spoken table stores all messages.
    When a listen message arrives, it can rendezvous with all prior spoken messages.
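The sender-persist pattern can be sketched in plain Ruby (illustrative only; the message and agent names are invented for the example):

```ruby
# Sketch of sender-persist rendezvous across ticks (plain Ruby; not Bud).
spoken = []   # a table: contents survive across ticks
heard  = []
ticks = [
  { speak: [["fire", "lit"]], listen: [] },    # tick 1: message, no listener yet
  { speak: [], listen: [["scout", "fire"]] }   # tick 2: the listener arrives late
]
ticks.each do |t|
  spoken += t[:speak]   # spoken <= speak: merge and persist
  t[:listen].each do |who, subj|
    spoken.each { |s, v| heard << [who, s, v] if s == subj }
  end
end
# heard contains the tick-1 message even though the listener arrived in tick 2
```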



  44. Receiver Persists
    (Watchtower)

    View full-size slide

  45. Let’s implement that in Bloom



  48. Let’s implement that in Bloom
    The listening table records all the agents who want notifications. 

    When a speak message arrives, it can rendezvous with all prior listeners.


  49. Both Persist
    (Signal Fire & Watchtower)


  50. Let’s implement that in Bloom



  52. Let’s implement that in Bloom
    Each rule for hear joins a channel (events) with a table (state).

    Computation is driven by channel arrival.
    Either channel can “arrive first” and hear will be populated.
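Order-independence is the payoff of persisting both sides. A plain-Ruby sketch (illustrative names, not Bud code) runs the same events in both arrival orders and gets the same answer:

```ruby
# Sketch: with both sides persisted, arrival order is irrelevant (plain Ruby).
def rendezvous(events)
  spoken, listening, heard = [], [], []
  events.each do |kind, *tuple|
    spoken    << tuple if kind == :speak
    listening << tuple if kind == :listen
  end
  spoken.each do |subj, val|
    listening.each { |who, want| heard << [who, subj, val] if want == subj }
  end
  heard.sort
end

a = rendezvous([[:speak, "w", "sunny"], [:listen, "alice", "w"]])
b = rendezvous([[:listen, "alice", "w"], [:speak, "w", "sunny"]])
# a == b: the same deterministic outcome either way
```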


  53. ➔ Up to now, rendezvous in time
    ➔ What about space?
    ➔ listener, sender on the same node?!
    Distributing this


  54. ➔ Up to now, rendezvous in time
    ➔ What about space?
    ➔ listener, sender on the same node?!
    ➔ What if listener and sender on their own nodes?
    ➔ need a “Join Server”
    ➔ and proxy logic to reroute interfaces
    ➔ Good news: once you solve time, space is easy!
    Distributing this


  55. Asynchronous merge (<~) into channels.



  56. Asynchronous merge (<~) into channels.

    Like “shuffle” or “exchange”: routed based on values in a demarcated field.


  57. JoinServer simply wires up an imported Rendezvous


  58. Distributed JoinServer
    ➔ Can hash-partition JoinServer state on subject for scaling
    ➔ do this in proxy code; clients unchanged
    ➔ Can replicate JoinServer state for fault-tolerance
    ➔ Many possible lattice-based consistency models
    ➔ Indy KVS project
    spoken_map: MapLattice
      key: any_type
      val: VersionLattice
        val: any_type
        vc: MapLattice
          node: any_type
          val: MaxLattice


  59. Reflecting Back: Disorderly, Data-Centric Programming
    ➔ We now have a distributed rendezvous protocol
    ➔ What is all this?
    Speaker Persists: a log of messages, i.e. a Key-Value Store (Database)
    Listener Persists: a registry of listeners, i.e. Publish/Subscribe


  60. Reflecting Back: Disorderly, Data-Centric Programming
    Speaker Persists: a log of messages, i.e. a Key-Value Store (Database)
    Listener Persists: a registry of listeners, i.e. Publish/Subscribe
    Duality of storage and communication!
    Rendezvous over time.
    Choice of “system type” becomes a minor code change.
    Hybrids naturally emerge.
    Reduced a hard programming problem to a well-understood database problem!


  61. And it gets better!
    Speaker Persists: a log of messages, i.e. a Key-Value Store (Database)
    Listener Persists: a registry of listeners, i.e. Publish/Subscribe
    Post-Hoc distribution, even of server logic
    Any table/channel/interface can be treated like a DB table:
    scale-out: shard
    fault tolerance: replicate
    It’s easy to distribute centralized code, post-hoc!


  62. Reflecting Back: Disorderly, Data-Centric Programming
    Speaker Persists: a log of messages, i.e. a Key-Value Store (Database)
    Listener Persists: a registry of listeners, i.e. Publish/Subscribe
    Post-Hoc distribution
    Any table/channel/interface can be treated like a DB table:
    scale-out: shard
    fault tolerance: replicate
    It’s easy to distribute centralized code, post-hoc!
    What about the hard parts of distributed databases/systems? Consistency.


  63. ➔ Easy! We can statically check Bloom code.
    ➔ budplot
    What of Consistency?


  64. ➔ Easy! We can statically check Bloom code.
    ➔ budplot looks for order-sensitive dataflows
    ➔ async communication causes disorder
    (yellow)
    What of Consistency?


  65. ➔ Easy! We can statically check Bloom code.
    ➔ budplot looks for order-sensitive dataflows
    ➔ async communication causes disorder
    (yellow)
    ➔ order-sensitive op downstream of
    disorder? non-deterministic! (red)
    What of Consistency?


  66. ➔ Easy! We can statically check Bloom code.
    ➔ budplot looks for order-sensitive dataflows
    ➔ async communication causes disorder
    (yellow)
    ➔ order-sensitive op downstream of
    disorder? non-deterministic! (red)
    ➔ Q: What operations are order-sensitive?

    What of Consistency?


  67. ➔ Easy! We can statically check Bloom code.
    ➔ budplot looks for order-sensitive dataflows
    ➔ async communication causes disorder
    (yellow)
    ➔ order-sensitive op downstream of
    disorder? non-deterministic! (red)
    ➔ Q: What operations are order-sensitive?

    A: The non-monotone ones
    What of Consistency?


  68. ➔ Easy! We can statically check Bloom code.
    ➔ budplot looks for order-sensitive dataflows
    ➔ async communication causes disorder
    (yellow)
    ➔ order-sensitive op downstream of
    disorder? non-deterministic! (red)
    ➔ Q: What operations are order-sensitive?

    A: The non-monotone ones
    ➔ monotone: output grows with input
    ➔ non-monotone: must base (partial)
    results on their full input (prefix)
    What of Consistency?
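The monotone/non-monotone distinction can be made concrete in plain Ruby (illustrative, not Bud code):

```ruby
require 'set'

# Monotone vs. non-monotone, in plain Ruby (illustrative).
# Monotone: output only grows as input grows; early answers are never retracted.
out   = Set.new
input = Set.new
[[1], [2], [3]].each do |batch|
  input |= batch
  out   |= input.select(&:odd?)   # selection is monotone
end
# out == Set[1, 3], however the batches were ordered or interleaved

# Non-monotone: an answer on a prefix can be refuted by later input.
absent_on_prefix = !Set[1].include?(2)      # true, given only a prefix
absent_on_full   = !Set[1, 2].include?(2)   # false, on the full input
```

The second half is why negation, aggregation, and overwriting need the full input (or coordination) before their answers are safe to emit.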



  70. Statements
    <~ bloom
    <= now
    <+ next
    <~ async
    <- del_next
    persistent: table; lmax, lbool, lmap, …
    transient: scratch, interface
    networked transient: channel
    scheduled transient: periodic
    relational operations: map, flat_map; reduce, group, argmin/max; (r * s).pairs
    lattice functions: empty?, include?; count, max, min, …; >, <, >=, <=


  71. ➔ Easy! We can statically check Bloom code.
    ➔ Yes, but our SpeakerPersist code is odd:
    ➔ All messages in an unordered set
    ➔ Never deletes or overwrites
    What of Consistency?


  72. A More Typical Speaker Persist


  73. A More Typical Speaker Persist
    The spoken table is now mutable: one value per subject.


  74. A More Typical Speaker Persist
    The spoken table is now mutable: one value per subject.

    Arrival order of network messages should require us to think about consistency.
    What does static analysis say?



  76. Consistency?


  77. Now what? Two options:
    1. avoid non-monotonicity
    2. impose global ordering
    Consistency?


  78. Now what? Two options:
    1. avoid non-monotonicity
    2. impose global ordering
    Both natural in Bloom.
    Consistency?


  79. Now what? Two options:
    1. avoid non-monotonicity
    2. impose global ordering
    Both natural in Bloom.
    (But one is better, as we’ll discuss :-)
    Consistency?


  80. Now what? Two options:
    1. avoid non-monotonicity, using vector clocks
    2. impose global ordering
    Both natural in Bloom.
    (But one is better, as we’ll discuss :-)
    Consistency?


  81. Monotonic Structures: Lattices
    (Join Semi-) Lattice: An object class with
    - a merge operator (<=) that is Associative, Commutative and Idempotent
    - a least element
    See “ACID 2.0” [Campbell/Helland CIDR ’09], CRDTs [Shapiro, et al. INRIA TR 2011]
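The ACI merge can be sketched in plain Ruby (a hypothetical class standing in for Bud's lmax lattice, not Bud's actual implementation):

```ruby
# Minimal join-semilattice sketch in plain Ruby (hypothetical class,
# standing in for Bud's lmax). The merge is Associative, Commutative,
# and Idempotent, so replicas can apply merges in any order.
class MaxLattice
  attr_reader :v
  def initialize(v); @v = v; end
  def merge(other)                 # the lattice join: least upper bound
    MaxLattice.new([v, other.v].max)
  end
end

a, b, c = MaxLattice.new(1), MaxLattice.new(5), MaxLattice.new(3)
assoc = a.merge(b).merge(c).v == a.merge(b.merge(c)).v   # associative
comm  = a.merge(b).v == b.merge(a).v                     # commutative
idem  = a.merge(a).v == a.v                              # idempotent
```

ACI is exactly what makes retry and replication safe: re-delivering or reordering merges cannot change the final value.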



  82. Vector Clocks
    my_vc: MapLattice
      key: any_type
      val: MaxLattice
    Example: merging (<=) Joe’s clock with Peter’s takes the per-key max of their entries.


  83. Vector Clocks in Bloom Lattices
    my_vc: MapLattice
      key: any_type
      val: MaxLattice
    Bloom lets us compose these lattices just as we compose relational tables/expressions,
    using merge rules, morphisms, and monotone functions.


  84. Vector Clocks in Bloom Lattices
    my_vc: MapLattice
      key: any_type
      val: MaxLattice
    state do
      lmap :my_vc
    end
    bootstrap do
      my_vc <= {ip_port => Bud::MaxLattice.new(0)}
    end
    Bloom lets us compose these lattices just as we compose relational tables/expressions,
    using merge rules, morphisms, and monotone functions.


  85. Vector Clocks: bloom v. wikipedia
    • Initially all clocks are zero.
    • Each time a process experiences an internal event, it increments its own
      logical clock in the vector by one.
    • Each time a process prepares to send a message, it increments its own
      logical clock in the vector by one, and then sends its entire vector along
      with the message being sent.
    • Each time a process receives a message, it increments its own logical clock
      in the vector by one, and updates each element in its vector by taking the
      maximum of the value in its own vector clock and the value in the vector in
      the received message (for every element).
    bootstrap do
      my_vc <= {ip_port => Bud::MaxLattice.new(0)}
    end
    bloom do
      next_vc <= out_msg { {ip_port => my_vc.at(ip_port) + 1} }
      out_msg_vc <= out_msg {|m| [m.addr, m.payload, next_vc]}
      next_vc <= in_msg { {ip_port => my_vc.at(ip_port) + 1} }
      next_vc <= my_vc
      next_vc <= in_msg {|m| m.clock}
      my_vc <+ next_vc
    end
    [“Logic and Lattices”, Conway, et al. SOCC 2012]
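The core of the update, the vector-clock merge, is just a map lattice whose values merge by max. A plain-Ruby sketch (illustrative names, not Bud code):

```ruby
# Plain-Ruby sketch of the vector-clock merge: a map lattice whose values
# merge by max, matching the MapLattice-of-MaxLattice composition above.
def vc_merge(a, b)
  (a.keys | b.keys).to_h { |k| [k, [a.fetch(k, 0), b.fetch(k, 0)].max] }
end

mine   = { "joe" => 3, "phokion" => 1 }
theirs = { "peter" => 2, "phokion" => 4 }
merged = vc_merge(mine, theirs)
# merged == { "joe" => 3, "phokion" => 4, "peter" => 2 }
```

Since per-key max is ACI, vector clocks can be gossiped and merged in any order without coordination.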


  86. Now what? Two options:
    1. avoid non-monotonicity
    2. impose global ordering, using Paxos
    Both natural in Bloom.
    (But one is better, as we’ll discuss :-)
    Consistency?


  87. Paxos: pseudocode v. bloom
    1. Priest p chooses a new ballot number b greater than lastTried [p], sets
    lastTried [p] to b, and sends a NextBallot (b) message to some set of priests.

    2. Upon receipt of a NextBallot (b) message from p with b > nextBal [q], priest q
    sets nextBal [q] to b and sends a LastVote (b, v) message to p, where v equals
    prevVote [q]. (A NextBallot (b) message is ignored if b < nextBal [q].)

    3. After receiving a LastVote (b, v) message from every priest in some majority
    set Q, where b = lastTried [p], priest p initiates a new ballot with number b,
    quorum Q, and decree d, where d is chosen to satisfy B3. He then sends a
    BeginBallot (b, d) message to every priest in Q.

    4. Upon receipt of a BeginBallot (b,d) message with b = nextBal [q], priest q casts
    his vote in ballot number b, sets prevVote [q] to this vote, and sends a Voted
    (b, q) message to p. (A BeginBallot (b, d) message is ignored if b ≠ nextBal [q].)

    5. If p has received a Voted (b, q) message from every priest q in Q (the quorum
    for ballot number b), where b = lastTried [p], then he writes d (the decree of
    that ballot) in his ledger and sends a Success (d) message to every priest.
    lastTried <= (lastTried*nextBallot).pairs(priest=>priest) { |l,n| [l.priest, n.bnum] if n.bnum >= l.old }
    nextBallot <= (decreeRequest*lastTried*priestCnt).combos
    (decreeRequest.priest => lastTried.priest, lastTried.priest => priestCnt.priest) { |d,l,p|
    [d.priest, l.old+p.cnt, d.decree] }
    sendNextBallot <~ (nextBallot*parliament).pairs(n.priest=>p.priest) { |n,p| 

    [p.peer, n.ballot, n.decree, n.priest] }
    nextBal <= (nextBal*lastVote).pairs(priest=>priest) { |n,l| [n.priest, l.ballot] if l.ballot >= n.old }
    lastVote <= (sendNextBallot * prevVote).pairs(priest=>priest) { |s,p|
    [s.priest, s.ballot, p.oldBallot, p.oldDecree, s.peer] if s.ballot >= p.oldBallot }
    sendLastVote <~ lastVote { |l| [priest, ballot, oldBallot, decree, lord] }
    priestCnt <= parliament.group([lord], count)
    lastVoteCnt <= sendLastVote.group([lord, ballot], count(Priest))
    maxPrevBallot <= sendLastVote.group([lord], max(oldBallot))
    quorum(Lord,Ballot) <= (priestCnt*lastVoteCnt).pairs(lord=>lord) { |p,l|
    [p.lord, p.pcnt] if l.vcnt > (p.pcnt / 2) }
    beginBallot <= (quorum*maxPrevBallot*nextBallot*sendLastVote).combos
    (quorum.ballot => nextBallot.ballot, nextBallot.ballot => sendLastVote.ballot
    quorum.lord => maxPrevBallot.lord, maxPrevBallot.lord => nextBallot.lord,
    nextBallot.lord => sendLastVote.lord,
    maxPrevBallot.maxB => sendLastVote.maxB) {|q,m,n,s|
    m.maxB == -1 ? [q.lord, q.ballot, n.decree] : [q.lord, q.ballot, s.decree] }
    sendBeginBallot <~ (beginBallot*parliament).pairs(lord=>lord) {|b,p| [p.peer, b.ballot, b.decree, b.lord] }
    vote <= (sendBeginBallot*nextBal).pairs(priest=>priest, ballot=>oldB) {|s,n| [s.priest, s.ballot, s.decree] }
    prevVote <= (prevVote*lastVote*vote).combos
    (prevVote.priest => lastVote.priest, lastVote.priest => vote.priest,
    lastVote.ballot => vote.ballot, l.decree => v.decree) { |p,l,v|
    [p.priest, l.ballot, l.decree] if l.ballot >= p.old }
    sendVote <~ (vote*sendBeginBallot).pairs(priest=>priest,ballot=>ballot, decree=>decree) {|v,s|
    [s.lord, v.ballot, v.decree, v.priest] }
    voteCnt <= sendVote.group([lord, ballot], count(priest))
    decree <= (lastTried*voteCnt*lastVoteCnt*beginBallot).combos
    (lastTried.lord=>voteCnt.lord, voteCnt.lord=>lastVoteCnt.lord,
    lastVoteCnt.lord=>beginBallot.lord, lastTried.ballot=>voteCnt.ballot,
    voteCnt.ballot=>lastVoteCnt.ballot, voteCnt.ballot=>beginBallot.ballot,
    voteCnt.votes => lastVoteCnt.votes) {|lt, v, lv, b|
    [lt.lord, lt.ballot, b.decree]}
    [“BOOM Analytics”, Alvaro, et al. Eurosys 2010]


  88. ➔ scale up easily
    ➔ ignore race conditions
    ➔ tolerate faults reliably
    ➔ debug naturally
    ➔ test intelligently
    How did we do?


  89. ➔ How to test end-to-end Fault Tolerance?
    ➔ Lineage-Driven Fault Injection (LDFI)
    ➔ Molly: an LDFI system 

    [Alvaro, et al. SIGMOD 2015]
    ➔ Deployment at Netflix

    [Alvaro, et al. SOCC 2016]
    Testing Distributed Systems for Fault Tolerance


  90. ➔ scale up easily
    ➔ ignore race conditions
    ➔ tolerate faults reliably
    ➔ debug naturally
    ➔ test intelligently
    How did we do?


  91. ➔ Proof Point at Scale: BOOM Analytics [Alvaro, et al. Eurosys 2010]
    ➔ HDFS and Hadoop scale-out
    ➔ Industry Adoption of LDFI @ Netflix [Alvaro et al. 2016]
    ➔ Full-featured Ruby Interpreter for Bloom: 

    (https://github.com/bloom-lang/bud)
    ➔ Current work:
    ➔ Fluent: C++ compilation-based Bloom 

    (https://github.com/ucbrise/fluent)
    ➔ Indy: A dense, elastic Key-Value Store in Fluent
    How Real is All This?


  92. Indy: A Key-Value Store [C. Wu, et al. 2017]
    ➔ Multicore-performant
    ➔ Smooth elastic scaling across nodes


  93. Indy: A Key-Value Store [C. Wu, et al. 2017]
    ➔ Smooth scaling across datacenters
    ➔ Implements all known coordination-avoiding consistency models from [Bailis, et al. VLDB ’14]


  94. Outline
    Software Mismatch: Order and State in the Cloud
    An Ideal: Disorderly Programming for Distributed Systems
    A Realization: <~ bloom
    Implications: CALM Theorem


  95. Consistency is Good! But at what cost?
    Two options:
    1. avoid non-monotonicity

    2. impose global ordering via coordination


  96. Two options:
    1. avoid non-monotonicity

    2. impose global ordering via coordination
    But coordination is expensive!
    Two-Phase Commit, Paxos, Virtual Synchrony, Raft, etc.
    Require nodes to send messages and wait for responses
    Consistency is Good! But at what cost?


  97. Distributed Systems Poetry
    “The first principle of successful scalability
    is to batter the consistency mechanisms down to a minimum
    move them off the critical path
    hide them in a rarely visited corner of the system
    and then make it as hard as possible
    for application developers
    to get permission to use them”
    —James Hamilton (IBM, MS, Amazon) 

    in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009


  98. Two options:
    1. avoid non-monotonicity

    2. impose global ordering via coordination
    Consistency is Good! But at what cost?


  99. ➔ What computations require coordination for consistency?
    Questions Deserving Answers



  101. ➔ What computations require coordination for consistency?
    ➔ What computations can avoid coordination consistently?
    Questions Deserving Answers



  104. Consistency As Logical Monotonicity
    {coordination-free consistent} <=> {monotonically expressible}

    ➔ Avoid coordination for monotonic programs!
    ➔ no waiting!
    ➔ Monotonic programs are CAP-busters
    ➔ Consistent and Available during Partitions
    The CALM Theorem: A Bright Line


  105. Intuition from Rendezvous
    ➔ Happy case: monotonicity. It “streams”!
    ➔ At any time, output ⊆ final result
    ➔ After all messages, output is maximal
    ➔ Implication: deterministic outcome!
    ➔ Problem: non-monotonicity. Can’t “stream”.
    ➔ Intermediate result ⊄ final result
    ➔ New input refutes previous output
    ➔ No output until you get the entire input.
    ➔ Ensuring the entire input? Coordination.
    (The streaming happy case only works for BothPersist!)


  106. CALM History
    ➔ CALM Conjecture
    [Hellerstein, PODS ’10, SIGMOD Record 2010]
    (Slide shows the first page of “The Declarative Imperative: Experiences and
    Conjectures in Distributed Logic”, Joseph M. Hellerstein, UC Berkeley;
    SIGMOD Record, March 2010, Vol. 39, No. 1.)


  107. [Slide shows page images of two papers: "The Declarative Imperative: Experiences and Conjectures in Distributed Logic," Joseph M. Hellerstein, SIGMOD Record 39(1), March 2010; and "Declarative Networking: Language, Execution and Optimization," Boon Thau Loo et al., SIGMOD 2006.]
    ➔ CALM Conjecture

    [Hellerstein, PODS ’10, SIGMOD Record 2010]
    ➔ Monotonicity => Consistency 

    [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
    CALM History


  108. [Slide shows page images of three papers: "The Declarative Imperative: Experiences and Conjectures in Distributed Logic" (Hellerstein, SIGMOD Record 2010); "Declarative Networking: Language, Execution and Optimization" (Loo et al., SIGMOD 2006); and "Relational Transducers for Declarative Networking" (Ameloot, Neven, and Van den Bussche, JACM 2013).]
    ➔ CALM Conjecture

    [Hellerstein, PODS ’10, SIGMOD Record 2010]
    ➔ Monotonicity => Consistency 

    [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
    ➔ Relational Transducer Proofs

    [Ameloot, et al. PODS 2012, JACM 2013]

    CALM History
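The "Monotonicity => Consistency" bullet can be sketched concretely. The following is an illustrative toy in Python, not the Bloom runtime: a monotone computation (transitive closure over edge facts) reaches the same fixpoint regardless of the order in which the facts are delivered, which is the intuition behind coordination-free eventual consistency. The function names here are invented for the example.

```python
# Hypothetical sketch of the CALM intuition: a monotone program's output
# is insensitive to message delivery order, so it needs no coordination.
from itertools import permutations

def reachable(edges):
    """Monotone: the set of derived facts only ever grows as edges arrive."""
    reach = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach

edges = [("a", "b"), ("b", "c"), ("c", "d")]

# Deliver the edge facts in every possible order; the fixpoint is identical.
results = {frozenset(reachable(list(p))) for p in permutations(edges)}
assert len(results) == 1  # order-insensitive, hence eventually consistent
```

By contrast, a non-monotone question asked mid-stream (e.g., "which nodes cannot reach d?") can retract earlier answers as more facts arrive; that is where coordination, i.e., waiting for all input, becomes necessary.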


109. [Embedded paper: Joseph M. Hellerstein, “The Declarative Imperative: Experiences and Conjectures in Distributed Logic,” SIGMOD Record, March 2010 (Vol. 39, No. 1)]
[Embedded paper: Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica, “Declarative Networking: Language, Execution and Optimization,” SIGMOD 2006]
[Embedded paper: Tom J. Ameloot, Frank Neven, and Jan Van den Bussche, “Relational Transducers for Declarative Networking,” Journal of the ACM, 2013]
    ➔ CALM Conjecture

    [Hellerstein, PODS ’10, SIGMOD Record 2010]
    ➔ Monotonicity => Consistency 

    [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
    ➔ Relational Transducer Proofs

    [Ameloot, et al. PODS 2012, JACM 2013]

    [Ameloot et al. PODS 2014]
    CALM History
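The “Monotonicity => Consistency” bullet is the operational heart of this history: a program that only accumulates facts (set union, monotone joins) reaches the same final answer under any message delivery order, with no coordination. A minimal sketch of that idea, with Python standing in for Bloom (the transitive-closure example and helper names here are illustrative, not from the talk):

```python
import itertools

def reachable(facts):
    """Monotone fixpoint: close a set of (src, dst) edges under composition."""
    reach = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return reach

def run(delivery_order):
    """Deliver edge 'messages' one at a time, re-running the monotone
    fixpoint after each arrival; return the final state."""
    state = set()
    for msg in delivery_order:
        state.add(msg)
        state = reachable(state)
    return state

edges = [("a", "b"), ("b", "c"), ("c", "d")]
# Every permutation of message arrivals converges to the same fixpoint:
results = {frozenset(run(order)) for order in itertools.permutations(edges)}
assert len(results) == 1
assert ("a", "d") in next(iter(results))
```

A non-monotone step (say, acting on a count of the edges seen “so far”) would not enjoy this order-independence; that is exactly where CALM says coordination is required.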
[Embedded paper: Tom J. Ameloot, Bas Ketsman, Frank Neven, and Daniel Zinn, “Weaker Forms of Monotonicity for Declarative Networking: a More Fine-grained Answer to the CALM-conjecture,” PODS 2014]

    View full-size slide

  110. The Declarative Imperative
    Experiences and Conjectures in Distributed Logic
    Joseph M. Hellerstein
    University of California, Berkeley
    [email protected]
    ABSTRACT
    The rise of multicore processors and cloud computing is putting
    enormous pressure on the software community to find solu-
    tions to the difficulty of parallel and distributed programming.
    At the same time, there is more—and more varied—interest in
    data-centric programming languages than at any time in com-
    puting history, in part because these languages parallelize nat-
    urally. This juxtaposition raises the possibility that the theory
    of declarative database query languages can provide a foun-
    dation for the next generation of parallel and distributed pro-
    gramming languages.
    In this paper I reflect on my group’s experience over seven
    years using Datalog extensions to build networking protocols
    and distributed systems. Based on that experience, I present
    a number of theoretical conjectures that may both interest the
    database community, and clarify important practical issues in
    distributed computing. Most importantly, I make a case for
    database researchers to take a leadership role in addressing the
    impending programming crisis.
    This is an extended version of an invited lecture at the ACM
    PODS 2010 conference [32].
    1. INTRODUCTION
    This year marks the forty-fifth anniversary of Gordon Moore’s
    paper laying down the Law: exponential growth in the density
    of transistors on a chip. Of course Moore’s Law has served
    more loosely to predict the doubling of computing efficiency
    every eighteen months. This year is a watershed: by the loose
    accounting, computers should be 1 Billion times faster than
    they were when Moore’s paper appeared in 1965.
    Technology forecasters appear cautiously optimistic that Moore’s
    Law will hold steady over the coming decade, in its strict in-
    terpretation. But they also predict a future in which continued
    exponentiation in hardware performance will only be avail-
    able via parallelism. Given the difficulty of parallel program-
    ming, this prediction has led to an unusually gloomy outlook
    for computing in the coming years.
    At the same time that these storm clouds have been brew-
    ing, there has been a budding resurgence of interest across
    the software disciplines in data-centric computation, includ-
    ing declarative programming and Datalog. There is more—
    and more varied—applied activity in these areas than at any
    point in memory.
    The juxtaposition of these trends presents stark alternatives.
    Will the forecasts of doom and gloom materialize in a storm
    that drowns out progress in computing? Or is this the long-
    delayed catharsis that will wash away today’s thicket of im-
    perative languages, preparing the ground for a more fertile
    declarative future? And what role might the database com-
    munity play in shaping this future, having sowed the seeds of
    Datalog over the last quarter century?
    Before addressing these issues directly, a few more words
    about both crisis and opportunity are in order.
    1.1 Urgency: Parallelism
    I would be panicked if I were in industry.
    — John Hennessy, President, Stanford University [35]
    The need for parallelism is visible at micro and macro scales.
    In microprocessor development, the connection between the
    “strict” and “loose” definitions of Moore’s Law has been sev-
    ered: while transistor density is continuing to grow exponen-
    tially, it is no longer improving processor speeds. Instead, chip
    manufacturers are packing increasing numbers of processor
    cores onto each chip, in reaction to challenges of power con-
    sumption and heat dissipation. Hence Moore’s Law no longer
    predicts the clock speed of a chip, but rather its offered degree
    of parallelism. And as a result, traditional sequential programs
    will get no faster over time. For the first time since Moore’s
    paper was published, the hardware community is at the mercy
    of software: only programmers can deliver the benefits of the
    Law to the people.
    At the same time, Cloud Computing promises to commodi-
    tize access to large compute clusters: it is now within the bud-
    get of individual developers to rent massive resources in the
    worlds’ largest computing centers. But again, this computing
    potential will go untapped unless those developers can write
    programs that harness parallelism, while managing the hetero-
    geneity and component failures endemic to very large clusters
    of distributed computers.
    Unfortunately, parallel and distributed programming today
    is challenging even for the best programmers, and unwork-
    able for the majority. In his Turing lecture, Jim Gray pointed
    to discouraging trends in the cost of software development,
    and presented Automatic Programming as the twelfth of his
    dozen grand challenges for computing [26]: develop methods
    to build software with orders of magnitude less code and ef-
    fort. As presented in the Turing lecture, Gray’s challenge con-
    cerned sequential programming. The urgency and difficulty of
    his twelfth challenge has grown markedly with the technology
    SIGMOD Record, March 2010 (Vol. 39, No. 1) 5
    Declarative Networking:
    Language, Execution and Optimization
    Boon Thau Loo∗ Tyson Condie∗ Minos Garofalakis† David E. Gay† Joseph M. Hellerstein∗
    Petros Maniatis† Raghu Ramakrishnan‡ Timothy Roscoe† Ion Stoica∗
    ∗UC Berkeley, †Intel Research Berkeley and ‡University of Wisconsin-Madison
    ABSTRACT
    The networking and distributed systems communities have recently
    explored a variety of new network architectures, both for application-
    level overlay networks, and as prototypes for a next-generation In-
    ternet architecture. In this context, we have investigated declara-
    tive networking: the use of a distributed recursive query engine as
    a powerful vehicle for accelerating innovation in network architec-
    tures [23, 24, 33]. Declarative networking represents a significant
    new application area for database research on recursive query pro-
    cessing. In this paper, we address fundamental database issues in
    this domain. First, we motivate and formally define the Network
    Datalog (NDlog) language for declarative network specifications.
    Second, we introduce and prove correct relaxed versions of the tra-
    ditional semi-na¨
    ıve query evaluation technique, to overcome fun-
    damental problems of the traditional technique in an asynchronous
    distributed setting. Third, we consider the dynamics of network
    state, and formalize the “eventual consistency” of our programs even
    when bursts of updates can arrive in the midst of query execution.
    Fourth, we present a number of query optimization opportunities
    that arise in the declarative networking context, including applica-
    tions of traditional techniques as well as new optimizations. Last,
    we present evaluation results of the above ideas implemented in our
    P2 declarative networking system, running on 100 machines over
    the Emulab network testbed.
    1. INTRODUCTION
    The database literature has a rich tradition of research on recursive
    query languages and processing. This work has influenced com-
    mercial database systems to a certain extent. However, recursion
    is still considered an esoteric feature by most practitioners, and re-
    search in the area has had limited practical impact. Even within
    the database research community, there is longstanding controversy
    over the practical relevance of recursive queries, going back at least
    to the Laguna Beach Report [7], and continuing into relatively re-
    cent textbooks [35].
    In more recent work, we have made the case that recursive query
    technology has a natural application in the design of Internet infras-
    tructure. We presented an approach called declarative networking
    ∗UC Berkeley authors funded by NSF grants 0205647, 0209108, and 0225660, and a
    gift from Microsoft.
    Permission to make digital or hard copies of all or part of this work for
    personal or classroom use is granted without fee provided that copies are
    not made or distributed for profit or commercial advantage and that copies
    bear this notice and the full citation on the first page. To copy otherwise, to
    republish, to post on servers or to redistribute to lists, requires prior specific
    permission and/or a fee.
    SIGMOD 2006, June 27–29, 2006, Chicago, Illinois, USA.
    Copyright 2006 ACM 1-59593-256-9/06/0006 ...$5.00.
    that enables declarative specification and deployment of distributed
    protocols and algorithms via distributed recursive queries over net-
    work graphs [23, 24, 33]. We recently described how we imple-
    mented and deployed this concept in a system called P2 [23, 33].
    Our high-level goal is to provide a software environment that can
    accelerate the process of specifying, implementing, experimenting
    with and evolving designs for network architectures.
    Declarative networking is part of a larger effort to revisit the cur-
    rent Internet Architecture, which is considered by many researchers
    to be fundamentally ill-suited to handle today’s network uses and
    abuses [13]. While radical new architectures are being proposed
    for a “clean slate” design, there are also many efforts to develop
    application-level “overlay” networks on top of the current Internet,
    to prototype and roll out new network services in an evolutionary
    fashion [26]. Whether one is a proponent of revolution or evolution
    in this context, there is agreement that we are entering a period of
    significant flux in network services, protocols and architectures.
    In such an environment, innovation can be better focused and ac-
    celerated by having the right software tools at hand. Declarative
    query approaches appear to be one of the most promising avenues
    for dealing with the complexity of prototyping, deploying and evolv-
    ing new network architectures. The forwarding tables in network
    routing nodes can be regarded as a view over changing ground state
    (network links, nodes, load, operator policies, etc.), and this view is
    kept correct by the maintenance of distributed queries over this state.
    These queries are necessarily recursive, maintaining facts about ar-
    bitrarily long multi-hop paths over a network of single-hop links.
Our initial forays into declarative networking have been promising. First, in declarative routing [24], we demonstrated that recursive queries can be used to express a variety of well-known wired and wireless routing protocols in a compact and clean fashion, typically in a handful of lines of program code. We also showed that the declarative approach can expose fundamental connections: for example, the query specifications for two well-known protocols – one for wired networks and one for wireless – differ only in the order of two predicates in a single rule body. Moreover, higher-level routing concepts (e.g., QoS constraints) can be achieved via simple modifications to the queries. Second, in declarative overlays [23], we extended our framework to support more complex application-level overlay networks such as multicast overlays and distributed hash tables (DHTs). We demonstrated a working implementation of the Chord [34] overlay lookup network specified in 47 Datalog-like rules, versus thousands of lines of C++ for the original version. Our declarative approach to networking promises not only flexibility and compactness of specification, but also the potential to statically check network protocols for security and correctness properties [11]. In addition, dynamic runtime checks to test distributed properties of the network can easily be expressed as declarative queries, providing a uniform framework for network specification, monitoring and debugging [33].
Relational Transducers for Declarative Networking
TOM J. AMELOOT, Hasselt University & Transnational University of Limburg
FRANK NEVEN, Hasselt University & Transnational University of Limburg
JAN VAN DEN BUSSCHE, Hasselt University & Transnational University of Limburg
Motivated by a recent conjecture concerning the expressiveness of declarative networking, we propose a formal computation model for "eventually consistent" distributed querying, based on relational transducers. A tight link has been conjectured between coordination-freeness of computations, and monotonicity of the queries expressed by such computations. Indeed, we propose a formal definition of coordination-freeness and confirm that the class of monotone queries is captured by coordination-free transducer networks. Coordination-freeness is a semantic property, but the syntactic class of "oblivious" transducers we define also captures the same class of monotone queries. Transducer networks that are not coordination-free are much more powerful.
Categories and Subject Descriptors: H.2 [Database Management]: Languages; H.2 [Database Management]: Systems—Distributed databases; F.1 [Computation by Abstract Devices]: Models of Computation
General Terms: languages, theory
Additional Key Words and Phrases: distributed database, relational transducer, monotonicity, expressive power, cloud programming
ACM Reference Format: Ameloot, T. J., Neven, F. and Van den Bussche, J. 2011. Relational Transducers for Declarative Networking. J. ACM.
1. INTRODUCTION
Declarative networking [Loo et al. 2009] is a recent approach by which distributed computations and networking protocols are modeled and programmed using formalisms based on Datalog. In his keynote speech at PODS 2010 [Hellerstein 2010a; Hellerstein 2010b], Hellerstein made a number of intriguing conjectures concerning the expressiveness of declarative networking. In the present paper, we are focusing on the CALM conjecture (Consistency And Logical Monotonicity). This conjecture suggests a strong link between, on the one hand, "eventually consistent" and "coordination-free" distributed computations, and on the other hand, expressibility in monotonic Datalog (without negation or aggregate functions). The conjecture was not fully formalized, however; indeed, as Hellerstein notes himself, a proper treatment of this conjecture requires crisp definitions of eventual consistency and coordination, which have been lacking so far. Moreover, it also requires a formal model of distributed computation.
    ➔ CALM Conjecture

    [Hellerstein, PODS ’10, SIGMOD Record 2010]
    ➔ Monotonicity => Consistency 

    [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
    ➔ Relational Transducer Proofs

    [Ameloot, et al. PODS 2012, JACM 2013]

    [Ameloot et al. PODS 2014]
    ➔ Napkin-sized proof

    [Hellerstein & Alvaro 2017?]
    CALM History
Weaker Forms of Monotonicity for Declarative Networking: a More Fine-grained Answer to the CALM-conjecture
Tom J. Ameloot, Hasselt University & transnational University of Limburg
Bas Ketsman, Hasselt University & transnational University of Limburg
Frank Neven, Hasselt University & transnational University of Limburg
Daniel Zinn, LogicBlox, Inc
ABSTRACT
The CALM-conjecture, first stated by Hellerstein [23] and proved in its revised form by Ameloot et al. [13] within the framework of relational transducer networks, asserts that a query has a coordination-free execution strategy if and only if the query is monotone. Zinn et al. [32] extended the framework of relational transducer networks to allow for specific data distribution strategies and showed that the non-monotone win-move query is coordination-free for domain-guided data distributions. In this paper, we complete the story by equating increasingly larger classes of coordination-free computations with increasingly weaker forms of monotonicity and make Datalog variants explicit that capture each of these classes. One such fragment is based on stratified Datalog where rules are required to be connected with the exception of the last stratum. In addition, we characterize coordination-freeness as those computations that do not require knowledge about all other nodes in the network, and therefore, can not globally coordinate. The results in this paper can be interpreted as a more fine-grained answer to the CALM-conjecture.
Categories and Subject Descriptors: H.2 [Database Management]: Languages; H.2 [Database Management]: Systems—Distributed databases; F.1 [Computation by Abstract Devices]: Models of Computation
Keywords: Distributed database, relational transducer, consistency, coordination, expressive power, cloud programming

PODS'14, June 22–27, 2014, Snowbird, UT, USA.
1. INTRODUCTION
Declarative networking is an approach where distributed computations are modeled and programmed using declarative formalisms based on extensions of Datalog. On a logical level, programs (queries) are specified over a global schema and are computed by multiple computing nodes over which the input database is distributed. These nodes can perform local computations and communicate asynchronously with each other via messages. The model operates under the assumption that messages can never be lost but can be arbitrarily delayed. An inherent source of inefficiency in such systems are the global barriers raised by the need for synchronization in computing the result of queries.
This source of inefficiency inspired Hellerstein [11] to formulate the CALM-principle which suggests a link between logical monotonicity on the one hand and distributed consistency without the need for coordination on the other hand.¹ A crucial property of monotone programs is that derived facts must never be retracted when new data arrives. The latter implies a simple coordination-free execution strategy: every node sends all relevant data to every other node in the network and outputs new facts from the moment they can be derived. No coordination is needed and the output of all computing nodes is consistent. This observation motivated Hellerstein [23] to formulate the CALM-conjecture which, in its revised form², states

"A query has a coordination-free execution strategy iff the query is monotone."
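The coordination-free execution strategy sketched in this introduction can be illustrated with a toy simulation (Python for illustration only; the paper's formal setting is relational transducer networks). One observer node merges whatever facts arrive, in whatever order, and applies a query after every arrival. A monotone query converges to the same answer under every delivery order and never retracts an earlier output; a non-monotone aggregate emits provisional answers that later change, so a node cannot safely output early without coordinating:

```python
import itertools

def run(nodes_facts, delivery_order, query):
    """Deliver each node's fact set to an observer in the given order,
    applying the query after every arrival. Returns the sequence of
    outputs the observer would emit over time."""
    seen, outputs = set(), []
    for i in delivery_order:
        seen |= nodes_facts[i]
        outputs.append(query(seen))
    return outputs

facts = [{1, 2}, {3}, {4, 5}]

monotone = lambda s: {x for x in s if x % 2 == 1}  # a selection: monotone
non_monotone = lambda s: max(s)                    # an aggregate: not monotone

# Every delivery order yields the same final answer for the monotone query.
finals = {frozenset(run(facts, order, monotone)[-1])
          for order in itertools.permutations(range(3))}
assert len(finals) == 1  # consistent without coordination

# The aggregate's provisional answers change as facts arrive.
print(run(facts, (0, 1, 2), non_monotone))  # → [2, 3, 5]
```

The monotone query's outputs also grow over time without retraction, which is exactly why each node may stream results "from the moment they can be derived."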
Ameloot, Neven, and Van den Bussche [13] formalized the conjecture in terms of relational transducer networks and provided a proof. Zinn, Green, and Ludäscher [32] subsequently showed that there is more to this story. In particular, they obtained that when computing nodes are increasingly more knowledgeable on how facts are distributed, increasingly more queries can be computed in a coordination-free manner. Zinn et al. [32] considered two extensions of the original transducer model introduced in [13]. In the first extension, here referred to as the policy-aware model, every computing node is aware of the facts that should be assigned to it and can consequently evaluate negation over schema relations. In the second extension, referred to as the
¹CALM stands for Consistency And Logical Monotonicity.
²The original conjecture replaced monotone by Datalog [13].
    View full-size slide

  111. ➔ Immerman-Vardi Theorem
    ➔ Same monotonicity as CALM?!
    ➔ Consistency <=> Monotonicity <=> PTIME!
➔ Can avoid coordination for all polynomial-time computations?!
    An Intriguing Connection

    View full-size slide


  114. 1. Fluent: disorderly programming toolkit
    ➔C++ Libraries
    ➔Lattices, Relational Algebra
    ➔Rule Registry/Execution
    ➔Static C++ typechecking via
    template metaprogramming
    ➔ Fluent Debugger
    ➔Distributed data lineage
    ➔Distributed tracing
    2. Familiar programming models
    ➔Can we skin Fluent with: RPC, State
    Machines, Actors, Futures, etc?
    Current and Future Work
    3. Dense Clouds
    ➔ High-performance, coordination-
    free code at multiple scales
    ➔Cores to servers to the globe
    4. Fundamentals
    ➔ Constructions for coordination-
    free polynomial-time programs?
    ➔General?
    ➔Code synthesis?
    ➔ Abiding theoretical questions
    ➔Stochastic CALM?
    5. Applications
    ➔ RL and Robotics?
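The "Lattices" in Fluent's toolkit are join semilattices: state whose merge operation is associative, commutative, and idempotent, so replicas converge no matter how updates are reordered or duplicated in transit. Fluent itself is C++; this Python sketch (illustrative only, with hypothetical names) shows the idea with a grow-only set:

```python
import functools

class SetLattice:
    """Grow-only set as a join semilattice. Merge is set union, which
    is associative, commutative, and idempotent, so any delivery order
    (or redelivery) of updates yields the same final state."""
    def __init__(self, elems=()):
        self.elems = frozenset(elems)

    def merge(self, other):
        return SetLattice(self.elems | other.elems)

updates = [SetLattice({"a"}), SetLattice({"b"}), SetLattice({"a", "c"})]

# Two replicas receive the same updates in opposite orders...
r1 = functools.reduce(SetLattice.merge, updates, SetLattice())
r2 = functools.reduce(SetLattice.merge, reversed(updates), SetLattice())

# ...and still agree, with no coordination.
assert r1.elems == r2.elems == {"a", "b", "c"}
```

Other useful lattices follow the same pattern with a different merge, e.g. max for counters or pairwise merge for maps of lattices.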

    View full-size slide

  115. CALM
    ➔ Seek monotonicity, avoid coordination.
    ➔ Move up the stack!
    ➔ Historic focus on Read/Write API distracts from what’s possible in application logic
    Bloom
    ➔ Disorderly programming can radically simplify distributed programming
    ➔ Data-centric: be it “declarative”, “reactive”, “dataflow”, etc.
    ➔ Revolution (Bloom vs. Java?) or Evolution (Bloom vs. LLVM IR?)
    Ambitious systems work in an era of maturity?
    ➔ If it’s doable, somebody is already doing it
    ➔ Green field problems are the ones with high switching costs
    ➔ “That will never work!”
    ➔ Be patient, seek lessons along the way
    Takeaways

    View full-size slide