
CALM and Disorderly Programming in Bloom


Distinguished Lecture, UC Santa Cruz Computer Science

Joe Hellerstein

June 05, 2017

Transcript

  1. CALM and Disorderly Programming in Bloom
      Joe Hellerstein, UC Berkeley
      Joint work with Peter Alvaro, Neil Conway, David Maier, and William Marczak
  2. Programming Distributed Systems: It’s Time to Talk
      Distributed programming is ➔ UBIQUITOUS ➔ HARD
      An academic imperative! Minimal activity in industry.
  3. Programming Distributed Systems: It’s Time to Talk
      Distributed programming is ➔ UBIQUITOUS ➔ HARD
      An academic imperative! Minimal activity in industry.
      Today: one academic group’s take. Lessons in theory and practice. Initial impact in industry.
  4. Outline
      Software Mismatch: Order and State in the Cloud
      An Ideal: Disorderly Programming for Distributed Systems
      A Realization: <~ bloom
      Implications: CALM Theorem
  5. Outline
      Software Mismatch: Order and State in the Cloud
      An Ideal: Disorderly Programming for Distributed Systems
      A Realization: <~ bloom
      Implications: CALM Theorem
  6. Von Neumann “Physics”
      ORDER ➔ a list of instructions ➔ an array of memory
      THE STATE ➔ mutation in time
  8. Cloud “Physics”
      DISORDERED TIME ➔ multiple clocks ➔ parallel computation ➔ unordered and interleaved
      SHATTERED STATE ➔ local variables ➔ sharded tables ➔ message passing
  9. What if… our programming model fit our physical reality? Perhaps we could…
      ➔ scale up easily ➔ ignore race conditions ➔ tolerate faults reliably ➔ debug naturally ➔ test intelligently
      …and better understand our fundamentals.
  10. Outline
      Software Mismatch: Order and State in the Cloud
      An Ideal: Disorderly Programming for Distributed Systems
      A Realization: <~ bloom
      Implications: CALM Theorem
  11. Background: Kitchen Wisdom
      Let’s write code that commutes! ➔ atomic updates can be disaggregated ➔ replicas can update in different orders ➔ add idempotence: get retry fault tolerance
      OK. How? ➔ Appealing, but never actionable.
  12. Disorderly-by-Default Programming
      STATE ✔ mergeable types: e.g. sets, relations ❌ mutable variables, ordered structures (lists, dense arrays)
      TIME ✔ logical clocks with application semantics ❌ instruction ordering
      DISTRIBUTION (STATE + TIME) ✔ unification of storage and communication ❌ messaging libraries
      (A small Ruby illustration follows.)
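To make the contrast concrete, here is a tiny plain-Ruby illustration (the values are made up): merging unordered, mergeable collections commutes, while ordered structures are sensitive to the order in which updates are applied.

    require 'set'

    a = Set[1, 2]
    b = Set[3]
    (a | b) == (b | a)                 # => true: set union commutes; merge order is irrelevant
    ([1, 2] + [3]) == ([3] + [1, 2])   # => false: list append is order-sensitive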
  13. Let Me Show You What I Mean
      ➔ Textbook example: 2-party communication
      ➔ The Bloom language: model and syntax
      ➔ More sophisticated examples
  14. Let’s implement that in Bloom
      Interfaces akin to data tables. Speakers write messages into the speak interface. Listeners insert themselves into the listen interface.
  15. Let’s implement that in Bloom
      Interfaces akin to data tables. Speakers write messages into the speak interface. Listeners insert themselves into the listen interface. The hear interface is dumped to stdio for debugging.
  16. Let’s implement that in Bloom
      We include the RendezvousAPI verbatim. Rendezvous is a join of speak and listen.
  17. Let’s implement that in Bloom
      We include the RendezvousAPI verbatim. Rendezvous is a join of speak and listen. We have turned communication into query processing. (See the sketch below.)
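The slides show this code as screenshots; the following is a minimal sketch of what the RendezvousAPI module and the synchronous join might look like in Bud, the Ruby embedding of Bloom. The interface schemas and class names are assumptions for illustration, not the talk's exact code.

    require 'rubygems'
    require 'bud'

    # Interfaces akin to data tables (schemas are illustrative).
    module RendezvousAPI
      state do
        interface input,  :speak,  [:subject, :val]
        interface input,  :listen, [:ident, :subject]
        interface output, :hear,   [:ident, :subject, :val]
      end
    end

    # Synchronous rendezvous: hear is the join of speak and listen on subject.
    class SyncRendezvous
      include Bud
      include RendezvousAPI

      bloom do
        hear <= (speak * listen).pairs(:subject => :subject) do |s, l|
          [l.ident, s.subject, s.val]
        end
        stdio <~ hear.inspected   # dump the hear interface to stdio for debugging
      end
    end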
  18. Outline
      Software Mismatch: Order and State in the Cloud
      An Ideal: Disorderly Programming for Distributed Systems
      A Realization: <~ bloom
      Implications: CALM Theorem
  19. A Disorderly Language <~ bloom
      STATE ✔ mergeable types: tables and (other) lattices
      TIME ✔ fixpoint-per-tick logical clocks
      DISTRIBUTION (STATE + TIME) ✔ async “shuffled” tables
      Encourages asynchronous monotonic programming, merging in new information opportunistically over time.
  20. Operational Model <~ bloom
      • Independent agents (“nodes”)
      • Local state & logic (can be SPMD or MIMD)
      • Event-driven loop: one clock “tick” per iteration
      One Bloom “Tick” (diagram): NW/OS events and local updates feed an atomic local fixpoint of the bloom rules (instantaneous merge, now <=), which emits deferred local updates (next <+) and async NW/OS msgs (async <~). (Driving sketch below.)
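A sketch of driving this tick loop from outside with the bud gem's standard API (run_bg, sync_do, stop), reusing the hypothetical SyncRendezvous class sketched above; the injected tuples are made up.

    program = SyncRendezvous.new(:port => 54321)
    program.run_bg     # start the event-driven loop in a background thread

    # sync_do runs its block atomically at the start of the next tick; that tick
    # then computes the local fixpoint of the <= rules before returning.
    program.sync_do do
      program.speak  <+ [["weather", "sunny"]]
      program.listen <+ [["alice", "weather"]]   # same tick as speak, so the synchronous join fires
    end

    program.stop       # shut the loop down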
  21. Statements <~ bloom
      <mergeable> <merge op> <mergeable expression>
      merge ops:  <= now | <+ next | <~ async | <- del_next
      mergeables: persistent (table, lmax, lbool, lmap, …) | transient (scratch, interface) | networked transient (channel)
  22. Statements <~ bloom
      <mergeable> <merge op> <mergeable expression>
      merge ops:  <= now | <+ next | <~ async | <- del_next
      mergeables: persistent (table, lmax, lbool, lmap, …) | transient (scratch, interface) | networked transient (channel)
      operations: map, flat_map, reduce, group, argmin/max, (r * s).pairs, empty?, include?, count, max, min, …, >, <, >=, <= (relational operations and lattice functions)
      (See the sketch below.)
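A small sketch exercising the four merge operators on the three kinds of collections; all names and schemas here are illustrative, not from the talk.

    require 'bud'

    class MergeOpDemo
      include Bud

      state do
        channel :msg_chan, [:@addr, :src, :entry]   # networked transient
        scratch :incoming, [:src, :entry]           # transient: re-derived every tick
        table   :log,      [:src, :entry]           # persistent across ticks
        channel :ack_chan, [:@addr, :entry]
      end

      bloom do
        incoming <= msg_chan {|m| [m.src, m.entry]}   # <=  now: merged during this tick's fixpoint
        log      <+ incoming                          # <+  next: installed at the start of the next tick
        ack_chan <~ incoming {|i| [i.src, i.entry]}   # <~  async: shipped over the network, arrives at some later tick
        log      <- incoming {|i| [i.src, i.entry] if i.entry == "retract"}   # <-  del_next: deleted at the next tick
      end
    end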
  23. Recall Synchronous Rendezvous
      Rendezvous is a join of speak and listen. We have turned communication into query processing.
      But what of time? This depends on perfect synchrony (luck).
  24. Recall Synchronous Rendezvous
      Rendezvous is a join of speak and listen. We have turned communication into query processing.
      But what of time? This depends on perfect synchrony (luck). Asynchronous communication requires persistence.
  25. Let’s implement that in Bloom
      The spoken table stores all messages. When a listen message arrives, it can rendezvous with all prior spoken messages.

  26. Let’s implement that in Bloom
      The listening table records all the agents who want notifications. When a speak message arrives, it can rendezvous with all prior listeners.
  27. Let’s implement that in Bloom
      Each rule for hear joins a channel (events) with a table (state). Computation is driven by channel arrival. Either channel can “arrive first” and hear will be populated. (See the sketch below.)
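A sketch of the persistent ("BothPersist") version the last three slides describe, again with assumed schemas rather than the talk's exact code: both sides are persisted in tables, each rule joins a channel with a table, and either side can arrive first.

    require 'bud'

    class AsyncRendezvous
      include Bud

      state do
        channel :speak_chan,  [:@addr, :subject, :val]
        channel :listen_chan, [:@addr, :ident, :subject]
        table   :spoken,      [:subject, :val]       # all messages ever spoken
        table   :listening,   [:ident, :subject]     # all registered listeners
        scratch :hear,        [:ident, :subject, :val]
      end

      bloom do
        # Persist whatever arrives, whenever it arrives.
        spoken    <= speak_chan  {|s| [s.subject, s.val]}
        listening <= listen_chan {|l| [l.ident, l.subject]}

        # Each rule joins a channel (events) with a table (state); either
        # channel can "arrive first" and hear will still be populated.
        hear <= (speak_chan * listening).pairs(:subject => :subject) {|s, l| [l.ident, s.subject, s.val]}
        hear <= (listen_chan * spoken).pairs(:subject => :subject)   {|l, s| [l.ident, s.subject, s.val]}
      end
    end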
  28. Distributing this
      ➔ Up to now, rendezvous in time ➔ What about space? ➔ Listener and sender are on the same node?!
  29. Distributing this
      ➔ Up to now, rendezvous in time ➔ What about space? ➔ Listener and sender are on the same node?!
      ➔ What if listener and sender are on their own nodes? ➔ We need a “Join Server” ➔ and proxy logic to reroute the interfaces (sketched below) ➔ Good news: once you solve time, space is easy!
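A rough sketch of the "Join Server" arrangement: clients proxy their speak and listen interfaces to a server node over channels (carrying the listener's address so results can be routed back), and the server runs the same persistent rendezvous. All names, schemas, and the routing scheme are assumptions for illustration.

    require 'bud'

    class JoinServer
      include Bud

      state do
        channel :speak_chan,  [:@addr, :subject, :val]
        channel :listen_chan, [:@addr, :client, :ident, :subject]   # client = listener's "ip:port"
        channel :hear_chan,   [:@addr, :ident, :subject, :val]
        table   :spoken,      [:subject, :val]
        table   :listening,   [:client, :ident, :subject]
      end

      bloom do
        spoken    <= speak_chan  {|s| [s.subject, s.val]}
        listening <= listen_chan {|l| [l.client, l.ident, l.subject]}

        # Route each match back to the node that registered the listener.
        hear_chan <~ (speak_chan * listening).pairs(:subject => :subject) do |s, l|
          [l.client, l.ident, s.subject, s.val]
        end
        hear_chan <~ (listen_chan * spoken).pairs(:subject => :subject) do |l, s|
          [l.client, l.ident, s.subject, s.val]
        end
      end
    end

On the client side, proxy rules of the form speak_chan <~ speak {|s| [@server_addr, s.subject, s.val]} (with @server_addr a hypothetical configuration value) reroute the unchanged interfaces to the server.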
  30. Distributed JoinServer
      ➔ Can hash-partition JoinServer state on subject for scaling ➔ do this in proxy code; clients unchanged
      ➔ Can replicate JoinServer state for fault-tolerance ➔ many possible lattice-based consistency models ➔ Indy KVS project
      [Diagram: spoken_map is a MapLattice from key (any_type) to a VersionLattice, whose version vc is itself a MapLattice from node (any_type) to a MaxLattice, with val: any_type.]
  31. Reflecting Back: Disorderly, Data-Centric Programming
      ➔ We now have a distributed rendezvous protocol ➔ What is all this?
      Speaker persists a log of messages ➔ a Key-Value Store (Database). Listener persists a registry of listeners ➔ Publish/Subscribe.
  32. Reflecting Back: Disorderly, Data-Centric Programming
      Speaker persists a log of messages ➔ a Key-Value Store (Database). Listener persists a registry of listeners ➔ Publish/Subscribe.
      Duality of storage and communication! Rendezvous over time. Choice of “system type” becomes a minor code change. Hybrids naturally emerge.
      Reduced a hard programming problem to a well-understood database problem!
  33. And it gets better!
      Speaker persists a log of messages ➔ a Key-Value Store (Database). Listener persists a registry of listeners ➔ Publish/Subscribe.
      Post-hoc distribution (even of server logic): any table/channel/interface can be treated like a DB table. Scale-out: shard. Fault tolerance: replicate. It’s easy to distribute centralized code, post-hoc!
  34. Reflecting Back: Disorderly, Data-Centric Programming
      Speaker persists a log of messages ➔ a Key-Value Store (Database). Listener persists a registry of listeners ➔ Publish/Subscribe.
      Post-hoc distribution: any table/channel/interface can be treated like a DB table. Scale-out: shard. Fault tolerance: replicate. It’s easy to distribute centralized code, post-hoc!
      What about the hard parts of distributed databases/systems? Consistency.
  35. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ budplot looks for order-sensitive dataflows ➔ async communication causes disorder (yellow)
  36. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ budplot looks for order-sensitive dataflows ➔ async communication causes disorder (yellow) ➔ order-sensitive op downstream of disorder? non-deterministic! (red)
  37. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ budplot looks for order-sensitive dataflows ➔ async communication causes disorder (yellow) ➔ order-sensitive op downstream of disorder? non-deterministic! (red)
      ➔ Q: What operations are order-sensitive?
  38. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ budplot looks for order-sensitive dataflows ➔ async communication causes disorder (yellow) ➔ order-sensitive op downstream of disorder? non-deterministic! (red)
      ➔ Q: What operations are order-sensitive? A: The non-monotone ones
  39. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ budplot looks for order-sensitive dataflows ➔ async communication causes disorder (yellow) ➔ order-sensitive op downstream of disorder? non-deterministic! (red)
      ➔ Q: What operations are order-sensitive? A: The non-monotone ones
      ➔ monotone: output grows with input ➔ non-monotone: must base (partial) results on their full input (prefix)
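As a concrete, hypothetical example of the kind of rule budplot would paint red: an aggregate computed over an asynchronous channel. The collection names below are made up.

    require 'bud'

    class NonMonotoneExample
      include Bud

      state do
        channel :speak_chan, [:@addr, :subject, :val]
        scratch :tally,      [:subject] => [:cnt]
      end

      bloom do
        # Non-monotone: a per-subject count is only final once the entire input
        # has arrived; any partial count may be refuted by a later message.
        tally <= speak_chan.group([:subject], count(:val))
      end
    end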
  41. Statements <~ bloom
      <mergeable> <merge op> <mergeable expression>
      merge ops:  <= now | <+ next | <~ async | <- del_next
      mergeables: persistent (table, lmax, lbool, lmap, …) | transient (scratch, interface) | networked transient (channel) | scheduled transient (periodic)
      operations: map, flat_map, reduce, group, argmin/max, (r * s).pairs, empty?, include?, count, max, min, …, >, <, >=, <= (relational operations and lattice functions)
  42. What of Consistency?
      ➔ Easy! We can statically check Bloom code. ➔ Yes, but our SpeakerPersist code is odd: ➔ all messages in an unordered set ➔ never deletes or overwrites
  43. A More Typical Speaker Persist
      The spoken table is now mutable: one value per subject. Arrival order of network messages should require us to think about consistency. What does static analysis say? (See the sketch below.)
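A sketch of what such a "more typical", mutable SpeakerPersist might look like (the schemas and the update idiom are assumptions): the delete-then-reinsert makes the final contents of spoken depend on the arrival order of speak messages.

    require 'bud'

    class MutableSpeakerPersist
      include Bud

      state do
        channel :speak_chan, [:@addr, :subject] => [:val]
        table   :spoken,     [:subject] => [:val]      # mutable: one value per subject
      end

      bloom do
        # Overwrite: delete the old value for the subject and install the new
        # one at the next tick, a non-monotonic, order-sensitive update.
        spoken <- (speak_chan * spoken).rights(:subject => :subject)
        spoken <+ speak_chan {|s| [s.subject, s.val]}
      end
    end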
  45. Consistency?
      Now what? Two options: 1. avoid non-monotonicity  2. impose global ordering
      Both natural in Bloom. (But one is better, as we’ll discuss :-)
  46. Consistency?
      Now what? Two options: 1. avoid non-monotonicity, using vector clocks  2. impose global ordering
      Both natural in Bloom. (But one is better, as we’ll discuss :-)
  47. Monotonic Structures: Lattices
      (Join Semi-)Lattice: an object class with a merge operator (<=) that is Associative, Commutative and Idempotent, returning the least upper bound (the “largest” value) of its inputs. (Illustration below.)
      See “ACID 2.0” [Campbell/Helland CIDR ’09], CRDTs [Shapiro, et al. INRIA TR 2011]
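A tiny illustration with the bud gem's built-in MaxLattice, assuming its merge/reveal API: because merge is associative, commutative and idempotent, reordering or repeating merges cannot change the outcome.

    require 'bud'

    a = Bud::MaxLattice.new(3)
    b = Bud::MaxLattice.new(7)

    a.merge(b).reveal            # => 7
    b.merge(a).merge(a).reveal   # => 7: order and repetition of merges are irrelevant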
 

  48. Vector Clocks
      my_vc is a MapLattice: key (any_type) ➔ MaxLattice.
      [Diagram: merging (<=) two example vector clocks over the keys Joe, Peter, and Phokion.]
  49. Vector Clocks in Bloom Lattices
      my_vc is a MapLattice: key (any_type) ➔ MaxLattice.
      Bloom lets us compose these lattices just as we compose relational tables/expressions, using merge rules, morphisms, and monotone functions.
  50. Vector Clocks in Bloom Lattices
      my_vc is a MapLattice: key (any_type) ➔ MaxLattice.
          state do
            lmap :my_vc
          end
          bootstrap do
            my_vc <= {ip_port => Bud::MaxLattice.new(0)}
          end
      Bloom lets us compose these lattices just as we compose relational tables/expressions, using merge rules, morphisms, and monotone functions.
  51. Vector Clocks: bloom v. wikipedia
      Wikipedia:
      • Initially all clocks are zero.
      • Each time a process experiences an internal event, it increments its own logical clock in the vector by one.
      • Each time a process prepares to send a message, it increments its own logical clock in the vector by one and then sends its entire vector along with the message being sent.
      • Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element in its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element).
      bloom:
          bootstrap do
            my_vc <= {ip_port => Bud::MaxLattice.new(0)}
          end
          bloom do
            next_vc    <= out_msg { {ip_port => my_vc.at(ip_port) + 1} }
            out_msg_vc <= out_msg {|m| [m.addr, m.payload, next_vc]}
            next_vc    <= in_msg  { {ip_port => my_vc.at(ip_port) + 1} }
            next_vc    <= my_vc
            next_vc    <= in_msg  {|m| m.clock}
            my_vc      <+ next_vc
          end
      [“Logic and Lattices”, Conway, et al. SOCC 2012]
  52. Consistency?
      Now what? Two options: 1. avoid non-monotonicity  2. impose global ordering, using Paxos
      Both natural in Bloom. (But one is better, as we’ll discuss :-)
  53. Paxos: pseudocode v. bloom
      1. Priest p chooses a new ballot number b greater than lastTried[p], sets lastTried[p] to b, and sends a NextBallot(b) message to some set of priests.
      2. Upon receipt of a NextBallot(b) message from p with b > nextBal[q], priest q sets nextBal[q] to b and sends a LastVote(b, v) message to p, where v equals prevVote[q]. (A NextBallot(b) message is ignored if b ≤ nextBal[q].)
      3. After receiving a LastVote(b, v) message from every priest in some majority set Q, where b = lastTried[p], priest p initiates a new ballot with number b, quorum Q, and decree d, where d is chosen to satisfy B3. He then sends a BeginBallot(b, d) message to every priest in Q.
      4. Upon receipt of a BeginBallot(b, d) message with b = nextBal[q], priest q casts his vote in ballot number b, sets prevVote[q] to this vote, and sends a Voted(b, q) message to p. (A BeginBallot(b, d) message is ignored if b ≠ nextBal[q].)
      5. If p has received a Voted(b, q) message from every priest q in Q (the quorum for ballot number b), where b = lastTried[p], then he writes d (the decree of that ballot) in his ledger and sends a Success(d) message to every priest.

          lastTried <= (lastTried*nextBallot).pairs(priest=>priest) { |l,n| [l.priest, n.bnum] if n.bnum >= l.old }
          nextBallot <= (decreeRequest*lastTried*priestCnt).combos(decreeRequest.priest => lastTried.priest, lastTried.priest => priestCnt.priest) { |d,l,p| [d.priest, l.old+p.cnt, d.decree] }
          sendNextBallot <~ (nextBallot*parliament).pairs(n.priest=>p.priest) { |n,p| [p.peer, n.ballot, n.decree, n.priest] }
          nextBal <= (nextBal*lastVote).pairs(priest=>priest) { |n,l| [n.priest, l.ballot] if l.ballot >= n.old }
          lastVote <= (sendNextBallot*prevVote).pairs(priest=>priest) { |s,p| [s.priest, s.ballot, p.oldBallot, p.oldDecree, s.peer] if s.ballot >= p.oldBallot }
          sendLastVote <~ lastVote { |l| [priest, ballot, oldBallot, decree, lord] }
          priestCnt <= parliament.group([lord], count)
          lastVoteCnt <= sendLastVote.group([lord, ballot], count(Priest))
          maxPrevBallot <= sendLastVote.group([lord], max(oldBallot))
          quorum(Lord,Ballot) <= (priestCnt*lastVoteCnt).pairs(lord=>lord) { |p,l| [p.lord, p.pcnt] if vnct > (pcnt / 2) }
          beginBallot <= (quorum*maxPrevBallot*nextBallot*sendLastVote).combos(quorum.ballot => nextBallot.ballot, nextBallot.ballot => sendLastVote.ballot, quorum.lord => maxPrevBallot.lord, maxPrevBallot.lord => nextBallot.lord, nextBallot.lord => sendLastVote.lord, maxPrevBallot.maxB => sendLastVote.maxB) { |q,m,n,s| m.maxB == -1 ? [q.lord, q.ballot, n.decree] : [q.lord, q.ballot, l.oldDecree] }
          sendBeginBallot <~ (beginBallot*parliament).pairs(lord=>lord) { |b,p| [l.priest, b.ballot, b.decree, b.lord] }
          vote <= (sendBeginBallot*nextBal).pairs(priest=>priest, ballot=>oldB) { |s,n| [s.priest, s.ballot, s.decree] }
          prevVote <= (prevVote*lastVote*vote).combos(prevVote.priest => lastVote.priest, lastVote.priest => vote.priest, lastVote.ballot => vote.ballot, l.decree => v.decree) { |p,l,v| [p.priest, l.ballot, l.decree] if l.ballot >= p.old }
          sendVote <~ (vote*sendBeginBallot).pairs(priest=>priest, ballot=>ballot, decree=>decree) { |v,s| [s.lord, v.ballot, v.decree, v.priest] }
          voteCnt <= sendVote.group([lord, ballot], count(priest))
          decree <= (lastTried*voteCnt*lastVoteCnt*beginBallot).combos(lastTried.lord=>voteCnt.lord, voteCnt.lord=>lastVoteCnt.lord, lastVoteCnt.lord=>beginBallot.lord, lastTried.ballot=>voteCnt.ballot, voteCnt.ballot=>lastVoteCnt.ballot, voteCnt.ballot=>beginBallot.ballot, voteCnt.votes => lastVoteCnt.votes) { |lt, v, lv, b| [lt.lord, lt.ballot, b.decree] }

      [“BOOM Analytics”, Alvaro, et al. Eurosys 2010]
  54. How did we do?
      ➔ scale up easily ➔ ignore race conditions ➔ tolerate faults reliably ➔ debug naturally ➔ test intelligently
  55. Testing Distributed Systems for Fault Tolerance
      ➔ How to test end-to-end Fault Tolerance? ➔ Lineage-Driven Fault Injection (LDFI) ➔ Molly: an LDFI system [Alvaro, et al. SIGMOD 2015] ➔ Deployment at Netflix [Alvaro, et al. SOCC 2016]
  56. How did we do?
      ➔ scale up easily ➔ ignore race conditions ➔ tolerate faults reliably ➔ debug naturally ➔ test intelligently
  57. How Real is All This?
      ➔ Proof Point at Scale: BOOM Analytics [Alvaro, et al. Eurosys 2010] ➔ HDFS and Hadoop scale-out
      ➔ Industry Adoption of LDFI @ Netflix [Alvaro et al. 2016]
      ➔ Full-featured Ruby Interpreter for Bloom (https://github.com/bloom-lang/bud)
      ➔ Current work: ➔ Fluent: C++ compilation-based Bloom (https://github.com/ucbrise/fluent) ➔ Indy: a dense, elastic Key-Value Store in Fluent
  58. Indy: A Key-Value Store [C. Wu, et al. 2017]
      ➔ Multicore-performant ➔ Smooth elastic scaling across nodes
  59. Indy: A Key-Value Store [C. Wu, et al. 2017]
      ➔ Smooth scaling across datacenters ➔ Implements all known coordination-avoiding consistency models from [Bailis, et al. VLDB ’14]
  60. Outline
      Software Mismatch: Order and State in the Cloud
      An Ideal: Disorderly Programming for Distributed Systems
      A Realization: <~ bloom
      Implications: CALM Theorem
  61. Consistency is Good! But at what cost?
      Two options: 1. avoid non-monotonicity  2. impose global ordering via coordination
  62. Consistency is Good! But at what cost?
      Two options: 1. avoid non-monotonicity  2. impose global ordering via coordination
      But coordination is expensive! Two-Phase Commit, Paxos, Virtual Synchrony, Raft, etc. require nodes to send messages and wait for responses.
  63. Distributed Systems Poetry
      “The first principle of successful scalability is
      to batter the consistency mechanisms down to a minimum
      move them off the critical path
      hide them in a rarely visited corner of the system
      and then make it as hard as possible for application developers to get permission to use them”
      —James Hamilton (IBM, MS, Amazon), in Birman, Chockler: “Toward a Cloud Computing Research Agenda”, LADIS 2009
  64. Consistency is Good! But at what cost?
      Two options: 1. avoid non-monotonicity  2. impose global ordering via coordination
  65. Questions Deserving Answers
      ➔ What computations require coordination for consistency? ➔ What computations can avoid coordination consistently?
  68. The CALM Theorem: A Bright Line
      Consistency As Logical Monotonicity: {coordination-free consistent} <=> {monotonically expressible}
      ➔ Avoid coordination for monotonic programs! ➔ no waiting! ➔ Monotonic programs are CAP-busters ➔ Consistent and Available during Partitions
  69. Intuition from Rendezvous
      ➔ Happy case: monotonicity. It “streams”! ➔ At any time, output ⊆ final result ➔ After all messages, output is maximal ➔ Implication: deterministic outcome!
      ➔ Problem: non-monotonicity. Can’t “stream”. Intermediate result ⊄ final result. New input refutes previous output. No output until you get the entire input. Ensuring entire input? Coordination.
      Only works for BothPersist!!!
  70. CALM History
      [Screenshot: first page of “The Declarative Imperative: Experiences and Conjectures in Distributed Logic”, Hellerstein.]
      ➔ CALM Conjecture [Hellerstein, PODS ’10, SIGMOD Record 2010]
  71. CALM History
      [Screenshots: first pages of “The Declarative Imperative” (Hellerstein) and “Declarative Networking: Language, Execution and Optimization” (Loo et al.).]
      ➔ CALM Conjecture [Hellerstein, PODS ’10, SIGMOD Record 2010]
      ➔ Monotonicity => Consistency [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
  72. CALM History
      [Screenshots: first pages of “The Declarative Imperative” (Hellerstein), “Declarative Networking” (Loo et al.), and “Relational Transducers for Declarative Networking” (Ameloot, Neven, Van den Bussche).]
      ➔ CALM Conjecture [Hellerstein, PODS ’10, SIGMOD Record 2010]
      ➔ Monotonicity => Consistency [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
      ➔ Relational Transducer Proofs [Ameloot, et al. PODS 2012, JACM 2013]
  73. [Screenshots: first pages of the same three papers.]
This conjecture suggests a strong link between, on the one hand, “eventually consistent” and “coordination-free” distributed computations, and on the other hand, expressibility in monotonic Datalog (without negation or aggregate functions). The conjecture was not fully formalized, however; indeed, as Hellerstein notes himself, a proper treatment of this conjecture requires crisp definitions of eventual consistency and coordina- tion, which have been lacking so far. Moreover, it also requires a formal model of distributed computation. Tom J. Ameloot is a PhD Fellow of the Fund for Scientific Research, Flanders (FWO). Author’s email addresses: [email protected], [email protected], [email protected]. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected]. c • YYYY ACM 0004-5411/YYYY/01-ARTA $15.00 DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000 Journal of the ACM, Vol. V, No. N, Article A, Publication date: January YYYY. ➔ CALM Conjecture
 [Hellerstein, PODS ’10, SIGMOD Record 2010] ➔ Monotonicity => Consistency 
 [Abiteboul PODS 2011, Loo et al., SIGMOD 2006] ➔ Relational Transducer Proofs
 [Ameloot, et al. PODS 2012, JACM 2013]
 [Ameloot et al. PODS 2014] CALM History Weaker Forms of Monotonicity for Declarative Networking: a More Fine-grained Answer to the CALM-conjecture Tom J. Ameloot ⇤ Hasselt University & transnational University of Limburg [email protected] Bas Ketsman Hasselt University & transnational University of Limburg [email protected] Frank Neven Hasselt University & transnational University of Limburg [email protected] Daniel Zinn LogicBlox, Inc [email protected] ABSTRACT The CALM-conjecture, first stated by Hellerstein [23] and proved in its revised form by Ameloot et al. [13] within the framework of relational transducer networks, asserts that a query has a coordination-free execution strategy if and only if the query is monotone. Zinn et al. [32] extended the framework of relational transducer networks to allow for spe- cific data distribution strategies and showed that the non- monotone win-move query is coordination-free for domain- guided data distributions. In this paper, we complete the story by equating increasingly larger classes of coordination- free computations with increasingly weaker forms of mono- tonicity and make Datalog variants explicit that capture each of these classes. One such fragment is based on strati- fied Datalog where rules are required to be connected with the exception of the last stratum. In addition, we charac- terize coordination-freeness as those computations that do not require knowledge about all other nodes in the network, and therefore, can not globally coordinate. The results in this paper can be interpreted as a more fine-grained answer to the CALM-conjecture. Categories and Subject Descriptors H.2 [Database Management]: Languages; H.2 [Database Management]: Systems—Distributed databases; F.1 [Com- putation by Abstract Devices]: Models of Computation Keywords Distributed database, relational transducer, consistency, co- ordination, expressive power, cloud programming ⇤ PhD Fellow of the Fund for Scientific Research, Flanders (FWO). Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. PODS’14, June 22–27, 2014, Snowbird, UT, USA. Copyright 2014 ACM 978-1-4503-2375-8/14/06 ...$15.00. http://dx.doi.org/10.1145/2594538.2594541. 1. INTRODUCTION Declarative networking is an approach where distributed computations are modeled and programmed using declara- tive formalisms based on extensions of Datalog. On a logical level, programs (queries) are specified over a global schema and are computed by multiple computing nodes over which the input database is distributed. These nodes can perform local computations and communicate asynchronously with each other via messages. The model operates under the assumption that messages can never be lost but can be ar- bitrarily delayed. An inherent source of ine ciency in such systems are the global barriers raised by the need for syn- chronization in computing the result of queries. 
This source of ine ciency inspired Hellerstein [11] to for- mulate the CALM-principle which suggests a link between logical monotonicity on the one hand and distributed consis- tency without the need for coordination on the other hand.1 A crucial property of monotone programs is that derived facts must never be retracted when new data arrives. The latter implies a simple coordination-free execution strategy: every node sends all relevant data to every other node in the network and outputs new facts from the moment they can be derived. No coordination is needed and the output of all computing nodes is consistent. This observation motivated Hellerstein [23] to formulate the CALM-conjecture which, in its revised form2, states “A query has a coordination-free execution strat- egy i↵ the query is monotone.” Ameloot, Neven, and Van den Bussche [13] formalized the conjecture in terms of relational transducer networks and provided a proof. Zinn, Green, and Lud¨ ascher [32] subse- quently showed that there is more to this story. In particu- lar, they obtained that when computing nodes are increas- ingly more knowledgeable on how facts are distributed, in- creasingly more queries can be computed in a coordination- free manner. Zinn et al. [32] considered two extensions of the original transducer model introduced in [13]. In the first extension, here referred to as the policy-aware model, ev- ery computing node is aware of the facts that should be assigned to it and can consequently evaluate negation over schema relations. In the second extension, referred to as the 1CALM stands for Consistency And Logical Monotonicity. 2The original conjecture replaced monotone by Datalog [13].
  74. CALM History
➔ CALM Conjecture
 [Hellerstein, PODS ’10, SIGMOD Record 2010]
➔ Monotonicity => Consistency
 [Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
➔ Relational Transducer Proofs
 [Ameloot, et al. PODS 2012, JACM 2013]
 [Ameloot et al. PODS 2014]
➔ Napkin-sized proof
 [Hellerstein & Alvaro 2017?]
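To make the conjecture concrete, here is a minimal sketch in Bud (the Ruby realization of Bloom). The class and relation names are mine, not the talk's, and it assumes only the standard Bud join and grouping operators. The first block is monotone: a new link fact can only add path facts, never retract one, so any message ordering converges to the same answer without coordination. The counting rule is the non-monotone contrast: a count emitted before the input stops changing can be invalidated later, which is exactly where coordination is needed.

```ruby
require 'bud'

# Hypothetical illustration; class and relation names are not from the talk.
class CalmDemo
  include Bud

  state do
    table :link, [:from, :to]              # input edges, arriving in any order
    table :path, [:from, :to]              # derived facts: grow-only
    scratch :out_degree, [:from] => [:cnt] # a per-node count (non-monotone)
  end

  bloom :monotone do
    # Monotone logic: adding a link can only add paths, so every delivery
    # and evaluation order converges to the same path set (CALM: no coordination).
    path <= link
    path <= (link * path).pairs(:to => :from) { |l, p| [l.from, p.to] }
  end

  bloom :not_monotone do
    # Non-monotone logic: a count computed "early" may be wrong once more
    # links arrive, so it can only be trusted after coordination
    # (e.g., agreeing that the input is complete).
    out_degree <= link.group([:from], count)
  end
end
```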
  75. An Intriguing Connection
➔ Immerman-Vardi Theorem
➔ Same monotonicity as CALM?!
➔ Consistency <=> Monotonicity <=> PTIME!
➔ Can avoid coordination for all polynomial-time computations?!
  76. An Intriguing Connection
➔ Immerman-Vardi Theorem
➔ Same monotonicity as CALM?!
➔ Consistency <=> Monotonicity <=> PTIME!
➔ Can avoid coordination for all polynomial-time computations?!
  77. An Intriguing Connection
➔ Immerman-Vardi Theorem
➔ Same monotonicity as CALM?!
➔ Consistency <=> Monotonicity <=> PTIME!
➔ Can avoid coordination for all polynomial-time computations?!
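One way to see the flavor of that connection is a distributed least-fixpoint computation, the canonical PTIME workload. The sketch below is a hypothetical two-node reachability program in Bud; all names are illustrative, it assumes the standard channel and join operators, and it assumes peer is filled in at startup. Each node gossips every fact as soon as it derives it; because the rules are monotone, duplicates are harmless and the nodes converge on the same reachable set under any message delay or reordering, with no barriers.

```ruby
require 'bud'

# Hypothetical two-node reachability sketch; names are illustrative.
class DistributedReach
  include Bud

  state do
    table   :link,  [:from, :to]      # locally stored edges
    table   :root,  [:node]           # where reachability starts
    table   :reach, [:node]           # grow-only set of reachable nodes
    table   :peer,  [:addr]           # the other node's address (set at startup)
    channel :gossip, [:@addr, :node]  # asynchronous fact exchange
  end

  bloom do
    reach <= root
    # Monotone fixpoint: new reach facts only ever add to the set.
    reach <= (reach * link).pairs(:node => :from) { |r, l| [l.to] }
    # Ship every derived fact to the peer; re-sending is idempotent.
    gossip <~ (reach * peer).pairs { |r, p| [p.addr, r.node] }
    # Merge whatever arrives, whenever it arrives.
    reach <= gossip { |g| [g.node] }
  end
end
```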
  78. Current and Future Work
1. Fluent: disorderly programming toolkit
 ➔ C++ Libraries
 ➔ Lattices, Relational Algebra
 ➔ Rule Registry/Execution
 ➔ Static C++ typechecking via template metaprogramming
 ➔ Fluent Debugger
 ➔ Distributed data lineage
 ➔ Distributed tracing
2. Familiar programming models
 ➔ Can we skin Fluent with: RPC, State Machines, Actors, Futures, etc.?
3. Dense Clouds
 ➔ High-performance, coordination-free code at multiple scales
 ➔ Cores to servers to the globe
4. Fundamentals
 ➔ Constructions for coordination-free polynomial-time programs?
 ➔ General?
 ➔ Code synthesis?
 ➔ Abiding theoretical questions
 ➔ Stochastic CALM?
5. Applications
 ➔ RL and Robotics?
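Since Fluent's C++ lattice library is only named above, a tiny hand-rolled Ruby sketch of the idea may help. This is not Fluent's API and the names are mine: a lattice pairs state with a merge that is associative, commutative, and idempotent, plus monotone functions over it, which is what lets replicas exchange and merge state in any order, any number of times, and still agree.

```ruby
require 'set'

# Hand-rolled sketch of the lattice idea; NOT Fluent's C++ API.
class GrowOnlySet
  attr_reader :elems

  def initialize(elems = [])
    @elems = Set.new(elems)
  end

  # Merge = least upper bound (set union): associative, commutative, idempotent.
  def merge(other)
    GrowOnlySet.new(@elems | other.elems)
  end

  # A monotone function of the lattice: as the set grows, size only grows.
  def size
    @elems.size
  end
end

a = GrowOnlySet.new([:x, :y])
b = GrowOnlySet.new([:y, :z])
# Merge order and repetition don't matter:
p a.merge(b).elems == b.merge(a).merge(a).elems   # => true
p a.merge(b).size                                 # => 3
```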
  79. Takeaways
CALM
 ➔ Seek monotonicity, avoid coordination.
 ➔ Move up the stack!
 ➔ Historic focus on Read/Write API distracts from what’s possible in application logic
Bloom
 ➔ Disorderly programming can radically simplify distributed programming
 ➔ Data-centric: be it “declarative”, “reactive”, “dataflow”, etc.
 ➔ Revolution (Bloom vs. Java?) or Evolution (Bloom vs. LLVM IR?)
Ambitious systems work in an era of maturity?
 ➔ If it’s doable, somebody is already doing it
 ➔ Green field problems are the ones with high switching costs
 ➔ “That will never work!”
 ➔ Be patient, seek lessons along the way