The Declarative Imperative
Experiences and Conjectures in Distributed Logic
Joseph M. Hellerstein
University of California, Berkeley
[email protected]
ABSTRACT
The rise of multicore processors and cloud computing is putting
enormous pressure on the software community to find solu-
tions to the difficulty of parallel and distributed programming.
At the same time, there is more—and more varied—interest in
data-centric programming languages than at any time in com-
puting history, in part because these languages parallelize nat-
urally. This juxtaposition raises the possibility that the theory
of declarative database query languages can provide a foun-
dation for the next generation of parallel and distributed pro-
gramming languages.
In this paper I reflect on my group’s experience over seven
years using Datalog extensions to build networking protocols
and distributed systems. Based on that experience, I present
a number of theoretical conjectures that may both interest the
database community, and clarify important practical issues in
distributed computing. Most importantly, I make a case for
database researchers to take a leadership role in addressing the
impending programming crisis.
This is an extended version of an invited lecture at the ACM
PODS 2010 conference [32].
1. INTRODUCTION
This year marks the forty-fifth anniversary of Gordon Moore’s
paper laying down the Law: exponential growth in the density
of transistors on a chip. Of course Moore’s Law has served
more loosely to predict the doubling of computing efficiency
every eighteen months. This year is a watershed: by the loose
accounting, computers should be 1 Billion times faster than
they were when Moore’s paper appeared in 1965.
Technology forecasters appear cautiously optimistic that Moore’s
Law will hold steady over the coming decade, in its strict in-
terpretation. But they also predict a future in which continued
exponentiation in hardware performance will only be avail-
able via parallelism. Given the difficulty of parallel program-
ming, this prediction has led to an unusually gloomy outlook
for computing in the coming years.
At the same time that these storm clouds have been brew-
ing, there has been a budding resurgence of interest across
the software disciplines in data-centric computation, includ-
ing declarative programming and Datalog. There is more—
and more varied—applied activity in these areas than at any
point in memory.
The juxtaposition of these trends presents stark alternatives.
Will the forecasts of doom and gloom materialize in a storm
that drowns out progress in computing? Or is this the long-
delayed catharsis that will wash away today’s thicket of im-
perative languages, preparing the ground for a more fertile
declarative future? And what role might the database com-
munity play in shaping this future, having sowed the seeds of
Datalog over the last quarter century?
Before addressing these issues directly, a few more words
about both crisis and opportunity are in order.
1.1 Urgency: Parallelism
I would be panicked if I were in industry.
— John Hennessy, President, Stanford University [35]
The need for parallelism is visible at micro and macro scales.
In microprocessor development, the connection between the
“strict” and “loose” definitions of Moore’s Law has been sev-
ered: while transistor density is continuing to grow exponen-
tially, it is no longer improving processor speeds. Instead, chip
manufacturers are packing increasing numbers of processor
cores onto each chip, in reaction to challenges of power con-
sumption and heat dissipation. Hence Moore’s Law no longer
predicts the clock speed of a chip, but rather its offered degree
of parallelism. And as a result, traditional sequential programs
will get no faster over time. For the first time since Moore’s
paper was published, the hardware community is at the mercy
of software: only programmers can deliver the benefits of the
Law to the people.
At the same time, Cloud Computing promises to commodi-
tize access to large compute clusters: it is now within the bud-
get of individual developers to rent massive resources in the
worlds’ largest computing centers. But again, this computing
potential will go untapped unless those developers can write
programs that harness parallelism, while managing the hetero-
geneity and component failures endemic to very large clusters
of distributed computers.
Unfortunately, parallel and distributed programming today
is challenging even for the best programmers, and unwork-
able for the majority. In his Turing lecture, Jim Gray pointed
to discouraging trends in the cost of software development,
and presented Automatic Programming as the twelfth of his
dozen grand challenges for computing [26]: develop methods
to build software with orders of magnitude less code and ef-
fort. As presented in the Turing lecture, Gray’s challenge con-
cerned sequential programming. The urgency and difficulty of
his twelfth challenge has grown markedly with the technology
SIGMOD Record, March 2010 (Vol. 39, No. 1) 5
Declarative Networking:
Language, Execution and Optimization
Boon Thau Loo∗ Tyson Condie∗ Minos Garofalakis† David E. Gay† Joseph M. Hellerstein∗
Petros Maniatis† Raghu Ramakrishnan‡ Timothy Roscoe† Ion Stoica∗
∗UC Berkeley, †Intel Research Berkeley and ‡University of Wisconsin-Madison
ABSTRACT
The networking and distributed systems communities have recently
explored a variety of new network architectures, both for application-
level overlay networks, and as prototypes for a next-generation In-
ternet architecture. In this context, we have investigated declara-
tive networking: the use of a distributed recursive query engine as
a powerful vehicle for accelerating innovation in network architec-
tures [23, 24, 33]. Declarative networking represents a significant
new application area for database research on recursive query pro-
cessing. In this paper, we address fundamental database issues in
this domain. First, we motivate and formally define the Network
Datalog (NDlog) language for declarative network specifications.
Second, we introduce and prove correct relaxed versions of the tra-
ditional semi-na¨
ıve query evaluation technique, to overcome fun-
damental problems of the traditional technique in an asynchronous
distributed setting. Third, we consider the dynamics of network
state, and formalize the “eventual consistency” of our programs even
when bursts of updates can arrive in the midst of query execution.
Fourth, we present a number of query optimization opportunities
that arise in the declarative networking context, including applica-
tions of traditional techniques as well as new optimizations. Last,
we present evaluation results of the above ideas implemented in our
P2 declarative networking system, running on 100 machines over
the Emulab network testbed.
1. INTRODUCTION
The database literature has a rich tradition of research on recursive
query languages and processing. This work has influenced com-
mercial database systems to a certain extent. However, recursion
is still considered an esoteric feature by most practitioners, and re-
search in the area has had limited practical impact. Even within
the database research community, there is longstanding controversy
over the practical relevance of recursive queries, going back at least
to the Laguna Beach Report [7], and continuing into relatively re-
cent textbooks [35].
In more recent work, we have made the case that recursive query
technology has a natural application in the design of Internet infras-
tructure. We presented an approach called declarative networking
∗UC Berkeley authors funded by NSF grants 0205647, 0209108, and 0225660, and a
gift from Microsoft.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGMOD 2006, June 27–29, 2006, Chicago, Illinois, USA.
Copyright 2006 ACM 1-59593-256-9/06/0006 ...$5.00.
that enables declarative specification and deployment of distributed
protocols and algorithms via distributed recursive queries over net-
work graphs [23, 24, 33]. We recently described how we imple-
mented and deployed this concept in a system called P2 [23, 33].
Our high-level goal is to provide a software environment that can
accelerate the process of specifying, implementing, experimenting
with and evolving designs for network architectures.
Declarative networking is part of a larger effort to revisit the cur-
rent Internet Architecture, which is considered by many researchers
to be fundamentally ill-suited to handle today’s network uses and
abuses [13]. While radical new architectures are being proposed
for a “clean slate” design, there are also many efforts to develop
application-level “overlay” networks on top of the current Internet,
to prototype and roll out new network services in an evolutionary
fashion [26]. Whether one is a proponent of revolution or evolution
in this context, there is agreement that we are entering a period of
significant flux in network services, protocols and architectures.
In such an environment, innovation can be better focused and ac-
celerated by having the right software tools at hand. Declarative
query approaches appear to be one of the most promising avenues
for dealing with the complexity of prototyping, deploying and evolv-
ing new network architectures. The forwarding tables in network
routing nodes can be regarded as a view over changing ground state
(network links, nodes, load, operator policies, etc.), and this view is
kept correct by the maintenance of distributed queries over this state.
These queries are necessarily recursive, maintaining facts about ar-
bitrarily long multi-hop paths over a network of single-hop links.
Our initial forays into declarative networking have been promis-
ing. First, in declarative routing [24], we demonstrated that recur-
sive queries can be used to express a variety of well-known wired
and wireless routing protocols in a compact and clean fashion, typ-
ically in a handful of lines of program code. We also showed that
the declarative approach can expose fundamental connections: for
example, the query specifications for two well-known protocols –
one for wired networks and one for wireless – differ only in the or-
der of two predicates in a single rule body. Moreover, higher-level
routing concepts (e.g., QoS constraints) can be achieved via simple
modifications to the queries. Second, in declarative overlays [23],
we extended our framework to support more complex application-
level overlay networks such as multicast overlays and distributed
hash tables (DHTs). We demonstrated a working implementation of
the Chord [34] overlay lookup network specified in 47 Datalog-like
rules, versus thousands of lines of C++ for the original version.
Our declarative approach to networking promises not only flexibil-
ity and compactness of specification, but also the potential to stat-
ically check network protocols for security and correctness proper-
ties [11]. In addition, dynamic runtime checks to test distributed
properties of the network can easily be expressed as declarative
queries, providing a uniform framework for network specification,
monitoring and debugging [33].
A
Relational Transducers for Declarative Networking
TOM J. AMELOOT, Hasselt University & Transnational University of Limburg
FRANK NEVEN, Hasselt University & Transnational University of Limburg
JAN VAN DEN BUSSCHE, Hasselt University & Transnational University of Limburg
Motivated by a recent conjecture concerning the expressiveness of declarative networking, we propose a
formal computation model for “eventually consistent” distributed querying, based on relational transduc-
ers. A tight link has been conjectured between coordination-freeness of computations, and monotonicity
of the queries expressed by such computations. Indeed, we propose a formal definition of coordination-
freeness and confirm that the class of monotone queries is captured by coordination-free transducer net-
works. Coordination-freeness is a semantic property, but the syntactic class of “oblivious” transducers we
define also captures the same class of monotone queries. Transducer networks that are not coordination-free
are much more powerful.
Categories and Subject Descriptors: H.2 [
Database Management
]: Languages; H.2 [
Database Manage-
ment
]: Systems—Distributed databases; F.1 [
Computation by Abstract Devices
]: Models of Compu-
tation
General Terms: languages, theory
Additional Key Words and Phrases: distributed database, relational transducer, monotonicity, expressive
power, cloud programming
ACM Reference Format:
AMELOOT, T. J., NEVEN, F. and VAN DEN BUSSCHE, J. 2011. Relational Transducers for Declarative
Networking. J. ACM V, N, Article A (January YYYY), 37 pages.
DOI = 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
1. INTRODUCTION
Declarative networking [Loo et al. 2009] is a recent approach by which distributed compu-
tations and networking protocols are modeled and programmed using formalisms based on
Datalog. In his keynote speech at PODS 2010 [Hellerstein 2010a; Hellerstein 2010b], Heller-
stein made a number of intriguing conjectures concerning the expressiveness of declarative
networking. In the present paper, we are focusing on the CALM conjecture (Consistency
And Logical Monotonicity). This conjecture suggests a strong link between, on the one hand,
“eventually consistent” and “coordination-free” distributed computations, and on the other
hand, expressibility in monotonic Datalog (without negation or aggregate functions). The
conjecture was not fully formalized, however; indeed, as Hellerstein notes himself, a proper
treatment of this conjecture requires crisp definitions of eventual consistency and coordina-
tion, which have been lacking so far. Moreover, it also requires a formal model of distributed
computation.
Tom J. Ameloot is a PhD Fellow of the Fund for Scientific Research, Flanders (FWO).
Author’s email addresses:
[email protected],
[email protected],
[email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is
granted without fee provided that copies are not made or distributed for profit or commercial advantage
and that copies show this notice on the first page or initial screen of a display along with the full citation.
Copyrights for components of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any
component of this work in other works requires prior specific permission and/or a fee. Permissions may be
requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA,
fax +1 (212) 869-0481, or
[email protected].
c
• YYYY ACM 0004-5411/YYYY/01-ARTA $15.00
DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
Journal of the ACM, Vol. V, No. N, Article A, Publication date: January YYYY.
➔ CALM Conjecture
[Hellerstein, PODS ’10, SIGMOD Record 2010]
➔ Monotonicity => Consistency
[Abiteboul PODS 2011, Loo et al., SIGMOD 2006]
➔ Relational Transducer Proofs
[Ameloot, et al. PODS 2012, JACM 2013]
[Ameloot et al. PODS 2014]
➔ Napkin-sized proof
[Hellerstein & Alvaro 2017?]
CALM History
Weaker Forms of Monotonicity for Declarative Networking:
a More Fine-grained Answer to the CALM-conjecture
Tom J. Ameloot
⇤
Hasselt University &
transnational University of Limburg
[email protected]
Bas Ketsman
Hasselt University &
transnational University of Limburg
[email protected]
Frank Neven
Hasselt University &
transnational University of Limburg
[email protected]
Daniel Zinn
LogicBlox, Inc
[email protected]
ABSTRACT
The CALM-conjecture, first stated by Hellerstein [23] and
proved in its revised form by Ameloot et al. [13] within the
framework of relational transducer networks, asserts that
a query has a coordination-free execution strategy if and
only if the query is monotone. Zinn et al. [32] extended the
framework of relational transducer networks to allow for spe-
cific data distribution strategies and showed that the non-
monotone win-move query is coordination-free for domain-
guided data distributions. In this paper, we complete the
story by equating increasingly larger classes of coordination-
free computations with increasingly weaker forms of mono-
tonicity and make Datalog variants explicit that capture
each of these classes. One such fragment is based on strati-
fied Datalog where rules are required to be connected with
the exception of the last stratum. In addition, we charac-
terize coordination-freeness as those computations that do
not require knowledge about all other nodes in the network,
and therefore, can not globally coordinate. The results in
this paper can be interpreted as a more fine-grained answer
to the CALM-conjecture.
Categories and Subject Descriptors
H.2 [Database Management]: Languages; H.2 [Database
Management]: Systems—Distributed databases; F.1 [Com-
putation by Abstract Devices]: Models of Computation
Keywords
Distributed database, relational transducer, consistency, co-
ordination, expressive power, cloud programming
⇤
PhD Fellow of the Fund for Scientific Research, Flanders
(FWO).
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full cita-
tion on the first page. Copyrights for components of this work owned by others than
ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re-
publish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from
[email protected].
PODS’14,
June 22–27, 2014, Snowbird, UT, USA.
Copyright 2014 ACM 978-1-4503-2375-8/14/06 ...$15.00.
http://dx.doi.org/10.1145/2594538.2594541.
1. INTRODUCTION
Declarative networking is an approach where distributed
computations are modeled and programmed using declara-
tive formalisms based on extensions of Datalog. On a logical
level, programs (queries) are specified over a global schema
and are computed by multiple computing nodes over which
the input database is distributed. These nodes can perform
local computations and communicate asynchronously with
each other via messages. The model operates under the
assumption that messages can never be lost but can be ar-
bitrarily delayed. An inherent source of ine ciency in such
systems are the global barriers raised by the need for syn-
chronization in computing the result of queries.
This source of ine ciency inspired Hellerstein [11] to for-
mulate the CALM-principle which suggests a link between
logical monotonicity on the one hand and distributed consis-
tency without the need for coordination on the other hand.1
A crucial property of monotone programs is that derived
facts must never be retracted when new data arrives. The
latter implies a simple coordination-free execution strategy:
every node sends all relevant data to every other node in the
network and outputs new facts from the moment they can
be derived. No coordination is needed and the output of all
computing nodes is consistent. This observation motivated
Hellerstein [23] to formulate the CALM-conjecture which, in
its revised form2, states
“A query has a coordination-free execution strat-
egy i↵ the query is monotone.”
Ameloot, Neven, and Van den Bussche [13] formalized the
conjecture in terms of relational transducer networks and
provided a proof. Zinn, Green, and Lud¨
ascher [32] subse-
quently showed that there is more to this story. In particu-
lar, they obtained that when computing nodes are increas-
ingly more knowledgeable on how facts are distributed, in-
creasingly more queries can be computed in a coordination-
free manner. Zinn et al. [32] considered two extensions of
the original transducer model introduced in [13]. In the first
extension, here referred to as the policy-aware model, ev-
ery computing node is aware of the facts that should be
assigned to it and can consequently evaluate negation over
schema relations. In the second extension, referred to as the
1CALM stands for Consistency And Logical Monotonicity.
2The original conjecture replaced monotone by Datalog [13].