Complexity and Functional Relational Programming

Based on "Out of the Tarpit" by Ben Moseley and Peter Marks

–C.A.R. Hoare “I conclude that there are two ways of
constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”

Complexity • The root cause of most software deficiencies •
Understanding a system is a prerequisite to avoiding problems • In contrast to simplicity, complexity comes from "complecting" of separable concerns

Understanding Software • Testing • Attempt to understand from the
outside (black box) • Draw conclusions based on observation • Testing with one set of inputs says nothing at all about what to expect with different inputs • Informal Reasoning • Attempt to understand from the inside • Draw conclusions based on reading the code

Causes of Complexity • State • Control • Code Volume
• more (duplication, unnecessary abstraction, etc.)

Complexity from State

Anyone who has ever telephoned a support desk for a
software system and been told to “try it again”, or “reload the document”, or “restart the program”, or “reboot your computer” or “re-install the program” or even “re-install the operating system and then the program” has direct experience of the problems that state causes for writing reliable, understandable software –Out of the Tarpit

State • Enumerating and understanding all possible states of a
program is ...difficult. • Testing a system/component in one particular state tells you nothing about it in a different state • Hidden state, inter-component interactions, system state after a sequence of inputs, etc. • Informal reasoning often revolves around a case-by-case mental simulation of behavior • Contamination: Even indirect use of stateful procedures by stateless ones can only be understood in the context of state
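A minimal TypeScript sketch of hidden state and contamination. The names `makeDiscounter` and `totalFor` are hypothetical and do not come from the talk or the paper; the point is only that the same call with the same arguments gives different answers depending on what happened earlier.

```typescript
// Hypothetical example: a pricing helper with hidden mutable state.
function makeDiscounter(): (price: number) => number {
  let callCount = 0; // hidden state, invisible to callers
  return (price: number): number => {
    callCount += 1;
    // Every third call is charged half price.
    return callCount % 3 === 0 ? price / 2 : price;
  };
}

const discounted = makeDiscounter();

// Holds no state of its own, but is contaminated because it calls
// a stateful procedure.
function totalFor(prices: number[]): number {
  return prices.reduce((sum, p) => sum + discounted(p), 0);
}

const firstTotal = totalFor([10, 10]);  // calls 1 and 2: 10 + 10
const secondTotal = totalFor([10, 10]); // calls 3 and 4: 5 + 10
```

A test that checks `totalFor([10, 10])` once says nothing about the second call, which is the slide's point about testing in one particular state.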

with state: when you let the nose of the camel
into the tent, the rest of him tends to follow

Complexity from Control

Control • Implicit in most languages (e.g., order of statements)
• If always enforced implicitly (e.g., imperative programming), program must be understood in that context • Often over-specifies the system • Developer must specify ordering/flow (how) instead of declaring constraints (what) • Forces the developer (and compiler) to work out whether that ordering affects the computation in order to understand the program • Can significantly complicate informal reasoning about a program • Misinterpretation of ordering's significance can result in subtle bugs

    a := b + 3
    c := d + 2
    e := f * 4

Does the ordering matter?
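One way to probe the question is to run the three assignments in different orders and compare the results. The sketch below assumes the variables are independent numbers with arbitrary initial values; `Env`, `Stmt`, and `runAll` are hypothetical helpers introduced only for this illustration.

```typescript
// Hypothetical environment holding the variables from the slide.
type Env = { a: number; b: number; c: number; d: number; e: number; f: number };
type Stmt = (env: Env) => void;

const initialEnv: Env = { a: 0, b: 1, c: 0, d: 2, e: 0, f: 3 };

// Run statements in the given order on a fresh copy of the environment.
function runAll(stmts: Stmt[]): Env {
  const env: Env = { ...initialEnv };
  stmts.forEach((s) => s(env));
  return env;
}

// The three statements from the slide: none reads what another writes.
const s1: Stmt = (env) => { env.a = env.b + 3; };
const s2: Stmt = (env) => { env.c = env.d + 2; };
const s3: Stmt = (env) => { env.e = env.f * 4; };

const inOrder = runAll([s1, s2, s3]);
const inReverse = runAll([s3, s2, s1]);
const orderIrrelevant = JSON.stringify(inOrder) === JSON.stringify(inReverse);

// A variant where the second statement reads `a`: now the order matters.
const s2Dependent: Stmt = (env) => { env.c = env.a + 2; };
const dependentInOrder = runAll([s1, s2Dependent]);   // a = 4, then c = 6
const dependentInReverse = runAll([s2Dependent, s1]); // c = 2, then a = 4
```

The reader of an imperative program has to do this dependency analysis mentally for every sequence of statements, because the language always specifies an order whether or not it is significant.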

Control • Concurrency (shared state model) • Tests (with known
initial state and inputs) in the presence of concurrency tell you nothing about the next time they are run with the same initial state and inputs • Informal reasoning is made exponentially more complex with each additional piece of state
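The following sketch simulates shared-state concurrency deterministically: two simulated threads each increment a shared counter as a separate read and write, and the schedule decides the outcome. `World`, `Step`, and `runSchedule` are hypothetical names; real threads would pick the interleaving non-deterministically.

```typescript
// Shared mutable state plus one private register per simulated thread.
type World = { counter: number; regA: number; regB: number };
type Step = (w: World) => void;

// Each simulated thread increments the counter in two steps: read, then write.
const readA: Step = (w) => { w.regA = w.counter; };
const writeA: Step = (w) => { w.counter = w.regA + 1; };
const readB: Step = (w) => { w.regB = w.counter; };
const writeB: Step = (w) => { w.counter = w.regB + 1; };

// Execute one particular interleaving, always from the same initial state.
function runSchedule(schedule: Step[]): number {
  const w: World = { counter: 0, regA: 0, regB: 0 };
  schedule.forEach((step) => step(w));
  return w.counter;
}

// Same initial state, same "inputs", different interleavings.
const sequentialResult = runSchedule([readA, writeA, readB, writeB]);
const interleavedResult = runSchedule([readA, readB, writeA, writeB]); // one update is lost
```

A test that happens to observe the first schedule passes, yet the same test can fail on the next run when the second schedule occurs.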

Complexity from Code Volume

    var maxH: number = 0.0;
    var sd2 = 0.0;
    for (var u: number = 0; u < n; ++u) {
        for (i = 0; i < this.k; ++i) Huu[i] = this.g[i][u] = 0.0;
        for (var v = 0; v < n; ++v) {
            if (!(u === v)) {
                // The following loop randomly displaces nodes that are at identical po
                var maxDisplaces = n; // avoid infinite loop in the case of numerical i
                while (maxDisplaces--) {
                    sd2 = 0.0;
                    for (i = 0; i < this.k; ++i) {
                        var dx = d[i] = x[i][u] - x[i][v] + 0.0;
                        sd2 += d2[i] = dx * dx;
                    }
                    if (sd2 > 1e-9) break;
                    var rd = this.offsetDir();
                    for (i = 0; i < this.k; ++i) x[i][v] += rd[i];
                }
                var l: number = Math.sqrt(sd2);
                var D: number = this.D[u][v];
                var weight = this.G != null ? this.G[u][v] : 1.0;
                if (weight > 1 && l > D || !isFinite(D)) {
                    for (i = 0; i < this.k; ++i) this.H[i][u][v] = 0.0;
                } else {
                    if (weight > 1.0) {
                        weight = 1.0;
                    }
                    var D2: number = D * D;
                    var gs: number = 2.0 * weight * (l - D) / (D2 * l);
                    var l3 = l * l * l;
                    var hs: number = 2.0 * -weight / (D2 * l3);
                    //if (!isFinite(gs))
                    //    console.log(gs);
                    for (i = 0; i < this.k; ++i) {
                        this.g[i][u] += d[i] * gs;
                        Huu[i] -= this.H[i][u][v] = hs * (l3 + D * (d2[i] - sd2) + l *
                    }

–Brooks, No Silver Bullet “Many of the classic problems of
developing software products derive from this essential complexity and its nonlinear increase with size”

Code Volume • When much of the code is dealing
with state and control, complexity tends to increase non-linearly with code size • Compounds the problems caused by state and control, non-linearly • Managing code volume indirectly manages complexity

and More • Complexity breeds complexity • Duplication of code/functionality
because state/control/LOC complexity makes comprehension difficult • Simplicity is hard • Requires significant effort to arrive at the simplest solution to a problem (time pressures, existing complexity, etc.) • Power Corrupts • In the absence of language enforced guarantees, mistakes will happen • Restriction of power • e.g., garbage collection, immutability • The more that is possible in a language, the harder it is to understand systems constructed in it

"Managing" Complexity

Object Oriented Programming • Currently the dominant method of general
software development for von Neumann computers • Facilitates von Neumann style (i.e., state-based) computation • Imperative programming model

State • Stateful objects with impure functions • Objects are
intensional identities • Objects with identical attributes/values are not equivalent • Domain-specific equality must be explicitly defined • No guarantee that they adhere to any standard equivalence relation
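A small sketch of intensional identity, using a hypothetical `PointObj` class: two objects with identical attribute values are not equal by default, and the hand-written equality is only as reliable as its author and can change with mutation.

```typescript
// Hypothetical domain class, for illustration only.
class PointObj {
  constructor(public x: number, public y: number) {}

  // Domain-specific equality has to be written (and kept correct) by hand.
  equalTo(other: PointObj): boolean {
    return this.x === other.x && this.y === other.y;
  }
}

const p1 = new PointObj(1, 2);
const p2 = new PointObj(1, 2);

const sameIdentity = p1 === p2;   // compares identity, not attribute values
const sameValue = p1.equalTo(p2); // compares attribute values via the hand-written relation

// Mutation makes even the hand-written equality depend on when it is asked.
p2.x = 5;
const sameValueAfterMutation = p1.equalTo(p2);
```

Nothing in the language checks that `equalTo` is reflexive, symmetric, or transitive, which is the slide's last bullet.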

Control • Implicit sequential control flow • Ordered statements •
Explicit "shared state" concurrency • Sequential execution unless explicitly made concurrent • Shared state is usually mutable by concurrently executing code

Functional Programming • Declarative (in contrast to imperative) style •
Roots in stateless Lambda Calculus • Emphasis on immutability • Function composition over sequential statements • Equivalent in computational power to a Turing machine (the model underlying the von Neumann architecture)

State • Avoids state/side-effects • "Pure" functions • Enables Referential
Transparency (given the same set of arguments, a function will always return the same result) • Simplifies testing, in contrast to stateful models • Informal reasoning also simplified
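A brief sketch of referential transparency, with hypothetical names (`areaOfRectangle`, `Tally`, `addSample`): a call to a pure function can be replaced by its value, and anything that would otherwise be hidden state is passed in and returned explicitly.

```typescript
// A pure function: the result depends only on the arguments.
function areaOfRectangle(width: number, height: number): number {
  return width * height;
}

// Referential transparency: a call can be replaced by its value
// without changing the meaning of the surrounding expression.
const viaCalls = areaOfRectangle(3, 4) + areaOfRectangle(3, 4);
const viaValue = 12 + 12;

// State is made explicit: take the old value, return a new one,
// and never mutate anything hidden.
type Tally = { readonly count: number; readonly total: number };

function addSample(t: Tally, sample: number): Tally {
  return { count: t.count + 1, total: t.total + sample };
}

const emptyTally: Tally = { count: 0, total: 0 };
const afterTwo = addSample(addSample(emptyTally, 10), 20);
```

Because `addSample` never changes its argument, `emptyTally` is still the empty tally afterwards, so a test of one call remains valid no matter what ran before it.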

Control • Evaluation order is still implicit • Functionals over
explicit looping (map, reduce, etc.) • Simpler concurrency models • Concurrent evaluation safe due to lack of mutable state
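To illustrate functionals over explicit looping, the sketch below computes the same total twice over hypothetical data (`orderAmounts`): once with a loop and a mutable accumulator, once with `filter`, `map`, and `reduce`.

```typescript
const orderAmounts: readonly number[] = [12, 7, 30, 5];

// Explicit control: the accumulator and the step-by-step order are spelled out.
let loopTotal = 0;
for (const amount of orderAmounts) {
  if (amount >= 10) {
    loopTotal += amount * 2;
  }
}

// Functionals: state what is wanted and leave iteration to filter/map/reduce.
const functionalTotal = orderAmounts
  .filter((amount) => amount >= 10)
  .map((amount) => amount * 2)
  .reduce((sum, amount) => sum + amount, 0);
```

Evaluation order is still implicit in the second form, as the slide notes, but there is no mutable variable for a reader to track through it.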

Logic Programming • Declarative style • Not derived from Von
Neumann architecture • Pure logic programming simply makes statements about a problem and the desired solution • Describe using axioms • Constrain with required attributes • Solutions are formal logical consequences of axioms and constraints • "Running" the system is equivalent to constructing a formal proof

State • Pure logic programming makes use of no mutable
state • Some languages allow, but still discourage, state • e.g., Prolog allows new axioms to be introduced

Control • Prolog has implicit ordering • Can lead to
non-termination if clauses are written in a certain order • Pure logic programming has no specified control
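TypeScript is not a logic language, so the following is only a rough approximation of the declarative idea: the "program" is a list of statements that must hold of a solution, and a hypothetical brute-force `solve` function stands in for the inference engine. The example problem and all names are assumptions made for illustration.

```typescript
// A candidate solution assigns a digit to each unknown.
type Candidate = { x: number; y: number };

// The "program": statements that must hold of any solution.
const constraints: Array<(c: Candidate) => boolean> = [
  (c) => c.x + c.y === 10,
  (c) => c.x - c.y === 4,
];

// Hypothetical stand-in for an inference engine: try every pair of digits
// and keep the pairs that satisfy all constraints.
function solve(rules: Array<(c: Candidate) => boolean>): Candidate[] {
  const found: Candidate[] = [];
  for (let x = 0; x <= 9; x++) {
    for (let y = 0; y <= 9; y++) {
      const candidate: Candidate = { x, y };
      if (rules.every((rule) => rule(candidate))) {
        found.push(candidate);
      }
    }
  }
  return found;
}

const solutions = solve(constraints);

// With no specified control, the order of the statements is irrelevant.
const reorderedSolutions = solve(constraints.slice().reverse());
```

In this sketch, reordering the constraints cannot change the answer or cause non-termination, which is the property the slide attributes to pure logic programming and which Prolog's implicit ordering gives up.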

Types of Complexity • Essential • Inherent in, and the
essence of, the problem (as seen by the users) • Complexity that must be dealt with, even in an ideal world • i.e., with a language and infrastructure to directly express the users' problem • Accidental • Everything else • Performance issues, language expressiveness, infrastructure, hardware limitations (e.g., computational complexity), etc.

Ideal World • Informal (user) requirements -> Formal requirements •
no relevant ambiguity/omissions • Simply execute formal requirements • No control, just declaration of facts and constraints • Absolute simplicity • The essence of declarative programming

State in the ideal world • Input data: Data input
by the user is the only real essential state • Essential derived immutable data: Can always be derived from the user's input data and can be ignored • Essential derived mutable data: can be derived from input data, but can also be changed by the user. (Should be treated as input data, more later) • Accidental derived data: Since it is accidental, the user doesn't need it and it is not in the user's informal requirements– can be ignored • e.g., data derived for performance

State in the ideal world

Data Essentiality   Data Type   Data Mutability   Classification
Essential           Input       -                 Essential State
Essential           Derived     Immutable         Accidental State
Essential           Derived     Mutable           Accidental State
Accidental          Derived     -                 Accidental State

Control in the ideal world • Can be completely omitted
as it is entirely accidental • Not usually present in informal requirements • Control flow is about how to execute • Results should be independent of control • Concurrency • Concurrent and sequential execution are identical if you assume execution takes zero time (synchrony hypothesis)

The Real World • Performance is a real concern •
e.g., caching of derived data • Accidental state is sometimes important for ease of expression • Logic is sometimes easier to express in terms of accumulated values • e.g., current position in a simulation can be computed from all previous inputs over time, but is more easily expressed as accidental state • Note: time is considered an input alongside other inputs
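The simulation example can be sketched both ways. Assuming a hypothetical one-dimensional simulation whose inputs are (time step, velocity) pairs, position is either derived from the whole input history or kept as accumulated accidental state; `Tick`, `positionFrom`, and `Tracker` are illustrative names.

```typescript
// Each input is a velocity together with the time step it applies to
// (time is treated as an input like any other).
type Tick = { readonly dt: number; readonly velocity: number };

const ticks: readonly Tick[] = [
  { dt: 1, velocity: 2 },
  { dt: 2, velocity: -1 },
  { dt: 1, velocity: 3 },
];

// Essential view: position is a pure function of the whole input history.
function positionFrom(history: readonly Tick[]): number {
  return history.reduce((pos, t) => pos + t.velocity * t.dt, 0);
}

// Accidental-state view: keep a running position and update it per input.
class Tracker {
  private position = 0;
  feed(t: Tick): void {
    this.position += t.velocity * t.dt;
  }
  current(): number {
    return this.position;
  }
}

const tracker = new Tracker();
ticks.forEach((t) => tracker.feed(t));

const derivedPosition = positionFrom(ticks);
const accumulatedPosition = tracker.current();
```

Both views agree on the result; the accumulated form is often easier to write, but it introduces state that the requirements themselves never asked for.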

Avoid and Separate • Accidental complexity will sometimes be necessary
or desired • To maintain the simplest, practical model we should • Avoid state and control where not absolutely and truly essential (within reasonable confines of ease of expression) • Separate accidental state and control from the essential data and logic

–Kowalski, co-inventor of Prolog “Algorithm = Logic + Control”

Avoid and Separate • Performance • Avoid explicit management of
accidental state for performance • Instead, declare what accidental state should be used • Leave implementation to separate infrastructure • e.g., caching infrastructure • Removes possibility of state inconsistency within the system (correctness of external infrastructure notwithstanding) • Complexity is the enemy of performance and optimization • It is far simpler to improve the performance of a slow system designed for simplicity than to remove complexity from a complex system designed to be fast (which may not even be fast, because its complexity hides optimization opportunities)
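One possible reading of "declare what accidental state should be used" is a generic caching wrapper kept outside the logic. In this sketch `shippingCost` is a hypothetical pure derivation and `cached` is a hypothetical piece of infrastructure; neither comes from the paper.

```typescript
// Essential logic: a pure derivation that knows nothing about caching.
function shippingCost(weightKg: number): number {
  return 5 + 2 * weightKg;
}

// Hypothetical infrastructure: wraps a pure one-argument numeric function
// with a cache. This is only safe because the wrapped function is pure.
function cached(fn: (n: number) => number): {
  call: (n: number) => number;
  misses: () => number;
} {
  const store: { [key: string]: number } = {};
  let missCount = 0;
  return {
    call: (n: number): number => {
      const key = String(n);
      const hit = store[key];
      if (hit !== undefined) return hit;
      missCount += 1;
      const value = fn(n);
      store[key] = value;
      return value;
    },
    misses: () => missCount,
  };
}

// The "declaration": this derivation should be cached. Removing the wrapper
// would change performance only, never results.
const cachedShipping = cached(shippingCost);
const firstCall = cachedShipping.call(10);
const secondCall = cachedShipping.call(10); // served from the cache
```

The cache lives entirely in the wrapper, so the logic cannot observe a stale or inconsistent entry as long as the wrapper itself is correct.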

Avoid and Separate • Ease of Expression • If derived
state is the best way to express parts of the logic, externalize its derivation from the system logic and treat it as input • System logic should be free of complexity (state and control) • Other accidental complexity that is not useful (e.g., not needed for ease of expression) • Avoid it; this may require discipline and additional effort to arrive at a simpler model

Avoid and Separate

Complexity                       Type              Recommendation
Essential Logic                  -                 Separate
Essential Complexity             State             Separate
Accidental Useful Complexity     State / Control   Separate
Accidental Useless Complexity    State / Control   Avoid

Separation • Restrict the power of each component independently •
Use the least powerful language necessary for each separately specified component • e.g. use a language without control primitives when only specifying state • The weaker the language, the simpler it is to reason about

Separation • Essential State • Completely self-contained • Not dependent
on any other components • Essential Logic • The heart of the system • Isolated from all accidental complexity • May require changes if essential state component changes • Accidental State and Control • May require changes if essential components change • Nothing essential depends on this *Arrows show reference

The Relational Model • Not about databases, rather, about •
structuring data • a means to manipulate structured data • a mechanism for maintaining integrity and consistency of state • A clear separation of logical and physical layers of the system • Logical model to minimize complexity addressed separately from designing an efficient physical storage model

Relational Model • Structure: use of relations as means for
representing all data • Manipulation: means to specify derived data • Integrity: means to specify constraints on the data • Data Independence: Clear separation between the logical data and its physical representation

SQL is not an accurate reflection of the relational model

Structure • Relations • A relation is a homogeneous set of
records each consisting of a heterogeneous set of uniquely named attributes • Contains no duplicates and has no ordering • Best thought of as a value • Base relation: stored directly • Derived Relation: A "view" defined in terms of other relations • Relation variables, "relvars" reference relation values • Path independent, i.e. no explicit connections need to be made between relation types • Contrast with network and hierarchical models (and OOP approaches)
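A sketch of "a relation is best thought of as a value", with a hypothetical `Employee` record type. Arrays are used as the carrier, so the helper `relationOf` removes duplicates and imposes a canonical internal order purely so that two relations with the same records compare equal; that order is an implementation detail, not part of the relation.

```typescript
// A record is a set of uniquely named attributes.
type Employee = { readonly empId: number; readonly dept: string };

function keyOf(row: Employee): string {
  return JSON.stringify([row.empId, row.dept]);
}

// Hypothetical helper: build a relation value from rows by dropping
// duplicates and fixing a canonical internal order.
function relationOf(rows: readonly Employee[]): readonly Employee[] {
  const seen: { [key: string]: boolean } = {};
  const unique: Employee[] = [];
  rows.forEach((row) => {
    const key = keyOf(row);
    if (!seen[key]) {
      seen[key] = true;
      unique.push(row);
    }
  });
  return unique.sort((a, b) => (keyOf(a) < keyOf(b) ? -1 : keyOf(a) > keyOf(b) ? 1 : 0));
}

// Two relations built by relationOf are equal when they contain the same records.
function sameRelation(r: readonly Employee[], s: readonly Employee[]): boolean {
  return r.map(keyOf).join("|") === s.map(keyOf).join("|");
}

const relA = relationOf([
  { empId: 1, dept: "ops" },
  { empId: 2, dept: "dev" },
  { empId: 1, dept: "ops" }, // duplicate, dropped
]);
const relB = relationOf([
  { empId: 2, dept: "dev" },
  { empId: 1, dept: "ops" },
]);
const equalAsValues = sameRelation(relA, relB);
```

Contrast this with the object example earlier in the deck: here equality follows from the contents alone, with no notion of identity.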

Manipulation • Relational Algebra (all operations create relations) • Restrict:
Selects a subset of records according to some criteria • Project: Selects a subset of attributes • Product: Cartesian product of args • Union: All records in either arg

Manipulation • Intersection: All records in both args •
Difference: All records in the first arg but not in the second • Join: Constructs all records that result from matching identical attributes of the argument relations' records • Divide: Returns all records of the first arg which occur in the second arg associated with each record of the third arg
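A compact sketch of three of these operations (restrict, project, and natural join) over relations represented as arrays of plain records. The representation, the function names, and the sample data are assumptions made for illustration; a real relational engine would differ. Every operation returns a relation, so the operations compose.

```typescript
type Row = { [attr: string]: string | number };
type Relation = Row[];

// Remove duplicate rows so that every result is again a set of records.
function dedupe(rows: Row[]): Relation {
  const seen: { [key: string]: boolean } = {};
  return rows.filter((row) => {
    const key = JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]]));
    if (seen[key]) return false;
    seen[key] = true;
    return true;
  });
}

// Restrict: keep the records that satisfy a predicate.
function restrict(r: Relation, pred: (row: Row) => boolean): Relation {
  return r.filter(pred);
}

// Project: keep only the named attributes.
function project(r: Relation, attrs: string[]): Relation {
  return dedupe(
    r.map((row) => {
      const out: Row = {};
      attrs.forEach((a) => {
        const v = row[a];
        if (v !== undefined) out[a] = v;
      });
      return out;
    }),
  );
}

// Natural join: combine records that agree on every attribute they share.
function join(r: Relation, s: Relation): Relation {
  const out: Row[] = [];
  r.forEach((x) => {
    s.forEach((y) => {
      const shared = Object.keys(x).filter((k) => k in y);
      if (shared.every((k) => x[k] === y[k])) out.push({ ...x, ...y });
    });
  });
  return dedupe(out);
}

const employees: Relation = [
  { empId: 1, ename: "Ada", dept: "dev" },
  { empId: 2, ename: "Bo", dept: "ops" },
  { empId: 3, ename: "Cy", dept: "dev" },
];
const depts: Relation = [
  { dept: "dev", floor: 2 },
  { dept: "ops", floor: 1 },
];

// Which employees are in "dev", and on which floor?
const devFloors = project(
  restrict(join(employees, depts), (row) => row["dept"] === "dev"),
  ["ename", "floor"],
);

// Projection can produce duplicates, which are removed.
const deptsInUse = project(employees, ["dept"]);
```

Note that the join needs no explicit link between the two relations: the shared attribute name `dept` is enough, which is the path independence mentioned under Structure.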

Integrity • Maintained by declaratively specifying a set of constraints
that must hold at all times • In contrast to imperative DBMS functionality such as "triggers"
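Declarative integrity can be sketched as a list of predicates over the whole state, checked by infrastructure on every proposed update. The `Booking` domain, the two rules, and `tryAssign` are hypothetical; the sketch only shows the shape of the idea, in contrast to trigger-style code that runs imperatively on each change.

```typescript
type Booking = { readonly room: string; readonly day: number; readonly guest: string };

type IntegrityRule = {
  readonly label: string;
  readonly holds: (bookings: readonly Booking[]) => boolean;
};

// Constraints are declared as predicates that must hold of every state.
const bookingRules: readonly IntegrityRule[] = [
  {
    label: "day is between 1 and 31",
    holds: (bs) => bs.every((b) => b.day >= 1 && b.day <= 31),
  },
  {
    label: "no room is booked twice on the same day",
    holds: (bs) =>
      bs.every((b, i) => bs.every((o, j) => i === j || o.room !== b.room || o.day !== b.day)),
  },
];

// Hypothetical infrastructure: accept an update only if every rule holds
// for the proposed state; otherwise keep the current state.
function tryAssign(
  current: readonly Booking[],
  proposed: readonly Booking[],
): { state: readonly Booking[]; violated: string[] } {
  const violated = bookingRules
    .filter((rule) => !rule.holds(proposed))
    .map((rule) => rule.label);
  return { state: violated.length === 0 ? proposed : current, violated };
}

const validState: readonly Booking[] = [{ room: "A", day: 3, guest: "Kim" }];
const accepted = tryAssign(validState, validState.concat([{ room: "B", day: 3, guest: "Lee" }]));
const rejected = tryAssign(validState, validState.concat([{ room: "A", day: 3, guest: "Max" }]));
```

Each rule is checked independently of the others, so adding a rule means adding one entry to the list rather than revisiting existing update code.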

Data Independence • Separate the logical model from the physical
storage representation • Analogous to the essential / accidental split • Hints may be declaratively provided to the storage system to optimize the physical representation

Functional Relational Programming • The essential components of the system
are based upon functional programming and the relational model • All essential state takes the form of relations • All essential logic is expressed using relational algebra, extended with pure user-defined functions • Recommendations of the essential / accidental split previously discussed are put into practice

Functional Relational Programming • Essential State: expressed in terms of
relvars (the names/types of base relvars) • Essential Logic: Both functional and relational parts. Main part consists of derived relations and integrity constraints; can make use of an arbitrary set of pure user-defined functions • Accidental State and Control: Isolated (from each other) performance hints, e.g. whether to store derived relvar values, concurrency hints, etc. • Other: Interfacing– Feeders and Observers *Arrows show data-flow

Interfacing • Feeders • Components that convert input into relational
assignments • Cause changes to the essential state • Specified in some state manipulation language provided by the infrastructure • Integrity constraints are asserted • Observers • Generate output in response to changes of the values of derived relvars • Invoked by infrastructure • Both feeders and observers are tasked with converting between relational (flat) and input/output structured data
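A toy end-to-end sketch of the pieces named on these slides: one base relvar, one derived relation, a feeder, an observer, and a few lines standing in for the infrastructure. The sensor domain and every name here are hypothetical, and integrity constraints are left out to keep it short; the example system in "Out of the Tarpit" is considerably richer.

```typescript
type Reading = { readonly sensor: string; readonly celsius: number };

// Essential state: one base relvar holding a relation value.
let readingsRelvar: readonly Reading[] = [];

// Essential logic: a derived relation, defined purely in terms of the base relvar.
function overheated(readings: readonly Reading[]): readonly Reading[] {
  return readings.filter((r) => r.celsius > 90);
}

// Observer: turns the derived relation into output when invoked.
const alertsSent: string[] = [];
function alertObserver(rows: readonly Reading[]): void {
  rows.forEach((r) => alertsSent.push(r.sensor + " at " + r.celsius + "C"));
}

// Hypothetical infrastructure: apply a relational assignment, then invoke
// the observer if the value of the derived relation changed.
function assignReadings(next: readonly Reading[]): void {
  const before = JSON.stringify(overheated(readingsRelvar));
  readingsRelvar = next;
  const after = overheated(readingsRelvar);
  if (JSON.stringify(after) !== before) alertObserver(after);
}

// Feeder: converts raw external input into a relational assignment.
function feedLine(line: string): void {
  const parts = line.split(",");
  const sensor = parts[0];
  const celsius = Number(parts[1]);
  if (sensor === undefined || isNaN(celsius)) return; // ignore malformed input
  assignReadings(readingsRelvar.concat([{ sensor, celsius }]));
}

feedLine("boiler,85"); // derived relation unchanged, observer not invoked
feedLine("kiln,120");  // derived relation changes, observer invoked
```

The derived relation is never stored; it is recomputed from the base relvar, so the only state in this sketch that can be "wrong" is the input itself.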

Benefits for State • The architecture is explicitly designed to
avoid useless accidental state • The core FRP system cannot be in a "bad state" • Derived state is not normally stored • Fixing bugs should never require exhaustive search through essential state • Accidental state does not need to be considered when developing the logic of your system

Benefits for State • Functional component of the logic has
no access to state at all– referentially transparent • State represented with relations has no subjective bias in how related data is accessed • Integrity constraints are imposed in a purely declarative manner and do not interact with each other, so adding them increases the overall complexity of the system only linearly • In contrast to OOP/imperative programming, where interaction between methods causes complexity to grow at a much higher rate • More amenable to performance tuning

Benefits for Control • Control is completely avoided in relational
components (essential logic) • Logic is simply a set of equations equating relvars with relations calculated by their expressions • No implicit ordering • No explicit parallelism • Lack of state in essential logic makes implicit parallelism much simpler to implement at the infrastructure level

Benefits for Code Volume • Focus on essentials and avoiding
useless accidental complexity means less code • Mitigates dangers of large volumes of code through use of separation

Benefits for Data Abstraction • (Minimizes) Subjectivity • Use of
flat relations discourages building larger compound data abstractions making access more flexible • Minimal commitment to subjective groupings (only base relations) • (Discourages) Data Hiding • Benefits for referential transparency • testing • informal reasoning

Example FRP System from "Out of the Tarpit"
