Complexity and Functional Relational Programming

Joe R. Smith

April 19, 2016
Transcript

  1. “I conclude that there are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.”
    –C.A.R. Hoare
  2. Complexity
    • The root cause of most software deficiencies
    • Understanding a system is a prerequisite to avoiding problems
    • In contrast to simplicity, complexity comes from the "complecting" of separable concerns
  3. Understanding Software
    • Testing
      • Attempt to understand from the outside (black box)
      • Draw conclusions based on observation
      • Testing with one set of inputs says nothing at all about what to expect with different inputs
    • Informal Reasoning
      • Attempt to understand from the inside
      • Draw conclusions based on reading the code
  4. Causes of Complexity
    • State
    • Control
    • Code Volume
    • And more (duplication, unnecessary abstraction, etc.)
  5. Anyone who has ever telephoned a support desk for a software system and been told to “try it again”, or “reload the document”, or “restart the program”, or “reboot your computer” or “re-install the program” or even “re-install the operating system and then the program” has direct experience of the problems that state causes for writing reliable, understandable software
    –Out of the Tarpit
  6. State
    • Enumerating and understanding all possible states of a program is... difficult.
    • Testing a system/component in one particular state tells you nothing about it in a different state
    • Hidden state, inter-component interactions, system state after a sequence of inputs, etc.
    • Informal reasoning often revolves around a case-by-case mental simulation of behavior
    • Contamination: Even indirect use of stateful procedures by stateless ones can only be understood in the context of state
  7. with state: when you let the nose of the camel into the tent, the rest of him tends to follow
  8. Control
    • Implicit in most languages (e.g., order of statements)
    • If always enforced implicitly (e.g., imperative programming), the program must be understood in that context
    • Often over-specifies the system
      • Developer must specify ordering/flow (how) instead of declaring constraints (what)
      • Forces the developer (and compiler) to work out whether that ordering affects the computation in order to understand the program
    • Can significantly complicate informal reasoning about a program
    • Misinterpreting the significance of ordering can result in subtle bugs
  9.
      a := b + 3
      c := d + 2
      e := f * 4
    Does the ordering matter?
  10. Control
    • Concurrency (shared state model)
      • Tests (with known initial state and inputs) in the presence of concurrency tell you nothing about the next time they are run with the same initial state and inputs
      • Informal reasoning is made exponentially more complex with each additional piece of state
  11. Complexity from Code Volume
      var maxH: number = 0.0;
      var sd2 = 0.0;
      for (var u: number = 0; u < n; ++u) {
          for (i = 0; i < this.k; ++i) Huu[i] = this.g[i][u] = 0.0;
          for (var v = 0; v < n; ++v) {
              if (!(u === v)) {
                  // The following loop randomly displaces nodes that are at identical po
                  var maxDisplaces = n; // avoid infinite loop in the case of numerical i
                  while (maxDisplaces--) {
                      sd2 = 0.0;
                      for (i = 0; i < this.k; ++i) {
                          var dx = d[i] = x[i][u] - x[i][v] + 0.0;
                          sd2 += d2[i] = dx * dx;
                      }
                      if (sd2 > 1e-9) break;
                      var rd = this.offsetDir();
                      for (i = 0; i < this.k; ++i) x[i][v] += rd[i];
                  }
                  var l: number = Math.sqrt(sd2);
                  var D: number = this.D[u][v];
                  var weight = this.G != null ? this.G[u][v] : 1.0;
                  if (weight > 1 && l > D || !isFinite(D)) {
                      for (i = 0; i < this.k; ++i) this.H[i][u][v] = 0.0;
                  } else {
                      if (weight > 1.0) {
                          weight = 1.0;
                      }
                      var D2: number = D * D;
                      var gs: number = 2.0 * weight * (l - D) / (D2 * l);
                      var l3 = l * l * l;
                      var hs: number = 2.0 * -weight / (D2 * l3);
                      //if (!isFinite(gs))
                      //    console.log(gs);
                      for (i = 0; i < this.k; ++i) {
                          this.g[i][u] += d[i] * gs;
                          Huu[i] -= this.H[i][u][v] = hs * (l3 + D * (d2[i] - sd2) + l *
      }
  12. “Many of the classic problems of developing software products derive from this essential complexity and its nonlinear increase with size”
    –Brooks, No Silver Bullet
  13. Code Volume
    • When much of the code is dealing with state and control, complexity tends to increase non-linearly with code size
    • Compounds the problems caused by state and control, non-linearly
    • Managing code volume indirectly manages complexity
  14. and More
    • Complexity breeds complexity
      • Duplication of code/functionality because state/control/LOC complexity makes comprehension difficult
    • Simplicity is hard
      • Requires significant effort to arrive at the simplest solution to a problem (time pressures, existing complexity, etc.)
    • Power Corrupts
      • In the absence of language-enforced guarantees, mistakes will happen
      • Restriction of power
        • e.g., garbage collection, immutability
      • The more that is possible in a language, the harder it is to understand systems constructed in it
  15. Object Oriented Programming
    • Currently the dominant method of general software development for von Neumann computers
    • Facilitates von Neumann style (i.e., state-based) computation
    • Imperative programming model
  16. State
    • Stateful objects with impure functions
    • Objects are intensional identities
      • Objects with identical attributes/values are not equivalent
      • Domain-specific equality must be explicitly defined
      • No guarantee that they adhere to any standard equivalence relation
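The point about intensional identity can be illustrated with a small TypeScript sketch. `GridPoint` and its `equals` method are hypothetical names chosen for this example and do not come from the talk.

```typescript
// Hypothetical class, for illustration only.
class GridPoint {
  x: number;
  y: number;
  constructor(x: number, y: number) {
    this.x = x;
    this.y = y;
  }

  // Domain-specific equality has to be written by hand; nothing in the
  // language checks that it is reflexive, symmetric, or transitive.
  equals(other: GridPoint): boolean {
    return this.x === other.x && this.y === other.y;
  }
}

const p = new GridPoint(1, 2);
const q = new GridPoint(1, 2);

// Identical attribute values, but two distinct identities.
const sameIdentity: boolean = p === q;  // false
const sameValue: boolean = p.equals(q); // true

// Mutating through an alias changes what `p` means wherever it is shared.
const alias = p;
alias.x = 10;
const equalAfterMutation: boolean = p.equals(q); // false: p is now (10, 2)
```

Whether two objects "are the same" therefore depends on which comparison the caller happens to use and on when it is made.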
  17. Control
    • Implicit sequential control flow
      • Ordered statements
    • Explicit "shared state" concurrency
      • Sequential execution unless explicitly made concurrent
      • Shared state is usually mutable by concurrently executing code
  18. Functional Programming
    • Declarative (in contrast to imperative) style
    • Roots in the stateless lambda calculus
    • Emphasis on immutability
    • Function composition over sequential statements
    • Equivalent in power to a Turing machine (von Neumann architecture)
  19. State
    • Avoids state/side-effects
    • "Pure" functions
    • Enables referential transparency (given the same set of arguments, a function will always return the same result)
    • Simplifies testing, in contrast to stateful models
    • Informal reasoning also simplified
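A minimal sketch of the difference, assuming two hypothetical functions (`addToTotal` and `add`) that are not part of the talk: the first reads and writes hidden state, the second depends only on its arguments.

```typescript
// Impure: the result depends on hidden, mutable state.
let runningTotal = 0;
function addToTotal(amount: number): number {
  runningTotal += amount;
  return runningTotal;
}

// Pure: the result depends only on the arguments.
function add(total: number, amount: number): number {
  return total + amount;
}

const impureFirst = addToTotal(5);  // 5
const impureSecond = addToTotal(5); // 10: same argument, different result

const pureFirst = add(0, 5);  // 5
const pureSecond = add(0, 5); // 5: same arguments, same result
```

A test of `add` needs no setup and says something about every later call with the same arguments; a test of `addToTotal` only describes one point in the history of `runningTotal`.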
  20. Control
    • Evaluation order is still implicit
    • Functionals over explicit looping (map, reduce, etc.)
    • Simpler concurrency models
      • Concurrent evaluation safe due to lack of mutable state
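As a rough illustration of "functionals over explicit looping", the sketch below computes the same total twice. The data and the threshold are made up for the example.

```typescript
const prices: number[] = [3, 10, 25, 8];

// Explicit looping: iteration order and a mutable accumulator are spelled
// out, even though the problem itself does not require either.
let loopTotal = 0;
for (const p of prices) {
  if (p >= 8) {
    loopTotal += p * 2;
  }
}

// Functionals: state what is wanted and leave the iteration to
// filter/map/reduce. No variable is reassigned.
const functionalTotal = prices
  .filter((p: number) => p >= 8)
  .map((p: number) => p * 2)
  .reduce((sum: number, p: number) => sum + p, 0);
// Both totals are 86 (20 + 50 + 16).
```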
  21. Logic Programming
    • Declarative style
    • Not derived from von Neumann architecture
    • Pure logic programming simply makes statements about a problem and the desired solution
      • Describe using axioms
      • Constrain with required attributes
    • Solutions are formal logical consequences of axioms and constraints
    • "Running" the system is equivalent to constructing a formal proof
  22. State
    • Pure logic programming makes use of no mutable state
    • Some languages allow for, but still discourage, state
      • e.g., in Prolog, by introducing new axioms
  23. Control
    • Prolog has implicit ordering
      • Can lead to non-termination if clauses are written in a certain order
    • Pure logic programming has no specified control
  24. Types of Complexity
    • Essential
      • Inherent in, and the essence of, the problem (as seen by the users)
      • Complexity that must be dealt with, even in an ideal world
        • i.e., with a language and infrastructure to directly express the users' problem
    • Accidental
      • Everything else
      • Performance issues, language expressiveness, infrastructure, hardware limitations (e.g., computational complexity), etc.
  25. Ideal World
    • Informal (user) requirements -> Formal requirements
      • No relevant ambiguity/omissions
    • Simply execute formal requirements
      • No control, just declaration of facts and constraints
    • Absolute simplicity
    • The essence of declarative programming
  26. State in the ideal world
    • Input data: Data input by the user is the only real essential state
    • Essential derived immutable data: Can always be derived from the user's input data and can be ignored
    • Essential derived mutable data: Can be derived from input data, but can also be changed by the user (should be treated as input data; more later)
    • Accidental derived data: Since it is accidental, the user doesn't need it and it is not in the user's informal requirements, so it can be ignored
      • e.g., data derived for performance
  27. State in the ideal world
    Data Essentiality | Data Type | Data Mutability | Classification
    Essential         | Input     | -               | Essential State
    Essential         | Derived   | Immutable       | Accidental State
    Essential         | Derived   | Mutable         | Accidental State
    Accidental        | Derived   | -               | Accidental State
  28. Control in the ideal world
    • Can be completely omitted as it is entirely accidental
      • Not usually present in informal requirements
      • Control flow is about how to execute
      • Results should be independent of control
    • Concurrency
      • Concurrent and sequential execution are identical if you assume execution takes zero time (synchrony hypothesis)
  29. The Real World
    • Performance is a real concern
      • e.g., caching of derived data
    • Accidental state is sometimes important for ease of expression
      • Logic is sometimes easier to express in terms of accumulated values
      • e.g., current position in a simulation can be computed from all previous inputs over time, but is more easily expressed as accidental state
      • Note: time is considered an input alongside other inputs
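The simulation example can be sketched both ways. The `Move` record, `positionFrom`, and `Simulation` below are hypothetical names for illustration; durations stand in for "time as an input".

```typescript
// Hypothetical input record: a velocity applied for a duration.
interface Move {
  readonly velocity: number; // units per second
  readonly seconds: number;  // time is part of the input
}

// Essential view: position is a pure function of the full input history.
function positionFrom(moves: readonly Move[]): number {
  return moves.reduce((pos: number, m: Move) => pos + m.velocity * m.seconds, 0);
}

// Accidental-state view: keep an accumulated value and update it per input.
class Simulation {
  private position = 0;
  apply(m: Move): void {
    this.position += m.velocity * m.seconds;
  }
  current(): number {
    return this.position;
  }
}

const inputHistory: Move[] = [
  { velocity: 2, seconds: 3 },
  { velocity: -1, seconds: 4 },
  { velocity: 5, seconds: 1 },
];

const derived = positionFrom(inputHistory); // 6 - 4 + 5 = 7

const sim = new Simulation();
for (const m of inputHistory) sim.apply(m);
const accumulated = sim.current(); // 7
```

Both give the same answer. The accumulated version is often easier to write, but its `position` field is state that exists only for convenience.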
  30. Avoid and Separate
    • Accidental complexity will sometimes be necessary or desired
    • To maintain the simplest practical model, we should
      • Avoid state and control where not absolutely and truly essential (within reasonable confines of ease of expression)
      • Separate accidental state and control from the essential data and logic
  31. Avoid and Separate
    • Performance
      • Avoid explicit management of accidental state for performance
      • Instead, declare what accidental state should be used
      • Leave implementation to separate infrastructure
        • e.g., caching infrastructure
      • Removes possibility of state inconsistency within the system (correctness of external infrastructure notwithstanding)
    • Complexity is the enemy of performance and optimization
      • It is far simpler, and oftentimes easier, to improve the performance of a slow system designed for simplicity than to remove complexity from a complex system designed to be fast (which may not even be fast, because the complexity hides optimization opportunities)
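One way to read "declare what accidental state should be used" is sketched below, assuming a generic `memoize` helper as a stand-in for caching infrastructure. `shippingCost`, `memoize`, and `computeCount` are hypothetical names, and the counter exists only to make the caching observable in this example.

```typescript
// Essential logic: a pure function with no knowledge of caching.
function shippingCost(weightKg: number): number {
  return 5 + 2 * weightKg;
}

// Stand-in for infrastructure: a generic memoizer that owns the accidental
// state. The cache is only ever filled by calling the pure function, so it
// cannot drift out of step with the logic.
let computeCount = 0; // instrumentation for this example only
function memoize(fn: (arg: number) => number): (arg: number) => number {
  const cache: { [key: string]: number } = {};
  return (arg: number): number => {
    const key = String(arg);
    const hit = cache[key];
    if (hit !== undefined) return hit;
    computeCount++;
    const value = fn(arg);
    cache[key] = value;
    return value;
  };
}

// The "declaration": one line states that results should be cached.
const cachedShippingCost = memoize(shippingCost);

const firstCall = cachedShippingCost(10);  // computed: 25
const secondCall = cachedShippingCost(10); // served from the cache: 25
```

The logic in `shippingCost` never mentions the cache, so removing or replacing the caching strategy does not touch it.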
  32. Avoid and Separate
    • Ease of Expression
      • If derived state is the best way to express parts of the logic, externalize its derivation from the system logic and treat it as input
      • System logic should be free of complexity (state and control)
    • Other accidental complexity that is not useful (e.g., not needed for ease of expression)
      • Avoid; may require discipline and additional effort to arrive at a simpler model
  33. Avoid and Separate
    Complexity                    | Type            | Recommendation
    Essential Logic               | -               | Separate
    Essential Complexity          | State           | Separate
    Accidental Useful Complexity  | State / Control | Separate
    Accidental Useless Complexity | State / Control | Avoid
  34. Separation
    • Restrict the power of each component independently
    • Use the least powerful language necessary for each separately specified component
      • e.g., use a language without control primitives when only specifying state
    • The weaker the language, the simpler it is to reason about
  35. Separation
    • Essential State
      • Completely self-contained
      • Not dependent on any other components
    • Essential Logic
      • The heart of the system
      • Isolated from all accidental complexity
      • May require changes if essential state component changes
    • Accidental State and Control
      • May require changes if essential components change
      • Nothing essential depends on this
    *Arrows show reference
  36. The Relational Model
    • Not about databases; rather, about
      • structuring data
      • a means to manipulate structured data
      • a mechanism for maintaining integrity and consistency of state
    • A clear separation of logical and physical layers of the system
      • Logical model to minimize complexity addressed separately from designing an efficient physical storage model
  37. Relational Model
    • Structure: use of relations as the means for representing all data
    • Manipulation: means to specify derived data
    • Integrity: means to specify constraints on the data
    • Data Independence: clear separation between the logical data and its physical representation
  38. Structure
    • Relations
      • A relation is a homogeneous set of records, each consisting of a heterogeneous set of uniquely named attributes
      • Contains no duplicates and has no ordering
      • Best thought of as a value
      • Base relation: stored directly
      • Derived relation: a "view" defined in terms of other relations
      • Relation variables ("relvars") reference relation values
    • Path independent, i.e., no explicit connections need to be made between relation types
      • Contrast with network and hierarchical models (and OOP approaches)
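A relation "as a value" can be modelled in a few lines. This is a sketch under assumptions: `Tuple`, `Relation`, `keyOf`, `relation`, and `sameRelation` are hypothetical names, and attribute values are limited to strings and numbers to keep the example small.

```typescript
// A record is a set of uniquely named attributes; a relation is a set of records.
type Tuple = { readonly [attr: string]: string | number };
type Relation = readonly Tuple[];

// Canonical text form of a record, independent of attribute order.
function keyOf(t: Tuple): string {
  return Object.keys(t)
    .sort()
    .map((a: string) => a + "=" + JSON.stringify(t[a]))
    .join("|");
}

// Build a relation value: duplicates collapse, because a relation is a set.
function relation(rows: readonly Tuple[]): Relation {
  const seen: { [key: string]: boolean } = {};
  return rows.filter((t: Tuple) => {
    const k = keyOf(t);
    if (seen[k]) return false;
    seen[k] = true;
    return true;
  });
}

// Relations are compared as values: same records, in any order.
function sameRelation(a: Relation, b: Relation): boolean {
  const ka = a.map(keyOf).sort();
  const kb = b.map(keyOf).sort();
  return ka.length === kb.length && ka.every((k: string, i: number) => k === kb[i]);
}

const r1 = relation([
  { id: 1, city: "Oslo" },
  { id: 2, city: "Lima" },
  { id: 1, city: "Oslo" }, // duplicate, dropped
]);
const r2 = relation([
  { city: "Lima", id: 2 },
  { id: 1, city: "Oslo" },
]);
// r1 and r2 hold the same two records, so they are the same relation value.
```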
  39. Manipulation
    • Relational Algebra (all operations create relations)
      • Restrict: Selects a subset of records according to some criteria
      • Project: Selects a subset of attributes
      • Product: Cartesian product of the args
      • Union: All records in either arg
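The four operations above can be sketched over plain arrays of records. This is an illustration only, not a real relational engine: all names are hypothetical, values are limited to strings and numbers, and `product` assumes its arguments share no attribute names.

```typescript
type Tuple = { readonly [attr: string]: string | number };
type Relation = readonly Tuple[];

// Canonical key for a record, used to eliminate duplicates.
function keyOf(t: Tuple): string {
  return Object.keys(t).sort().map((a: string) => a + "=" + JSON.stringify(t[a])).join("|");
}

function dedupe(rows: readonly Tuple[]): Relation {
  const seen: { [key: string]: boolean } = {};
  return rows.filter((t: Tuple) => {
    const k = keyOf(t);
    if (seen[k]) return false;
    seen[k] = true;
    return true;
  });
}

// Restrict: keep the records that satisfy a predicate.
function restrict(r: Relation, pred: (t: Tuple) => boolean): Relation {
  return r.filter(pred);
}

// Project: keep only the named attributes, then drop duplicates.
function project(r: Relation, attrs: readonly string[]): Relation {
  return dedupe(r.map((t: Tuple) => {
    const out: { [attr: string]: string | number } = {};
    for (const a of attrs) {
      const v = t[a];
      if (v !== undefined) out[a] = v;
    }
    return out;
  }));
}

// Product: every pairing of a record from `a` with a record from `b`.
function product(a: Relation, b: Relation): Relation {
  const out: Tuple[] = [];
  for (const x of a) for (const y of b) out.push({ ...x, ...y });
  return dedupe(out);
}

// Union: every record that appears in either argument.
function union(a: Relation, b: Relation): Relation {
  return dedupe(a.concat(b));
}

const employees: Relation = [
  { emp: "Ann", dept: "Eng" },
  { emp: "Bob", dept: "Ops" },
  { emp: "Cy", dept: "Eng" },
];
const shifts: Relation = [{ shift: "day" }, { shift: "night" }];

const engineers = restrict(employees, (t: Tuple) => t["dept"] === "Eng"); // Ann, Cy
const depts = project(employees, ["dept"]);                               // Eng, Ops
const rota = product(depts, shifts);                                      // 4 records
const everyone = union(engineers, employees);                             // 3 records
```

Every operation takes relations and returns a relation, which is what lets them compose freely.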
  40. Manipulation
    • Intersection: All records in both args
    • Difference: All records in the first arg but not in the second
    • Join: Constructs all records that result from matching identical attributes of the argument relations' records
    • Divide: Returns all records of the first arg which occur in the second arg associated with each record of the third arg
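A hedged sketch of these four operations, in the same array-of-records style. All names are hypothetical. `join` is written as a natural join, and `divide` follows the three-argument wording above under the assumption that the second argument's attributes are exactly those of the first and third combined.

```typescript
type Tuple = { readonly [attr: string]: string | number };
type Relation = readonly Tuple[];

function keyOf(t: Tuple): string {
  return Object.keys(t).sort().map((a: string) => a + "=" + JSON.stringify(t[a])).join("|");
}

// True when relation r contains record t (compared by value).
function has(r: Relation, t: Tuple): boolean {
  const k = keyOf(t);
  return r.some((u: Tuple) => keyOf(u) === k);
}

// Intersection: records present in both arguments.
function intersection(a: Relation, b: Relation): Relation {
  return a.filter((t: Tuple) => has(b, t));
}

// Difference: records in the first argument but not in the second.
function difference(a: Relation, b: Relation): Relation {
  return a.filter((t: Tuple) => !has(b, t));
}

// Natural join: combine records that agree on every attribute name they share.
function join(a: Relation, b: Relation): Relation {
  const out: Tuple[] = [];
  for (const x of a) {
    for (const y of b) {
      const shared = Object.keys(x).filter((k: string) => k in y);
      if (shared.every((k: string) => x[k] === y[k]) && !has(out, { ...x, ...y })) {
        out.push({ ...x, ...y });
      }
    }
  }
  return out;
}

// Divide: records of `a` that occur in `mediator` paired with every record of `divisor`.
function divide(a: Relation, mediator: Relation, divisor: Relation): Relation {
  return a.filter((x: Tuple) => divisor.every((y: Tuple) => has(mediator, { ...x, ...y })));
}

const pilots: Relation = [{ pilot: "Ann" }, { pilot: "Bob" }];
const planes: Relation = [{ plane: "A320" }, { plane: "B737" }];
const canFly: Relation = [
  { pilot: "Ann", plane: "A320" },
  { pilot: "Ann", plane: "B737" },
  { pilot: "Bob", plane: "A320" },
];
const bases: Relation = [
  { pilot: "Ann", base: "Oslo" },
  { pilot: "Bob", base: "Lima" },
];

const annOnly = intersection(pilots, [{ pilot: "Ann" }, { pilot: "Cy" }]); // Ann
const notAnn = difference(pilots, annOnly);                                 // Bob
const flightsWithBase = join(canFly, bases);                                // 3 records
const fliesEverything = divide(pilots, canFly, planes);                     // Ann
```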
  41. Integrity
    • Maintained by declaratively specifying a set of constraints that must hold at all times
    • In contrast to imperative DBMS functionality such as "triggers"
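Declarative constraints can be pictured as a list of named predicates that some infrastructure checks on every update. The sketch below assumes hypothetical names (`IntegrityRule`, `violated`, `assign`) and a made-up product schema.

```typescript
type Tuple = { readonly [attr: string]: string | number };
type Relation = readonly Tuple[];

// A constraint is a named predicate over a candidate relation value.
interface IntegrityRule {
  readonly label: string;
  readonly holds: (r: Relation) => boolean;
}

// Declared once, as data, rather than coded as update-time triggers.
const rules: readonly IntegrityRule[] = [
  {
    label: "price is positive",
    holds: (r: Relation) => r.every((t: Tuple) => Number(t["price"]) > 0),
  },
  {
    label: "sku is unique",
    holds: (r: Relation) => {
      const skus = r.map((t: Tuple) => String(t["sku"]));
      return skus.every((s: string, i: number) => skus.indexOf(s) === i);
    },
  },
];

// Labels of every rule the candidate value would break.
function violated(candidate: Relation): string[] {
  return rules
    .filter((rule: IntegrityRule) => !rule.holds(candidate))
    .map((rule: IntegrityRule) => rule.label);
}

// Stand-in for the infrastructure: accept the new value only if all rules
// hold; otherwise keep the current value.
function assign(current: Relation, candidate: Relation): Relation {
  return violated(candidate).length === 0 ? candidate : current;
}

const products: Relation = [{ sku: "A1", price: 10 }];
const good: Relation = [{ sku: "A1", price: 10 }, { sku: "B2", price: 4 }];
const bad: Relation = [{ sku: "A1", price: 10 }, { sku: "A1", price: -3 }];

const afterGood = assign(products, good); // accepted
const afterBad = assign(afterGood, bad);  // rejected, value unchanged
```

Each rule is checked independently of the others, so adding a rule means adding one entry to the list.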
  42. Data Independence
    • Separate the logical model from the physical storage representation
    • Analogous to the essential / accidental split
    • Hints may be declaratively provided to the storage system to optimize the physical representation
  43. Functional Relational Programming
    • The essential components of the system are based upon functional programming and the relational model
      • All essential state takes the form of relations
      • All essential logic is expressed using relational algebra, extended with pure user-defined functions
    • Recommendations of the essential / accidental split previously discussed are put into practice
  44. Functional Relational Programming
    • Essential State: expressed in terms of relvars (the names/types of base relvars)
    • Essential Logic: both functional and relational parts. The main part consists of derived relations and integrity constraints; can make use of an arbitrary set of pure user-defined functions
    • Accidental State and Control: isolated (from each other) performance hints, e.g., whether to store derived relvar values, concurrency hints, etc.
    • Other: interfacing via feeders and observers
    *Arrows show data-flow
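The split between essential state and essential logic might look like the following in miniature. The order/price schema and every name here are assumptions made for the example; accidental state and control (performance hints) are deliberately absent.

```typescript
// Essential state: base relvars holding user input (hypothetical schema).
interface OrderLine { readonly orderId: number; readonly sku: string; readonly qty: number; }
interface PriceRow { readonly sku: string; readonly unitPrice: number; }

interface EssentialState {
  readonly orderLines: readonly OrderLine[];
  readonly prices: readonly PriceRow[];
}

// Essential logic, functional part: a pure user-defined function.
function lineTotal(qty: number, unitPrice: number): number {
  return qty * unitPrice;
}

// Essential logic, relational part: a derived relation defined as an
// expression over the base relvars (a join on sku, extended with a total).
interface PricedLine { readonly orderId: number; readonly sku: string; readonly total: number; }

function pricedLines(s: EssentialState): PricedLine[] {
  const out: PricedLine[] = [];
  for (const l of s.orderLines) {
    for (const p of s.prices) {
      if (l.sku === p.sku) {
        out.push({ orderId: l.orderId, sku: l.sku, total: lineTotal(l.qty, p.unitPrice) });
      }
    }
  }
  return out;
}

// A further derived value, defined in terms of the derived relation.
function orderTotal(s: EssentialState, orderId: number): number {
  return pricedLines(s)
    .filter((r: PricedLine) => r.orderId === orderId)
    .reduce((sum: number, r: PricedLine) => sum + r.total, 0);
}

const appState: EssentialState = {
  orderLines: [
    { orderId: 1, sku: "A1", qty: 2 },
    { orderId: 1, sku: "B2", qty: 1 },
    { orderId: 2, sku: "A1", qty: 5 },
  ],
  prices: [
    { sku: "A1", unitPrice: 10 },
    { sku: "B2", unitPrice: 4 },
  ],
};

const firstOrder = orderTotal(appState, 1);  // 20 + 4 = 24
const secondOrder = orderTotal(appState, 2); // 50
```

Nothing derived is stored here. Whether to cache `pricedLines` would be a separate performance hint, not part of the logic.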
  45. Interfacing
    • Feeders
      • Components that convert input into relational assignments
      • Cause changes to the essential state
      • Specified in some state manipulation language provided by the infrastructure
      • Integrity constraints asserted
    • Observers
      • Generate output in response to changes of the values of derived relvars
      • Invoked by infrastructure
    • Both feeders and observers are tasked with converting between relational (flat) and input/output structured data
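Feeders and observers can be sketched around a tiny stand-in for the infrastructure. Everything below is hypothetical (`RelvarStore`, `feed`, the sensor schema, the 30-degree threshold), and duplicate elimination and integrity checks are left out to keep the example short.

```typescript
// Essential state: one base relvar of readings (hypothetical schema).
interface Reading { readonly sensor: string; readonly celsius: number; }

// Stand-in for the infrastructure: holds the relvar and invokes observers.
class RelvarStore {
  private readings: readonly Reading[] = [];
  private observers: Array<(hot: readonly Reading[]) => void> = [];

  // Derived relvar: readings above a threshold, a pure function of the base relvar.
  private hotReadings(): readonly Reading[] {
    return this.readings.filter((r: Reading) => r.celsius > 30);
  }

  observe(fn: (hot: readonly Reading[]) => void): void {
    this.observers.push(fn);
  }

  // Relational assignment: replace the relvar's value, then notify observers
  // only if the value of the derived relvar changed.
  assignReadings(next: readonly Reading[]): void {
    const before = JSON.stringify(this.hotReadings());
    this.readings = next;
    const hot = this.hotReadings();
    if (JSON.stringify(hot) !== before) {
      for (const fn of this.observers) fn(hot);
    }
  }

  current(): readonly Reading[] {
    return this.readings;
  }
}

// Feeder: converts external input such as "kitchen:21;attic:35" into a
// relational assignment.
function feed(store: RelvarStore, raw: string): void {
  const rows: Reading[] = raw.split(";").map((part: string) => {
    const pieces = part.split(":");
    return { sensor: String(pieces[0]), celsius: Number(pieces[1]) };
  });
  store.assignReadings(store.current().concat(rows));
}

// Observer: converts the derived relvar's flat records into output text.
const store = new RelvarStore();
const alerts: string[] = [];
store.observe((hot: readonly Reading[]) => {
  alerts.push(hot.map((r: Reading) => r.sensor).join(","));
});

feed(store, "kitchen:21;attic:35"); // derived relvar changes, observer runs
feed(store, "hall:19");             // derived relvar unchanged, observer not run
```

The feeder and the observer are the only places that know about the external formats; the relvar and the derived relation stay flat.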
  46. Benefits for State
    • The architecture is explicitly designed to avoid useless accidental state
    • The core FRP system cannot be in a "bad state"
    • Derived state is not normally stored
    • Fixing bugs should never require exhaustive search through essential state
    • Accidental state does not need to be considered when developing the logic of your system
  47. Benefits for State
    • Functional component of the logic has no access to state at all, so it is referentially transparent
    • State represented with relations has no subjective bias in how related data is accessed
    • Integrity constraints are imposed in a purely declarative manner and do not interact with each other, so adding them increases the overall complexity of the system only linearly
      • In contrast to OOP/imperative programming, where interaction between methods causes complexity to grow at a much higher rate
    • More amenable to performance tuning
  48. Benefits for Control
    • Control is completely avoided in relational components (essential logic)
      • Logic is simply a set of equations equating relvars with relations calculated by their expressions
      • No implicit ordering
      • No explicit parallelism
    • Lack of state in essential logic makes implicit parallelism much simpler to implement at the infrastructure level
  49. Benefits for Code Volume
    • Focus on essentials and avoiding useless accidental complexity means less code
    • Mitigates dangers of large volumes of code through use of separation
  50. Benefits for Data Abstraction
    • (Minimizes) Subjectivity
      • Use of flat relations discourages building larger compound data abstractions, making access more flexible
      • Minimal commitment to subjective groupings (only base relations)
    • (Discourages) Data Hiding
      • Benefits for referential transparency
        • testing
        • informal reasoning