Slide 1

Slide 1 text

Partial Specifications of Libraries: Applications in Software Engineering
Vladimir Itsykson
Peter the Great St. Petersburg Polytechnic University
JetBrains Research
Tbilisi, 08-Nov-2019

Slide 2

Slide 2 text

Who am I?
Vladimir Itsykson
• Director of the Higher School of Intellectual Systems & Supercomputer Technologies, Institute of Computer Science & Technologies, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
• Head of the JetBrains Research laboratory “Verification & Program Analysis” (a joint research laboratory with the Polytechnic University), St. Petersburg, Russia

Slide 3

Slide 3 text

Overview
• The main subject of this presentation is partial specifications of program libraries
• We will discuss:
  – problems that arise when we deal with external libraries
  – our approach to coping with external program libraries
  – a formalism for describing the structure & behavior of program libraries
  – a DSL for expressing specifications in a user-friendly manner
  – applications of library specifications in various software engineering tasks

Slide 4

Slide 4 text

Presentation Outline
• Introduction
• Problems with External Libraries
• Formalism for Describing Partial Specifications of Libraries
• Library Specification Language (LibSL)
• Applications
  – Porting of Software
  – Enhancements of Static Analysis
  – Cross-language Integration
  – Integration Errors Detection
  – Specification Mining
• Conclusion

Slide 5

Slide 5 text

Problems with External Libraries
• Current practices of library distribution are the following:
  – The authors distribute libraries “as is”, without any documentation
  – The authors distribute libraries with only short informal documentation
  – The authors distribute libraries with a short description of the API calls and types used
  – The authors distribute libraries with some usage examples
• There are no descriptions of library semantics
• The authors do not provide formal rules for using libraries
• This lack of formal semantics results in software integration errors

Slide 6

Slide 6 text

Problems with External Libraries
Trivial example (stdio.h):
• Syntactically correct, semantically correct:
    fd = fopen(…);
    …
    n = fread(…, fd);
    …
    fclose(fd);
• Syntactically correct, semantically incorrect!
    fclose(fd);
    …
    n = fread(…, fd);
    …
    fd = fopen(…);
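
To make the difference between the two sequences concrete, here is a minimal Python sketch (an illustration, not part of the original talk) that checks a call sequence against a hand-written automaton for one file handle. The state names Init, Opened and Closed and the function check_sequence are hypothetical; the real fopen/fread/fclose arguments are ignored.

```python
# Hypothetical automaton for one stdio file handle:
# (current state, API call) -> next state. "Init" means "not opened yet".
TRANSITIONS = {
    ("Init", "fopen"): "Opened",
    ("Opened", "fread"): "Opened",
    ("Opened", "fclose"): "Closed",
}


def check_sequence(calls):
    """Return None if the call sequence follows the protocol,
    otherwise a message describing the first violation."""
    state = "Init"
    for index, call in enumerate(calls):
        next_state = TRANSITIONS.get((state, call))
        if next_state is None:
            return f"call #{index} {call}() is not allowed in state {state}"
        state = next_state
    if state != "Closed":
        return f"handle left in state {state} (missing fclose?)"
    return None


print(check_sequence(["fopen", "fread", "fclose"]))  # None
print(check_sequence(["fclose", "fread", "fopen"]))  # call #0 fclose() is not allowed in state Init
```

Both sequences compile; only the protocol check distinguishes them, which is exactly the information a compiler never sees.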

Slide 8

Slide 8 text

Formal Specifications of Libraries
Instead of making this slide, I participated in a wine tasting yesterday

Slide 9

Slide 9 text

Formal Specifications of Libraries
• The main goals are to make the formalism:
  – as simple as possible
  – powerful enough to express a significant share of all libraries
• The main idea is to:
  – specify only the visible (external) behavior of libraries
  – hide insignificant details
• As a consequence, the resulting specifications will be partial

Slide 10

Slide 10 text

Formal Specifications of Libraries
• Components of program libraries to be described (static semantics):
  – Data types
  – Variables & objects
  – Functions of the public API
• Behavior of program libraries to be described (dynamic semantics):
  – Semantics of API functions
      • Behavior of functions
      • Side effects of functions
      • Contracts of functions
  – Rules of library use

Slide 11

Slide 11 text

Formal Specifications of Libraries
• A library is a composition of EFSMs (extended finite state machines)
• Each EFSM (automaton) describes the behavior of one library object:
  – FSM states correspond to object states
  – Transitions correspond to API function calls
  – Creation of a new FSM is a side action of an API function call
• Each API function is described by:
  – its signature
  – its contract
  – its high-level behavior
  – the actions it performs (semantic descriptions)
* V. Itsykson. Formalism and Language Tools for Specification of the Semantics of Software Libraries // Automatic Control and Computer Sciences, December 2017, Volume 51, Issue 7, pp. 531-538

Slide 12

Slide 12 text

Formal Specification of Library Function
Fi = < Name, Args, Res, Pre, Post, A, CondA, D, CondD >
• Name – name of the function
• Args – set of formal arguments of the function
• Res – result of the function
• Pre – preconditions of the function
• Post – postconditions of the function
• A – set of semantic actions performed by the function
• CondA – set of conditions for the semantic actions
• D – set of launched child automata
• CondD – set of launch conditions for the child automata
F = { Fi }
• F – specification of the API functions
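
The tuple Fi maps naturally onto a record type. The Python sketch below is one possible encoding, assuming every condition is a predicate over a dictionary of argument values plus the result. The class FunctionSpec, its method names, and the postcondition result <= len are illustrative assumptions; the SEND / ERROR(Send01) branches follow the send() example from the LibSL slides.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Env = Dict[str, Any]          # argument values plus "result"
Cond = Callable[[Env], bool]  # predicate over an environment


@dataclass
class FunctionSpec:
    """Hypothetical encoding of Fi = <Name, Args, Res, Pre, Post, A, CondA, D, CondD>.
    A/CondA and D/CondD are kept as (item, condition) pairs."""
    name: str                                                       # Name
    args: List[str]                                                 # Args
    res: str                                                        # Res
    pre: List[Cond] = field(default_factory=list)                   # Pre
    post: List[Cond] = field(default_factory=list)                  # Post
    actions: List[Tuple[str, Cond]] = field(default_factory=list)   # A with CondA
    children: List[Tuple[str, Cond]] = field(default_factory=list)  # D with CondD

    def fired_actions(self, env: Env) -> List[str]:
        """Semantic actions whose conditions hold for this call."""
        return [a for a, cond in self.actions if cond(env)]

    def launched_children(self, env: Env) -> List[str]:
        """Child automata whose launch conditions hold for this call."""
        return [d for d, cond in self.children if cond(env)]

    def violations(self, env: Env) -> List[str]:
        """Violated pre-/postconditions (an empty list means the contract holds)."""
        errors = [f"{self.name}: precondition #{i}" for i, p in enumerate(self.pre) if not p(env)]
        errors += [f"{self.name}: postcondition #{i}" for i, p in enumerate(self.post) if not p(env)]
        return errors


# send() from the LibSL example; the postcondition is an assumption for illustration.
send = FunctionSpec(
    name="send",
    args=["s", "msg", "len", "flags"],
    res="result",
    post=[lambda e: e["result"] <= e["len"]],
    actions=[("SEND", lambda e: e["len"] > 0),
             ("ERROR(Send01)", lambda e: e["len"] <= 0)],
)

# accept() launches a child automaton in the Established state (always).
accept = FunctionSpec(
    name="accept",
    args=["s", "addr", "addrlen"],
    res="result",
    children=[("TCP_SOCKET(Established)", lambda e: True)],
)
```

For a call such as `send(3, b"hi", 2, 0)` returning 2, `send.fired_actions(...)` yields `["SEND"]` and `send.violations(...)` yields an empty list. Keeping conditions as plain predicates means the same record can drive a runtime checker or be translated into logical formulas.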

Slide 13

Slide 13 text

Formal Specification of Automaton

Slide 14

Slide 14 text

Formal Specification of Library
B = { L, S1(q, P), …, Sn(q, P) }
• L – main automaton describing the behavior of the entire library
• Si – i-th child automaton, launched if certain conditions are fulfilled
• q – initial state of the child automaton
• P – optional parameter of the child automaton
Lib = < F, B >
• F = { Fi } – set of library functions
• B – behavioral description of the library
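
A small runtime view of B helps show how the main automaton and the launched children coexist. In this Python sketch (an assumption-laden illustration, not the authors' implementation), handle 0 is the main automaton L and every call to the hypothetical launch method starts a child Si in state q with parameter P. The automaton names Main and File and the mode mRead echo the stdio global-objects example from the LibSL slides; the shift tables themselves are invented.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class Automaton:
    """One automaton type of B: a name and a shift table."""
    name: str
    shifts: Dict[Tuple[str, str], str]   # (state, API call) -> next state


@dataclass
class Instance:
    """A running automaton: its type, current state q, and optional parameter P."""
    kind: Automaton
    state: str
    param: Any = None


class Library:
    """Hypothetical runtime view of Lib = <F, B>: handle 0 is the main automaton L,
    other handles are child automata Si(q, P) launched as side effects of API calls."""

    def __init__(self, main: Automaton, initial_state: str) -> None:
        self.objects: Dict[int, Instance] = {0: Instance(main, initial_state)}
        self._next_handle = 1

    def launch(self, kind: Automaton, q: str, param: Any = None) -> int:
        """Start a child automaton in state q with parameter P; return its handle."""
        handle = self._next_handle
        self._next_handle += 1
        self.objects[handle] = Instance(kind, q, param)
        return handle

    def call(self, handle: int, api: str) -> Optional[str]:
        """Fire the transition labelled by an API call; return an error or None."""
        obj = self.objects.get(handle)
        if obj is None:
            return f"{api}: unknown handle {handle}"
        next_state = obj.kind.shifts.get((obj.state, api))
        if next_state is None:
            return f"{api}: not allowed in state {obj.state} of {obj.kind.name}"
        obj.state = next_state
        return None


# Hypothetical stdio-like library: fopen is a transition of Main that launches a File.
MAIN = Automaton("Main", {("Ready", "fopen"): "Ready"})
FILE = Automaton("File", {("Created", "fread"): "Created", ("Created", "fclose"): "Closed"})
```

Usage: `lib = Library(MAIN, "Ready")`, then `lib.call(0, "fopen")` followed by `fd = lib.launch(FILE, "Created", "mRead")` models fopen together with its side action; after `lib.call(fd, "fclose")`, a further `lib.call(fd, "fread")` reports that fread is not allowed in state Closed.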

Slide 15

Slide 15 text

Formal Model of TCP socket (server side)
[Figure: automata of the library, their states, API function calls as transitions, creation of new automata, and finish states]

Slide 16

Slide 16 text

Formal Model of TCP socket (client side)

Slide 18

Slide 18 text

LibSL: Library Specification Language
• A user-friendly DSL for specifying libraries
• Allows defining:
  – the library name
  – import sections
  – semantic data types (data types with annotated values)
  – automata and their parameters (states & shifts)
  – global object variables (definitions & initializations)
  – API function descriptions (signatures, contracts, behaviors, semantic actions, etc.)
* V. Itsykson. LibSL: A language for software components specification // Software Engineering. 2018. N 5. pp. 209-220 (in Russian). doi:10.17587/prin.9.209-220

Slide 19

Slide 19 text

LibSL: Structure of Specification
Sections: import, semantic types, automata, API functions, global objects.

library <name>;
import <file>;                    // Import section
...
types {                           // Semantic types section
    // Semantic types
    ...
}
automaton <name> : <type> {       // Automata section
    // Automaton description
    ...
}
...
fun <name>(<args>): <type> {      // API functions section
    // API function description
    ...
}
...
var <name>: <type> = <value>;     // Global objects section: global variables declaration
...
var main: <type> = new Main();    // Main automaton creation

Slide 20

Slide 20 text

LibSL: Semantic Types
Alias types definitions, simple semantic types definitions, and definitions of semantic types with annotated values:

types {
    // Alias types
    int = int32;
    unsigned = unsigned32;
    byte = unsigned8;
    // Simple semantic types
    SOCKET (int);            // Socket descriptor
    BUFFER (*void);          // Socket buffer
    LENGTH (int);            // Buffer length
    PROTOCOL_TYPE (int);     // Socket protocol ID
    // Semantic types with annotated values
    SOCKET_TYPE (int) {      // Socket type
        STREAM: 1;           // Stream socket
        DGRAM: 2;            // Datagram socket
        RAW: 3;              // Raw socket
        SEQPACKET: 5;        // Sequenced packet socket
    };
    SIZE (int) {
        ERROR: -1;
    };
}
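
One way to read "semantic types with annotated values" is as a base type plus names attached to some of its values. The Python sketch below assumes exactly that reading: SOCKET_TYPE becomes an IntEnum whose members are its annotated values, and SIZE keeps an ordinary int with only -1 annotated as ERROR. The names SocketType, SIZE_ANNOTATIONS, describe_size and as_socket_type are hypothetical.

```python
from enum import IntEnum


class SocketType(IntEnum):
    """SOCKET_TYPE (int): every legal value carries an annotation."""
    STREAM = 1      # stream socket
    DGRAM = 2       # datagram socket
    RAW = 3         # raw socket
    SEQPACKET = 5   # sequenced packet socket


# SIZE (int) annotates a single value; any other int is just a size.
SIZE_ANNOTATIONS = {-1: "ERROR"}


def describe_size(value: int) -> str:
    """Return the annotation of a SIZE value, or the value itself as text."""
    return SIZE_ANNOTATIONS.get(value, str(value))


def as_socket_type(value: int) -> SocketType:
    """Interpret a raw int as SOCKET_TYPE; raises ValueError for unannotated values."""
    return SocketType(value)
```

With this encoding `as_socket_type(2)` gives `SocketType.DGRAM`, `describe_size(-1)` gives `"ERROR"`, and `as_socket_type(4)` raises `ValueError`, so a checker can tell a meaningful value from an out-of-domain one.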

Slide 21

Slide 21 text

LibSL: Automaton Description
automaton BSD_SOCKET: int {             // Automaton name & type
    var blocked: boolean;              // Internal variables
    state Created;                     // Automaton states
    state Bound;
    state Established;
    state Listening;
    finishstate Closed;
    shift Created->Bound (bind);       // Automaton shifts
    shift Bound->Created (close);
    shift Bound->Listening (listen);
    shift Listening->Bound (close);
    shift Listening->self (accept);
    shift Established->Created (close);
    shift Established->self (recv);
    shift Established->self (send);
    shift any->Closed (shutdown);
}
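
The shift list above can be replayed directly. This Python sketch transcribes the BSD_SOCKET shifts into a table and interprets the two special names as the slide suggests, assuming `self` keeps the current state and `any` matches every state. The helpers step and replay are hypothetical, and the internal variable `blocked` is not modelled.

```python
from typing import List, Optional, Tuple

# Shifts transcribed from the BSD_SOCKET automaton: (from, to, API function).
# Assumed semantics: "self" keeps the current state, "any" matches every state.
SHIFTS: List[Tuple[str, str, str]] = [
    ("Created", "Bound", "bind"),
    ("Bound", "Created", "close"),
    ("Bound", "Listening", "listen"),
    ("Listening", "Bound", "close"),
    ("Listening", "self", "accept"),
    ("Established", "Created", "close"),
    ("Established", "self", "recv"),
    ("Established", "self", "send"),
    ("any", "Closed", "shutdown"),
]


def step(state: str, call: str) -> Optional[str]:
    """Next state after `call`, or None if no shift allows it."""
    for src, dst, fn in SHIFTS:
        if fn == call and src in (state, "any"):
            return state if dst == "self" else dst
    return None


def replay(calls: List[str], start: str = "Created") -> Tuple[str, Optional[str]]:
    """Replay calls from `start`; return the final state and the first error, if any."""
    state = start
    for call in calls:
        next_state = step(state, call)
        if next_state is None:
            return state, f"{call} is not allowed in state {state}"
        state = next_state
    return state, None
```

For example, `replay(["bind", "listen", "accept", "shutdown"])` ends in `("Closed", None)`, while `replay(["bind", "recv"])` stops with `"recv is not allowed in state Bound"`. No shift leads from Created to Established here; connected sockets enter that state when accept() creates a new automaton.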

Slide 22

Slide 22 text

LibSL: API Functions Description
fun socket(domain: DOMAIN, type: SOCKET_TYPE, proto: PROTOCOL_TYPE): SOCKET {    // API functions, semantic types
    result = new BSD_SOCKET(Created);                // New automaton creation
}
fun accept(@s: SOCKET, addr: SOCK_ADDR, addrlen: SOCK_LEN): SOCKET {    // @s: automaton variable annotation
    result = new TCP_SOCKET(Established);
}
fun send(@s: SOCKET, msg: BUFFER, len: LENGTH, flags: int): SIZE {
    if (len > 0)
        action SEND(s, msg, len);                    // Semantic actions
    else
        action ERROR(Send01, "Parameter error");
}

Slide 23

Slide 23 text

LibSL: Global Objects Section
var errno: int = 0;                             // Global variables creation
var status: int = 1;
var stdin: int = new File(Created, mRead);      // Global objects creation & initialization
var stderr: int = new File(Created, mError);
var stdout: int;
stdout = new File(Created, mWrite);

Slide 25

Slide 25 text

Applications: Porting of Software
• Porting a source program that uses an old library into a target program that uses a new library
• Both libraries have similar functionality
[Diagram: Source Program + Old Library → Porting → Target Program + New Library]

Slide 26

Slide 26 text

When do we need to port a program?
• Cases of porting:
  – Migration to a new operating system
  – Migration to a new hardware platform (e.g., mobile)
  – Moving from an old version of a library to a new one
  – Moving from one library to another with similar functionality
  – Translation of the source code to another programming language
  – …

Slide 27

Slide 27 text

How is the porting task solved now?
• Use of cross-platform libraries
• Use of parameterized macros (in C)
• Use of intermediate layers
• Manual rewriting of function calls

Slide 28

Slide 28 text

Problems of Existing Approaches
• A lot of manual work
• In fact, a modified program is a new program, so:
  – all QA efforts spent on the source program are wasted
  – full retesting of the modified program is necessary
• The solution is to automate program transformation

Slide 29

Slide 29 text

The General Porting Algorithm
• Create the specification of the old library
• Create the specification of the new library
• Check the compatibility of the old and new libraries by analyzing both specifications
• If the libraries are compatible:
  – build a model of the source program
  – convert the model in accordance with both specifications
  – generate the target program from the converted model
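
The compatibility check and the model conversion can be illustrated in a drastically simplified setting. The Python sketch below assumes each API function is reduced to the single semantic action it performs and that the program model is just a sequence of calls; the library contents, the new-library names open_stream, read_bytes and close_stream, and the helpers build_mapping and port are all hypothetical. Real tools must also match automaton states, arguments, and one-to-many call mappings.

```python
from typing import Dict, List, Optional

# Hypothetical specifications: API function -> semantic action it performs.
OLD_LIB = {"fopen": "OPEN", "fread": "READ", "fclose": "CLOSE"}
NEW_LIB = {"open_stream": "OPEN", "read_bytes": "READ", "close_stream": "CLOSE"}


def build_mapping(old: Dict[str, str], new: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Compatibility check: every old function needs a new function performing
    the same semantic action. Returns an old->new name mapping, or None."""
    by_action = {action: name for name, action in new.items()}
    mapping: Dict[str, str] = {}
    for name, action in old.items():
        if action not in by_action:
            return None   # incompatible: some behavior has no counterpart
        mapping[name] = by_action[action]
    return mapping


def port(program_model: List[str], mapping: Dict[str, str]) -> List[str]:
    """Convert the program model; calls outside the old library stay untouched."""
    return [mapping.get(call, call) for call in program_model]
```

With the two tables above, `port(["fopen", "fread", "fclose"], build_mapping(OLD_LIB, NEW_LIB))` yields `["open_stream", "read_bytes", "close_stream"]`, while a new library lacking READ or CLOSE makes `build_mapping` return `None`, so no target program is generated.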

Slide 30

Slide 30 text

Program Migration Tool
[Diagram of the reengineering tool]
• Inputs: source code of the source program, source library specification, target library specification
• Library Model Builder: builds the source library model and the target library model
• Program Model Builder: builds the source program model
• Model Converter: produces the modified program model
• Program Generator: produces the source code of the target program

Slide 31

Slide 31 text

Porting of Software: Evaluation
• We created 2 porting tools:
  – an experimental tool for porting C programs (1)
  – a more capable experimental tool for porting Java programs (2)
• Now we can migrate Java programs with certain limitations
1. Itsykson V. M., Zozulya A. V. Automated Program Transformation for Migration to New Libraries // Software Engineering, 2012, V. 6, pp. 8-14.
2. Aleksyuk A. O., Itsykson V. M. Semantics-Driven Migration of Java Programs: A Practical Application // Automatic Control and Computer Sciences, December 2018, Volume 52, Issue 7, pp. 581-588.

Slide 33

Slide 33 text

Applications: Enhancements of Static Analysis
• Essential problems of static analysis:
  – inter-procedural analysis
  – analysis of multicomponent applications (when the source code of external components is unavailable)
  – analysis of applications that use external libraries
• These problems lead to:
  – increased analysis complexity (and, respectively, analysis time)
  – decreased analysis soundness
  – decreased analysis precision

Slide 34

Slide 34 text

Applications: Enhancements of Static Analysis
• The main idea is to use partial specifications of libraries to enhance static analysis:
  – the source code of an external library is replaced with its LibSL specification
  – respectively, models of library function source code are replaced with simplified models of function behavior
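
A tiny example of what "replacing code with a behavior model" buys. The analyzer below never looks at recv's implementation; it only uses the summary that the result lies in [0:len], as stated in the recv specification. The client code, the buffer size and the helper names are hypothetical, and a bounded explicit enumeration stands in for the SMT-based reasoning a real tool such as Borealis performs.

```python
from typing import Optional


def recv_summary(length: int) -> range:
    """Summary from the recv specification: the result lies in [0:len].
    The analyzer uses this range instead of recv's source code."""
    return range(0, length + 1)


def check_terminator_write(buf_size: int, length: int) -> Optional[int]:
    """Hypothetical client code being checked:
           n = recv(s, buf, length, 0);
           buf[n] = 0;               // safe only if 0 <= n < buf_size
    Return a counterexample value of n, or None if the write is always safe."""
    for n in recv_summary(length):
        if not (0 <= n < buf_size):
            return n
    return None
```

`check_terminator_write(16, 16)` returns the counterexample `16` (recv may fill the whole buffer, so the terminator is written out of bounds), while `check_terminator_write(16, 15)` returns `None`.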

Slide 35

Slide 35 text

Borealis BMC Static Analyzer
• Borealis BMC Static Analyzer*:
  – is based on bounded model checking
  – converts source code to logical predicates
  – converts correctness checks to logical predicates
  – converts function summaries (contracts, Craig interpolants, etc.) to logical predicates
  – builds a complex predicate from the simple predicates mentioned above
  – solves the complex predicate by means of SMT solvers (Z3, Boolector, etc.)
  – interprets the solution found by the SMT solver and maps it onto the source code
* Akhin M., Belyaev M., Itsykson V. (2017) Borealis Bounded Model Checker: The Coming of Age Story. In: Mazzara M., Meyer B. (eds) Present and Ulterior Software Engineering. Springer, Cham, pp. 119-137

Slide 36

Slide 36 text

Borealis BMC Static Analyzer with Library Specs
[Diagram]
• Pipeline: Source Code → LLVM → LLVM IR → Predicate Extraction → Predicate State → PS Converter → SMT formula → SMT Solver → SAT: Counterexample / UNSAT: OK
• Checks fed into predicate extraction: assertions, pointer checks, memory checks, …
• Function summaries: contracts inference, Craig interpolations, library specifications (LibSL)

Slide 37

Slide 37 text

Example of Library Function Description
fun recv(s: int, buf: *void, len: unsigned, flags: int): long {
    if (buf == 0) {                                               // Bug detection
        action ERROR(Recv01, "Second argument is NULL");          // Bug type & error message
    }
    if (flags < 0) {
        action ERROR(Recv02, "Incorrect flags");
    }
    if (state(s) != Established) {
        action ERROR(Recv03, "Receiving from unattached socket");
    }
    set(buf, [-inf:+inf], len);
    return [0:len];
}
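
Read as a runtime checker, the recv description says which ERROR actions fire for a given call. This Python sketch evaluates the three conditions for one call, assuming the checker knows the current automaton state of each socket and represents the buffer pointer as an integer address where 0 means NULL. The function check_recv and its parameters are hypothetical.

```python
from typing import Dict, List, Tuple


def check_recv(socket_states: Dict[int, str], s: int, buf: int,
               flags: int) -> List[Tuple[str, str]]:
    """Evaluate the error branches of the recv specification for one call.
    socket_states maps a socket descriptor to its automaton state;
    buf is a pointer value, 0 standing for NULL."""
    errors: List[Tuple[str, str]] = []
    if buf == 0:
        errors.append(("Recv01", "Second argument is NULL"))
    if flags < 0:
        errors.append(("Recv02", "Incorrect flags"))
    if socket_states.get(s) != "Established":   # unknown sockets count as unattached
        errors.append(("Recv03", "Receiving from unattached socket"))
    return errors
```

With `states = {3: "Established", 4: "Bound"}`, a call on socket 3 with a non-NULL buffer and flags 0 reports nothing, while a call on socket 4 with a NULL buffer and negative flags reports Recv01, Recv02 and Recv03 together.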

Slide 39

Slide 39 text

Applications: Cross-Language Integration
• The goal of the project is to work out an approach for integrating program components developed in different languages without manually written linking code
• The project focuses on the following tasks:
  – to allow reusing existing, well-tested and optimized program components and libraries written in older languages from newer languages
  – to allow adding new features implemented in modern languages to established projects written in older languages
  – to allow performing the above-mentioned tasks without extensive knowledge of both languages
  – to provide easy adaptation to new languages and libraries

Slide 40

Slide 40 text

Applications: Cross-Language Integration
• Remote Procedure Call (RPC): gRPC, Thrift, RabbitMQ RPC, Java RMI, Cap’n Proto, XML-RPC
  – requires writing linking code manually
  – may require adapting existing program components (for example, to solve serialization problems)
  – has performance limitations
• Foreign Function Interface (FFI): libffi, JNI, SWIG
  – C libraries are the primary target; other languages are supported but require additional work
  – requires additional code and effort to coordinate different memory management models (manual, garbage collection)
  – may require adapting language runtimes and VMs to operate in a single process

Slide 41

Slide 41 text

Applications: Cross-Language Integration
• The main idea is to use partial specifications of libraries to automate cross-language integration:
  – automatically generate wrappers for remote API functions based on the library specification
  – automatically generate the receiving part for the remote library based on the library specification

Slide 42

Slide 42 text

LibraryLink: Scheme of the Approach
[Diagram]
• First language: main program, library wrapper, wrapper core
• Second language: receiver, receiver core, library
• The wrapper core and the receiver core communicate via RPC
• Code generator inputs: the LibSL library description and wrapper & receiver templates for each language

Slide 43

Slide 43 text

LibraryLink: Details
• Linking modules (wrappers and receivers) are generated automatically from LibSL specifications
• Automata are mapped to classes; transitions are mapped to methods and constructors
• Callbacks and inheritance are supported
• The need for serialization is minimized by a handle mechanism
• Semantic information is used to improve performance (caching and prefetching)
• The approach is aware of multithreading and memory management models
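
The handle mechanism can be sketched in a few lines: the receiver keeps the real objects and hands out integer handles, and the wrapper holds only a handle, so objects are never serialized across the language boundary. In this Python sketch a direct method call stands in for the RPC channel; Receiver, RemoteProxy and the demo class Counter are hypothetical names, not LibraryLink's actual API.

```python
import itertools
from typing import Any, Dict

PRIMITIVES = (int, float, str, bool, type(None))


class Receiver:
    """Library side (second language): owns the real objects, addressed by handle."""

    def __init__(self) -> None:
        self._objects: Dict[int, Any] = {}
        self._ids = itertools.count(1)

    def register(self, obj: Any) -> int:
        handle = next(self._ids)
        self._objects[handle] = obj
        return handle

    def invoke(self, handle: int, method: str, *args: Any) -> Any:
        result = getattr(self._objects[handle], method)(*args)
        if isinstance(result, PRIMITIVES):
            return result                          # cheap values are copied
        return ("handle", self.register(result))  # objects never leave this side

    def release(self, handle: int) -> None:
        del self._objects[handle]

    def live_objects(self) -> int:
        return len(self._objects)


class RemoteProxy:
    """Wrapper side (first language): holds a handle, not a serialized copy."""

    def __init__(self, receiver: Receiver, handle: int) -> None:
        self._receiver = receiver  # stands in for the RPC channel
        self._handle = handle

    def call(self, method: str, *args: Any) -> Any:
        result = self._receiver.invoke(self._handle, method, *args)
        if isinstance(result, tuple) and len(result) == 2 and result[0] == "handle":
            return RemoteProxy(self._receiver, result[1])
        return result

    def close(self) -> None:
        """Explicit release; a real wrapper might tie this to the host GC instead."""
        self._receiver.release(self._handle)


class Counter:
    """Hypothetical library object used only for the example."""

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def inc(self, by: int) -> int:
        self.value += by
        return self.value

    def fork(self) -> "Counter":
        return Counter(self.value)
```

Usage: `counter = RemoteProxy(receiver, receiver.register(Counter()))`; `counter.call("inc", 5)` returns `5`, and `counter.call("fork")` returns another proxy rather than a copied object. The explicit `close` hints at why memory management has to be coordinated between the two sides.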

Slide 44

Slide 44 text

LibraryLink: Comparison
| Approach | Wrapper / Receiver | Wrapper / Receiver core (language support) | Serialization required | Memory management coordination | Caching and prefetching |
|---|---|---|---|---|---|
| RPC | Manually written | For each language (N) | Yes | Manual | Manual |
| FFI | Manually written | For each pair of languages (~N²) | No | Manual | Manual |
| LibraryLink | Generated from an interface description | For each language (N) | No | Built-in | Inferred from a semantic model |

Slide 45

Slide 45 text

LibraryLink: Current State
• Cross-language integration is implemented in a pilot tool
• Libraries can be written in C, Python, and Go
• Wrappers can be generated for Java, Kotlin, and Go
• The approach was tested on popular libraries: Requests (Python), Z3 (C), Jennifer (Go)
• Performance reaches up to 90,000 calls per second (270,000 with prefetching)

Slide 46

Slide 46 text

LibraryLink: LibSL Example
automaton Z3_config {                                            // Z3 automaton
    state Created, Constructed, Closed;
    shift Created -> Constructed (Z3_mk_config);
    shift Constructed -> self (Z3_mk_context, Z3_set_param_value);
    shift Constructed -> Closed (Z3_del_config);
}
…
fun Z3_config.Z3_mk_config(): Z3_config;                         // Z3 API
fun Z3_config.Z3_mk_context(cfg: self): Z3_context;
…
fun Z3_context.Z3_mk_and(cfg: self, num_args: Int, args: Z3_ast[]): Z3_ast;
…
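
"Automata are mapped to classes, transitions are mapped to methods" can be illustrated with a tiny class generator driven by the Z3_config shifts above. This Python sketch is not LibraryLink's generator: make_wrapper_class is hypothetical, each generated method only checks and updates the automaton state and logs the call instead of forwarding it, and Z3_mk_context returns the wrapper itself rather than a Z3_context for simplicity.

```python
from typing import Dict, List, Tuple

# Shifts of the Z3_config automaton from the LibSL example: (from, to, functions).
Z3_CONFIG_SHIFTS: List[Tuple[str, str, List[str]]] = [
    ("Created", "Constructed", ["Z3_mk_config"]),
    ("Constructed", "self", ["Z3_mk_context", "Z3_set_param_value"]),
    ("Constructed", "Closed", ["Z3_del_config"]),
]


def make_wrapper_class(name: str, initial: str,
                       shifts: List[Tuple[str, str, List[str]]]) -> type:
    """Hypothetical generator: automaton -> class, transition -> method.
    Each method checks the automaton state before 'forwarding' the call."""
    table: Dict[Tuple[str, str], str] = {}
    for src, dst, functions in shifts:
        for fn in functions:
            table[(src, fn)] = dst

    def make_method(fn: str):
        def method(self, *args):
            dst = table.get((self.state, fn))
            if dst is None:
                raise RuntimeError(f"{name}.{fn} is not allowed in state {self.state}")
            if dst != "self":
                self.state = dst
            self.log.append((fn, args))  # a real wrapper would forward to the receiver
            return self
        method.__name__ = fn
        return method

    def __init__(self):
        self.state = initial
        self.log = []

    namespace = {"__init__": __init__}
    for fn in sorted({fn for _, fn in table}):
        namespace[fn] = make_method(fn)
    return type(name, (), namespace)


Z3Config = make_wrapper_class("Z3_config", "Created", Z3_CONFIG_SHIFTS)
```

A generated `Z3Config()` starts in Created; after `Z3_mk_config()` it is Constructed, and calling `Z3_mk_context()` after `Z3_del_config()` raises an error instead of reaching the native library with a freed configuration.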

Slide 47

Slide 47 text

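LibraryLink: Kotlin Code Example
val cfg = Z3Kotlin.Z3_config()
val ctx = cfg.Z3_mk_context()
val symbol_x = ctx.Z3_mk_int_symbol(0)
val symbol_y = ctx.Z3_mk_int_symbol(1)
// Assumed wrapper methods, named after Z3_mk_bool_sort / Z3_mk_const in the Z3 C API
val boolSort = ctx.Z3_mk_bool_sort()
val x = ctx.Z3_mk_const(symbol_x, boolSort)
val y = ctx.Z3_mk_const(symbol_y, boolSort)
/* De Morgan - with a negation around */
/* !(!(x && y) <-> (!x || !y)) */
val not_x = ctx.Z3_mk_not(x)
val not_y = ctx.Z3_mk_not(y)
val x_and_y = ctx.Z3_mk_and(2, arrayOf(x, y))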

Slide 49

Slide 49 text

Applications: Integration Errors Detection
• A library specification expresses the reference behavior of a library and sets the reference protocol of library usage
• The main idea is to use dynamic analysis of a program to check the correctness of the library usage protocol
• We record program traces and check that they correspond to the reference behavior
V. Itsykson, M. Gusev. Automation of library usage correctness detection // Proceedings of SEIM conference, St. Petersburg, 2018.

Slide 50

Slide 50 text

Instrumentation of Program and Trace Collecting
• Specification-driven instrumentation
  – only library-related elements of the program are instrumented (function calls, changes of library objects, etc.)
• Program execution
  – the instrumented program is executed on tests
• Trace collecting
  – logging all library API function calls
  – logging the full context: object ID, handles, function parameters, etc.

Slide 51

Slide 51 text

Library Usage Protocol Violations Detection
• Mapping the collected program traces to library automata:
  – filtering the trace to extract the events related to a concrete library object (library object trace)
  – replaying the library object trace (events) on the library model
  – checking the correctness of events
• Checking trace correctness:
  – checking the preconditions of each library function
  – checking the postconditions of each library function
  – checking the validity of automaton state changes
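
The steps above can be sketched as a small trace checker: group events by object ID, replay each object trace on its automaton, check preconditions, and flag objects that never reach a finish state as resource leaks. Everything in this Python sketch is an assumption for illustration: the event format, the reduced subset of the BSD_SOCKET shifts, and the recv precondition adapted from Recv01 and Recv02. Postcondition checks are omitted.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical reduced socket model; shutdown leads to Closed from any state.
CREATES = {"socket": "Created"}
SHIFTS = {
    ("Created", "bind"): "Bound",
    ("Bound", "listen"): "Listening",
    ("Listening", "accept"): "Listening",
    ("Established", "recv"): "Established",
    ("Established", "send"): "Established",
}
FINISH_STATES = {"Closed"}
# Precondition adapted from the recv specification (Recv01, Recv02).
PRECONDITIONS = {"recv": lambda a: a.get("buf") != 0 and a.get("flags", 0) >= 0}


def check_trace(trace: List[Dict[str, Any]]) -> List[str]:
    """Split a trace by object ID, replay each object trace, report violations."""
    by_object: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for event in trace:
        by_object[event["obj"]].append(event)

    report: List[str] = []
    for obj, events in by_object.items():
        state = None
        for event in events:
            call = event["call"]
            if call in CREATES:
                state = CREATES[call]
                continue
            if state is None:
                report.append(f"obj {obj}: {call} before creation")
                break
            pre = PRECONDITIONS.get(call)
            if pre is not None and not pre(event.get("args", {})):
                report.append(f"obj {obj}: precondition of {call} violated")
            next_state = "Closed" if call == "shutdown" else SHIFTS.get((state, call))
            if next_state is None:
                report.append(f"obj {obj}: {call} not allowed in state {state}")
                break
            state = next_state
        else:
            # No violation during replay: the object must end in a finish state.
            if state not in FINISH_STATES:
                report.append(f"obj {obj}: resource leak (final state {state})")
    return report
```

Fed the socket(…) → bind(…) → recv(…) sequence from the next slides, the checker reports that recv is not allowed in state Bound; a socket that is bound and listening but never shut down is reported as a resource leak, the most frequent violation type in the evaluation below.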

Slide 52

Slide 52 text

Library Usage Protocol Violations Detection
[Diagram: Source Program → Instrumentation (driven by the LibSL library description) → Instrumented Program → Execution → Trace → Correctness checking (against the LibSL library description) → Error Report]

Slide 53

Slide 53 text

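Library Usage Protocol Violations Detection
Example trace: socket(…) → bind(…) → recv(…). According to the BSD_SOCKET automaton, recv is allowed only in the Established state, so calling it on a socket that is only bound is a protocol violation.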

Slide 54

Slide 54 text

Library Usage Violations Detection: Evaluation
• The approach described above was implemented as a pilot tool for Java programs
• We created LibSL specifications for several libraries used in the apache/incubator-netbeans repository
• The tool was tested on several artificial projects and on 400+ files from the apache/incubator-netbeans repository
• The tool detected 300+ violations of the library usage protocol. Most of them are:
  – resource leaks
  – violations of library function preconditions

Slide 56

Slide 56 text

Applications: Specification Mining
• The main question of any specification-based approach: where can we get specifications for libraries?
• Library authors will not provide specifications for their libraries (at least in the near future)!
• Programmers cannot create specifications because they lack knowledge about the libraries
• The reasonable way is to use the knowledge of the programming community!

Slide 57

Slide 57 text

Applications: Specification Mining
• There are millions of open-source projects available in software repositories (GitHub, Bitbucket, etc.)
• Many of them use the target library
• Our approach is to extract the library specification by learning from millions of projects that use the target library:

Slide 58

Slide 58 text

Applications: Specification Mining
• Find all Java projects on GitHub that use the target library
• Load the appropriate projects
• If the authors of a project don’t provide any tests, generate tests for it
• Run the native or generated tests
• Collect project traces
• Analyze the traces and generate predicates
• Convert the collected predicates into a specification skeleton
• Enrich the specification manually
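
The "traces to specification skeleton" step can be approximated very simply. This Python sketch is a deliberately naive stand-in, not the authors' mining algorithm: it assumes traces are already filtered per library object, takes "the last call seen" as the automaton state (a 1-tail abstraction), and prints LibSL-like shift lines for manual enrichment. The state naming scheme after_<call> and the helpers mine_shifts and to_skeleton are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def mine_shifts(object_traces: List[List[str]]) -> Dict[str, Set[str]]:
    """State = last call seen ("init" before the first call).
    Returns, for every state, the set of calls observed from it."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for calls in object_traces:
        state = "init"
        for call in calls:
            graph[state].add(call)
            state = f"after_{call}"
    return dict(graph)


def to_skeleton(name: str, graph: Dict[str, Set[str]]) -> str:
    """Render the mined graph as a LibSL-like automaton skeleton."""
    states = set(graph) | {f"after_{c}" for calls in graph.values() for c in calls}
    lines = [f"automaton {name} {{"]
    lines += [f"    state {s};" for s in sorted(states)]
    for src in sorted(graph):
        for call in sorted(graph[src]):
            lines.append(f"    shift {src}->after_{call} ({call});")
    lines.append("}")
    return "\n".join(lines)
```

For the per-object traces `["open", "read", "close"]`, `["open", "close"]` and `["open", "read", "read", "close"]`, the mined graph allows only `open` from `init` and both `read` and `close` after `open` or `read`; a developer would then rename states, mark finish states, and add contracts by hand.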

Slide 60

Slide 60 text

Applications: What Else?
In which other areas can we use formal library specifications?
• Static analysis for detecting library usage violations
• Using library semantics to form a behavioral predicate
• Extending BMC-based static analysis tools to detect library usage violations based on reference library behavior
• Static analysis for specification mining
• Automated creation of library documentation
• …

Slide 61

Slide 61 text

Conclusion
• We presented a new formalism for specifying the structure & behavior of external libraries
• We presented LibSL, a new DSL for creating specifications in a user-friendly manner
• We applied the formalism & language to several software engineering tasks:
  – automated specification-driven porting of applications from one library to another
  – static analysis enhancements for multicomponent projects
  – cross-language integration: using a library written in one language from an application written in another language
  – detection of library usage protocol violations
  – mining of library specifications from open repositories
• Future research directions:
  – extend the static analysis tool to detect library usage violations
  – use static analysis methods for specification mining
  – automated creation of library documentation

Slide 62

Slide 62 text

Contacts
Vladimir Itsykson
Director of the Higher School of Intellectual Systems & Supercomputer Technologies, Peter the Great St. Petersburg Polytechnic University, Russia
JetBrains Research laboratory “Verification & Program Analysis”