Partial Specifications of Libraries: Applications in Software Engineering

Exactpro
November 08, 2019

Vladimir Itsykson
Director of the Higher School of Intelligent Systems & Supercomputer Technologies, Peter the Great St. Petersburg Polytechnic University

International Conference on Software Testing, Machine Learning and Complex Process Analysis (TMPA-2019)
7-9 November 2019, Tbilisi

Video: https://youtu.be/R2dFbcdQ6y4

TMPA Conference website https://tmpaconf.org/
TMPA Conference on Facebook https://www.facebook.com/groups/tmpaconf/

Transcript

  1. Partial Specifications of Libraries: Applications in Software Engineering

    Vladimir Itsykson
    Peter the Great St. Petersburg Polytechnic University, JetBrains Research
    Tbilisi, 08-Nov-2019
  2. Who am I?

    Vladimir Itsykson
    • Director of the Higher School of Intellectual Systems & Supercomputer Technologies, Institute of Computer Science & Technologies, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
    • Head of the JetBrains Research laboratory “Verification & Program Analysis” (a joint research laboratory with the Polytechnic University), St. Petersburg, Russia
  3. Overview

    • The main subject of this presentation is partial specifications of program libraries
    • We will discuss:
      – problems that arise when we deal with external libraries
      – our approach to coping with external program libraries
      – a formalism for describing the structure & behavior of program libraries
      – a DSL for expressing specifications in a user-friendly manner
      – applications of library specifications in various software engineering tasks
  4. Presentation Outline

    • Introduction
    • Problems with external libraries
    • Formalism for describing partial specifications of libraries
    • Library Specification Language (LibSL)
    • Applications
      – Porting of Software
      – Enhancements of Static Analysis
      – Cross-language Integration
      – Integration Errors Detection
      – Specification Mining
    • Conclusion
  5. Problems with External Libraries

    • Current practices of library distribution are the following:
      – The authors distribute libraries “as is”, without any documentation
      – The authors distribute libraries with only short informal documentation
      – The authors distribute libraries with a short description of the API calls and types used
      – The authors distribute libraries with some examples of usage
    • There are no descriptions of library semantics
    • The authors do not provide formal rules for using libraries
    • The lack of formal semantics results in software integration errors
  6. Problems with External Libraries

    Trivial example (stdio.h)

    Syntactically correct, semantically correct:

        fd = fopen(…)
        …
        n = fread(fd, …)
        …
        fclose(fd)

    Syntactically correct, semantically incorrect!

        fclose(fd)
        …
        n = fread(fd, …)
        …
        fd = fopen(…)
  8. Formal Specifications of Libraries

    Instead of making this slide, I participated in a wine tasting yesterday
  9. Formal Specifications of Libraries

    • The main goals are to make the formalism:
      – as simple as possible
      – powerful enough to express a significant share of all libraries
    • The main idea is to:
      – specify only the visible (external) behavior of libraries
      – hide insignificant details
    • As a consequence, the target specifications will be partial
  10. Formal Specifications of Libraries

    • Components of program libraries to be described (static semantics):
      – Data types
      – Variables & objects
      – Functions of the public API
    • Behavior of program libraries to be described (dynamic semantics):
      – Semantics of API functions
        • Behavior of functions
        • Side effects of functions
        • Contracts of functions
      – Rules of library use
  11. Formal Specifications of Libraries

    • A library is a composition of EFSMs (extended finite state machines)
    • Each EFSM (automaton) describes the behavior of one library object
      – FSM states correspond to object states
      – Transitions correspond to API function calls
      – Creation of a new FSM is a side effect of an API function call
    • Each API function is described by:
      – Function signature
      – Function contract
      – High-level behavior
      – Actions to be performed (semantic descriptions)

    * V. Itsykson. Formalism and Language Tools for Specification of the Semantics of Software Libraries / Automatic Control and Computer Sciences, December 2017, Volume 51, Issue 7, pp. 531-538
  12. Formal Specification of Library Function

    F_i = < Name, Args, Res, Pre, Post, A, CondA, D, CondD >
    • Name – name of the function
    • Args – set of the formal arguments of the function
    • Res – result of the function
    • Pre – preconditions of the function
    • Post – postconditions of the function
    • A – set of semantic actions performed by the function
    • CondA – set of conditions for semantic actions
    • D – set of launched child automata
    • CondD – set of launch conditions for child automata

    F = { F_i }
    • F – specification of API functions
  13. Formal Specification of Library

    B = { L, S_1(q, P), …, S_n(q, P) }
    • L – main automaton describing the behavior of the entire library
    • S_i – i-th child automaton, launched if certain conditions are fulfilled
    • q – initial state of the child automaton
    • P – optional parameter of the child automaton

    Lib = < F, B >
    • F = { F_i } – set of library functions
    • B – behavioral description of the library
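One way to make these tuples concrete is to write them down as plain data structures. The following Python sketch is illustrative only: the class and field names (`FunctionSpec`, `action_conds`, `enabled_actions`, and so on) are assumptions made for this example and are not part of the formalism or of any LibSL tooling. The `send` entry mirrors the `len > 0` condition of the `send` function from the LibSL API slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

Condition = Callable[[dict], bool]  # predicate over call arguments (assumed shape)


@dataclass
class FunctionSpec:
    """F_i = <Name, Args, Res, Pre, Post, A, CondA, D, CondD>."""
    name: str
    args: list[str]
    res: Optional[str] = None
    pre: list[Condition] = field(default_factory=list)
    post: list[Condition] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)             # A
    action_conds: list[Condition] = field(default_factory=list)  # CondA, aligned with A
    children: list[str] = field(default_factory=list)            # D
    child_conds: list[Condition] = field(default_factory=list)   # CondD, aligned with D


@dataclass
class ChildAutomaton:
    """S_i(q, P): a child automaton with initial state q and optional parameter P."""
    name: str
    initial_state: str
    parameter: Optional[object] = None


@dataclass
class LibrarySpec:
    """Lib = <F, B>, where B = {L, S_1, ..., S_n}."""
    functions: dict[str, FunctionSpec]
    main_automaton: str
    children: list[ChildAutomaton]


def enabled_actions(spec: FunctionSpec, call_args: dict) -> list[str]:
    """Return the semantic actions whose condition holds for this call."""
    return [a for a, cond in zip(spec.actions, spec.action_conds) if cond(call_args)]


send = FunctionSpec(
    name="send",
    args=["s", "msg", "len", "flags"],
    res="SIZE",
    actions=["SEND", "ERROR"],
    action_conds=[lambda a: a["len"] > 0, lambda a: a["len"] <= 0],
)
lib = LibrarySpec(
    functions={"send": send},
    main_automaton="Main",
    children=[ChildAutomaton("BSD_SOCKET", "Created")],
)
```

Keeping A and CondA as parallel lists follows the tuple literally; a real tool might prefer to pair each action with its condition.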
  14. Formal Model of TCP Socket (Server Side)

    [Figure: automata diagram. Legend: automata of the library; states of automata; calls of API functions; creation of a new automaton; finish states]
  16. LibSL: Library Specification Language

    • User-friendly DSL for specifying libraries
    • Allows defining:
      – Library name
      – Import sections
      – Semantic data types (data types with annotated values)
      – Automata and their parameters (states & shifts)
      – Global objects and variables (definitions & initializations)
      – API function descriptions (signatures, contracts, behaviors, semantic actions, etc.)

    * V. Itsykson. LibSL: A language for software components specification // Software Engineering. 2018. N 5. pp. 209-220. (in Russian) doi:10.17587/prin.9.209-220
  17. LibSL: Structure of Specification

        library <LibraryName>;
        import <FileName>;
        ...
        types {
            // Semantic types
            ...
        }
        automaton <AutomatonClass>: <Type> {
            // Automaton description
            ...
        }
        ...
        fun <FunName>(<Params>): <Type> {
            // API function description
            ...
        }
        ...
        var <VarName>: <Type> = <Value>;          // Global variables declaration
        ...
        var main: <Type> = new Main(<InitState>)  // Main automaton creation

    Slide callouts: import section; semantic types section; automata section; API functions section; global objects section
  18. LibSL: Semantic Types

        types {
            // Alias types definitions
            int = int32;
            unsigned = unsigned32;
            byte = unsigned8;
            // Simple semantic types definitions
            SOCKET (int);           // Socket type
            BUFFER (*void);         // Socket buffer
            LENGTH (int);           // Socket length
            PROTOCOL_TYPE (int);    // Socket protocol ID
            // Definitions of semantic types with annotated values
            SOCKET_TYPE (int) {     // Socket type
                STREAM: 1;          // Stream socket
                DGRAM: 2;           // Datagram socket
                RAW: 3;             // Raw Socket
                SEQPACKET: 5;       // Stream packet socket
            };
            SIZE (int) {
                ERROR: -1;
            }
        }
  19. LibSL: Automaton Description

        automaton BSD_SOCKET: int {          // Automaton name & type
            var blocked: boolean;            // Internal variables
            // Automaton states
            state Created;
            state Bound;
            state Established;
            state Listening;
            finishstate Closed;
            // Automaton shifts
            shift Created->Bound (bind);
            shift Bound->Created (close);
            shift Bound->Listening (listen);
            shift Listening->Bound (close);
            shift Listening->self (accept);
            shift Established->Created (close);
            shift Established->self (recv);
            shift Established->self (send);
            shift any->Closed (shutdown);
        }
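The automaton description can be read as a transition table. The Python sketch below transcribes the shifts of `BSD_SOCKET` using the declared state names and replays a call sequence against them. The names `SHIFTS`, `ANY_SHIFTS`, `step`, and `run` are hypothetical, and treating a finish state as accepting no further calls is an assumption of this example rather than a documented LibSL rule.

```python
# Transition table transcribed from the BSD_SOCKET automaton.
FINISH_STATES = {"Closed"}

SHIFTS = {
    ("Created", "bind"): "Bound",
    ("Bound", "close"): "Created",
    ("Bound", "listen"): "Listening",
    ("Listening", "close"): "Bound",
    ("Listening", "accept"): "Listening",      # shift Listening->self
    ("Established", "close"): "Created",
    ("Established", "recv"): "Established",    # shift Established->self
    ("Established", "send"): "Established",    # shift Established->self
}

# "shift any->Closed (shutdown)" applies from every non-finish state.
ANY_SHIFTS = {"shutdown": "Closed"}


def step(state, call):
    """Return the next state, or None if the call violates the protocol."""
    if state in FINISH_STATES:
        return None  # assumption: no calls are allowed after a finish state
    if (state, call) in SHIFTS:
        return SHIFTS[(state, call)]
    return ANY_SHIFTS.get(call)


def run(calls, state="Created"):
    """Replay a call sequence; return the final state or None on a violation."""
    for call in calls:
        state = step(state, call)
        if state is None:
            return None
    return state
```

For example, `run(["bind", "listen", "accept", "shutdown"])` ends in the finish state, while `run(["listen"])` is rejected because `listen` is not allowed in `Created`.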
  20. LibSL: API Functions Description

        fun socket(domain: DOMAIN, type: SOCKET_TYPE, proto: PROTOCOL_TYPE): SOCKET {
            result = new BSD_SOCKET(Created);
        }
        fun accept(@s: SOCKET, addr: SOCK_ADDR, addrlen: SOCK_LEN): SOCKET {
            result = new TCP_SOCKET(Established);
        }
        fun send(@s: SOCKET, msg: BUFFER, len: LENGTH, FLAGS: int): SIZE {
            if (len > 0)
                action SEND(s, msg, len);
            else
                action ERROR(Send01, "Parameter error");
        }

    Slide callouts: API functions; semantic types; automaton variable annotation; new automaton creation; semantic actions
  21. LibSL: Global Objects Section

        // Global variables creation
        var errno: int = 0;
        var status: int = 1;
        // Global objects creation & initialization
        var stdin: int = new File(Created, mRead);
        var stderr: int = new File(Created, mError);
        var stdout: int;
        stdout = new File(Created, mWrite);
  23. Applications: Porting of Software

    • Porting a source program that uses the old library to a target program that uses the new library
    • Both libraries have similar functionality

    [Diagram labels: Source Program, Old Library, Porting, Target Program, New Library]
  24. When do we need to port a program?

    • Cases of porting:
      – Migration to a new operating system
      – Migration to a new hardware platform (e.g. mobile)
      – Moving from an old version of a library to a new one
      – Moving from one library to another with similar functionality
      – Translation of the source code to another programming language
      – …
  25. How is the porting task being solved now?

    • Use of cross-platform libraries
    • Use of parameterized macros (in C)
    • Use of intermediate layers
    • Manual rewriting of function calls
  26. Problems of existing approaches

    • A lot of manual work
    • In fact, a modified program is a new program →
      – All QA effort spent on the source program is wasted
      – Full retesting of the modified program is necessary
    • The solution is to automate the program transformation
  27. The General Porting Algorithm

    • Create the specification of the old library
    • Create the specification of the new library
    • Check the compatibility of the old and new libraries by analyzing both specifications
    • If the libraries are compatible:
      – Create a model of the source program
      – Convert the model in accordance with both specifications
      – Generate the target program from the converted model
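The steps of the algorithm can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the porting tools described in the talk: each specification is reduced to a mapping from API function name to a single semantic action, the program model is a flat list of calls, and all names (`OLD_SPEC`, `NEW_SPEC`, `old_open`, `compatible`, `convert`, `generate`) are hypothetical.

```python
# Hypothetical specifications: API function name -> semantic action it performs.
OLD_SPEC = {"old_open": "OPEN", "old_read": "READ", "old_close": "CLOSE"}
NEW_SPEC = {"new_open": "OPEN", "new_read": "READ", "new_close": "CLOSE"}


def compatible(old_spec, new_spec):
    """Assumed compatibility check: every old semantic action exists in the new library."""
    return set(old_spec.values()) <= set(new_spec.values())


def convert(program_model, old_spec, new_spec):
    """Rewrite old-library calls in the model to new-library calls."""
    if not compatible(old_spec, new_spec):
        raise ValueError("libraries are not compatible")
    by_action = {action: fn for fn, action in new_spec.items()}
    converted = []
    for fn, args in program_model:
        if fn in old_spec:
            converted.append((by_action[old_spec[fn]], args))
        else:
            converted.append((fn, args))  # not a library call: keep as-is
    return converted


def generate(program_model):
    """Emit target program text from the converted model."""
    return "\n".join(f"{fn}({', '.join(args)})" for fn, args in program_model)


source_model = [("old_open", ["path"]), ("print", ["x"]), ("old_close", ["h"])]
target_model = convert(source_model, OLD_SPEC, NEW_SPEC)
target_text = generate(target_model)
```

A one-to-one mapping of actions is the simplest possible case; libraries whose functions perform several actions, or require different call orders, need the richer automaton-based model.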
  28. Program Migration Tool

    [Diagram of the reengineering tool. Inputs: source program (source code), source library specification, target library specification. Components and intermediate artifacts: Library Model Builder, source library model, target library model, Program Model Builder, source program model, Model Converter, modified program model, Program Generator. Output: target program (source code).]
  29. Porting of Software: Evaluation

    • We created 2 porting tools:
      – An experimental tool for porting C programs [1]
      – A more functional experimental tool for porting Java programs [2]
    • We can now migrate Java programs, with certain limitations

    [1] Itsykson V. M., Zozulya A.V. Automated Program Transformation for Migration to New Libraries // Software Engineering 2012. V 6. pp. 8-14.
    [2] Aleksyuk, A.O., Itsykson, V.M. Semantics-Driven Migration of Java Programs: A Practical Application / Automatic Control and Computer Sciences, December 2018, Volume 52, Issue 7, pp 581–588
  31. Applications: Enhancements of Static Analysis

    • Essential problems of static analysis:
      – Inter-procedural analysis
      – Analysis of multicomponent applications (when the source code of external components is unavailable)
      – Analysis of applications that use external libraries
    • These problems lead to:
      – increased analysis complexity (and, consequently, analysis time)
      – decreased analysis soundness
      – decreased analysis precision
  32. Applications: Enhancements of Static Analysis

    • The main idea is to use partial specifications of libraries to enhance static analysis:
      – The source code of an external library is replaced with its LibSL specification
      – Accordingly, models of library function source code are replaced with simplified models of function behavior
  33. Borealis BMC Static Analyzer

    • Borealis BMC Static Analyzer*:
      – Is based on bounded model checking
      – Converts source code to logical predicates
      – Converts correctness checks to logical predicates
      – Converts function summaries (contracts, Craig interpolants, etc.) to logical predicates
      – Builds a complex predicate from the simple predicates mentioned above
      – Solves the complex predicate by means of SMT solvers (Z3, Boolector, etc.)
      – Interprets the solution found by the SMT solver and maps it onto the source code

    * Akhin M., Belyaev M., Itsykson V. (2017) Borealis Bounded Model Checker: The Coming of Age Story. In: Mazzara M., Meyer B. (eds) Present and Ulterior Software Engineering. Springer, Cham, pp. 119-137
  34. Borealis BMC Static Analyzer with Library Specs

    [Diagram of the analysis pipeline. Labels: Source Code, LLVM, LLVM IR, Predicate Extraction, Predicate State, PS Converter, SMT formula, SMT Solver, SAT (counterexample), UNSAT (OK); Assertions (pointer checks, memory checks, …); Function Summaries (contracts, inference, Craig interpolations); Libraries Specifications (LibSL)]
  35. Example of Library Function Description

        fun recv (s: int, buf: *void, len: unsigned, flags: int): long {
            if ( buf == 0 ) {
                action ERROR(Recv01, "Second argument is NULL");
            }
            if ( flags < 0 ) {
                action ERROR(Recv02, "Incorrect flags");
            }
            if ( state(s) != Established) {
                action ERROR(Recv03, "Receiving from unattached socket");
            }
            set ( buf, [-inf:+inf], len );
            return [0:len];
        }

    Slide callouts: bug detection; bug type & error message
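To illustrate how such a description can stand in for the library's source code during analysis, here is a small Python sketch of the same `recv` model. It is a simplification under stated assumptions: it evaluates concrete argument values, whereas a bounded model checker works with logical predicates, and the function name `recv_model` and its tuple-based result are hypothetical.

```python
def recv_model(socket_state, buf, length, flags):
    """Approximate recv: report usage errors and bound the result.

    Returns (errors, result_range), where result_range is the closed
    interval [0:len] taken from the specification.
    """
    errors = []
    if buf == 0:
        errors.append(("Recv01", "Second argument is NULL"))
    if flags < 0:
        errors.append(("Recv02", "Incorrect flags"))
    if socket_state != "Established":
        errors.append(("Recv03", "Receiving from unattached socket"))
    # The buffer contents are unknown ([-inf:+inf]); only the result is bounded.
    return errors, (0, length)


ok_errors, ok_range = recv_model("Established", buf=0x1000, length=64, flags=0)
bad_errors, _ = recv_model("Created", buf=0, length=64, flags=0)
```

The point of the model is that the analyzer never needs to look inside `recv`: it only learns which calls are errors and that the result lies between 0 and `len`.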
  37. Applications: Cross-Language Integration

    • The goal of the project is to develop an approach for integrating program components written in different languages without manually written linking code
    • The project focuses on the following tasks:
      – to allow reusing existing, well-tested and optimized program components and libraries written in older languages from newer languages
      – to allow adding new features implemented in modern languages to established projects written in older languages
      – to allow performing the above-mentioned tasks without extensive knowledge of both languages
      – to provide easy adaptation to new languages and libraries
  38. Applications: Cross-Language Integration

    • Remote Procedure Call (RPC): gRPC, Thrift, RabbitMQ RPC, Java RMI, Cap’n Proto, XML-RPC
      – Requires writing linking code manually
      – May require adapting existing program components (for example, to solve serialization problems)
      – Has performance limitations
    • Foreign Function Interface (FFI): libffi, JNI, SWIG
      – C libraries are the primary target; other languages are supported but require additional work
      – Requires additional code and effort to coordinate different memory management models (manual, garbage collection)
      – May require adapting language runtimes and VMs to operate in a single process
  39. Applications: Cross-Language Integration

    • The main idea is to use partial specifications of libraries to automate cross-language integration:
      – Automatically generate wrappers for remote API functions based on the library specification
      – Automatically generate the receiving part for the remote library based on the library specification
  40. LibraryLink: Scheme of the Approach

    [Diagram labels: Main program, Library, Wrapper, Receiver, Wrapper Core, Receiver Core, RPC, Code generator, First Language, Second Language, LibSL library description, Wrapper & Receiver templates for each language]
  41. LibraryLink: Details

    • Linking modules (wrappers and receivers) are generated automatically from the LibSL specifications
    • Automata are mapped to classes; transitions are mapped to methods and constructors
    • Callbacks and inheritance are supported
    • A handle mechanism minimizes the need for serialization
    • Semantic information is used to improve performance (caching and prefetching)
    • The tool is aware of multithreading and memory management models
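The mapping of automata to classes and transitions to methods can be sketched as follows. This Python example builds a wrapper class at run time from a table of shifts and forwards every call, together with an opaque handle, to an `rpc` callable. It is only an illustration: LibraryLink is described as generating code from templates, and the names `make_wrapper_class`, `fake_rpc`, and the in-memory call log are assumptions of this example.

```python
def make_wrapper_class(name, shifts, rpc):
    """Build a class whose methods are the automaton's transitions.

    shifts maps (state, function) -> next state; rpc(function, handle, *args)
    performs the remote call. Only the handle crosses the language boundary.
    """
    def make_method(function):
        def method(self, *args):
            key = (self.state, function)
            if key not in shifts:
                raise RuntimeError(f"{function} is not allowed in state {self.state}")
            result = rpc(function, self.handle, *args)
            self.state = shifts[key]
            return result
        method.__name__ = function
        return method

    def __init__(self, handle, state):
        self.handle = handle
        self.state = state

    namespace = {"__init__": __init__}
    for _state, function in shifts:
        namespace[function] = make_method(function)
    return type(name, (), namespace)


calls = []  # stand-in for the remote side: just records what was requested


def fake_rpc(function, handle, *args):
    calls.append((function, handle, args))
    return len(calls)


Z3Config = make_wrapper_class(
    "Z3_config",
    {
        ("Constructed", "Z3_set_param_value"): "Constructed",
        ("Constructed", "Z3_del_config"): "Closed",
    },
    fake_rpc,
)
cfg = Z3Config(handle=1, state="Constructed")
cfg.Z3_set_param_value("model", "true")
cfg.Z3_del_config()
```

Because the wrapper tracks the automaton state, a call such as `cfg.Z3_set_param_value(...)` after `Z3_del_config` fails locally, before any remote call is made.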
  42. LibraryLink: Comparison

    | Approach | Wrapper/Receiver | Wrapper/Receiver core (language support) | Serialization required | Memory management coordination | Caching and prefetching |
    |---|---|---|---|---|---|
    | RPC | Manually written | For each language (N) | Yes | Manual | Manual |
    | FFI | Manually written | For each pair of languages (~N²) | No | Manual | Manual |
    | LibraryLink | Generated from an interface description | For each language (N) | No | Built-in | Inferred from a semantic model |
  43. LibraryLink: Current State

    • Cross-language integration is implemented in a pilot tool
    • Libraries can be written in C, Python, and Go
    • Wrappers can be generated for Java, Kotlin, and Go
    • The approach was tested on popular libraries: Requests (Python), Z3 (C), Jennifer (Go)
    • Performance is up to 90,000 calls per second (270,000 with prefetch)
  44. LibraryLink: LibSL Example

        // Z3 automaton
        automaton Z3_config {
            state Created, Constructed, Closed;
            shift Created -> Constructed (Z3_mk_config);
            shift Constructed -> self (Z3_mk_context, Z3_set_param_value);
            shift Constructed -> Closed (Z3_del_config);
        }
        …
        // Z3 API
        fun Z3_config.Z3_mk_config(): Z3_config;
        fun Z3_config.Z3_mk_context(cfg: self): Z3_context;
        …
        fun Z3_context.Z3_mk_and(cfg: self, num_args: Int, args: Z3_ast[]): Z3_ast;
        …
  45. LibraryLink: Kotlin Code Example

        cfg = Z3Kotlin.Z3_config()
        ctx = cfg.Z3_mk_context()
        symbol_x = ctx.Z3_mk_int_symbol(0)
        symbol_y = ctx.Z3_mk_int_symbol(1)
        /* De Morgan - with a negation around */
        /* !(!(x && y) <-> (!x || !y)) */
        not_x = ctx.Z3_mk_not(x)
        not_y = ctx.Z3_mk_not(y)
        args[0] = x
        args[1] = y
        x_and_y = ctx.Z3_mk_and(2, args)
  47. Applications: Integration Errors Detection

    • A library specification expresses the reference behavior of the library and sets the reference protocol of library usage
    • The main idea is to use dynamic analysis of the program to check the correctness of the library usage protocol
    • We record program traces and check their correspondence to the reference behavior

    V. Itsykson, M. Gusev. Automation of library usage correctness detection // Proceedings of SEIM conference, St. Petersburg, 2018.
  48. Instrumentation of Program and Trace Collecting

    • Specification-driven instrumentation
      – Only library-related elements of the program are instrumented (function calls, library object changes, etc.)
    • Program execution
      – The instrumented program is executed on tests
    • Trace collecting
      – logging all library API function calls
      – logging the full context: object IDs, handles, function parameters, etc.
  49. Library Usage Protocol Violations Detection

    • Mapping the collected program traces to library automata:
      – Filtering the trace to extract events related to concrete library objects (library object trace)
      – Replaying the library object trace (events) on the library model
      – Checking the correctness of events
    • Trace correctness checking:
      – Checking preconditions for each library function
      – Checking postconditions for each library function
      – Checking the validity of automaton state changes
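The filtering and replay steps can be sketched in Python as follows. The example assumes a trace is a list of `(object_id, function)` events and uses a small hypothetical stream automaton (`open`, `read`, `write`, `close`); precondition and postcondition checks are left out, and reporting a non-finished object as a possible resource leak is an assumption of this sketch.

```python
from collections import defaultdict

# Hypothetical stream automaton: "open" creates an object in state Opened.
CREATORS = {"open": "Opened"}
SHIFTS = {
    ("Opened", "read"): "Opened",
    ("Opened", "write"): "Opened",
    ("Opened", "close"): "Closed",
}
FINISH_STATES = {"Closed"}


def check_trace(trace):
    """Replay a program trace on the automaton and return protocol violations."""
    # Step 1: filter the trace into one event list per library object.
    per_object = defaultdict(list)
    for object_id, function in trace:
        per_object[object_id].append(function)

    violations = []
    for object_id, events in per_object.items():
        state = None  # the object does not exist until a creator is called
        # Step 2: replay the object's events and check each state change.
        for function in events:
            if state is None and function in CREATORS:
                state = CREATORS[function]
            elif (state, function) in SHIFTS:
                state = SHIFTS[(state, function)]
            else:
                violations.append((object_id, function, f"not allowed in state {state}"))
        # Step 3: an object that never reached a finish state may be leaked.
        if state is not None and state not in FINISH_STATES:
            violations.append((object_id, None, f"left in state {state} (possible resource leak)"))
    return violations


trace = [(1, "open"), (2, "open"), (1, "read"), (1, "close"), (1, "read"), (2, "write")]
report = check_trace(trace)
```

Here object 1 is read after being closed and object 2 is never closed, so the report contains one violation of each kind.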
  50. Library Usage Protocol Violations Detection

    [Diagram labels: Source Program, LibSL library description, Instrumentation, Instrumented Program, Execution, Trace, Correctness checking, Error Report]
  51. Library Usage Violations Detection: Evaluation

    • The above-mentioned approach was implemented as a pilot tool for Java programs
    • We created LibSL specifications for several libraries used in the apache/incubator-netbeans repository
    • Our tool was tested on several artificial projects and 400+ files from the apache/incubator-netbeans repository
    • The tool detected 300+ violations of the library usage protocol. Most of them are:
      – resource leaks
      – violations of library function preconditions
  53. Applications: Specification Mining

    • The main question of any specification-based approach: where can we get specifications for libraries?
    • The authors will not provide specifications for their libraries! (at least in the near future)
    • The programmer cannot create specifications because of a lack of knowledge about the libraries
    • The reasonable way is to use the knowledge of the programming community!
  54. Applications: Specification Mining

    • There are billions of open source projects available in software repositories (GitHub, Bitbucket, etc.)
    • Many of them use the target library
    • Our approach is to extract the library specification by learning from millions of projects that use the target library:
  55. Applications: Specification Mining

    • Finding all projects on GitHub that are written in Java and use the target library
    • Loading the appropriate projects
    • If the authors of a project do not provide any tests, generating tests for it
    • Running native or generated tests
    • Collecting project traces
    • Analyzing traces and generating predicates
    • Converting the collected predicates into a specification skeleton
    • Manually enriching the specification
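The "analyzing traces and generating predicates" step can be illustrated with a deliberately simple Python sketch. It is not the mining method from the talk: it assumes each trace is the ordered list of API calls made on one library object, and it derives only three kinds of predicates (which calls come first, which come last, and which call may directly follow which). The names `mine_predicates` and `conforms` are hypothetical.

```python
def mine_predicates(object_traces):
    """Collect ordering predicates from per-object call sequences."""
    first, last, follows = set(), set(), set()
    for calls in object_traces:
        if not calls:
            continue
        first.add(calls[0])
        last.add(calls[-1])
        follows.update(zip(calls, calls[1:]))  # adjacent call pairs
    return {"first": first, "last": last, "follows": follows}


def conforms(predicates, calls):
    """Check whether a new call sequence matches the mined predicates."""
    if not calls:
        return True
    if calls[0] not in predicates["first"] or calls[-1] not in predicates["last"]:
        return False
    return all(pair in predicates["follows"] for pair in zip(calls, calls[1:]))


observed = [
    ["open", "read", "close"],
    ["open", "write", "write", "close"],
]
skeleton = mine_predicates(observed)
```

Such a skeleton only reflects what the observed projects happened to do, which is why the last step of the process is manual enrichment of the specification.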
  57. Applications: What Else?

    In which other areas can we use formal library specifications?
    • Static analysis for detecting library usage violations
    • Using library semantics to form a behavioral predicate
    • Extending BMC-based static analysis tools to detect library usage violations based on reference library behavior
    • Static analysis for specification mining
    • Automated creation of library documentation
    • …
  58. Conclusion

    • We presented a new formalism for specifying the structure & behavior of external libraries
    • We presented LibSL, a new DSL for creating specifications in a user-friendly manner
    • We applied the formalism & language to several software engineering tasks:
      – Automated specification-driven porting of applications from one library to another
      – Static analysis enhancements for multicomponent projects
      – Cross-language integration: using a library written in one language from an application written in another language
      – Detection of library usage protocol violations
      – Mining of library specifications from open repositories
    • Future research directions:
      – Extending the static analysis tool to detect library usage violations
      – Using static analysis methods for specification mining
      – Automated creation of library documentation
  59. Contacts

    Vladimir Itsykson
    [email protected]
    Director of the Higher School of Intellectual Systems & Supercomputer Technologies, Peter the Great St. Petersburg Polytechnic University, Russia
    JetBrains Research laboratory “Verification & Program Analysis”