The troubles of modern dependency management and what to do about them

Georgios Gousios

March 21, 2019

Transcript

  1. The troubles of modern
    dependency management
    and what to do about them
    Georgios Gousios
    TU Delft

  2. Software reuse
    • System reuse

    • Application reuse

    • Component reuse

      • Libraries that come with the operating system

      • COTS components

      • OSS packages

    • Object/Function reuse

  3. Dependency management
    Library
    Dependencies

  4. Transitive dependencies
    csv-parser lists its dependencies,
    among them ndjson
    ndjson lists its own dependencies

  5. Package dependency networks
    • Dependencies on version ranges with
    semantic versioning

    • Online package repositories host all (?)
    released package versions

    • Package managers read dependency
    descriptors and download libraries

    • Transitive dependencies are
    downloaded automatically
    [Figure: strongly connected component of the Rust/Cargo packages (Kikas 2016)]
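
    The way a package manager follows dependency descriptors to collect transitive dependencies can be sketched as a graph traversal. This is only an illustration: apart from csv-parser depending on ndjson, every package name and edge in the table below is an assumption, and real package managers also resolve version ranges.

```python
from collections import deque

# Hypothetical dependency descriptors: package -> direct dependencies.
# Only the csv-parser -> ndjson edge comes from the slides; the rest is invented.
DIRECT_DEPS = {
    "app": ["csv-parser"],
    "csv-parser": ["ndjson", "minimist"],
    "ndjson": ["split2", "through2"],
    "minimist": [],
    "split2": ["through2"],
    "through2": [],
}


def transitive_closure(package, direct_deps):
    """Return every package reachable from `package` via dependency edges."""
    seen = set()
    queue = deque(direct_deps.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited through another path
        seen.add(dep)
        queue.extend(direct_deps.get(dep, []))
    return seen


closure = transitive_closure("app", DIRECT_DEPS)
print(sorted(closure))
```

    The application declares a single dependency, yet the traversal pulls in every package that dependency reaches.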

  6. Recent failures: leftpad
    A developer removed a library, consisting of just
    11 lines of code, from NPM, over a naming
    dispute.

    The internet broke in response.
    https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/

  7. Recent failures: Equifax
    • Security breach caused by an unapplied security update to
    an Apache Struts dependency that was not
    considered critical

    • Details of 143M user accounts stolen

    • > $4B in damages
    https://www.wired.com/story/equifax-breach-no-excuse/

  8. Recent failures: event-stream
    • Maintainer decides to transfer ownership of a
    popular project to a major contributor

    • The new maintainer adds Bitcoin-stealing
    code to the library

    • The library is downloaded 2M times a
    week

    • Vulnerability discovered 2 months later
    https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident

  9. What research tells us
    Ecosystems grow at breakneck speeds...
    • The average JavaScript project has 54 (Kikas et al. 2017) or 80 (Zimmermann et al. 2019) transitive
    dependencies

    • 50% of transitive dependency closures change within a period of 6 months on
    Cargo/Rust (Hejderup et al. 2019)

    ...and they deteriorate
    • Packages exist in RubyGems whose removal can bring down 500k (40%) other
    package versions (Kikas et al. 2017)

    • 391 highly influential maintainers affect more than 10k packages (Zimmermann et al.
    2019)

  11. What research tells us
    Developers don't update (Kula et al. 2017)
    • 85% of the dependencies are outdated in 50% of important Maven packages

    • No updates even in the case of security disclosures (70% were unaware)

    • "Too difficult!", "No tools!"

    Vulnerabilities proliferate
    • 1/4 of library downloads have a vulnerability (Comcast TR)

    • 1/3 of top 133k sites have a vulnerable dependency (Lauinger et al. 2017)

  13. Problems: The Developers’
    perspective
    • The observability problem: How can I know that one of
    my dependencies is outdated?

    • The update problem: How can I check if an updated
    dependency breaks my code?

    • The compliance problem: How do I know that I am not
    violating anyone’s copyrights?

    • The trust problem: How can I trust code I download from
    the Internet with my valuable data?

  14. Problems: The Maintainers’
    perspective
    • The update problem: How can I update my library without
    breaking clients? How can I notify important clients that I am
    about to break them?
    • The deprecation problem: How can I remove features from
    my library?

    • The unlawful use problem: How can I spot instances of my
    code being distributed without permission?

    • The lack of incentive problem: Why should I use my (free!)
    time to maintain a library that large corporations depend
    upon?
    + the problems that developers have!

  15. State of the art practices
    Dependency version pinning
    • Resolve dependencies and
    store the resolution in the repo

    • Protects against breakage due
    to updates on dependencies

    • Also “protects” against fast
    distribution of security
    updates
    https://www.publicdomainpictures.net/en/view-image.php?image=80963

  16. State of the art practices
    Monitoring services
    • Lots of services (GitHub,
    snyk.io, …) notify projects
    when new dependency
    versions are available

    • Rife with false positives

    • No help with updating

  17. The sorry state of the state
    of the art
    • Not much beyond simple package version matches (and
    a bit of compliance)

    • No support for assessing updates

    • No support for making decisions on which libraries to use

    • No support for maintainers
    We can do better than that!

  18. Are monorepos enough?
    Issue: Observability
    • Dependency manager: Update notification services
    • Monorepo: N/A (always on latest version)

    Issue: Updates
    • Dependency manager: Move responsibility to consumer. Version pinning. Client is expected
    to have tests/CI
    • Monorepo: Move responsibility to consumer (faster). Depend on builds + tests to
    catch semantic updates.

    Issue: Compliance
    • Dependency manager: No generic solution
    • Monorepo: No generic solution

    Issue: Impact
    • Dependency manager: Semantic versioning
    • Monorepo: Land and forget. Move responsibility to client.

    Issue: Unlawful / Improper use
    • Dependency manager: Special tools (FOSSology) or services (e.g. BlackDuck) but with
    tons of FPs.
    • Monorepo: N/A, usually deployed within a company

    Monorepos have the same problems, faster!

  19. Getting to the root cause
    State of the art tools analyze package relationships…
    …while reuse happens in the code
    [Figure: a Package Dependency Network (PDN), in which the packages App v1.0, Lib 1 v3.2
    and Lib 2 v0.2 are connected by «depends on» edges, shown next to a Call Dependency
    Network (CDN), in which the functions main(), foo(), bar(), used(), unused() and
    intern() inside those packages are connected by «calls» edges]

  20. Promises of Call-based
    Dependency Networks
    • Fully precise usage analysis

      • Does this vulnerability affect my code?

      • Am I linking to GPL code?

    • Fully precise impact analysis

      • How many clients will I break if I change this?

      • Can I safely update?

    • Effectively, augmenting soundness with more precision
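
    The question "does this vulnerability affect my code?" becomes a reachability query once dependencies are tracked at the function level. The sketch below assumes a tiny hand-written call graph; the function names echo the App/Lib figure, but the edges themselves are invented for illustration.

```python
# Hypothetical call edges (caller -> callees). The edges are assumptions,
# not data from the talk.
CALLS = {
    "App::main": ["Lib1::foo"],
    "Lib1::foo": ["Lib1::intern", "Lib2::used"],
    "Lib1::bar": ["Lib2::unused"],  # Lib1::bar is never called by App
    "Lib2::used": ["Lib2::intern"],
}


def reachable(entry, calls):
    """Return all functions reachable from `entry` by following call edges."""
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(calls.get(fn, []))
    return seen


def is_affected(entry, vulnerable_functions, calls):
    """True if any vulnerable function can be reached from `entry`."""
    return bool(reachable(entry, calls) & set(vulnerable_functions))


print(is_affected("App::main", ["Lib2::unused"], CALLS))
print(is_affected("App::main", ["Lib2::intern"], CALLS))
```

    A package-level analysis would flag App in both cases because App depends on Lib 2; the call-level query only flags it when the vulnerable function is actually reachable.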

  21. Präzi: A generic technique
    for building CDNs

  22. Präzi in a nutshell
    1. Retrieve all package versions for an ecosystem

    2. Generate call graphs for each package

    3. Build unique ids for nodes (functions)

    4. Link the call graphs
    https://cdn.pixabay.com/photo/2014/12/21/23/28/recipe-575434_960_720.png

  23. Resolving package dependencies
    • Repositories are not very strict in what they accept

      • Need to account for missing packages

      • Need to identify and fix dependency descriptors

      • Need to deal with compilation errors

    • Dependency version ranges make dependency
    graphs time-dependent

      • Resolution at t must only consider versions
      released before t

    • Global call-graphs need to be dynamic

    [Figure: A v3.0 depends on B with the version range 1.*; the version of B that the
    range resolves to differs between times t1 and t2]
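
    Time-dependent resolution can be sketched as follows: for a timestamp t, pick the latest version that matches the range and was released before t. The release history below is invented, and the range handling is a deliberate simplification (only `major.*` ranges), not how Cargo resolves versions.

```python
from datetime import date

# Hypothetical release history for package B (versions and dates are invented).
RELEASES = {
    "B": [
        ((1, 0), date(2017, 1, 10)),
        ((1, 1), date(2017, 6, 1)),
        ((1, 2), date(2018, 3, 15)),
        ((2, 0), date(2018, 9, 1)),
    ]
}


def resolve(package, major, at, releases):
    """Latest version matching `major.*` that was released strictly before `at`."""
    candidates = [
        version
        for version, released in releases[package]
        if version[0] == major and released < at
    ]
    return max(candidates) if candidates else None


print(resolve("B", 1, date(2018, 1, 1), RELEASES))
print(resolve("B", 1, date(2019, 1, 1), RELEASES))
```

    The same descriptor (B 1.*) resolves to different versions depending on the timestamp, which is why the resulting dependency and call graphs have to be treated as time-dependent.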

  24. Building call graphs
    • May need to compile packages, but toolchains etc.
    are not always compatible or available

    • 2 types of nodes:

      • Normal function calls (statically resolved)

      • Linkage points, when function calls cross
      dependencies (dynamically resolved)

    • Nodes can have arbitrary metadata: containing file,
    vulnerabilities, license, etc.

  25. Unique IDs and Unification
    After all functions have been assigned a unique ID, creating a CDN is a matter of

    cat *.callgraph | sort | uniq > ecosystem_callgraph
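
    The shell pipeline works because identical edges from different call graphs collapse once every function has a globally unique ID. A rough Python equivalent, assuming each per-package call graph is a list of (caller, callee) ID pairs (the edge lists below are invented examples):

```python
def unify(callgraphs):
    """Merge per-package edge lists into one sorted, duplicate-free edge list."""
    merged = set()
    for edges in callgraphs:
        merged.update(edges)
    return sorted(merged)


# Hypothetical per-package call graphs using already-unique function IDs.
lib1 = [
    ("io::crates::Lib1::bar", "io::crates::Lib2::used"),
    ("io::crates::Lib1::bar", "io::crates::Lib1::intern"),
]
lib2 = [("io::crates::Lib2::used", "io::crates::Lib2::intern")]
app = [
    ("App::main", "io::crates::Lib1::bar"),
    ("io::crates::Lib1::bar", "io::crates::Lib2::used"),  # duplicate of a lib1 edge
]

ecosystem = unify([lib1, lib2, app])
for caller, callee in ecosystem:
    print(caller, "->", callee)
```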

  26. Resolving call graphs for
    client applications
    1. Given a timestamp t, resolve the latest version of each
    dependency in the transitive closure released at some time t1 < t.

    2. Retrieve the call graph for each resolved package

    3. Identify linkage points and link them

    4. Analyze the client application and link it to the dependency call
    graph
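
    Steps 2 and 3 can be sketched as below, assuming step 1 has already produced a package-to-version map. The call graph layout, the `package@version/function` ID format and all names are assumptions made for this illustration, not Präzi's actual data model.

```python
# Hypothetical per-release call graphs. Each callee is a linkage point:
# it names a package and a function, but not yet a version.
CALLGRAPHS = {
    ("A", "3.0"): [("A::run", ("B", "B::parse"))],
    ("B", "1.1"): [("B::parse", ("B", "B::read"))],
    ("B", "1.2"): [("B::parse", ("B", "B::scan"))],
}


def node_id(package, version, function):
    """Build a version-qualified node ID (format is an assumption)."""
    return f"{package}@{version}/{function}"


def link(resolved, callgraphs):
    """Link the call graphs of the resolved versions.

    `resolved` maps package -> version chosen for timestamp t (step 1).
    """
    edges = set()
    for package, version in resolved.items():
        for caller, (target_pkg, callee) in callgraphs[(package, version)]:
            edges.add((
                node_id(package, version, caller),
                node_id(target_pkg, resolved[target_pkg], callee),
            ))
    return edges


early = link({"A": "3.0", "B": "1.1"}, CALLGRAPHS)
late = link({"A": "3.0", "B": "1.2"}, CALLGRAPHS)
print(sorted(early))
print(sorted(late))
```

    The call from A lands on a different version of B depending on the resolution, so the linked graph differs even though A itself did not change.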

  27. RustPräzi

  28. In a nutshell
    1. Retrieve & build packages: download packages, repair Cargo.toml, build, validate

    2. Generate call graphs (LLVM): output bitcode, generate call graph, demangle symbols
    (symbols such as _ZN9Lib23bar are demangled into paths such as Lib1::bar)

    3. Build unique ids: function paths are qualified with the repository, so Lib1::bar
    becomes io::crates::Lib1::bar and Lib1::bar::{{closure}} becomes
    io::crates::Lib1::bar::{{closure}}

    4. Link call graphs
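
    The "build unique ids" step can be sketched as prefixing each demangled function path with a repository namespace. The `io::crates` prefix is taken from the slide; the helper names and the edge-list layout are assumptions, and version information is left out for brevity.

```python
def unique_id(demangled_path, repository="io::crates"):
    """Qualify a demangled function path with the repository namespace."""
    return f"{repository}::{demangled_path}"


def qualify_edges(edges, repository="io::crates"):
    """Apply `unique_id` to both ends of every (caller, callee) edge."""
    return [
        (unique_id(caller, repository), unique_id(callee, repository))
        for caller, callee in edges
    ]


# Hypothetical demangled call edges for illustration.
edges = [("Lib1::bar", "Lib2::used"), ("Lib1::bar", "Lib1::bar::{{closure}}")]
for caller, callee in qualify_edges(edges):
    print(caller, "->", callee)
```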

  29. Building call graphs
    Snapshot of Feb 16, 2018
    • Downloaded: 13,991 packages, 79,724 releases

    • After cleaning invalid manifests: 12,307 packages, 72,947 releases

    • Call graphs generated: 10,831 packages, 49,844 releases (70%, ⏱ 69 hours)

    Top failure reasons
    • Released code has a syntax error (!)

    • Type checking got stricter

    • Language syntax changed

    • Conditional compilation

  30. Call graph statistics
    • Call graphs: 10,831 packages, 49,844 releases

    • Not merged CDN: 60M nodes, 176M edges

    • Final CDN: 7M nodes, 19M edges (8.6x reduction)

    Example node id: io::crates:url::2_0::Url::validate()

  31. Is Präzi sound?
    • The Rust PDN is de facto sound, but not precise

    • RustPräzi is precise by construction, but may not be sound

    • Rust PDN vs RustPräzi-extracted PDN’: 18k different edges

    • Sampled and manually analyzed 381 edges (95% confidence
    interval)

  32. PDN vs Präzi
    381 analysed edges
    • 35% improvements

      • Unused dependencies

      • Dependencies only used in test code

    • 65% open problems

      • Dynamic dispatch

      • Generic functions

      • Conditional compilation

      • Macros

    Präzi can be as sound as the
    call graph generator used

  33. Vulnerability propagation
    • 6 advisories, 13 functions

    • Effect estimation (packages): PDN 8k, Präzi 649

    • 482 manually investigated cases as ground truth (1st-level dependencies)

    Metric               PDN    Präzi
    Precision            0.3    1
    Recall (soundness)   1      0.53
    Accuracy             0.3    0.87
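
    For reference, precision, recall and accuracy are computed from true/false positives and negatives as in the sketch below. The counts used in the example calls are invented to show the two typical behaviours (flag everything vs. flag only what is reachable); they are not the study's data.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


# Invented counts: an analysis that flags all 10 packages, 3 of which are affected.
flag_everything = metrics(tp=3, fp=7, fn=0, tn=0)

# Invented counts: an analysis that flags 2 of the 3 affected packages and no others.
flag_reachable_only = metrics(tp=2, fp=0, fn=1, tn=7)

print(flag_everything)
print(flag_reachable_only)
```

    An analysis that flags every dependent never misses an affected package (recall 1), but its precision and accuracy both drop to the fraction of packages that are truly affected.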

  34. http://fasten-project.eu

  35. FASTEN in a nutshell
    • Präzi for Java, C and Python, including integration with package managers

    • Analyses on top of it:

      • Can I safely update?

      • Security vulnerability propagation

      • Dependency risk profiling

      • Compliance monitoring

    • A centralised service to host the graphs and serve the analyses

    • Getting the tools into the hands of developers

  36. FASTEN server
    [Figure: architecture diagram with the following components]
    • Inputs: package repositories (e.g. PyPi), project information, vulnerability
    information

    • CodeFeedr: software analytics as streams

    • FASTEN server: call-graph construction, storage layer, analysis layer (security,
    compliance, change impact, quality and risk)

    • Interfaces: REST API, Web UI

    • Users: developer, continuous integration server

  37. Example FASTEN workflow
    Updating a dependency

    Before FASTEN
    # Check outdated dependencies
    $ pip list --outdated
    Package    Version Latest Type
    ---------- ------- ------ -----
    Pygments   2.2.0   2.3.1  wheel
    # Update a package
    $ pip install --upgrade Pygments
    Collecting Pygments
    Downloading ...
    Successfully installed Pygments-2.3.1
    # Done, fingers crossed!

    After FASTEN
    # Check outdated dependencies
    $ pip list --outdated
    Package    Version Latest Type
    ---------- ------- ------ -----
    Pygments   2.2.0   2.3.1  wheel
    Updating Pygments will affect:
    foo.py: function colorize
    bar.py: function parse
    # Estimate update impact
    $ pip install --dry-run Pygments
    Function Pygments.Formatter.format[formatter.py]
    changed ->
    check application at colorize[foo.py]
    # Developer or CI runs tests
    # Update can continue
    $ pip install --upgrade Pygments
    Collecting Pygments
    Downloading ...
    Successfully installed Pygments-2.3.1
    # Done
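
    The impact estimate in the "After FASTEN" column boils down to walking call edges backwards from the functions that changed in the new release. A minimal sketch, assuming a hand-written call graph; the edges, the `module::function` naming and the helper name are all hypothetical and do not reflect FASTEN's real interface.

```python
# Hypothetical call edges of a client application using Pygments.
CALLS = {
    "foo.py::main": ["foo.py::colorize"],
    "foo.py::colorize": ["Pygments.Formatter.format"],
    "bar.py::parse": ["Pygments.Lexer.get_tokens"],
}


def impacted_callers(changed, calls, library_prefix="Pygments."):
    """Client functions that directly or transitively call a changed function."""
    # Invert the call graph so it can be walked from callee to caller.
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)

    seen, stack = set(), list(changed)
    while stack:
        fn = stack.pop()
        for caller in callers.get(fn, ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    # Report only functions that belong to the client, not to the library.
    return sorted(fn for fn in seen if not fn.startswith(library_prefix))


print(impacted_callers(["Pygments.Formatter.format"], CALLS))
```

    Only the functions on a call path to the changed function are reported, so bar.py::parse is left out when just Formatter.format changes.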

  38. http://dep.management

  39. The FASTEN project has received funding from the European Union’s Horizon 2020
    research and innovation programme under grant agreement No 825328.
    The opinions expressed in this document reflect only the author's view and in no way reflect the European Commission’s opinions. The European
    Commission is not responsible for any use that may be made of the information it contains.
