The troubles of modern dependency management and what to do about them

Georgios Gousios

March 21, 2019

Transcript

  1. Software reuse
    • System reuse
    • Application reuse
    • Component reuse
    • Libraries that come with the operating system
    • COTS components
    • OSS packages
    • Object/Function reuse
  2. Package dependency networks
    • Dependencies on version ranges with semantic versioning
    • Online package repositories host all (?) released package versions
    • Package managers read dependency descriptors and download libraries
    • Transitive dependencies are downloaded automatically
    Figure: strongly connected component of the Rust/Cargo packages (Kikas 2016)
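As a minimal sketch of what "transitive dependencies are downloaded automatically" means in practice, the following walks a set of dependency descriptors breadth-first and collects everything a package manager would end up fetching. The package names and the `DESCRIPTORS` mapping are hypothetical, and version ranges are ignored here for brevity.

```python
from collections import deque

# Hypothetical dependency descriptors: package -> direct dependencies.
DESCRIPTORS = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["url"],
    "log": [],
    "url": [],
}


def transitive_dependencies(root, descriptors):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(descriptors.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        # Follow the dependencies of the dependency (the transitive step).
        queue.extend(descriptors.get(pkg, []))
    seen.discard(root)  # guard against dependency cycles through root
    return seen


if __name__ == "__main__":
    print(sorted(transitive_dependencies("app", DESCRIPTORS)))
```

Although `app` declares only two dependencies, the closure contains four packages, which is how the sizes reported on the later slides come about.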
  3. Recent failures: left-pad
    A developer removed a library consisting of just 11 lines of code from NPM over a naming dispute. The internet broke in response.
    https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/
  4. Recent failures: Equifax
    • Security breach caused by an unapplied security update to an Apache Struts dependency that was not considered critical
    • Account details of 143M users stolen
    • > $4B in damages
    https://www.wired.com/story/equifax-breach-no-excuse/
  5. Recent failures: event-stream
    • Maintainer decides to transfer ownership of a popular project to a major contributor
    • The new maintainer adds Bitcoin-stealing code to the library
    • The library is downloaded 2M times a week
    • Vulnerability discovered 2 months later
    https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident
  6. What research tells us
    Ecosystems grow at breakneck speeds...
      • The average JavaScript project has 54 (Kikas et al. 2017) or 80 (Zimmermann et al. 2019) transitive dependencies
      • 50% of transitive dependency closures change within a period of 6 months on Cargo/Rust (Hejderup et al. 2019)
    ...and they deteriorate
      • Packages exist in RubyGems whose removal can bring down 500k (40%) other package versions (Kikas et al. 2017)
      • 391 highly influential maintainers affect more than 10k packages (Zimmermann et al. 2019)
  8. What research tells us
    Developers don't update (Kula et al. 2017)
      • 85% of the dependencies are outdated in 50% of important Maven packages
      • No updates even in the case of security disclosures (70% were unaware)
      • "Too difficult!", "No tools!"
    Vulnerabilities proliferate
      • 1/4 of library downloads have a vulnerability (Comcast TR)
      • 1/3 of top 133k sites have a vulnerable dependency (Lauinger et al. 2017)
  10. Problems: The Developers’ perspective
    • The observability problem: How can I know that one of my dependencies is outdated?
    • The update problem: How can I check if an updated dependency breaks my code?
    • The compliance problem: How do I know that I am not violating anyone’s copyrights?
    • The trust problem: How can I trust code I download from the Internet with my valuable data?
  11. Problems: The Maintainers’ perspective
    • The update problem: How can I update my library without breaking clients? How can I notify important clients that I am about to break them?
    • The deprecation problem: How can I remove features from my library?
    • The unlawful use problem: How can I spot instances of my code being distributed without permission?
    • The lack of incentive problem: Why should I use my (free!) time to maintain a library that large corporations depend upon?
    + the problems that developers have!
  12. State of the art practices: dependency version pinning
    • Resolve dependencies and store the resolution in the repo
    • Protects against breakage due to updates on dependencies
    • Also “protects” against fast distribution of security updates
    Image: https://www.publicdomainpictures.net/en/view-image.php?image=80963
  13. State of the art practices: monitoring services
    • Lots of services (GitHub, snyk.io, …) notify projects when new dependency versions are available
    • Rife with false positives
    • No help with updating
  14. The sorry state of the state of the art
    • Not much beyond simple package version matches (and a bit of compliance)
    • No support for assessing updates
    • No support for making decisions on which libraries to use
    • No support for maintainers
    We can do better than that!
  15. Are monorepos enough?
    • Observability
      Dependency manager: Update notification services
      Monorepo: N/A (always on latest version)
    • Updates
      Dependency manager: Move responsibility to consumer. Version pinning. Client is expected to have tests/CI
      Monorepo: Move responsibility to consumer (faster). Depend on builds + tests to catch semantic updates.
    • Compliance
      Dependency manager: No generic solution
      Monorepo: No generic solution
    • Impact
      Dependency manager: Semantic versioning
      Monorepo: Land and forget. Move responsibility to client.
    • Unlawful / Improper use
      Dependency manager: Special tools (FOSSology) or services (e.g. BlackDuck) but with tons of FPs.
      Monorepo: N/A, usually deployed within a company
    Monorepos have the same problems, faster!
  16. Getting to the root cause
    State of the art tools analyze package relationships… …while reuse happens in the code
    Figure: a Package Dependency Network (PDN) next to a Call Dependency Network (CDN). In the PDN, App v1.0, Lib 1 v3.2 and Lib 2 v0.2 are connected by «depends on» edges. In the CDN, the functions main(), foo(), bar(), used(), unused() and intern() of the same packages are connected by «calls» edges.
  17. Promises of Call-based Dependency Networks
    • Fully precise usage analysis
      • Does this vulnerability affect my code?
      • Am I linking to GPL code?
    • Fully precise impact analysis
      • How many clients will I break if I change this?
      • Can I safely update?
    • Effectively, augmenting soundness with more precision
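To illustrate the usage-analysis question "Does this vulnerability affect my code?", here is a minimal sketch of call-level reachability. The function names echo the figure on slide 16, but the exact call edges in `CALLS` are an assumption made for this example, not taken from the slides.

```python
# Assumed call edges (caller -> callees); names follow the slide 16 figure.
CALLS = {
    "app::main": ["lib1::foo"],
    "lib1::foo": ["lib2::used", "lib1::intern"],
    "lib1::bar": ["lib2::unused"],
    "lib2::used": ["lib2::intern"],
    "lib2::unused": [],
    "lib1::intern": [],
    "lib2::intern": [],
}


def reachable(entry, calls):
    """Return all functions reachable from entry, including entry itself."""
    seen, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(calls.get(fn, []))
    return seen


def is_affected(entry, vulnerable, calls):
    """True if any vulnerable function can be reached from entry."""
    return bool(reachable(entry, calls) & set(vulnerable))


if __name__ == "__main__":
    # A package-level analysis would flag both cases, since app depends on lib2.
    print(is_affected("app::main", {"lib2::used"}, CALLS))
    print(is_affected("app::main", {"lib2::unused"}, CALLS))
```

Under these assumed edges, a vulnerability in `lib2::unused` is never reached from `app::main`, so the call-level analysis does not report it, while a package-level analysis would.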
  18. Präzi in a nutshell
    1. Retrieve all package versions for an ecosystem
    2. Generate call graphs for each package
    3. Build unique ids for nodes (functions)
    4. Link the call graphs
    Image: https://cdn.pixabay.com/photo/2014/12/21/23/28/recipe-575434_960_720.png
  19. Resolving package dependencies
    • Repositories are not very strict in what they accept
      • Need to account for missing packages
      • Need to identify and fix dependency descriptors
      • Need to deal with compilation errors
    • Dependency version ranges make dependency graphs time-dependent
      • Resolution at t must only consider versions released before t
      • Global call-graphs need to be dynamic
    Figure: A v3.0 depends on B with the range 1.*. B v1.1 is released at t1 and B v1.2 at t2, so the resolved dependency changes over time.
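The rule "resolution at t must only consider versions released before t" can be sketched as follows. This is a simplified illustration, not the Präzi implementation: the `RELEASES` index, the integer timestamps, and the wildcard-only range syntax are all assumptions chosen to mirror the A/B example in the figure.

```python
# Hypothetical release index: package -> list of (version, release_time).
# Times are plain integers standing in for t1 = 10 and t2 = 20.
RELEASES = {"B": [("1.1", 10), ("1.2", 20), ("2.0", 30)]}


def parse(version):
    """Turn '1.2' into (1, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def matches(version, version_range):
    """Match exact versions and wildcard ranges such as '1.*' only."""
    wanted = version_range.split(".")
    have = version.split(".")
    for want, got in zip(wanted, have):
        if want == "*":
            return True
        if want != got:
            return False
    return len(wanted) == len(have)


def resolve_at(package, version_range, t, releases):
    """Latest version in range that was released strictly before time t."""
    candidates = [
        version
        for version, released in releases.get(package, [])
        if released < t and matches(version, version_range)
    ]
    return max(candidates, key=parse) if candidates else None


if __name__ == "__main__":
    print(resolve_at("B", "1.*", 15, RELEASES))  # between t1 and t2
    print(resolve_at("B", "1.*", 25, RELEASES))  # after t2
```

The same descriptor (`B 1.*`) resolves to different versions depending on the timestamp, which is why a global call graph built from resolved dependencies has to be treated as dynamic.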
  20. Building call graphs
    • (Maybe) Need to compile packages: toolchains etc. are not always compatible or available
    • 2 types of nodes:
      • Normal function calls (statically resolved)
      • Linkage points, when function calls cross dependencies (dynamically resolved)
    • Nodes can have arbitrary metadata: containing file, vulnerabilities, license, etc.
  21. Unique IDs and Unification
    After all functions have been assigned a unique ID, creating a CDN is a matter of
        cat *.callgraph | sort | uniq > ecosystem_callgraph
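The shell pipeline above works because, once every function has a globally unique ID, merging call graphs reduces to taking the set union of their edges. A rough Python equivalent is sketched below; the edge format (one `caller callee` pair per line) is an assumption, since the slides do not show the file layout. Unlike `sort | uniq`, this version also trims whitespace and skips blank lines.

```python
def unify(callgraphs):
    """Merge per-package edge lists into one sorted, duplicate-free list.

    callgraphs: iterable of iterables of edge lines ("caller callee").
    """
    edges = set()
    for graph in callgraphs:
        for line in graph:
            line = line.strip()
            if line:  # ignore blank lines
                edges.add(line)
    return sorted(edges)


if __name__ == "__main__":
    # Hypothetical edge lists for two packages that share one edge.
    lib1 = [
        "io::crates::Lib1::bar io::crates::Lib2::used",
        "io::crates::Lib1::bar io::crates::Lib1::intern",
    ]
    lib2 = [
        "io::crates::Lib2::used io::crates::Lib2::intern",
        "io::crates::Lib1::bar io::crates::Lib2::used",
    ]
    for edge in unify([lib1, lib2]):
        print(edge)
```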
  22. Resolving call graphs for client applications
    1. Given a timestamp t, resolve, for each dependency in the transitive closure, the latest version released at some time t1 < t
    2. Retrieve the call graph for each resolved package
    3. Identify linkage points and link them
    4. Analyze the client application and link it to the dependency call graph
  23. In a nutshell
    Pipeline stages: retrieve & build packages, generate call graphs, build unique ids, link call graphs
    • Retrieve & build packages: download packages, repair Cargo.toml, build, validate
    • Generate call graphs (LLVM): output bitcode, generate call graph, demangle symbols
      Mangled symbols: _ZN9Lib23bar, _ZN9Lib23used, _ZN9Lib23intern
      Demangled symbols: Lib1::bar, Lib2::used, Lib1::intern
    • Build unique ids:
      Lib1::bar becomes io::crates::Lib1::bar
      Lib1::bar::{{closure}} becomes io::crates::Lib1::bar::{{closure}}
      L::f<L1::T1, L2::Type2>::g becomes io::crates::L::f<io::crates::L1::T1, io::crates::L2::Type2>::g
    • Link call graphs: edges such as io::crates::Lib1::bar to io::crates::Lib2::used connect functions across packages
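The "build unique ids" step can be approximated with a small rewrite over demangled symbols. The sketch below only reproduces the three examples shown on the slide: it prefixes every path that starts a symbol or a generic argument with `io::crates::`. The regular expression and the `qualify` helper are assumptions for illustration; the real tool presumably also encodes package versions and handles more symbol shapes.

```python
import re

# A path start is an identifier followed by "::" that is not itself
# preceded by an identifier character or ":" (i.e. not a later segment).
_PATH_START = re.compile(r"(?<![\w:])([A-Za-z_]\w*::)")


def qualify(symbol, prefix="io::crates::"):
    """Prefix each top-level path in a demangled symbol with the registry id."""
    return _PATH_START.sub(lambda m: prefix + m.group(1), symbol)


if __name__ == "__main__":
    for symbol in (
        "Lib1::bar",
        "Lib1::bar::{{closure}}",
        "L::f<L1::T1, L2::Type2>::g",
    ):
        print(symbol, "->", qualify(symbol))
```

Note that a bare symbol without any `::` (for example `main`) is left untouched by this sketch, and applying `qualify` twice would add the prefix twice.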
  24. Building call graphs
    • Snapshot of Feb 16, 2018: 13,991 packages, 79,724 releases
    • After cleaning invalid manifests: 12,307 packages, 72,947 releases
    • Call graphs generated: 10,831 packages, 49,844 releases (⏱ 69 hours, 70%)
    Top failure reasons
    • Released code has a syntax error (!)
    • Type checking got stricter
    • Language syntax changed
    • Conditional compilation
  25. Call graph statistics
    • Call graphs: 10,831 packages, 49,844 releases
    • Not merged CDN: 60M nodes, 176M edges
    • Final CDN: 7M nodes, 19M edges (8.6x reduction)
    • Example node: io::crates:url::2_0::Url::validate()
  26. Is Präzi sound?
    • The Rust PDN is de facto sound, but not precise
    • RustPräzi is precise by construction, but may not be sound
    • Rust PDN vs RustPräzi-extracted PDN’: 18k different edges
    • Sampled and manually analyzed 381 edges (95% confidence interval)
  27. PDN vs Präzi
    381 analysed edges: 35% improvements, 65% open problems
    • Unused dependencies
    • Dependencies only used in test code
    • Dynamic dispatch
    • Generic functions
    • Conditional compilation
    • Macros
    Präzi can be as sound as the call graph generator used
  28. Vulnerability propagation
    • 6 advisories, 13 functions
    • Effect estimation (packages): PDN 8k, Präzi 649
    • 482 manually investigated cases as ground truth (1st-level dependencies)
    • PDN: precision 0.3, recall (soundness) 1, accuracy 0.3
    • Präzi: precision 1, recall (soundness) 0.53, accuracy 0.87
  29. FASTEN in a nutshell
    • Präzi for Java, C and Python, incl. integration into package managers
    • Analyses on top of it:
      • Can I safely update?
      • Security vulnerability propagation
      • Dependency risk profiling
      • Compliance monitoring
    • A centralised service to host the graphs and serve the analyses
    • Getting the tools into the hands of developers
  30. FASTEN server
    Figure: architecture of the FASTEN server.
    • Inputs: package repositories (e.g. PyPi), project information, vulnerability information
    • CodeFeedr: software analytics as streams
    • Storage layer: call graphs of packages (A, B) with functions a(), b(), c(), x(), y(), z())
    • Analysis layer: call-graph construction, security, compliance, change impact, quality and risk
    • Access: REST API and Web UI, used by the developer and the continuous integration server
  31. Example FASTEN workflow: updating a dependency
    Before FASTEN:
        # Check outdated dependencies
        $ pip list --outdated
        Package    Version Latest Type
        ---------- ------- ------ -----
        Pygments   2.2.0   2.3.1  wheel
        # Update a package
        $ pip install --upgrade Pygments
        Collecting Pygments
        Downloading ...
        Successfully installed Pygments-2.3.1
        # Done, fingers crossed!
    After FASTEN:
        # Check outdated dependencies
        $ pip list --outdated
        Package    Version Latest Type
        ---------- ------- ------ -----
        Pygments   2.2.0   2.3.1  wheel
        Updating Pygments will affect:
          foo.py: function colorize
          bar.py: function parse
        # Estimate update impact
        $ pip install --dry-run Pygments
        Function Pygments.Formatter.format[formatter.py] changed -> check application at colorize[foo.py]
        # Developer or CI runs tests
        # Update can continue
        $ pip install --upgrade Pygments
        Collecting Pygments
        Downloading ...
        Successfully installed Pygments-2.3.1
        # Done
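The "estimate update impact" step in the workflow above amounts to a reverse traversal of the call graph: start from the functions that changed in the dependency and walk back to the application functions that reach them. The sketch below is an illustration under assumed data, not FASTEN's implementation; the call edges in `CALLS` are hypothetical, and only `Pygments.Formatter.format`, `colorize` and `parse` come from the slide.

```python
from collections import defaultdict

# Hypothetical call edges (caller -> callees) for the application on the slide.
CALLS = {
    "app.py:main": ["foo.py:colorize", "bar.py:parse"],
    "foo.py:colorize": ["Pygments.Formatter.format"],
    "bar.py:parse": ["Pygments.Lexer.get_tokens"],
}


def affected_callers(changed, calls):
    """Return every function that directly or transitively calls a changed one."""
    # Invert the graph once: callee -> set of callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    affected, stack = set(), list(changed)
    while stack:
        fn = stack.pop()
        for caller in callers.get(fn, ()):
            if caller not in affected:
                affected.add(caller)
                stack.append(caller)
    return affected


if __name__ == "__main__":
    changed = {"Pygments.Formatter.format"}
    for fn in sorted(affected_callers(changed, CALLS)):
        print("check application at", fn)
```

With these assumed edges, a change to `Pygments.Formatter.format` flags `foo.py:colorize` and its caller `app.py:main`, but not `bar.py:parse`, so tests can be focused on the affected paths.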
  32. The FASTEN project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 825328. The opinions expressed in this document reflect only the author's view and in no way reflect the European Commission’s opinions. The European Commission is not responsible for any use that may be made of the information it contains.