From SBOM to Call Graphs: Harnessing OSS Tools to Streamline Update Impact Analysis in Cloud Services

This slide deck was presented at the Critical Software Summit of Open Source Summit Japan 2023, held on December 6, 2023.
https://sched.co/1Tyrg

Modern cloud services rely heavily on open-source software, which makes continuous maintenance a challenge. For instance, web applications developed in Node.js can contain over 100 packages, which may account for up to 90% of the entire source code. Frequent vulnerability reports and package EOLs necessitate constant updates, but understanding these updates and their potential cascading effects on an application is complex. Building on academic research and OSS tools such as call graph tools, AST diff tools, git, and NetworkX, this talk presents a method to pinpoint how updates impact an application. By integrating this into the PSIRT system, we aim to expedite update decisions and define a clearer verification scope. We'll share insights from implementing it in our in-house system.

Noboru Iwamatsu

December 07, 2023

Transcript

  1. #OSSummit From SBOM to Call Graphs: Harnessing OSS Tools to

    Streamline Change Impact Analysis in Cloud Services Noboru Iwamatsu @FUJITSU LIMITED
  2. #ossummit Introduction • Noboru Iwamatsu – Senior Director, Head of

    Update as a Service Planning Office @Fujitsu – Formerly, OpenStack-based IaaS architect, Cloud, Hypervisor, OS researcher, … – Has contributed to OpenStack, Ansible, Xen, Linux, … https://www.fujitsu.com/global/about/global-fde/noboru-iwamatsu/ https://www.linkedin.com/in/noboru-iwamatsu/
  3. #ossummit Agenda • Software Supply Chain Attack • Understanding SBOM

    • Update Change Impact Analysis • Experimental Results • Conclusion
  4. #ossummit Software Supply Chain Attack • What is a Software

    Supply Chain Attack? • An increasingly common type of cyberattack. • Injects malicious code into third-party components, compromising the integrity and security of final software products. • Why the Risk is Increasing • Modern software heavily relies on numerous third-party components. • The extensive use of Open Source Software (OSS) means that vulnerabilities or malicious code in OSS can lead to widespread and severe impacts.
  5. #ossummit Understanding SBOM (Software Bill of Materials) • What is

    SBOM (Software Bill of Materials)? • A comprehensive list of all components in a software product. • Key defense against supply chain attacks: Referenced in Executive Order on Improving the Nation's Cybersecurity, The White House, May 12, 2021 • Enhances Transparency: • Clarifies all software parts, origins, and licenses. • Detects Vulnerabilities: • Scans each component for security risks. • Prompt Response: • Fast identification and fix for security problems in components.
  6. #ossummit Open Source SBOM Tools and Solutions • SBOM Generator

    • SBOM tools and solutions are increasingly becoming available, both in Open Source and commercial products. • SBOM Scanner https://github.com/openclarity/kubeclarity https://github.com/CycloneDX/cyclonedx-gomod https://github.com/anchore/syft/ https://github.com/anchore/grype https://github.com/aquasecurity/trivy https://github.com/microsoft/sbom-tool https://github.com/openclarity/vmclarity
  7. #ossummit SBOM Management Example • Integrating SBOM tools into DevOps

    – Essential for clear software component tracking and risk management. • Escalating Remediation Effort – Developers must now address an increasing volume of complex security risks. Plan Develop Test Operate SBOM DB SBOM Generation scan Remediation Alert Vulnerability Information Developer Incident Management Deliver
  8. #ossummit Evolving Our Update Approach • Ad-Hoc and Problematic Update

    Approach – Briefly skim release notes: may miss key changes! – Update automatically by package manager: blindly trusting the package manager is risky! – Run existing test suite: the coverage is less comprehensive than assumed. (Diagram labels: Plan, Develop, Test, Alert, Unchecked Release!) • Our Proposed Approach – Evaluate all impacting changes and dependencies in advance. – Manage dependencies and security. – Ensure sufficient test coverage for changes. (Diagram labels: Develop, Test, Plan, Update Change Impact Analysis, Alert, Release)
  9. #ossummit Update Change Impact Analysis: Basic Idea • Inspired by

    this study and its implementation “Uppdatera”. • Uppdatera (https://github.com/jhejderup/uppdatera) – Change Impact Analysis for Automated Dependency Updates in Maven Projects https://www.sciencedirect.com/science/article/pii/S0164121221001941
  10. #ossummit Update Change Impact Analysis: Objectives • The study calculated

    the test coverage for direct/indirect dependencies for 521 Java Projects, and claims: – Test coverage and its ability to detect defects is insufficient. – Static analysis proves to be approximately twice as effective. • Uppdatera implements: – Analyze semantic code changes and call graphs to identify the impacting changes. Inspired by prior research, we have developed our own “Update Change Impact Analysis” system, leveraging OSS tools to evaluate our in-house Node.js applications.
  11. #ossummit UCIA (Update Change Impact Analysis) Architecture

    Stages: Update Simulation, Semantic Change Detection, Call Graph Construction, Reachability Analysis, Change History Mapping. Diagram labels: app source code; package manager build; current env. and updated env.; package source code; file diff; AST diff; pre-update and post-update call graphs; specify source repo; git clone and git log (github); Change Impact List.
  12. #ossummit UCIA: Update Simulation

    • Set up pre- and post-update environments.
  13. #ossummit Build/Install Pre-/Post-Update Environment • Build/Install the pre-update environment from

    application source code. – Launch Node.js in a Docker container. – Obtain the application source code. – Install the exact dependencies described in the package-lock.json: % npm ci --omit=dev (https://docs.npmjs.com/cli/v10/commands/npm-ci) • Perform the update: # modify the package.json, then % npm update <pkg> --omit=dev
  14. #ossummit UCIA: Semantic Change Detection

    • Find only the differences in program code between the pre- and post-update environments. • Extract only the semantic changes between the two versions of the code.
  15. #ossummit Find Only Differences in Program Code

    • Identify the package version differences – Analyze package dependencies and compare the versions of the same package (e.g. "version": "1.2.0" vs. "version": "1.2.1" in package.json). • Find the modified program files – Exclude docs, include only .js/.ts files (test and sample files are unavoidably retained). – Use diff to check for differences in the source code files. (Figure: the <pre-update directory> and <post-update directory> trees share the same layout: app/node_modules/ with packages aaa, bbb, ccc, ...; package aaa contains LICENSE, README.md, index.js and package.json. The figure marks version differences and file differences between the two trees.)
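
The two steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the UCIA implementation: the function names are hypothetical, only top-level and scoped packages directly under `node_modules` are considered (nested `node_modules` are ignored), and a byte-for-byte comparison via `filecmp` stands in for the `diff` call mentioned on the slide.

```python
import filecmp
import json
from pathlib import Path

SOURCE_SUFFIXES = {".js", ".ts"}  # program files only; docs such as README.md are skipped


def package_versions(app_dir: Path) -> dict[str, str]:
    """Map each package directly under node_modules to its package.json version."""
    root = app_dir / "node_modules"
    versions = {}
    # The second pattern covers scoped packages such as @azure/identity.
    for pattern in ("*/package.json", "@*/*/package.json"):
        for manifest in root.glob(pattern):
            name = manifest.parent.relative_to(root).as_posix()
            data = json.loads(manifest.read_text(encoding="utf-8"))
            versions[name] = data.get("version", "")
    return versions


def source_files(package_dir: Path) -> set[str]:
    """Relative paths of all .js/.ts files inside one package directory."""
    return {
        path.relative_to(package_dir).as_posix()
        for path in package_dir.rglob("*")
        if path.is_file() and path.suffix in SOURCE_SUFFIXES
    }


def changed_source_files(pre_dir: Path, post_dir: Path) -> dict[str, list[str]]:
    """For packages whose version changed, list added, removed or modified source files."""
    pre_versions = package_versions(pre_dir)
    post_versions = package_versions(post_dir)
    changed = {}
    for name in sorted(pre_versions.keys() & post_versions.keys()):
        if pre_versions[name] == post_versions[name]:
            continue  # same version on both sides: nothing to compare
        pre_pkg = pre_dir / "node_modules" / name
        post_pkg = post_dir / "node_modules" / name
        pre_files, post_files = source_files(pre_pkg), source_files(post_pkg)
        changed[name] = [
            rel
            for rel in sorted(pre_files | post_files)
            if rel not in pre_files
            or rel not in post_files
            or not filecmp.cmp(pre_pkg / rel, post_pkg / rel, shallow=False)
        ]
    return changed
```

Only the files returned here would need to be passed on to the AST-level comparison in the next step.
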
  16. #ossummit Extract Only Semantic Changes between Codes • Use GumTree

    to extract functions that have semantically changed. • GumTree (https://github.com/GumTreeDiff/gumtree) – A syntax-aware diff tool based on the Abstract Syntax Tree (AST). – Academic origin: Fine-grained and accurate source code differencing * – Supported languages: C, C++, C#, Go, Java, JavaScript, Python, R, Ruby, .. (Figure: pre-update and post-update source code are parsed into ASTs; GumTree diffs them into AST differences (added +, deleted -, updated); the locations of those differences are mapped to functions to produce the Changed Function List.) *ASE '14: Proceedings of the 29th ACM/IEEE International Conference on Automated Software
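
The "Locations to Functions" step can be illustrated with a small Python sketch. It assumes that each AST difference comes with a character range in the source file and that the start and end offsets of every function are already known from a parser. The `FunctionSpan` type and both helper functions are hypothetical names, and treating the ranges as half-open is an assumption.

```python
from typing import NamedTuple, Optional


class FunctionSpan(NamedTuple):
    name: str
    start: int  # offset of the first character of the function
    end: int    # offset one past the last character (assumed half-open)


def enclosing_function(
    functions: list[FunctionSpan], start: int, end: int
) -> Optional[FunctionSpan]:
    """Return the innermost function that fully contains the changed range."""
    candidates = [f for f in functions if f.start <= start and end <= f.end]
    # The smallest containing span is the innermost (most specific) function.
    return min(candidates, key=lambda f: f.end - f.start, default=None)


def changed_functions(
    functions: list[FunctionSpan], changed_ranges: list[tuple[int, int]]
) -> list[str]:
    """Collapse changed ranges into a de-duplicated list of function names."""
    names: list[str] = []
    for start, end in changed_ranges:
        function = enclosing_function(functions, start, end)
        if function is not None and function.name not in names:
            names.append(function.name)
    return names
```

Changes that fall outside every function (for example at module top level) are simply dropped in this sketch; a real tool would need a policy for them.
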
  17. #ossummit Example: Diff Output vs. GumTree Output % diff -u

    <express 4.17.1>/lib/response.js <express 4.17.2>/lib/response.js … if (arguments.length === 2) { - // res.json(body, status) backwards compat + // res.jsonp(body, status) backwards compat if (typeof arguments[1] === 'number') { - deprecate('res.jsonp(obj, status): Use res.status(status).json(obj) instead'); + deprecate('res.jsonp(obj, status): Use res.status(status).jsonp(obj) instead'); this.statusCode = arguments[1]; } else { … - * or when an error occurs. Be sure to check `res.sentHeader` + * or when an error occurs. Be sure to check `res.headersSent` … - * 'appliation/json': function(){ + * 'application/json': function () { … % gumtree cluster <express 4.17.1>/lib/response.js <express 4.17.2>/lib/response.js … New cluster: UPDATE from 'res.jsonp(obj, status): Use res.status(status).json(obj) instead' to 'res.jsonp(obj, status): Use res.status(status).jsonp(obj) instead' ------------ === update-node --- Literal: 'res.jsonp(obj, status): Use res.status(status).json(obj) instead' [6600,6666] replace 'res.jsonp(obj, status): Use res.status(status).json(obj) instead' by 'res.jsonp(obj, status): Use res.status(status).jsonp(obj) instead' ignored ignored ignored update detected https://github.com/expressjs/express/compare/4.17.1...4.17.2
  18. #ossummit UCIA: Call Graph Construction

    • Construct call graphs for both pre- and post-update software.
  19. #ossummit Jelly, A Call Graph Constructor for Node.js Platform •

    Jelly (https://github.com/cs-au-dk/jelly) – Static analyzer for constructing call graphs. – For JavaScript/TypeScript programs running on the Node.js platform. – The analyzer design is based on ideas from JAM [1], TAPIR [2] and ACG [3]. [1] Benjamin Barslev Nielsen, Martin Toldam Torp, Anders Møller: Modular call graph construction for security scanning of Node.js applications. ISSTA 2021: 29-41 [2] Anders Møller, Benjamin Barslev Nielsen, Martin Toldam Torp: Detecting locations in JavaScript programs affected by breaking library changes. Proc. ACM Program. Lang. 4(OOPSLA): 187:1-187:25 (2020) [3] Asger Feldthaus, Max Schäfer, Manu Sridharan, Julian Dolby, Frank Tip: Efficient construction of approximate call graphs for JavaScript IDE services. ICSE 2013: 752-761
  20. #ossummit Construct Call Graphs by Jelly • To prevent out-of-memory

    errors, create multiple call graphs by splitting the dependent packages to be loaded. – First run: Ignore dependencies to create a baseline call graph. – From the second run onward: Load each dependent package. % NODE_OPTIONS=--max-old-space-size=250000 jelly --ignore-dependencies -j <output.json> -m <output.html> -i <timeout> --loglevel verbose <application dir> % NODE_OPTIONS=--max-old-space-size=250000 jelly --include-packages <direct dependent package> -j <output.json> -m <output.html> -i <timeout> --loglevel verbose <application dir>
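
A driver for this split could look like the Python sketch below. It only assembles the command lines, using exactly the flags shown on the slide; the function name, the output file naming scheme and the default heap size are assumptions made for illustration.

```python
def jelly_commands(app_dir: str, direct_dependencies: list[str], timeout: int,
                   heap_mb: int = 250000):
    """Build one baseline Jelly run plus one run per direct dependency."""
    # Environment variable taken from the slide; merge it into os.environ when running.
    env = {"NODE_OPTIONS": f"--max-old-space-size={heap_mb}"}
    common = ["-i", str(timeout), "--loglevel", "verbose", app_dir]

    # First run: ignore dependencies to get the baseline call graph.
    runs = [(
        "baseline",
        ["jelly", "--ignore-dependencies",
         "-j", "baseline.json", "-m", "baseline.html", *common],
    )]

    # Later runs: load one direct dependency at a time.
    for package in direct_dependencies:
        stem = package.replace("/", "__")  # hypothetical naming for scoped packages
        runs.append((
            package,
            ["jelly", "--include-packages", package,
             "-j", f"{stem}.json", "-m", f"{stem}.html", *common],
        ))
    return env, runs
```

Each returned command list can be passed to `subprocess.run(command, env={**os.environ, **env})`; keeping one output file per run is what allows the partial call graphs to be merged afterwards.
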
  21. #ossummit Call Graph Data: Embedded JSON in HTML Output {

    "graphs": [ { "kind": "callgraph", "elements": [ { "data": { "id": 1, "kind": "package", "name": "aaa", "fullName": "aaa@<version>", "callWeight": 100, "callCount": 3971, "isEntry": "true", "isReachable": "true" }, }, ... ... { "data": { "kind": "call", "source": 152, "target": 150 }, }, ... } Nodes (in graph theory): "kind": "package", "kind": "module", "kind": "function". Edges of directed graph: "kind": "call".
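
Turning this structure into a directed graph is a matter of separating node elements from `call` elements. The Python sketch below uses only the field names visible on the slide. It assumes the JSON has already been extracted from the HTML file and that every non-call element carries an `id`; the function name is hypothetical.

```python
import json
from collections import defaultdict


def load_call_graph(document: str):
    """Split call graph elements into nodes and directed call edges."""
    data = json.loads(document)
    graph = next(g for g in data["graphs"] if g["kind"] == "callgraph")

    nodes = {}                     # node id -> node attributes
    successors = defaultdict(set)  # caller id -> set of callee ids
    for element in graph["elements"]:
        item = element["data"]
        if item["kind"] == "call":
            successors[item["source"]].add(item["target"])
        else:  # "package", "module" or "function"
            nodes[item["id"]] = item
    return nodes, dict(successors)
```

With NetworkX, the same data would go into an `nx.DiGraph` via `add_node` and `add_edge`; plain dictionaries are used here only to keep the sketch free of third-party dependencies.
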
  22. #ossummit UCIA: Reachability Analysis

    • Link semantically changed functions to functions on the call graphs. • Analyze the reachability of the changed functions.
  23. #ossummit Reachability Analysis by NetworkX

    • Link changed functions to call graphs. • Analyze reachability of the changed functions. (Figure: pre-update and post-update call graphs over the nodes App, A, B, C and D; functions from the Changed Function List are marked as deleted (-) or added (+), and the impacting call-path is highlighted.)
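
The reachability check itself is a graph traversal from the application's entry node. The talk uses NetworkX for this; the sketch below substitutes a plain breadth-first search so that it runs without third-party packages. The function names are hypothetical, and the rule of checking deleted functions against the pre-update graph and all other changes against the post-update graph is an assumption about how the two graphs are combined.

```python
from collections import deque


def reachable_from(successors: dict, entry) -> set:
    """All nodes reachable from entry by following call edges (BFS)."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for callee in successors.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def impacting_changes(pre_graph: dict, post_graph: dict, entry, changed: dict) -> dict:
    """Keep only the changed functions that the application can actually reach.

    changed maps a function to its state: "added", "deleted" or "updated".
    """
    pre_reach = reachable_from(pre_graph, entry)
    post_reach = reachable_from(post_graph, entry)
    impacting = {}
    for function, state in changed.items():
        # A deleted function exists only in the pre-update graph.
        reach = pre_reach if state == "deleted" else post_reach
        if function in reach:
            impacting[function] = state
    return impacting
```

In NetworkX terms, `nx.descendants(graph, entry)` returns the same reachable set (without the entry node itself), and `nx.all_simple_paths` can enumerate the individual call paths from the application to each changed function.
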
  24. #ossummit UCIA: Change History Mapping

    • Map function changes to git commit ids.
  25. #ossummit Package Version to Source Code Tag • Retrieve the

    source code – Use npm to get repository information, then clone it. • Select the correct tag – “Version number” and “tag name” do not always match! • “1.2.3” vs. “v1.2.3”, “1_2_3”, … • Monorepo case: “4.0.0” -> “@azure/identity_4.0.0” • Use multiple similarity algorithms (N-gram, Gestalt, and Levenshtein distance) to identify the most similar tag. – Some repositories don’t have tags… % npm view <package name> repository.url git+https://github.com/<path>/<package name>.git
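
Tag selection by string similarity can be sketched with the Python standard library. This example uses only one of the three measures named on the slide: `difflib.SequenceMatcher`, which implements Gestalt (Ratcliff/Obershelp) pattern matching. The function name and the extra `<package>_<version>` candidate for monorepos are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_matching_tag(version: str, tags: list[str], package: str = "") -> Optional[str]:
    """Return the git tag most similar to the npm version, or None if there are no tags."""
    if not tags:
        return None

    # Strings a tag is compared against: the bare version and, for monorepos,
    # a "<package>_<version>" form such as "@azure/identity_4.0.0".
    candidates = [version]
    if package:
        candidates.append(f"{package}_{version}")

    def score(tag: str) -> float:
        return max(SequenceMatcher(None, candidate, tag).ratio() for candidate in candidates)

    return max(tags, key=score)
```

A production version would likely combine several measures and reject matches below a threshold, since some repositories have no suitable tag at all.
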
  26. #ossummit Package Module to Source File Mapping • Simple cases

    – Just replace the top directory path <app build directory> app ├── node_modules │ ├── aaa │ │ ├── LICENSE │ │ ├── README.md │ │ ├── index.js │ │ └── package.json <git cloned directory> aaa ├── LICENSE ├── README.md ├── index.js └── package.json Replace directory path
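
For the simple case, the mapping is a prefix swap on the path. A minimal Python sketch, with a hypothetical function name and POSIX-style paths assumed:

```python
from pathlib import PurePosixPath


def to_repo_path(module_path: str, app_dir: str, package: str, clone_dir: str) -> PurePosixPath:
    """Map a file under <app>/node_modules/<package>/ to the same file in the git clone."""
    package_root = PurePosixPath(app_dir) / "node_modules" / package
    # relative_to raises ValueError if module_path is not inside the package.
    relative = PurePosixPath(module_path).relative_to(package_root)
    return PurePosixPath(clone_dir) / relative
```

For example, `to_repo_path("app/node_modules/aaa/index.js", "app", "aaa", "aaa")` yields `aaa/index.js`. The sketch assumes the package sits at the root of its repository, which does not hold for monorepos.
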
  27. #ossummit Package Module to Source File Mapping • Bundled/Transpiled/Minified cases

    – Use source map decoder, if source map files are provided. <app build directory> app ├── node_modules │ ├── aaa │ │ ├── LICENSE │ │ ├── README.md │ │ ├── package.json │ │ ├── dist │ │ │ ├── lib │ │ │ │ ├── utils.js │ │ │ │ ├── utils.js.map <git cloned directory> app ├── node_modules │ ├── aaa │ │ ├── LICENSE │ │ ├── README.md │ │ ├── package.json │ │ ├── src │ │ │ ├── lib │ │ │ │ ├── utils.ts Source Map Decode https://github.com/mozilla/source-map#consuming-a-source-map
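
What a source map decoder does can be shown by decoding the `mappings` field of a version 3 source map, which stores positions as Base64 VLQ deltas. The Python sketch below illustrates that encoding only; it is not the API of the mozilla/source-map library linked on the slide, and the function names are hypothetical.

```python
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq(segment: str) -> list[int]:
    """Decode one Base64 VLQ segment into a list of signed integers."""
    values, value, shift = [], 0, 0
    for char in segment:
        digit = BASE64_CHARS.index(char)
        value |= (digit & 0b11111) << shift  # low 5 bits carry data
        if digit & 0b100000:                 # bit 6 set: more digits follow
            shift += 5
            continue
        magnitude = value >> 1               # lowest bit is the sign
        values.append(-magnitude if value & 1 else magnitude)
        value, shift = 0, 0
    return values


def decode_mappings(mappings: str) -> list[tuple[int, int, int, int, int]]:
    """Return (generated line, generated column, source index, source line, source column).

    All positions are 0-based. The generated column restarts on every line;
    the other fields are deltas that carry over across lines.
    """
    result = []
    source = source_line = source_column = 0
    for generated_line, line in enumerate(mappings.split(";")):
        generated_column = 0
        for segment in filter(None, line.split(",")):
            fields = decode_vlq(segment)
            generated_column += fields[0]
            if len(fields) >= 4:  # segments with fewer fields have no source position
                source += fields[1]
                source_line += fields[2]
                source_column += fields[3]
                result.append(
                    (generated_line, generated_column, source, source_line, source_column)
                )
    return result
```

Looking up the position of a changed function in `dist/lib/utils.js` in this table gives the corresponding file index, line and column in the original source, such as `src/lib/utils.ts`.
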
  28. #ossummit Source File to Git Log Mapping • Use git

    log with a revision range and the -L option to identify the commits from the function name, or from the line number range for anonymous functions. • Finally, collect the change information into an Excel file with OpenPyXL (https://pypi.org/project/openpyxl/). % git log <tag>..<tag> -L:<funcname>:<file> % git log <tag>..<tag> -L<start>,<end>:<file> https://git-scm.com/docs/git-log
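
The two `git log -L` forms can be generated from the changed-function list with a small helper. The Python sketch below only builds the argument list; the function name is hypothetical, and the `-L<start>,<end>:<file>` and `-L:<funcname>:<file>` forms follow the git-log documentation.

```python
from typing import Optional


def git_log_command(old_tag: str, new_tag: str, file: str,
                    funcname: Optional[str] = None,
                    line_range: Optional[tuple[int, int]] = None) -> list[str]:
    """Build a `git log -L` command that traces one function or line range between two tags."""
    if (funcname is None) == (line_range is None):
        raise ValueError("pass exactly one of funcname or line_range")
    if funcname is not None:
        trace = f"-L:{funcname}:{file}"       # named function
    else:
        start, end = line_range
        trace = f"-L{start},{end}:{file}"     # anonymous function: use its line range
    return ["git", "log", f"{old_tag}..{new_tag}", trace]
```

The resulting list can be run with `subprocess.run(command, cwd=clone_dir, capture_output=True, text=True)`, after which the commit ids are read from the output.
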
  29. #ossummit Use Cases • Use Case 1 – Update 1

    dependent package that has no dependencies of its own. • Use Case 2 – Update 1 dependent package that has many dependencies of its own. • Use Case 3 – Update most of the outdated packages.
  30. #ossummit Use Case 1: axios, an HTTP client for Node.js

    platform • Axios – Version 1.6.2 (latest) – Dependencies • follow-redirects – 1.15.0 → 1.15.3 (min. req to latest) • form-data – 4.0.0 • proxy-from-env – 1.1.0 https://axios-http.com/ https://github.com/axios/axios
  31. #ossummit axios@1.6.2: follow-redirects@1.15.0->1.15.3 • Reachability Analysis result: – 5 semantic

    function changes were detected in follow-redirects. – The 5 changes affect 646 call paths (including duplicates) in axios. # of packages # of modules # of functions (nodes) # of edges app others total Semantic Changes 1 1 0 5 5 Call Graph Pre-Update 8 138 1,987 125 2,112 10,396 Post-Update 8 138 1,987 129 2,116 10,415 Reachable Changes 1 1 646 5 0 Post-Update env. includes 8 packages/30 modules.
  32. #ossummit axios@1.6.2: follow-redirects@1.15.0->1.15.3 % git log v1.15.0..v1.15.3 --oneline 192dbe7 (tag:

    v1.15.3) Release version 1.15.3 of the npm package. bd8c81e Fix resource leak on destroy. 9c728c3 Split linting and testing. d388fe2 build: harden ci.yml permissions 9655237 (tag: v1.15.2) Release version 1.15.2 of the npm package. 6e2b86d Default to localhost if no host given. 449e895 Throw invalid URL error on relative URLs. e30137c Use type functions. 76ea31f ternary operator syntax fix 84c00b0 HTTP header lines are separated by CRLF. d28bcbf Create SECURITY.md (#202) 62a551c (tag: v1.15.1) Release version 1.15.1 of the npm package. 7fe0779 Use for ... of. 948c30c Fix redirecting to relative URL when using proxy module Changed Functions state Impacting call path index.js destroyRequest added 25 index.js abortRequest deleted 11 index.js isBuffer added 625 index.js isFunction added 7 index.js isString added 7 • The causes of the impact were consolidated into just 2 commits! Total 646 • Change History Mapping Result:
  33. #ossummit Use Case 2: Node.js Hello World, An Azure Sample

    • nodejs-docs-hello-world – Version: 0.0.1 – Dependencies • body-parser – 1.20.2 • cors – 2.8.5 • express – 4.17.1 -> 4.17.3 (to resolve vulnerable dependent packages) https://learn.microsoft.com/en-us/samples/azure-samples/nodejs-docs-hello-world/nodejs-hello-world/ https://github.com/azure-samples/nodejs-docs-hello-world/
  34. #ossummit About Express.js • One of the most widely used

    Node.js web application frameworks. – Over 20 dependent packages. https://expressjs.com/
  35. #ossummit Node.js Hello World: express@4.17.1->4.17.3 • Reachability Analysis Result: –

    543 semantic function changes were detected after updating express. – However, no impacting changes were found! Post-Update env. includes 76 packages/195 modules.

    | | # of packages | # of modules | # of functions (nodes): app | others | total | # of edges |
    | --- | --- | --- | --- | --- | --- | --- |
    | Semantic Changes | 24 | 82 | 0 | 543 | 543 | |
    | Call Graph Pre-Update | 51 | 85 | 11 | 751 | 762 | 1,529 |
    | Call Graph Post-Update | 63 | 107 | 11 | 943 | 954 | 1,774 |
    | Reachable Changes | 0 | 0 | 0 | 0 | 0 | |
  36. #ossummit Use Case 3: Azure REST SDKs Sample • azure-sdk-rest-blog

    – Version: 1.0.0 – Dependencies • @azure-rest/purview-catalog – 1.0.0-beta.3 -> 1.0.0-beta.5 • @azure/identity – 2.0.4 -> 2.1.0 https://github.com/Azure-Samples/azure-sdk-rest-blog
  37. #ossummit Azure REST SDKs Sample: Update Outdated Packages • Reachability

    Analysis Result: – 4,357 semantic function changes were detected. – 69 changes affect 1 call-path! # of packages # of modules # of functions (nodes) # of edges app others total Semantic Changes 25 378 0 4,357 4,357 Call Graph Pre-Update 23 83 1 745 745 989 Post-Update 21 36 1 579 578 895 Reachable Changes 6 23 1 69 Post-Update env. includes 55 packages/708 modules.
  38. #ossummit Azure REST SDKs Sample: Where did these changes come

    from? • The app simply calls 1 function. • @azure-rest/purview-catalog and its directly dependent packages were updated at once. azure-sdk-rest-blog/src/index.ts @azure-rest/purview-catalog @1.0.0-beta.3 -> 1.0.0-beta.5 ├─@azure-rest/core-client @1.0.0-beta.7 -> 1.0.0-beta.9 ├─@azure/core-auth @1.3.2 -> 1.5.0 ├─@azure/core-lro @2.2.4 -> 2.5.4 ├─@azure/[email protected] -> 1.12.2 ├─@azure/logger @1.0.3 -> 1.0.4 └─tslib @2.3.1 -> 2.6.21
  39. #ossummit Azure REST SDKs Sample: Update Outdated Packages • Change

    History Mapping Result – More than half of the changes were due to the removal of 1 package. – Other causes could be narrowed down to a few commits!

    | Package | Changed module | # of functions | State | Impacting call path |
    | --- | --- | --- | --- | --- |
    | @azure-rest/purview-catalog | dist/index.js | 1 | deleted | 1 |
    | | | 1 | added | 1 |
    | @azure-rest/core-client | dist/index.js | 12 | added | 12 |
    | @azure/core-rest-pipeline | dist/index.js | 1 | deleted | 1 |
    | | | 2 | added | 2 |
    | @azure/core-tracing | dist/index.js | 5 | deleted | 5 |
    | | | 6 | added | 6 |
    | @azure/core-util | dist/index.js | 6 | added | 6 |
    | @opentelemetry/api | (13 files) | 35 | deleted | 35 |

    Slide annotations: 4 commits; 3 commits; 1 commit; 2 commits; 9d78d16527 [core] - Extract OpenTelemetry to a separate package (#19734); The entire package was deleted!; 1 commit.
  40. #ossummit Key Takeaways • SBOM (Software Bill of Materials) management

    is a crucial defensive measure against software supply chain attacks. – But remember, SBOMs alone aren't enough to make software updates easy. • Our “UCIA” revealed the following insights: – The package dependencies in Node.js applications are complex, and many of the included packages and much of the source code are not actually used. – By analyzing the semantic changes in the source code and the call graphs, only the changes that truly have an impact are identified. – Furthermore, by associating these changes with git logs, only the key changes that should be focused on are pinpointed. – UCIA would expedite vulnerability response planning, accelerate development, and clarify the scope of testing and verification. – All these analyses can be achieved using Open Source tools!
  41. #ossummit Future Work • Improvements: – Optimization of Change History

    Mapping • Address challenges with hard-to-identify source code and commits. – OSS tools enhancement • Bugfixes and community contributions. • In Development: – Language support expansion • Adding Go and Java. – Dynamic analysis integration • Enhance runtime behavior analysis. – Migrating UCIA to an app running on Kubernetes • Open Sourcing UCIA: – Releasing UCIA as OSS to engage and expand the community.
  42. #ossummit Software Update Problems: Academic Consideration • Key Findings –

    Tests cover only 58% of direct dependency calls and 20% of indirect calls. – Test suites detect only 47% of artificial defects in direct dependencies and 35% in indirect ones. – Using static analysis can detect 74% of defects in direct dependencies and 64% in indirect ones, approximately twice as effective as test suites alone. https://www.sciencedirect.com/science/article/pii/S0164121221001941
  43. #ossummit Big Picture Plan Develop Test Deliver Operate CMDB/SBOM SBOM

    Generation scan Change Impact Analysis Incident Management Remediation Alert Vulnerability Information Developer