
ICSE22: An Exploratory Study of Deep Learning Supply Chain

Xin Tan
April 10, 2022


These are the slides for the paper "An Exploratory Study of Deep Learning Supply Chain", which was accepted at ICSE 2022.


Transcript

  1. An Exploratory Study of Deep Learning Supply Chain

    Xin Tan¹, Kai Gao², Minghui Zhou², and Li Zhang¹ (¹Beihang University, ²Peking University)
  2. Deep Learning (DL) Is So Popular and in Demand Nowadays

    Why DL? The great power of DL. Figure: number of searches for DL from 2014 to 2021 (source: Google Trends, search term “deep learning”).
  3. What Are DL Supply Chains (SCs)? How Important Are They?

    Software dependencies: import relationships between a project and its downstream projects.
    Characteristics:
    • multi-layer
    • a deep learning framework as the core
    • substantial downstream projects as the periphery
    • gradually evolving
    Importance:
    • dynamically reflect the overall picture of the ecosystem formed around the DL framework
    • compare the differences between DL frameworks
    • provide a reference for emerging frameworks
  4. DL Supply Chains Remain a “Black Box”

    • the large number of projects involved in the supply chains
    • the large number of intricate dependencies involved in the supply chains
    • a lack of effective modeling methods
  5. Research Questions

    • RQ1 (Structure): What is the structure of the DL SCs, how does it evolve, and how can vulnerabilities in the structure be identified? Aim: identify key and risk projects in DL SCs.
    • RQ2 (Domain Distribution): What domains do the DL SCs cover, and are the two SCs different? Aim: reflect software status in DL SCs and guide developer selection.
    • RQ3 (Evolutionary Factors): What factors are related to the number of downstream projects in the DL SCs? Aim: push software to the supply chain efficiently.
  6. Primary Issue: How to Construct DL SCs?

    Input: the name of the project.
    Output: all projects that directly or indirectly depend on this project (i.e., downstream projects) and their relationships.
    • World of Code (WoC) collects almost all the Git projects on the Internet and provides an API that can efficiently retrieve technical dependencies.
    • Libraries.io: the Libraries.io dataset tracks over 2.7m unique open source packages.
    • Identify which projects in a layer of the SC have released packages, and the information of those packages.
    • Dependent software packages defined in Synthesizer.py
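
The input/output contract on this slide can be sketched as a breadth-first traversal. This is only an illustration, not the authors' implementation: the function name `build_supply_chain`, the `dependents_of` mapping, and the toy project names are all hypothetical, and the real lookups against WoC and Libraries.io are replaced by an in-memory dictionary.

```python
from collections import deque

def build_supply_chain(root, dependents_of):
    """Collect all direct and indirect downstream projects of `root`.

    `dependents_of` (hypothetical) maps a project name to the projects
    that import one of its released packages.
    Returns (layers, edges): the layer number of each project
    (the root is layer 1) and the set of (upstream, downstream) pairs.
    """
    layers = {root: 1}
    edges = set()
    queue = deque([root])
    while queue:
        upstream = queue.popleft()
        for downstream in dependents_of.get(upstream, ()):
            edges.add((upstream, downstream))
            if downstream not in layers:
                # First visit in a BFS gives the shallowest layer.
                layers[downstream] = layers[upstream] + 1
                queue.append(downstream)
    return layers, edges

# Toy data only; these names do not come from the study.
toy_dependents = {
    "framework": ["lib_a", "app_x"],
    "lib_a": ["app_y", "app_x"],
}
layers, edges = build_supply_chain("framework", toy_dependents)
```

In this sketch a project with no entry in `dependents_of` simply ends its branch, which mirrors the idea that only projects with released packages can have downstream projects.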
  7. SC Construction

    Figures: Study Cases; Approach to Construct DL SCs.
  8. RQ1: Structure: Basic Information

    Table: Basic Information of the DL SCs.
    1. Substantial projects in the DL SCs.
    2. Each SC has five or six layers.
    3. Low proportion of packages (TensorFlow: 0.29%, PyTorch: 0.51%).
    4. More than 30% of packages have no downstream projects.
  9. RQ1: Structure: Basic Information

    Project Dependencies in the DL SC: More than 90% of projects are in the 2nd layer, i.e., they depend directly on TensorFlow or PyTorch.
  10. RQ1: Structure: Basic Information

    Project Dependencies in the DL SC: Except for the first layer, the number of projects in each layer decreases as the layer number increases, which suggests that the deeper a layer sits in the DL SC, the harder it is to attract downstream projects.
  11. RQ1: Structure: Basic Information

    Project Dependencies in the DL SC: In most layers, projects seem more likely to import only packages published by projects in the layer immediately above them. This is generally because such a package has already integrated the functions of the packages it directly depends on.
  12. RQ1: Structure: Evolution

    1. Both the TensorFlow and PyTorch SCs show an overall growth trend.
    2. The growth rates of both SCs peak in May 2019.
    3. The growth trend of packages is similar to the growth trend of projects.
  13. RQ1: Structure: Evolution

    1. The cumulative changes of downstream projects in the TensorFlow SC and the PyTorch SC are similar.
    2. Many of the initial downstream projects come from early adopters. After this initial rise in popularity, the growth in downstream projects stabilizes for half of the projects.
  14. RQ1: Structure: Vulnerabilities

    Most Impactful Projects: The most critical projects have a cascading impact on their downstream projects. We consider both direct and transitive dependencies when calculating in-degree.
  15. RQ1: Structure: Vulnerabilities

    Most Vulnerable Projects: The more packages a project depends on, the more vulnerable it may be. Therefore, we calculate the out-degree distribution of the projects, taking their transitive dependencies into account.
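
The in-degree and out-degree measures on the two slides above can be illustrated with a small reachability computation. This is a minimal sketch under assumptions: edges are taken to point from a project to the things it depends on, and the helper names (`reachable`, `transitive_degrees`) and the toy graph are hypothetical rather than taken from the paper.

```python
def reachable(start, graph):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def transitive_degrees(depends_on):
    """Count transitive dependents (in-degree) and dependencies (out-degree).

    `depends_on` (hypothetical) maps a project to its direct dependencies.
    """
    nodes = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    # Reverse the edges so that we can walk from a project to its dependents.
    reverse = {n: [] for n in nodes}
    for project, deps in depends_on.items():
        for dep in deps:
            reverse[dep].append(project)
    in_deg = {n: len(reachable(n, reverse)) for n in nodes}
    out_deg = {n: len(reachable(n, depends_on)) for n in nodes}
    return in_deg, out_deg

# Toy data only; these names do not come from the study.
toy_depends_on = {
    "app_x": ["lib_a"],
    "app_y": ["lib_a", "lib_b"],
    "lib_a": ["framework"],
    "lib_b": ["framework"],
}
in_deg, out_deg = transitive_degrees(toy_depends_on)
```

Here a high `in_deg` marks a project whose failure would cascade to many others, and a high `out_deg` marks a project exposed to many upstream packages.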
  16. RQ1: Structure: Vulnerabilities

    Projects Most Likely to Be a Single Point of Failure: By identifying the nodes that have high betweenness centrality but low degree centrality, we can find the projects that are more likely to act as a single point of failure if they are removed.
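
The "high betweenness, low degree" criterion can be sketched with a pure-Python version of Brandes' algorithm for unweighted graphs. Several things here are assumptions for illustration: the graph is treated as undirected, raw degree stands in for degree centrality, and the thresholds, function names, and toy graph are hypothetical and not taken from the paper.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected adjacency map."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):        # accumulate dependencies backwards
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}

def single_point_candidates(adj, min_betweenness, max_degree):
    """Nodes with high betweenness but few direct neighbours."""
    bc = betweenness(adj)
    return sorted(v for v in adj
                  if bc[v] >= min_betweenness and len(adj[v]) <= max_degree)

# Toy graph: two triangles joined only through "bridge".
toy_edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
             ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
             ("a1", "bridge"), ("bridge", "b1")]
toy_adj = {}
for u, v in toy_edges:
    toy_adj.setdefault(u, set()).add(v)
    toy_adj.setdefault(v, set()).add(u)

scores = betweenness(toy_adj)
candidates = single_point_candidates(toy_adj, min_betweenness=8, max_degree=2)
```

In the toy graph, `bridge` has only two neighbours yet sits on every shortest path between the two triangles, which is the pattern the slide describes.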
  17. RQ2: Domain Distribution: Packages

    Method: We analyze the packages with at least ten downstream projects and apply thematic analysis to their descriptions.
    Figures: Packages Distribution in TensorFlow SC; Packages Distribution in PyTorch SC.
    The non-domain-related (NDR) packages account for more than half of the packages. They include frameworks, models, wrappers, and tutorials, and provide a rich variety of support tools.
  18. RQ2: Domain Distribution: Packages

    Method: We analyze the packages with at least ten downstream projects and apply thematic analysis to their descriptions.
    Figures: Packages Distribution in TensorFlow SC; Packages Distribution in PyTorch SC.
    Domain-related (DR) packages cover a wide range of fields, including hot areas such as CV, NLP, and RL, as well as interdisciplinary fields, e.g., biology.
  19. RQ2: Domain Distribution: Packages

    Some Differences (figures: Packages Distribution in TensorFlow SC; Packages Distribution in PyTorch SC).
    The PyTorch SC seems to contain more framework packages in specific domains, while the TensorFlow SC seems to contain more general supporting tools.
  20. RQ2: Domain Distribution: Packages

    Some Differences (figures: Packages Distribution in TensorFlow SC; Packages Distribution in PyTorch SC).
    The PyTorch SC seems to have a higher proportion of general frameworks, while the TensorFlow SC seems to pay more attention to general wrappers.
  21. RQ2: Domain Distribution: Projects

    We apply LDA to the projects' README files.
    1. The application categories of both SCs are diverse, and both include CV, NLP, and RL.
    2. The TensorFlow SC involves areas such as Self-driving and Robot Control, which are closely related to industry.
  22. RQ2: Domain Distribution: Projects

    Figures: Evolution of the Project Types Distribution in TensorFlow SC; Evolution of the Project Types Distribution in PyTorch SC.
    Some Observations:
    • The distribution of project types fluctuates over time; CV contributes a large proportion, about 30% to 40%.
    • The proportion of Research in the PyTorch SC is much higher than that in the TensorFlow SC.
  23. RQ3: Evolutionary Factors

    Goal: examine whether the number of downstream projects is related to certain factors.
    • Projects: those with more than 10 downstream projects
    • Response variable: #downstream projects
    • Predictor variables:
      • Project characteristics: #commits; #authors; age; #stars
      • Package domain: package_domain (DR/NDR); sub_domain
      • Supply chain characteristics: TensorFlow/PyTorch SC; No.layer; #dependencies
    • Model: Generalized Additive Models (GAM)
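
For reference, a GAM in its general textbook form models a transformed expectation of the response as a sum of smooth functions of the predictors. The link function g, the split between smooth terms and linear terms, and the symbol names below are assumptions for illustration; the slides do not specify them.

```latex
g\bigl(\mathbb{E}[y]\bigr) \;=\; \beta_0 \;+\; \sum_{j} f_j(x_j) \;+\; \sum_{k} \beta_k z_k
```

Here y would be #downstream projects, each x_j a numeric predictor such as #commits or #authors entering through a smooth function f_j, and each z_k an indicator for a categorical predictor such as package_domain or the SC a project belongs to.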
  24. RQ3: Evolutionary Factors

    1. The influence of #authors on #downstream_projects is nonlinear.
    2. #dependencies is negatively related to #downstream_projects.
    3. Domain-related packages tend to attract more downstream projects.
  25. Implications

    DL community maintainers:
    • Making the structure visible can help to formulate targeted development strategies and ensure the safety of the SCs.
    • The differences we reveal can help maintainers better understand what they need to optimize to remain competitive.
    • The SCs can serve as examples for new frameworks that want to lead their own ecosystems.
    DL practitioners:
    • Our findings can help practitioners choose a suitable DL framework more easily.
    • They can provide guidance for package developers in the DL SCs.
    Researchers:
    • Researchers can build on our findings and ask new questions to gain a deeper understanding of software SCs and perform relevant research.
  26. Summary

    We try to open the "black box" of DL SCs.
    DL SCs Structure:
    • Basic Structure
    • Evolution Characteristics
    • Vulnerabilities
    DL SCs Domain Distribution:
    • Packages Domain
    • Projects Domain
    Evolutionary Factors:
    • Consider the following factors: 1. Project characteristics 2. Package domain 3. SC characteristics