
The Economics of Business Networks: Estimation and Applications at Scale


■ Event: Econ Fiesta 2
https://sansan.connpass.com/event/203771/

■ Talk overview
Title: The Economics of Business Networks: Estimation and Applications at Scale
Speakers: Juan Martínez / Shota Komatsu, DSOC R&D Researchers

▼Twitter
https://twitter.com/SansanRandD

Sansan DSOC

March 05, 2021

Transcript

  1. Data Strategy and Operation Center Research Team – Sansan DSOC

    - Takanori Nishida (西田 貴紀): Sansan Inc. DSOC R&D SocSci Group Manager / Researcher
    - Juan Martínez (マルティネス フアン): Sansan Inc. DSOC R&D SocSci Group Researcher / PhD in economics
    - Shota Komatsu (小松 尚太): Sansan Inc. DSOC R&D SocSci Group Researcher
  2. Research Team – Collaborator

    Angelo Mele
    Angelo Mele is an Associate Professor of Economics at Johns Hopkins University - Carey Business School. His research analyses how social and strategic interactions affect individual and aggregate socioeconomic outcomes. His work has been published in Econometrica, American Economic Journal: Economic Policy, Review of Economics and Statistics, and Regional Science and Urban Economics. He has a PhD in Economics from University of Illinois at Urbana-Champaign.
  3. The big questions

    - Why do people form business networks?
    - How to perform network econometrics at scale?
    - How can we apply models to solve real problems?
  4. Answering those questions

    Estimate a structural model of business network formation...
    - Based on Mele (2017)
    - Adapted to local dependence models (Schweinberger & Handcock, 2015)
    ...using the Eight business network for 2019...
    ...employing scalable algorithms.
    - Model-based clustering (Vu et al., 2013) and pseudo-likelihood estimation (Babkin et al., 2020).
  5. We want to achieve

    - An understanding of:
      - The cost of connections.
      - The role of externalities and homophily.
    - A way to simulate networks:
      - For measuring causal effects
      - For evaluating the impact of new services before launching them
    - To contribute to society
  6. Modeling Social Networks with ERGM

    - ERGM (Exponential Random Graph Model): models the probability distribution of networks as a function of network statistics (edges, triangles, shared partners, stars, etc.)
    - Can incorporate third-party effects on link formation, like “common friends”
    - A popular method in network science
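
To make the "function of network statistics" idea concrete, here is a minimal sketch that counts edges, 2-stars and triangles on a toy undirected graph and evaluates the unnormalized ERGM weight exp(θ · s(g)). The toy graph, the parameter values and the function names are illustrative assumptions, not taken from the talk.

```python
from itertools import combinations
from math import exp


def network_stats(n, edges):
    """Return (edges, two_stars, triangles) for an undirected simple graph."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    n_edges = sum(len(nb) for nb in adj.values()) // 2
    # A 2-star is a pair of edges sharing a node: choose 2 neighbours per node.
    two_stars = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    # Brute-force triangle count; fine for a toy graph, not for large networks.
    triangles = sum(
        1
        for i, j, k in combinations(range(n), 3)
        if j in adj[i] and k in adj[i] and k in adj[j]
    )
    return n_edges, two_stars, triangles


def ergm_weight(stats, theta):
    """Unnormalized ERGM weight exp(theta . stats)."""
    return exp(sum(t * s for t, s in zip(theta, stats)))


# Hypothetical 4-node graph: a triangle (0, 1, 2) plus a pendant edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
stats = network_stats(4, toy_edges)
# Hypothetical parameters: links are costly, triangles are rewarded.
weight = ergm_weight(stats, theta=(-1.0, 0.0, 0.5))
```

Turning this weight into a probability requires dividing by the sum of the same weight over every possible network on the same nodes.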
  7. Microeconomic foundation for ERGMs

    - Mele (2017): gives conditions under which a network formation game can be represented as a potential game and converges to an ERGM, and shows how to use this for structural estimation.
  8. Difficulties in estimating ERGMs

    - ERGMs are prone to degeneracy, i.e. they tend to put very large probability mass on a few networks that are not representative of what we see in reality.
    - It is impossible to compute the denominator of the likelihood function due to the combinatorial explosion of possible network configurations.
      - Even with N = 10, a single iteration would take 40 million years to compute.
    - Because of this, MCMC-based methods are often applied, but they are computationally burdensome.
    - Theoretically, the model assumes that all nodes know the state of all others when considering forming a new connection.
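
The combinatorial explosion can be illustrated with a brute-force sketch: the likelihood denominator sums over all 2^(N(N-1)/2) undirected networks, which is 64 networks for N = 4 but already 2^45 (about 3.5 × 10^13) for N = 10. The two-statistic model (edges and triangles) and the function names below are assumptions made for illustration only.

```python
from itertools import combinations, product
from math import comb, exp


def n_graphs(n):
    """Number of undirected simple graphs on n labelled nodes."""
    return 2 ** comb(n, 2)


def normalizing_constant(n, theta_edges, theta_triangles):
    """Sum exp(theta . stats) over every graph on n nodes (tiny n only)."""
    dyads = list(combinations(range(n), 2))
    total = 0.0
    for present in product((0, 1), repeat=len(dyads)):
        edges = {d for d, on in zip(dyads, present) if on}
        n_tri = sum(
            1
            for i, j, k in combinations(range(n), 3)
            if (i, j) in edges and (i, k) in edges and (j, k) in edges
        )
        total += exp(theta_edges * len(edges) + theta_triangles * n_tri)
    return total


# Enumeration is feasible for 4 nodes (64 graphs) ...
z_small = normalizing_constant(4, theta_edges=-1.0, theta_triangles=0.5)
# ... but the number of terms grows as 2^(N choose 2).
terms_for_10_nodes = n_graphs(10)
```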
  9. Modeling Local Dependence with HERGM

    - Hierarchical ERGM (HERGM): Schweinberger & Handcock (2015)
    - Agents belong to communities:
      - Connections across communities happen by luck, influenced by homophily.
      - Connections within communities also consider externalities (friends of friends, stars, etc.)
  10. Estimation by approximate maximum likelihood

    - Step 1: Estimate a block structure using a stochastic block model
      - Applies minorization-maximization (Vu et al., 2013)
    - Step 2: Given the estimated block structure, estimate between- and within-block parameters by maximum pseudo-likelihood estimation
      - Parameters like “homophily” and the effect of common friends
    We tried the R package hergm (Schweinberger and Luna, 2018), but the most recent implementation cannot handle large networks even on a high-performance machine.
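
Step 2 can be sketched for a single block as follows. Maximum pseudo-likelihood treats each dyad as a logistic regression observation whose regressors are the change statistics, i.e. how much each network statistic changes when that link is switched on. This is a toy sketch under stated assumptions (one community, edge and triangle terms only, plain gradient ascent, hypothetical function names); it is not the hergm implementation or the authors' code.

```python
from itertools import combinations
from math import exp, log


def change_stats(n, edges):
    """For each dyad return (y, [delta_edges, delta_triangles])."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    rows = []
    for i, j in combinations(range(n), 2):
        y = 1 if j in adj[i] else 0
        # Switching (i, j) on closes one triangle per common neighbour.
        common = len(adj[i] & adj[j])
        rows.append((y, [1.0, float(common)]))
    return rows


def pseudo_loglik(theta, rows):
    """Log pseudo-likelihood: a sum of logistic terms over dyads."""
    total = 0.0
    for y, delta in rows:
        eta = sum(t * d for t, d in zip(theta, delta))
        total += y * eta - log(1.0 + exp(eta))
    return total


def fit_mple(rows, steps=5000, lr=0.05):
    """Maximize the pseudo-likelihood by plain gradient ascent."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for y, delta in rows:
            eta = sum(t * d for t, d in zip(theta, delta))
            p = 1.0 / (1.0 + exp(-eta))
            for k in range(2):
                grad[k] += (y - p) * delta[k]
        theta = [t + lr * g / len(rows) for t, g in zip(theta, grad)]
    return theta


# Hypothetical within-block network on 6 nodes.
toy_edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
rows = change_stats(6, toy_edges)
theta_hat = fit_mple(rows)  # [edge parameter, triangle parameter]
```

Because the objective decomposes over dyads, it avoids the normalizing constant entirely, which is what makes this step feasible on large networks.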
  11. Estimation at scale

    - Usual estimation techniques stop being feasible at the scale of a few hundred nodes.
    - Our approach:
      - Scalable model-based variational clustering algorithm in Vu et al. (2013)
      - We augmented that model:
        - Works with hundreds of thousands of nodes and thousands of communities
        - Fully parallelized
        - Low memory usage
        - Allows discrete covariates to be included
      - Pseudo-likelihood estimation based on Babkin et al. (2020)
    - We thank Michael Schweinberger for his comments on our fixes to the original hergm package.
  12. The data

    - We employ anonymized data from Eight.
      - Eight is a social networking service for individuals where users “connect” with each other by exchanging business cards.
      - We perform analysis on Eight's data within the scope of the service's Terms of Use, for the purpose of improving the service. The analysis is purely statistical.
    - We formed a network from a sample of Eight user connections during 2019.
    - Node covariates include:
      - City JIS Code
      - Industrial category
      - Occupational category
  13. Characteristics of the Network

    - Nodes: 240,961
    - Edges: 675,074
    - Triangles: 26,595
    - 2-Stars: 9.5M
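
As a rough back-of-the-envelope reading of these counts, the sketch below derives the density, mean degree and global clustering coefficient they imply. It assumes an undirected simple network and uses the rounded 2-star figure from the slide, so the last quantity is only approximate.

```python
from math import comb

# Counts as reported on the slide.
n_nodes = 240_961
n_edges = 675_074
n_triangles = 26_595
n_two_stars = 9_500_000  # reported as "9.5M", so this figure is rounded

# Share of possible links that are actually present.
density = n_edges / comb(n_nodes, 2)
# Each edge contributes to the degree of two nodes.
mean_degree = 2 * n_edges / n_nodes
# Global clustering coefficient: closed 2-stars over all 2-stars.
transitivity = 3 * n_triangles / n_two_stars

print(f"density      = {density:.2e}")
print(f"mean degree  = {mean_degree:.2f}")
print(f"transitivity = {transitivity:.4f}")
```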
  14. Step 1: Clustering Results

    - 1,000 clusters are obtained (median size: 212 nodes)
    - A few large clusters are obtained
      - The largest cluster is not so large
      - Otherwise, sizes are very stable
    - Industries sharing many clusters tend to be “similar”.
  15. Step 2: Parameter Estimation

    - The network is VERY sparse.
    - Communities are tighter subnetworks.
    - Significant homophily based on observables.
    - Externalities (popularity, friends-of-friends) are important determinants of connections within communities.
  16. Simulation (POC): Work from home in South Kanto

    - Randomly move 15% of the nodes based in Tokyo to other prefectures in South Kanto, proportionally.
    - Simulate many networks of individuals using HERGM estimates.
    - Aggregate them into cities to create networks of cities.
    - Calculate the degree centrality of each city, and take the mean across all simulations.
    - Calculate its change with respect to simulations in the absence of the treatment.
    - Degree increases more in smaller cities.
    - The increase more than compensates for the loss in access to cities in Tokyo.
    [Figure: Counterfactual change in city degree. Red for negative, blue for positive, darker for larger values.]
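
The procedure above can be sketched on toy data. This sketch swaps in a simple Bernoulli link model with a constant cross-city link probability in place of the estimated HERGM, and the city names, population sizes, probability and function names are all hypothetical. It only illustrates the relocate, simulate, aggregate and compare steps.

```python
import random
from collections import Counter
from itertools import combinations


def relocate(cities, source, share, rng):
    """Move a share of nodes in `source` to other cities, proportionally to size."""
    sizes = Counter(c for c in cities if c != source)
    targets, weights = zip(*sorted(sizes.items()))
    movers = [i for i, c in enumerate(cities) if c == source]
    moved = rng.sample(movers, round(share * len(movers)))
    new = list(cities)
    for i in moved:
        new[i] = rng.choices(targets, weights=weights)[0]
    return new


def simulate_city_degree(cities, p_link, rng):
    """Draw one individual-level network and aggregate it into a city network."""
    neighbours = {c: set() for c in set(cities)}
    for i, j in combinations(range(len(cities)), 2):
        # Within-city links vanish on aggregation, so only cross-city pairs matter.
        if cities[i] != cities[j] and rng.random() < p_link:
            neighbours[cities[i]].add(cities[j])
            neighbours[cities[j]].add(cities[i])
    return {c: len(nb) for c, nb in neighbours.items()}


def mean_city_degree(cities, n_sims, rng, p_link=0.002):
    """Average each city's degree over many simulated networks."""
    totals = Counter()
    for _ in range(n_sims):
        totals.update(simulate_city_degree(cities, p_link, rng))
    return {c: totals[c] / n_sims for c in set(cities)}


rng = random.Random(0)
cities = ["Tokyo"] * 60 + ["Yokohama"] * 20 + ["Chiba"] * 10 + ["Saitama"] * 10
treated_cities = relocate(cities, "Tokyo", 0.15, rng)

baseline = mean_city_degree(cities, 50, rng)
treated = mean_city_degree(treated_cities, 50, rng)
# Counterfactual change in mean city degree, per city.
change = {c: treated[c] - baseline[c] for c in baseline}
```

In the actual exercise the link probabilities would come from the estimated within- and between-community parameters rather than a single constant.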
  17. Simulation (POC): Moving to Kamiyama-cho!

    - Sansan Kamiyama Lab (神山ラボ)
    - Technical school project at Kamiyama: https://kamiyama-marugoto.com/
  18. Simulation (POC): Moving to Kamiyama-cho!

    - Move 5% of the nodes in Shikoku to Kamiyama-cho.
    - Same simulation method as the last slide.
    - There's a drop in degree in cities that “donated” nodes.
    - However, many cities, including Tokushima-shi, see an increase.
    - We can learn about how agglomeration leads to industrial clustering.
    [Figures: Mean city degree without treatment; percentile of counterfactual change in mean city degree (red for negative, blue for positive, darker for larger absolute values). Kamiyama-cho is labeled in both panels.]
  19. What we learned

    - Business networks are:
      - Large
      - Sparse
      - Clustered
    - HERGM allows us to estimate a structural network formation model in a large network.
    - Results can be used to simulate counterfactual networks to assess the impact of events, new services, etc.
  20. References

    - Babkin, S., Stewart, J. R., Long, X., & Schweinberger, M. (2020). Large-scale estimation of random graph models with local dependence. Computational Statistics & Data Analysis, 152, 107029.
    - Schweinberger, M., & Handcock, M. S. (2015). Local dependence in random graph models: characterization, properties and statistical inference. Journal of the Royal Statistical Society, Series B, 77, 647–676.
    - Schweinberger, M., & Luna, P. (2018). HERGM: Hierarchical exponential-family random graph models. Journal of Statistical Software, 85(1).
    - Stivala, A., Robins, G., & Lomi, A. (2020). Exponential random graph model parameter estimation for very large directed networks. PLoS ONE, 15(1), e0227804. https://doi.org/10.1371/journal.pone.0227804
    - Strauss, D., & Ikeda, M. (1990). Pseudolikelihood estimation for social networks. Journal of the American Statistical Association, 85, 204–212.
    - Vu, D. Q., Hunter, D. R., & Schweinberger, M. (2013). Model-based clustering of large networks. The Annals of Applied Statistics, 7(2), 1010.