Slide 1

Slide 1 text

1 KYOTO UNIVERSITY Metric Recovery from Unweighted k-NN Graphs Ryoma Sato

Slide 2

Slide 2 text

2 / 45 KYOTO UNIVERSITY I introduce my favorite topic and its applications  Metric recovery from unweighted k-NN graphs is my recent favorite technique. I like this technique because the scope of applications is broad, and the results are simple but non-trivial.  I first introduce this problem.  I then introduce my recent projects that used this technique. - Towards Principled User-side Recommender Systems (CIKM 2022) - Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure (ICML 2023)

Slide 3

Slide 3 text

3 / 45 KYOTO UNIVERSITY Metric Recovery from Unweighted k-NN Graphs Morteza Alamgir, Ulrike von Luxburg. Shortest path distance in random k-nearest neighbor graphs. ICML 2012. Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola. Metric recovery from directed unweighted graphs. AISTATS 2015.

Slide 4

Slide 4 text

4 / 45 KYOTO UNIVERSITY k-NN graph is generated from a point cloud  We generate a k-NN graph from a point cloud.  Then, we discard the coordinates of nodes. (Figure: generate edges, then discard coordinates; after discarding, nodes are drawn at random coordinates for visualization only.)
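This generation process is easy to reproduce. Below is a minimal sketch using scikit-learn; the 2-D Gaussian point cloud, the sample size, and the symmetrization choice are illustrative assumptions, not taken from the slides.

```python
# Minimal sketch: sample a hidden point cloud, build the unweighted k-NN
# graph, and discard the coordinates. All sizes are illustrative.
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))  # hidden point cloud (2-D for illustration)

k = 10
A = kneighbors_graph(X, n_neighbors=k, mode="connectivity")  # 0/1 sparse matrix
A = ((A + A.T) > 0).astype(int)  # symmetrize: keep an edge if either endpoint chose it

del X  # discard the coordinates; only the 0/1 adjacency A remains observable
```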

Slide 5

Slide 5 text

5 / 45 KYOTO UNIVERSITY Metric recovery asks us to estimate the coordinates  The original coordinates are hidden now.  Metric recovery from unweighted k-NN graphs is the problem of estimating the coordinates from the k-NN graph.

Slide 6

Slide 6 text

6 / 45 KYOTO UNIVERSITY Only the existence of edges is observable  Unweighted means that the edge lengths are not available either.  This is equivalent to the setting where only the 01-adjacency matrix of the k-NN graph is available.

Slide 7

Slide 7 text

7 / 45 KYOTO UNIVERSITY Given 01-adjacency, estimate the coordinates  Problem (Metric Recovery from Unweighted k-NN Graphs) In: The 01-adjacency matrix of a k-NN graph Out: The latent coordinates of the nodes  Very simple.

Slide 8

Slide 8 text

8 / 45 KYOTO UNIVERSITY Why Is This Problem Challenging?

Slide 9

Slide 9 text

9 / 45 KYOTO UNIVERSITY Standard node embedding methods fail  The type of this problem is node embedding, i.e., In: graph, Out: node embeddings.  However, the following example shows that standard embedding techniques fail.

Slide 10

Slide 10 text

10 / 45 KYOTO UNIVERSITY Distance is opposite in the graph and latent space  The shortest-path distance between nodes A and B is 21. The shortest-path distance between nodes A and C is 18.  Standard node embedding methods would embed node C closer to A than node B to A, which is not consistent with the ground-truth latent coordinates. (Figure: 10-NN graph. The coordinates are supposed to be hidden, but I show them for illustration.)

Slide 11

Slide 11 text

11 / 45 KYOTO UNIVERSITY Critical assumption does not hold  Embedding nodes that are close in the input graph close to each other is the critical assumption in various embedding methods.  This assumption does NOT hold in our situation. (Figure: 10-NN graph. The coordinates are supposed to be hidden, but I show them for illustration.)

Slide 12

Slide 12 text

12 / 45 KYOTO UNIVERSITY Solution

Slide 13

Slide 13 text

13 / 45 KYOTO UNIVERSITY Edge lengths are important  Why does the previous example fail?  If the edge lengths were taken into consideration, the shortest-path distance would be a consistent estimator of the latent distance.  Step 1: Estimate the latent edge lengths. (Figure: 10-NN graph. The coordinates are supposed to be hidden, but I show them for illustration.)

Slide 14

Slide 14 text

14 / 45 KYOTO UNIVERSITY Densities are important  Observation: Edges are longer in sparse regions and shorter in dense regions.  Step 2: Estimate the densities.  But how? We do not know the coordinates of the points... (Figure: 10-NN graph. The coordinates are supposed to be hidden, but I show them for illustration.)

Slide 15

Slide 15 text

15 / 45 KYOTO UNIVERSITY Density can be estimated from PageRank  Solution: A PageRank-like estimator solves it. The stationary distribution of random walks (plus a simple transformation) is a consistent estimator of the density.  The higher the rank, the denser the region.  This can be computed solely from the unweighted graph. (Figure: 10-NN graph; stationary distribution of simple random walks ≈ PageRank.)
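As a concrete illustration, here is a minimal sketch of the stationary-distribution step, assuming a dense 0/1 adjacency matrix A whose rows hold each node's out-edges. The "simple transformation" that turns the stationary distribution into a density estimate is derived in Hashimoto et al. (2015) and is not reproduced here.

```python
# Minimal sketch: stationary distribution of the simple random walk on the
# k-NN graph, via power iteration on the row-stochastic transition matrix.
import numpy as np

def stationary_distribution(A, n_iter=1000):
    n = A.shape[0]
    P = A / A.sum(axis=1, keepdims=True)  # each row: uniform over out-neighbors
    pi = np.full(n, 1.0 / n)              # start from the uniform distribution
    for _ in range(n_iter):
        pi = pi @ P                       # one step of the walk, in distribution
    return pi                             # pi[i]: long-run visit frequency of node i
```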

Slide 16

Slide 16 text

16 / 45 KYOTO UNIVERSITY Given 01-adjacency, estimate the coordinates  Problem definition (again) In: The 01-adjacency matrix of a k-NN graph Out: The latent coordinates of the nodes  Very simple.

Slide 17

Slide 17 text

17 / 45 KYOTO UNIVERSITY Procedure to estimate the coordinates 1. Compute the stationary distribution of random walks. 2. Estimate the density around each node. 3. Estimate the edge lengths using the estimated densities. 4. Compute the shortest path distances using the estimated edge lengths and compute the distance matrix. 5. Estimate the coordinates from the distance matrix by, e.g., multidimensional scaling.  This is a consistent estimator (up to rigid transform) [Hashimoto+ AISTATS 2015]. Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola. Metric recovery from directed unweighted graphs. AISTATS 2015.
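Putting the five steps together, here is a minimal sketch under stated assumptions: A is a dense 0/1 adjacency matrix, the latent dimension d is known, and density_from_stationary is a placeholder for the paper's transformation of the stationary distribution into a density estimate. The p^(-1/d) edge-length rule reflects that k-NN edges shrink as the local density p grows; the exact constants do not matter for recovery up to scale.

```python
# Minimal sketch of the five-step recovery pipeline. density_from_stationary
# is a stand-in for the transformation in Hashimoto et al. (2015).
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.manifold import MDS

def recover_coordinates(A, d, density_from_stationary):
    n = A.shape[0]
    # 1. Stationary distribution of the simple random walk (power iteration).
    P = A / A.sum(axis=1, keepdims=True)
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):
        pi = pi @ P
    # 2. Density around each node (the paper's transformation, passed in).
    p = density_from_stationary(pi)
    # 3. Edge lengths: in a region of density p, k-NN edges scale as p^(-1/d).
    lengths = np.where(A > 0, np.sqrt(np.outer(p, p)) ** (-1.0 / d), 0.0)
    # 4. Shortest-path distances under the estimated edge lengths
    #    (zeros in `lengths` are treated as non-edges by scipy).
    D = shortest_path(lengths, method="D", directed=False)
    # 5. Multidimensional scaling on the precomputed distance matrix.
    return MDS(n_components=d, dissimilarity="precomputed").fit_transform(D)
```

As the slide notes, the output matches the latent coordinates only up to a rigid transform (and a global scale).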

Slide 18

Slide 18 text

18 / 45 KYOTO UNIVERSITY We can recover the coordinates consistently The latent coordinates can be consistently estimated solely from the unweighted k-NN graph. Take Home Message

Slide 19

Slide 19 text

19 / 45 KYOTO UNIVERSITY Towards Principled User-side Recommender Systems (CIKM 2022) Ryoma Sato. Towards Principled User-side Recommender Systems. CIKM 2022.

Slide 20

Slide 20 text

20 / 45 KYOTO UNIVERSITY Let’s consider item-to-item recommendations  We consider item-to-item recommendations.  Ex: “Products related to this item” panel in Amazon.com.

Slide 21

Slide 21 text

21 / 45 KYOTO UNIVERSITY User-side recsys realizes user's desiderata  Problem: We are unsatisfied with the official recommender system.  It provides monotonous recommendations. We need serendipity.  It provides recommendations biased towards specific companies or countries.  User-side recommender systems [Sato 2022] enable users to build their own recommender systems that satisfy their desiderata even when the official one does not support them. Ryoma Sato. Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? SDM 2022.

Slide 22

Slide 22 text

22 / 45 KYOTO UNIVERSITY We need powerful and principled user-side Recsys  [Sato 2022]'s user-side recommender system is realized in an ad-hoc manner, and its performance is not so high.  We need a systematic way to build user-side recommender systems, and a more powerful one; hopefully one that is as strong as the official system. Ryoma Sato. Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without Log Data? SDM 2022.

Slide 23

Slide 23 text

23 / 45 KYOTO UNIVERSITY Official (traditional) recommender systems (Diagram: ingredients (log data, catalog, auxiliary data) feed the Recsys algorithm; Step 1, training, produces the Recsys model; Step 2, inference, takes a source item and outputs recommendations.)

Slide 24

Slide 24 text

24 / 45 KYOTO UNIVERSITY Users cannot see the data, algorithm, and model (Diagram: the same pipeline. The ingredients, the algorithm, and the model are not observable to users; they are industrial secrets.)

Slide 25

Slide 25 text

25 / 45 KYOTO UNIVERSITY How can we build our Recsys without them? (Diagram: the same pipeline. But the hidden parts are crucial information for building a new Recsys...)

Slide 26

Slide 26 text

26 / 45 KYOTO UNIVERSITY We assume the model is embedding-based (Diagram: the same pipeline.) (Slight) Assumption: The model embeds items and recommends near items. This is a common strategy in Recsys. We do not assume the way it embeds. It can be matrix factorization, neural networks, etc.

Slide 27

Slide 27 text

27 / 45 KYOTO UNIVERSITY We can observe the k-NN graph of the embeddings (Diagram: the same pipeline.) Observation: These outputs have sufficient information to construct the unweighted k-NN graph. I.e., users can build the k-NN graph by accessing each item page and observing what the neighboring items are.
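Concretely, a user could crawl the "related items" panels to assemble the adjacency matrix. The sketch below is hypothetical: fetch_related_items stands in for a scraper or API call and is not a real endpoint.

```python
# Hypothetical sketch: build the directed 0/1 k-NN adjacency from the
# observable "related items" panels. fetch_related_items is a stand-in.
import numpy as np

def build_knn_adjacency(item_ids, fetch_related_items):
    index = {item: i for i, item in enumerate(item_ids)}
    A = np.zeros((len(item_ids), len(item_ids)), dtype=int)
    for item in item_ids:
        for neighbor in fetch_related_items(item):  # the k items shown on the page
            if neighbor in index:
                A[index[item], index[neighbor]] = 1  # directed k-NN edge
    return A
```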

Slide 28

Slide 28 text

28 / 45 KYOTO UNIVERSITY We can estimate the embeddings! (Diagram: the same pipeline.) Solution: Estimate the item embeddings of the official Recsys. They are considered to be secret, but we can estimate them from the unweighted k-NN graph! They contain much information!

Slide 29

Slide 29 text

29 / 45 KYOTO UNIVERSITY We realize our desiderata with the embeddings  We can do many things with the estimated embeddings.  We can compute recommendations by ourselves, with our own postprocessing, as in the sketch below.  If you want more serendipity, recommend the 1st, 2nd, 4th, 8th, ..., and 32nd nearest items, or add noise to the embeddings.  If you want to decrease the bias towards specific companies, add negative biases to the scores of these items so as to suppress those companies.
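A minimal sketch of both post-processing rules, assuming Z is the matrix of estimated embeddings (one row per item). The intermediate ranks filling in the slide's "..." follow the powers-of-two pattern and are my reading; the penalty weight is likewise an illustrative choice.

```python
# Minimal sketch: user-side post-processing on the estimated embeddings Z.
import numpy as np

def serendipitous_recs(Z, source, ranks=(1, 2, 4, 8, 16, 32)):
    """Recommend exponentially spaced neighbors instead of the top k."""
    dist = np.linalg.norm(Z - Z[source], axis=1)
    order = np.argsort(dist)        # order[0] is the source item itself
    return order[list(ranks)]

def debiased_recs(Z, source, is_suppressed, penalty=1.0, k=10):
    """Push down items from over-represented companies via a score penalty."""
    score = -np.linalg.norm(Z - Z[source], axis=1)  # nearer items score higher
    score = score - penalty * is_suppressed         # is_suppressed: 0/1 array
    score[source] = -np.inf                         # never recommend the source
    return np.argsort(-score)[:k]
```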

Slide 30

Slide 30 text

30 / 45 KYOTO UNIVERSITY Experiments validated the theory  In the experiments, I conducted simulations and showed that the hidden item embeddings can be estimated accurately.  I built a fair Recsys for Twitter, which runs in the real world, on the user's side. Even though the official Recsys is not fair w.r.t. gender, mine is, and it is more efficient than the existing one.

Slide 31

Slide 31 text

31 / 45 KYOTO UNIVERSITY Users can recover the item embeddings Users can “reverse engineer” the official item embeddings solely from the observable information. Take Home Message

Slide 32

Slide 32 text

32 / 45 KYOTO UNIVERSITY Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure (ICML 2023) Ryoma Sato. Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure. ICML 2023.

Slide 33

Slide 33 text

33 / 45 KYOTO UNIVERSITY We call for a theory of GNNs  Graph Neural Networks (GNNs) take a graph with node features as input and output node embeddings.  GNNs are a popular choice in various graph-related tasks.  GNNs are so popular that understanding them theoretically is an important topic in its own right. E.g., what is the hypothesis space of GNNs? (GNNs do not have universal approximation power.) Why do GNNs work well in so many tasks?

Slide 34

Slide 34 text

34 / 45 KYOTO UNIVERSITY GNNs apply filters to node features  GNNs apply filters to the input node features and extract useful features.  The input node features have long been considered to be the key to success. If the features have no useful signals, GNNs will not work.

Slide 35

Slide 35 text

35 / 45 KYOTO UNIVERSITY Good node features are not always available  However, informative node features are not always available.  E.g., social network user information may be hidden for privacy reasons.

Slide 36

Slide 36 text

36 / 45 KYOTO UNIVERSITY Uninformative features degrade the performance  If we have no features at hand, we usually input uninformative node features such as the degree features.  No matter how such features are filtered, only uninformative embeddings are obtained. “garbage in, garbage out.” This is common sense.

Slide 37

Slide 37 text

37 / 45 KYOTO UNIVERSITY Can GNNs work with uninformative node features?  Research question I want to answer in this project: Do GNNs really not work when the input node features are uninformative?  In practice, GNNs sometimes work just with degree features. The reason is a mystery, which I want to elucidate.

Slide 38

Slide 38 text

38 / 45 KYOTO UNIVERSITY We assume latent node features behind the graph  (Slight) Assumption: The graph structure is formed by connecting nodes whose latent node features z*_v are close to each other.  The latent node features z*_v are not observable. E.g., a "true user preference vector": latent features that contain users' preferences, workplace, residence, etc. Those who have similar preferences and residences are connected. We can only observe the way nodes are connected, not the coordinates.

Slide 39

Slide 39 text

39 / 45 KYOTO UNIVERSITY GNNs can recover the latent features  Main result: GNNs can recover the latent node features z*_v even when the input node features are uninformative.  z*_v contains the preferences of users, which are useful for downstream tasks.

Slide 40

Slide 40 text

40 / 45 KYOTO UNIVERSITY GNNs create useful node features themselves  GNNs can create completely new and useful node features by absorbing information from the graph structure, even when the input node features are uninformative.  This is a new perspective that overturns the existing view of GNNs as filters of input node features.

Slide 41

Slide 41 text

41 / 45 KYOTO UNIVERSITY GNNs can recover the coordinates with some tricks  How to prove it? → Metric recovery from k-NN graphs, as you may expect.  But be careful when you apply it. What GNNs can do (the hypothesis space of GNNs) is limited.  The metric recovery algorithm is compatible with GNNs. Stationary distribution → GNNs can do random walks. Shortest path → GNNs can simulate Bellman-Ford. MDS → This is the tricky part. We send the distance matrix to some nodes and solve it locally.  GNNs can recover the metric with slight additional errors.
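To make the Bellman-Ford point concrete, here is a minimal sketch of shortest paths written in the synchronous message-passing pattern that a GNN round can simulate. This is plain Python illustrating the pattern, not the paper's GNN construction; edges and lengths are assumed inputs (e.g., the estimated edge lengths from the recovery pipeline).

```python
# Minimal sketch: Bellman-Ford as synchronous message passing. Each round,
# every node aggregates its in-neighbors' distances, as a GNN layer would.
def bellman_ford_message_passing(n, edges, lengths, source):
    dist = [float("inf")] * n
    dist[source] = 0.0
    for _ in range(n - 1):                 # n - 1 rounds suffice
        new_dist = list(dist)
        for u, v in edges:                 # message along the edge (u, v)
            new_dist[v] = min(new_dist[v], dist[u] + lengths[(u, v)])
        dist = new_dist                    # synchronous update, GNN-style
    return dist
```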

Slide 42

Slide 42 text

42 / 45 KYOTO UNIVERSITY Recovered features are empirically useful  In the experiments, we empirically confirmed this phenomenon. The recovered features are useful for various downstream tasks, even when the input features x_syn are uninformative.

Slide 43

Slide 43 text

43 / 45 KYOTO UNIVERSITY GNNs can create useful features by themselves GNNs can create useful node features by absorbing information from the underlying graph. Take Home Message

Slide 44

Slide 44 text

44 / 45 KYOTO UNIVERSITY Conclusion

Slide 45

Slide 45 text

45 / 45 KYOTO UNIVERSITY I introduced my favorite topic and its applications  Metric recovery from unweighted k-NN graphs is my recent favorite technique. I like this technique because the scope of applications is broad, and the results are simple but non-trivial. The latent coordinates can be consistently estimated solely from the unweighted k-NN graph. Take Home Message