Slide 1

Follow-on project
Tensor Networks – a brief description

Emir Muñoz
Fujitsu Ireland Ltd.
[email protected]
October 2015

Slide 2

Tensor Product Networks

Main reference: Smolensky, Paul: "Tensor product variable binding and the representation of symbolic structures in connectionist systems", Artificial Intelligence 46 (1990), pp. 159-216.

All of this material comes from http://www.cse.unsw.edu.au/~billw/cs9444/tensor-stuff/tensor-intro-04.html

Keywords: tensor product network, variable binding problem, rank, one-shot learning, orthonormality, relational memory, teaching and retrieval modes, proportional analogies, lesioning a network, random representations, sparse random representations, fact recognition scores, representable non-facts.

Slide 3

Network Topology and Activation

Figure: Connection of Units

- Common model: assume weighted connections $w_{ij}$ from input units with activation $x_j$ to unit $i$
- The output (activation function) of unit $i$ is $\sigma\left(\sum_j w_{ij} x_j\right)$, where $\sigma$ is a 'squashing' function such as tanh
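As a concrete illustration, here is a minimal NumPy sketch of this activation rule; the weights and inputs are made-up values, not from the slides:

```python
import numpy as np

def unit_output(w_i, x, squash=np.tanh):
    """Activation of one unit: sigma(sum_j w_ij * x_j)."""
    return squash(np.dot(w_i, x))

# Illustrative values
w_i = np.array([0.2, -0.5, 0.8])   # weights into unit i
x   = np.array([1.0, 0.3, -0.7])   # activations of the input units
print(unit_output(w_i, x))         # tanh of the weighted sum
```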

Slide 4

Network Topology: Feedforward Nets

Figure: Feedforward Nets

- A net is feedforward if the graph consisting of the neurons as nodes and the connections as directed edges is a directed acyclic graph
- Input nodes have no incoming edges; output nodes have no outgoing edges; anything else is called a hidden node or unit
- Edges labelled with $\omega$ signify connections with 'trainable' weights between neurons in one "layer" and those in the next

Slide 5

Network Topology: Fully Recurrent Nets

Figure: Fully Recurrent Nets

Slide 6

Tensor Product Nets

- Other possibilities for the activation function include linear networks (where $\sigma$ is the identity function)
- One particular case is the rank 2 Tensor Product Network (TPN)
- TPNs come with different numbers of dimensions (ranks)
- In the case of a rank 2 TPN, the topology is that of a matrix

Slide 7

Rank 2 TPN

Figure: Rank 2 TPN topology (shown in teaching mode)

Slide 8

Rank 2 TPN (cont.)

- The previous network is shown in teaching mode
- There is also a retrieval mode, where you feed the net with (the representation of) a variable, and it outputs the value of the symbol (the 'filler')

Slide 9

Teaching Mode
In rank 2 TPNs

- Vectors representing a variable and a filler are presented to the two sides of the network
- The fact that the variable has that filler is learned by the network
- The teaching is one-shot, unlike in other classes of neural network
- Teaching is accomplished by adjusting the value of the binding unit memory

Slide 10

Teaching Mode (cont.)
In rank 2 TPNs

- Specifically, if the $i$-th component of the filler vector is $f_i$ and the $j$-th component of the variable vector is $v_j$, then $f_i v_j$ is added to $b_{ij}$, the $(i,j)$-th binding unit memory, for all $i$ and $j$
- Another way to look at this: treat the binding units as a matrix $B$ and the filler and variable as column vectors $f$ and $v$
- Then what we are doing is forming the outer product $f v^\top$ and adding it to $B$:
  $$B \leftarrow B + f v^\top$$
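A minimal sketch of one-shot teaching as an outer-product update, assuming NumPy; the filler and variable vectors below are illustrative:

```python
import numpy as np

def teach(B, f, v):
    """One-shot teaching: add the outer product f v^T to the binding memory B."""
    return B + np.outer(f, v)

# Illustrative 4-dimensional representations
f = np.array([0.5, 0.5, -0.5, -0.5])   # filler
v = np.array([0.5, 0.5, 0.5, 0.5])     # variable
B = np.zeros((4, 4))                   # empty binding unit memory
B = teach(B, f, v)                     # the net now stores "variable v has filler f"
```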

Slide 11

Retrieval Mode
In rank 2 TPNs

For exact retrieval we must ensure that:
- the vectors used to represent variables are orthogonal to each other (i.e. any two of them have dot product zero)
- the same is true for the vectors used to represent the fillers
- each representation vector is of length 1 (i.e. the dot product of each vector with itself is 1)

It is common to refer to a set of vectors with these properties (orthogonality and length 1) as an orthonormal set. Orthonormality entails that the representation vectors are linearly independent; in particular, if the matrix/tensor has m rows and n columns, then it can represent at most m fillers and n variables.
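A quick way to verify that a candidate set of representation vectors is orthonormal (a sketch using the Hadamard-style vectors that appear later in these slides):

```python
import numpy as np

# Rows are candidate representation vectors
V = 0.5 * np.array([[ 1,  1,  1,  1],
                    [ 1, -1,  1, -1],
                    [ 1,  1, -1, -1],
                    [ 1, -1, -1,  1]])

# For an orthonormal set, the Gram matrix V V^T is the identity
print(np.allclose(V @ V.T, np.eye(4)))   # True
```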

Slide 12

Retrieval from a TP Net
In rank 2 TPNs

- Retrieval is accomplished by computing dot products
- To retrieve the value/filler for a variable $v = (v_j)$ from a rank 2 tensor with binding unit values $b_{ij}$, compute $f_i = \sum_j b_{ij} v_j$ for each $i$. The resulting vector $(f_i)$ represents the filler
- To decide whether variable $v$ has filler $f$, compute $D = \sum_i \sum_j b_{ij} v_j f_i$. $D$ will be either 1 or 0. If it is 1, then variable $v$ has filler $f$; otherwise not
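Both retrieval operations in a short NumPy sketch; the binding memory here stores a single illustrative binding:

```python
import numpy as np

def retrieve_filler(B, v):
    """f_i = sum_j b_ij v_j : unbind the filler stored against variable v."""
    return B @ v

def check_fact(B, v, f):
    """D = sum_ij b_ij v_j f_i : 1.0 if 'variable v has filler f' was taught, else 0.0."""
    return float(f @ B @ v)

# Illustrative orthonormal representations
f = np.array([0.5, 0.5, -0.5, -0.5])
v = np.array([0.5, 0.5, 0.5, 0.5])
B = np.outer(f, v)               # net taught the single binding v -> f
print(retrieve_filler(B, v))     # recovers f
print(check_fact(B, v, f))       # 1.0
```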

Slide 13

Learning with Hadamard Representations
In rank 2 TPNs

Example: Suppose we are using representations as follows:
- (0.5, 0.5, 0.5, 0.5) to represent rabbit
- (0.5, -0.5, 0.5, -0.5) to represent mouse
- (0.5, 0.5, -0.5, -0.5) to represent carrot
- (0.5, -0.5, -0.5, 0.5) to represent cat

and we want to build a tensor to represent the pairs (rabbit, carrot) and (cat, mouse).

Slide 14

Learning with Hadamard Representations (cont.)
In rank 2 TPNs

Example:
$$\text{carrot} \times \text{rabbit}^\top = \frac{1}{2}\begin{pmatrix}1\\ 1\\ -1\\ -1\end{pmatrix} \times \frac{1}{2}\begin{pmatrix}1 & 1 & 1 & 1\end{pmatrix} = \frac{1}{4}\begin{pmatrix}1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ -1 & -1 & -1 & -1\\ -1 & -1 & -1 & -1\end{pmatrix}$$

(Applying the Kronecker product)

Slide 15

Learning with Hadamard Representations (cont.)
In rank 2 TPNs

- We check that we can recover carrot from this by unbinding with rabbit
- We must compute $f_i = \sum_j b_{ij} v_j$, where $b_{ij}$ is the matrix above and $(v_j)$ is the rabbit vector

Example:
$$f_1 = b_{11}v_1 + b_{12}v_2 + b_{13}v_3 + b_{14}v_4 = \tfrac{1}{2} \times \tfrac{1}{4} \times (1 \times 1 + 1 \times 1 + 1 \times 1 + 1 \times 1) = 0.5$$

and similarly $f_2 = 0.5$, $f_3 = -0.5$, and $f_4 = -0.5$, so that $f$ represents carrot.

Slide 16

Learning with Hadamard Representations (cont.)
In rank 2 TPNs

For (cat, mouse), we compute

Example:
$$\text{mouse} \times \text{cat}^\top = \frac{1}{2}\begin{pmatrix}1\\ -1\\ 1\\ -1\end{pmatrix} \times \frac{1}{2}\begin{pmatrix}1 & -1 & -1 & 1\end{pmatrix} = \frac{1}{4}\begin{pmatrix}1 & -1 & -1 & 1\\ -1 & 1 & 1 & -1\\ 1 & -1 & -1 & 1\\ -1 & 1 & 1 & -1\end{pmatrix}$$

(Applying the Kronecker product)

Slide 17

Learning with Hadamard Representations (cont.)
In rank 2 TPNs

The tensor representing both of these is the sum of the two matrices:

Example:
$$\frac{1}{4}\begin{pmatrix}1 & 1 & 1 & 1\\ 1 & 1 & 1 & 1\\ -1 & -1 & -1 & -1\\ -1 & -1 & -1 & -1\end{pmatrix} + \frac{1}{4}\begin{pmatrix}1 & -1 & -1 & 1\\ -1 & 1 & 1 & -1\\ 1 & -1 & -1 & 1\\ -1 & 1 & 1 & -1\end{pmatrix} = \frac{1}{4}\begin{pmatrix}2 & 0 & 0 & 2\\ 0 & 2 & 2 & 0\\ 0 & -2 & -2 & 0\\ -2 & 0 & 0 & -2\end{pmatrix}$$

Slide 18

Learning with Hadamard Representations (cont.)
In rank 2 TPNs

- We check that we can still recover carrot from this by unbinding with rabbit
- We must compute $f_i = \sum_j b_{ij} v_j$, where $b_{ij}$ is the (new) matrix and $(v_j)$ is the rabbit vector

Example:
$$f_1 = b_{11}v_1 + b_{12}v_2 + b_{13}v_3 + b_{14}v_4 = \tfrac{1}{2} \times \tfrac{1}{4} \times (2 \times 1 + 0 \times 1 + 0 \times 1 + 2 \times 1) = 0.5$$

and similarly $f_2 = 0.5$, $f_3 = -0.5$, and $f_4 = -0.5$, so that $f$ represents carrot, as before.
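The whole worked example of the last few slides can be reproduced in a few lines of NumPy (a sketch; the variable names are ours):

```python
import numpy as np

rabbit = np.array([0.5,  0.5,  0.5,  0.5])
mouse  = np.array([0.5, -0.5,  0.5, -0.5])
carrot = np.array([0.5,  0.5, -0.5, -0.5])
cat    = np.array([0.5, -0.5, -0.5,  0.5])

# Teach the pairs (rabbit, carrot) and (cat, mouse): sum of outer products
B = np.outer(carrot, rabbit) + np.outer(mouse, cat)

print(B @ rabbit)             # [ 0.5  0.5 -0.5 -0.5]  -> carrot
print(B @ cat)                # [ 0.5 -0.5  0.5 -0.5]  -> mouse
print(carrot @ B @ rabbit)    # 1.0: the fact (rabbit, carrot) is stored
print(carrot @ B @ cat)       # 0.0: (cat, carrot) is not stored
```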

Slide 19

TP Nets as Relational Memories

- So far we have been using a TPN to store a particular kind of relational information: variable binding
- In variable binding, each variable has a unique filler (at any given time)
- This restriction on the kind of information stored in the tensor is unnecessary
- A rank 2 tensor will store an arbitrary binary relation:

  Animal       Food
  rabbit       carrot
  mouse        cheese
  crocodile    student
  rabbit       lettuce
  guinea pig   lettuce
  crocodile    lecturer

Slide 20

TP Nets as Relational Memories (cont.)

- This information can be represented and stored in the tensor in the usual way, putting the 'animal' on the side we have been calling 'variable' and the 'food' on the side we have been calling 'filler'
- Retrieval is the same as before

Example: We can present the vector representing rabbit to the variable/animal side of the tensor. What we get out of the filler/food side of the tensor will be the sum of the vectors representing the foods that the tensor has been taught that rabbit eats: in this case carrot + lettuce.
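A sketch of this "sum of fillers" behaviour; the assignment of concepts to Hadamard-style vectors below is our own illustrative choice, and only part of the relation is stored:

```python
import numpy as np

# Orthonormal Hadamard-style vectors; each side of the tensor has its own
# assignment of concepts to vectors (the assignments here are illustrative)
h1 = np.array([0.5,  0.5,  0.5,  0.5])
h2 = np.array([0.5, -0.5,  0.5, -0.5])
h3 = np.array([0.5,  0.5, -0.5, -0.5])
h4 = np.array([0.5, -0.5, -0.5,  0.5])

rabbit, mouse = h1, h2                 # 'animal' (variable) side
carrot, lettuce, cheese = h3, h4, h2   # 'food' (filler) side

# Store part of the relation: rabbit eats carrot and lettuce, mouse eats cheese
B = np.outer(carrot, rabbit) + np.outer(lettuce, rabbit) + np.outer(cheese, mouse)

out = B @ rabbit                            # what does rabbit eat?
print(np.allclose(out, carrot + lettuce))   # True: the sum of rabbit's foods
print(float(cheese @ B @ mouse))            # 1.0: (mouse, cheese) is stored
```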

Slide 21

TP Nets as Relational Memories (cont.)

Checking a particular fact, like whether (mouse, cheese) is in the relation, is done just as before: we compute $D = \sum_i \sum_j b_{ij} v_j f_i$, where $v$ is the varmint and $f$ the food, and if $D = 1$ then the varmint eats the food.

Slide 22

Rank 3 Tensors
High Rank TPNs

- We could better call the nets seen so far 'matrix nets'
- The tensor aspect of things comes in when we generalise to enable us to store ternary (or higher arity) relations
- Suppose we have ternary relations like kiss(frank, betty) and hit(max, frank). Now we need a tensor net with three sides: say a REL side, an ARG1 side and an ARG2 side, or more generally a u side, a v side and a w side

Slide 23

Gross Structure of a Rank 3 TP Net
High Rank TPNs

This shows the binding units, some of them activated (shaded), and the neurons in the u, v, and w vectors, but not the interconnections.

Slide 24

Gross Structure of a Rank 3 TP Net (cont.)
High Rank TPNs

- The net is ready for retrieval from the u side, given v and w
- There are 27 binding units: 3 × 3 × 3
- In general, if the u, v, and w sides use vectors with q components, there are $q^3$ binding units

Slide 25

Connections of a Rank 3 TP Net

- The binding units are labelled $t_{ijk}$ ('t' for tensor)
- Each component of a side is connected to a hyperplane of binding units, e.g. $v_1$ is connected to $t_{i1k}$ for all $i$ and $k$

Slide 26

Retrieval in a Rank 3 Tensor

- If we have concepts (or their representations) for any two sides of the tensor, then we can retrieve something from the third side

Example: If we have $u = (u_i)$ and $v = (v_j)$, then we can compute $w_k = \sum_{ij} t_{ijk} u_i v_j$ for each value of $k$, and the result will be the sum of the vectors representing concepts $w$ such that $u(v, w)$ is stored in the tensor.

- This time the activation function for $w_k$ is not linear but multilinear
- As usual, we can check facts, too: $D = \sum_{ijk} t_{ijk} u_i v_j w_k$ is 1 exactly when $u(v, w)$ is stored in the tensor, and zero otherwise

Slide 27

Teaching in a Rank 3 Tensor

- To teach the network the fact $u(v, w)$, present $u$, $v$ and $w$ to the net
- In teaching mode, this causes the content of each binding unit memory $t_{ijk}$ to be altered by adding $u_i v_j w_k$ to it
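A combined sketch of rank 3 teaching, retrieval and fact checking using np.einsum; the one-hot representations for kiss(frank, betty) are illustrative:

```python
import numpy as np

def teach(T, u, v, w):
    """Add u_i v_j w_k to each binding unit t_ijk."""
    return T + np.einsum('i,j,k->ijk', u, v, w)

def retrieve_w(T, u, v):
    """w_k = sum_ij t_ijk u_i v_j : unbind the third side given the first two."""
    return np.einsum('ijk,i,j->k', T, u, v)

def check_fact(T, u, v, w):
    """D = sum_ijk t_ijk u_i v_j w_k : 1.0 if u(v, w) was taught, else 0.0."""
    return float(np.einsum('ijk,i,j,k->', T, u, v, w))

# Illustrative one-hot (orthonormal) representations: the fact kiss(frank, betty)
kiss  = np.array([1.0, 0.0])               # REL side
frank = np.array([1.0, 0.0])               # ARG1 side
betty = np.array([0.0, 1.0])               # ARG2 side

T = teach(np.zeros((2, 2, 2)), kiss, frank, betty)
print(retrieve_w(T, kiss, frank))          # [0. 1.]  -> betty
print(check_fact(T, kiss, frank, betty))   # 1.0
```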

Slide 28

Higher Rank Tensor Product Networks

For a rank $r$ tensor product network:
- the binding units have $r$ subscripts: $t_{i_1 i_2 \ldots i_r}$
- there are $r$ sides
- there are $r$ input/output vectors, say $u^1, u^2, \ldots, u^r$
- to teach the tensor the fact $u^1(u^2, \ldots, u^r)$, add $u^1_{i_1} \times u^2_{i_2} \times \cdots \times u^r_{i_r}$ to each binding unit $t_{i_1 i_2 \ldots i_r}$
- to retrieve, say, the $r$-th vector given the first $r - 1$, compute $u^r_{i_r} = \sum_{i_1, i_2, \ldots, i_{r-1}} t_{i_1 i_2 \ldots i_r}\, u^1_{i_1} u^2_{i_2} \cdots u^{r-1}_{i_{r-1}}$
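The same pattern extends to any rank; here is a sketch of generic rank-r teaching and retrieval built on np.einsum (the helper names and the Hadamard-style test vectors are ours):

```python
import numpy as np

def teach(T, vectors):
    """Add the rank-r outer product u1 x u2 x ... x ur to the binding tensor T."""
    subs = ','.join(chr(ord('a') + i) for i in range(len(vectors)))
    return T + np.einsum(subs + '->' + subs.replace(',', ''), *vectors)

def retrieve_last(T, vectors):
    """Contract T with the first r-1 vectors to recover the r-th side."""
    subs = [chr(ord('a') + i) for i in range(T.ndim)]
    return np.einsum(''.join(subs) + ',' + ','.join(subs[:-1]) + '->' + subs[-1],
                     T, *vectors)

# Illustrative: rank 3, 4-dimensional orthonormal (Hadamard-style) vectors
h = 0.5 * np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
T = teach(np.zeros((4, 4, 4)), [h[0], h[1], h[2]])           # store the fact h0(h1, h2)
print(np.allclose(retrieve_last(T, [h[0], h[1]]), h[2]))     # True
```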

Slide 29

Higher Rank Tensor Product Networks (cont.)

- This rapidly becomes impractical, as the size of the network (number of binding units) grows as $n^r$
- It is desirable to have $n$ fairly large in practice, since $n$ is the largest number of concepts that can be represented (per side of the tensor)
- For example, a rank 6 tensor with 64 concepts per side would need $64^6 = 2^{36}$ (roughly $6.9 \times 10^{10}$) binding units

Slide 30

Gross Topology of a Rank 4 TPN
High Rank TPNs

This one has 3 components for each of the 4 'directions', so it has a total of $3^4 = 81$ binding units.

Slide 31

Applications of TPN

- Theory building for connectionist models
- Construction of theories of cognition

Detailed diagrams of tensor product networks are complicated. Here is a rank 3 tensor: v and w are inputs and u is the output, but we could make any side the output.

Slide 32

Solving Proportional Analogy Problems using TPNs

- The aim is to simulate simple human analogical reasoning
- The TPN is used to store facts relevant to the analogical reasoning problem
- Proportional analogy problems are sometimes used in psychological testing
- They are fairly easy for a human over a certain age, but it is not particularly clear how to solve them on a machine
- A typical example is: dog : kennel :: rabbit : what?
- The aim is to find the best replacement for the "what?". Here the answer is burrow

Slide 33

Solving Proportional Analogy Problems using TPNs (cont.)

- The human mental exercise is: "The dog lives-in the kennel – what does the rabbit live in? – a burrow"
- The human names a relationship between dog and kennel, and then proceeds from there
- However, the human does not pick just any relation between dog and kennel (like smaller-than(dog, kennel)): they pick the most salient relation
- How? And how could we do this with a machine?
- The TPN approach actually finesses this question

Slide 34

A set of "facts"

Example:

  woman loves baby            woman lives-in house
  woman mother-of baby        baby lives-in house
  woman bigger-than baby      mare lives-in barn
  woman feeds baby            foal lives-in barn
  mare feeds foal             rabbit lives-in burrow
  mare mother-of foal         barn bigger-than woman
  mare bigger-than foal       barn bigger-than baby
  mare bigger-than rabbit     barn bigger-than mare
  woman bigger-than rabbit    barn bigger-than foal
  woman bigger-than foal      barn bigger-than rabbit

(Did someone think about RDF? ;-))

Slide 35

A set of "facts" (cont.)

Slide 36

Steps in the Simple Analogical Reasoning Algorithm

1. Present WOMAN and BABY to the ARG1 and ARG2 sides of the net
2. From the REL(ation) side of the network we get a "predicate bundle": the sum of the vectors representing predicates (relation symbols) P such that the net has been taught that P(WOMAN, BABY) holds
3. Present this predicate bundle to the REL side of the same network, and present MARE to the ARG1 side of the net

Slide 37

Steps in the Simple Analogical Reasoning Algorithm (cont.)

4. From the ARG2 side of the net we get a "weighted argument bundle": the sum of the vectors representing second arguments y such that the net has been taught that P(MARE, y) holds for some P in the predicate bundle
   - The weight associated with each y is the number of predicates P in the predicate bundle for which P(MARE, y) holds
   - For the given set of facts, the ARG2 bundle is 3×FOAL + 1×RABBIT
5. Pick the concept (ARG2 item) that has the largest weight: FOAL
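A sketch of the whole algorithm on a subset of the fact base, using one-hot (hence orthonormal) representations of our own choosing:

```python
import numpy as np

concepts   = ['woman', 'baby', 'mare', 'foal', 'rabbit']
predicates = ['loves', 'mother-of', 'bigger-than', 'feeds']
C = {c: np.eye(len(concepts))[i]   for i, c in enumerate(concepts)}
P = {p: np.eye(len(predicates))[i] for i, p in enumerate(predicates)}

# A subset of the fact base, as (REL, ARG1, ARG2) triples
facts = [('loves', 'woman', 'baby'), ('mother-of', 'woman', 'baby'),
         ('bigger-than', 'woman', 'baby'), ('feeds', 'woman', 'baby'),
         ('feeds', 'mare', 'foal'), ('mother-of', 'mare', 'foal'),
         ('bigger-than', 'mare', 'foal'), ('bigger-than', 'mare', 'rabbit')]

# Teach the rank 3 tensor T[rel, arg1, arg2]
T = np.zeros((len(predicates), len(concepts), len(concepts)))
for rel, a1, a2 in facts:
    T += np.einsum('i,j,k->ijk', P[rel], C[a1], C[a2])

# Step 1: WOMAN and BABY in -> predicate bundle out of the REL side
pred_bundle = np.einsum('ijk,j,k->i', T, C['woman'], C['baby'])

# Step 2: predicate bundle and MARE in -> weighted ARG2 bundle out
arg2_bundle = np.einsum('ijk,i,j->k', T, pred_bundle, C['mare'])

# Decode the bundle: weight of each concept; pick the largest
weights = {c: float(arg2_bundle @ C[c]) for c in concepts}
print(weights)                         # foal: 3.0, rabbit: 1.0, others: 0.0
print(max(weights, key=weights.get))   # 'foal'
```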

Slide 38

Omnidirectional Access

In solving the analogy problem, the TPN was accessed in two different ways:
1. ARG1 and ARG2 in, REL out
2. ARG1 and REL in, ARG2 out

- This would not have been possible with a backprop net: the input/output structure is "hard-wired" in backprop nets
- In the TPN, the same information in the tensor supports both these modes of operation

Slide 39

Omnidirectional Access (cont.)

- This is like when kids learn addition/subtraction: you learn that 9 + 7 = 16, and from this you also know that 16 − 7 = 9. We learn addition tables, but not subtraction tables
- An obvious third access mode, ARG2 and REL in, ARG1 out, is possible
- And of course, you can have an ARG1, ARG2, and REL in, YES/NO out access mode
- Less obviously, you can have access modes like: REL in, ARG1 ⊗ ARG2 out
- In fact there are a total of 7 access modes to a rank 3 tensor
- There are $2^k - 1$ access modes for a rank $k$ tensor. This property is referred to as omnidirectional access
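A sketch of the less obvious "REL in, ARG1 ⊗ ARG2 out" access mode; the one-hot vectors below are illustrative:

```python
import numpy as np

# Contracting over only the REL index leaves a matrix that is the sum of
# outer products arg1 (x) arg2 over all stored facts with that relation
lives_in = np.array([1.0, 0.0])                          # illustrative REL vector
rabbit, burrow = np.eye(2)                               # illustrative ARG vectors

T = np.einsum('i,j,k->ijk', lives_in, rabbit, burrow)    # teach lives-in(rabbit, burrow)
M = np.einsum('ijk,i->jk', T, lives_in)                  # REL in, ARG1 (x) ARG2 out
print(np.allclose(M, np.outer(rabbit, burrow)))          # True
```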

Slide 40

Thanks!

Emir Muñoz
[email protected]