Slide 1

Slide 1 text

Transformers are Universal In-Context Learners
Gabriel Peyré, École Normale Supérieure
Takashi Furuya, Maarten de Hoop, Valérie Castin, Pierre Ablin

Slide 2

Slide 2 text

Transformers and attention mechanism
Tokenize (sample text): "The lycée Marcelin Berthelot, located on the tourist route of « la boucle de la Marne », is known to everyone who has visited the surroundings of Paris. 'Ah, it is that huge modern building', people say."
Token encoding + positional encoding: the tokens become a point cloud $\{x_i\}_i$ (points $x_1, x_2, \dots$).
(Unmasked) attention layer, acting on pairs $(x_i, x_j)$: $\tilde{x}_i := \sum_j \frac{e^{\langle Q x_i, K x_j \rangle}}{\sum_\ell e^{\langle Q x_i, K x_\ell \rangle}} V x_j$
Architecture: N × (Attention, Norm, MLP), then Classif, which outputs next token probabilities.

Slide 3

Slide 3 text

Transformers and attention mechanism
Arbitrary number of tokens; arbitrary number of layers.
Expressivity; understanding.

Slide 4

Slide 4 text

In-Context Mappings over Measures
Smoothness and PDEs
Arbitrary number of tokens
Arbitrary number of layers
Universality
Expressivity

Slide 5

Slide 5 text

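Attention as In-context Mapping
Point clouds: $X := \{x_i\}_{i=1}^n$; parameters: $\theta := (Q, K, V)$.
In-context mapping: a context $X$ and a point $x$ are mapped to $\Gamma_\theta[X](x)$, where
$\Gamma_\theta[X](x) := \sum_j \frac{e^{\langle Qx, Kx_j \rangle}}{\sum_\ell e^{\langle Qx, Kx_\ell \rangle}} V x_j$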
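A minimal NumPy sketch of the in-context mapping $\Gamma_\theta[X](x)$ may help fix the notation. The function name `attention_map`, the array shapes, and the random test data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def attention_map(X, x, Q, K, V):
    """Evaluate Gamma_theta[X](x) for a context X of shape (n, d) and a point x of shape (d,)."""
    scores = (X @ K.T) @ (Q @ x)          # scores[j] = <Q x, K x_j>
    scores = scores - scores.max()        # shift for numerical stability (softmax is unchanged)
    weights = np.exp(scores)
    weights /= weights.sum()              # softmax over the context points
    return V @ (weights @ X)              # sum_j weights[j] * V x_j

# Hypothetical data: n = 5 tokens in dimension d = 3.
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))
out = attention_map(X, X[0], Q, K, V)
```

With $Q = 0$ all scores are equal, so the output reduces to $V$ applied to the mean of the context, which is a convenient sanity check.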

Slide 6

Slide 6 text

Attention as In-context Mapping
Single-head attention layer: $X \mapsto \{\Gamma_\theta[X](x_i)\}_{i=1}^n$
Multi-head attention layer: $X \mapsto \{\sum_{h=1}^H W_h \Gamma_{\theta_h}[X](x_i)\}_{i=1}^n$, where head $h$ has parameters $\theta_h = (Q_h, K_h, V_h)$ and output matrix $W_h$.
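The single-head and multi-head layers can be sketched in NumPy as follows. The helper names `head` and `multi_head`, the head dimension `k`, and the random parameters are assumptions made for illustration only.

```python
import numpy as np

def head(X, Q, K, V):
    """One head applied to every token: row i of the result is Gamma_theta[X](x_i)."""
    scores = (X @ Q.T) @ (X @ K.T).T                  # scores[i, j] = <Q x_i, K x_j>
    scores -= scores.max(axis=1, keepdims=True)       # stabilise the exponentials
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)                 # row-wise softmax
    return A @ X @ V.T                                # row i = sum_j A[i, j] V x_j

def multi_head(X, heads, Ws):
    """Row i of the result is sum_h W_h Gamma_{theta_h}[X](x_i)."""
    return sum(head(X, Q, K, V) @ W.T for (Q, K, V), W in zip(heads, Ws))

# Hypothetical sizes: n tokens, dimension d, head dimension k, H heads.
rng = np.random.default_rng(0)
n, d, k, H = 6, 4, 2, 3
X = rng.normal(size=(n, d))
heads = [tuple(rng.normal(size=(k, d)) for _ in range(3)) for _ in range(H)]
Ws = [rng.normal(size=(d, k)) for _ in range(H)]
Y = multi_head(X, heads, Ws)
perm = rng.permutation(n)
```

Permuting the input tokens permutes the output rows in the same way, which is the permutation equivariance of the unmasked layer.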

Slide 7

Slide 7 text

Context-free layers
Multi-layer perceptron: $X \mapsto \{\Gamma_\theta(x_i)\}_{i=1}^n$ with $\Gamma_\theta(x) := x + \theta_1 \mathrm{ReLU}(\theta_2 x)$. Each token is mapped independently of the others: $x_1 \mapsto \Gamma_\theta(x_1)$, $x_2 \mapsto \Gamma_\theta(x_2)$.

Slide 8

Slide 8 text

Context-free layers
Layer norm: $\Gamma_\theta(x) := \theta_1 \odot \frac{x}{\|x\|} + \theta_2$

Slide 9

Slide 9 text

Transformer ≡ composition of in-context and context-free layers.
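The two context-free layers and their composition with attention can be sketched as below. The ordering attention, then norm, then MLP, the parameter dictionary `p`, and all function names are assumptions chosen for illustration; the slides only state that a transformer composes both kinds of layers.

```python
import numpy as np

def mlp(x, theta1, theta2):
    """Context-free MLP with skip connection: x + theta1 ReLU(theta2 x)."""
    return x + theta1 @ np.maximum(theta2 @ x, 0.0)

def layer_norm(x, theta1, theta2):
    """Context-free layer norm: theta1 * x / ||x|| + theta2 (element-wise product)."""
    return theta1 * x / np.linalg.norm(x) + theta2

def attention(X, Q, K, V):
    """In-context layer: row i is Gamma_theta[X](x_i)."""
    S = (X @ Q.T) @ (X @ K.T).T
    A = np.exp(S - S.max(axis=1, keepdims=True))
    return (A / A.sum(axis=1, keepdims=True)) @ X @ V.T

def transformer_block(X, p):
    """One block: in-context attention followed by token-wise norm and MLP."""
    X = attention(X, p["Q"], p["K"], p["V"])
    X = np.stack([layer_norm(x, p["g"], p["b"]) for x in X])
    return np.stack([mlp(x, p["W1"], p["W2"]) for x in X])

rng = np.random.default_rng(0)
n, d, m = 5, 3, 8
X = rng.normal(size=(n, d))
p = {"Q": rng.normal(size=(d, d)), "K": rng.normal(size=(d, d)), "V": rng.normal(size=(d, d)),
     "g": rng.normal(size=d), "b": rng.normal(size=d),
     "W1": rng.normal(size=(d, m)), "W2": rng.normal(size=(m, d))}
Y = transformer_block(X, p)
perm = rng.permutation(n)
```

Only `attention` looks at the whole cloud; `mlp` and `layer_norm` act on one token at a time, which is what "context-free" means here.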

Slide 10

Slide 10 text

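Attentions Operating over Measures
$\Gamma_\theta[X](x) := \sum_j \frac{e^{\langle Qx, Kx_j \rangle}}{\sum_\ell e^{\langle Qx, Kx_\ell \rangle}} V x_j$
The number of tokens $n$ is arbitrary, and (unmasked) attention is permutation invariant. The cloud $X$ can therefore be replaced by the measure $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$, and $\Gamma_\theta[X]$ by $\Gamma_\theta[\mu]$:
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} d\mu(y')} V y \, d\mu(y)$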

Slide 11

Slide 11 text

Attentions Operating over Measures
Attention layers: on point clouds, $X \mapsto \{\Gamma_\theta[X](x_i)\}_{i=1}^n$; on measures, $\mu \mapsto \Gamma_\theta[\mu]_\sharp \mu$.
Push-forward: $\Gamma_\sharp \sum_i \delta_{x_i} := \sum_i \delta_{\Gamma(x_i)}$, and for a general measure $(\Gamma_\sharp \mu)(B) := \mu(\Gamma^{-1}(B))$.
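For a discrete measure $\mu = \sum_j w_j \delta_{y_j}$, the integral formula becomes a weighted softmax, and the push-forward moves the support points while keeping the weights. The sketch below illustrates this; the names `attention_measure` and `push_forward` and the example data are assumptions.

```python
import numpy as np

def attention_measure(points, weights, x, Q, K, V):
    """Gamma_theta[mu](x) for mu = sum_j weights[j] * delta_{points[j]}."""
    scores = (points @ K.T) @ (Q @ x)       # scores[j] = <Q x, K y_j>
    scores -= scores.max()                  # stabilise the exponentials
    w = weights * np.exp(scores)            # integrate against mu: multiply by the masses
    w /= w.sum()
    return V @ (w @ points)

def push_forward(points, weights, Q, K, V):
    """Support and weights of Gamma_theta[mu]_sharp mu: points move, masses are kept."""
    moved = np.stack([attention_measure(points, weights, x, Q, K, V) for x in points])
    return moved, weights

rng = np.random.default_rng(0)
d = 3
a, b, x = rng.normal(size=(3, d))
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))

# The cloud (a, a, b) with uniform weights and the measure 2/3 delta_a + 1/3 delta_b coincide.
cloud = attention_measure(np.stack([a, a, b]), np.full(3, 1 / 3), x, Q, K, V)
measure = attention_measure(np.stack([a, b]), np.array([2 / 3, 1 / 3]), x, Q, K, V)
moved, masses = push_forward(np.stack([a, b]), np.array([2 / 3, 1 / 3]), Q, K, V)
```

The equality of `cloud` and `measure` shows that the mapping depends on the tokens only through the measure they define.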

Slide 12

Slide 12 text

Attentions Operating over Measures
Composing layers:
$(\Gamma_\lambda \diamond \Gamma_\theta)[X] := \Gamma_\lambda[Y] \circ \Gamma_\theta[X]$ where $Y := (\Gamma_\theta[X](x_i))_i$
$(\Gamma_\lambda \diamond \Gamma_\theta)[\mu] := \Gamma_\lambda[\xi] \circ \Gamma_\theta[\mu]$ where $\xi := \Gamma_\theta[\mu]_\sharp \mu$
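The composition rule says that the second layer sees the context already transformed by the first layer. A short NumPy sketch, with hypothetical names `gamma`, `layer`, and `compose`, checks that this agrees with stacking two attention layers.

```python
import numpy as np

def gamma(X, x, theta):
    """In-context mapping Gamma_theta[X](x) with theta = (Q, K, V)."""
    Q, K, V = theta
    s = (X @ K.T) @ (Q @ x)
    w = np.exp(s - s.max())
    return V @ ((w / w.sum()) @ X)

def layer(X, theta):
    """Attention layer on a cloud: X -> (Gamma_theta[X](x_i))_i."""
    return np.stack([gamma(X, xi, theta) for xi in X])

def compose(X, x, theta, lam):
    """(Gamma_lam <> Gamma_theta)[X](x) = Gamma_lam[Y](Gamma_theta[X](x)), Y the transformed context."""
    Y = layer(X, theta)
    return gamma(Y, gamma(X, x, theta), lam)

rng = np.random.default_rng(0)
n, d = 4, 3
X = rng.normal(size=(n, d))
theta = tuple(rng.normal(size=(d, d)) for _ in range(3))
lam = tuple(rng.normal(size=(d, d)) for _ in range(3))
two_layers = layer(layer(X, theta), lam)
```

Evaluating `compose` at a token of the cloud reproduces the corresponding row of the two stacked layers.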

Slide 13

Slide 13 text

Masked Causal Attention over Measures
For NLP, architectures must be causal for next token prediction and generative modeling.
Masked attention mapping: $\Gamma_\theta[X](x_i) := \sum_{j \le i} \frac{e^{\langle Qx_i, Kx_j \rangle}}{\sum_{\ell \le i} e^{\langle Qx_i, Kx_\ell \rangle}} V x_j$
→ The mask breaks permutation invariance.

Slide 14

Slide 14 text

Masked Causal Attention over Measures
Training (simplified): next token prediction, $\min_\theta \sum_X \sum_{i=1}^{n-1} \ell(\Gamma_\theta[X](x_i), x_{i+1})$
Testing (simplified): generative model, $X \mapsto (x_1, \dots, x_i, \Gamma_\theta[X](x_i))$

Slide 15

Slide 15 text

Masked Causal Attention over Measures
Space-time lifting: each token carries a time stamp, $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, t_i)}$, and
$\Gamma_\theta[\mu](x, t) := \int \frac{1_{s \le t} \, e^{\langle Qx, Ky \rangle}}{\int 1_{s' \le t} \, e^{\langle Qx, Ky' \rangle} d\mu(y', s')} V y \, d\mu(y, s)$
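The masked mapping restricts the softmax for token $i$ to positions $j \le i$. The NumPy sketch below, with an assumed function name `masked_attention` and random data, implements this with a lower-triangular mask.

```python
import numpy as np

def masked_attention(X, Q, K, V):
    """Row i is the sum over j <= i of softmax_j(<Q x_i, K x_j>) V x_j."""
    n = X.shape[0]
    scores = (X @ Q.T) @ (X @ K.T).T                    # scores[i, j] = <Q x_i, K x_j>
    mask = np.tril(np.ones((n, n), dtype=bool))         # True where j <= i
    scores = np.where(mask, scores, -np.inf)            # future tokens get zero weight
    scores -= scores.max(axis=1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)
    return A @ X @ V.T

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))
Q, K, V = (rng.normal(size=(d, d)) for _ in range(3))
out = masked_attention(X, Q, K, V)

# Perturbing the last token must leave all earlier outputs unchanged (causality).
X_perturbed = X.copy()
X_perturbed[-1] += 1.0
out_perturbed = masked_attention(X_perturbed, Q, K, V)
```

The first token only attends to itself, so its output is $V x_1$.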

Slide 16

Slide 16 text

In-Context Mappings over Measures
Smoothness and PDEs
Arbitrary number of tokens
Arbitrary number of layers
Universality
Expressivity

Slide 17

Slide 17 text

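Optimal Transport (Wasserstein) Distance
Between two clouds $(x_i)_{i=1}^n$ and $(y_j)_{j=1}^n$, with cost $\|x_i - y_j\|^2$ and $T$ an assignment of each $x_i$ to a point $y_{T(i)}$ [Monge 1784]:
$W_2(\mu, \nu)^2 := \min_T \sum_{i=1}^n \|x_i - y_{T(i)}\|^2$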

Slide 18

Slide 18 text

Optimal Transport (Wasserstein) Distance
For measures $\mu$, $\nu$ and transport maps $T$ [Monge 1784]: $W_2(\mu, \nu)^2 = \inf_{T_\sharp \mu = \nu} \int \|x - T(x)\|^2 \, d\mu(x)$

Slide 19

Slide 19 text

Optimal Transport (Wasserstein) Distance
General measures: Kantorovitch relaxation [Kantorovitch 1942], or approximation by discrete measures.
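For two small clouds with the same number of points, the Monge problem can be solved by brute force over all assignments. The sketch below does this; the function name `w2_squared`, the uniform weights $1/n$ (so the sum is averaged), and the example data are assumptions, and the factorial cost makes this suitable for tiny $n$ only.

```python
import itertools
import numpy as np

def w2_squared(X, Y):
    """Squared W2 distance between uniform measures on the rows of X and Y (same n)."""
    n = len(X)
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)    # cost[i, j] = ||x_i - y_j||^2
    best = min(sum(cost[i, T[i]] for i in range(n))
               for T in itertools.permutations(range(n)))          # exhaustive search over T
    return best / n                                                # assumed 1/n mass per point

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
shift = np.array([1.0, -2.0])
same_cloud = w2_squared(X, X[::-1].copy())     # same points, different order
translated = w2_squared(X, X + shift)          # translation by `shift`
```

Reordering the points does not change the measure, so the distance is zero, and translating a cloud by a vector $s$ gives a squared distance $\|s\|^2$.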

Slide 20

Slide 20 text

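How Smooth is Attention?
Attention layer: $\mu \mapsto \Gamma_\theta[\mu]_\sharp \mu$, with $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} d\mu(y')} V y \, d\mu(y)$
Lipschitz regularity: $W_2(\Gamma_\theta[\mu]_\sharp \mu, \Gamma_\theta[\nu]_\sharp \nu) \le C_\theta W_2(\mu, \nu)$
Applications: understanding robustness to attacks; well-posedness of very deep transformers.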

Slide 21

Slide 21 text

How Smooth is Attention?
Theorem [Castin, Peyré, Ablin]: if $\mathrm{supp}(\mu), \mathrm{supp}(\nu) \subset B(0, R)$, then
$C_\theta \le \|V\| (1 + 3 \|K^\top Q\| R^2) e^{2 \|K^\top Q\| R^2}$
If furthermore $\mu = \frac{1}{n} \sum_i \delta_{x_i}$ and $\nu = \frac{1}{n} \sum_i \delta_{y_i}$, then
$C_\theta \le \|V\| \|K^\top Q\| R^2 \sqrt{12n + 3}$

Slide 22

Slide 22 text

How Smooth is Attention?
Extension to masked attention: replace $W_2$ by the conditional distance, which compares the conditional measures $\mu(\cdot|t)$ and $\nu(\cdot|t)$ at each time $t$:
$W_2^{\mathrm{cond}}(\mu, \nu)^2 := \int_0^1 W_2^2(\mu(\cdot|t), \nu(\cdot|t)) \, d\mu_{[0,1]}(t)$
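To get a feel for the general bound $C_\theta \le \|V\| (1 + 3\|K^\top Q\| R^2) e^{2\|K^\top Q\| R^2}$, one can simply evaluate it. The sketch below assumes that $\|\cdot\|$ is the operator (spectral) norm, which the slides do not state explicitly; the function name `lipschitz_bound` is also an assumption.

```python
import numpy as np

def lipschitz_bound(Q, K, V, R):
    """Evaluate ||V|| (1 + 3 ||K^T Q|| R^2) exp(2 ||K^T Q|| R^2) for supports in B(0, R)."""
    a = np.linalg.norm(K.T @ Q, 2)      # assumed: spectral norm of K^T Q
    v = np.linalg.norm(V, 2)            # assumed: spectral norm of V
    return v * (1 + 3 * a * R**2) * np.exp(2 * a * R**2)

I = np.eye(3)
bounds = [lipschitz_bound(I, I, I, R) for R in (0.5, 1.0, 2.0)]
```

With identity matrices and $R = 1$ the bound equals $4e^2$, and it grows exponentially in $R^2$, so it is informative mainly for measures with small support.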

Slide 23

Slide 23 text

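Infinite Depth as a Neural PDE
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} d\mu(y')} V y \, d\mu(y)$, with $\theta = (Q, K, V)$.
Layer update: $x_i(t+1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$, where $\mu(t) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i(t)}$.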

Slide 24

Slide 24 text

Infinite Depth as a Neural PDE
Infinite depth, $T \to +\infty$: coupled ODEs $\frac{dx_i}{dt}(t) = \Gamma_{\theta(t)}[\mu(t)](x_i(t))$

Slide 25

Slide 25 text

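Infinite Depth as a Neural PDE
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky\rangle}}{\int e^{\langle Qx, Ky'\rangle}\, d\mu(y')}\, Vy \, d\mu(y)$, with $\theta = (Q, K, V)$
$x_i(t + 1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$
$\mu(t) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i(t)}$
Infinite depth, $T \to +\infty$: $\frac{dx_i}{dt}(t) = \Gamma_{\theta(t)}[\mu(t)](x_i(t))$ (coupled ODEs)
→ Mean field: $\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$ (non-linear PDE). Not a Wasserstein flow :(
[Sander, Ablin, Blondel, Peyré, 2022] [Geshkovski, Letrouit, Polyanskiy, Rigollet 2023]
Michael Sander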

Slide 26

Slide 26 text

Infinite Depth as a Neural PDE
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky\rangle}}{\int e^{\langle Qx, Ky'\rangle}\, d\mu(y')}\, Vy \, d\mu(y)$, with $\theta = (Q, K, V)$
$x_i(t + 1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$
$\mu(t) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i(t)}$
Infinite depth, $T \to +\infty$: $\frac{dx_i}{dt}(t) = \Gamma_{\theta(t)}[\mu(t)](x_i(t))$ (coupled ODEs)
→ Mean field: $\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$ (non-linear PDE). Not a Wasserstein flow :(
[Sander, Ablin, Blondel, Peyré, 2022] [Geshkovski, Letrouit, Polyanskiy, Rigollet 2023]
Michael Sander
Transformer: $T_\theta[\mu_0] : x(t = 0) \mapsto x(t = 1)$, where $\dot{x} = \Gamma_\theta[\mu](x)$ and $\mu(t = 0) = \mu_0$
Training: $\min_\theta \sum_k \ell(T_\theta[\mu_k](x_k), y_k)$ (annotations on the slide: Context, Previous, Next)
« Theorem »: convergence to the global minimum if: initial loss small enough; enough heads; separated $(\mu_k)_k$.
→ Talks by Pierre Marion and Raphaël Barboni

Slide 27

Slide 27 text

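Gaussian Case and Clustering
$\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky\rangle}}{\int e^{\langle Qx, Ky'\rangle}\, d\mu(y')}\, Vy \, d\mu(y)$, with $\theta(t) = (Q(t), K(t), V(t))$
Theorem [Valérie Castin]: If $\mu(0) = \mathcal{N}(m(0), \Sigma(0))$, then $\mu(s) = \mathcal{N}(m(s), \Sigma(s))$, with
$\dot{m} = V(\mathrm{Id} + \Sigma Q^\top K)\, m$
$\dot{\Sigma} = V \Sigma Q^\top K \Sigma + \Sigma K^\top Q \Sigma V^\top$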
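
To see the covariance equation at work, here is a sketch restricted to a special case that is an assumption of this example, not of the theorem: $V = \mathrm{Id}$ and a constant symmetric matrix $A = K^\top Q$. The equation then reads $\dot{\Sigma} = 2 \Sigma A \Sigma$, and a short computation ($\frac{d}{dt} \Sigma^{-1} = -\Sigma^{-1} \dot{\Sigma} \Sigma^{-1} = -2A$) gives $\Sigma(t)^{-1} = \Sigma(0)^{-1} - 2tA$ as long as this matrix stays invertible. The code integrates the ODE with a classical Runge-Kutta scheme and compares with that closed form; all function names are hypothetical.

```python
import numpy as np

def sigma_rhs(S, A):
    """Right-hand side of dSigma/dt = 2 Sigma A Sigma (V = Id, A symmetric)."""
    return 2.0 * S @ A @ S

def integrate_sigma(S0, A, t_end, steps=2000):
    """Classical 4th-order Runge-Kutta integration of the covariance ODE."""
    S = S0.copy()
    h = t_end / steps
    for _ in range(steps):
        k1 = sigma_rhs(S, A)
        k2 = sigma_rhs(S + 0.5 * h * k1, A)
        k3 = sigma_rhs(S + 0.5 * h * k2, A)
        k4 = sigma_rhs(S + h * k3, A)
        S = S + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return S

def sigma_closed_form(S0, A, t):
    """Sigma(t) = (Sigma(0)^{-1} - 2 t A)^{-1}, valid while the inverse exists."""
    return np.linalg.inv(np.linalg.inv(S0) - 2.0 * t * A)

S0 = np.array([[2.0, 0.5], [0.5, 1.0]])
A = -np.eye(2)                      # negative definite: the covariance contracts
S_num = integrate_sigma(S0, A, t_end=1.0)
S_ref = sigma_closed_form(S0, A, t=1.0)
print(np.max(np.abs(S_num - S_ref)))
```

With $A = -\mathrm{Id}$ the eigenvalues of $\Sigma(t)$ decrease towards zero, a toy instance of the collapse of the Gaussian onto a lower-dimensional (here, trivial) support.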

Slide 28

Slide 28 text

Gaussian Case and Clustering
$\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$
$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky\rangle}}{\int e^{\langle Qx, Ky'\rangle}\, d\mu(y')}\, Vy \, d\mu(y)$, with $\theta(t) = (Q(t), K(t), V(t))$
Theorem [Valérie Castin]: If $\mu(0) = \mathcal{N}(m(0), \Sigma(0))$, then $\mu(s) = \mathcal{N}(m(s), \Sigma(s))$, with
$\dot{m} = V(\mathrm{Id} + \Sigma Q^\top K)\, m$
$\dot{\Sigma} = V \Sigma Q^\top K \Sigma + \Sigma K^\top Q \Sigma V^\top$
Theorem [Valérie Castin]: If $V(t) = \mathrm{Id}$ and $K(t)^\top Q(t)$ is symmetric, stationary points of $\Sigma(t)$ have rank less than $d/2$.
Conjecture: low-rank stationary covariances for any $K, Q, V$.
→ The attention matrix converges to low-rank.
[Geshkovski, Letrouit, Polyanskiy, Rigollet 2023] → Clustering of $\mu$ for un-normalized attention.

Slide 29

Slide 29 text

In Context Mappings over Measures
Smoothness and PDEs (arbitrary number of layers)
Universality (arbitrary number of layers)
Expressivity

Slide 30

Slide 30 text

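Universality
$\Gamma_\theta[\mu](x) := x + \sum_{h=1}^H \int \frac{e^{\langle Q_h x, K_h y\rangle}}{\int e^{\langle Q_h x, K_h y'\rangle}\, d\mu(y')}\, V_h y \, d\mu(y)$ or $\Gamma_\theta[\mu](x) := \mathrm{MLP}_\theta(x)$
Theorem [Furuya, de Hoop, Peyré]: Let $\Gamma^\star : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}^d$ be $\mathrm{Wass}_2 \times \ell^2$-continuous on a compact $\Omega \subset \mathbb{R}^d$.
For any $\varepsilon$ there exists $N$ and $(\theta_1, \ldots, \theta_N)$ such that
$\forall (\mu, x) \in \mathcal{P}(\Omega) \times \Omega, \quad |\Gamma^\star[\mu](x) - \Gamma_{\theta_N} \diamond \cdots \diamond \Gamma_{\theta_1}[\mu](x)| \le \varepsilon$
with token dimensions $\le 4d$ and $H \le d$.
Novelties: fixed dimensions, arbitrary # tokens.
Masked transformers: requires Lipschitz in time.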

Slide 31

Slide 31 text

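Universality
$\Gamma_\theta[\mu](x) := x + \sum_{h=1}^H \int \frac{e^{\langle Q_h x, K_h y\rangle}}{\int e^{\langle Q_h x, K_h y'\rangle}\, d\mu(y')}\, V_h y \, d\mu(y)$ or $\Gamma_\theta[\mu](x) := \mathrm{MLP}_\theta(x)$
Theorem [Furuya, de Hoop, Peyré]: Let $\Gamma^\star : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}^d$ be $\mathrm{Wass}_2 \times \ell^2$-continuous on a compact $\Omega \subset \mathbb{R}^d$.
For any $\varepsilon$ there exists $N$ and $(\theta_1, \ldots, \theta_N)$ such that
$\forall (\mu, x) \in \mathcal{P}(\Omega) \times \Omega, \quad |\Gamma^\star[\mu](x) - \Gamma_{\theta_N} \diamond \cdots \diamond \Gamma_{\theta_1}[\mu](x)| \le \varepsilon$
with token dimensions $\le 4d$ and $H \le d$.
Novelties: fixed dimensions, arbitrary # tokens.
Masked transformers: requires Lipschitz in time.
Previous works:
[Yun, Bhojanapalli, Singh Rawat, Reddi, Kumar, 2019] → $H = 2$, dimension $\sim$ #tokens
[Agrachev, Letrouit 2019] → abstract genericity hypothesis (Lie algebra/control)
Discrete tokens: transformers are universal Turing machines: e.g. [Elhage et al 2021]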

Slide 32

Slide 32 text

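Sketch of proof
1-D elementary block: $\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$, with $\theta := (A, b, c, u, v)$
→ First component of Attention ∘ MLP with skip connection.
Cylindrical algebra: $\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N} : (\theta_1, \ldots, \theta_N)\}$
$(\gamma_1 \odot \gamma_2)[\mu](x) := \gamma_1[\mu](x) \, \gamma_2[\mu](x)$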
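
The elementary block and the product $\odot$ can be written down directly for an empirical measure. This is only a sketch to make the definitions concrete; `make_gamma` and `product` are hypothetical names, and the context is assumed to be a uniform measure on the rows of `points`.

```python
import numpy as np

def make_gamma(A, b, c, u, v):
    """Return gamma_theta as a function (points, x) -> scalar, theta = (A, b, c, u, v)."""
    def gamma_theta(points, x):
        scores = points @ (A @ x + b)          # scores[j] = <A x + b, y_j>
        scores = scores - scores.max()         # stabilise the softmax
        w = np.exp(scores)
        w = w / w.sum()                        # attention weights over the context
        return float(x @ u + w @ (points @ v + c))
    return gamma_theta

def product(g1, g2):
    """(g1 ⊙ g2)[mu](x) := g1[mu](x) * g2[mu](x)."""
    return lambda points, x: g1(points, x) * g2(points, x)

d = 3
rng = np.random.default_rng(0)
points, x = rng.normal(size=(5, d)), rng.normal(size=d)
zero_m, zero_v = np.zeros((d, d)), np.zeros(d)

one = make_gamma(zero_m, zero_v, 1.0, zero_v, zero_v)   # constant function 1
lin = make_gamma(zero_m, zero_v, 0.0, np.ones(d), zero_v)  # x -> <x, u>
print(one(points, x), lin(points, x), product(one, lin)(points, x))
```

The two special parameter choices are the ones used later in the Stone-Weierstrass argument: the first shows that constants belong to the algebra, the second that linear forms in $x$ do.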

Slide 33

Slide 33 text

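Sketch of proof
1-D elementary block: $\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$, with $\theta := (A, b, c, u, v)$
→ First component of Attention ∘ MLP with skip connection.
Cylindrical algebra: $\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N} : (\theta_1, \ldots, \theta_N)\}$
$(\gamma_1 \odot \gamma_2)[\mu](x) := \gamma_1[\mu](x) \, \gamma_2[\mu](x)$
Proposition: any map $(\mu, x) \mapsto (\alpha_1[\mu](x), \ldots, \alpha_d[\mu](x)) \in \mathbb{R}^d$ with $\alpha_i \in \mathcal{A}$ can be uniformly approximated by a transformer with skip connections.
Proof sketch:
Use 1D dimension by dimension → requires $H = d$ heads.
Multiplications ⊙: double dimension.
Compositions ∘: double dimension.
Compositions ∘ → In-context ⋄.
Use MLPs.
Embedding dimension = 4d.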

Slide 34

Slide 34 text

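Sketch of Proof
$\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$
$\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N}\}$
Lemma: $\mathcal{A}$ is dense in continuous maps $\mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ for $\mathrm{Wass}_2 \times \ell^2$.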

Slide 35

Slide 35 text

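Sketch of Proof
$\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$
$\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N}\}$
Lemma: $\mathcal{A}$ is dense in continuous maps $\mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ for $\mathrm{Wass}_2 \times \ell^2$.
Proof: Stone-Weierstrass theorem (Karl Weierstrass, Marshall Stone).
$\mathcal{P}(\Omega) \times \Omega$ is compact.
$\gamma_\theta$ are continuous.
$A = b = u = v = 0$, $c = 1$: $\gamma_\theta[\mu] = 1$.
$\forall \theta, \ \gamma_\theta[\mu](x) = \gamma_\theta[\mu'](x') \ \overset{?}{\Longrightarrow} \ (\mu, x) = (\mu', x')$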

Slide 36

Slide 36 text

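Sketch of Proof
$\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$
$\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N}\}$
Lemma: $\mathcal{A}$ is dense in continuous maps $\mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ for $\mathrm{Wass}_2 \times \ell^2$.
Proof: Stone-Weierstrass theorem (Karl Weierstrass, Marshall Stone).
$\mathcal{P}(\Omega) \times \Omega$ is compact.
$\gamma_\theta$ are continuous.
$A = b = u = v = 0$, $c = 1$: $\gamma_\theta[\mu] = 1$.
$\forall \theta, \ \gamma_\theta[\mu](x) = \gamma_\theta[\mu'](x') \ \overset{?}{\Longrightarrow} \ (\mu, x) = (\mu', x')$
$c = v = 0$: $\langle x, u\rangle = \langle x', u\rangle$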

Slide 37

Slide 37 text

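Sketch of Proof
$\gamma_\theta[\mu](x) := \langle x, u\rangle + \int \frac{e^{\langle Ax + b, y\rangle}}{\int e^{\langle Ax + b, y'\rangle}\, d\mu(y')}\, (\langle v, y\rangle + c) \, d\mu(y)$
$\mathcal{A} := \mathrm{Span} \bigcup_N \{\gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N}\}$
Lemma: $\mathcal{A}$ is dense in continuous maps $\mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ for $\mathrm{Wass}_2 \times \ell^2$.
Proof: Stone-Weierstrass theorem (Karl Weierstrass, Marshall Stone).
$\mathcal{P}(\Omega) \times \Omega$ is compact.
$\gamma_\theta$ are continuous.
$A = b = u = v = 0$, $c = 1$: $\gamma_\theta[\mu] = 1$.
$\forall \theta, \ \gamma_\theta[\mu](x) = \gamma_\theta[\mu'](x') \ \overset{?}{\Longrightarrow} \ (\mu, x) = (\mu', x')$
$c = v = 0$: $\langle x, u\rangle = \langle x', u\rangle$
$A = c = u = 0$: $L_1(\mu)(b) = L_1(\mu')(b)$
In 1-D: $L_k(\mu)(b) := \int \frac{e^{by} y^k}{\int e^{by'}\, d\mu(y')}\, d\mu(y)$, which satisfies $L_k' = L_{k+1} - L_k L_1$.
$L_1(\mu) = L_1(\mu') \ \Rightarrow \ \forall k, \ L_k(\mu) = L_k(\mu') \ \Rightarrow \ \forall k, \ \int y^k \, d\mu(y) = \int y^k \, d\mu'(y)$
In higher dimensions: use Radon transform.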
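
The recursion $L_k' = L_{k+1} - L_k L_1$ used in the 1-D argument can be checked numerically on a small empirical measure. The sketch below reads the (garbled) slide definition as $L_k(\mu)(b) = \int e^{by} y^k \, d\mu(y) / \int e^{by'} \, d\mu(y')$, which is an assumption, and compares a centred finite difference in $b$ with the right-hand side; the function names are hypothetical.

```python
import numpy as np

def L(k, b, ys):
    """L_k(mu)(b) for mu = (1/n) sum_j delta_{ys[j]} on the real line."""
    z = b * ys
    w = np.exp(z - z.max())                 # softmax-type weights, stabilised
    return float(np.sum(w * ys**k) / np.sum(w))

def derivative_gap(k, b, ys, h=1e-5):
    """|dL_k/db - (L_{k+1} - L_k * L_1)| with a centred finite difference."""
    finite_diff = (L(k, b + h, ys) - L(k, b - h, ys)) / (2 * h)
    recursion = L(k + 1, b, ys) - L(k, b, ys) * L(1, b, ys)
    return abs(finite_diff - recursion)

ys = np.array([-1.0, 0.3, 0.7, 2.0])
for k in (1, 2, 3):
    print(k, derivative_gap(k, b=0.4, ys=ys))
```

At $b = 0$ the weights are uniform, so $L_k(\mu)(0)$ is exactly the $k$-th moment of $\mu$; this is how knowing all the $L_k$ leads to the equality of moments stated on the slide.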

Slide 38

Slide 38 text

Open Problems
[Figure, cropped on the slide: excerpt of "Figure 1", scatter plots of the local Lipschitz constant on real data (upper row) and adversarial data (lower row) for pretrained BERT models (attention layers 0 and 6) on the dataset AG NEWS; the visible caption notes a growth rate of $\sqrt{n}$ in the sequence length $n$ for adversarial data. Panel labels: Real data, Adversarial data, GPT-2; axes: $C_\theta$ against $n$.]
Smoothness: $e^R$ (mean-field), $\sqrt{n}$ (discrete), $n^{1/4}$ (practice) → bridge the gap.
Universality: Replace scalar-valued cylindrical maps by more effective functions. Toward quantitative approximation bound, leverage smoothness.
Optimisation: Understand the structure of optimal $(Q, K, V)$. Why is Adam normalization needed for training?