Transformers are Universal in Context Learners

Gabriel Peyré
September 10, 2024

Transcript

  1. Transformers are Universal in Context Learners

    Gabriel Peyré, École Normale Supérieure
    Takashi Furuya, Maarten de Hoop, Valérie Castin, Pierre Ablin
  3. Transformers and attention mechanism

    Pipeline: text → tokenize → token encoding + positional encoding → point cloud $\{x_i\}_i = (x_1, x_2, \ldots)$ → $N \times$ (Attention, Norm, MLP) → Classif → next token probabilities.

    Example input text (translated from French): "The Lycée Marcelin Berthelot, located on the tourist route of the 'loop of the Marne', is known to everyone who has visited the surroundings of Paris. 'Ah, it's that huge modern building,' people say."

    (Unmasked) attention layer:

    $$\tilde{x}_i := \sum_j \frac{e^{\langle Q x_i, K x_j \rangle}}{\sum_\ell e^{\langle Q x_i, K x_\ell \rangle}} V x_j$$

    Arbitrary number of tokens. Arbitrary number of layers. Expressivity. Understanding.
  4. Outline

    In Context Mappings over Measures. Smoothness and PDEs. Arbitrary number of tokens. Arbitrary number of layers. Universality. Expressivity.
  9. Attention as In-context Mapping

    Point cloud: $X := \{x_i\}_{i=1}^n$, parameters $\theta := (Q, K, V)$.

    In-context mapping:

    $$\Gamma_\theta[X](x) := \sum_j \frac{e^{\langle Q x, K x_j \rangle}}{\sum_\ell e^{\langle Q x, K x_\ell \rangle}} V x_j$$

    Single-head attention layer: $X \mapsto \{\Gamma_\theta[X](x_i)\}_{i=1}^n$

    Multi-head attention layer, with heads $\theta_h = (K_h, Q_h, V_h)$ and output matrices $W_h$: $X \mapsto \{\sum_{h=1}^H W_h \Gamma_{\theta_h}[X](x_i)\}_{i=1}^n$

    Context-free layers: $X \mapsto \{\Gamma_\theta(x_i)\}_{i=1}^n$

    Multi-layer perceptron: $\Gamma_\theta(x) := x + \theta_1 \mathrm{ReLU}(\theta_2 x)$

    Layer norm: $\Gamma_\theta(x) := \theta_1 \odot \frac{x}{\|x\|} + \theta_2$

    Transformer ≡ composition of in-context and context-free layers.
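The layers on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`matvec`, `attention`, `mlp`, `layer_norm`, `transformer_block`) and the toy matrices are assumptions made for the example and do not come from the talk.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(X, x, Q, K, V):
    """In-context mapping Gamma_theta[X](x): softmax-weighted average of V x_j."""
    q = matvec(Q, x)
    scores = [dot(q, matvec(K, xj)) for xj in X]
    top = max(scores)  # subtracting the max does not change the softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    values = [matvec(V, xj) for xj in X]
    dim = len(values[0])
    return [sum(w * val[d] for w, val in zip(weights, values)) / total
            for d in range(dim)]

def mlp(x, theta1, theta2):
    """Context-free layer x + theta1 ReLU(theta2 x)."""
    hidden = [max(0.0, h) for h in matvec(theta2, x)]
    return [a + b for a, b in zip(x, matvec(theta1, hidden))]

def layer_norm(x, theta1, theta2):
    """Context-free layer theta1 * x / ||x|| + theta2 (elementwise product)."""
    norm = math.sqrt(dot(x, x))
    return [g * xi / norm + b for g, xi, b in zip(theta1, x, theta2)]

def transformer_block(X, Q, K, V, theta1, theta2):
    """Single-head attention layer, then the MLP applied token by token."""
    attended = [attention(X, x, Q, K, V) for x in X]
    return [mlp(y, theta1, theta2) for y in attended]

# Toy parameters (hypothetical): Q = 0 makes every attention weight uniform.
I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = transformer_block(X, Z2, I2, I2, I2, I2)
```

With `Q = 0` the attention output is the plain mean of the tokens, which makes the result easy to check by hand.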
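  12. Attentions Operating over Measures

    The number $n$ of tokens is arbitrary. (Unmasked) attention is permutation invariant.

    $$\Gamma_\theta[X](x) := \sum_j \frac{e^{\langle Q x, K x_j \rangle}}{\sum_\ell e^{\langle Q x, K x_\ell \rangle}} V x_j$$

    With the empirical measure $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$:

    $$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Q x, K y \rangle}}{\int e^{\langle Q x, K y' \rangle} d\mu(y')} V y \, d\mu(y)$$

    Attention layers: $X \mapsto \{\Gamma_\theta[X](x_i)\}_{i=1}^n$ becomes $\mu \mapsto \Gamma_\theta[\mu]_\sharp \mu$.

    Push-forward: $\Gamma_\sharp \sum_i \delta_{x_i} := \sum_i \delta_{\Gamma(x_i)}$, and in general $(\Gamma_\sharp \mu)(B) := \mu(\Gamma^{-1}(B))$.

    Composing layers:

    $(\Gamma_\lambda \diamond \Gamma_\theta)[X] := \Gamma_\lambda[Y] \circ \Gamma_\theta[X]$ where $Y := (\Gamma_\theta[X](x_i))_i$

    $(\Gamma_\lambda \diamond \Gamma_\theta)[\mu] := \Gamma_\lambda[\xi] \circ \Gamma_\theta[\mu]$ where $\xi := \Gamma_\theta[\mu]_\sharp \mu$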
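A minimal sketch of attention acting on a discrete measure, written for one-dimensional tokens so that scalars `q`, `k`, `v` stand in for the matrices Q, K, V. The function names and the toy data are assumptions for illustration. The example checks that the mapping depends only on the measure: permuting or duplicating atoms (while keeping the same total mass per location) leaves it unchanged.

```python
import math

def attention_measure(points, weights, x, q, k, v):
    """Gamma_theta[mu](x) for mu = sum_i weights[i] * delta_{points[i]} (1-D)."""
    kernel = [w * math.exp(q * x * k * y) for y, w in zip(points, weights)]
    total = sum(kernel)
    return sum(c * v * y for c, y in zip(kernel, points)) / total

def push_forward(points, weights, gamma):
    """Push-forward of a discrete measure: move each atom, keep its mass."""
    return [gamma(y) for y in points], list(weights)

def attention_layer(points, weights, q, k, v):
    """mu -> Gamma_theta[mu]_# mu."""
    return push_forward(
        points, weights,
        lambda x: attention_measure(points, weights, x, q, k, v))

X = [0.0, 1.0, 2.0]
uniform = [1.0 / 3] * 3
q, k, v = 0.5, 1.0, 1.0

# One layer, then a second layer applied to the pushed-forward measure xi.
new_points, new_weights = attention_layer(X, uniform, q, k, v)
two_layers = attention_layer(new_points, new_weights, q, k, v)

# The same measure written with permuted and duplicated atoms.
X_dup = [2.0, 0.0, 1.0, 1.0, 2.0, 0.0]
uniform_dup = [1.0 / 6] * 6
ref = attention_measure(X, uniform, 0.7, q, k, v)
same = attention_measure(X_dup, uniform_dup, 0.7, q, k, v)
```

Here `ref` and `same` agree up to floating-point rounding, which is the measure-theoretic reading of permutation invariance.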
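  15. Masked Causal Attention over Measures

    For NLP: architectures must be causal for next token prediction & generative modeling.

    Masked attention mapping:

    $$\Gamma_\theta[X](x_i) := \sum_{j \le i} \frac{e^{\langle Q x_i, K x_j \rangle}}{\sum_{\ell \le i} e^{\langle Q x_i, K x_\ell \rangle}} V x_j$$

    → breaks permutation invariance.

    Training: next token prediction (simplified…)

    $$\min_\theta \sum_X \sum_{i=1}^{n-1} \ell(\Gamma_\theta[X](x_i), x_{i+1})$$

    Testing: generative model (simplified…)

    $$X \mapsto (x_1, \ldots, x_i, \Gamma[X](x_i))$$

    Space-time lifting: $\mu = \frac{1}{n} \sum_{i=1}^n \delta_{(x_i, t_i)}$

    $$\Gamma_\theta[\mu](x, t) := \int \frac{\mathbb{1}_{s \le t} \, e^{\langle Q x, K y \rangle}}{\int \mathbb{1}_{s' \le t} \, e^{\langle Q x, K y' \rangle} d\mu(y', s')} V y \, d\mu(y, s)$$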
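The masked mapping can be illustrated with a short sketch for one-dimensional tokens, where scalars `q`, `k`, `v` replace the matrices. The function names, the toy sequence and the single generation step are assumptions for illustration, not the talk's implementation. The example checks causality (earlier outputs ignore later tokens) and shows that swapping two tokens does not simply swap the outputs.

```python
import math

def masked_attention(X, i, q, k, v):
    """Gamma_theta[X](x_i) with both sums restricted to j <= i (0-based i)."""
    scores = [math.exp(q * X[i] * k * X[j]) for j in range(i + 1)]
    total = sum(scores)
    return sum(s * v * X[j] for j, s in enumerate(scores)) / total

def masked_layer(X, q, k, v):
    return [masked_attention(X, i, q, k, v) for i in range(len(X))]

def generate_step(X, q, k, v):
    """Simplified generation: append Gamma[X](x_i) for the last token."""
    return X + [masked_attention(X, len(X) - 1, q, k, v)]

q, k, v = 1.0, 0.5, 1.0
X = [0.2, -1.0, 0.7, 1.5]
out = masked_layer(X, q, k, v)

# Causality: changing the last token leaves the earlier outputs untouched.
X_changed = X[:3] + [-4.0]
out_changed = masked_layer(X_changed, q, k, v)

# Permutation invariance is lost: swap the first two tokens and compare.
X_swapped = [X[1], X[0]] + X[2:]
out_swapped = masked_layer(X_swapped, q, k, v)

longer = generate_step(X, q, k, v)
```

After the swap, the token `-1.0` sits in first position and only attends to itself, so its output differs from the one it had in second position.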
  16. Outline

    In Context Mappings over Measures. Smoothness and PDEs. Arbitrary number of tokens. Arbitrary number of layers. Universality. Expressivity.
  19. Optimal Transport (Wasserstein) Distance

    Discrete case, with pairwise costs $\|x_i - y_j\|^2$ and an assignment $T$ (Monge 1784):

    $$W_2(\mu, \nu)^2 := \min_T \sum_{i=1}^n \|x_i - y_{T(i)}\|^2 = \inf_{T_\sharp \mu = \nu} \int \|x - T(x)\|^2 d\mu(x)$$

    General measures: Kantorovich relaxation (Kantorovich 1942), or approximation by discrete measures.
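For very small point clouds the discrete formula can be evaluated by brute force over all assignments. The sketch below assumes that `T` ranges over permutations and keeps the slide's normalization (no `1/n` factor); the function name and the sample points are made up for the example. The cost grows like `n!`, so this is only meant to make the definition concrete.

```python
from itertools import permutations

def w2_squared(xs, ys):
    """min over permutations T of sum_i |x_i - y_T(i)|^2 for points in R^d."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    n = len(xs)
    return min(sum(sq_dist(xs[i], ys[T[i]]) for i in range(n))
               for T in permutations(range(n)))

xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
ys = [(1.0, 1.0), (0.0, 0.1), (1.0, 0.2)]
cost = w2_squared(xs, ys)       # optimal matching: x0->y1, x1->y2, x2->y0
zero = w2_squared(xs, xs)       # a cloud is at distance 0 from itself
```

Practical solvers use linear programming or assignment algorithms instead of enumerating permutations.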
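  22. How Smooth is Attention?

    Attention layer: $\mu \mapsto \Gamma_\theta[\mu]_\sharp \mu$, with

    $$\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Q x, K y \rangle}}{\int e^{\langle Q x, K y' \rangle} d\mu(y')} V y \, d\mu(y)$$

    Lipschitz regularity:

    $$W_2(\Gamma_\theta[\mu]_\sharp \mu, \Gamma_\theta[\nu]_\sharp \nu) \le C_\theta W_2(\mu, \nu)$$

    Applications: understanding robustness to attacks; well-posedness of very deep transformers.

    Theorem [Castin, Peyré, Ablin]: If $\mathrm{supp}(\mu), \mathrm{supp}(\nu) \subset B(0, R)$,

    $$C_\theta \le \|V\| (1 + 3 \|K^\top Q\| R^2) e^{2 \|K^\top Q\| R^2}$$

    If furthermore $\mu = \frac{1}{n} \sum_i \delta_{x_i}$ and $\nu = \frac{1}{n} \sum_i \delta_{y_i}$,

    $$C_\theta \le \|V\| \|K^\top Q\| R^2 \sqrt{12 n + 3}$$

    Extension to masked attention: use

    $$W_2^{\mathrm{cond}}(\mu, \nu)^2 := \int_0^1 W_2^2(\mu(\cdot | t), \nu(\cdot | t)) \, d\mu_{[0,1]}(t)$$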
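To get a feel for how fast the general bound grows with the support radius, it can be evaluated numerically. The sketch below only implements the first bound of the theorem, taking the operator norms as plain numbers; the function name and the sample values are assumptions for illustration.

```python
import math

def lipschitz_bound(norm_V, norm_KtQ, R):
    """Upper bound ||V|| (1 + 3 ||K^T Q|| R^2) exp(2 ||K^T Q|| R^2) on C_theta."""
    a = norm_KtQ * R ** 2
    return norm_V * (1.0 + 3.0 * a) * math.exp(2.0 * a)

# Hypothetical norms ||V|| = ||K^T Q|| = 1 and a few support radii.
radii = (0.0, 0.5, 1.0, 2.0)
bounds = {R: lipschitz_bound(1.0, 1.0, R) for R in radii}
```

The bound equals `||V||` at `R = 0` and grows exponentially in `R^2`, which is why the support assumption matters.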
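  26. Infinite Depth as a Neural PDE

    Attention layer: $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} \, d\mu(y')} \, V y \, d\mu(y)$, with $\theta = (Q, K, V)$.
    Residual layers: $x_i(t + 1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$, where $\mu(t) = \frac{1}{n} \sum_{i=1}^n \delta_{x_i(t)}$.
    Infinite depth ($T \to +\infty$), coupled ODEs: $\frac{dx_i}{dt}(t) = \Gamma_{\theta(t)}[\mu(t)](x_i(t))$.
    Mean field, non-linear PDE: $\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$. Not a Wasserstein flow. [Sander, Ablin, Blondel, Peyré, 2022] [Geshkovski, Letrouit, Polyanskiy, Rigollet 2023] (Photo: Michael Sander.)
    Transformer: $T_\theta[\mu_0] : x(t = 0) \mapsto x(t = 1)$, where $\dot{x} = \Gamma_\theta[\mu](x)$ and $\mu(t = 0) = \mu_0$.
    Training: $\min_\theta \sum_k \ell(T_\theta[\mu_k](x_k), y_k)$. Slide labels: Context, Previous, Next.
    « Theorem »: convergence to the global minimum if the initial loss is small enough, there are enough heads, and the contexts $(\mu_k)_k$ are separated. → Talks by Pierre Marion and Raphaël Barboni.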
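
The residual update $x_i(t+1) = x_i(t) + \frac{1}{T} \Gamma_{\theta(t)}[\mu(t)](x_i(t))$ is an explicit Euler step for the coupled ODEs. The following is a minimal sketch, assuming 1-D tokens, scalar parameters `q`, `k`, `v`, and the same parameters at every layer (the slides allow a layer-dependent $\theta(t)$); the names `gamma` and `transformer_flow` are illustrative and not from the talk.

```python
import math

def gamma(q, k, v, xs, x):
    """Scalar attention Gamma_theta[mu](x) with mu the empirical measure of xs."""
    scores = [q * x * k * y for y in xs]
    shift = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return v * sum(w * y for w, y in zip(weights, xs)) / total

def transformer_flow(q, k, v, xs, depth):
    """Apply `depth` residual attention layers with step size 1/depth."""
    xs = list(xs)
    for _ in range(depth):
        # Every token is updated from the same measure mu(t) (simultaneous update).
        xs = [x + gamma(q, k, v, xs, x) / depth for x in xs]
    return xs

# Hypothetical example: with q = 0 the attention weights are uniform, so every
# token is shifted by v times the current mean and token spacing is preserved.
print(transformer_flow(0.0, 1.0, 1.0, [0.0, 2.0], depth=4))
```

Increasing `depth` refines the time discretization of the flow on $[0, 1]$, which is the sense in which $T \to +\infty$ gives the ODE system.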
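  28. Gaussian Case and Clustering

    Setting: $\frac{d\mu}{dt} + \mathrm{div}(\mu \, \Gamma_\theta[\mu]) = 0$, with $\Gamma_\theta[\mu](x) := \int \frac{e^{\langle Qx, Ky \rangle}}{\int e^{\langle Qx, Ky' \rangle} \, d\mu(y')} \, V y \, d\mu(y)$ and $\theta(t) = (Q(t), K(t), V(t))$.
    Theorem [Valérie Castin]: If $\mu(0) = \mathcal{N}(m(0), \Sigma(0))$, then $\mu(s) = \mathcal{N}(m(s), \Sigma(s))$, with
    $\dot{m} = V (\mathrm{Id} + \Sigma Q^\top K) m$ and $\dot{\Sigma} = V \Sigma Q^\top K \Sigma + \Sigma K^\top Q \Sigma V^\top$.
    Theorem [Valérie Castin]: If $V(t) = \mathrm{Id}$ and $K(t)^\top Q(t)$ is symmetric, stationary points of $\Sigma(t)$ have rank less than $d/2$.
    Conjecture: low-rank stationary covariances for any $K, Q, V$.
    [Geshkovski, Letrouit, Polyanskiy, Rigollet 2023]: → The attention matrix converges to low rank. → Clustering of $\mu$ for un-normalized attention.
    Figure labels: $\mu(0)$, $\mu(t)$, $\mu(\infty)$ along the time axis $t$.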
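
The covariance equation can be solved by hand in the simplest case. The following worked example assumes $d = 1$ and constant scalar parameters $q, k, v$ (an assumption made here for illustration, not a case treated on the slide).

```latex
% Assumption (not on the slide): d = 1, constant scalars q, k, v.
\dot{m} = v \, (1 + \Sigma q k) \, m,
\qquad
\dot{\Sigma} = 2 \, v q k \, \Sigma^2 .

% Separable ODE: \frac{d}{dt}\left(\frac{1}{\Sigma}\right) = -\frac{\dot{\Sigma}}{\Sigma^2} = -2 v q k, hence
\Sigma(t) = \frac{\Sigma(0)}{1 - 2 \, v q k \, \Sigma(0) \, t} .
```

Under these assumptions, if $vqk < 0$ the variance decays to $0$ like $1/(2|vqk| t)$, so the Gaussian collapses to a point mass, and the only stationary covariance is $\Sigma = 0$. If $vqk > 0$ the variance blows up at the finite time $t^\star = 1/(2 v q k \, \Sigma(0))$.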
  29. In-Context Mappings over Measures

    Smoothness and PDEs. Universality. Slide labels: "Arbitrary number of layers" (shown twice), "Expressivity".
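  31. Universality

    Layers: $\Gamma_\theta[\mu](x) := x + \sum_{h=1}^H \int \frac{e^{\langle Q_h x, K_h y \rangle}}{\int e^{\langle Q_h x, K_h y' \rangle} \, d\mu(y')} \, V_h y \, d\mu(y)$ or $\Gamma_\theta[\mu](x) := \mathrm{MLP}_\theta(x)$.
    Theorem [Furuya, de Hoop, Peyré]: Let $\Gamma^\star : \mathcal{P}(\Omega) \times \Omega \to \mathbb{R}^d$ be $\mathrm{Wass}_2 \times \ell^2$-continuous on a compact $\Omega \subset \mathbb{R}^d$. For any $\varepsilon$ there exist $N$ and $(\theta_1, \ldots, \theta_N)$ such that
    $\forall (\mu, x) \in \mathcal{P}(\Omega) \times \Omega, \quad |\Gamma^\star[\mu](x) - \Gamma_{\theta_N} \diamond \cdots \diamond \Gamma_{\theta_1}[\mu](x)| \le \varepsilon$,
    with token dimensions $\le 4d$ and $H \le d$.
    Novelties: fixed dimensions, arbitrary number of tokens. Masked transformers: requires Lipschitz in time.
    Previous works:
    [Yun, Bhojanapalli, Singh Rawat, Reddi, Kumar, 2019] → $H = 2$, dimension $\sim$ #tokens.
    [Agrachev, Letrouit 2019] → abstract genericity hypothesis (Lie algebra/control).
    Discrete tokens: transformers are universal Turing machines, e.g. [Elhage et al 2021].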
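  33. Sketch of proof

    1-D elementary block: $\gamma_\theta[\mu](x) := \langle x, u \rangle + \int \frac{e^{\langle Ax + b, y \rangle}}{\int e^{\langle Ax + b, y' \rangle} \, d\mu(y')} (\langle v, y \rangle + c) \, d\mu(y)$, with $\theta := (A, b, c, u, v)$.
    → First component of Attention ∘ MLP with skip connection.
    Cylindrical algebra: $\mathcal{A} := \mathrm{Span} \bigcup_N \{ \gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N} : (\theta_1, \ldots, \theta_N) \}$, where $(\gamma_1 \odot \gamma_2)[\mu](x) := \gamma_1[\mu](x) \, \gamma_2[\mu](x)$.
    Proposition: any map $(\mu, x) \mapsto (\alpha_1[\mu](x), \ldots, \alpha_d[\mu](x)) \in \mathbb{R}^d$ with $\alpha_i \in \mathcal{A}$ can be uniformly approximated by a transformer with skip connections.
    Proof sketch:
    Use the 1-D block dimension by dimension → requires $H = d$ heads.
    Multiplications ⊙: double dimension.
    Compositions ∘, in-context compositions ⋄: double dimension.
    Use MLPs.
    Embedding dimension = $4d$.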
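  37. Sketch of Proof

    Photos: Karl Weierstrass, Marshall Stone.
    Lemma: $\mathcal{A}$ is dense in continuous maps $\mathcal{P}(\Omega) \times \Omega \to \mathbb{R}$ for $\mathrm{Wass}_2 \times \ell^2$, where
    $\gamma_\theta[\mu](x) := \langle x, u \rangle + \int \frac{e^{\langle Ax + b, y \rangle}}{\int e^{\langle Ax + b, y' \rangle} \, d\mu(y')} (\langle v, y \rangle + c) \, d\mu(y)$ and $\mathcal{A} := \mathrm{Span} \bigcup_N \{ \gamma_{\theta_1} \odot \cdots \odot \gamma_{\theta_N} \}$.
    Proof: Stone-Weierstrass theorem.
    $\mathcal{P}(\Omega) \times \Omega$ is compact.
    The $\gamma_\theta$ are continuous.
    Constants: $A = b = u = v = 0$, $c = 1$ gives $\gamma_\theta[\mu] = 1$.
    Separation of points: does $\forall \theta, \ \gamma_\theta[\mu](x) = \gamma_\theta[\mu'](x')$ imply $(\mu, x) = (\mu', x')$?
    Taking $c = v = 0$: $\langle x, u \rangle = \langle x', u \rangle$.
    Taking $A = c = u = 0$: $L_1(\mu)(b) = L_1(\mu')(b)$.
    In 1-D: $L_k(\mu)(b) := \int \frac{e^{by} y^k v}{\int e^{by'} \, d\mu(y')} \, d\mu(y)$ satisfies $L_k' = L_{k+1} - L_k L_1$, so
    $L_1(\mu) = L_1(\mu') \Rightarrow \forall k, \ L_k(\mu) = L_k(\mu') \Rightarrow \forall k, \ \int y^k \, d\mu(y) = \int y^k \, d\mu'(y)$.
    In higher dimensions: use the Radon transform.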
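
The recursion $L_k' = L_{k+1} - L_k L_1$ used in the 1-D argument follows from the quotient rule. The following is a short derivation, assuming $v = 1$ and writing $N_k$ and $Z$ for the numerator and denominator (notation introduced here, not on the slide).

```latex
% Assumptions: v = 1, \mu a probability measure on a compact set.
N_k(b) := \int e^{by} y^k \, d\mu(y), \qquad
Z(b) := \int e^{by'} \, d\mu(y') = N_0(b), \qquad
L_k(\mu)(b) = \frac{N_k(b)}{Z(b)} .

% Differentiating under the integral: N_k' = N_{k+1} and Z' = N_1, so
L_k' = \frac{N_{k+1}}{Z} - \frac{N_k}{Z} \cdot \frac{N_1}{Z} = L_{k+1} - L_k L_1 .

% Induction: L_{k+1} = L_k' + L_k L_1, so L_1 determines every L_k.
% Evaluation at b = 0: Z(0) = 1, hence
L_k(\mu)(0) = \int y^k \, d\mu(y) .
```

Evaluating at $b = 0$ is what turns equality of the $L_k$ into equality of all moments, and a measure with compact support on the real line is determined by its moments.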
  38. Open Problems

    [Figure: cropped excerpt of a paper figure, caption truncated. Scatter plots of the local Lipschitz constant of self-attention for real data (upper row) and adversarial data (lower row); columns include two pretrained BERT models (attention layers 0 and 6) and the AG NEWS dataset. Reported growth rate in the sequence length $n$ for adversarial data: $\sqrt{n}$. Panel labels: Real data, Adversarial data, GPT-2; axes: $C_\theta$ against $n$.]
    Smoothness: mean-field $e^{R}$, discrete $\sqrt{n}$, practice $n^{1/4}$. Bridge the gap.
    Universality: Replace scalar-valued cylindrical maps by more effective functions. Toward quantitative approximation bounds, leveraging smoothness.
    Optimisation: Understand the structure of optimal $(Q, K, V)$. Why is Adam normalization needed for training?