PT2TT - Method

- Given a dataset $D = \{(x_i, y_i)\}_{i=1}^{N}$, we adopt a preprocessor function $f_{\tau_1}, f_{\tau_2}, \ldots, f_{\tau_K}$ for each task $\tau_1, \tau_2, \ldots, \tau_K$.
- Given raw input text $x$ and the preprocessor function $f_{\tau}$, the preprocessed input is $x^{\tau}_{\text{preproc}} = f_{\tau}(x)$.
- We define a vector of soft prompt tokens (input prompt tokens), denoted $\mathcal{P}^{\tau}_{\text{input}} = [\boldsymbol{p}^{\tau}_1, \ldots, \boldsymbol{p}^{\tau}_m] \in \mathbb{R}^{m \times d}$.
- The pre-trained LM then receives the input embedding $[\mathcal{P}^{\tau}_{\text{input}} \,;\, \mathrm{emb}(x)]$.
- We learn the $\mathcal{P}^{\tau}_{\text{input}}$ that achieves the smallest KL loss, as follows (a code sketch of this objective follows the list):

$$\min_{\mathcal{P}^{\tau}_{\text{input}}} \; \mathbb{E}_{\boldsymbol{x} \sim D} \, \mathrm{KL}\!\left( P_{LM}\!\left(\boldsymbol{y} \mid \boldsymbol{x}^{\tau}_{\text{preproc}}\right) \,\middle\|\, P_{LM}\!\left(\boldsymbol{y} \mid \mathcal{P}^{\tau}_{\text{input}}, \boldsymbol{x}\right) \right) \quad (1)$$

- In this formulation, $P_{LM}$ denotes the likelihood determined by the pre-trained LM.
- Optimizing Eq. (1), we derive $\mathcal{P}^{\tau_1}_{\text{input}}, \mathcal{P}^{\tau_2}_{\text{input}}, \ldots, \mathcal{P}^{\tau_K}_{\text{input}}$ for all pre-training tasks.
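A minimal PyTorch sketch of the objective in Eq. (1), assuming a frozen Hugging Face causal LM as $P_{LM}$. The backbone `gpt2`, the prompt length `m = 20`, and the helper names `soft_prompt`, `f_tau`, and `distill_step` are illustrative assumptions, not from the original; for brevity the KL is matched only on the next-token distribution rather than the full output likelihood.

```python
# Sketch: learn P_input^tau by distilling P_LM(y | f_tau(x)) into
# P_LM(y | [P_input^tau ; emb(x)]), with the LM itself frozen.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed backbone; the paper's LM may differ
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():  # the pre-trained LM stays frozen
    p.requires_grad_(False)

d = model.config.hidden_size
m = 20  # number of soft prompt tokens (assumed)
# P_input^tau in R^{m x d}: the only trainable parameters
soft_prompt = torch.nn.Parameter(torch.randn(m, d) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

def distill_step(x_raw: str, f_tau) -> float:
    """One optimization step of Eq. (1) on a single raw example."""
    # Teacher: LM conditioned on the preprocessed text f_tau(x)
    teacher_ids = tokenizer(f_tau(x_raw), return_tensors="pt").input_ids
    with torch.no_grad():
        teacher_logits = model(teacher_ids).logits[:, -1, :]

    # Student: LM conditioned on [P_input^tau ; emb(x)]
    student_ids = tokenizer(x_raw, return_tensors="pt").input_ids
    x_emb = model.get_input_embeddings()(student_ids)          # (1, L, d)
    inputs = torch.cat([soft_prompt.unsqueeze(0), x_emb], dim=1)
    student_logits = model(inputs_embeds=inputs).logits[:, -1, :]

    # KL( teacher || student ), matching Eq. (1)'s argument order
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    loss_val = loss.item()
    optimizer.step()
    return loss_val
```

In this sketch `f_tau` stands for the task preprocessor $f_{\tau}$, e.g. `lambda x: "summarize: " + x` for a hypothetical summarization task; running `distill_step` over samples from $D$ for each task yields one learned soft prompt per pre-training task.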