Therefore, the minimization above can be rewritten as follows.

\[
\mathrm{KL}\bigl(p^*(x) \,\|\, p_\theta(x)\bigr) = \int_{-\infty}^{\infty} p^*(x) \log p^*(x)\,dx - \int_{-\infty}^{\infty} p^*(x) \log p_\theta(x)\,dx
\]

The first term is a constant (a value that does not depend on the generative model). Dropping the constant turns the minimization into a maximization, and the remaining integral is approximated using the dataset:

\[
\arg\min_\theta \mathrm{KL}\bigl(p^*(x) \,\|\, p_\theta(x)\bigr) = \arg\max_\theta \int_{-\infty}^{\infty} p^*(x) \log p_\theta(x)\,dx \simeq \arg\max_\theta \frac{1}{N} \sum_{n=1}^{N} \log p_\theta(x_n)
\]
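The dataset approximation of the KL minimization can be sketched in a few lines. This is a minimal illustration under stated assumptions, not part of the original slides: the true distribution p*(x) is taken to be a 1-D Gaussian N(2.0, 1.0), the model p_theta(x) is a Gaussian with unknown mean theta and fixed standard deviation 1.0, and the maximization is a simple grid search. The function names are hypothetical.

```python
import math
import random

def gaussian_log_pdf(x, mean, std):
    """Log density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (x - mean) ** 2 / (2.0 * std ** 2)

def average_log_likelihood(data, mean, std):
    """Dataset estimate (1/N) * sum_n log p_theta(x_n) of the integral term."""
    return sum(gaussian_log_pdf(x, mean, std) for x in data) / len(data)

# Assumed true distribution p*(x) = N(2.0, 1.0); only its samples are used below.
rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(2000)]

# Model p_theta(x) = N(theta, 1.0); search theta on a grid from -2.0 to 6.0.
candidates = [i / 10.0 for i in range(-20, 61)]
best_theta = max(candidates, key=lambda m: average_log_likelihood(data, m, 1.0))
print("theta maximizing the average log-likelihood:", best_theta)
```

The selected theta should land near the mean of the assumed true distribution, which is what minimizing the KL divergence would also give for this model family.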
• Train a neural network \epsilon_\theta to estimate the noise added to the data x_t at each time step t.

\[
L_{\mathrm{DDPM}}(x_0; \theta) \propto \mathbb{E}_{U(t),\, p(\epsilon)} \left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t \right) \right\|^2 \right]
\]

\[
\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s
\]
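The DDPM objective can be sketched as a Monte Carlo estimate. The following is a rough illustration rather than a reference implementation, and it makes several assumptions not stated on the slide: p(epsilon) is a standard normal, U(t) is uniform over {1, ..., T}, the noise schedule is linear from 1e-4 to 0.02 with T = 1000, and `zero_model` is a hypothetical untrained stand-in for the network epsilon_theta.

```python
import math
import random

def make_alpha_bars(betas):
    """alpha_bar_t = prod_{s=1..t} alpha_s, with alpha_s = 1 - beta_s."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def ddpm_loss_sample(x0, eps_model, alpha_bars, rng):
    """One Monte Carlo sample of ||eps - eps_theta(x_t, t)||^2."""
    t = rng.randrange(1, len(alpha_bars) + 1)   # t ~ U{1, ..., T} (assumed)
    a_bar = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]     # eps ~ N(0, I) (assumed)
    # Noised input: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    x_t = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e for x, e in zip(x0, eps)]
    eps_hat = eps_model(x_t, t)
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat))

def zero_model(x_t, t):
    """Hypothetical placeholder for eps_theta: always predicts zero noise."""
    return [0.0 for _ in x_t]

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed linear schedule
alpha_bars = make_alpha_bars(betas)

rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]  # toy 3-dimensional data point
losses = [ddpm_loss_sample(x0, zero_model, alpha_bars, rng) for _ in range(1000)]
mean_loss = sum(losses) / len(losses)
print("estimated loss for the placeholder model:", mean_loss)
```

Because the placeholder predicts zero, its loss is just the squared norm of the sampled noise, so the estimate sits near the dimension of x0. A trained epsilon_theta would drive this value lower.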
Example of self-attention: in the sentence "It is in this spirit that a majority of American governments have passed new laws since 2009 making the registration or voting process more difficult.", the model judges "making" to be related to "more difficult" [B1].
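How one token comes to be "related" to others can be sketched with scaled dot-product attention weights. This is a toy illustration only: the 2-D key vectors and the query for "making" are hand-picked hypothetical values, not learned parameters, and the function names are assumptions for this sketch.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical key vectors, hand-picked for illustration (not learned).
tokens = ["laws", "making", "more", "difficult"]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.1, 1.0]]
query_making = [0.0, 2.0]  # hypothetical query vector for "making"

weights = attention_weights(query_making, keys)
for token, weight in zip(tokens, weights):
    print(f"making -> {token}: {weight:.3f}")
```

With these toy vectors, the weights from "making" to "more" and "difficult" are larger than the weight to "laws", mirroring the relation described in the example.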
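BERT [C3] is an encoder pre-trained on the following two tasks:
• Masked LM: estimate the original sentence from a sentence in which tokens have been masked.
• Next Sentence Prediction: given two sentences, decide whether one is the sentence that follows the other.
Masked LM example: "The capital of France is [MASK]." → BERT → "The capital of France is paris."
Next Sentence Prediction example: "The man went to the store." / "He bought a gallon of milk." → BERT → True or False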
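The two pre-training tasks above can be sketched as data-preparation steps. This is a simplified illustration under stated assumptions, not BERT's actual pipeline: each token is masked independently with an assumed probability of 0.15, masked tokens are always replaced by "[MASK]", the third sentence is a made-up distractor, and the helper names are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, rng):
    """Masked LM: hide some tokens; targets hold the originals to recover."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)    # the model must predict this token
        else:
            inputs.append(tok)
            targets.append(None)   # no prediction needed at this position
    return inputs, targets

def make_nsp_example(sentences, index, rng):
    """Next Sentence Prediction: pair a sentence with its true successor
    (label True) or with a randomly chosen other sentence (label False)."""
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], True
    others = [s for i, s in enumerate(sentences) if i not in (index, index + 1)]
    return first, rng.choice(others), False

rng = random.Random(0)
tokens = "The capital of France is Paris .".split()
inputs, targets = mask_tokens(tokens, 0.15, rng)
print(inputs, targets)

sentences = [
    "The man went to the store.",
    "He bought a gallon of milk.",
    "Penguins are flightless birds.",  # hypothetical distractor sentence
]
first, second, is_next = make_nsp_example(sentences, 0, rng)
print(first, "|", second, "|", is_next)
```

During pre-training, the encoder would be trained to fill in the non-None targets and to predict the True/False label for each sentence pair.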
• BERT is pre-trained on the Masked Language Model task, which estimates the original tokens from masked tokens, so it is well suited to proofreading text.
Masked Language Model example: "The capital of France is [MASK]." → BERT → "The capital of France is paris."
al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
2. Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
3. Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
4. Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
All You Need (Transformer)", https://deeplearning.hatenablog.com/entry/transformer
6. "Transformer: A Novel Neural Network Architecture for Language Understanding", https://research.google/blog/transformer-a-novel-neural-network-architecture-for-language-understanding/
7. "The Illustrated Transformer", http://jalammar.github.io/illustrated-transformer/
8. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.