Learning with Deep Convolutional Generative Adversarial Networks, https://arxiv.org/abs/1511.06434
• https://thesephist.com/posts/prism/
• Reconsidering Tweets: Intervening during Tweet Creation Decreases Offensive Content, https://ojs.aaai.org/index.php/ICWSM/article/view/19308
• StyDiff: a refined style transfer method based on diffusion models, https://www.nature.com/articles/s41598-025-17899-x
• Analogies Explained: Towards Understanding Word Embeddings, https://arxiv.org/abs/1901.09813
• Effectively Steer LLM To Follow Preference via Building Confident Directions, https://arxiv.org/abs/2503.02989
• STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models, https://aclanthology.org/2025.emnlp-main.925/
• Dialz: A Python Toolkit for Steering Vectors, https://aclanthology.org/2025.acl-demo.35/
• ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection, https://aclanthology.org/2022.acl-long.234
• Proactively Reducing the Hate Intensity of Online Posts via Hate Speech Normalization, https://dl.acm.org/doi/10.1145/3534678.3539161
• https://www.emergentmind.com/topics/steering-vectors
• https://bobrupakroy.medium.com/steering-large-language-models-with-activation-vectors-a-practical-guide-45866b3697ac
• Improving Instruction-Following in Language Models through Activation Steering, https://arxiv.org/abs/2410.12877
• Language Model Alignment in Multilingual Trolley Problems, https://arxiv.org/abs/2407.02273
• ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning, https://arxiv.org/abs/2501.01031
• https://www.lesswrong.com/posts/ndyngghzFY388Dnew/implementing-activation-steering