Amusing Abliteration

November 28, 2025


Abliteration of an LLM is the act of removing its guardrails; here I show how to make Llama 3.1 'less kind and good' when answering questions about explosives, financial restructuring advice, rude jokes and security vulnerabilities. I'm interested in the question: whilst guardrails stop us asking 'awkward questions', what other answers are watered down so that we don't get useful responses?
Created as an outcome of my playgroup research days: https://www.linkedin.com/feed/update/urn:li:activity:7396293087674933248/

ianozsvald


Transcript

Amusing Abliterations PyDataLondon 2025-12 lightning talk @IanOzsvald – ianozsvald.com
The "why": at playgroup we talked about humour generation, and I wondered if 'abliteration' (removing safeguards) was a good idea. It was.
By [ian]@ianozsvald[.com] Ian Ozsvald
Guardrails prevent naughty stuff
Abliteration removes guardrails. This is the same underlying model; no extra information is added.
System exploits too
Coarse humour? These safe jokes already appear in Google, e.g. on reddit/r/DadJokes
Coarse humour! I can't tell you what it said! !!CENSORED!! :-( Unlike the dad jokes I made at playgroup, this joke didn't appear in Google searches
Private equity: which is abliterated?
What is ‘abliteration’? LMStudio (/ollamma etc) What answers do you
miss due to guardrails? Next steps: By [ian]@ianozsvald[.com] Ian Ozsvald