Amusing Abliteration

ianozsvald
November 28, 2025

Abliteration is the act of removing guardrails from an LLM. Here I show how to make Llama 3.1 'less kind and good' with questions around explosives, financial restructuring advice, rude jokes and security vulnerabilities. I'm interested in the question: whilst guardrails stop us asking 'awkward questions', what other answers are watered down such that we don't get useful responses?
Created as an outcome of my playgroup research days: https://www.linkedin.com/feed/update/urn:li:activity:7396293087674933248/


Transcript

  1. The “why”: At playgroup we talked about humour generation. I wondered if ‘abliteration’ (removing safeguards) was a good idea. It was. By [ian]@ianozsvald[.com] Ian Ozsvald
  2. Abliteration removes guardrails. This is the same underlying model, no extra information added.
  3. I can't tell you what it said! !!CENSORED!! Coarse humour! :-( Unlike the dad jokes I made at playgroup, this joke didn't appear in Google searches.
  4. Next steps: What is ‘abliteration’? LM Studio (or Ollama etc). What answers do you miss due to guardrails?