Slide 1

FidoNet Cybernetic Immortality
FidoNet and Generative AI: A New Approach to the Museumification of Historical Content Resources
Vasiliy Burov, Dmitry Soshnikov

Slide 2

[Diagram: reality → history → museum → visitors]

Slide 3

Concept

Slide 4

Idea

Slide 5

FidoNet, BBS, etc.
• Founded in 1984 by Tom Jennings
• Very popular in the 1990s
• A specific culture and communities
• A distinctive writing and quoting style

Slide 6

FidoNet popularity
• HUMOR
• HUMOR.FILTERED
• STARWARS
• ZX.SPECTRUM
• FIDONET.HISTORY

Slide 7

Museumification of traditional content
• Museums have learned to present traditional content as physical artifacts
• They also make replicas of ancient books so that visitors can leaf through them
• But this is only static content…

Slide 8

Museumification of digital content
• Sites / documents / programs
• User-generated content
• FidoNet, Usenet, and IRC are not just frozen content, but also a style and a dynamic… How can we recreate them?

Slide 9

Goal: FidoNet cybernetic immortality
• Train a language model capable of generating an infinite stream of plausible FidoNet messages
• What does this show?
  • The spirit of FidoNet is still alive
  • The idea of cybernetic immortality, some form of which is already possible
• Problems
  • Obtaining the dataset
  • Choosing and training the base language model
  • Building an exhibit

Slide 10

Dataset

Source                               Years      Size (original)      Size (cleaned)
Fido7 Usenet Archives                2013–2015  16 GB (compressed)   –
Private Archives (JAM)               2001–2004  100 MB               88 MB
English Usenet fido group archives   1997–2002  1.7 GB               0.8 GB
ExecPC BBS Archives (en)             1997–1999  500 MB               500 MB

• Datasets are very difficult to find, because of the variety of storage media in use at the time
• Google's Usenet archives cannot be scraped
• There is no single point of aggregation (separate echoes lived on different BBS systems / backbones)
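Before training, the scattered archives still have to be normalized into a single corpus. Below is a minimal cleaning sketch in Python; the slides do not describe the actual pipeline, so the directory layout, the CP866 encoding (common for Russian echoes), the message separator, and the length filter are all assumptions for illustration:

import pathlib

def clean_message(raw: str) -> str:
    """Drop FidoNet kludge lines (they start with ^A, i.e. 0x01) but keep
    everything that carries the FidoNet 'voice': headers, quote prefixes
    like JS>, tearlines, and origin lines."""
    kept = [line.rstrip() for line in raw.splitlines()
            if not line.startswith("\x01")]
    return "\n".join(kept).strip()

corpus = []
for path in pathlib.Path("archives").rglob("*.txt"):          # hypothetical layout
    text = path.read_text(encoding="cp866", errors="ignore")  # assumed encoding
    for raw in text.split("\n\n\n"):                          # assumed separator
        msg = clean_message(raw)
        if len(msg) > 100:                                    # drop near-empty fragments
            corpus.append(msg)

# GPT-style models use <|endoftext|> as a natural document boundary
pathlib.Path("fido_corpus.txt").write_text(
    "\n<|endoftext|>\n".join(corpus), encoding="utf-8")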

Slide 11

Base language model selection

Model      Size                 Comment
LSTM       100K – 1M            Training from scratch
GPT-2      124M – 774M – 1.5B
ruGPT-3    117M – 760M – 1.3B
ruGPT-3.5  13B
LLaMA      7B+

• 12 hours of training per epoch on an NVIDIA A100 80 GB GPU
• The choice is a compromise between training time and the dataset size required to avoid overfitting
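A minimal fine-tuning sketch with the Hugging Face transformers library; the slide only lists the candidate models, so the base checkpoint, hyperparameters, and file names below are illustrative assumptions rather than the authors' exact setup:

from transformers import (AutoTokenizer, AutoModelForCausalLM, TextDataset,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "sberbank-ai/rugpt3medium_based_on_gpt2"   # one of the listed ruGPT-3 sizes

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# TextDataset simply chunks the corpus file into fixed-size token blocks
train = TextDataset(tokenizer=tokenizer, file_path="fido_corpus.txt", block_size=512)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(output_dir="fido-gpt",
                         num_train_epochs=1,            # one epoch ≈ 12 h on an A100
                         per_device_train_batch_size=4,
                         fp16=True,
                         save_steps=5000, logging_steps=500)

Trainer(model=model, args=args, data_collator=collator, train_dataset=train).train()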

Slide 12

Results

Topic: UFO
From: RON TAYLOR
To: JACK SARGEANT
Subj: UFOs

JS> RT> You are certainly entitled to your opinion as an engineer or scientist, to
JS> RT> the extent that you're entitled to an opinion with which you are
JS> RT> absolutely convinced. However, if you are a skeptic, you shouldnt
JS> RT> make your point clearly and firmly, not to "sell your ideas to me".
JS>If you think of a UFO as merely a "flying disk" without an ET engineer behind the
JS>scenes, you are free to believe that. I believe that the UFO is
JS>real, but I am not content to just speculate about its nature.

Because I'm not a skeptic and there are other people in this conference that are skeptic's for the most part...

-Ron

* QMPro 1.02 42-7029 * Why are there SO many atheists? Because God lets them.
--- WILDMAIL!/WC v4.12
* Origin: CrimeBytes:Take A MegaByte Out Of Crime! (305)592-9831 (1:135/5.0)

https://huggingface.com/estonto/fido-gpt

• The generated text is not present in the training dataset
• The quoting style is reproduced correctly (including name abbreviations)
• Names are often present in the training dataset => overfitting on names due to the small dataset size/variability
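The published checkpoint (linked above as estonto/fido-gpt) can be sampled directly with the transformers pipeline; the prompt format and sampling parameters below are illustrative guesses, not documented settings:

from transformers import pipeline

fido = pipeline("text-generation", model="estonto/fido-gpt")
out = fido("Topic: UFO\nFrom: ",                 # seed the FidoNet header format
           max_new_tokens=300, do_sample=True, top_p=0.95, temperature=0.9)
print(out[0]["generated_text"])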

Slide 13

Implementation
[Architecture diagram: client web apps and a messenger app talk to a cloud GPU server; messages are pre-generated]
http://soshnikov.com/art/fidoci
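A minimal sketch of the pre-generation idea from the diagram: the GPU server keeps a buffer of ready messages so exhibit clients never wait on the model. The Flask endpoint, queue size, and prompt are assumptions for illustration:

import queue, threading
from flask import Flask, jsonify
from transformers import pipeline

fido = pipeline("text-generation", model="estonto/fido-gpt")
buffer: "queue.Queue[str]" = queue.Queue(maxsize=100)

def pregenerate():
    # Runs in the background; put() blocks once the buffer is full,
    # so the GPU only works while clients are draining messages.
    while True:
        text = fido("Topic: ", max_new_tokens=300,
                    do_sample=True)[0]["generated_text"]
        buffer.put(text)

threading.Thread(target=pregenerate, daemon=True).start()

app = Flask(__name__)

@app.route("/message")
def message():
    return jsonify(text=buffer.get())   # hand out one pre-generated message

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)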

Slide 14

Further work
• An alternative approach: generating conversations as a dialogue between conversational models with different personalities (see the sketch below)
• Training models for other languages
• Implementing user interaction through chatbots
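The dialogue idea could be prototyped even with a single model playing two personas via prompting (the slide does not say how the personalities would be realized; separately fine-tuned models are another option). A sketch, reusing the names from the sample message above:

from transformers import pipeline

fido = pipeline("text-generation", model="estonto/fido-gpt")
personas = ["From: RON TAYLOR\n", "From: JACK SARGEANT\n"]

history = "Topic: UFO\n"
for turn in range(4):
    prompt = history + personas[turn % 2]          # alternate the speakers
    full = fido(prompt, max_new_tokens=120, do_sample=True,
                top_p=0.95)[0]["generated_text"]
    history = prompt + full[len(prompt):].strip() + "\n\n"
print(history)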

Slide 15

Questions?
• Vasily Burov ([email protected])
• Dmitry Soshnikov ([email protected], http://soshnikov.com)