
FidoNet and Generative AI: A New Approach to Museumification of Historical Content Resources

Presented at the IEEE HISTELCON 2023 conference.

Dmitri Soshnikov

September 08, 2023



Transcript

  1. FIDONET: Cybernetic Immortality. FidoNet and Generative AI: A New Approach to Museumification of Historical Content Resources. Vasiliy Burov, Dmitry Soshnikov
  2. FidoNet, BBS, etc. • Founded in 1984 by Tom Jennings • Very popular in the 1990s • A specific culture and communities • A distinctive writing style and quoting conventions
  3. MUSEUMIFICATION of traditional content • Museums have learned to present traditional content as physical artifacts • and to make replicas of ancient books so that visitors can leaf through them • But this is only static content…
  4. MUSEUMIFICATION of digital content • Sites / documents / programs • User-generated content • FidoNet, Usenet, IRC are not just frozen content, but also a style and a dynamic… How can we recreate them?
  5. Goal: FidoNet cybernetic immortality • Train a language model capable of generating an infinite stream of plausible FidoNet messages • What does this show? • The spirit of FidoNet is still alive • The idea of cybernetic immortality, and that some form of it is already possible • Problems • Obtaining the dataset • Choosing and training the base language model • Building the exhibit
  6. DATASET

     Source                              Years      Size (original)     Size (cleaned)
     Fido7 Usenet Archives               2013-2015  16 GB (compressed)  -
     Private Archives (JAM)              2001-2004  100 MB              88 MB
     English Usenet fido group archives  1997-2002  1.7 GB              0.8 GB
     ExecPC BBS Archives (en)            1997-1999  500 MB              500 MB

     • Datasets are very difficult to find, because the content lived on different media at different times • Google's Usenet archives cannot be scraped • There is no single point of aggregation (separate echo conferences on different BBS systems / backbones)
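As a rough illustration of the cleaning step, the sketch below normalizes a directory of exported message dumps into a single plain-text training corpus. The directory layout, the `=== message ===` separator, and the size threshold are assumptions for illustration, not the authors' actual pipeline; only the kludge-line names (SEEN-BY, PATH) come from the FidoNet standards.

```python
from pathlib import Path

# Hypothetical layout: one exported text dump per echo conference.
SRC = Path("archives/raw")
DST = Path("archives/fido_corpus.txt")

def clean_message(text: str) -> str:
    """Drop control/kludge lines but keep 'XX>' quoting intact,
    since the quoting style is part of what the model should learn."""
    kept = []
    for line in text.splitlines():
        # FTS-0001 kludge lines start with ^A; SEEN-BY/PATH are routing noise.
        if line.startswith(("\x01", "SEEN-BY:", "PATH:")):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip()

with DST.open("w", encoding="utf-8") as out:
    for dump in sorted(SRC.glob("*.txt")):
        raw = dump.read_text(encoding="cp866", errors="replace")  # common Russian Fido encoding
        for msg in raw.split("=== message ==="):  # hypothetical dump separator
            msg = clean_message(msg)
            if len(msg) > 50:  # drop near-empty fragments
                out.write(msg + "\n<|endoftext|>\n")  # GPT-style document boundary
```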
  7. Base language model selection

     Model      Size                Comment
     LSTM       100K - 1M           Training from scratch
     GPT-2      124M / 774M / 1.5B
     ruGPT-3    117M / 760M / 1.3B
     ruGPT-3.5  13B
     LLaMA      7B+

     • ~12 hours of training per epoch on an Nvidia A100 80 GB GPU • A compromise between training time and the dataset size required to avoid overfitting
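For illustration, a minimal fine-tuning loop over the cleaned corpus, assuming the small ruGPT-3 checkpoint (`sberbank-ai/rugpt3small_based_on_gpt2`) as the base; the slide lists several candidates, and the hyperparameters below are placeholders, not the values used for the exhibit.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE = "sberbank-ai/rugpt3small_based_on_gpt2"  # assumed base model
tok = AutoTokenizer.from_pretrained(BASE)
tok.pad_token = tok.eos_token  # GPT-style tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(BASE)

ds = load_dataset("text", data_files="archives/fido_corpus.txt")["train"]
ds = ds.map(lambda b: tok(b["text"], truncation=True, max_length=512),
            batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="fido-gpt",
    num_train_epochs=1,            # the slide reports ~12 h per epoch on an A100 80 GB
    per_device_train_batch_size=4,
    fp16=True,
)
Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal LM objective
).train()
```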
  8. Results

     Topic: UFO
     From: RON TAYLOR
     To: JACK SARGEANT
     Subj: UFOs

     JS> RT> You are certainly entitled to your opinion as an engineer or scientist, to
     JS> RT> the extent that you're entitled to an opinion with which you are
     JS> RT> absolutely convinced. However, if you are a skeptic, you shouldnt
     JS> RT> make your point clearly and firmly, not to "sell your ideas to me".
     JS>If you think of a UFO as merely a "flying disk" without an ET engineer behind the
     JS>scenes, you are free to believe that. I believe that the UFO is
     JS>real, but I am not content to just speculate about its nature.

     Because I'm not a skeptic and there are other people in this conference that are
     skeptic's for the most part...

     -Ron

     * QMPro 1.02 42-7029 * Why are there SO many atheists? Because God lets them.
     --- WILDMAIL!/WC v4.12
     * Origin: CrimeBytes:Take A MegaByte Out Of Crime! (305)592-9831 (1:135/5.0)

     https://huggingface.com/estonto/fido-gpt

     • The generated text is not present in the training dataset • The quoting style is reproduced correctly (including name abbreviations) • Names, however, often do come from the training dataset => overfitting on names due to the small dataset size/variability
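The published checkpoint can be sampled directly with the standard transformers text-generation pipeline; the seed string and sampling parameters below are illustrative, not those used to produce the example above.

```python
from transformers import pipeline

gen = pipeline("text-generation", model="estonto/fido-gpt")  # checkpoint linked on the slide

out = gen(
    "Topic: UFO\nFrom: ",  # seed in the message-header style the model was trained on
    max_new_tokens=300,
    do_sample=True,
    top_p=0.95,
    temperature=0.9,
)
print(out[0]["generated_text"])
```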
  9. Implementation

     [Architecture diagram: Client Web App and Messenger App clients connect to a Cloud Server (GPU); messages are pre-generated]
     http://soshnikov.com/art/fidoci
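Pre-generation decouples the GPU server from the clients: the model fills a queue offline, and the exhibit front ends only pop messages from it. Below is a minimal sketch of the serving side; the FastAPI app, the `/message` endpoint, and the file-backed queue are hypothetical, not the exhibit's actual code.

```python
import json
from collections import deque
from pathlib import Path

from fastapi import FastAPI, HTTPException

app = FastAPI()

# Messages pre-generated offline by the GPU server, one JSON object per line.
QUEUE = deque(json.loads(line)
              for line in Path("pregen.jsonl").read_text(encoding="utf-8").splitlines())

@app.get("/message")
def next_message():
    """Return the next pre-generated FidoNet-style message."""
    if not QUEUE:
        raise HTTPException(status_code=503, detail="queue empty, regeneration pending")
    return QUEUE.popleft()
```

A client (the web app or a messenger bot) would then poll the endpoint, so spikes in visitor traffic never touch the GPU.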
  10. Further work • An alternative approach: generating conversations as a dialogue between conversational models with different personalities • Training models for other languages • Implementing user interaction through chatbots
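One way to realize the dialogue idea would be to alternate two persona contexts over the same model; a rough sketch, with the personas and sampling settings invented for illustration.

```python
from transformers import pipeline

gen = pipeline("text-generation", model="estonto/fido-gpt")

# Hypothetical personas expressed as FidoNet-style From:/To: headers.
personas = [("RON TAYLOR", "JACK SARGEANT"), ("JACK SARGEANT", "RON TAYLOR")]
history = "Topic: UFO\n"

for turn in range(4):
    frm, to = personas[turn % 2]
    prompt = history + f"From: {frm}\nTo: {to}\nSubj: UFOs\n"
    out = gen(prompt, max_new_tokens=120, do_sample=True, top_p=0.95)
    reply = out[0]["generated_text"][len(prompt):]  # keep only the newly generated part
    history = prompt + reply + "\n"
    print(f"--- {frm} ---\n{reply}\n")
```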