Through world-leading AI and a global team of digital risk experts, we deliver complete Trust and Safety, protecting social brands, digital platforms, advertisers and kids from bad actors that exploit, extort, distress, offend and misinform.
Stochastic Process
Like rolling a die to decide which way to walk, Markov chains are a type of stochastic process where the next state depends only on the current state.
A Markov chain is described by its states and the probability of transitions between them. No matter how the process arrived at its present state, the possible future states are fixed. A state could be anything, e.g. Sunny / Rainy.
Markov chains are trained by indexing training data. E.g. for each word in a book, index that word and store the next word as a possible future state. Over time, after training, a single word might have a number of different possible future states. A trained model can be used to create output with similar characteristics to the training set (see the sketch below).
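As a rough illustration of that indexing step, here is a minimal first-order word model using a dictionary from each word to the words observed after it. The class and method names (FirstOrderWordModel, Learn, WalkFrom) are made up for this sketch, not taken from any library.

using System;
using System.Collections.Generic;

class FirstOrderWordModel
{
    // Maps each word to every word seen immediately after it in the training text.
    private readonly Dictionary<string, List<string>> _transitions = new Dictionary<string, List<string>>();
    private readonly Random _random = new Random();

    public void Learn(string text)
    {
        var words = text.Split(' ');
        for (int i = 0; i < words.Length - 1; i++)
        {
            if (!_transitions.TryGetValue(words[i], out var nextWords))
            {
                nextWords = new List<string>();
                _transitions[words[i]] = nextWords;
            }
            // Duplicates are kept on purpose: a word seen often after this one becomes more likely.
            nextWords.Add(words[i + 1]);
        }
    }

    public string WalkFrom(string start, int length)
    {
        var output = new List<string> { start };
        var current = start;
        for (int i = 0; i < length && _transitions.ContainsKey(current); i++)
        {
            var candidates = _transitions[current];
            current = candidates[_random.Next(candidates.Count)];
            output.Add(current);
        }
        return string.Join(" ", output);
    }
}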
A chain can also remember previous states. Remembering more than just the last state ‘strengthens’ prediction, causing predictions to fit tighter to the trained data. The number of prior states encoded in the chain is called the ‘Order’ of the model.
States are often mapped to single points in a sequence, e.g. single words in a sentence: [‘the’], [‘red’], [‘is’], [‘the’], [‘best’]. But states could hold multiple points in a sequence, e.g. multiple words in a sentence: [‘the’, ‘red’], [‘red’, ’is’], [‘is’, ’the’], [‘the’, ‘best’].
2nd Order example: the current state is ‘is this’. What word might come next? In a 2nd order chain, ‘is’ will almost certainly not be likely, but it may be very likely in a 1st order chain. Low order chains can cause ‘flip flopping’: ‘is this is this is this is this is this is this is this[…]’. Markov chains can be useful in linguistic analysis.
Take a sentence of text input as training data and split the input into words (tokenise). For every word: the state is a ‘key’ based on the word and the previous n-1 words (where n is the model Order). Store the next word in the sequence against the key (a sketch of this step follows the table below).
Training data: The quick brown fox jumps over the lazy dog
Key (state)    Value (next)
the            quick, lazy
quick          brown
brown          fox
fox            jumps
jumps          over
over           the
lazy           dog
dog            -
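A rough sketch of that key-building step for an arbitrary order n. The BuildTable name, the lower-casing of the input, and joining key words with a space are choices made for this illustration only.

using System;
using System.Collections.Generic;
using System.Linq;

static class MarkovTable
{
    // Builds the key -> possible next words table for a given order n.
    public static Dictionary<string, List<string>> BuildTable(string sentence, int n)
    {
        var words = sentence.ToLowerInvariant().Split(' ');
        var table = new Dictionary<string, List<string>>();

        for (int i = 0; i < words.Length; i++)
        {
            // Key is the current word plus up to n-1 previous words.
            var start = Math.Max(0, i - (n - 1));
            var key = string.Join(" ", words.Skip(start).Take(i - start + 1));

            // Store the following word, or "-" when the sentence ends here.
            var next = i + 1 < words.Length ? words[i + 1] : "-";
            if (!table.TryGetValue(key, out var values))
            {
                values = new List<string>();
                table[key] = values;
            }
            values.Add(next);
        }
        return table;
    }
}

Calling MarkovTable.BuildTable("The quick brown fox jumps over the lazy dog", 1) reproduces the table above; passing 2 instead produces keys like "quick brown" and "brown fox".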
Example output at increasing chain order:
is a manuscript do him untrustworthy is an appropriate receptacle.
•Not the night of people who have anybody else, and all that a chorus of the end. Life is to do?
2nd Order (bigrams)
•Go confidently in the foot; C++ makes it easy to take over and surrender.
•Artists who seek consolation in existing churches often pay for maturity.
3rd Order (trigrams)
•Resistance is never the agent of change. You have to put time into it to build an audience.
•Victory attained by violence is tantamount to a defeat, for it is not in earning as much as you please.
hares naro frd t tiaristh ale o ceyouthemy sigers gsanjoredmig firche sthak indutequg be.
2nd Order (bigrams)
•If ust of meme ted? Why whing a hat but composecoment th alit th ting land move of Chimetive warehing.
3rd Order (trigrams)
•Trust founts a poingry took only God, premen was are you with storied itself - shalf the timind it.
MarkovSharp: a C# library that can be used to create Markov chains without much fuss
•Can model any type of data as state (generic type)
•No restriction on chain order length
// States are words represented as strings.
var model = new StringMarkov(3);
// Learn an array of text
model.Learn(Quotes);
// generate some new text
model.Walk(5).Dump();
MarkovSharp Generic Models
•Comes with a number of pre-built models for common use cases
•Allows developers to add their own if needed
StringMarkov: states are words, represented as strings
•Model automatically splits phrases (e.g. sentences) into unigrams (e.g. words)
•Creates new sentences where words are based on trained words
SubstringMarkov: states are collections of individual characters (letters)
•Creates new words similar to those in the training set language, in sentences
•Words created are not necessarily real words - based on letter probability distributions in the training set:
Betteem we is got but dignitive endity what by with is world, whethings, how lition
Este nouce et êtréfique a exemeuvé entturtance magraire depuisans peilien ages de our la tre de bind le phratille
SanfordMidiMarkov: states are collections of musical MIDI notes
•Creates new MIDI files where notes and timings are based on trained MIDI music.
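A quick sketch of using one of the pre-built models. This assumes SubstringMarkov exposes the same Learn/Walk pattern shown for StringMarkov above; the order value and training array are placeholders.

// Assumption: SubstringMarkov follows the same Learn/Walk API as StringMarkov.
var charModel = new SubstringMarkov(3);   // 3rd order, character-level states
charModel.Learn(Quotes);                  // same training array as the earlier example
foreach (var line in charModel.Walk(5))   // generate 5 new lines of pseudo-words
    Console.WriteLine(line);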
Example: a breakfast-suggestion bot built on a Markov chain
•Use previous orders as training data
•The higher order the chain, the more sensible the suggestion
baconBot.RespondsTo("what should I get?")
    .With(BreakfastModel.GetSuggestion());
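One way BreakfastModel.GetSuggestion could be wired up, sketched with the StringMarkov API from earlier. The PastOrders array and the GetSuggestion wrapper are assumptions for illustration, not part of MarkovSharp.

using System.Linq;
// plus the MarkovSharp using for StringMarkov

public static class BreakfastModel
{
    // Hypothetical training data: previous breakfast orders as plain sentences.
    private static readonly string[] PastOrders =
    {
        "bacon and eggs on toast",
        "bacon sandwich with brown sauce",
        "eggs and beans on toast"
    };

    private static readonly StringMarkov Model = Train();

    private static StringMarkov Train()
    {
        var model = new StringMarkov(2); // 2nd order: suggestions follow pairs of words
        model.Learn(PastOrders);
        return model;
    }

    // Returns a single generated order to use as a suggestion.
    public static string GetSuggestion() => Model.Walk(1).First();
}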
Transitions between states are known, with probabilities. Using the simple sun/rain example:
•Given Sunny: P(rain) = 25% and P(sun) = 75%
•Given Rainy: P(rain) = 50% and P(sun) = 50%
Often represented as a matrix, with rows as the given (current) state and columns as the chance of each next state:
              Chance
              S      R
Given    S    0.75   0.25
         R    0.50   0.50
If it is sunny today, what is the probability it is sunny in 2 days?
•We don't care about the weather tomorrow, only the day after
Using the transition matrix above, there are two outcomes that satisfy sun in 2 days' time given it is sunny today: [S, S] or [R, S]
•Given [S], P[S, S] = 0.75 x 0.75
•Given [S], P[R, S] = 0.25 x 0.5
Both are viable outcomes, so if today is sunny, the probability it is sunny in 2 days is:
•(0.75 x 0.75) + (0.25 x 0.5) = 0.5625 + 0.125 = 0.6875, or about 69%
General solution for a state n steps ahead: raise the transition matrix to the nth power; the (given, chance) entry of the result is the probability of ending in that state after n steps (see the sketch below).
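A small, self-contained sketch of that matrix calculation with plain arrays (nothing library-specific); squaring the sun/rain matrix reproduces the 0.6875 figure.

using System;

class TwoStepForecast
{
    // Multiply two 2x2 matrices.
    static double[,] Multiply(double[,] a, double[,] b)
    {
        var result = new double[2, 2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    result[i, j] += a[i, k] * b[k, j];
        return result;
    }

    static void Main()
    {
        // Transition matrix: rows = given state, columns = chance of next state.
        // Index 0 = Sunny, 1 = Rainy.
        double[,] p = { { 0.75, 0.25 }, { 0.50, 0.50 } };

        // Squaring the matrix gives the 2-step transition probabilities.
        var twoSteps = Multiply(p, p);
        Console.WriteLine(twoSteps[0, 0]); // 0.6875: sun in 2 days, given sun today
    }
}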
Over many steps, the probabilities settle into a steady state representation of the model (see the sketch below)
•The probability for a state 200 days in the future and 201 days in the future is the same
•The steady state is no longer dependent on the initial state
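A similar sketch of that convergence: starting from "definitely sunny today" and pushing the distribution forward one day at a time settles on the same long-run split that "definitely rainy today" would reach. The 200-day count and the ~2/3 sun / ~1/3 rain figures follow from the example matrix above.

using System;

class SteadyStateDemo
{
    static void Main()
    {
        // Transition matrix: rows = given state (Sunny, Rainy), columns = next state.
        double[,] p = { { 0.75, 0.25 }, { 0.50, 0.50 } };

        // Start from "definitely sunny today" and push the distribution forward one day at a time.
        double[] dist = { 1.0, 0.0 };
        for (int day = 0; day < 200; day++)
        {
            double[] next =
            {
                dist[0] * p[0, 0] + dist[1] * p[1, 0],
                dist[0] * p[0, 1] + dist[1] * p[1, 1]
            };
            dist = next;
        }

        // Converges to ~0.667 sun / ~0.333 rain; starting from "definitely rainy" gives the same result.
        Console.WriteLine($"{dist[0]:F3} sun, {dist[1]:F3} rain");
    }
}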
•Language generation (subreddit simulator / dissociated press)
•Reverse cypher (code breaker)
•Algorithmic music composition (Max/MSP / SuperCollider)
•Predictive text
•Bioinformatics: DNA sequence simulation or gene drift over time
•Economics: predictions for market value or share price
•Basis for speech recognition (hidden Markov models)