The goal of this project was to build a random generator of sentences based on a Markov chain.

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained at the previous step. As such, a Markov chain has a *set of states* and a matrix of transition probabilities **P**, where *p_{ij}* is the probability of the chain transitioning from state *i* to state *j*.
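
Written out in symbols, the definition looks as follows. The notation *X_t* for the state at step *t* is an assumption introduced here for illustration, not something taken from the project itself:

```latex
\Pr(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0)
  = \Pr(X_{t+1} = j \mid X_t = i)
  = p_{ij},
\qquad
\sum_{j} p_{ij} = 1 \quad \text{for every state } i .
```

The second condition says that each row of **P** is itself a probability distribution over the next state.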

Given a text:

- The set of possible states is the vocabulary in the document.
- The probability of word_{i} is equal to the number of occurrences of word_{i} in the text, divided by the length of the text (its total number of words).
- The probability of word_{j} following word_{i} is equal to the number of times word_{j} occurs immediately after word_{i}, divided by the frequency of word_{i} in the document.
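
The two estimates above can be sketched in a few lines of Python. This is a minimal illustration rather than the project's actual code: the function name `estimate_probabilities` and the toy sentence are assumptions, and the text is assumed to be already split into a list of tokens.

```python
from collections import Counter, defaultdict


def estimate_probabilities(tokens):
    """Return (word_probs, transition_probs) estimated from a list of tokens."""
    total = len(tokens)
    word_counts = Counter(tokens)

    # P(word_i) = occurrences of word_i / length of the text
    word_probs = {word: count / total for word, count in word_counts.items()}

    # Count how often word_j comes immediately after word_i.
    pair_counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        pair_counts[prev][nxt] += 1

    # P(word_j | word_i) = count(word_i followed by word_j) / count(word_i)
    # Note: the very last token has no successor, so it gets no row here.
    transition_probs = {
        prev: {nxt: count / word_counts[prev] for nxt, count in followers.items()}
        for prev, followers in pair_counts.items()
    }
    return word_probs, transition_probs


# Hypothetical toy input, only for illustration.
tokens = "the cat sat on the mat".split()
word_probs, transition_probs = estimate_probabilities(tokens)
print(word_probs["the"])
print(transition_probs["the"])
```

Nested dictionaries are used here instead of a dense matrix because most word pairs never occur in a real text, so the transition matrix is very sparse.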

The algorithm for the generator is:

1. Identify sentences in the document by separating them with *<eos>*.
2. Compute the empirical probability distribution of the words in the text and the transition matrix.
3. Select the first word of the sentence according to the empirical distribution of the vocabulary in the text.
4. At each step, select the next word according to the conditional probability given the previous word.
5. Repeat step 4 until *<eos>* is selected.
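
The whole procedure can be sketched end to end as below. This is an illustrative sketch under stated assumptions, not the original implementation: the function names, the naive sentence splitting on `.`, `!` and `?`, the exclusion of `<eos>` as a possible first word, and the `max_words` safety cap are all choices made here for the example.

```python
import random
import re
from collections import Counter, defaultdict

EOS = "<eos>"


def tokenize(text):
    """Split text into lowercase words, adding EOS after each sentence."""
    tokens = []
    # Naive sentence split on ., ! or ? (an assumption for illustration).
    for sentence in re.split(r"[.!?]+", text):
        words = sentence.lower().split()
        if words:
            tokens.extend(words + [EOS])
    return tokens


def build_model(tokens):
    """Count word frequencies and word-to-next-word transitions."""
    word_counts = Counter(tokens)
    transitions = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        transitions[prev][nxt] += 1
    return word_counts, transitions


def generate_sentence(word_counts, transitions, rng=random, max_words=50):
    """Sample one sentence; raw counts serve as unnormalised probabilities."""
    # First word: drawn from the empirical word distribution.
    # Assumption: EOS is excluded so a sentence never starts empty.
    starts = [word for word in word_counts if word != EOS]
    word = rng.choices(starts, weights=[word_counts[w] for w in starts])[0]
    sentence = [word]
    # Next words: drawn conditionally on the previous word until EOS.
    # max_words is a safety cap added for this sketch.
    while len(sentence) < max_words:
        followers = transitions[word]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == EOS:
            break
        sentence.append(word)
    return " ".join(sentence)


# Hypothetical toy corpus, only for illustration.
tokens = tokenize("The cat sat. The dog sat on the mat.")
word_counts, transitions = build_model(tokens)
print(generate_sentence(word_counts, transitions))
```

Passing counts as `weights` to `random.choices` is equivalent to sampling from the normalised probabilities, since the weights are relative. Because every sentence ends with `<eos>`, every other word has at least one follower, so the sampling step never runs out of candidates.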

©2017 Serena Peruzzo | Template by Bootstrapious.com & ported to Hugo by Kishan B