Random name generator

This random name generator tries to create pronounceable names by imitating the frequency of bigrams in a dictionary: in any language, each letter has a certain probability of being followed by any other given letter of the alphabet. For example, in English, the letter Q is almost always followed by a U, and a C is almost never followed by a B. With these "follow-frequencies" attached to each letter, we can create words that look as if they could belong to the language.
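To make the idea of "follow-frequencies" concrete, here is a minimal sketch of how such a table could be counted. The function name `follow_counts` and the tiny word list are made up for illustration; the real generator works from a full dictionary, and its actual code may look quite different.

```python
from collections import Counter, defaultdict

END = " "  # assumption: a space is used as the extra "letter" around each word


def follow_counts(words):
    """Count, for each letter, how often each letter comes right after it."""
    counts = defaultdict(Counter)
    for word in words:
        padded = END + word.lower() + END
        # Walk over every pair of neighbouring characters (the bigrams).
        for current, following in zip(padded, padded[1:]):
            counts[current][following] += 1
    return counts


# Hypothetical sample; a real run would read a whole dictionary file.
sample_words = ["queen", "quick", "quote", "clock", "crack"]
counts = follow_counts(sample_words)

print(dict(counts["q"]))  # in this sample, "q" is only ever followed by "u"
print(dict(counts["c"]))  # "c" is followed by "k", "l" or "r", never by "b"
```

Dividing each count by the total for its letter would turn these counts into probabilities, but for sampling the raw counts are enough.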

In my alphabet, I have added a space, which marks the end of a word. For the first letter of each word, I look at how often each letter comes right after a space, and when a space comes up inside my letter chain, I declare it the end of the word. If I didn't do that, we could potentially have a random sentence generator. However, since my frequencies are extracted from a dictionary (just the words, not their definitions), the letter frequencies would be slightly biased for sentences. To correct that, I would base my frequency analysis on articles, books and other texts that contain real sentences: to imitate sentences, you should analyse sentences. In such a text, you may find many occurrences of the word "the", whereas in a dictionary you'll find it only once. On top of that, I removed words shorter than 4 characters from my dictionary, which wouldn't happen in real sentences.
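The generation step described above could be sketched roughly as follows. This is only an illustration under my own assumptions: the names `build_table` and `generate_name`, the `max_length` safety cap and the sample words are all hypothetical and not taken from the original program.

```python
import random
from collections import Counter, defaultdict

END = " "  # the space that both starts and ends every word


def build_table(words, min_length=4):
    """Count follow-frequencies, skipping words shorter than min_length."""
    table = defaultdict(Counter)
    for word in words:
        if len(word) < min_length:
            continue
        padded = END + word.lower() + END
        for current, following in zip(padded, padded[1:]):
            table[current][following] += 1
    return table


def generate_name(table, rng=random, max_length=20):
    """Start from a space and sample letters until a space comes up again."""
    letters = []
    current = END
    while len(letters) < max_length:  # assumed cap, so a name cannot run on forever
        followers = table[current]
        if not followers:  # empty table: nothing to sample from
            break
        current = rng.choices(list(followers), weights=list(followers.values()))[0]
        if current == END:
            break
        letters.append(current)
    return "".join(letters).capitalize()


# Hypothetical sample; "ox" is dropped by the 4-character filter.
table = build_table(["anna", "hannah", "ox"])
print(generate_name(table))
```

Because the first letter is drawn from the letters that follow a space, names start the way real words in the dictionary start, with no special case needed.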

I believe what I did is essentially a first-order Markov chain over letters: the next letter depends only on the current one. In the future, I would be glad to make a program with the same functionality, only based on recurrent neural networks: with exactly the same dictionary I used to draw the statistics for my Markov chain model, I would train the neural network by asking it, for each letter in the dictionary, which letter follows. I believe I should shuffle the dictionary between training runs for this to be effective. Otherwise, since the dictionary is in alphabetical order, the network might become convinced that each word starts with X, Y or Z, because those are the words it saw last and remembers best. Then I would run it, giving it a space to start with and asking what letter follows, and doing the same for each letter it outputs. Again, whenever the network outputs a space, I would stop it and print the resulting string as a "random name".
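As a rough sketch of the data preparation this would need, the snippet below turns the dictionary into (current letter, next letter) pairs and reshuffles the words before each pass. It deliberately leaves out the neural network itself; the names `training_pairs` and `epochs` and the fixed seed are assumptions made for the example.

```python
import random

END = " "  # same convention as before: a space surrounds every word


def training_pairs(words):
    """Yield (current letter, next letter) pairs, one word at a time."""
    for word in words:
        padded = END + word.lower() + END
        yield from zip(padded, padded[1:])


def epochs(words, n_epochs, seed=0):
    """Reshuffle the word list before each pass so alphabetical order is not learned."""
    rng = random.Random(seed)
    words = list(words)  # copy, so the caller's list is left untouched
    for _ in range(n_epochs):
        rng.shuffle(words)
        yield list(training_pairs(words))


# Hypothetical two-word dictionary, two training passes.
for number, pairs in enumerate(epochs(["abc", "xyz"], n_epochs=2), start=1):
    print(number, pairs)
```

Each pass contains the same pairs, only in a different word order, so the network would see the X, Y and Z words spread throughout training instead of always last.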

A very nice resource for learning about recurrent neural networks can be found here: karpathy.github.io/2015/05/21/rnn-effectiveness.
