
I Fed 24 Years of My Blog Posts to a Markov Model

By Susam Pal on 13 Dec 2025

Yesterday I shared a little program called Mark V. Shaney Junior at github.com/susam/mvs. It is a minimal implementation of a Markov text generator, inspired by the legendary Mark V. Shaney program from the 1980s. If you don't know about Mark V. Shaney, you can read about it in the Wikipedia article Mark V. Shaney.

It is a very small program, only about 30 lines of Python (see mvs.py), that favours simplicity over efficiency. As a hobby, I often engage in exploratory programming, where I write programs not to solve a specific problem but simply to explore an idea or topic for the sole purpose of recreation. I must have written small programs to explore Markov chains over various kinds of state spaces a dozen times by now. Every time, I just pick up my last experimental code and edit it to encode the new state space I am exploring. That's my general approach to such one-off programs. I have hundreds of tiny experimental programs lying on my disk at any given time.
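For readers who have never written one of these, here is a rough sketch of what a Mark V. Shaney style generator looks like. This is not the code from mvs.py; the function names and details below are my own illustration of the general technique: map every pair of consecutive words in the training text to the words that follow it, then walk that map, picking a random successor at each step.

```python
import random
import sys


def build_chain(words, order=2):
    # Map each tuple of `order` consecutive words to the list of
    # words that follow that tuple somewhere in the training text.
    chain = {}
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain.setdefault(key, []).append(words[i + order])
    return chain


def generate(chain, length=100):
    # Start from a random prefix and walk the chain, choosing a
    # random successor at every step until we run out of words
    # or reach a prefix that never occurred in the training text.
    key = random.choice(list(chain))
    out = list(key)
    for _ in range(length - len(key)):
        successors = chain.get(key)
        if not successors:
            break
        word = random.choice(successors)
        out.append(word)
        key = key[1:] + (word,)
    return ' '.join(out)


if __name__ == '__main__':
    text = sys.stdin.read().split()
    print(generate(build_chain(text)))
```

The whole trick is in that dictionary of prefixes: any pair of words that appears more than once in the training text becomes a branch point, which is where the plausible-sounding nonsense comes from.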

Once in a while, I get the itch to take one of those exploratory programs, give it some finishing touches, wrap it up in a nice Git repo along with a README.md, a CHANGES.md, and the whole shebang, and share it on github.com/susam and codeberg.org/susam. The Mark V. Shaney Junior program that I shared yesterday happened to be one such exercise.

If you scroll down the README of this project, you'll find some nice examples of the gibberish produced by this program. The first few examples there are the result of training the model on A Christmas Carol by Charles Dickens, one of my favourite authors. It is often said that Dickens never used fewer words when more would suffice, so I thought there couldn't be a better piece of text for testing out my tiny Markov model. I'll not reproduce the generated examples here for the sake of brevity. If you are interested in taking a look, just head over to the Gibberish section of the README.

Soon after sharing the project, I wondered what kind of gibberish it would produce if I fed all 24 years of my blog posts and pages into the program. Well, here's one of the results:

$ python3 mvs.py < susam.txt
while a query replace operation is approved by the user. The above variable defines the build job. It can be incredibly useful while working on assembly language and machine code. In fact, all internal resources like the result to refine the search prompt changes from bck-i-search: to fwd-i-search:. Now type C-SPC (i.e. ctrl+space) to set a mark causes Emacs to use 32-bit registers like EBP, ESP, etc. Thus the behaviour is undefined. Such code may behave differently when compiled with the readily available GNU tools like the shape of 8. Flipping "P" horizontally makes it a proper quine: cat $0

This is the text that comes out after the program consumes over 200 posts consisting of about 200,000 words. My blog also has a comments section with over 500 comments consisting of about 40,000 words. All comments were excluded while training the model. Here is another output example:

$ python3 mvs.py < susam.txt
enjoy asking "what happens if" and then type M-x zap-up-to-char RET b. The buffer for this specific video, the actual fare for 8.3 km and 11 are all written from scratch. No prior knowledge is expected to slow down in future. For now, I will add a statement like x = 0.999... , the inner corner square as discussed in the code segment into the REPL window. Unlike Slimv, Vlime can work with and debug executable files, it can be more convenient. M-x: Execute Extended Command The key sequence M-q invokes the command cat and type TAB to indent the
