A few years back I did a bit of dance-music-related data visualization over at Lazily Evaluated. My favourite was an analysis of clubs and their lineups using Resident Advisor (RA) data, which I called Clubster Analysis. I always wanted to dig into the technical aspects of gathering the data, analyzing it, and building the charts and graphs to tell a story and give people insight. With this blog I now have the right venue for that kind of tech talk, so here goes.
Data gathering #
To visualize data, first you have to get some! For this purpose I wrote a little scraper in Python. I used Beautiful Soup to parse the HTML and grab the bits and pieces I was interested in.
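To give a feel for that parsing step, here is a minimal sketch using Beautiful Soup. It is not the original scraper: the sample HTML, the `event` and `lineup` class names, and the `parse_lineups` helper are all made up for illustration, and the real RA markup looks different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real RA pages are structured differently.
SAMPLE_HTML = """
<ul>
  <li class="event">
    <h3>Friday Night</h3>
    <span class="lineup"><a href="/dj/a">DJ A</a>, <a href="/dj/b">DJ B</a></span>
  </li>
  <li class="event">
    <h3>Sunday Session</h3>
    <span class="lineup"><a href="/dj/c">DJ C</a></span>
  </li>
</ul>
"""


def parse_lineups(html):
    """Return a list of (event title, [artist names]) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for item in soup.find_all("li", class_="event"):
        title = item.find("h3").get_text(strip=True)
        lineup = item.find("span", class_="lineup")
        artists = [a.get_text(strip=True) for a in lineup.find_all("a")]
        events.append((title, artists))
    return events


if __name__ == "__main__":
    for title, artists in parse_lineups(SAMPLE_HTML):
        print(title, artists)
```

Keeping the parsing in a function that takes an HTML string (rather than a URL) makes it easy to re-run against saved pages whenever the parsing logic changes.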
My scraping of a few thousand pages didn’t cause considerable load on the RA servers. But in the age of overzealous AI scrapers it’s worth being polite, so I throttled my requests according to their robots.txt. I also maintained a local cache of HTML files I had already downloaded, so that I wouldn’t have to fetch the same data repeatedly (past lineups are unlikely to change after the fact) just because I discovered some bug or error in my parsing.
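The throttling and caching described above could look roughly like the sketch below. It uses only the Python standard library and is an illustration under assumptions, not the code from the original scraper: the user agent string, the one second fallback delay, the hashed cache file names, and the helper names (`load_robots`, `download`, `fetch_cached`) are all hypothetical.

```python
import hashlib
import time
import urllib.request
import urllib.robotparser
from pathlib import Path

USER_AGENT = "clubster-scraper-example"  # hypothetical user agent string
DEFAULT_DELAY = 1.0  # assumed fallback (seconds) if robots.txt sets no Crawl-delay


def load_robots(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def download(url):
    """Plain HTTP GET; only reached on a cache miss."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_cached(url, robots, cache_dir, downloader=download, sleep=time.sleep):
    """Return the HTML for url, touching the network only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a filesystem-safe cache file name.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    cache_file = cache_dir / name

    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")

    if not robots.can_fetch(USER_AGENT, url):
        raise PermissionError(f"robots.txt disallows {url}")

    # Throttle before every real download, honouring Crawl-delay if present.
    delay = robots.crawl_delay(USER_AGENT)
    sleep(DEFAULT_DELAY if delay is None else delay)
    html = downloader(url)
    cache_file.write_text(html, encoding="utf-8")
    return html
```

Because cached pages are returned before any sleep or download happens, re-running the parser after fixing a bug costs no extra requests. The `downloader` and `sleep` parameters exist only so the function can be exercised without a network connection.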
The order I scraped in was:
1. Get the 20 most popular regions in RA (and then drop “Streamland”, which was a pandemic-era pseudo-region).
2. Fetch the most popular clubs and some related metadata for all of those regions.
3. For each club, get the lineups for every one of their 2019 events (2019 being the last full year before the pandemic started).
4. Save the results to CSV files.
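The overall flow of those steps can be sketched as a set of nested loops feeding a CSV writer. Everything below is a stand-in: `get_popular_regions`, `get_popular_clubs`, `get_lineups`, the column names, and the sample data are hypothetical, and the real versions would fetch and parse RA pages instead of returning hard-coded values.

```python
import csv
import io


def get_popular_regions(limit=20):
    """Stand-in for step 1: return region names, minus the pseudo-region."""
    regions = ["Region A", "Streamland", "Region B"]
    return [r for r in regions if r != "Streamland"][:limit]


def get_popular_clubs(region):
    """Stand-in for step 2: return the popular clubs for a region."""
    return {"Region A": ["Club One"], "Region B": ["Club Two"]}.get(region, [])


def get_lineups(club, year):
    """Stand-in for step 3: return (date, [artists]) pairs for a club.

    The sample data only covers 2019, so other years yield nothing.
    """
    sample = {
        "Club One": [("2019-03-01", ["DJ A", "DJ B"])],
        "Club Two": [("2019-07-12", ["DJ C"])],
    }
    return sample.get(club, []) if year == 2019 else []


def scrape_to_csv(out_file, year=2019):
    """Walk regions, clubs and events; write one CSV row per artist booking."""
    fieldnames = ["region", "club", "date", "artist"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames)
    writer.writeheader()
    rows = 0
    for region in get_popular_regions():
        for club in get_popular_clubs(region):
            for date, artists in get_lineups(club, year):
                for artist in artists:
                    writer.writerow(
                        {"region": region, "club": club, "date": date, "artist": artist}
                    )
                    rows += 1
    return rows


if __name__ == "__main__":
    buffer = io.StringIO()
    print(scrape_to_csv(buffer), "rows written")
    print(buffer.getvalue())
```

Writing one row per artist booking is just one possible layout; it keeps the CSV flat, which tends to make later grouping by club or artist straightforward.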
Cleanup, verification and analysis #