The Sound of Data About Justin Bieber


There is a wealth of information describing how fans interact with an artist and their music online. With tools such as Musicmetric, it's becoming increasingly common for this data to be visualised as charts like these:

Line drawing of Justin Bieber

Timeline graph of Justin Bieber fans added

For the second Music Hack Day in London (4/5 September 2010) I wanted to explore how this same information could be presented not as images, but instead as sounds. Could a piece of music be generated from data about an artist?


Whilst I'm a keen amateur musician, this was the first time since my old Sinclair Spectrum circa 1984 that I'd used programming to generate sounds. As such I decided on a structure that minimised having to learn a load of new languages and tools: generating the note data as MIDI files from Python code, then playing the results through studio software I already knew.

Additionally, I decided that I would use the source data only to generate the melodic content of the song, and manually decide how the percussion should be played. The idea was that the beats could help hold the other parts of the song together, so it wouldn't sound like an extended jazz drum solo.

How to generate the notes?

The first part I tried to implement was generating MIDI files from Python code. There are quite a few options available, but in the end I selected this project due to ease of use and installation.

The first step was simply to generate a single track of random notes to check that everything fitted together. To this I added a rhythm which I programmed manually. I tried varying the length of each note slightly to give a less robotic feel, but this seemed to result in a lot of timing issues (surprisingly, note-on times are specified as floating-point numbers).
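The original hack-day code isn't shown here, but the first experiment can be sketched in plain Python. The function name, pitch range, and seed are my own assumptions; the sketch just produces a list of (onset, pitch, duration) events with floating-point note-on times and a fixed duration per note:

```python
import random

def random_melody(n_notes=16, low=48, high=72, step=1.0, seed=0):
    """Sketch of the first experiment: one track of random notes.
    Onsets are floats, matching how the MIDI library expects
    note-on times to be specified."""
    rng = random.Random(seed)
    events, onset = [], 0.0
    for _ in range(n_notes):
        pitch = rng.randint(low, high)       # free choice of any pitch in range
        events.append((onset, pitch, step))  # fixed length: varying it caused timing issues
        onset += step
    return events

melody = random_melody()
```

Each event tuple can then be handed to a MIDI library's note-writing call more or less directly.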

The second iteration's refinement was to add a bassline which followed the generated "melody", by taking every fourth note of the melody and holding it for four quarter notes. The third iteration limited the range of notes from a free choice of all twelve tones in each octave to those belonging to a single scale. I experimented with switching between different scales for different sections, but for the final version I opted to stick with just the C major scale.
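Both refinements can be sketched as follows. The helper names, the downward quantisation, and the choice of dropping the bass by an octave are my assumptions rather than details from the original code:

```python
# Pitch classes of C major: C D E F G A B
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}

def quantise_to_scale(pitch, scale=C_MAJOR):
    """Move a MIDI pitch down to the nearest member of the scale."""
    while pitch % 12 not in scale:
        pitch -= 1
    return pitch

def bassline_from_melody(melody_pitches, octave_drop=12):
    """Take every fourth melody note, drop it an octave, and hold it
    for four quarter notes (one bass note per bar of 4/4)."""
    bass, onset = [], 0.0
    for pitch in melody_pitches[::4]:
        bass.append((onset, pitch - octave_drop, 4.0))
        onset += 4.0
    return bass
```

Quantising after generation means the melody and the derived bassline automatically stay in the same key.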

Another decision was how the piece would be structured. I decided to use four datasets for the artist Justin Bieber: the 24-hour moving average of the hourly diff for comments made to the artist on social networks, track plays, fans added, and profile page views. If the value of one of these datasets increased over time, then the corresponding notes would increase in pitch. For the final piece I used data for the preceding 2,048 hours.
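The mapping from data values to pitch could look something like this. The exact scaling used in the hack isn't documented, so this is a sketch: normalise each series over its own min/max and index into a pool of C major notes, so that a rising series yields rising pitch:

```python
C_MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]

def scale_notes(base=48, octaves=2):
    """All C major pitches across a range, lowest to highest."""
    return [base + 12 * o + s for o in range(octaves) for s in C_MAJOR_STEPS]

def series_to_pitches(values, notes):
    """Normalise the series over its own min/max and index into the
    note pool: the minimum value maps to the lowest note, the maximum
    to the highest, so an increasing series produces increasing pitch."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # a flat series maps everything to the lowest note
    return [notes[int((v - lo) / span * (len(notes) - 1))] for v in values]
```

Shifting the `base` argument per dataset would then give each "instrument" its own register, with the bass part lowest and the melody parts highest.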

Each of these datasets would be played in parallel by different "instruments". Since the "fans added" dataset is relatively stable in value over time, I used it to generate the bassline. Similarly, since "profile views" and "track plays" are generally closely correlated, I chose these for the main melody and pitched their notes highest.

Regarding the structure of the piece over time, I tried generating a single block of notes from the source data for the whole duration of the song, but this sounded very monotonous (especially compared to the initial experiment where all note pitches were chosen randomly). Instead I chose to generate sets of four-bar blocks from the source data, which could be brought in and out of the mix during the piece. This approach allows different dataset combinations to be compared and contrasted by the listener during the song. The structure of the final piece looked like this:
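Slicing a generated pitch sequence into independent four-bar blocks can be sketched as follows (assuming one note per quarter note in 4/4; the function name and defaults are mine):

```python
def four_bar_blocks(pitches, notes_per_bar=4, bars=4):
    """Slice a pitch sequence into four-bar blocks that can be
    brought in and out of the mix independently."""
    size = notes_per_bar * bars
    return [pitches[i:i + size] for i in range(0, len(pitches), size)]
```

Each block then becomes a clip that the arrangement can mute or unmute, rather than one continuous stream of notes.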

Sequencer view of the track

Tracks at the top of the graphic are higher in pitch than those lower down, with the percussion tracks right at the bottom. In the graphic you can see each track's row of notes which looks like a sparkline visualisation of the data points.

What should it sound like?

Initially I'd decided that I wanted to make a dance/electronic style track and so to some extent I knew the palette of sounds I wanted to work from. I thought that a more mechanical style would suit the way I was going to generate the note data. I decided to use Roland TR-606 and Casio SK-1 drum machine sounds, software simulations of simple analogue synthesizers and a few effects units to complete the mix.

I had initially hoped to incorporate sampled elements from tracks by Justin Bieber as part of the track I generated, but in the end this did not come to pass despite some comedic experimental results.

I didn't have time to explore a lot of other ideas, such as creating the percussion sounds from scratch using filtered white noise from analogue synthesizers. Generating controller data (in addition to note data) to modulate the timbre of the instruments (e.g. filter cutoff) might also have been interesting to try out.

Since the piece is generated from a set of notes, as opposed to being directly generated as an audio signal, there's nothing to stop the same composition being remixed/reimagined using different instruments (e.g. piano). This would probably need more thought about how to add a groove or swing to the piece, though, so that the notes don't all share the same duration and staccato feel.

As mentioned earlier, the piece was realised using Propellerheads Reason, which simulates a rack of electronic studio equipment in software. The front of the rack looked like this:

Rack setup in Reason (front)

and the reverse of the rack, showing the units wired together, looked like this:

Rack setup in Reason (reverse)


The final track below is hosted on SoundCloud:

The Sound of Data About Justin Bieber by clockdivider

Is it useful? Probably not, but it was a lot of fun to make, and once everything was set up it was quite easy to express musical ideas as Python code.