BBC World Service Audio Archive: Computer-Generated Metadata

I completed a five-month academic internship with BBC R&D as part of my studies on the EPSRC-funded Media & Arts Technology PhD programme at Queen Mary University of London. During my time with R&D, I contributed to the BBC’s exploration of how to improve the metadata of large audio archives and how users might be encouraged to help.


R&D have been working with the World Service to explore practical and cost-effective methods for publishing their audio archive on the web. The tagging experiment is part of R&D’s research into creating and improving programme metadata and aiding discovery of audio archive content using a combination of editorial, algorithmic and crowd-sourcing techniques.

Drawing upon existing research on user contribution systems and collaborative tagging environments, we are exploring what motivates users to validate existing tags attached to radio programmes.


In 2012 R&D teamed up with BBC Global Minds, a panel of 15,000 participants across the world, to run the experiment with an international audience. This was designed to study how users contribute to improving the accuracy of programme tags. In the experiment the participants were invited to listen to one of five short radio clips and review a selection of tags that attempted to describe the content. The interface allowed people to ‘agree’ or ‘disagree’ with existing tags and add additional ones they felt were missing. The observed tagging behaviours and survey responses provided insights into what motivates users to engage with the content and help correct the programme information.


Caption: Screenshot of the experiment ‘with context’ showing clip synopsis and tag explanatory rollovers.

The tags presented to the participants were a mixture of ‘user tags’ generated in a previous paper-based experiment and ‘machine tags’ generated algorithmically using speech-to-text software and DBpedia. These were used to identify key concepts with which to describe programmes that had little or no pre-existing metadata; the resulting tags varied in their degree of accuracy.
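The machine-tag pipeline described above could be sketched roughly as follows. This is a hypothetical illustration, not the actual R&D implementation: the `machine_tags` helper and its concept dictionary are invented for the example, and in practice candidate concepts would be drawn from DBpedia rather than a hard-coded set.

```python
# Hypothetical sketch: scan a speech-to-text transcript for phrases that
# match known concept labels. In the real pipeline the concepts would
# come from DBpedia; here a tiny stub dictionary stands in for them.

KNOWN_CONCEPTS = {
    "silvio berlusconi": "dbpedia.org/resource/Silvio_Berlusconi",
    "world service": "dbpedia.org/resource/BBC_World_Service",
    "radio": "dbpedia.org/resource/Radio",
}

def machine_tags(transcript, concepts=KNOWN_CONCEPTS):
    """Return the concept labels that appear in the transcript text."""
    text = transcript.lower()
    return sorted(label for label in concepts if label in text)

tags = machine_tags("The World Service programme discussed Silvio Berlusconi.")
# tags == ['silvio berlusconi', 'world service']
```

Because the matching is purely textual, transcription errors propagate straight into the tags, which is one reason the resulting machine tags varied so much in accuracy.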

We wanted to find out whether contextual information was interesting or useful to the listener and how it might affect their behaviour in the task, for example by informing their decisions when agreeing or disagreeing with the tags. To do this we divided participants into two testing conditions. Half of the participants were given contextual information: a clip synopsis, plus tag explanations on rollover describing how and why each tag was attached to the clip. The other participants were given only the basic programme title and the tags.


The experiment attracted 680 participants; the findings below summarise the quantitative data and the qualitative analysis of the survey responses.


We wanted to know whether offering explanations of why a tag was attached to a radio clip was helpful or interesting to the listener. Only half of the participants were shown the explanations, enabling us to compare how they affected behaviour in the tagging task.

We found that when contextual information was presented to the users, they were 44% more likely to add new tags and 22% less likely to disagree with tags.

Individuals who found the contextual information helpful explained that it enabled them to speed up their decision-making and exposed the thought processes of others. Participants expressed a particular pleasure derived from disagreeing with inaccurate ‘machine tags’. However, others found the explanations irritating, as they infringed upon the autonomy of the task at hand.


While only 5% of participants stated that they enjoyed disagreeing with tags, it was the most dominant of all actions. Adding tags made up only a small percentage of the actions, owing in part to the high number of tags presented to users, which other studies have shown leads to fewer new tags being added. We know that, generally, only around 1% of online audiences are prepared to act as ‘content contributors’.


When questioned after each clip, 28% of users reported agreeing with tags as the most enjoyable action to take and a further 29% enjoyed a mixture of agreeing, disagreeing and adding tags.



While only 5% of participants stated that they preferred disagreeing with tags, it was the most dominant action undertaken; only 2% of participants added tags.



From the responses gathered in the post experiment survey, the motivations to engage with the task reinforce the existing literature on motivations for online volunteerism, such as:

  • Altruism
  • Personal or professional interest
  • Intellectual stimulation
  • Social reward
  • Protection/enhancement of personal investment
  • An outlet for creative & independent self-expression

An important motivational factor expressed by many was the idea of contributing to a shared task that would improve navigation of BBC archives. Many participants were also driven by a curiosity about other people’s opinions, which seems to heighten engagement with the task. Others confessed that a pedantic nature led to satisfaction from participating in this type of task.

From the survey we found that the more interesting the clip, the less participants interacted with its tags. This suggests that the motivation to participate is not tethered to enjoyment.


The top ten tags for each of the five clips were calculated according to the percentage of users who agreed with each tag; as such, the top tags represent those that people are most likely to agree with. Only 16% of these top tags were ‘machine tags’, with ‘user tags’ or ‘editorial tags’ coming out on top. Conversely, ‘machine tags’ accounted for 84% of the most-disagreed-with tags.
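A minimal sketch of that ranking step, assuming each tag simply carries agree/disagree vote counts. The tag names and numbers here are invented for illustration, not real experiment data.

```python
# Rank a clip's tags by the percentage of voters who agreed with them,
# as described above. Vote counts are illustrative only.

def agreement_pct(agrees, disagrees):
    """Percentage of voters on a tag who agreed with it."""
    votes = agrees + disagrees
    return 100.0 * agrees / votes if votes else 0.0

votes = {
    "london":   (40, 10),   # (agrees, disagrees)
    "politics": (25, 25),
    "economy":  (12, 38),
}

ranked = sorted(votes, key=lambda tag: agreement_pct(*votes[tag]), reverse=True)
# ranked == ['london', 'politics', 'economy']
```

Taking the first ten entries of such a ranking per clip would yield the ‘top tags’ discussed above.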


By examining the tags that provoked the most extreme user split (in the region of 50% agrees and 50% disagrees), we can identify some common trends or properties, pick out the types of tags on which consensus is hardest to achieve, and perhaps explain why.

User generated tags accounted for 75% of the tags that most divided opinion. Two problems that arise, and which seem to cause mixed responses, relate to spelling errors and descriptive phrases, such as ‘silvio burlesconi’ and ‘painful memories’.

It would be useful to investigate whether and how users respond differently to whole-phrase versus single-word tags, depending on their experience of tagging. Previous tagging experience might be a significant factor, particularly for these contested tags. For example, ‘needs a whole new way of thinking’ achieved 37.5% agreement from users who voted.
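One way to operationalise "most extreme user split" is to measure how far each tag's agree share sits from an even 50/50 divide. The helper and vote counts below are invented for illustration; the 3-agree/5-disagree entry mirrors the 37.5% agreement figure mentioned above.

```python
# Find the most contested tags: those whose agree share is closest to
# 50%. Vote counts are illustrative, not real experiment data.

def contention(agrees, disagrees):
    """Distance of the agree share from an even split; 0.0 = perfectly split."""
    votes = agrees + disagrees
    return abs(agrees / votes - 0.5) if votes else 0.5

votes = {
    "painful memories":                  (11, 13),  # ~45.8% agree
    "needs a whole new way of thinking": (3, 5),    # 37.5% agree
    "radio":                             (45, 5),   # 90% agree
}

most_contested = min(votes, key=lambda tag: contention(*votes[tag]))
# most_contested == 'painful memories'
```

Sorting all tags by this distance and keeping those near zero would recover the ~50% split group examined above.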


When asked what additional functionality they desired, participants asked for more control within the system, such as editing, deleting, categorising and rating tags, as well as methods to facilitate discussion within the tagging community. These findings are now being incorporated into the development roadmap of the World Service Radio Archive prototype.