Luckily my audio recorders are slightly more portable than this!

I’ve been transcribing my interviews with participants over the summer, with all the interest and frustration that brings, particularly when your laptop breaks and you have to resort to using your kids’ laptop, on which half the keys don’t work and the other half require a good, hard thump before they engage. My wrists will never be the same again…

Transcription is unbelievably time-consuming! I read somewhere that it takes 6 hours to transcribe an hour of speech; I haven’t timed myself, but it could well be correct. Add in quite a lot of background noise in some locations (grass cutters are the particular bane of my life) and some bits require going over again and again before you can make anything out. Having more than one participant can be a godsend here, because the occasional overlap means you can work backwards and, dah-dah! realise that what you thought the participant said was actually completely wrong, and that you had been barking up the wrong tree.

There are no real shortcuts to transcription, though there are a few tools you can use to make it a bit quicker. There is transcription software which lets you keep your text on the same screen as the audio controls. I’ve just been using the function keys on my keyboard though (I never noticed the play/pause/stop buttons before!). Some people say you can slow down the speech, but that just sounds like it would, er, sound weird. It’s difficult enough understanding it at normal speed, let alone under water. If you have serious amounts of transcription to do, you’d do well to get something like a foot pedal, which makes stopping and starting the audio easier whilst keeping your hands free.


Enough logistics. I found a couple of really interesting articles about some of the theoretical considerations behind transcription, which I think are worth paying attention to. The first article (Davidson, 2009) reviews the transcription literature, with the take-home message that researchers need to be explicit about transcription, because it is not simply a matter of typing out what is heard. Davidson argues that transcription is ‘selective and partial…representative and…interpretive’, and that there are big differences in how academic disciplines understand and discuss transcription issues. Amongst other interesting papers, Bailey (2008) not only discusses transcription choices, but gives some enlightening examples of how transcription shapes interpretations. Finally, Oliver, Serovich and Mason (2005) also give some useful examples of transcription decisions and their relationship to theory and analysis, looking at how different analytical perspectives need different types of transcription. They argue for reflection on transcription choices, both in research design and in analysis. All these papers are useful if you are interested in this area, though there are bound to be many other good articles on the subject if you want to dig deeper; these are just some I found through a quick skim of the literature.

Using some of these concepts from the literature, I’m now going to look at how they relate to my transcription choices and reflect a bit on this. For example, if you are carrying out a simple thematic analysis you require less exquisite detail than if you are doing a full monty conversation analysis, where the micro-exchanges of talk are examined. The more detail (length of pauses, tone of voice, tiny bits of notreallyspeech (um, er) and weird noises) you include, the more it breaks up the flow of your transcript, which is after all what you are using for your analysis. Well, der, because otherwise you could skip the transcript and just code straight from the audio! BUT, if the pauses or the tone of voice etc. convey meaning which is important for your analysis, then yeah, you’d better put them all in. Good luck with that… Oliver, Serovich and Mason (2005) also make the point that you could work with different transcriptions of the same material for different purposes, with the specific type of transcription matching the research question and the analytical perspective.

I’m trying to relate all this to my own transcriptions right now. I sort of launched into transcribing without thinking too hard about what level of detail I needed, not having had transcription decisions covered during any of the qualitative analysis courses I’ve done over the years. (Aside: this is actually pretty weird! In three different universities only the practicalities of transcription were covered, not the significance of transcription choices in light of methodology. Does this say something about the way psychology as a discipline ignores such issues?) Plus at first I was just too busy trying to work out what the hell people were saying! The more I have worked with the audio, however, the more I have reflected on what I need to keep and what can be safely discarded for the purposes of my research questions. I’m not actually sure this is something you can decide too strictly in advance, but at least if you are aware of the issues then you know what kinds of decisions you are making, and can justify them.

Tone of voice can be very important if it implies the person is being sarcastic, particularly if followed by a laugh to show that the opposite is in fact meant. If a participant is asked how they are feeling and they say ‘oh yeah, I feel great’ that means they feel good, yeah? And what if they say ‘oh yeah, I feel greeeeeeeaaaaaat!’ and then laugh? Probably the opposite? That’s probably the interpretation you’d make listening to the audio, so it seems sensible to use it as the basis for your transcription, maybe adding a note like (sarcastic tone of voice) to show that it should not be taken at face value. The thing is though, you are already interpreting the words and sounds the participant is making by listening to a recording. Sometimes you might have information from being there which completely changes this interpretation, such as body language, or the presence of something specific in the environment, or the action of someone nearby. Without this, you are interpreting using the audio alone (which may be incomplete. Or drowned out by grass-cutting, grrrr). Then you take the interpretation a step further by stripping out additional information in order to turn it into a transcript for reading. In doing so, you are necessarily editing and shaping the transcript for your purposes (to obtain a readable text for analysis). So you should think about what that purpose is if you want to understand how you have shaped this transcript, focusing on some aspects and ignoring or suppressing others.

In my case, I decided after a few transcriptions that I wasn’t that interested in the length of pauses, being satisfied with one notation for a short pause and another for a longer one. For very long periods of people not talking I would give a time frame, noting any background noises going on. I decided, however, that tone of voice could be important in situations such as the one described above, or in patches of speech where high emotion is being portrayed. Since my research question is how people are feeling during exercise, it seems stupid to ignore information in someone’s voice representing to others how they are feeling (note, I’m not claiming to know how they are feeling, just how they are representing this). Similarly, I found that non-speech noises conveyed affect-laden information, which cannot be discarded if you are trying to understand the subjective experiences of participants. Some were general sounds heard in everyday conversation: laughing, shouting, whooping, sighing. And some seemed exercise-specific: panting, heavy breathing (OK, maybe not ONLY specific to exercise…) and other odd noises which I have found extremely hard to transcribe but which seem important in conveying information about how the participant is suffering/struggling/coping/enjoying the exercise. You could probably do an entire analysis based on just these noises! I don’t intend to, but I do intend to consider them in my analysis, as they are unique to this context and also informative. I’m not quite sure yet HOW they are informative; I need to think about this a bit more.

One situation I have noticed occurring a few times now is when a participant gives quite a positive rating for how they are feeling, but their ‘affect noises’ tell a different story. The participant says they are feeling fine, no problem, yet they are breathing heavily and making sighing noises or sudden expulsions of air suggesting effort or frustration. Without wanting to say that either is the ‘true’ picture of how the person is feeling, it might mean that the person does not wish to share their negative feelings with the researcher due to embarrassment or a wish for privacy. Or, it might mean that the person is suffering and struggling with the exercise but that they still ‘feel good’ because they are determined to finish and this determination feels good. They may be proud of what they have achieved so far and this feeling of pride outweighs physically unpleasant symptoms, leading to an overall positive evaluation. Ekkekakis’s ‘dual mode’ theory of exercise-related affect (discussed in an earlier blog) suggests this could certainly be the case.

This is an early stage in my transcriptions, but I’ve found it very interesting and useful to reflect on the issues surrounding transcription and the choices it involves. My next step is to make a more structured list or protocol of my transcription choices, both to document some of these and also to ensure consistency across transcriptions. I’m sure this will evolve as I do more transcribing and reflect further on how transcription will best match my research questions, but I’ve already found it a great reflexive exercise, which will hopefully make for better, more rigorous research.
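
For anyone who likes to keep such things in a checkable form, a protocol of this kind could be written down as a small list and used to spot inconsistent notation. The Python sketch below is only one hypothetical way of doing it: the notations (such as "(.)" for a short pause), the `Convention` structure and the `unknown_annotations` helper are all made up for illustration, not conventions that have been settled on here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    feature: str    # what is being transcribed
    notation: str   # how it appears in the transcript (hypothetical)
    rationale: str  # why it is kept, tied to the research question


# Hypothetical protocol entries: placeholders, not a finished protocol.
PROTOCOL = [
    Convention("short pause", "(.)", "exact pause length not needed"),
    Convention("longer pause", "(..)", "exact pause length not needed"),
    Convention("sarcastic tone", "(sarcastic tone of voice)",
               "words should not be taken at face value"),
    Convention("laughter", "(laughs)", "affect-laden non-speech sound"),
    Convention("heavy breathing", "(heavy breathing)",
               "exercise-specific affect noise"),
]

ALLOWED = {c.notation for c in PROTOCOL}


def unknown_annotations(transcript: str) -> list[str]:
    """Return bracketed annotations missing from the protocol,
    in order of first appearance and without repeats."""
    unknown = []
    for annotation in re.findall(r"\([^()]*\)", transcript):
        if annotation not in ALLOWED and annotation not in unknown:
            unknown.append(annotation)
    return unknown


sample = ("Oh yeah, I feel great (sarcastic tone of voice) (laughs) "
          "(.) it's fine (sighs)")
print(unknown_annotations(sample))  # ['(sighs)']
```

Anything the check flags is either a slip in notation or a sign that the protocol needs a new entry, which fits the idea that the list will evolve as more transcribing gets done.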




