I've recently been working with Naomi Nevler and others from Penn's Frontotemporal Degeneration Center on quantifying the diverse effects in speech and language of various neurodegenerative conditions. As part of an effort to establish baselines, I turned to the English-language part of the "Fisher" datasets of conversational telephone speech (LDC2004S13, LDC2004T19, LDC2005S13, LDC2005T19), where we have basic demographic information for 11,971 speakers, including age and sex. These datasets comprise 11,699 short telephone conversations between strangers on assigned topics, or 23,398 conversational sides, with a total duration of 1,958.5 hours. The calls were recorded in 2003.
For this morning's Breakfast Experiment™, I took a look at age-related changes in pitch range, as quantified by quantiles of fundamental frequency (f0) estimates. We have time-aligned transcripts, so after pitch-tracking everything, I can extract the f0 estimates for each speaker, combine them across calls if the speaker was involved in more than one call, and calculate various simple statistics. Here are the median values for the 90th, 50th, and 10th percentile of f0 estimates by decade of age from 20s to 70s. Values for female speakers are in red, and for male speakers in blue:
Here's the same data presented as semitones relative to 55 Hz:
The basic trend is clear: pitch polarization by sex decreases with age, with male pitch quantiles going up and female pitch quantiles going down. The effects are moderate in size: female quantiles are 8-9 semitones above male quantiles for speakers in their 20s, but only about 5-6 semitones above for speakers in their 70s. A 3-semitone change is a frequency ratio of about 1.19, i.e. about 19% ((2^(1/12))^3 ≅ 1.189).
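For readers who want to check the arithmetic, here is a minimal sketch of the Hz-to-semitone conversion and its inverse. The function names are my own for illustration; the only things taken from the post are the 55 Hz reference and the 3-semitone example.

```python
import math

REF_HZ = 55.0  # reference frequency used for the semitone plots in the post


def hz_to_semitones(f_hz, ref_hz=REF_HZ):
    """Semitones of f_hz above ref_hz (negative if f_hz is below ref_hz)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def semitones_to_ratio(semitones):
    """Frequency ratio corresponding to an interval in semitones."""
    return 2.0 ** (semitones / 12.0)


print(hz_to_semitones(110.0))            # one octave above 55 Hz: 12.0
print(round(semitones_to_ratio(3), 3))   # 1.189, i.e. about 19% higher
```

Because semitones are a log scale, a difference of 3 semitones means the same frequency ratio whether it occurs in a male or a female pitch range, which is why the Female – Male differences are easier to compare in semitones than in Hz.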
Here are the Female – Male differences, in semitones:

       20s    30s    40s    50s    60s    70s
Q90   8.40   7.64   7.05   6.87   6.52   5.47
Q50   8.93   8.23   7.63   7.67   7.15   6.10
Q10   7.82   7.26   6.85   6.61   5.93   5.00
Of course, the summary graphs above hide a lot of individual variation. Here are all the individual data points plotted for the same three quantiles as a function of age, with lowess lines added:
The more extreme scattering is probably due to octave errors in the pitch tracking: I set the minimum F0 to 50 Hz and the maximum to 500 Hz for all speakers, which permits or even encourages period doubling and halving.
As usual with data that shows age grading, we might be looking at a life-cycle effect — the behavior of individuals changes as they get older, whether due to biology or to culture — or at a historical development — gender polarization decreases over the decades, but the behavior of older people continues to be influenced by the norms they grew up with. The fact that the age effect for females and males goes in opposite directions makes it unlikely that some simple physical explanation like loss of tissue elasticity will work, though change in hormone levels remains a possible story.
Further technical details: I used a variant of the get_f0 pitch tracker, based on David Talkin's RAPT algorithm, set to generate 200 estimates per second. Widely varying amounts of speech were available per speaker, ranging from around 21 seconds (4,156 frames) to about 38 minutes (460,832 frames), with a median value of 9.9 minutes (119,169 frames).
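The per-speaker bookkeeping described in this post can be sketched roughly as follows. This is not the script I actually ran, just an illustration under stated assumptions: the function names and the toy data are hypothetical, unvoiced frames are assumed to be marked with an f0 of 0, and the quantiles use the standard library's "inclusive" interpolation, which may differ slightly from other implementations.

```python
from collections import defaultdict
from statistics import median, quantiles

FRAME_RATE = 200  # f0 estimates per second, as stated in the post


def speaker_quantiles(f0_by_call):
    """Pool a speaker's voiced f0 values (Hz) across calls; return (Q10, Q50, Q90)."""
    # Assumption: unvoiced frames are marked with f0 == 0 and are dropped.
    pooled = [f for call in f0_by_call for f in call if f > 0]
    deciles = quantiles(pooled, n=10, method="inclusive")  # 9 cut points
    return deciles[0], deciles[4], deciles[8]


def decade_medians(speakers):
    """speakers: iterable of (sex, age, f0_by_call) tuples.

    Returns {(sex, decade): (median Q10, median Q50, median Q90)}.
    """
    groups = defaultdict(list)
    for sex, age, f0_by_call in speakers:
        groups[(sex, (age // 10) * 10)].append(speaker_quantiles(f0_by_call))
    return {key: tuple(median(col) for col in zip(*qs))
            for key, qs in groups.items()}


def frames_to_minutes(n_frames, frame_rate=FRAME_RATE):
    """Convert a count of f0 frames to minutes of speech."""
    return n_frames / frame_rate / 60.0


# Toy data (made up): (sex, age, list of per-call f0 tracks in Hz)
toy = [
    ("F", 24, [[200, 210, 0, 220], [190, 230]]),
    ("F", 27, [[180, 0, 200, 220]]),
    ("M", 71, [[100, 110, 120, 0]]),
]
for key, meds in sorted(decade_medians(toy).items()):
    print(key, meds)
print(round(frames_to_minutes(119169), 1))  # the median speaker: 9.9 minutes
```

Pooling frames across calls before taking quantiles means that a speaker's longer calls contribute more than the shorter ones; taking quantiles per call and then averaging would be a different, equally defensible choice.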
Some relevant earlier posts:
"Nationality, gender, and pitch", 11/12/2007
"Mailbag: F0 in Japanese vs. English", 11/13/2007
"How about the Germans?", 11/14/2007
"Sexy baby vocal virus", 8/15/2013
"Biology, sex, culture, and pitch", 8/16/2013
Update — Anne Cutler sends in a link to a longitudinal study: Alison Russell, Lynda Penny, and Cecilia Pemberton. "Speaking fundamental frequency changes over time in women: a longitudinal study." Journal of Speech, Language, and Hearing Research 38, no. 1 (1995): 101-109. The abstract:
Archival recordings of the human voice are a relatively untapped resource for both longitudinal and cross-sectional research into the aging voice. Through the availability of collections of old sound recordings, speech pathologists and voice scientists have access to a wealth of data for research purposes. This article reports on the use of such archival data to examine the changes in speaking fundamental frequency (SFF) in a group of Australian women's voices over the past 50 years, and discusses the benefits and problems associated with using archival data. Recordings made in 1945 of women were compared with recordings of the same women made in 1993 to investigate the changes in SFF with age. The results demonstrate a significant lowering of SFF with age in this group of Australian women. The implications for the interpretation of cross-sectional data on the aging voice, the use of archival data in voice research, and the need for further research using archival data are discussed.
And here's their Figure 1, presenting the results:
The ages are comparable to the span in the Fisher data — in 1945 the women recorded were 18-25 years old, so in 1993 they would have been 66-73.
But the change that they report is much greater. The mean F0 for our 20-something women is 208.9 Hz, and for the 70-something women 190.1 Hz. Their "mean SFF" ("speaking fundamental frequency") was 229.0 Hz for the 1945 recordings, and 181.2 Hz for the 1993 recordings.
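To put the two sets of numbers on the same footing, here is a small sketch that expresses each difference as a percentage and in semitones. The helper name is hypothetical, and note that the Fisher comparison is cross-sectional (different women in each age group) while the Russell et al. comparison is longitudinal.

```python
import math


def change(f_start_hz, f_end_hz):
    """Return (percent change, change in semitones) from f_start_hz to f_end_hz."""
    ratio = f_end_hz / f_start_hz
    return 100.0 * (ratio - 1.0), 12.0 * math.log2(ratio)


fisher = change(208.9, 190.1)   # Fisher: 20-something vs. 70-something women
russell = change(229.0, 181.2)  # Russell et al.: 1945 vs. 1993 recordings

print("Fisher:  %+.1f%%, %+.2f semitones" % fisher)
print("Russell: %+.1f%%, %+.2f semitones" % russell)
```

By this reckoning the Fisher difference is a drop of about 9% (roughly 1.6 semitones), while the Australian women's drop is about 21% (roughly 4 semitones).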
What might explain these differences?
There are a few obvious candidates. The Australian women were reading sentences, not participating in conversations; and the 1945 recordings, dating from before the advent of tape recorders, were made on "acetate-coated steel discs". We're not told anything about the microphone placement and other recording configuration issues, but it's well known that physiological arousal, background noise level, and perceived interlocutor distance are associated with changes in vocal effort and thus fundamental frequency (see e.g. "Raising his voice", 10/8/2011; "Debate quantification: How MAD did he get?", 10/29/2016; "MLK day: Pitch range", 1/16/2017). In 1945, the experience of reading into a fancy microphone in a sound-treated booth connected to a high-tech steel disc recorder would have been novel (most likely the subjects had never been recorded before and had never even seen recording equipment), and the effect might well have been exciting enough to raise F0 by 10% or so.
There might also be relevant cultural differences — see e.g. "Nationality, gender, and pitch", 11/12/2007. And the very different methods of F0 estimation might also have some consequences.
But anyhow, the direction of the effect is the same. And there are some other longitudinal datasets (e.g. LDC2013S05) that would be worth consulting, some other morning…