6.5 Euclidean distances

6.5.1 Vowel-space expansion

A number of studies in the last fifty years have been concerned with phonetic vowel reduction, that is, with the changes in vowel quality brought about by segmental, prosodic, and situational contexts. In Lindblom’s (1990) hyper- and hypoarticulation theory, speech production varies along a continuum from clear to less clear speech. According to this theory, speakers make only as much effort to speak clearly as the listener requires for understanding what is being said. Thus, the first time that a person’s name is mentioned, the production is likely to be clear because this is largely unpredictable information for the listener; but subsequent productions of the same name in an ongoing dialogue are likely to be less clear, because the listener can more easily predict its occurrence from context (Fowler & Housum, 1987).

A vowel that is spoken less clearly tends to be reduced, which means that it deviates from its position in an acoustic space relative to a clear or citation-form production. The deviation is often manifested as centralisation, in which the vowel is produced nearer to the centre of the speaker’s vowel space than in clear speech. Equivalently, in clear speech there is an expansion of the vowel space. There is articulatory evidence for this type of vowel space expansion when vowels occur in prosodically accented words, often because these tend to be points of information focus, that is, points of the utterance that are especially important for understanding what is being said (de Jong, 1995; Harrington, Fletcher & Beckman, 2000).

One of the ways of quantifying vowel space expansion is to measure the Euclidean or straight-line distance between a vowel and the centre of the vowel space. Wright (2003) used just such a measure to compare so-called easy and hard words on their distances to the centre of the vowel space. Easy words were those that have high lexical frequency (i.e., occur often) and low neighborhood density (there are few words that are phonemically similar). Since such words tend to be easier for the listener to understand, Lindblom’s (1990) model predicts that their vowels should be more centralised than those of hard words, which are both infrequent and high in neighborhood density.

Fig. 6.13. A 3-4-5 triangle. The length of the solid line is the Euclidean distance between the points (0, 0) and (3, 4). The dotted lines show the horizontal and vertical distances that are used for the Euclidean distance calculation.

In a two-dimensional space, the Euclidean distance is calculated by summing the squares of the horizontal and vertical distances between the points and taking the square root of the sum. For example, the squared horizontal and vertical distances between the two points \((0, 0)\) and \((3, 4)\) in Fig. 6.13 are given in R by \((0 - 3)^2\) and \((0 - 4)^2\) respectively. Thus the Euclidean distance between them is:

sqrt( (0 - 3)^2 + (0 - 4)^2 )

## [1] 5

Because of the nice way that vectors work in R, the same result is given by:

a = c(0, 0)
b = c(3, 4)
sqrt( sum( (a - b)^2 ))

## [1] 5

So a function to calculate the Euclidean distance between any two points a and b is:

euclid <- function(a, b)
{
# Function to calculate Euclidean distance between a and b; 
# a and b are vectors of the same length
sqrt(sum((a - b)^2))
}

In fact, this function works not just in a two-dimensional space, but in an n-dimensional space. So if there are two vowels, a and b, in a three-dimensional F1, F2, F3 space with coordinates for vowel a F1 = 500 Hz, F2 = 1500 Hz, F3 = 2500 Hz and for vowel b F1 = 220 Hz, F2 = 2400 Hz, F3 = 3000 Hz, then the straight line, Euclidean distance between a and b is just over 1066 Hz as follows:

a = c(500, 1500, 2500)
b = c(220, 2400, 3000)
euclid(a, b)

## [1] 1066.958
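
As a cross-check, base R's dist() function gives the same Euclidean distance when the two points are stacked as the rows of a matrix. The following is only a sketch: it redefines euclid() so that it runs on its own, reuses the values of a and b from above, and introduces the name d.base purely for illustration.

```r
# Euclidean distance function as defined in the text
euclid <- function(a, b) sqrt(sum((a - b)^2))

a = c(500, 1500, 2500)
b = c(220, 2400, 3000)

# dist() expects one point per row; its default method is "euclidean"
d.base = as.numeric(dist(rbind(a, b)))

# Both approaches should agree
all.equal(euclid(a, b), d.base)
```

For many points at once, dist() returns all pairwise distances, whereas euclid() is more convenient when every point is compared with one fixed reference such as a centroid.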

Exactly the same principle, and hence the same function, works in 4, 5, … n-dimensional spaces, even though any space higher than three dimensions cannot be seen or drawn. The only requirement is that the two vectors should be of the same length. The function can be made to stop with an error message if the user tries to do otherwise:

euclid <- function(a, b)
{
# Function to calculate Euclidean distance between a and b; 
# a and b are vectors of the same length
if(length(a) != length(b))
stop("a and b must be of the same length")
sqrt(sum((a - b)^2))
}

a = c(3, 4)
b = c(10, 1, 2)
euclid(a, b)
## Error in euclid(a, b) : a and b must be of the same length

For the present task of assessing vowel space expansion, the distance of all the vowel tokens to the centre of the space will have to be measured. For illustrative purposes, a comparison will be made between the male and female speakers on the lax vowel data considered so far, although in practice, this technique is more likely to be used to compare vowels in easy and hard words or in accented and unaccented words as described earlier. The question we are asking is: is there any evidence that the lax vowels of the female speaker are more expanded, that is, more distant from the centre of the vowel space, than those of the male speaker in Figs. 6.9 and 6.10? A glance at Fig. 6.10 in particular must surely suggest that the answer to this question is ‘yes’ and indeed, the greater area of the polygon for the female speaker partly comes about because of the female’s higher F1 and F2 values.

In order to quantify these differences, a single point that is at the centre of the speaker’s vowel space, known as the centroid, has to be defined. This could be taken across a much larger sample of the speaker’s vowels than are available in these data sets: for the present, it will be taken to be the mean across all of the speaker’s lax vowels. For the male and female speaker these are:

## This first code is taken from 08_Analysis-of-Formant-data.Rmd
library(emuR)

#change this to your path
path2kielread = "/Users/reubold/kielread/kielread_emuDB/"
# load emuDB into current R session
kielread = load_emuDB(path2kielread, verbose = FALSE)
summary(kielread)

## Name:     kielread 
## UUID:     08b534b5-916a-4877-93aa-4f6b641995a1 
## Directory:    /Users/reubold/kielread/kielread_emuDB 
## Session count: 1 
## Bundle count: 200 
## Annotation item count:  12294 
## Label count:  32978 
## Link count:  11271 
## 
## Database configuration:
## 
## SSFF track definitions:
##       name columnName fileExtension
## 1 FORMANTS         fm           fms
## 
## Level definitions:
##       name    type nrOfAttrDefs                             attrDefNames
## 1     Word    ITEM            2                              Word; Func;
## 2 Syllable    ITEM            1                                Syllable;
## 3  Kanonic    ITEM            2                       Kanonic; SinfoKan;
## 4 Phonetic SEGMENT            4 Phonetic; Autoseg; SinfoPhon; LexAccent;
## 
## Link definitions:
##           type superlevelName sublevelName
## 1  ONE_TO_MANY           Word     Syllable
## 2  ONE_TO_MANY       Syllable      Kanonic
## 3 MANY_TO_MANY        Kanonic     Phonetic

vowlaxn = query(emuDBhandle = kielread,
                query = "Kanonic== a | E | I | O")
vowlaxn.fdat = get_trackdata(kielread,
                        seglist = vowlaxn,
                        ssffTrackName =  "FORMANTS",
                        resultType="emuRtrackdata",
                        verbose = FALSE)
vowlaxn.l = label(vowlaxn)
#bring all tracks to the same length (here: 21 samples, or any other odd number of samples)
vowlaxn.fdat_norm = normalize_length(vowlaxn.fdat, N = 21)
#extract the relative time point 0.5
vowlaxn.fdat_norm.5 = vowlaxn.fdat_norm[vowlaxn.fdat_norm$times_norm==0.5,]

#label of left context
vowlaxn.fdat_norm.5$leftlabels = label(requery_seq(kielread,vowlaxn,offset=-1,calcTimes = FALSE))
#label of right context
vowlaxn.fdat_norm.5$rightlabels = label(requery_seq(kielread,vowlaxn,offset=1,calcTimes = FALSE))
#extract speaker labels (=extract the second and third character of $bundle)
vowlaxn.fdat_norm.5$spkr = substr(vowlaxn$bundle,2,3)
# get labels at the level "Word"
vowlaxn.fdat_norm.5$word = label(requery_hier(kielread,seglist = vowlaxn,level = "Word",calcTimes = FALSE))

#add these columns also to vowlaxn.fdat:
vowlaxn.fdat$leftlabels = vowlaxn.fdat$rightlabels = vowlaxn.fdat$spkr = vowlaxn.fdat$word = "something"
for (i in vowlaxn.fdat_norm.5$sl_rowIdx){
  vowlaxn.fdat[vowlaxn.fdat$sl_rowIdx==i,]$leftlabels=vowlaxn.fdat_norm.5[vowlaxn.fdat_norm.5$sl_rowIdx==i,]$leftlabels
  vowlaxn.fdat[vowlaxn.fdat$sl_rowIdx==i,]$rightlabels=vowlaxn.fdat_norm.5[vowlaxn.fdat_norm.5$sl_rowIdx==i,]$rightlabels
  vowlaxn.fdat[vowlaxn.fdat$sl_rowIdx==i,]$spkr=vowlaxn.fdat_norm.5[vowlaxn.fdat_norm.5$sl_rowIdx==i,]$spkr
  vowlaxn.fdat[vowlaxn.fdat$sl_rowIdx==i,]$word=vowlaxn.fdat_norm.5[vowlaxn.fdat_norm.5$sl_rowIdx==i,]$word
}

temp = vowlaxn.fdat_norm.5$spkr=="67"
m.av = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 2, mean)
m.av

##        T1        T2 
##  495.8366 1567.4390

f.av = apply(vowlaxn.fdat_norm.5[!temp,c("T1","T2")], 2, mean)
f.av

##        T1        T2 
##  534.7512 1964.4585

But there are good grounds for objecting to these means: in particular, the distribution of vowel tokens across the categories is not equal, as the following shows for the male speaker (the distribution is the same for the female speaker):

table(vowlaxn.l[temp])

## 
##  a  E  I  O 
## 63 41 85 16

In view of the relatively few back vowels, the centroids are likely to be biased towards the front of the vowel space. As an alternative, the centroids could be defined as the mean of the vowel means, which is the point that is at the centre of the polygons in Fig. 6.10. Recall that for the female speaker the mean position of all the vowels was given by:

temp = vowlaxn.fdat_norm.5$spkr=="67"
f = aggregate(cbind(T1,T2)~spkr+labels,data=vowlaxn.fdat_norm.5[!temp,],FUN=mean)
f

##   spkr labels       T1       T2
## 1   68      a 785.0952 1540.024
## 2   68      E 518.3171 2198.122
## 3   68      I 359.9294 2317.724
## 4   68      O 519.8750 1160.188

So the mean of these means is:

f.av = aggregate(cbind(T1,T2)~0,data=f,FUN=mean)
f.av

##         T1       T2
## 1 545.8042 1804.014
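
The relationship between the two centroid definitions can be checked from numbers already shown above. Weighting the category means by their token counts should reproduce the mean over all tokens, whereas the unweighted average gives the mean of means. The sketch below assumes the female speaker's category means and the token counts exactly as printed earlier; the object names cat.means, n.tokens, weighted.centroid and unweighted.centroid are introduced here only for illustration.

```r
# Category means for the female speaker, copied from the output above
cat.means = data.frame(labels = c("a", "E", "I", "O"),
                       T1 = c(785.0952, 518.3171, 359.9294, 519.8750),
                       T2 = c(1540.024, 2198.122, 2317.724, 1160.188))

# Token counts per category (the same for both speakers)
n.tokens = c(a = 63, E = 41, I = 85, O = 16)

# Token-weighted centroid: equivalent to averaging over all tokens
weighted.centroid = c(T1 = weighted.mean(cat.means$T1, n.tokens),
                      T2 = weighted.mean(cat.means$T2, n.tokens))

# Unweighted centroid: every category counts equally (mean of means)
unweighted.centroid = colMeans(cat.means[, c("T1", "T2")])

round(rbind(weighted.centroid, unweighted.centroid), 1)
```

The weighted F2 value is about 160 Hz higher than the unweighted one, which is the frontward bias caused by the large number of [ɪ] tokens and the small number of [ɔ] tokens.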

The centroid is shown in Fig. 6.14 and was produced as follows:

temp = vowlaxn.fdat_norm.5$spkr=="68"
f = vowlaxn.fdat_norm.5[temp,]
library(ggplot2)
# Plot of vowel labels in the F2 x F1 plane (no ellipses are drawn here)
fpl <- ggplot(f) + 
            aes(y = T1, x  = T2, label=labels,color=labels) + 
            geom_text() + 
            scale_y_reverse() + scale_x_reverse() + 
            labs(x = "F2(Hz)", y = "F1(Hz)") +
            theme(legend.position="none")
fpl  + geom_text(data=cbind(f.av,labels="X"),size=20,color="black")

Fig. 6.14. Lax monophthongs in German for the female speaker in the F2 x F1 plane for data extracted at the vowels’ temporal midpoint. X is the centroid defined as the mean position of the same speaker’s mean across all tokens of the four vowel categories.

The Euclidean distances of each data point in Fig. 6.14 to X can be obtained by applying euclid() to the rows of the formant data using apply() with a second argument of 1 (meaning apply to rows):

temp = vowlaxn.fdat_norm.5$spkr=="68"
e.f = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, f.av)
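
The row-wise use of apply() can also be written in vectorised form. The following is a minimal sketch on invented F1 and F2 values (not the kielread measurements); the names tokens, centroid, d.apply and d.vec are hypothetical.

```r
# Euclidean distance function as defined in the text
euclid <- function(a, b) sqrt(sum((a - b)^2))

# Invented F1/F2 values (Hz) for four tokens and a centroid
tokens = matrix(c(780, 1550,
                  520, 2200,
                  360, 2300,
                  520, 1160),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("T1", "T2")))
centroid = c(T1 = 545, T2 = 1800)

# Row-wise application, as in the text
d.apply = apply(tokens, 1, euclid, centroid)

# Vectorised alternative: subtract the centroid from every row,
# square, sum across columns, and take the square root
d.vec = sqrt(rowSums(sweep(tokens, 2, centroid)^2))

all.equal(d.apply, d.vec)
```

Both forms should give one distance per token; the vectorised version avoids calling euclid() once per row, which may matter for large data sets.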

The same technique as in 6.4 could be used to keep all the various objects that have something to do with lax vowels parallel to each other, as follows:

# Vector of zeros to store the results
edistances = rep(0, nrow(vowlaxn.fdat_norm.5))
# Logical vector to identify speaker 67
temp = vowlaxn.fdat_norm.5$spkr=="67"

# The next two commands give the male speaker's centroid analogous to f.av
m = aggregate(cbind(T1,T2)~spkr+labels,data=vowlaxn.fdat_norm.5[temp,],FUN=mean)
m.av = aggregate(cbind(T1,T2)~0,data=m,FUN=mean)

# Distances to the centroid for the male speaker
edistances[temp] = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, m.av)
# Distances to the centroid for the female speaker
edistances[!temp] = apply(vowlaxn.fdat_norm.5[!temp,c("T1","T2")], 1, euclid, f.av)

Since all the objects are parallel to each other, it takes only a few lines to produce a boxplot of the results comparing the Euclidean distances for the male and female speakers separately by vowel category (Fig. 6.15):

edist.df = data.frame(edistances,spkr=vowlaxn.fdat_norm.5$spkr,labels=vowlaxn.fdat_norm.5$labels)
ggplot(edist.df) + 
  aes(y = edistances, x = spkr) + 
  geom_boxplot() +
  facet_grid(~labels)

Fig. 6.15: Boxplots of Euclidean distances to the centroid (Hz) for speaker 67 (male) and speaker 68 (female) for four lax vowel categories in German.

Figure 6.15 confirms what was suspected: the Euclidean distances are greater on every vowel category for the female speaker.

6.5.2 Relative distance between vowel categories

In dialectology and studies of sound change, there is often a need to compare the relative position of two vowel categories in a formant space. The sound change can sometimes be linked to age and social class, as the various pioneering studies by Labov (1994, 2001) have shown. It might be hypothesised that a vowel is in the process of fronting or raising: for example, the vowel in who’d in the standard accent of English has fronted in the last fifty years (Harrington et al., 2008; Hawkins & Midgley, 2005), there has been a substantial rearrangement of the front lax vowels in New Zealand English (Maclagan & Hay, 2007), and there is extensive evidence in Labov (1994, 2001) of numerous diachronic changes to North American vowels.

Vowels are often compared across two different age groups so that if there is a vowel change in progress, the position of the vowel in the older and younger groups might be different (this type of study is known as an apparent time study: see e.g., Bailey et al., 1991). Of course, independently of sound change, studies comparing different dialects might seek to provide quantitative evidence for the relative differences in vowel positions: whether, for example, the vowel in Australian English head is higher and/or fronter than that of Standard Southern British English.

There are a number of ways of providing quantitative data of this kind. The one to be illustrated here is concerned with determining whether the position of a vowel in relation to other vowels is different in one set of data compared with another. I used just this technique (Harrington, 2006) to assess whether the long, final lax vowel in words like city, plenty, ready was relatively closer to the tense vowel [i] (heed) than to the lax vowel [ɪ] (hid) in the more recent Christmas messages broadcast by Queen Elizabeth II over a fifty-year period.

For illustrative purposes, the analysis will again make use of the lax vowel data. Fig. 6.10 suggests that [ɛ] is relatively closer to [ɪ] than to [a] for the female speaker compared with the male speaker. Perhaps this is a sound change in progress; perhaps the female subject does not speak exactly the same variety as the male speaker; perhaps it has something to do with differences between the speakers along the hyper- and hypoarticulation continuum; or perhaps it is an artefact of anatomical differences in the vocal tract between the male and female speaker. Whatever the reasons, it is just this sort of problem that can arise in sociophonetics in dealing with gradual and incremental sound change.

The way of addressing this issue, based on Harrington (2006), is to work out two Euclidean distances: \(d_1\), the distance of all of the [ɛ] tokens to the centroid of [ɪ]; and \(d_2\), the distance of all of the same [ɛ] tokens to the centroid of [a]. The ratio of these two distances, \(d_1/d_2\), is indicative of how close (in terms of Euclidean distances) the [ɛ] tokens are to [ɪ] in relation to [a]. The logarithm of this ratio, which will be termed \(E_{RATIO}\), gives the same information but in a more convenient form. More specifically, since

\[E_{RATIO} = log\left(\frac{d_1}{d_2}\right) = log(d_1) - log(d_2)\]

the following three relationships hold for any single token of [ɛ]:

1. if an [ɛ] token is exactly equidistant between the [ɪ] and [a] centroids, then \(log(d_1) = log(d_2)\), and so \(E_{RATIO}\) is zero.
1. if an [ɛ] token is closer to the centroid of [ɪ], then \(log(d_1) < log(d_2)\) and so \(E_{RATIO}\) is negative.
1. if an [ɛ] token is closer to [a] than to [ɪ], \(log(d_1) > log(d_2)\) and so \(E_{RATIO}\) is positive.
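
These three cases can be illustrated numerically. The sketch below takes the [ɪ] and [a] centroids from the female speaker's category means shown in 6.5.1 and evaluates three invented [ɛ] tokens; the token values and the helper name eratio() are assumptions made only for this illustration.

```r
# Euclidean distance function as defined in 6.5.1
euclid <- function(a, b) sqrt(sum((a - b)^2))

# Centroids (F1, F2 in Hz) copied from the female category means above
cen.I = c(359.9294, 2317.724)
cen.a = c(785.0952, 1540.024)

# Three invented tokens
tok.mid    = (cen.I + cen.a) / 2   # halfway between the two centroids
tok.near.I = c(400, 2250)          # close to the [I] centroid
tok.near.a = c(750, 1600)          # close to the [a] centroid

# Hypothetical helper: log of d1/d2 for a single token
eratio <- function(tok) log(euclid(tok, cen.I) / euclid(tok, cen.a))

eratios = sapply(list(mid = tok.mid, near.I = tok.near.I, near.a = tok.near.a),
                 eratio)
sign(round(eratios, 10))
```

The equidistant token should give a value of zero (up to rounding error), the token near [ɪ] a negative value, and the token near [a] a positive value.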

The hypothesis to be tested is that the female speaker’s [ɛ] vowels are closer to her [ɪ] than to her [a] vowels compared with those for the male speaker. If so, then the female speaker’s \(E_{RATIO}\) should be smaller than that for the male speaker. The Euclidean distance calculations will be carried out as before in the F2 x F1 vowel space using the euclid() function written in 6.5.1. Here are the commands for the female speaker:

# Next two lines calculate  the centroid of female [ɪ]
temp = vowlaxn.fdat_norm.5$spkr == "68" & vowlaxn.fdat_norm.5$labels=="I"
mean.I.f = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 2, mean)

# Next two lines calculate  the centroid of female [a]
temp = vowlaxn.fdat_norm.5$spkr == "68" & vowlaxn.fdat_norm.5$labels=="a"
mean.a.f = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 2, mean)

# Logical vector to identify all the female speaker's [ɛ] vowels
temp = vowlaxn.fdat_norm.5$spkr == "68" & vowlaxn.fdat_norm.5$labels=="E"

# This is d1 above i.e., the distance of [ɛ] tokens to [ɪ] centroid
etoI.f = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, mean.I.f)

# This is d2 above i.e., the distance of [ɛ] tokens to [a] centroid
etoa.f = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, mean.a.f)

# ERATIO for the female speaker
ratio.log.f = log(etoI.f/etoa.f)

Exactly the same instructions can be carried out for the male speaker except that 68 should be replaced with 67 throughout in the above instructions. For the final line for the male speaker, ratio.log.m is used to store the male speaker’s \(E_{RATIO}\) values.

# Next two lines calculate  the centroid of male [ɪ]
temp = vowlaxn.fdat_norm.5$spkr == "67" & vowlaxn.fdat_norm.5$labels=="I"
mean.I.m = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 2, mean)

# Next two lines calculate  the centroid of male [a]
temp = vowlaxn.fdat_norm.5$spkr == "67" & vowlaxn.fdat_norm.5$labels=="a"
mean.a.m = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 2, mean)

# Logical vector to identify all the male speaker's [ɛ] vowels
temp = vowlaxn.fdat_norm.5$spkr == "67" & vowlaxn.fdat_norm.5$labels=="E"

# This is d1 above i.e., the distance of [ɛ] tokens to [ɪ] centroid
etoI.m = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, mean.I.m)

# This is d2 above i.e., the distance of [ɛ] tokens to [a] centroid
etoa.m = apply(vowlaxn.fdat_norm.5[temp,c("T1","T2")], 1, euclid, mean.a.m)

# ERATIO for the male speaker
ratio.log.m = log(etoI.m/etoa.m)
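
Since the commands for the two speakers differ only in the speaker label, they could be wrapped in a function. The following is a sketch under the assumption that the data frame has the columns T1, T2 and labels; the function name eratio.spkr() and its arguments are hypothetical, and the toy data are invented rather than taken from the database.

```r
# Euclidean distance function as defined in 6.5.1
euclid <- function(a, b) sqrt(sum((a - b)^2))

# Hypothetical helper: log distance ratio of 'target' tokens to the
# centroids of the categories 'near' and 'far' in one speaker's data.
# 'dat' must contain the columns T1, T2 and labels.
eratio.spkr <- function(dat, target = "E", near = "I", far = "a")
{
  cen.near = colMeans(dat[dat$labels == near, c("T1", "T2")])
  cen.far  = colMeans(dat[dat$labels == far,  c("T1", "T2")])
  tok = as.matrix(dat[dat$labels == target, c("T1", "T2")])
  d1 = apply(tok, 1, euclid, cen.near)   # distances to the 'near' centroid
  d2 = apply(tok, 1, euclid, cen.far)    # distances to the 'far' centroid
  log(d1 / d2)
}

# Toy data (invented values, not the kielread measurements)
toy = data.frame(labels = c("I", "I", "a", "a", "E", "E"),
                 T1 = c(350, 370, 780, 790, 500, 650),
                 T2 = c(2300, 2340, 1530, 1550, 2200, 1700))
toy.ratios = eratio.spkr(toy)
toy.ratios
```

Applied to the rows of vowlaxn.fdat_norm.5 for one speaker, such a function would be expected to give the same values as the step-by-step commands, although that has not been verified here.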

A histogram of the \(E_{RATIO}\) distributions for these two speakers can then be created as follows (Fig. 6.16):

ratio.log.df=data.frame(e_ratio=c(ratio.log.m,ratio.log.f),spkr=c(rep("67(m)",length(ratio.log.m)),rep("68(f)",length(ratio.log.f))))
ggplot(ratio.log.df) + 
  aes(e_ratio) + 
  geom_histogram(binwidth = 0.5) + 
  facet_wrap(~spkr)

Fig. 6.16: Histograms of the log Euclidean distance ratios obtained from measuring the relative distance of [ɛ] tokens to the centroids of [ɪ] and [a] in the F2 x F1 space separately for the male (left) and the female (right) speaker.

It is clear enough that the \(E_{RATIO}\) values for the female speaker are smaller than those for the male speaker, as a statistical test would confirm (assuming the data are normally distributed):

t.test(ratio.log.f, ratio.log.m)

## 
##  Welch Two Sample t-test
## 
## data:  ratio.log.f and ratio.log.m
## t = -6.0166, df = 78.689, p-value = 5.317e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.2234983 -0.6151758
## sample estimates:
##  mean of x  mean of y 
## -1.2318000 -0.3124629

So compared with the male speaker, the female speaker’s [ɛ] is relatively closer to [ɪ] in a formant space than it is to [a].
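
Because the t-test rests on the normality assumption, it can be worth checking that assumption and comparing the result with a rank-based test. The sketch below uses simulated values as stand-ins for ratio.log.f and ratio.log.m (the means roughly follow the sample estimates above, but the standard deviation of 0.7 is invented), so its output says nothing about the real data.

```r
# Simulated stand-ins for the two speakers' log ratios (not the real data)
set.seed(1)
sim.f = rnorm(41, mean = -1.2, sd = 0.7)
sim.m = rnorm(41, mean = -0.3, sd = 0.7)

# Shapiro-Wilk test of normality for each sample;
# a small p-value would cast doubt on the normality assumption
shapiro.test(sim.f)$p.value
shapiro.test(sim.m)$p.value

# Wilcoxon rank-sum test: compares the two samples without assuming normality
w = wilcox.test(sim.f, sim.m)
w$p.value
```

With the real data, the same calls would be made on ratio.log.f and ratio.log.m in place of the simulated vectors.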

Mahalanobis distance

In 6.5.2, we considered the Euclidean distance between any token of one category (e.g. /ɛ/) and the centroid (i.e. mean) of another category (e.g. /ɪ/). However, this does not take into account that the distribution of the /ɪ/ vowels is unlikely to form a perfect circle. Indeed, the /ɪ/ tokens are distributed in such a way that they form a rather ‘flat’ ellipse:

fpl + stat_ellipse()

Why is this a problem? Suppose there were two vowel tokens, \(a\) and \(b\), for which a distance to /O/ should be calculated.

temp = vowlaxn.fdat_norm.5$spkr=="68" & vowlaxn.fdat_norm.5$labels=="O"
o = vowlaxn.fdat_norm.5[temp,]
omean = aggregate(cbind(T1,T2)~0,data=vowlaxn.fdat_norm.5[temp,],FUN=mean)
library(ggplot2)
# Ellipse plot with outliers
opl <- ggplot(o) + 
            aes(y = T1, x  = T2, label=labels,color=labels) + 
            geom_text() + 
            stat_ellipse() +
            scale_y_reverse() + scale_x_reverse() + 
            labs(x = "F2(Hz)", y = "F1(Hz)") +
            theme(legend.position="none")
opl  + 
  geom_text(data=data.frame(omean,labels="O"),size=10,color="black") +
  geom_text(data=data.frame(T1=400,T2=1200,labels="a"),size=10,color="black") +
  geom_text(data=data.frame(T1=600,T2=1500,labels="b"),size=10,color="black")

Obviously, \(a\) is closer to the centroid of /O/, and \(b\) is further away. But does that mean that \(a\) is more likely than \(b\) to be like /O/, i.e. that it belongs to the /O/ category? This is doubtful, given that \(a\) lies outside the 95% confidence ellipse of the /O/ distribution, whereas \(b\) lies inside it.

Problems like these can be solved by the Mahalanobis distance, which is “a measure of the distance between a point P and a distribution D, introduced by P. C. Mahalanobis in 1936. It is a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D. This distance is zero if P is at the mean of D, and grows as P moves away from the mean: along each principal component axis, it measures the number of standard deviations from P to the mean of D. If each of these axes is rescaled to have unit variance, then Mahalanobis distance corresponds to standard Euclidean distance in the transformed space. Mahalanobis distance is thus unitless and scale-invariant, and takes into account the correlations of the data set.” (see https://en.wikipedia.org/wiki/Mahalanobis_distance).

See also http://www.statistics4u.info/fundstat_germ/ee_mahalanobis_distance.html for a very intuitively accessible explanation.

In order to normalize for the effects of the distribution, the Mahalanobis distance uses the covariance matrix, which is given in R by the function cov(). Instead of our aforementioned function euclid(), we can use the function mahalanobis(); note that this function returns the squared Mahalanobis distance, which does not affect which of two points is the closer one. As mentioned earlier, the Euclidean distance of \(a\) to the centroid of the distribution of /O/ is smaller than the Euclidean distance of \(b\) to the /O/ centroid:

a=c(400,1200)
b=c(600,1500)
#center of /O/ and the covariance matrix of the /O/ category
o_center=as.vector(apply(o[,c("T1","T2")],2,mean))
o_cov=cov(o[,c("T1","T2")])

euclid(o_center,a)

## [1] 126.3133

euclid(o_center,b)

## [1] 349.1311

In the case of the Mahalanobis distance, which takes the distribution of /O/ into account, \(b\) is closer to /O/ than \(a\):

mahalanobis(o_center,a,o_cov)

## [1] 6.814902

mahalanobis(o_center,b,o_cov)

## [1] 3.311878
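
To make the role of the covariance matrix explicit, the squared Mahalanobis distance can be written out as \((x - \mu)^T S^{-1} (x - \mu)\) and compared with the built-in function. The sketch below uses an invented centre mu and covariance matrix S rather than the /O/ data, and the helper name maha2() is hypothetical.

```r
# Invented centre and covariance matrix for a vowel category (F1, F2)
mu = c(520, 1160)
S  = matrix(c(2500,  3000,
              3000, 40000), nrow = 2, byrow = TRUE)
a = c(400, 1200)
b = c(600, 1500)

# Squared Mahalanobis distance written out: (x - mu)' S^-1 (x - mu)
maha2 <- function(x, mu, S) as.numeric(t(x - mu) %*% solve(S) %*% (x - mu))

# Should agree with the built-in function, which also returns the squared distance
all.equal(maha2(a, mu, S), mahalanobis(a, mu, S))

# Square roots express the distances in standard-deviation-like units
sqrt(c(a = maha2(a, mu, S), b = maha2(b, mu, S)))
```

In this toy case the variance is much larger along F2 than along F1, so \(b\), although further from the centre in Hz, comes out as closer than \(a\) once the spread of the distribution is taken into account.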
