Once again, let us first create a temporary EMU-SDMS database:

# load package
library(emuR)
# create demo data in directory
# provided by tempdir()
create_emuRdemoData(dir = tempdir())
# create path to demo database
path2ae = file.path(tempdir(), "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)

We can use the summary() function to learn more about the given database.

summary(ae)
## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /private/var/folders/vh/j2k1_0395x5_sgzpbl4bzzl00000gn/T/RtmpCDXHCZ/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785 
## 
## Database configuration:
## 
## SSFF track definitions:
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms
## 
## Level definitions:
##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;
## 
## Link definitions:
##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

In the last chapter, we were interested in querying the rather complex hierarchy that can be seen under “Link definitions”. Today, we are interested in derived signals, i.e. signals derived from the primary signal with which every utterance is associated (usually an audio recording, but this could also be EMA data, electropalatographic data, and so on, possibly without any audio data at all). In an EMU-SDMS database, derived data are stored in the so-called SSFF file format, where SSFF stands for Simple Signal File Format. Under “SSFF track definitions”, we can see two definitions, one for so-called dft data and one for fm data. To see these definitions, we could also have typed:

list_ssffTrackDefinitions(ae)
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms

Note that these data are signals derived from the audio data: the dft tracks contain spectral analyses, and the “fm” tracks contain calculated formants (and their bandwidths). Keep in mind that every signal except the audio data is stored in this format; e.g. the EMA tracks in the demo database ema and the electropalatographic data in the demo database epgdorsal are stored in the SSFF format. See e.g. these demo databases at http://ips-lmu.github.io/EMU-webApp/ (open demo – ema/epgdorsal).

As the SSFF track definitions show, these tracks are saved in separate files, identified by their fileExtension, e.g. ‘dft’ or ‘fms’. If we open one of these files, we will notice that we cannot see the data directly (because they are binary), but we can read the plain-text header, e.g. in the case of the fms files:

## SSFF -- (c) SHLRC
## Machine IBM-PC
## Record_Freq 200.0
## Start_Time 0.0025
## Column fm SHORT 4
## Column bw SHORT 4
## Original_Freq DOUBLE 20000.0
## -----------------

This means that every fms file contains a track fm with four columns of formant values (one frame every 5 ms, as follows from Record_Freq 200.0, i.e. 200 frames per second) and a track bw with four columns containing the bandwidths of the first four formants. We can read these data into R as follows: first, we query e.g. the vowel /i:/, and then we use the command get_trackdata():

# query loaded "ae" emuDB for all "i:" segments of the "Phonetic" level
sl = query(emuDBhandle = ae, 
           query = "Phonetic == i:")

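# get the corresponding formant trackdata
ae_i_fm = get_trackdata(emuDBhandle = ae,
                        seglist = sl,
                        ssffTrackName = "fm")
# "bw" is only a column inside the .fms files, not a defined SSFF track
# in this database, so the following call fails; try() reports the error
# without stopping the script (a track definition for the "bw" column,
# added with add_ssffTrackDefinition(), should make it work)
ae_i_bw = try(get_trackdata(emuDBhandle = ae,
                            seglist = sl,
                            ssffTrackName = "bw"))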
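The Record_Freq and Start_Time fields of the header determine when each frame was measured. The following base-R sketch computes frame times from these two header values; the helper frame_times() is hypothetical (not part of emuR or wrassp) and assumes that frame k lies at Start_Time + k / Record_Freq, the same formula the plotting code later in this chapter uses with rate.AsspDataObj() and the startTime attribute.

```r
# Values copied from the .fms header shown above
record_freq <- 200     # Record_Freq: frames per second
start_time  <- 0.0025  # Start_Time: time (s) of the first frame

# Frame shift in milliseconds: 1000 / 200 = 5 ms
frame_shift_ms <- 1000 / record_freq

# Hypothetical helper: time (s) of frames 0, 1, ..., n_frames - 1
frame_times <- function(n_frames, rate, start) {
  start + (seq_len(n_frames) - 1) / rate
}

# The first four frame times, converted to ms
frame_times(4, record_freq, start_time) * 1000
```

The frame times in the trackdata printed later in this chapter (e.g. 187.5 ms) appear to lie on this 2.5 + k × 5 ms grid.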

It will, however, not always be the case that we are served with pre-calculated derived data. Oftentimes we start with audio data (and accompanying segmentations) only. Recall e.g. the example from chapter 02:

demoDataDir = file.path(tempdir(), "emuR_demoData")
# create path to TextGrid collection
tgColDir = file.path(demoDataDir, "TextGrid_collection")
# convert TextGrid collection to the emuDB format
convert_TextGridCollection(dir = tgColDir,
                           dbName = "myFirst",
                           targetDir = tempdir(),
                           tierNames = c("Text", "Syllable","Phoneme", "Phonetic"))
# get path to emuDB called "myFirst"
# that was created by convert_TextGridCollection()
path2directory = file.path(tempdir(), "myFirst_emuDB")

# load emuDB into current R session
dbHandle = load_emuDB(path2directory, verbose = FALSE)
list_ssffTrackDefinitions(dbHandle)
## NULL

Here, we cannot read in pre-calculated derived signals, because there are none. We therefore have to calculate them, either on-the-fly or by pre-calculating and storing them. In both cases, we will use the package wrassp. After calculating data with wrassp, we are able to read them into R with get_trackdata(), a command we will come back to later in this document.

The R package wrassp¹

This passage gives an overview of and introduction to the wrassp package. The wrassp package is an R wrapper around Michel Scheffers’ libassp (Advanced Speech Signal Processor). The libassp library, and therefore the wrassp package, provides functionality for handling speech signals in most common audio formats and for performing signal analyses common in the phonetic and speech sciences. As such, wrassp fills a gap in the R package landscape as, to our knowledge, no previous packages provided this specialized functionality. The signal processing functions currently provided by wrassp are:

| Command | Meaning |
|---|---|
| acfana() | Analysis of short-term autocorrelation function |
| afdiff() | Computes the first difference of the signal |
| affilter() | Filters the audio signal (e.g., low-pass and high-pass) |
| cepstrum() | Short-term cepstral analysis |
| cssSpectrum() | Cepstral smoothed version of dftSpectrum() |
| dftSpectrum() | Short-term DFT spectral analysis |
| forest() | Formant estimation |
| ksvF0() | F0 analysis of the signal |
| lpsSpectrum() | Linear predictive smoothed version of dftSpectrum() |
| mhsF0() | Pitch analysis of the speech signal using Michel Scheffers’ Modified Harmonic Sieve algorithm |
| rfcana() | Linear prediction analysis |
| rmsana() | Analysis of short-term Root Mean Square amplitude |
| zcrana() | Analysis of the averages of the short-term positive and negative zero-crossing rates |

The available file handling functions are:

| Command | Meaning |
|---|---|
| read.AsspDataObj() | Read an SSFF or audio file into an AsspDataObj, which is the in-memory equivalent of the SSFF or audio file. |
| write.AsspDataObj() | Write an AsspDataObj to file (usually SSFF or audio file formats). |

Use R’s help() function to get a comprehensive list of every function and object provided by the wrassp package:

help(package="wrassp")
# create path to bundle in database
path2bndl = file.path(path2ae, "0000_ses", "msajc003_bndl")
# list files in bundle directory
list.files(path2bndl)
## [1] "msajc003_annot.json" "msajc003.dft"        "msajc003.fms"       
## [4] "msajc003.wav"

One of the aims of wrassp is to provide mechanisms for handling speech-related files such as audio files and derived and complementary signal files. To have an in-memory object that can hold these file types in a uniform way, the wrassp package provides the AsspDataObj data type:

# load the wrassp package
library(wrassp)
# create path to wav file
path2wav <- file.path(path2bndl, "msajc003.wav")
# read audio file
au <- read.AsspDataObj(path2wav)
# show class
class(au)
## [1] "AsspDataObj"
# show print() output of object
print(au)
## Assp Data Object of file /var/folders/vh/j2k1_0395x5_sgzpbl4bzzl00000gn/T//RtmpCDXHCZ/emuR_demoData/ae_emuDB/0000_ses/msajc003_bndl/msajc003.wav.
## Format: WAVE (binary)
## 58089 records at 20000 Hz
## Duration: 2.904450 s
## Number of tracks: 1 
##   audio (1 fields)

The resulting au object is of the class AsspDataObj. The output of print provides additional information about the object, such as its sampling rate, duration, data type and data structure information. Since the file we loaded is audio only, the object contains exactly one track. Further, since it is a mono file, this track only has a single field.
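The reported duration follows directly from the number of records and the sampling rate. A minimal base-R check, using the two numbers printed above:

```r
n_records   <- 58089   # "58089 records"
sample_rate <- 20000   # "at 20000 Hz"

# Duration in seconds = number of samples / sampling rate
duration_s <- n_records / sample_rate
duration_s

# Time (s) of the last sample, counting the first sample as t = 0
last_sample_s <- (n_records - 1) / sample_rate
```

With the object itself, the same two numbers are available as numRecs.AsspDataObj(au) and rate.AsspDataObj(au), the functions used in the plotting code.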

We could plot the audio signal, using only every 10th sample to keep the plot light:

plot(seq(0, numRecs.AsspDataObj(au) - 1, 10) / rate.AsspDataObj(au),
     au$audio[c(TRUE, rep(FALSE, 9))],
     type = "l",
     xlab = "time (s)",
     ylab = "Audio samples (INT16)")
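The index c(TRUE, rep(FALSE, 9)) relies on R recycling a logical index vector: the 10-element pattern is repeated along au$audio, so every 10th sample (samples 1, 11, 21, …) is kept, which matches the time axis seq(0, numRecs - 1, 10). A small base-R check of this equivalence, with a plain integer vector standing in for the audio samples:

```r
n <- 58089                  # number of samples in msajc003.wav
samples <- seq_len(n)       # stand-in for au$audio
keep <- c(TRUE, rep(FALSE, 9))

# Logical indexing recycles 'keep' along the whole vector
decimated <- samples[keep]

# Equivalent explicit indices: 1, 11, 21, ...
idx <- seq(1, n, by = 10)

identical(decimated, samples[idx])
# x and y values of the plot have the same length
length(decimated) == length(seq(0, n - 1, 10))
```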

The export counterpart to the read.AsspDataObj() function is write.AsspDataObj(). It is used to store in-memory AsspDataObj objects on disk and is particularly useful for converting other formats to, or storing data in, the SSFF file format. To show how this function can be used to write a slightly altered version of the au object to a file, the following example first multiplies all the sample values of au$audio by a factor of 0.5. The resulting AsspDataObj is then written to an audio file in a temporary directory provided by R’s tempdir() function.

# manipulate the audio samples
au$audio = au$audio * 0.5
# write to file in directory
# provided by tempdir()
write.AsspDataObj(au, file.path(tempdir(), 'newau.wav'))
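Multiplying every sample by 0.5 halves the amplitude, which corresponds to a level change of 20 · log10(0.5) ≈ -6.02 dB. The following base-R sketch verifies this on a synthetic tone standing in for au$audio; the helper rms_db() is hypothetical.

```r
# Hypothetical helper: overall RMS level of a signal in dB
rms_db <- function(x) 20 * log10(sqrt(mean(x^2)))

gain <- 0.5
gain_db <- 20 * log10(gain)  # level change implied by the gain

# Synthetic 100 Hz tone at 20000 Hz (stand-in for au$audio)
x <- 10000 * sin(2 * pi * 100 * (0:1999) / 20000)

# For any non-silent signal, the level difference equals gain_db
rms_db(x * gain) - rms_db(x)
```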

Deriving signals by signal processing

This section demonstrates three of wrassp’s signal processing functions, which calculate formant values and their corresponding bandwidths, the fundamental frequency contour, and the RMS energy contour. Use wrasspOutputInfos in the following way to get more information about the output of a wrassp function (file extension, track names and output type), e.g.:

# show output info of forest function
wrasspOutputInfos$forest
## $ext
## [1] "fms"
## 
## $tracks
## [1] "fm" "bw"
## 
## $outputType
## [1] "SSFF"
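Since wrassp’s signal processing functions write their output next to the audio file they process (see the RMS example), the $ext entry is enough to predict the name of a generated file. A hypothetical base-R helper, assuming the default output location and no explicitExt:

```r
# Hypothetical helper: path of the file a wrassp function writes for wav_path,
# assuming default settings (output next to the input, default extension)
derived_path <- function(wav_path, ext) {
  paste0(tools::file_path_sans_ext(wav_path), ".", ext)
}

derived_path("/some/db/0000_ses/msajc003_bndl/msajc003.wav", "fms")
```

With wrassp loaded, derived_path(path2wav, wrasspOutputInfos$forest$ext) would give the .fms path next to msajc003.wav; the F0 example builds the same kind of path with paste0().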

Formants and their bandwidths calculated by forest()

forest() is wrassp’s formant estimation function. By default, this formant tracker calculates the first four formants and their bandwidths. With toFile = FALSE, the result is returned as an AsspDataObj instead of being written to an .fms file:

fmBwVals = forest(path2wav, toFile = FALSE)
# show class vector
class(fmBwVals)
## [1] "AsspDataObj"
# show track names
tracks.AsspDataObj(fmBwVals)
## [1] "fm" "bw"
# check dimensions of tracks are the same
all(dim(fmBwVals$fm) == dim(fmBwVals$bw))
## [1] TRUE
# plot the formant values
matplot(seq(0, numRecs.AsspDataObj(fmBwVals) - 1) / rate.AsspDataObj(fmBwVals) +
          attr(fmBwVals, "startTime"),
        fmBwVals$fm,
        type = "l",
        xlab = "time (s)",
        ylab = "Formant frequency (Hz)")
# add legend
legend("topright",
       legend = c("F1", "F2", "F3", "F4"),
       col = 1:4,
       lty = 1:4,
       bg = "white")

Fundamental frequency contours

# calculate the fundamental frequency contour
# (writes an .f0 file next to the .wav file and
# returns the number of processed files)
ksvF0(path2wav)
## [1] 1
# create path to newly generated file
path2f0file = file.path(path2bndl,
                        paste0("msajc003.", wrasspOutputInfos$ksvF0$ext))
# read file from disk
f0vals = read.AsspDataObj(path2f0file)
# plot the fundamental frequency contour
plot(seq(0, numRecs.AsspDataObj(f0vals) - 1) / rate.AsspDataObj(f0vals) +
       attr(f0vals, "startTime"),
     f0vals$F0,
     type = "l",
     xlab = "time (s)",
     ylab = "F0 frequency (Hz)")
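In ksvF0 output, frames without an F0 estimate (e.g. unvoiced stretches) are, as far as we can tell, stored as 0 Hz, which drags means down and draws vertical drops in the plot. A base-R sketch with made-up values shows how replacing zeros with NA avoids both; with the real data, the same would be applied to f0vals$F0.

```r
# Made-up F0 values (Hz); 0 stands for frames without an F0 estimate
f0 <- c(0, 0, 118, 121, 125, 0, 110, 108, 0)

# Treat 0 Hz as missing
f0_clean <- f0
f0_clean[f0_clean == 0] <- NA

mean(f0)                      # pulled down by the zeros
mean(f0_clean, na.rm = TRUE)  # mean over frames with an F0 estimate

# plot(..., type = "l") leaves gaps at NA values instead of dropping to 0
```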

RMS energy contour

The wrassp function for calculating the short-term Root Mean Square (RMS) amplitude of the signal is called rmsana(). As its usage is analogous to the above examples, here we will focus on using it to calculate the RMS values for all the audio files of the ae emuDB. The following example first uses the list.files() function to acquire the file paths of every .wav file in the ae emuDB. As every signal processing function accepts one or multiple file paths, these file paths can simply be passed in as the main argument to the rmsana() function. As all of wrassp’s signal processing functions place their generated files in the same directory as the audio file they process, the rmsana() function automatically places every .rms file into the correct bundle directory.

# list all .wav files in the ae emuDB
# (pattern is a regular expression: "\\.wav$" matches names ending in ".wav")
paths2wavFiles = list.files(path2ae, pattern = "\\.wav$",
                            recursive = TRUE, full.names = TRUE)
# calculate the RMS energy values for all .wav files
rmsana(paths2wavFiles, verbose = FALSE)
## [1] 7
# list new .rms files using
# wrasspOutputInfos->rmsana->ext
rmsFPs = list.files(path2ae,
                    pattern = paste0("\\.",
                                     wrasspOutputInfos$rmsana$ext, "$"),
                    recursive = TRUE,
                    full.names = TRUE)
# read first RMS file
rmsvals = read.AsspDataObj(rmsFPs[1])
# plot the RMS energy contour
plot(seq(0, numRecs.AsspDataObj(rmsvals) - 1) / rate.AsspDataObj(rmsvals) +
       attr(rmsvals, "startTime"),
     rmsvals$rms,
     type = "l",
     xlab = "time (s)",
     ylab = "RMS energy (dB)")
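What rmsana() computes can be illustrated with a simplified base-R sketch: the signal is cut into overlapping frames, and the root mean square of each frame is converted to dB. The window length (20 ms), frame shift (5 ms) and rectangular window are assumptions for illustration; rmsana() has its own parameters and windowing, so its values will differ in detail.

```r
# Simplified short-term RMS in dB (rectangular window; not rmsana()'s exact algorithm)
short_term_rms_db <- function(x, fs, win_ms = 20, shift_ms = 5) {
  win   <- round(win_ms / 1000 * fs)    # window length in samples
  shift <- round(shift_ms / 1000 * fs)  # frame shift in samples
  starts <- seq(1, length(x) - win + 1, by = shift)
  vapply(starts, function(s) {
    frame <- x[s:(s + win - 1)]
    20 * log10(sqrt(mean(frame^2)))     # silent frames give -Inf
  }, numeric(1))
}

fs <- 20000
x <- rep(1000, fs / 10)  # 100 ms of constant amplitude 1000
db <- short_term_rms_db(x, fs)
length(db)  # number of frames
range(db)   # 20 * log10(1000) = 60 dB in every frame
```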

So, as we can see, pre-calculating derived signals with the package wrassp alone is rather laborious. However, emuR offers much shorter ways, which are introduced in the following sections. The long path shown here is still needed, however, if one wishes to use speaker-group-specific parameter values (e.g. different settings depending on gender and the like).
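One way to take this longer path with speaker-group-specific settings is to split the file list by group and call the wrassp function once per group with that group’s parameters. The helper run_by_group() below is hypothetical (not part of emuR or wrassp); it only assumes that the function takes its files as listOfFiles, as ksvF0() does (see its formals() output later in this chapter). The stand-in fake_f0() replaces the real wrassp function so that the sketch runs without a database.

```r
# Hypothetical helper: run 'fun' once per speaker group with group-specific parameters
run_by_group <- function(files, groups, params, fun) {
  stopifnot(length(files) == length(groups))
  files_by_group <- split(files, groups)
  # Map() keeps the group names on the returned list
  Map(function(group_files, g) {
    do.call(fun, c(list(listOfFiles = group_files), params[[g]]))
  }, files_by_group, names(files_by_group))
}

# Stand-in for a wrassp function such as ksvF0 (hypothetical, for illustration)
fake_f0 <- function(listOfFiles, gender = "u", verbose = TRUE) {
  paste(length(listOfFiles), "file(s), gender =", gender)
}

res <- run_by_group(files  = c("a.wav", "b.wav", "c.wav"),
                    groups = c("m", "f", "m"),
                    params = list(m = list(gender = "m"), f = list(gender = "f")),
                    fun    = fake_f0)
res$m  # "2 file(s), gender = m"
```

With real data, fun = wrassp::ksvF0 and a (hypothetical) vector giving the speaker group of each .wav file would write the .f0 files into the bundle directories; a track definition like the one for rms in the next section would then make them available to emuR.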

Using wrassp in the EMU-SDMS²

To make the .rms files created above readable for the EMU-SDMS, we have to define an rms track. To do so, run:

# add SSFF track definition
# that references the .rms files
# calculated above
ext = wrasspOutputInfos$rmsana$ext
colName = wrasspOutputInfos$rmsana$tracks[1]
add_ssffTrackDefinition(ae,
                        name = "rms",
                        fileExtension = ext,
                        columnName = colName)

From now on, the rms track is defined and therefore available to emuR; we can now use get_trackdata() to read in the RMS data.

add_ssffTrackDefinition() can, however, be used much more directly to precalculate derived signals, by means of the parameter onTheFlyFunctionName. The same parameter is available in the command get_trackdata() to calculate derived signals for a given seglist on-the-fly.

Extracting pre-defined tracks

To access data that are stored in files, the user has to define tracks for a database that point to sequences of samples in files that match a user-specified file extension. The user-defined name of such a track can then be used to reference the track in the signal data extraction process. Internally, emuR uses wrassp to read the appropriate files from disk, extract the sample sequences that match the result of a query and return values to the user for further inspection and evaluation.

# list currently available tracks
list_ssffTrackDefinitions(ae)
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms
## 3  rms        rms           rms

In ae, there are now three tracks available that can be read by get_trackdata(). We could, e.g., do the following:

# query all "ai" phonetic segments
ai_segs = query(ae, "Phonetic == ai")
# get "fm" track data for these segments
# Note that verbose is set to FALSE
# only to avoid a progress bar
# being printed in this document.
ai_td_fm = get_trackdata(emuDBhandle = ae,
                         seglist = ai_segs,
                         ssffTrackName = "fm",
                         verbose = FALSE)
# show summary of ai_td_fm
summary(ai_td_fm)
## Emu track data from 6 segments
## 
## Data is  4 dimensional from track fm 
## Mean data length is  30.5  samples

So, we needed an emuDBhandle, a seglist, and the correct ssffTrackName to read the formant values from the fms-files. Being able to access data that is stored in files is important for two main reasons.

Firstly, it is possible to generate files using external programs such as VoiceSauce (Shue et al., 2011), which can export its calculated output to the general-purpose SSFF file format. This file mechanism is also used to access data produced by EMA, EPG or any other form of signal data recording. Secondly, it is possible to track, save and access manipulated data such as formant values that have been manually corrected. It is also worth noting that the get_trackdata() function has a predefined track which is always available without having to be defined. The name of this track is MEDIAFILE_SAMPLES, and it references the actual samples of the audio files of the database. The next example shows how this predefined track can be used to access the audio samples belonging to the segments in ai_segs.

# get media file samples
ai_td_mfs = get_trackdata(ae,
                          seglist = ai_segs,
                          ssffTrackName = "MEDIAFILE_SAMPLES",
                          verbose = FALSE)
# show summary of ai_td_mfs
summary(ai_td_mfs)
## Emu track data from 6 segments
## 
## Data is  1 dimensional from track MEDIAFILE_SAMPLES 
## Mean data length is  3064.333  samples
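Both summaries describe the same six ai segments, so their mean lengths should correspond once the different sampling rates are taken into account (20000 Hz for the audio samples, 200 Hz for the fm frames, as in the fms header). A quick check in R:

```r
mean_samples <- 3064.333  # mean length of the MEDIAFILE_SAMPLES data
mean_frames  <- 30.5      # mean length of the fm data
audio_rate   <- 20000     # Hz
fm_rate      <- 200       # Hz (one frame every 5 ms)

# Convert both to mean durations in ms
dur_audio_ms <- mean_samples / audio_rate * 1000  # about 153.2 ms
dur_fm_ms    <- mean_frames  / fm_rate    * 1000  # 152.5 ms
```

The two durations differ by less than one 5 ms frame, which is what one would expect, since the fm track only samples each segment every 5 ms.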

Adding new tracks

The signal processing routines provided by the wrassp package can be used to produce SSFF files containing various derived signal data (e.g., formants, fundamental frequency, etc.). The following example shows how the add_ssffTrackDefinition() function can be used to add a new track to the ae emuDB. Using the onTheFlyFunctionName parameter, the add_ssffTrackDefinition() function automatically executes the wrassp signal processing function ksvF0 (onTheFlyFunctionName = "ksvF0") and stores the results in SSFF files in the bundle directories.

# add new track and calculate
# .f0 files on-the-fly using wrassp::ksvF0()
add_ssffTrackDefinition(ae,
                        name = "F0",
                        onTheFlyFunctionName = "ksvF0",
                        verbose = FALSE)
# show newly added track
list_ssffTrackDefinitions(ae)
# show newly added files
list_files(ae, fileExtension = "f0")
# extract newly added trackdata
ai_td = get_trackdata(ae,
                      seglist = ai_segs,
                      ssffTrackName = "F0",
                      verbose = FALSE)

In the add_ssffTrackDefinition() call, we could also have passed parameter values to the wrassp function ksvF0. Its parameters and their default values can be listed with formals():

formals(wrassp::ksvF0)
## $listOfFiles
## NULL
## 
## $optLogFilePath
## NULL
## 
## $beginTime
## [1] 0
## 
## $endTime
## [1] 0
## 
## $windowShift
## [1] 5
## 
## $gender
## [1] "u"
## 
## $maxF
## [1] 600
## 
## $minF
## [1] 50
## 
## $minAmp
## [1] 50
## 
## $maxZCR
## [1] 3000
## 
## $toFile
## [1] TRUE
## 
## $explicitExt
## NULL
## 
## $outputDirectory
## NULL
## 
## $forceToLog
## useWrasspLogger
## 
## $verbose
## [1] TRUE

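We can see that the parameter gender is set to “u” (undefined) by default. As we know that the ae database consists of male speech only, we could set it to “m” (male) via onTheFlyParams. Since .f0 files were already created above, emuR asks for confirmation before overwriting them: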

add_ssffTrackDefinition(ae,
                        name = "F0",
                        onTheFlyFunctionName = "ksvF0",
                        onTheFlyParams = list(gender = "m"),
                        verbose = FALSE)
## There are files present in 'ae' that have the file extention 'f0'! Continuing will overwrite these files! Do you wish to proceed? (y/n)
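The names in onTheFlyParams have to match the parameter names listed by formals() above. A small hypothetical helper can catch a misspelled name before a potentially long computation starts. It is shown here with a stand-in function whose formals mimic a few of ksvF0()’s, so that it runs without wrassp; with wrassp installed, check_params(wrassp::ksvF0, list(gender = "m")) would check the real function.

```r
# Hypothetical helper: stop if 'params' contains names that 'fun' does not accept
check_params <- function(fun, params) {
  unknown <- setdiff(names(params), names(formals(fun)))
  if (length(unknown) > 0) {
    stop("unknown parameter(s): ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Stand-in with some of ksvF0()'s formals (see the formals() output above)
ksvF0_standin <- function(listOfFiles = NULL, gender = "u", maxF = 600, minF = 50) NULL

check_params(ksvF0_standin, list(gender = "m"))      # passes silently
try(check_params(ksvF0_standin, list(gendr = "m")))  # reports "gendr"
```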

Pre-calculated derived signals can be displayed and corrected in the EMU-webApp, and pre-calculated formants may be overlaid on the spectrogram. Learn more about this in chapter 08.

One disadvantage of this method should not be withheld: it is currently not possible to pre-calculate speaker-group-specific (e.g. gender-specific) data by setting different parameters for different speaker groups. However, an implementation of this feature is in the making.

Calculating tracks on-the-fly

With the wrassp package, we were able to implement a new form of signal data extraction which was not available in the legacy system. The user is now able to select one of the signal processing routines provided by wrassp and pass it on to the signal data extraction function. The signal data extraction function can then apply this wrassp function to each audio file as part of the signal data extraction process. This means that the user can quickly manipulate function parameters and evaluate the result without having to store on disk the files that would usually be generated by the various parameter experiments. In many cases, this new functionality eliminates the need to add a track definition for the entire database for temporary data analysis purposes. The following example shows how the onTheFlyFunctionName parameter of the get_trackdata() function is used:

# calculate pitch on-the-fly with wrassp::mhsF0()
# (no files are stored in the emuDB)
ai_td_pit = get_trackdata(ae,
                          seglist = ai_segs,
                          onTheFlyFunctionName = "mhsF0",
                          verbose = FALSE)
# show summary of ai_td_pit
summary(ai_td_pit)
## Emu track data from 6 segments
## 
## Data is  1 dimensional from track pitch 
## Mean data length is  30.5  samples

The resulting object: trackdata vs. emuRtrackdata

By default, a call to get_trackdata() returns an object of class trackdata. The emuR package provides multiple routines such as dcut(), trapply(), eplot() and dplot() for processing and visually inspecting objects of this type (see Harrington, 2010, for the use of these functions).

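# query all "V", "i:" and "u:" segments on the "Phonetic" level
iVu = query(ae, query = "Phonetic==V|i:|u:")
# get the corresponding formant trackdata
iVu_fm = get_trackdata(ae,
                       seglist = iVu,
                       ssffTrackName = "fm",
                       verbose = FALSE)
# show class of iVu_fm
class(iVu_fm)
## [1] "trackdata"
# formant values at the temporal midpoint (proportion 0.5) of each segment
iVu_fm05 = dcut(iVu_fm, 0.5, prop = TRUE)
# F1/F2 ellipse plot with centroids; formant = TRUE uses the
# conventional formant-plot orientation (F2 on the x-axis)
eplot(iVu_fm05[, 1:2], label(iVu), centroid = TRUE, formant = TRUE)

# F1 and F2 trajectories over time, one colour per label
dplot(iVu_fm[, 1:2], label(iVu))

# the same trajectories, time-normalised and averaged per label
dplot(iVu_fm[, 1:2], label(iVu), normalise = TRUE, average = TRUE)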

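A trackdata object consists of three parts: $index holds, for each segment, the first and last row of $data belonging to it; $ftime holds the times (in ms) of the first and last frame of each segment; and $data holds the frame values, one row per frame, with the frame times as row names.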

#show the first two segments' values only
iVu_fm[1:2,]
## trackdata from track: fm 
## index:
##      left right
## [1,]    1    14
## [2,]   15    31
## ftime:
##      start   end
## [1,] 187.5 252.5
## [2,] 342.5 422.5
## data:
##        T1   T2   T3   T4
## 187.5   0 1293 2424 3429
## 192.5 628 1306 2410 3348
## 197.5 655 1303 2397 3337
## 202.5 724 1283 2338 3326
## 207.5 717 1266 2312 3326
## 212.5 715 1244 2317 3462
## 217.5 711 1229 2349 3451
## 222.5 688 1214 2362 3391
## 227.5 625 1189 2364 3372
## 232.5 535 1173 2359 3381
## 237.5 496 1146 2328 3389
## 242.5 502 1136 2296 3323
## 247.5 458 1121 2258 3218
## 252.5 440 1111 2223 3177
## 342.5 454 1156 2221 3319
## 347.5 523 1187 2262 3355
## 352.5 568 1201 2286 3367
## 357.5 607 1222 2300 3387
## 362.5 618 1239 2306 3412
## 367.5 643 1265 2316 3433
## 372.5 665 1278 2304 3450
## 377.5 638 1283 2282 3464
## 382.5 584 1288 2254 3476
## 387.5 524 1296 2228 3482
## 392.5 493 1314 2213 3486
## 397.5 490 1329 2200 3515
## 402.5 484 1340 2186 3565
## 407.5 474 1358 2176 3609
## 412.5 455 1370 2170 3640
## 417.5 435 1375 2178 3659
## 422.5 417 1414 2188 3652
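The relationship between $index, $ftime and $data can be sketched with a small hand-made list that mimics this layout (made-up values; real trackdata objects have class "trackdata" and further attributes). The rows of $data belonging to segment i are index[i, 1]:index[i, 2], and a value at a proportional time point can be approximated by picking the frame closest to it. This is only an approximation of what dcut(..., prop = TRUE) does, not its implementation; the helpers segment_frames() and frame_at_prop() are hypothetical.

```r
# Hand-made stand-in for a 2-dimensional trackdata object (made-up values)
td <- list(
  index = rbind(c(1, 3), c(4, 8)),        # rows of $data per segment
  ftime = rbind(c(0, 10), c(100, 120)),   # first/last frame time (ms)
  data  = cbind(T1 = c(500, 600, 700, 300, 310, 320, 330, 340),
                T2 = c(1500, 1600, 1700, 2200, 2210, 2220, 2230, 2240))
)
rownames(td$data) <- c(0, 5, 10, 100, 105, 110, 115, 120)  # frame times (ms)

# Frames of segment i
segment_frames <- function(td, i) {
  td$data[td$index[i, 1]:td$index[i, 2], , drop = FALSE]
}

# Frame closest to proportion 'prop' of segment i's time span
frame_at_prop <- function(td, i, prop = 0.5) {
  frames <- segment_frames(td, i)
  times  <- as.numeric(rownames(frames))
  target <- td$ftime[i, 1] + prop * (td$ftime[i, 2] - td$ftime[i, 1])
  frames[which.min(abs(times - target)), ]
}

nrow(segment_frames(td, 2))  # 5 frames
frame_at_prop(td, 2)         # T1 = 320, T2 = 2220 (frame at 110 ms)
```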

As the fms files have four columns of formant data (F1 … F4), $data has four columns as well (T1 … T4). In F0 data, there will be only one column, called T1. In spectral data, however, there may be hundreds of columns:

iVu_spectral = get_trackdata(ae,
                        seglist = iVu,
                        onTheFlyFunctionName =  "dftSpectrum",
                        verbose = FALSE)
colnames(iVu_spectral$data)
##   [1] "T1"   "T2"   "T3"   "T4"   "T5"   "T6"   "T7"   "T8"   "T9"   "T10" 
##  [11] "T11"  "T12"  "T13"  "T14"  "T15"  "T16"  "T17"  "T18"  "T19"  "T20" 
##  [21] "T21"  "T22"  "T23"  "T24"  "T25"  "T26"  "T27"  "T28"  "T29"  "T30" 
##  [31] "T31"  "T32"  "T33"  "T34"  "T35"  "T36"  "T37"  "T38"  "T39"  "T40" 
##  [41] "T41"  "T42"  "T43"  "T44"  "T45"  "T46"  "T47"  "T48"  "T49"  "T50" 
##  [51] "T51"  "T52"  "T53"  "T54"  "T55"  "T56"  "T57"  "T58"  "T59"  "T60" 
##  [61] "T61"  "T62"  "T63"  "T64"  "T65"  "T66"  "T67"  "T68"  "T69"  "T70" 
##  [71] "T71"  "T72"  "T73"  "T74"  "T75"  "T76"  "T77"  "T78"  "T79"  "T80" 
##  [81] "T81"  "T82"  "T83"  "T84"  "T85"  "T86"  "T87"  "T88"  "T89"  "T90" 
##  [91] "T91"  "T92"  "T93"  "T94"  "T95"  "T96"  "T97"  "T98"  "T99"  "T100"
## [101] "T101" "T102" "T103" "T104" "T105" "T106" "T107" "T108" "T109" "T110"
## [111] "T111" "T112" "T113" "T114" "T115" "T116" "T117" "T118" "T119" "T120"
## [121] "T121" "T122" "T123" "T124" "T125" "T126" "T127" "T128" "T129" "T130"
## [131] "T131" "T132" "T133" "T134" "T135" "T136" "T137" "T138" "T139" "T140"
## [141] "T141" "T142" "T143" "T144" "T145" "T146" "T147" "T148" "T149" "T150"
## [151] "T151" "T152" "T153" "T154" "T155" "T156" "T157" "T158" "T159" "T160"
## [161] "T161" "T162" "T163" "T164" "T165" "T166" "T167" "T168" "T169" "T170"
## [171] "T171" "T172" "T173" "T174" "T175" "T176" "T177" "T178" "T179" "T180"
## [181] "T181" "T182" "T183" "T184" "T185" "T186" "T187" "T188" "T189" "T190"
## [191] "T191" "T192" "T193" "T194" "T195" "T196" "T197" "T198" "T199" "T200"
## [201] "T201" "T202" "T203" "T204" "T205" "T206" "T207" "T208" "T209" "T210"
## [211] "T211" "T212" "T213" "T214" "T215" "T216" "T217" "T218" "T219" "T220"
## [221] "T221" "T222" "T223" "T224" "T225" "T226" "T227" "T228" "T229" "T230"
## [231] "T231" "T232" "T233" "T234" "T235" "T236" "T237" "T238" "T239" "T240"
## [241] "T241" "T242" "T243" "T244" "T245" "T246" "T247" "T248" "T249" "T250"
## [251] "T251" "T252" "T253" "T254" "T255" "T256" "T257"
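The 257 columns are consistent with a one-sided spectrum from a 512-point FFT: a real signal of length N yields N/2 + 1 non-redundant frequency bins, from 0 Hz up to half the sampling rate, here spaced 20000/512 ≈ 39.06 Hz apart. N = 512 is an assumption chosen because it matches the column count; the FFT length dftSpectrum() actually uses depends on its parameters. A base-R illustration with a 1250 Hz test tone, chosen to fall exactly on bin 32:

```r
fs    <- 20000  # sampling rate of the ae audio (Hz)
n_fft <- 512    # assumed FFT length

# 1250 Hz sine: exactly 32 cycles in 512 samples
x <- sin(2 * pi * 1250 * (0:(n_fft - 1)) / fs)

# One-sided magnitude spectrum: bins 0 .. N/2
spec  <- Mod(fft(x))[1:(n_fft / 2 + 1)]
freqs <- (0:(n_fft / 2)) * fs / n_fft  # bin centre frequencies (Hz)

length(spec)            # 257, like the T1 ... T257 columns
fs / n_fft              # bin spacing: 39.0625 Hz
freqs[which.max(spec)]  # 1250
```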

However, as the trackdata object is a fairly complex nested matrix object with internal reference matrices, which can be cumbersome to work with, the emuR package introduces a new equivalent object type called emuRtrackdata that essentially is a flat data.frame or data.table object. This object type can be retrieved by setting the resultType parameter of the get_trackdata() function to "emuRtrackdata":

iVu_fm_new = get_trackdata(ae,
                        seglist = iVu,
                        ssffTrackName =  "fm",
                        resultType="emuRtrackdata",
                        verbose = FALSE)
# show class of iVu_fm_new
class(iVu_fm_new)
## [1] "emuRtrackdata" "data.table"    "data.frame"
iVu_fm_new
##      sl_rowIdx labels    start      end          utts
##   1:         1      V  187.425  256.925 0000:msajc003
##   2:         1      V  187.425  256.925 0000:msajc003
##   3:         1      V  187.425  256.925 0000:msajc003
##   4:         1      V  187.425  256.925 0000:msajc003
##   5:         1      V  187.425  256.925 0000:msajc003
##  ---                                                 
## 205:        13      V 1943.175 2037.425 0000:msajc057
## 206:        13      V 1943.175 2037.425 0000:msajc057
## 207:        13      V 1943.175 2037.425 0000:msajc057
## 208:        13      V 1943.175 2037.425 0000:msajc057
## 209:        13      V 1943.175 2037.425 0000:msajc057
##                                   db_uuid session   bundle start_item_id
##   1: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc003           147
##   2: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc003           147
##   3: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc003           147
##   4: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc003           147
##   5: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc003           147
##  ---                                                                    
## 205: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc057           189
## 206: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc057           189
## 207: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc057           189
## 208: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc057           189
## 209: 0fc618dc-8980-414d-8c7a-144a649ce199    0000 msajc057           189
##      end_item_id    level start_item_seq_idx end_item_seq_idx    type
##   1:         147 Phonetic                  1                1 SEGMENT
##   2:         147 Phonetic                  1                1 SEGMENT
##   3:         147 Phonetic                  1                1 SEGMENT
##   4:         147 Phonetic                  1                1 SEGMENT
##   5:         147 Phonetic                  1                1 SEGMENT
##  ---                                                                 
## 205:         189 Phonetic                 28               28 SEGMENT
## 206:         189 Phonetic                 28               28 SEGMENT
## 207:         189 Phonetic                 28               28 SEGMENT
## 208:         189 Phonetic                 28               28 SEGMENT
## 209:         189 Phonetic                 28               28 SEGMENT
##      sample_start sample_end sample_rate times_rel times_orig  T1   T2
##   1:         3749       5138       20000         0      187.5   0 1293
##   2:         3749       5138       20000         5      192.5 628 1306
##   3:         3749       5138       20000        10      197.5 655 1303
##   4:         3749       5138       20000        15      202.5 724 1283
##   5:         3749       5138       20000        20      207.5 717 1266
##  ---                                                                  
## 205:        38864      40748       20000        65     2012.5 635 1274
## 206:        38864      40748       20000        70     2017.5 535 1268
## 207:        38864      40748       20000        75     2022.5   0 1223
## 208:        38864      40748       20000        80     2027.5 150 1192
## 209:        38864      40748       20000        85     2032.5   0 1227
##        T3   T4
##   1: 2424 3429
##   2: 2410 3348
##   3: 2397 3337
##   4: 2338 3326
##   5: 2312 3326
##  ---          
## 205: 2231 3350
## 206: 2303 3449
## 207: 2303 3675
## 208: 2290 3487
## 209: 2396 3517
names(iVu_fm_new)
##  [1] "sl_rowIdx"          "labels"             "start"             
##  [4] "end"                "utts"               "db_uuid"           
##  [7] "session"            "bundle"             "start_item_id"     
## [10] "end_item_id"        "level"              "start_item_seq_idx"
## [13] "end_item_seq_idx"   "type"               "sample_start"      
## [16] "sample_end"         "sample_rate"        "times_rel"         
## [19] "times_orig"         "T1"                 "T2"                
## [22] "T3"                 "T4"

The emuRtrackdata object is an amalgamation of both a segment list and a trackdata object. The first column, sl_rowIdx, of the iVu_fm_new object indicates the row index of the segment list that the current row belongs to; the times_rel and times_orig columns (and times_norm in the forthcoming emuR version) represent the relative time and the original time of the samples contained in the current row; and T1 (to Tn in n-dimensional trackdata) contains the actual signal sample values. It is also worth noting that the emuR package provides a function called create_emuRtrackdata(), which allows users to create emuRtrackdata from a segment list and a trackdata object. This is beneficial as it allows trackdata objects to be processed using functions provided by the emuR package (e.g., dcut() and trapply()) and then converted into a standardized data.table object for further processing (e.g., using R packages such as lme4 or ggplot2, which were implemented to work with data.frame or data.table objects).
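Because an emuRtrackdata object is a flat table with one row per frame, ordinary data.frame tools apply directly. A base-R sketch on a small made-up data.frame with a few of the columns shown above: times_rel can be recomputed from times_orig per segment (in the printed output it appears to be the time since the segment’s first frame), and per-label means need no trackdata-specific functions.

```r
# Made-up rows in the layout of an emuRtrackdata object (subset of columns)
td_flat <- data.frame(
  sl_rowIdx  = c(1, 1, 1, 2, 2),
  labels     = c("V", "V", "V", "i:", "i:"),
  times_orig = c(187.5, 192.5, 197.5, 342.5, 347.5),
  T1         = c(600, 628, 655, 300, 320)
)

# Time relative to the first frame of each segment (ms)
td_flat$times_rel <- ave(td_flat$times_orig, td_flat$sl_rowIdx,
                         FUN = function(t) t - min(t))

# Mean F1 (column T1) per label, as a named vector
f1_means <- tapply(td_flat$T1, td_flat$labels, mean)
f1_means[["i:"]]  # 310
```

With the real object, the same calls should work on iVu_fm_new; alternatively, data.table syntax can be used, since its class vector includes "data.table".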

Alternative signal derivations

Some users believe that, e.g., wrassp’s formant tracker is not as good as the one used in Praat. To use Praat’s formant tracker instead, you could use Raphael Winkelmann’s script (https://gist.github.com/raphywink/2512752a1efa56951f04), which calculates formant frequencies in Praat and converts the result into the SSFF format.


  1. This part of the chapter is nearly identical to Winkelmann, Raphael (to appear), “The EMU-SDMS Manual”, documentation accompanying the media dissertation submitted for the degree of Doctor of Philosophy at Ludwig-Maximilians-Universität München, chapter 7.

  2. This sub-chapter owes much to chapter 6 and, in part, chapter 7 of Winkelmann’s dissertation.