The aim is to get from a Praat TextGrid
to an Emu database format as exemplified by Fig. 1.1:
Figure 1.1: An utterance fragment in Praat and in Emu
The assumption is that you have a project called emu2021
and that it contains the following directories.
If not, please see 1. Preliminaries.
Start up R in the project you are using for this course.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(emuR)
##
## Attaching package: 'emuR'
## The following object is masked from 'package:base':
##
## norm
library(wrassp)
In R, store the path to the directory testsample as sourceDir in exactly the following way:
sourceDir = "./testsample"
Also store in R the path to emu_databases as targetDir:
targetDir = "./emu_databases"
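Both paths are relative, so R resolves them against the current working directory. A minimal check, assuming the project layout described above, is to confirm that both directories can be found before going further. The name dirs_found is only illustrative.

```r
# Paths as defined above; both are relative to the working directory
sourceDir = "./testsample"
targetDir = "./emu_databases"

# Show which directory the relative paths are resolved against
getwd()

# TRUE for each directory that can be found from here
dirs_found = dir.exists(c(sourceDir, targetDir))
names(dirs_found) = c("sourceDir", "targetDir")
dirs_found
```

If either value is FALSE, the working directory is probably not the project directory.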
The directory /testsample/praat on your computer contains a Praat-style database with .wav files and .TextGrid files.
Define the path to this database in R and check that you can see these files with the list.files() function:
path.praat = file.path(sourceDir, "praat")
list.files(path.praat)
## [1] "wetter1.TextGrid" "wetter1.wav" "wetter10.TextGrid"
## [4] "wetter10.wav" "wetter11.TextGrid" "wetter11.wav"
## [7] "wetter12.TextGrid" "wetter12.wav" "wetter13.TextGrid"
## [10] "wetter13.wav" "wetter14.TextGrid" "wetter14.wav"
## [13] "wetter15.TextGrid" "wetter15.wav" "wetter16.TextGrid"
## [16] "wetter16.wav" "wetter17.TextGrid" "wetter17.wav"
## [19] "wetter2.TextGrid" "wetter2.wav" "wetter3.TextGrid"
## [22] "wetter3.wav" "wetter4.TextGrid" "wetter4.wav"
## [25] "wetter6.TextGrid" "wetter6.wav" "wetter7.TextGrid"
## [28] "wetter7.wav"
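Each .wav file is expected to have a .TextGrid file with the same base name, so it can be worth confirming that no file is missing its partner before converting. The following is a minimal sketch using base R only. The helper unpaired_files() is hypothetical and not part of emuR.

```r
# Hypothetical helper (not part of emuR): report base names that occur
# with only one of the two extensions in a directory
unpaired_files = function(dir, audioExt = "wav", tgExt = "TextGrid") {
  base_names = function(ext) {
    files = list.files(dir, pattern = paste0("\\.", ext, "$"))
    tools::file_path_sans_ext(files)
  }
  wav = base_names(audioExt)
  tg = base_names(tgExt)
  list(wav_without_textgrid = setdiff(wav, tg),
       textgrid_without_wav = setdiff(tg, wav))
}

# Demonstration on a temporary directory with one deliberately unpaired file
demo_dir = file.path(tempdir(), "pair_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.wav", "a.TextGrid", "b.wav"))))
unpaired = unpaired_files(demo_dir)
unpaired
```

For the tutorial data, unpaired_files(path.praat) should return two empty vectors, since the listing above shows a .TextGrid file for every .wav file.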
The emuR function for converting the TextGridCollection to an Emu database and then storing the latter in targetDir (defined above) is convert_TextGridCollection(). It works like this:
# only execute once!
convert_TextGridCollection(path.praat,
dbName = "praat",
targetDir = targetDir)
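The conversion is marked to be executed only once. One way to make the step safe to re-run is to test for the database directory first. This is a sketch only: convert_once() is a hypothetical wrapper. It assumes that the database is written to a directory named after dbName with the suffix _emuDB inside targetDir, which is the path passed to load_emuDB() in this section.

```r
# Hypothetical wrapper: convert only if the Emu database does not exist yet
convert_once = function(dir, dbName, targetDir) {
  # Assumed location of the converted database: <targetDir>/<dbName>_emuDB
  dbPath = file.path(targetDir, paste0(dbName, "_emuDB"))
  if (dir.exists(dbPath)) {
    message("Database already exists, skipping conversion: ", dbPath)
    return(invisible(FALSE))
  }
  emuR::convert_TextGridCollection(dir, dbName = dbName, targetDir = targetDir)
  invisible(TRUE)
}

# Usage with the objects defined above:
# convert_once(path.praat, dbName = "praat", targetDir = targetDir)
```

The wrapper returns FALSE invisibly when it skips the conversion and TRUE when the conversion was run.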
The converted Praat database can now be loaded:
praat_DB = load_emuDB(file.path(targetDir, "praat_emuDB"))
## INFO: Checking if cache needs update for 1 sessions and 14 bundles ...
## INFO: Performing precheck and calculating checksums (== MD5 sums) for _annot.json files ...
## INFO: Nothing to update!
and its properties examined as before:
summary(praat_DB)
And it can of course be viewed:
serve(praat_DB, useViewer = F)
wrassp
The task is to calculate the pitch from each of the utterance’s waveforms for the praat_DB database created above. First, find the full path names of all of the .wav files. They are here:
praat_wav_paths = list.files(path.praat, pattern = ".*wav$", recursive = T, full.names = T)
praat_wav_paths
## [1] "./testsample/praat/wetter1.wav" "./testsample/praat/wetter10.wav"
## [3] "./testsample/praat/wetter11.wav" "./testsample/praat/wetter12.wav"
## [5] "./testsample/praat/wetter13.wav" "./testsample/praat/wetter14.wav"
## [7] "./testsample/praat/wetter15.wav" "./testsample/praat/wetter16.wav"
## [9] "./testsample/praat/wetter17.wav" "./testsample/praat/wetter2.wav"
## [11] "./testsample/praat/wetter3.wav" "./testsample/praat/wetter4.wav"
## [13] "./testsample/praat/wetter6.wav" "./testsample/praat/wetter7.wav"
The signal processing package wrassp will now be used to calculate the pitch for each of these .wav files. To see the full range of signal processing routines available, enter:
?wrassp
Two routines could be used here for calculating pitch: ksvF0 and mhsF0.
Here’s how to use mhsF0 with the default settings. The output is going to be stored in path.praat (i.e. in /testsample/praat on your computer).
# only execute once!
mhsF0(praat_wav_paths, outputDirectory = path.praat)
As the figure below shows, the pitch files should now all have been written to path.praat, i.e. to /testsample/praat.
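Rather than relying on a visual check, the presence of the pitch files can be confirmed in R. The following is a minimal sketch in base R. It assumes, as stated above, that each output file has the same base name as its .wav file and carries the extension pit. The helper missing_tracks() is hypothetical.

```r
# Hypothetical helper: return the .wav files for which no derived file
# with extension `ext` is found in `outputDirectory`
missing_tracks = function(wav_paths, outputDirectory, ext = "pit") {
  base = tools::file_path_sans_ext(basename(wav_paths))
  expected = file.path(outputDirectory, paste0(base, ".", ext))
  wav_paths[!file.exists(expected)]
}

# Demonstration on a temporary directory: only "a" has a pitch file
demo_dir = file.path(tempdir(), "pit_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.wav", "a.pit", "b.wav"))))
demo_wavs = file.path(demo_dir, c("a.wav", "b.wav"))
still_missing = missing_tracks(demo_wavs, demo_dir)
basename(still_missing)
```

For the tutorial data, missing_tracks(praat_wav_paths, path.praat) should return an empty character vector once mhsF0() has finished.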
These calculated pitch files now need to be added to praat_DB. This is done with the add_files() function. The parameter targetSessionName can be omitted in this case, because all of the bundles are stored in the session directory 0000. This can be verified with:
list_bundles(praat_DB)
## # A tibble: 14 × 2
## session name
## <chr> <chr>
## 1 0000 wetter1
## 2 0000 wetter10
## 3 0000 wetter11
## 4 0000 wetter12
## 5 0000 wetter13
## 6 0000 wetter14
## 7 0000 wetter15
## 8 0000 wetter16
## 9 0000 wetter17
## 10 0000 wetter2
## 11 0000 wetter3
## 12 0000 wetter4
## 13 0000 wetter6
## 14 0000 wetter7
Now add the pitch files to praat_DB:
# only execute once!
add_files(praat_DB,
dir = path.praat,
fileExtension = "pit",
targetSessionName = "0000")
Once the files have been added, they need to be defined. The information required is:
track name. This can be anything and it is needed when referring to these signal files in R.
file extension. This is pit as already established above.
columnName. This is the name of the column in the .pit files in which the fundamental frequency data is stored.
This type of information (as well as information about the extension) is given by wrasspOutputInfos. In this case, append $mhsF0 since this was the name of the signal processing routine that has been used to calculate the pitch data:
wrasspOutputInfos$mhsF0
## $ext
## [1] "pit"
##
## $tracks
## [1] "pitch"
##
## $outputType
## [1] "SSFF"
The column name is given by $tracks which in this case is pitch. Putting all this together, and using "pitch" for the name of the track gives:
# only execute once!
add_ssffTrackDefinition(praat_DB,
name = "pitch",
columnName = "pitch",
fileExtension = "pit")
summary(praat_DB)
The signals that are currently displayed for this praat_DB database can be seen with the function get_signalCanvasesOrder() as follows:
get_signalCanvasesOrder(praat_DB, perspectiveName = "default")
## [1] "OSCI" "SPEC"
which confirms that what is seen when viewing the database with the serve() function is the waveform (OSCI) and the spectrogram (SPEC). The pitch data created above now needs to be added using the function set_signalCanvasesOrder(). The second argument should always be "default", thus:
set_signalCanvasesOrder(praat_DB, perspectiveName = "default",
order = c("OSCI", "SPEC", "pitch"))
serve(praat_DB, useViewer = F)
The next task is to add an event tier that can be used for labelling tones. Here the tier is called “Tone”. So far, the only existing time tier is ORT as confirmed by:
list_levelDefinitions(praat_DB)
In order to add a new tier called Tone as an EVENT tier:
# only execute once!
add_levelDefinition(praat_DB, "Tone", "EVENT")
Display Tone so that it is above the ORT tier and so directly underneath the signals:
get_levelCanvasesOrder(praat_DB, perspectiveName = "default")
set_levelCanvasesOrder(praat_DB,
perspectiveName = "default",
order = c("Tone", "ORT"))
Add two tone labels H* at the pitch peaks of morgens and ruhig in wetter1 as in Fig. 1.1 and save the result.
serve(praat_DB, useViewer=F)
The tones are to be linked to the words within which they occur in time. To do this, define a hierarchical relationship such that ORT dominates Tone:
list_linkDefinitions(praat_DB)
# only execute once!
add_linkDefinition(praat_DB,
type = "ONE_TO_MANY",
superlevelName = "ORT",
sublevelName = "Tone")
list_linkDefinitions(praat_DB)
## NULL
Inspect the hierarchy:
summary(praat_DB)
# switch to hierarchy view
serve(praat_DB, useViewer = F)
The autobuild_linkFromTimes() function can now be used to link the tones to the corresponding words:
# only execute once!
autobuild_linkFromTimes(praat_DB,
superlevelName = "ORT",
sublevelName = "Tone")
# switch to hierarchy view
serve(praat_DB, useViewer = F)