In this lecture, we will use a demo EMU-SDMS database that comes with the emuR package. We ‘create’ this database with the function create_emuRdemoData(); the database will be saved in the directory mypath:

# load packages
library(emuR)
library(dplyr)
library(ggplot2)

# create demo data in the directory mypath
# (here: tempdir(); any writable directory works)
mypath = tempdir()
create_emuRdemoData(dir = mypath)

# create path to demo database
path2ae = file.path(mypath, "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)

summary(ae)

## Name:     ae 
## UUID:     0fc618dc-8980-414d-8c7a-144a649ce199 
## Directory:    /Users/reubold/myEMURdata/emuR_demoData/ae_emuDB 
## Session count: 1 
## Bundle count: 7 
## Annotation item count:  736 
## Label count:  844 
## Link count:  785 
## 
## Database configuration:
## 
## SSFF track definitions:
##   name columnName fileExtension
## 1  dft        dft           dft
## 2   fm         fm           fms
## 
## Level definitions:
##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;
## 
## Link definitions:
##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

In the level definitions, we see one EVENT level (“Tone”, one point in time), one SEGMENT level (“Phonetic”, with start and end times), and several ITEM levels, e.g. “Syllable” or “Word”, which inherit time information from the level “Phonetic”. The link definitions show a very rich annotation structure, which results in the following tree-like structure for the first utterance:

serve(ae,autoOpenURL = "https://ips-lmu.github.io/EMU-webApp/?autoConnect=true")

Figure 1: Hierarchy of the first utterance of the database ae

We can also see so-called SSFF track definitions, which means in this case that - amongst other things - pre-calculated formants are available.
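The track definitions can also be listed on their own, without the rest of the summary. The following is a minimal sketch, assuming the demo data may be created in tempdir(); the variable names demo_root and tracks are ours and not part of the lecture code:

```r
library(emuR)

# create the demo data once (assumption: tempdir() is writable);
# create_emuRdemoData() is only called if the data does not exist yet
demo_root = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_root)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_root, "ae_emuDB"), verbose = FALSE)

# list only the SSFF track definitions
tracks = list_ssffTrackDefinitions(ae)
tracks

# is the formant track "fm" (used later in get_trackdata()) available?
"fm" %in% tracks$name
```

The name found here is the value that has to be passed to the ssffTrackName argument of get_trackdata().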

Note that all seven utterances were read by the same speaker, so there will be no concerns about vowel normalisation. He is a male speaker of Australian English (hence the database’s name ae).

0 Example of an analysis

We will now present a little example of how such a database could be analysed. To do so, we will use the function query() to query certain segments, get_trackdata() and other functions to read formants into R, and requery_hier() for further re-analysis.

First of all, we want to plot the edges of the Australian English vowel space. To do so, we will query front and back close, mid, and open vowels.

# query A and V (front and back open vowels),
# i: and u: (front and back close vowels), and
# E and o: (front and back mid vowels)
ae_vowels = query(emuDBhandle = ae,query = "[Phonetic== V|A|i:|u:|o:|E]")
#get the formants:
ae_formants = get_trackdata(ae, seglist = ae_vowels,ssffTrackName = "fm", resultType = "emuRtrackdata")
#get the formants at the vowels' temporal midpoints:
ae_formants_norm = normalize_length(ae_formants)
ae_midpoints = ae_formants_norm %>% filter(times_norm==0.5)
#plot the vowel space:
ggplot(ae_midpoints) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

This figure shows the vowel space one would expect: open vowels are near the bottom, close vowels are at the top, and mid vowels are in between. Front vowels are on the left-hand side of the plot, and back vowels are on the right-hand side. There is one exception, however: only one of the four /u:/ tokens is really back; the other three are extremely fronted.

In order to re-inspect the data, we will henceforth concentrate on /u:/:

ggplot(ae_midpoints%>%filter(labels=="u:")) +
  aes(x=T2,y=T1,label=labels,col=labels) +
  geom_text() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

In order to find out why three out of four /u:/ tokens are so front, we should find out which words they occur in; this can be done by examining which words the four /u:/ tokens are linked to (by means of requery_hier()):

ae_midpoints$Word = requery_hier(ae,seglist = ae_vowels, level = "Text")$labels
ggplot(ae_midpoints%>%filter(labels=="u:")) +
  aes(x=T2,y=T1,label=Word,col=labels) +
  geom_text() +
  scale_y_reverse() + scale_x_reverse() + 
  labs(x = "F2 (Hz)", y = "F1 (Hz)") +
  theme(legend.position="none")

As we can see, the back /u:/ comes from the word “to”, whereas the front vowels are linked to the words “new”, “beautiful”, and “futile”. All three words have in common that /u:/ should be preceded by /j/. This could cause the fronting of /u:/.

However, we should test whether our assumption is true. We will now query the sequences of the preceding consonant and /u:/, and analyse these sequences’ F2 trajectories:

Cu = query(emuDBhandle = ae,query = "[Phonetic=~ .* -> Phonetic== u:]")
Cu_formants = get_trackdata(ae, seglist = Cu,ssffTrackName = "fm", resultType = "emuRtrackdata")
ggplot(Cu_formants) +
  aes(x=times_rel,y=T2,col=labels,group=sl_rowIdx) +
  geom_line() +
  labs(x = "Duration (ms)", y = "F2 (Hz)")

In the word “to”, the preceding segment is labelled “H”, i.e. the aspiration of /t/. You can clearly see in the plot that the F2 trajectory starts from a relatively high F2 locus; however, this locus is still much lower than F2 in /j/ (which is, of course, very similar to F2 in an /i:/ vowel). We can therefore conclude that the preceding /j/ is causing /u:/ to front in that context.

This little analysis was very dependent on several different kinds of queries and re-queries, and we would like to introduce you to the main concepts of these functions:

1. Simple queries with `query()`

We will start with very basic queries. The function for conducting queries is simply called query(); this function needs at least two arguments, emuDBhandle and query, e.g.:

V = query(emuDBhandle = ae,query = "[Phonetic==V]")

The expression "[Phonetic==V]" is a legal expression in the EMU Query Language (EQL; for details see below) and could be translated as “which labels on the level Phonetic are equal to the label ‘V’?” (‘V’ is the SAMPA symbol for the English vowel /ʌ/, i.e. the vowel in words like <cut>).

1.1 Results of `query()`: segment lists

An emuR segment list is a list of segment descriptors. Each segment descriptor describes a sequence of annotation elements. The list is usually a result of an emuDB query using function query like in the present example. query has found three tokens of [V]:

## segment  list from database:  ae 
## query was:  [Phonetic==V] 
##   labels    start      end session   bundle    level    type
## 1      V  187.425  256.925    0000 msajc003 Phonetic SEGMENT
## 2      V  340.175  426.675    0000 msajc003 Phonetic SEGMENT
## 3      V 1943.175 2037.425    0000 msajc057 Phonetic SEGMENT

This object is an attributed data.frame, with one row per segment descriptor:

Data frame columns

labels: labels or sequenced labels of segments concatenated by ‘->’
start: onset time in milliseconds
end: offset time in milliseconds
session: session name
bundle: bundle name (= utterance name)
level: name of the level that has been searched
type: type of “segment” row: ITEM: symbolic item, EVENT: event item, SEGMENT: segment

Additional hidden columns

db_uuid: UUID of emuDB (= a unique identifier)
startItemID: item ID of first element of sequence
endItemID: item ID of last element of sequence
sampleStart: start sample position
sampleEnd: end sample position
sampleRate: sample rate

Attributes

database: name of emuDB
query: Query string

This makes it easy to access specific pieces of information, e.g.

#Get labels:
V$labels

## [1] "V" "V" "V"

#Get start times:
V$start

## [1]  187.425  340.175 1943.175

#Get end times:
V$end

## [1]  256.925  426.675 2037.425

#durations of the [V]s 
V$end - V$start

## [1] 69.50 86.50 94.25

#for the latter, there is also a special function in emuR:

dur(V)

## [1] 69.50 86.50 94.25
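Because a segment list is essentially a data frame, ordinary R tools can be used on it. The following sketch does not need emuR at all: it rebuilds the three [V] tokens shown above as a plain data frame (the name V_toy and the derived columns are ours) and summarises their durations:

```r
# toy copy of the segment list shown above (times in milliseconds)
V_toy = data.frame(
  labels = c("V", "V", "V"),
  start  = c(187.425, 340.175, 1943.175),
  end    = c(256.925, 426.675, 2037.425),
  bundle = c("msajc003", "msajc003", "msajc057")
)

# duration of each token, computed as end minus start
V_toy$duration = V_toy$end - V_toy$start
V_toy$duration

# mean duration over all tokens and per bundle
mean_dur = mean(V_toy$duration)
dur_by_bundle = aggregate(duration ~ bundle, data = V_toy, FUN = mean)
dur_by_bundle
```

With a real segment list, the same calls should work on the object returned by query(), since the start, end and bundle columns have the same names.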

1.2 Inherited times

What happens if we look for a timeless ITEM?

# Phonetic = SEGMENT, Phoneme = ITEM
list_levelDefinitions(ae)

##           name    type nrOfAttrDefs        attrDefNames
## 1    Utterance    ITEM            1          Utterance;
## 2 Intonational    ITEM            1       Intonational;
## 3 Intermediate    ITEM            1       Intermediate;
## 4         Word    ITEM            3 Word; Accent; Text;
## 5     Syllable    ITEM            1           Syllable;
## 6      Phoneme    ITEM            1            Phoneme;
## 7     Phonetic SEGMENT            1           Phonetic;
## 8         Tone   EVENT            1               Tone;
## 9         Foot    ITEM            1               Foot;

V_phoneme = query(emuDBhandle = ae, query = "[Phoneme==V]")
V_phoneme
# for comparison: the segment list from the Phonetic level
V

## segment  list from database:  ae 
## query was:  [Phoneme==V] 
##   labels    start      end session   bundle   level type
## 1      V  187.425  256.925    0000 msajc003 Phoneme ITEM
## 2      V  340.175  426.675    0000 msajc003 Phoneme ITEM
## 3      V 1943.175 2037.425    0000 msajc057 Phoneme ITEM

## segment  list from database:  ae 
## query was:  [Phonetic==V] 
##   labels    start      end session   bundle    level    type
## 1      V  187.425  256.925    0000 msajc003 Phonetic SEGMENT
## 2      V  340.175  426.675    0000 msajc003 Phonetic SEGMENT
## 3      V 1943.175 2037.425    0000 msajc057 Phonetic SEGMENT

As you can see, V and V_phoneme both contain times, although Phoneme is a timeless ITEM level. The times are inherited from the SEGMENT level Phonetic. This, of course, only works if the Phoneme and Phonetic levels are linked (and they are linked, see also Figure 1):

list_linkDefinitions(ae)

##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

If the ITEM we are interested in was linked to several time-aligned segments, we would have to use query’s parameter timeRefSegmentLevel to choose the segment level from which query derives time information. However, this is not the case here.

The calculation of inherited times can be time-consuming. In many cases, we may not be interested in time information, but only in the labels; we therefore can turn off the calculation of inherited times with an additional parameter: calcTimes = FALSE:

V_phoneme2=query(emuDBhandle = ae,query = "[Phoneme==V]",calcTimes = FALSE)
V_phoneme2

## segment  list from database:  ae 
## query was:  [Phoneme==V] 
##   labels start end session   bundle   level type
## 1      V    NA  NA    0000 msajc003 Phoneme ITEM
## 2      V    NA  NA    0000 msajc003 Phoneme ITEM
## 3      V    NA  NA    0000 msajc057 Phoneme ITEM

In this case, all entries in start and end are NA (= Not Available).
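The effect of calcTimes can be checked directly by running the same query twice. This is a minimal sketch, assuming the demo data may be created in tempdir(); the names demo_root, with_times and without_times are ours:

```r
library(emuR)

# create the demo data once (assumption: tempdir() is writable)
demo_root = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_root)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_root, "ae_emuDB"), verbose = FALSE)

# same query, once with and once without inherited times
with_times    = query(ae, "[Phoneme == V]")
without_times = query(ae, "[Phoneme == V]", calcTimes = FALSE)

# both calls find the same number of items ...
nrow(with_times) == nrow(without_times)

# ... but only the first one carries start and end times
all(is.na(without_times$start))
all(is.na(without_times$end))
```

On a database as small as ae the difference in run time is negligible; the option is mainly of interest for large databases.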

1.1.3 `requery_hier()` and `requery_seq()`

1.1.3.1 Relation types

There are two (self-explanatory) types of relations in the EMU-SDMS:

dominance
sequence

1.1.3.1.1 Dominance

By which words are the “V”s dominated? We could find out by a hierarchical re-query:

#find all "V"-labels in `ae`
V=query(emuDBhandle = ae,query = "[Phonetic==V]")

Now put this segment list into requery_hier() and look for the linked ITEM in level Word, attribute Text:

(V_Text = requery_hier(emuDBhandle = ae,seglist = V,level = "Text"))

## segment  list from database:  ae 
## query was:  FROM REQUERY 
##      labels    start      end session   bundle level type
## 1   amongst  187.425  674.175    0000 msajc003  Text ITEM
## 2   amongst  187.425  674.175    0000 msajc003  Text ITEM
## 3 customers 1824.425 2367.775    0000 msajc057  Text ITEM

The result contains the ITEM labels and the calculated times of the corresponding words.

1.1.3.1.2 Sequence

You might also want to know the sequential context of each “V”, e.g. the subsequent segment. For this, we use the sequential structure of the database and the command requery_seq() with offset = 1 (offset = -1 would find the sound that precedes ‘V’):

requery_seq(emuDBhandle = ae,seglist = V,offset = 1)

## segment  list from database:  ae 
## query was:  FROM REQUERY 
##   labels    start      end session   bundle    level    type
## 1      m  256.925  340.175    0000 msajc003 Phonetic SEGMENT
## 2      N  426.675  483.425    0000 msajc003 Phonetic SEGMENT
## 3      s 2037.425 2085.175    0000 msajc057 Phonetic SEGMENT

We will discuss both commands more extensively later in the seminar, but wanted to show that it is possible to use the annotation structure and a given segment list to retrieve additional information afterwards. We could use both commands to express more complex queries: e.g. we could look for all “V” within the word “amongst” by querying “V”, then requerying all linked words, and then deleting all “V” that are not linked to “amongst”. However, this would be rather cumbersome. A much easier way to conduct more complicated queries is to use all the possibilities of emuR’s query language EQL within the command query(). However, before we can use more complex queries, we will have to learn the Emu Query Language.
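The cumbersome query, requery and filter route described above could look roughly like this. It is a sketch only, assuming that the demo data may be created in tempdir() and that requery_hier() returns one row per row of the input segment list in the same order (the earlier vowel example relies on the same assumption); the names demo_root, V_words and V_in_amongst are ours:

```r
library(emuR)

# create the demo data once (assumption: tempdir() is writable)
demo_root = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_root)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_root, "ae_emuDB"), verbose = FALSE)

# step 1: query all "V" segments
V = query(ae, "[Phonetic == V]")

# step 2: requery the words these segments are linked to
V_words = requery_hier(ae, seglist = V, level = "Text")

# step 3: keep only those "V" whose word is "amongst"
V_in_amongst = V[V_words$labels == "amongst", ]
V_in_amongst
```

Three separate steps and an implicit row alignment are needed here, which is why a single EQL query is usually preferable.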

2. The Emu Query Language `EQL`

To learn about the functionality of the EQL, you can always type

vignette("EQL")

As we have seen above, any query must be placed within " ", and any query can be placed within [ ]. You minimally have to give a level and some sort of representation for a label (this may be a regular expression), unless you use one of the position and count functions (see below).

2.1 Single argument queries

2.1.1 Equality/inequality/matching/non-matching

2.1.1.1 Equality

In the examples above, we had looked for the equality of the labels to “V” on the level “Phonetic” (in the database ae):

query(emuDBhandle = ae, query = "Phonetic == V")

## segment  list from database:  ae 
## query was:  Phonetic == V 
##   labels    start      end session   bundle    level    type
## 1      V  187.425  256.925    0000 msajc003 Phonetic SEGMENT
## 2      V  340.175  426.675    0000 msajc003 Phonetic SEGMENT
## 3      V 1943.175 2037.425    0000 msajc057 Phonetic SEGMENT

So “==” is the equality operator. For backward compatibility with earlier versions of emuR, a single “=” is also allowed (but we ask you to prefer “==” instead):

query(emuDBhandle = ae, query = "Phonetic = V")

## segment  list from database:  ae 
## query was:  Phonetic = V 
##   labels    start      end session   bundle    level    type
## 1      V  187.425  256.925    0000 msajc003 Phonetic SEGMENT
## 2      V  340.175  426.675    0000 msajc003 Phonetic SEGMENT
## 3      V 1943.175 2037.425    0000 msajc057 Phonetic SEGMENT

2.1.1.2 Inequality

We can also search for everything except “V” by using !=:

query(emuDBhandle = ae, query = "Phonetic != V")

(We do not show the resulting segment list because it is very long.) So one way to get ‘everything’ would be to query for everything that is not equal to a label that is probably not in your database, like “xyz”. However, there is a much better way: using so-called regular expressions. To use these, you have to type “=~”, followed by the regular expression, in this case .* (meaning: any character (.) zero or more times (*)). Please do not worry too much about regular expressions. This example will probably be the only one in this seminar:

Everything1 = query(emuDBhandle = ae, query = "Phonetic != xyz")
Everything2 = query(emuDBhandle = ae, query = "Phonetic =~ .*")
any(Everything1 != Everything2) # should result in FALSE if both are equal everywhere

## [1] FALSE

You can also negate the latter operator by “!~”. An example would be:

# What is the query to retrieve all ITEMs in the “Text” level that don’t begin with ‘a’?
query(emuDBhandle = ae, query = "Text !~ a.*")

## segment  list from database:  ae 
## query was:  Text !~ a.* 
##        labels    start      end session   bundle level type
## 1         her  674.175  739.925    0000 msajc003  Text ITEM
## 2     friends  739.925 1289.425    0000 msajc003  Text ITEM
## 3         she 1289.425 1463.175    0000 msajc003  Text ITEM
## 4         was 1463.175 1634.425    0000 msajc003  Text ITEM
## 5  considered 1634.425 2150.175    0000 msajc003  Text ITEM
## 6   beautiful 2033.675 2604.425    0000 msajc003  Text ITEM
## 7          it  299.975  411.675    0000 msajc010  Text ITEM
## 8          is  411.675  571.925    0000 msajc010  Text ITEM
## 9      futile  571.925 1090.975    0000 msajc010  Text ITEM
## 10         to 1090.975 1222.325    0000 msajc010  Text ITEM
## 11      offer 1222.325 1391.025    0000 msajc010  Text ITEM
## 12    further 1628.475 1957.775    0000 msajc010  Text ITEM
## 13 resistance 1957.775 2753.975    0000 msajc010  Text ITEM
## 14        the  299.975  379.525    0000 msajc012  Text ITEM
## 15      chill  379.525  744.525    0000 msajc012  Text ITEM
## 16       wind  744.525 1082.975    0000 msajc012  Text ITEM
## 17     caused 1082.975 1456.475    0000 msajc012  Text ITEM
## 18       them 1456.475 1564.975    0000 msajc012  Text ITEM
## 19         to 1564.975 1650.975    0000 msajc012  Text ITEM
## 20     shiver 1650.975 1994.975    0000 msajc012  Text ITEM
## 21  violently 1994.975 2692.325    0000 msajc012  Text ITEM
## 22         he  299.975  425.375    0000 msajc015  Text ITEM
## 23 emphasized  425.375 1129.075    0000 msajc015  Text ITEM
## 24        his 1129.075 1368.075    0000 msajc015  Text ITEM
## 25  strengths 1213.075 1797.425    0000 msajc015  Text ITEM
## 26      while 1797.425 2104.075    0000 msajc015  Text ITEM
## 27 concealing 2104.075 2693.675    0000 msajc015  Text ITEM
## 28        his 2693.675 2780.725    0000 msajc015  Text ITEM
## 29 weaknesses 2780.725 3456.825    0000 msajc015  Text ITEM
## 30     itches  299.975  662.425    0000 msajc022  Text ITEM
## 31         so 1113.675 1400.675    0000 msajc022  Text ITEM
## 32   tempting 1400.675 1806.275    0000 msajc022  Text ITEM
## 33         to 1806.275 1890.275    0000 msajc022  Text ITEM
## 34    scratch 1890.275 2469.525    0000 msajc022  Text ITEM
## 35       I'll  299.975  513.925    0000 msajc023  Text ITEM
## 36      hedge  513.925  819.025    0000 msajc023  Text ITEM
## 37         my  819.025 1038.775    0000 msajc023  Text ITEM
## 38       bets 1038.775 1421.925    0000 msajc023  Text ITEM
## 39       take 1495.275 1774.925    0000 msajc023  Text ITEM
## 40         no 1774.925 1964.425    0000 msajc023  Text ITEM
## 41      risks 1964.425 2554.175    0000 msajc023  Text ITEM
## 42       this  299.975  475.775    0000 msajc057  Text ITEM
## 43        new  475.775  666.675    0000 msajc057  Text ITEM
## 44    display  666.675 1211.175    0000 msajc057  Text ITEM
## 45       more 1578.675 1824.425    0000 msajc057  Text ITEM
## 46  customers 1824.425 2367.775    0000 msajc057  Text ITEM
## 47       than 2367.775 2480.425    0000 msajc057  Text ITEM
## 48       ever 2480.425 2794.925    0000 msajc057  Text ITEM

So, there are four similar operators: two for matching (equality and regular expression matching) and two for their negations:

Symbol	Meaning
`==`	equality
`=~`	regular expression matching
`!=`	inequality
`!~`	regular expression non-matching

2.1.1.3 The `OR` operator

Use | to search for any of several alternative labels; e.g. ‘m’ or ‘n’ can be retrieved via:

query(emuDBhandle = ae, query = "Phonetic == m|n")

## segment  list from database:  ae 
## query was:  Phonetic == m|n 
##    labels    start      end session   bundle    level    type
## 1       m  256.925  340.175    0000 msajc003 Phonetic SEGMENT
## 2       n 1031.925 1195.925    0000 msajc003 Phonetic SEGMENT
## 3       n 1741.425 1791.425    0000 msajc003 Phonetic SEGMENT
## 4       n 1515.475 1554.475    0000 msajc010 Phonetic SEGMENT
## 5       n 2430.975 2528.475    0000 msajc010 Phonetic SEGMENT
## 6       n  894.975 1022.975    0000 msajc012 Phonetic SEGMENT
## 7       m 1490.425 1564.975    0000 msajc012 Phonetic SEGMENT
## 8       n 2402.275 2474.875    0000 msajc012 Phonetic SEGMENT
## 9       m  496.575  558.575    0000 msajc015 Phonetic SEGMENT
## 10      n 2226.575 2271.075    0000 msajc015 Phonetic SEGMENT
## 11      n 3046.125 3067.675    0000 msajc015 Phonetic SEGMENT
## 12      m 1587.175 1655.675    0000 msajc022 Phonetic SEGMENT
## 13      m  819.025  902.925    0000 msajc023 Phonetic SEGMENT
## 14      n 1434.775 1495.275    0000 msajc023 Phonetic SEGMENT
## 15      n 1774.925 1833.925    0000 msajc023 Phonetic SEGMENT
## 16      n  508.675  543.975    0000 msajc057 Phonetic SEGMENT
## 17      m 1629.675 1709.175    0000 msajc057 Phonetic SEGMENT
## 18      m 2173.425 2233.425    0000 msajc057 Phonetic SEGMENT
## 19      n 2447.675 2480.425    0000 msajc057 Phonetic SEGMENT

You can expand this as well:

mnN = query(emuDBhandle = ae, query = "Phonetic == m | n | N")
summary(mnN)

## segment  list from database:  ae 
## query was:  Phonetic == m | n | N 
##  with 23 segments
## 
## Segment distribution:
## 
##  m  n  N 
##  7 12  4

2.2 Complex queries

2.2.1 Sequential and dominance queries

2.2.1.1 Bracketing

In all sequential and dominance queries, bracketing with [ ] is required to structure your query. In simple queries, however, brackets are optional:

mnN = query(emuDBhandle = ae, query = "[Phonetic == m|n|N]")
summary(mnN)

## segment  list from database:  ae 
## query was:  [Phonetic == m|n|N] 
##  with 23 segments
## 
## Segment distribution:
## 
##  m  n  N 
##  7 12  4

However, this sequential query will fail, because of missing brackets:

query(ae, "Phonetic == V -> Phonetic == m")

2.2.1.2 Sequential queries

Use the -> operator to find sequences of segments:

query(ae, "[Phonetic == V -> Phonetic == m]")

## segment  list from database:  ae 
## query was:  [Phonetic == V -> Phonetic == m] 
##   labels   start     end session   bundle    level    type
## 1   V->m 187.425 340.175    0000 msajc003 Phonetic SEGMENT

Note: all row entries in the resulting segment list have the start time of “V”, the end time of “m” and their labels will be “V->m”. Change this with the so-called result modifier hash tag “#”:

query(ae, "[#Phonetic == V -> Phonetic == m]") # finds V, if V is followed by m

## segment  list from database:  ae 
## query was:  [#Phonetic == V -> Phonetic == m] 
##   labels   start     end session   bundle    level    type
## 1      V 187.425 256.925    0000 msajc003 Phonetic SEGMENT

query(ae, "[Phonetic == V -> #Phonetic == m]") #finds m, if m is preceded by V

## segment  list from database:  ae 
## query was:  [Phonetic == V -> #Phonetic == m] 
##   labels   start     end session   bundle    level    type
## 1      m 256.925 340.175    0000 msajc003 Phonetic SEGMENT

Keep in mind that only one hash tag per query is allowed.

You can search for sequences of sequences, but you have to use bracketing; otherwise, you get an error, as in

query(ae, "[Phonetic == @ -> Phonetic == n  -> Phonetic == s]")

The correct code would be as follows:

query(ae, "[[Phonetic == @ -> Phonetic == n ] -> Phonetic == s]")

## segment  list from database:  ae 
## query was:  [[Phonetic == @ -> Phonetic == n ] -> Phonetic == s] 
##    labels    start      end session   bundle    level    type
## 1 @->n->s 1715.425 1893.175    0000 msajc003 Phonetic SEGMENT
## 2 @->n->s 2382.475 2753.975    0000 msajc010 Phonetic SEGMENT
## 3 @->n->s 2200.875 2408.575    0000 msajc015 Phonetic SEGMENT

A much more complex example would be:

## What is the query to retrieve all sequences of ITEMs containing labels “offer” followed by two arbitrary labels followed by “resistance”?
query(ae, "[[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance]")

## segment  list from database:  ae 
## query was:  [[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance] 
##                            labels    start      end session   bundle level
## 1 offer->any->further->resistance 1957.775 2753.975    0000 msajc010  Text
##   type
## 1 ITEM
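Nesting the brackets by hand becomes error-prone for longer sequences. The following base R sketch builds the left-nested query string from a vector of simple terms; build_seq_query() is a hypothetical helper of ours, not an emuR function, and it only produces the string:

```r
# hypothetical helper: combine simple EQL terms into one left-nested
# sequence query, i.e. [[[t1 -> t2] -> t3] -> t4]
build_seq_query = function(terms) {
  stopifnot(is.character(terms), length(terms) >= 2)
  # Reduce() wraps the result so far and the next term in one pair of brackets
  Reduce(function(left, right) paste0("[", left, " -> ", right, "]"), terms)
}

seq_query = build_seq_query(c("Text == offer", "Text =~ .*",
                              "Text =~ .*", "Text == resistance"))
seq_query
```

The resulting string has the same bracket structure as the hand-written query above and could be passed to query() as its query argument.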

2.2.1.3 Domination queries

Use the operator ^ for all queries in which two linked levels are involved, e.g.:

list_linkDefinitions(ae)

##           type superlevelName sublevelName
## 1  ONE_TO_MANY      Utterance Intonational
## 2  ONE_TO_MANY   Intonational Intermediate
## 3  ONE_TO_MANY   Intermediate         Word
## 4  ONE_TO_MANY           Word     Syllable
## 5  ONE_TO_MANY       Syllable      Phoneme
## 6 MANY_TO_MANY        Phoneme     Phonetic
## 7  ONE_TO_MANY       Syllable         Tone
## 8  ONE_TO_MANY   Intonational         Foot
## 9  ONE_TO_MANY           Foot     Syllable

## What is the query to retrieve all ITEMs containing the label “p” in the “Phoneme” level that occur in strong syllables (i.e. dominated by / linked to ITEMs of the level “Syllable” that contain the label “S” (=STRONG, as opposed to "W"=WEAK))?
query(ae, "[Phoneme == p ^ Syllable == S]")

## segment  list from database:  ae 
## query was:  [Phoneme == p ^ Syllable == S] 
##   labels    start      end session   bundle   level type
## 1      p  558.575  639.575    0000 msajc015 Phoneme ITEM
## 2      p 1655.675 1698.675    0000 msajc022 Phoneme ITEM
## 3      p  863.675  970.425    0000 msajc057 Phoneme ITEM

However, the operator is not directional; although “Syllable” dominates “Phoneme”, you could have asked

query(ae, "[Syllable == S ^ #Phoneme == p]")

## segment  list from database:  ae 
## query was:  [Syllable == S ^ #Phoneme == p] 
##   labels    start      end session   bundle   level type
## 1      p  558.575  639.575    0000 msajc015 Phoneme ITEM
## 2      p 1655.675 1698.675    0000 msajc022 Phoneme ITEM
## 3      p  863.675  970.425    0000 msajc057 Phoneme ITEM

So, “^” should not be translated as “is dominated by”, but rather as “is linked to”. However, you have to use the hash tag in order to get the labels and times of the Phoneme level here. You can leave out the hash tag if the level you are interested in is the first one in your query.
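That both formulations return the same items can also be checked programmatically rather than by comparing the printed output. A minimal sketch, assuming the demo data may be created in tempdir(); the names demo_root, p_first and p_hashed are ours:

```r
library(emuR)

# create the demo data once (assumption: tempdir() is writable)
demo_root = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_root)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_root, "ae_emuDB"), verbose = FALSE)

# Phoneme is named first, so it is returned by default
p_first  = query(ae, "[Phoneme == p ^ Syllable == S]")

# Syllable is named first; the hash tag selects Phoneme as the result
p_hashed = query(ae, "[Syllable == S ^ #Phoneme == p]")

# same labels and same times in both segment lists
all(p_first$labels == p_hashed$labels)
all(p_first$start == p_hashed$start)
all(p_first$end == p_hashed$end)
```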

You can query multiple dominations; however, as in the sequencing case, you have to use brackets:

## What is the query to retrieve all ITEMs on the “Phonetic” level that are part of a strong syllable (labeled “S”) and belong to the words “amongst” or “beautiful”?
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")

## segment  list from database:  ae 
## query was:  [[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful] 
##   labels    start      end session   bundle    level    type
## 1      m  256.925  340.175    0000 msajc003 Phonetic SEGMENT
## 2      V  340.175  426.675    0000 msajc003 Phonetic SEGMENT
## 3      N  426.675  483.425    0000 msajc003 Phonetic SEGMENT
## 4      s  483.425  566.925    0000 msajc003 Phonetic SEGMENT
## 5      t  566.925  596.675    0000 msajc003 Phonetic SEGMENT
## 6      H  596.675  674.175    0000 msajc003 Phonetic SEGMENT
## 7     db 2033.675 2150.175    0000 msajc003 Phonetic SEGMENT
## 8      j 2150.175 2211.175    0000 msajc003 Phonetic SEGMENT
## 9     u: 2211.175 2283.675    0000 msajc003 Phonetic SEGMENT

# same as
query(ae, "[[#Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")

## to get the "Text"-items instead, use
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful]")

## segment  list from database:  ae 
## query was:  [[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful] 
##      labels    start      end session   bundle level type
## 1   amongst  187.425  674.175    0000 msajc003  Text ITEM
## 2 beautiful 2033.675 2604.425    0000 msajc003  Text ITEM

2.2.1.4 Functions

There are three position functions and one count function. As the latter returns a number, queries with it involve a comparison with a number (using one of “==”, “!=”, “>”, “>=”, “<”, “<=”, see below). The result of the position functions is logical; we therefore ask whether a certain condition is TRUE or FALSE.

2.2.1.4.1 Position functions

There are three position functions, Start(), Medial(), and End(). Example queries are:

## What is the query to retrieve all word-initial syllables?
## (NB: syllable labels are either "W" or "S")
query(ae, "[Start(Word, Syllable) == TRUE]")

## segment  list from database:  ae 
## query was:  [Start(Word, Syllable) == TRUE] 
##   labels    start      end session   bundle    level type
## 1      W  187.425  256.925    0000 msajc003 Syllable ITEM
## 2      S  674.175  739.925    0000 msajc003 Syllable ITEM
## 3      S  739.925 1289.425    0000 msajc003 Syllable ITEM
## 4      W 1289.425 1463.175    0000 msajc003 Syllable ITEM
## 5      W 1463.175 1634.425    0000 msajc003 Syllable ITEM
## 6      W 1634.425 1791.425    0000 msajc003 Syllable ITEM

...

Examples for Medial() and End() are:

## What is the query to retrieve all word-medial syllables?
query(ae, "[Medial(Word, Syllable) == TRUE]")
## What is the query to retrieve all word-final syllables?
query(ae, "[End(Word, Syllable) == TRUE]")

## segment  list from database:  ae 
## query was:  [Medial(Word, Syllable) == TRUE] 
##   labels    start      end session   bundle    level type
## 1      S 1791.425 1945.425    0000 msajc003 Syllable ITEM
## 2      W 2283.675 2361.925    0000 msajc003 Syllable ITEM
## 3      S 2078.475 2228.475    0000 msajc010 Syllable ITEM
## 4      W 2219.975 2304.775    0000 msajc012 Syllable ITEM
## 5      W 2304.775 2533.975    0000 msajc012 Syllable ITEM
## 6      W  639.575  706.575    0000 msajc015 Syllable ITEM

...

## segment  list from database:  ae 
## query was:  [End(Word, Syllable) == TRUE] 
##   labels    start      end session   bundle    level type
## 1      S  256.925  674.175    0000 msajc003 Syllable ITEM
## 2      S  674.175  739.925    0000 msajc003 Syllable ITEM
## 3      S  739.925 1289.425    0000 msajc003 Syllable ITEM
## 4      W 1289.425 1463.175    0000 msajc003 Syllable ITEM
## 5      W 1463.175 1634.425    0000 msajc003 Syllable ITEM
## 6      W 1945.425 2150.175    0000 msajc003 Syllable ITEM

...

Everything not being first or last element is medial:

query(ae, "[Medial(Word, Phoneme) == TRUE]")

## segment  list from database:  ae 
## query was:  [Medial(Word, Phoneme) == TRUE] 
##   labels   start      end session   bundle   level type
## 1      m 256.925  340.175    0000 msajc003 Phoneme ITEM
## 2      V 340.175  426.675    0000 msajc003 Phoneme ITEM
## 3      N 426.675  483.425    0000 msajc003 Phoneme ITEM
## 4      s 483.425  566.925    0000 msajc003 Phoneme ITEM
## 5      r 892.675  949.925    0000 msajc003 Phoneme ITEM
## 6      E 949.925 1031.925    0000 msajc003 Phoneme ITEM

...
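The relation between the three position functions can be checked directly. The following is only a sketch: it uses tempdir() instead of mypath, and it assumes that the segment lists returned by query() contain the columns session, bundle and start_item_id (as they do in current emuR versions). The helper item_key() is a hypothetical name introduced here for illustration.

```r
library(emuR)

# set up the demo database; tempdir() stands in for the lecture's mypath (assumption)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
ae <- load_emuDB(file.path(tempdir(), "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# one query per position function, all on syllables within words
first  <- query(ae, "[Start(Word, Syllable) == TRUE]")
medial <- query(ae, "[Medial(Word, Syllable) == TRUE]")
last   <- query(ae, "[End(Word, Syllable) == TRUE]")

# identify each syllable by session, bundle and item id
# (hypothetical helper; column names are an assumption, see above)
item_key <- function(sl) paste(sl$session, sl$bundle, sl$start_item_id)

# medial syllables should share no item with the first or the last syllables
overlap_first <- intersect(item_key(medial), item_key(first))
overlap_last  <- intersect(item_key(medial), item_key(last))
length(overlap_first)
length(overlap_last)

# in a one-syllable word, the same syllable is both first and last
both <- intersect(item_key(first), item_key(last))
length(both)
```

Note that Start() and End() are not mutually exclusive: the only syllable of a monosyllabic word is returned by both, whereas Medial() excludes it.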

2.2.1.4.2 Count function

The count function is Num(). Num(x, y) counts how many items of y each item of x contains, and the result is compared with a number. You can therefore ask questions like the following:

## What is the query to retrieve all words that contain two syllables?
query(ae, "[Num(Text, Syllable) == 2]")

## segment  list from database:  ae 
## query was:  [Num(Text, Syllable) == 2] 
##      labels    start      end session   bundle level type
## 1   amongst  187.425  674.175    0000 msajc003  Text ITEM
## 2    futile  571.925 1090.975    0000 msajc010  Text ITEM
## 3       any 1436.725 1628.475    0000 msajc010  Text ITEM
## 4   further 1628.475 1957.775    0000 msajc010  Text ITEM
## 5    shiver 1650.975 1994.975    0000 msajc012  Text ITEM
## 6    itches  299.975  662.425    0000 msajc022  Text ITEM
## 7    always  775.475 1280.175    0000 msajc022  Text ITEM
## 8  tempting 1400.675 1806.275    0000 msajc022  Text ITEM
## 9   display  666.675 1211.175    0000 msajc057  Text ITEM
## 10 attracts 1211.175 1578.675    0000 msajc057  Text ITEM
## 11     ever 2480.425 2794.925    0000 msajc057  Text ITEM

## What is the query to retrieve all syllables that contain more than four phonemes?
query(ae, "[Num(Syllable, Phoneme) > 4]")

## segment  list from database:  ae 
## query was:  [Num(Syllable, Phoneme) > 4] 
##   labels    start      end session   bundle    level type
## 1      S  256.925  674.175    0000 msajc003 Syllable ITEM
## 2      S  739.925 1289.425    0000 msajc003 Syllable ITEM
## 3      W 2228.475 2753.975    0000 msajc010 Syllable ITEM
## 4      S 1213.075 1797.425    0000 msajc015 Syllable ITEM
## 5      S 1890.275 2469.525    0000 msajc022 Syllable ITEM
## 6      S 1964.425 2554.175    0000 msajc023 Syllable ITEM
## 7      S 1247.925 1578.675    0000 msajc057 Syllable ITEM
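Because the query is an ordinary character string, the number in a Num() query can be filled in programmatically. The sketch below counts the words with one, two and three syllables; tempdir() is again used instead of mypath, and the variable names n_syll and word_counts are chosen here for illustration only.

```r
library(emuR)

# set up the demo database; tempdir() stands in for the lecture's mypath (assumption)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
ae <- load_emuDB(file.path(tempdir(), "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# syllable counts to ask for
n_syll <- 1:3

# build one Num() query per count and store the number of matching words
word_counts <- sapply(n_syll, function(n) {
  q <- sprintf("[Num(Text, Syllable) == %d]", n)
  nrow(query(ae, q))
})
names(word_counts) <- n_syll

# number of words per syllable count
word_counts
```

The entry for two syllables should correspond to the eleven words listed above.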

2.2.1.5 Conjunction

You can use & to combine conditions on several attribute definitions of the same level. For example, the level Word in ae has several attribute definitions:

list_attributeDefinitions(ae,level="Word")

##     name level   type hasLabelGroups hasLegalLabels
## 1   Word  Word STRING          FALSE          FALSE
## 2 Accent  Word STRING          FALSE          FALSE
## 3   Text  Word STRING          FALSE          FALSE

We could, therefore, look for all accented (“S”) words by …

query(ae, "[Text =~.* & Accent == S]")

## segment  list from database:  ae 
## query was:  [Text =~.* & Accent == S] 
##        labels    start      end session   bundle level type
## 1     amongst  187.425  674.175    0000 msajc003  Text ITEM
## 2     friends  739.925 1289.425    0000 msajc003  Text ITEM
## 3   beautiful 2033.675 2604.425    0000 msajc003  Text ITEM
## 4      futile  571.925 1090.975    0000 msajc010  Text ITEM
## 5     further 1628.475 1957.775    0000 msajc010  Text ITEM
## 6  resistance 1957.775 2753.975    0000 msajc010  Text ITEM
## 7       chill  379.525  744.525    0000 msajc012  Text ITEM
## 8        wind  744.525 1082.975    0000 msajc012  Text ITEM
## 9      caused 1082.975 1456.475    0000 msajc012  Text ITEM
## 10     shiver 1650.975 1994.975    0000 msajc012  Text ITEM
## 11  violently 1994.975 2692.325    0000 msajc012  Text ITEM
## 12 emphasized  425.375 1129.075    0000 msajc015  Text ITEM
## 13  strengths 1213.075 1797.425    0000 msajc015  Text ITEM
## 14 concealing 2104.075 2693.675    0000 msajc015  Text ITEM
## 15 weaknesses 2780.725 3456.825    0000 msajc015  Text ITEM
## 16     itches  299.975  662.425    0000 msajc022  Text ITEM
## 17     always  775.475 1280.175    0000 msajc022  Text ITEM
## 18         so 1113.675 1400.675    0000 msajc022  Text ITEM
## 19   tempting 1400.675 1806.275    0000 msajc022  Text ITEM
## 20    scratch 1890.275 2469.525    0000 msajc022  Text ITEM
## 21         no 1774.925 1964.425    0000 msajc023  Text ITEM
## 22      risks 1964.425 2554.175    0000 msajc023  Text ITEM
## 23    display  666.675 1211.175    0000 msajc057  Text ITEM
## 24       more 1578.675 1824.425    0000 msajc057  Text ITEM
## 25       ever 2480.425 2794.925    0000 msajc057  Text ITEM
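In this conjunction, the first condition (Text =~.*) matches any label and mainly determines which attribute's labels are returned. A small sketch of the difference, again using tempdir() instead of mypath, with by_text and by_accent as illustrative variable names:

```r
library(emuR)

# set up the demo database; tempdir() stands in for the lecture's mypath (assumption)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
ae <- load_emuDB(file.path(tempdir(), "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# accented words, returned with their orthographic labels (attribute Text)
by_text <- query(ae, "[Text =~.* & Accent == S]")

# the accented items queried via Accent only: the labels are the accent marks
by_accent <- query(ae, "[Accent == S]")

# compare the number of items and the labels that come back
nrow(by_text)
nrow(by_accent)
head(by_text$labels)
head(by_accent$labels)
```

Both queries refer to items of the level Word; only the attribute whose labels are reported differs.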

Another use of “&” is to combine a basic query with a function, e.g.:

## What is the query to retrieve the text of all words that contain a non-word-final “S” syllable?
query(ae, "[[Syllable == S  &  End(Word, Syllable) == FALSE]^#Text=~.*]")

## segment  list from database:  ae 
## query was:  [[Syllable == S  &  End(Word, Syllable) == FALSE]^#Text=~.*] 
##        labels    start      end session   bundle level type
## 1  considered 1634.425 2150.175    0000 msajc003  Text ITEM
## 2   beautiful 2033.675 2604.425    0000 msajc003  Text ITEM
## 3      futile  571.925 1090.975    0000 msajc010  Text ITEM
## 4         any 1436.725 1628.475    0000 msajc010  Text ITEM
## 5     further 1628.475 1957.775    0000 msajc010  Text ITEM
## 6  resistance 1957.775 2753.975    0000 msajc010  Text ITEM
## 7      shiver 1650.975 1994.975    0000 msajc012  Text ITEM
## 8   violently 1994.975 2692.325    0000 msajc012  Text ITEM
## 9  emphasized  425.375 1129.075    0000 msajc015  Text ITEM
## 10 concealing 2104.075 2693.675    0000 msajc015  Text ITEM
## 11 weaknesses 2780.725 3456.825    0000 msajc015  Text ITEM
## 12     itches  299.975  662.425    0000 msajc022  Text ITEM
## 13     always  775.475 1280.175    0000 msajc022  Text ITEM
## 14   tempting 1400.675 1806.275    0000 msajc022  Text ITEM
## 15  customers 1824.425 2367.775    0000 msajc057  Text ITEM
## 16       ever 2480.425 2794.925    0000 msajc057  Text ITEM
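The query above returns words because the result modifier # is placed on Text. Without the dominance part, the conjunction returns the syllables themselves. The following sketch contrasts both variants; tempdir() is used instead of mypath, and sylls and words are illustrative variable names.

```r
library(emuR)

# set up the demo database; tempdir() stands in for the lecture's mypath (assumption)
if (!dir.exists(file.path(tempdir(), "emuR_demoData"))) {
  create_emuRdemoData(dir = tempdir())
}
ae <- load_emuDB(file.path(tempdir(), "emuR_demoData", "ae_emuDB"), verbose = FALSE)

# the conjunction alone: all "S" syllables that are not word-final
sylls <- query(ae, "[Syllable == S & End(Word, Syllable) == FALSE]")

# the same conjunction, but returning the text of the dominating words
words <- query(ae, "[[Syllable == S & End(Word, Syllable) == FALSE]^#Text=~.*]")

# labels and number of results of both variants
unique(sylls$labels)
nrow(sylls)
nrow(words)
```

Since every such syllable belongs to one word, there should be at least as many syllables as words in the two results.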

Lesson 3 - Querying an EMU-SDMS database

Jonathan Harrington / Ulrich Reubold

29 October 2018

0 Example of an analysis

1. Simple queries with `query()`

1.1 Results of `query()`: segment lists

1.2 Inherited times

1.1.3 `requery_hier()` and `requery_seq()`

1.1.3.1 Relation types

1.1.3.1.1 Dominance

1.1.3.1.2 Sequence

2. The Emu Query Language `EQL`

2.1 Single argument queries

2.1.1 Equality/inequality/matching/non-matching

2.1.1.1 Equality

2.1.1.2 Inequality

2.1.1.3 The `OR` operator

2.2 Complex queries

2.2.1 Sequential and dominance queries

2.2.1.1 Bracketing

2.2.1.2 Sequential queries

2.2.1.3 Domination queries

2.2.1.4 Functions

2.2.1.4.1 Position functions

2.2.1.4.2 Count function

2.2.1.5 Conjunction
