In this lecture we will use a demo EMU-SDMS database that comes with the emuR package. We will ‘create’ this database with the function create_emuRdemoData(); the database will be saved at mypath:
# load packages
library(emuR)
library(dplyr)
library(ggplot2)
# create demo data in the directory mypath
# (mypath is a directory of your choice, e.g. the one returned by tempdir())
create_emuRdemoData(dir = mypath)
# create path to demo database
path2ae = file.path(mypath, "emuR_demoData", "ae_emuDB")
# load database
ae = load_emuDB(path2ae, verbose = FALSE)
summary(ae)
## Name: ae
## UUID: 0fc618dc-8980-414d-8c7a-144a649ce199
## Directory: /Users/reubold/myEMURdata/emuR_demoData/ae_emuDB
## Session count: 1
## Bundle count: 7
## Annotation item count: 736
## Label count: 844
## Link count: 785
##
## Database configuration:
##
## SSFF track definitions:
## name columnName fileExtension
## 1 dft dft dft
## 2 fm fm fms
##
## Level definitions:
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
##
## Link definitions:
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
In the level definitions, we see one EVENT level (“Tone”, one point in time), one SEGMENT level (“Phonetic”, with start and end times), and several ITEM levels, e.g. “Syllable” or “Word”, which inherit time information from the level “Phonetic”. In the link definitions, we can see a very rich annotation structure, which results in the following tree-like structure for the first utterance:
serve(ae,autoOpenURL = "https://ips-lmu.github.io/EMU-webApp/?autoConnect=true")
Figure 1: Hierarchy of the first utterance of the database ae
We can also see so-called SSFF track definitions, which means in this case that, amongst other things, pre-calculated formants are available.
Note that all seven utterances were read by the same male speaker, so vowel normalisation is not a concern. He is a speaker of Australian English (hence the database’s name, ae).
We will now present a little example of how such a database could be analysed. To do so, we will use the function query() to query certain segments, get_trackdata() and other functions to read formants into R, and requery_hier() for further re-analysis.
First of all, we want to plot the edges of the Australian English vowel space. To do so, we will query back and front closed, mid, and open vowels.
# query A and V (front and back open vowels),
# i: and u: (front and back closed vowels), and
# E and o: (front and back mid vowels)
ae_vowels = query(emuDBhandle = ae,query = "[Phonetic== V|A|i:|u:|o:|E]")
#get the formants:
ae_formants = get_trackdata(ae, seglist = ae_vowels,ssffTrackName = "fm", resultType = "emuRtrackdata")
#get the formants at the vowels' temporal midpoints:
ae_formants_norm = normalize_length(ae_formants)
ae_midpoints = ae_formants_norm %>% filter(times_norm==0.5)
#plot the vowel space:
ggplot(ae_midpoints) +
aes(x=T2,y=T1,label=labels,col=labels) +
geom_text() +
scale_y_reverse() + scale_x_reverse() +
labs(x = "F2 (Hz)", y = "F1 (Hz)") +
theme(legend.position="none")
This figure shows a vowel space as one would expect it: open vowels are near the bottom, closed vowels are at the top, and mid vowels lie in between. Front vowels are on the left side of the plot, and back vowels are on the right-hand side. However, there is an exception: only one out of four /u:/s is really back; the other three are extremely fronted.
In order to re-inspect the data, we will henceforth concentrate on /u:/:
ggplot(ae_midpoints%>%filter(labels=="u:")) +
aes(x=T2,y=T1,label=labels,col=labels) +
geom_text() +
scale_y_reverse() + scale_x_reverse() +
labs(x = "F2 (Hz)", y = "F1 (Hz)") +
theme(legend.position="none")
In order to find out why three out of four /u:/s are so front, we should find out which words they occur in; this can be done by examining to which words the four /u:/s are linked (by means of requery_hier()):
ae_midpoints$Word = requery_hier(ae,seglist = ae_vowels, level = "Text")$labels
ggplot(ae_midpoints%>%filter(labels=="u:")) +
aes(x=T2,y=T1,label=Word,col=labels) +
geom_text() +
scale_y_reverse() + scale_x_reverse() +
labs(x = "F2 (Hz)", y = "F1 (Hz)") +
theme(legend.position="none")
As we can see, the back /u:/ comes from the word “to”, whereas the front vowels are linked to the words “new”, “beautiful”, and “futile”. All three words have in common that /u:/ should be preceded by /j/. This could cause the fronting of /u:/.
However, we should test whether our assumption is true. We will now query the sequences of the preceding consonant and /u:/, and analyse these sequences’ F2 trajectories:
Cu = query(emuDBhandle = ae,query = "[Phonetic=~ .* -> Phonetic== u:]")
Cu_formants = get_trackdata(ae, seglist = Cu,ssffTrackName = "fm", resultType = "emuRtrackdata")
ggplot(Cu_formants) +
aes(x=times_rel,y=T2,col=labels,group=sl_rowIdx) +
geom_line() +
labs(x = "Duration (ms)", y = "F2 (Hz)")
In the word “to”, the preceding segment is labelled “H”, i.e. the aspiration of /t/. You can clearly see in the plot that the F2 trajectory starts from a relatively high F2 locus; however, this locus is still much lower than F2 in /j/ (which is, of course, very similar to F2 in an /i:/ vowel). Therefore, we can conclude that the preceding /j/ is causing /u:/ to front in that context.
This little analysis was very dependent on several different kinds of queries and re-queries, and we would like to introduce you to the main concepts of these functions:
query()
We will start with very basic queries. The function for conducting queries is simply called query; this function needs at least two arguments, emuDBhandle and query, e.g.:
V = query(emuDBhandle = ae,query = "[Phonetic==V]")
The expression "[Phonetic==V]" is a legal expression in the EMU Query Language (EQL) (for details see below) and could be translated into “which labels in the level Phonetic are equal to the label ‘V’” (‘V’ is the SAMPA for English equivalent of IPA /ʌ/, i.e. the vowel in words like <cut>).
query(): segment lists
An emuR segment list is a list of segment descriptors. Each segment descriptor describes a sequence of annotation elements. The list is usually the result of an emuDB query using the function query, as in the present example. query has found three tokens of [V]:
V
## segment list from database: ae
## query was: [Phonetic==V]
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
This object is an attributed data.frame, with one row per segment descriptor:
Data frame columns
labels: labels or sequenced labels of segments concatenated by ‘->’
start: onset time in milliseconds
end: offset time in milliseconds
session: session name
bundle: bundle name (= utterance name)
level: name of the level that has been searched
type: type of the “segment” row (ITEM: symbolic item; EVENT: event item; SEGMENT: segment)
Additional hidden columns
db_uuid: UUID of emuDB (= a unique identifier)
startItemID: item ID of first element of sequence
endItemID: item ID of last element of sequence
sampleStart: start sample position
sampleEnd: end sample position
sampleRate: sample rate
Attributes
database: name of emuDB
query: Query string
This makes it easy to access specific pieces of information, e.g.
#Get labels:
V$labels
## [1] "V" "V" "V"
#Get start times:
V$start
## [1] 187.425 340.175 1943.175
#Get end times:
V$end
## [1] 256.925 426.675 2037.425
#durations of the [V]s
V$end - V$start
## [1] 69.50 86.50 94.25
#for the latter, there is also a special function in emuR:
dur(V)
## [1] 69.50 86.50 94.25
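Since a segment list is a data frame, ordinary R tools can be applied to it. The following is a minimal sketch that rebuilds the three rows printed above by hand (the object names V_df and mean_dur are ours, and the values are simply copied from the output) and averages the durations per bundle:

```r
# values copied from the segment list V printed above
V_df = data.frame(
  labels = c("V", "V", "V"),
  start  = c(187.425, 340.175, 1943.175),
  end    = c(256.925, 426.675, 2037.425),
  bundle = c("msajc003", "msajc003", "msajc057")
)

# duration of each segment in milliseconds
V_df$duration = V_df$end - V_df$start

# mean duration per bundle (one row per bundle)
mean_dur = aggregate(duration ~ bundle, data = V_df, FUN = mean)
mean_dur
```

With a real segment list, the same two lines of arithmetic and aggregation should work directly on V, without the hand-built data frame.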
What happens if we look for a timeless ITEM?
#Phonetic=SEGMENT, Phoneme=ITEM
list_levelDefinitions(ae)
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
V_phoneme=query(emuDBhandle = ae,query = "[Phoneme==V]")
V_phoneme
## segment list from database: ae
## query was: [Phoneme==V]
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phoneme ITEM
## 2 V 340.175 426.675 0000 msajc003 Phoneme ITEM
## 3 V 1943.175 2037.425 0000 msajc057 Phoneme ITEM
V
## segment list from database: ae
## query was: [Phonetic==V]
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
As you can see, V and V_phoneme both present times, although Phoneme is a timeless ITEM level. Times are inherited from the SEGMENT level Phonetic. This, of course, only works if the Phoneme and Phonetic levels are linked (and they are linked, see also Figure 1):
list_linkDefinitions(ae)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
If the ITEM we are interested in were linked to several time-aligned levels, we would have to use query’s parameter timeRefSegmentLevel to choose the segment level from which query derives time information. However, this is not the case here.
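Purely for illustration, the parameter can also be set explicitly in the present case. This is a sketch under the assumption that naming the only available time-bearing sublevel ("Phonetic") is accepted and yields the same times as the default; the object names demo_dir and V_phoneme_ref are ours, and the demo data are created in tempdir():

```r
library(emuR)

# set up the demo database (creation is skipped if it already exists)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

# derive the times of the timeless Phoneme ITEMs explicitly from "Phonetic"
V_phoneme_ref = query(emuDBhandle = ae,
                      query = "[Phoneme == V]",
                      timeRefSegmentLevel = "Phonetic")

# inspect labels and inherited times
V_phoneme_ref[, c("labels", "start", "end", "level")]
```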
The calculation of inherited times can be time-consuming. In many cases, we may not be interested in time information, but only in the labels; we can therefore turn off the calculation of inherited times with an additional parameter, calcTimes = FALSE:
#Phonetic=SEGMENT, Phoneme=ITEM
list_levelDefinitions(ae)
## name type nrOfAttrDefs attrDefNames
## 1 Utterance ITEM 1 Utterance;
## 2 Intonational ITEM 1 Intonational;
## 3 Intermediate ITEM 1 Intermediate;
## 4 Word ITEM 3 Word; Accent; Text;
## 5 Syllable ITEM 1 Syllable;
## 6 Phoneme ITEM 1 Phoneme;
## 7 Phonetic SEGMENT 1 Phonetic;
## 8 Tone EVENT 1 Tone;
## 9 Foot ITEM 1 Foot;
V_phoneme2=query(emuDBhandle = ae,query = "[Phoneme==V]",calcTimes = FALSE)
V_phoneme2
## segment list from database: ae
## query was: [Phoneme==V]
## labels start end session bundle level type
## 1 V NA NA 0000 msajc003 Phoneme ITEM
## 2 V NA NA 0000 msajc003 Phoneme ITEM
## 3 V NA NA 0000 msajc057 Phoneme ITEM
In this case, all entries in start and end are NA (= Not Available).
requery_hier() and requery_seq()
There are two (self-explanatory) types of relations in the EMU-SDMS:
dominance
sequence
By which words are the “V”s dominated? We could find out by a hierarchical re-query:
#find all "V"-labels in `ae`
V=query(emuDBhandle = ae,query = "[Phonetic==V]")
Now put this segment list into requery_hier() and look for the linked ITEM in level Word, attribute Text:
(V_Text = requery_hier(emuDBhandle = ae,seglist = V,level = "Text"))
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 3 customers 1824.425 2367.775 0000 msajc057 Text ITEM
The result contains the ITEM labels and calculated times (for the corresponding words).
You may also wish to know the sequential contexts of the “V”s, e.g. the subsequent segments. For this, we use the sequential structure of the database and the command requery_seq() with offset = 1 (offset = -1 would find the sound that precedes ‘V’):
requery_seq(emuDBhandle = ae,seglist = V,offset = 1)
## segment list from database: ae
## query was: FROM REQUERY
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 N 426.675 483.425 0000 msajc003 Phonetic SEGMENT
## 3 s 2037.425 2085.175 0000 msajc057 Phonetic SEGMENT
We will discuss both commands more extensively later in the seminar, but wanted to show that it is possible to use the annotation structure and a given segment list to retrieve additional information afterwards. We could use both commands to express more complex queries: e.g. we could look for all “V”s within the word “amongst” by querying “V”, then re-querying all linked words, and then deleting all “V”s that are not linked to “amongst”. However, this would be rather cumbersome. A much easier way to conduct more complicated queries is to use the full range of emuR’s query language EQL within the command query. However, before we can use more complex queries, we will have to learn the EMU Query Language.
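To make the comparison concrete, here is a sketch of the cumbersome route next to a single EQL query. It assumes that requery_hier() returns one row per row of the input segment list (as in the output above), so that the two lists can be aligned by row; the dominance operator ^ used in the second variant is introduced further below. The object names demo_dir, V_words, V_amongst, and V_amongst_eql are ours:

```r
library(emuR)

# set up the demo database (creation is skipped if it already exists)
demo_dir = file.path(tempdir(), "emuR_demoData")
if (!dir.exists(demo_dir)) create_emuRdemoData(dir = tempdir())
ae = load_emuDB(file.path(demo_dir, "ae_emuDB"), verbose = FALSE)

# the long way: query, re-query, then filter by row
V = query(emuDBhandle = ae, query = "[Phonetic == V]")
V_words = requery_hier(emuDBhandle = ae, seglist = V, level = "Text")
V_amongst = V[V_words$labels == "amongst", ]

# the short way: one EQL query with the dominance operator ^
V_amongst_eql = query(emuDBhandle = ae,
                      query = "[Phonetic == V ^ Text == amongst]")

# both variants should contain the same number of segments
nrow(V_amongst)
nrow(V_amongst_eql)
```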
EQL
To learn about the functionality of the EQL, you can always type
vignette("EQL")
As we have seen above, any query must be placed within " ", and any query can be placed within [ ]. You minimally have to give a level and some sort of representation for a label (this may be a regular expression), unless you use one of the position and count functions (see below).
In the examples above, we looked for labels equal to “V” on the level “Phonetic” (in the database ae):
query(emuDBhandle = ae, query = "Phonetic == V")
## segment list from database: ae
## query was: Phonetic == V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
So “==” is the equality operator. For backward compatibility with earlier versions of emuR, a single “=” is also allowed (but we ask you to prefer “==” instead):
query(emuDBhandle = ae, query = "Phonetic = V")
## segment list from database: ae
## query was: Phonetic = V
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 V 1943.175 2037.425 0000 msajc057 Phonetic SEGMENT
We can also search for everything except “V” by using !=:
query(emuDBhandle = ae, query = "Phonetic != V")
(We do not show the resulting segment list, because it is very long.) So one way to get ‘everything’ would be to exclude something that is probably not in your database, like “xyz”. However, there is a much better way: using so-called regular expressions. To use these, you have to type “=~”, followed by the regular expression, in this case .* (meaning: any character (.) zero or more times (*)). Please do not worry too much about regular expressions. This example will probably be the only one in this seminar:
Everything1 = query(emuDBhandle = ae, query = "Phonetic != xyz")
Everything2 = query(emuDBhandle = ae, query = "Phonetic =~ .*")
any(Everything1 != Everything2) # should result in FALSE if both are equal everywhere
## [1] FALSE
You can also negate the latter operator by “!~”. An example would be:
# What is the query to retrieve all ITEMs in the “Text” level that don’t begin with ‘a’?
query(emuDBhandle = ae, query = "Text !~ a.*")
## segment list from database: ae
## query was: Text !~ a.*
## labels start end session bundle level type
## 1 her 674.175 739.925 0000 msajc003 Text ITEM
## 2 friends 739.925 1289.425 0000 msajc003 Text ITEM
## 3 she 1289.425 1463.175 0000 msajc003 Text ITEM
## 4 was 1463.175 1634.425 0000 msajc003 Text ITEM
## 5 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 6 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 7 it 299.975 411.675 0000 msajc010 Text ITEM
## 8 is 411.675 571.925 0000 msajc010 Text ITEM
## 9 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 10 to 1090.975 1222.325 0000 msajc010 Text ITEM
## 11 offer 1222.325 1391.025 0000 msajc010 Text ITEM
## 12 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 13 resistance 1957.775 2753.975 0000 msajc010 Text ITEM
## 14 the 299.975 379.525 0000 msajc012 Text ITEM
## 15 chill 379.525 744.525 0000 msajc012 Text ITEM
## 16 wind 744.525 1082.975 0000 msajc012 Text ITEM
## 17 caused 1082.975 1456.475 0000 msajc012 Text ITEM
## 18 them 1456.475 1564.975 0000 msajc012 Text ITEM
## 19 to 1564.975 1650.975 0000 msajc012 Text ITEM
## 20 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 21 violently 1994.975 2692.325 0000 msajc012 Text ITEM
## 22 he 299.975 425.375 0000 msajc015 Text ITEM
## 23 emphasized 425.375 1129.075 0000 msajc015 Text ITEM
## 24 his 1129.075 1368.075 0000 msajc015 Text ITEM
## 25 strengths 1213.075 1797.425 0000 msajc015 Text ITEM
## 26 while 1797.425 2104.075 0000 msajc015 Text ITEM
## 27 concealing 2104.075 2693.675 0000 msajc015 Text ITEM
## 28 his 2693.675 2780.725 0000 msajc015 Text ITEM
## 29 weaknesses 2780.725 3456.825 0000 msajc015 Text ITEM
## 30 itches 299.975 662.425 0000 msajc022 Text ITEM
## 31 so 1113.675 1400.675 0000 msajc022 Text ITEM
## 32 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 33 to 1806.275 1890.275 0000 msajc022 Text ITEM
## 34 scratch 1890.275 2469.525 0000 msajc022 Text ITEM
## 35 I'll 299.975 513.925 0000 msajc023 Text ITEM
## 36 hedge 513.925 819.025 0000 msajc023 Text ITEM
## 37 my 819.025 1038.775 0000 msajc023 Text ITEM
## 38 bets 1038.775 1421.925 0000 msajc023 Text ITEM
## 39 take 1495.275 1774.925 0000 msajc023 Text ITEM
## 40 no 1774.925 1964.425 0000 msajc023 Text ITEM
## 41 risks 1964.425 2554.175 0000 msajc023 Text ITEM
## 42 this 299.975 475.775 0000 msajc057 Text ITEM
## 43 new 475.775 666.675 0000 msajc057 Text ITEM
## 44 display 666.675 1211.175 0000 msajc057 Text ITEM
## 45 more 1578.675 1824.425 0000 msajc057 Text ITEM
## 46 customers 1824.425 2367.775 0000 msajc057 Text ITEM
## 47 than 2367.775 2480.425 0000 msajc057 Text ITEM
## 48 ever 2480.425 2794.925 0000 msajc057 Text ITEM
So, there are four related operators, two for matching and two for non-matching:
Symbol | Meaning |
---|---|
== | equality |
=~ | regular expression matching |
!= | inequality |
!~ | regular expression non-matching |
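For readers who know base R better than regular expressions, the four operators can be mimicked on a plain character vector. This is only an analogy, not how emuR evaluates queries; it assumes, as the result of "Text !~ a.*" above suggests, that the regular expression has to match the whole label (hence the ^ and $ anchors). The vector text_labels is a hand-picked subset of the words in ae:

```r
# a few word labels taken from the ae database
text_labels = c("amongst", "her", "was", "any", "take")

# analogue of "Text == her"
eq = text_labels[text_labels == "her"]

# analogue of "Text != her"
neq = text_labels[text_labels != "her"]

# analogue of "Text =~ a.*" (anchored: the whole label must match)
re_match = text_labels[grepl("^a.*$", text_labels)]

# analogue of "Text !~ a.*"
re_nomatch = text_labels[!grepl("^a.*$", text_labels)]

re_match
re_nomatch
```

Note that “was” and “take” contain an ‘a’ but do not begin with one, so under the whole-label assumption they end up in re_nomatch.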
OR operator
Use | to look for one label or another, e.g. ‘m’ or ‘n’ can be retrieved via:
query(emuDBhandle = ae, query = "Phonetic == m|n")
## segment list from database: ae
## query was: Phonetic == m|n
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 n 1031.925 1195.925 0000 msajc003 Phonetic SEGMENT
## 3 n 1741.425 1791.425 0000 msajc003 Phonetic SEGMENT
## 4 n 1515.475 1554.475 0000 msajc010 Phonetic SEGMENT
## 5 n 2430.975 2528.475 0000 msajc010 Phonetic SEGMENT
## 6 n 894.975 1022.975 0000 msajc012 Phonetic SEGMENT
## 7 m 1490.425 1564.975 0000 msajc012 Phonetic SEGMENT
## 8 n 2402.275 2474.875 0000 msajc012 Phonetic SEGMENT
## 9 m 496.575 558.575 0000 msajc015 Phonetic SEGMENT
## 10 n 2226.575 2271.075 0000 msajc015 Phonetic SEGMENT
## 11 n 3046.125 3067.675 0000 msajc015 Phonetic SEGMENT
## 12 m 1587.175 1655.675 0000 msajc022 Phonetic SEGMENT
## 13 m 819.025 902.925 0000 msajc023 Phonetic SEGMENT
## 14 n 1434.775 1495.275 0000 msajc023 Phonetic SEGMENT
## 15 n 1774.925 1833.925 0000 msajc023 Phonetic SEGMENT
## 16 n 508.675 543.975 0000 msajc057 Phonetic SEGMENT
## 17 m 1629.675 1709.175 0000 msajc057 Phonetic SEGMENT
## 18 m 2173.425 2233.425 0000 msajc057 Phonetic SEGMENT
## 19 n 2447.675 2480.425 0000 msajc057 Phonetic SEGMENT
You can expand this as well:
mnN = query(emuDBhandle = ae, query = "Phonetic == m | n | N")
summary(mnN)
## segment list from database: ae
## query was: Phonetic == m | n | N
## with 23 segments
##
## Segment distribution:
##
## m n N
## 7 12 4
In all hierarchical queries, bracketing with [ ] is required to structure your query. In simple queries, however, brackets are optional.
mnN = query(emuDBhandle = ae, query = "[Phonetic == m|n|N]")
summary(mnN)
## segment list from database: ae
## query was: [Phonetic == m|n|N]
## with 23 segments
##
## Segment distribution:
##
## m n N
## 7 12 4
However, this sequential query will fail, because of missing brackets:
query(ae, "Phonetic == V -> Phonetic == m")
Use the -> operator to find sequences of segments:
query(ae, "[Phonetic == V -> Phonetic == m]")
## segment list from database: ae
## query was: [Phonetic == V -> Phonetic == m]
## labels start end session bundle level type
## 1 V->m 187.425 340.175 0000 msajc003 Phonetic SEGMENT
Note: all row entries in the resulting segment list have the start time of “V”, the end time of “m”, and the label “V->m”. You can change this with the so-called result modifier, the hash tag “#”:
query(ae, "[#Phonetic == V -> Phonetic == m]") # finds V, if V is followed by m
## segment list from database: ae
## query was: [#Phonetic == V -> Phonetic == m]
## labels start end session bundle level type
## 1 V 187.425 256.925 0000 msajc003 Phonetic SEGMENT
query(ae, "[Phonetic == V -> #Phonetic == m]") #finds m, if m is preceded by V
## segment list from database: ae
## query was: [Phonetic == V -> #Phonetic == m]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
Keep in mind that only one hash tag per query is allowed.
You can search for sequences of sequences; however, you have to use bracketing. Otherwise, you get an error, as in
query(ae, "[Phonetic == @ -> Phonetic == n -> Phonetic == s]")
The correct code would be as follows:
query(ae, "[[Phonetic == @ -> Phonetic == n ] -> Phonetic == s]")
## segment list from database: ae
## query was: [[Phonetic == @ -> Phonetic == n ] -> Phonetic == s]
## labels start end session bundle level type
## 1 @->n->s 1715.425 1893.175 0000 msajc003 Phonetic SEGMENT
## 2 @->n->s 2382.475 2753.975 0000 msajc010 Phonetic SEGMENT
## 3 @->n->s 2200.875 2408.575 0000 msajc015 Phonetic SEGMENT
A much more complex example would be:
## What is the query to retrieve all sequences of ITEMs containing labels “offer” followed by two arbitrary labels followed by “resistance”?
query(ae, "[[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance]")
## segment list from database: ae
## query was: [[[Text == offer -> Text =~ .*] -> Text =~ .* ] -> Text == resistance]
## labels start end session bundle level
## 1 offer->any->further->resistance 1957.775 2753.975 0000 msajc010 Text
## type
## 1 ITEM
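Nesting the brackets by hand becomes tedious for longer sequences. As a sketch, the left-nested bracketing used above can be generated from a vector of simple terms; build_sequence_query() is a hypothetical helper of ours, not part of emuR:

```r
# hypothetical helper (not part of emuR): chains simple EQL terms with "->",
# wrapping each pair in [ ] from left to right
build_sequence_query = function(terms) {
  Reduce(function(left, right) sprintf("[%s -> %s]", left, right), terms)
}

seq_query = build_sequence_query(
  c("Phonetic == @", "Phonetic == n", "Phonetic == s")
)
seq_query
```

The resulting string has the same bracket structure as the hand-written query above and could be passed on as query(ae, seq_query).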
Use the operator ^ for all queries in which two linked levels are involved, e.g.
list_linkDefinitions(ae)
## type superlevelName sublevelName
## 1 ONE_TO_MANY Utterance Intonational
## 2 ONE_TO_MANY Intonational Intermediate
## 3 ONE_TO_MANY Intermediate Word
## 4 ONE_TO_MANY Word Syllable
## 5 ONE_TO_MANY Syllable Phoneme
## 6 MANY_TO_MANY Phoneme Phonetic
## 7 ONE_TO_MANY Syllable Tone
## 8 ONE_TO_MANY Intonational Foot
## 9 ONE_TO_MANY Foot Syllable
## What is the query to retrieve all ITEMs containing the label “p” in the “Phoneme” level that occur in strong syllables (i.e. dominated by / linked to ITEMs of the level “Syllable” that contain the label “S” (=STRONG, as opposed to "W"=WEAK))?
query(ae, "[Phoneme == p ^ Syllable == S]")
## segment list from database: ae
## query was: [Phoneme == p ^ Syllable == S]
## labels start end session bundle level type
## 1 p 558.575 639.575 0000 msajc015 Phoneme ITEM
## 2 p 1655.675 1698.675 0000 msajc022 Phoneme ITEM
## 3 p 863.675 970.425 0000 msajc057 Phoneme ITEM
However, the operator is not directional; although “Syllable” dominates “Phoneme”, you could have asked
query(ae, "[Syllable == S ^ #Phoneme == p]")
## segment list from database: ae
## query was: [Syllable == S ^ #Phoneme == p]
## labels start end session bundle level type
## 1 p 558.575 639.575 0000 msajc015 Phoneme ITEM
## 2 p 1655.675 1698.675 0000 msajc022 Phoneme ITEM
## 3 p 863.675 970.425 0000 msajc057 Phoneme ITEM
So, “^” should not be translated as “is dominated by”, but rather as “is linked to”. However, you have to use the hash tag in order to get labels and times of the Phoneme level here. You can leave out the hash tag if the level you are interested in is the first one in your query.
You can query multiple dominance relations; however, as in the sequencing case, you have to use brackets:
## What is the query to retrieve all ITEMs on the “Phonetic” level that are part of a strong syllable (labeled “S”) and belong to the words “amongst” or “beautiful”?
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")
## segment list from database: ae
## query was: [[Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phonetic SEGMENT
## 2 V 340.175 426.675 0000 msajc003 Phonetic SEGMENT
## 3 N 426.675 483.425 0000 msajc003 Phonetic SEGMENT
## 4 s 483.425 566.925 0000 msajc003 Phonetic SEGMENT
## 5 t 566.925 596.675 0000 msajc003 Phonetic SEGMENT
## 6 H 596.675 674.175 0000 msajc003 Phonetic SEGMENT
## 7 db 2033.675 2150.175 0000 msajc003 Phonetic SEGMENT
## 8 j 2150.175 2211.175 0000 msajc003 Phonetic SEGMENT
## 9 u: 2211.175 2283.675 0000 msajc003 Phonetic SEGMENT
# same as
query(ae, "[[#Phonetic =~ .* ^ Syllable == S] ^ Text == amongst | beautiful]")
## to get the "Text"-items instead, use
query(ae, "[[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful]")
## segment list from database: ae
## query was: [[Phonetic =~ .* ^ Syllable == S] ^ #Text == amongst | beautiful]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
There are three position functions and one count function. As the count function results in a number, queries with it involve a comparison with a number (using one of “==”, “!=”, “>”, “>=”, “<”, “<=”, see below). The result of a position function is logical; we therefore ask whether a certain condition is TRUE or FALSE.
The three position functions are Start(), Medial(), and End(). Example queries are:
## What is the query to retrieve all word-initial syllables?
## (NB: syllable labels are either "W" or "S")
query(ae, "[Start(Word, Syllable) == TRUE]")
## segment list from database: ae
## query was: [Start(Word, Syllable) == TRUE]
## labels start end session bundle level type
## 1 W 187.425 256.925 0000 msajc003 Syllable ITEM
## 2 S 674.175 739.925 0000 msajc003 Syllable ITEM
## 3 S 739.925 1289.425 0000 msajc003 Syllable ITEM
## 4 W 1289.425 1463.175 0000 msajc003 Syllable ITEM
## 5 W 1463.175 1634.425 0000 msajc003 Syllable ITEM
## 6 W 1634.425 1791.425 0000 msajc003 Syllable ITEM
...
Examples for Medial() and End() are:
## What is the query to retrieve all word-medial syllables?
query(ae, "[Medial(Word, Syllable) == TRUE]")
## What is the query to retrieve all word-final syllables?
query(ae, "[End(Word, Syllable) == TRUE]")
## segment list from database: ae
## query was: [Medial(Word, Syllable) == TRUE]
## labels start end session bundle level type
## 1 S 1791.425 1945.425 0000 msajc003 Syllable ITEM
## 2 W 2283.675 2361.925 0000 msajc003 Syllable ITEM
## 3 S 2078.475 2228.475 0000 msajc010 Syllable ITEM
## 4 W 2219.975 2304.775 0000 msajc012 Syllable ITEM
## 5 W 2304.775 2533.975 0000 msajc012 Syllable ITEM
## 6 W 639.575 706.575 0000 msajc015 Syllable ITEM
...
## segment list from database: ae
## query was: [End(Word, Syllable) == TRUE]
## labels start end session bundle level type
## 1 S 256.925 674.175 0000 msajc003 Syllable ITEM
## 2 S 674.175 739.925 0000 msajc003 Syllable ITEM
## 3 S 739.925 1289.425 0000 msajc003 Syllable ITEM
## 4 W 1289.425 1463.175 0000 msajc003 Syllable ITEM
## 5 W 1463.175 1634.425 0000 msajc003 Syllable ITEM
## 6 W 1945.425 2150.175 0000 msajc003 Syllable ITEM
...
Everything that is neither the first nor the last element is medial:
query(ae, "[Medial(Word, Phoneme) == TRUE]")
## segment list from database: ae
## query was: [Medial(Word, Phoneme) == TRUE]
## labels start end session bundle level type
## 1 m 256.925 340.175 0000 msajc003 Phoneme ITEM
## 2 V 340.175 426.675 0000 msajc003 Phoneme ITEM
## 3 N 426.675 483.425 0000 msajc003 Phoneme ITEM
## 4 s 483.425 566.925 0000 msajc003 Phoneme ITEM
## 5 r 892.675 949.925 0000 msajc003 Phoneme ITEM
## 6 E 949.925 1031.925 0000 msajc003 Phoneme ITEM
...
The count function’s name is Num(). Num(x,y) counts how many y are in x. You can therefore ask things like the following:
## What is the query to retrieve all words that contain two syllables?
query(ae, "[Num(Text, Syllable) == 2]")
## segment list from database: ae
## query was: [Num(Text, Syllable) == 2]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 3 any 1436.725 1628.475 0000 msajc010 Text ITEM
## 4 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 5 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 6 itches 299.975 662.425 0000 msajc022 Text ITEM
## 7 always 775.475 1280.175 0000 msajc022 Text ITEM
## 8 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 9 display 666.675 1211.175 0000 msajc057 Text ITEM
## 10 attracts 1211.175 1578.675 0000 msajc057 Text ITEM
## 11 ever 2480.425 2794.925 0000 msajc057 Text ITEM
## What is the query to retrieve all syllables that contain more than four phonemes?
query(ae, "[Num(Syllable, Phoneme) > 4]")
## segment list from database: ae
## query was: [Num(Syllable, Phoneme) > 4]
## labels start end session bundle level type
## 1 S 256.925 674.175 0000 msajc003 Syllable ITEM
## 2 S 739.925 1289.425 0000 msajc003 Syllable ITEM
## 3 W 2228.475 2753.975 0000 msajc010 Syllable ITEM
## 4 S 1213.075 1797.425 0000 msajc015 Syllable ITEM
## 5 S 1890.275 2469.525 0000 msajc022 Syllable ITEM
## 6 S 1964.425 2554.175 0000 msajc023 Syllable ITEM
## 7 S 1247.925 1578.675 0000 msajc057 Syllable ITEM
You can use & to search within several attribute definitions on the same level. For example, the level Word in ae has several attribute definitions:
list_attributeDefinitions(ae,level="Word")
## name level type hasLabelGroups hasLegalLabels
## 1 Word Word STRING FALSE FALSE
## 2 Accent Word STRING FALSE FALSE
## 3 Text Word STRING FALSE FALSE
We could, therefore, look for all accented (“S”) words by …
query(ae, "[Text =~.* & Accent == S]")
## segment list from database: ae
## query was: [Text =~.* & Accent == S]
## labels start end session bundle level type
## 1 amongst 187.425 674.175 0000 msajc003 Text ITEM
## 2 friends 739.925 1289.425 0000 msajc003 Text ITEM
## 3 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 4 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 5 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 6 resistance 1957.775 2753.975 0000 msajc010 Text ITEM
## 7 chill 379.525 744.525 0000 msajc012 Text ITEM
## 8 wind 744.525 1082.975 0000 msajc012 Text ITEM
## 9 caused 1082.975 1456.475 0000 msajc012 Text ITEM
## 10 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 11 violently 1994.975 2692.325 0000 msajc012 Text ITEM
## 12 emphasized 425.375 1129.075 0000 msajc015 Text ITEM
## 13 strengths 1213.075 1797.425 0000 msajc015 Text ITEM
## 14 concealing 2104.075 2693.675 0000 msajc015 Text ITEM
## 15 weaknesses 2780.725 3456.825 0000 msajc015 Text ITEM
## 16 itches 299.975 662.425 0000 msajc022 Text ITEM
## 17 always 775.475 1280.175 0000 msajc022 Text ITEM
## 18 so 1113.675 1400.675 0000 msajc022 Text ITEM
## 19 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 20 scratch 1890.275 2469.525 0000 msajc022 Text ITEM
## 21 no 1774.925 1964.425 0000 msajc023 Text ITEM
## 22 risks 1964.425 2554.175 0000 msajc023 Text ITEM
## 23 display 666.675 1211.175 0000 msajc057 Text ITEM
## 24 more 1578.675 1824.425 0000 msajc057 Text ITEM
## 25 ever 2480.425 2794.925 0000 msajc057 Text ITEM
Another usage of “&” is to combine a basic query with a function, e.g.
## What is the query to retrieve all words that contain a non-word-final “S” syllable?
query(ae, "[[Syllable == S & End(Word, Syllable) == FALSE]^#Text=~.*]")
## segment list from database: ae
## query was: [[Syllable == S & End(Word, Syllable) == FALSE]^#Text=~.*]
## labels start end session bundle level type
## 1 considered 1634.425 2150.175 0000 msajc003 Text ITEM
## 2 beautiful 2033.675 2604.425 0000 msajc003 Text ITEM
## 3 futile 571.925 1090.975 0000 msajc010 Text ITEM
## 4 any 1436.725 1628.475 0000 msajc010 Text ITEM
## 5 further 1628.475 1957.775 0000 msajc010 Text ITEM
## 6 resistance 1957.775 2753.975 0000 msajc010 Text ITEM
## 7 shiver 1650.975 1994.975 0000 msajc012 Text ITEM
## 8 violently 1994.975 2692.325 0000 msajc012 Text ITEM
## 9 emphasized 425.375 1129.075 0000 msajc015 Text ITEM
## 10 concealing 2104.075 2693.675 0000 msajc015 Text ITEM
## 11 weaknesses 2780.725 3456.825 0000 msajc015 Text ITEM
## 12 itches 299.975 662.425 0000 msajc022 Text ITEM
## 13 always 775.475 1280.175 0000 msajc022 Text ITEM
## 14 tempting 1400.675 1806.275 0000 msajc022 Text ITEM
## 15 customers 1824.425 2367.775 0000 msajc057 Text ITEM
## 16 ever 2480.425 2794.925 0000 msajc057 Text ITEM