EXMARaLDA

original tutorial by Franziska Horn, modified by sabinebartsch

1 What is EXMARaLDA?

EXMARaLDA, an acronym for “Extensible Markup Language for Discourse Annotation”, is a system for computer-assisted transcription and annotation of spoken language. It can be used for the construction and analysis of spoken language corpora. The application has been developed and maintained by the SFB 538 "Mehrsprachigkeit" (Collaborative Research Centre on Multilingualism) at the University of Hamburg since 2002. The intention was to develop a platform-independent system that is compatible with other tools and stores its data sustainably in the XML format. Such a tool is needed because spoken language data are created and processed on different platforms, with different transcription conventions and with transcription tools which are “for the most part outdated and technically as well as conceptionally incompatible with one another” (Schmidt, 2001).

2 Projects

This website presents an overview of projects using EXMARaLDA. One example is the multilingual electronic football dictionary Kicktionary, which lists 2,000 football terms in English, German and French. The dictionary was developed by Thomas Schmidt. Both the transcription and the annotation of the spoken part of the corpus were done with EXMARaLDA. Furthermore, the application is used in the joint project Sprachvariation in Norddeutschland ('Language Variation in Northern Germany', SiN) by scholars from the universities of Hamburg, Münster, Frankfurt/Oder, Potsdam, Kiel and Bielefeld. The aim of this project is to investigate dialectal variation in Northern Germany.

3 System Requirements and Installation

The application runs under Windows (XP and newer) and Unix/Linux/Mac OS X and requires a Java Runtime Environment (JRE). You can download EXMARaLDA from this website. Depending on your operating system, you have to download different files:

  • Windows: One installation file which contains all three EXMARaLDA tools
  • Mac OS X: Three disk images, each containing one EXMARaLDA tool
  • Linux: One GNU-zipped tarball which contains all three EXMARaLDA tools

As a next step, the tool has to be installed:

  • Windows: just double click the file exmaralda_setup.exe and follow the instructions
  • Mac OS X: mount the disk images and save the included application file on your computer
  • Linux: unzip the tarball and save the application files on your system
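For readers who prefer to script the Linux step, the following is a minimal sketch in Python. It assumes only what the list above says (a gzipped tarball that has to be unpacked) plus the later remark that a start script may have to be made executable. The function name install_from_tarball and any path you pass in are hypothetical; the real name of the downloaded archive is not given in this tutorial.

```python
import os
import stat
import tarfile


def install_from_tarball(tarball_path, target_dir):
    """Unpack a gzipped tarball and make all *.sh scripts in it executable.

    Both arguments are placeholders chosen by the caller; this is an
    illustration, not an official EXMARaLDA installer.
    """
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(tarball_path, "r:gz") as archive:
        archive.extractall(target_dir)

    scripts = []
    for folder, _dirs, files in os.walk(target_dir):
        for name in files:
            if name.endswith(".sh"):
                path = os.path.join(folder, name)
                # Add the "executable by owner" bit to the existing mode.
                mode = os.stat(path).st_mode
                os.chmod(path, mode | stat.S_IXUSR)
                scripts.append(path)
    return sorted(scripts)
```

Only unpack archives from a source you trust, since extracting an archive writes whatever files it contains into the target directory.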

The package you have downloaded and installed contains the three main components of the application: the partitur editor for making transcriptions, the corpus manager (named COMA) for constructing and organizing your data, and the tool EXAKT for analyzing your data. In addition, a directory named documentation contains introductions to the different functionalities and components, written in several languages.

4 Running EXMARaLDA

In the following, the partitur editor and the tool EXAKT are described to give an insight into running EXMARaLDA. The example used is the audio file Helge_Schneider_Arbeitsamt.mp3, which can be found in the directory Arbeitsamt in the EXMARaLDA demo corpus. This demo corpus can be downloaded as a zip archive here.

4.1 Making Transcriptions by Using the Partitur Editor

“Partitur” is not only the German word for a musical score. In linguistics, the term describes a concept for representing transcriptions of spoken language. Just as a musical score writes the notes for the different instruments or voices on different lines, a linguistic partitur describes the utterances of the speakers, their actions and the modalities of the communication situation on different lines. The advantage of this system is that it can represent simultaneity, for example overlapping speaker turns. Furthermore, gestures or facial expressions which accompany utterances can be annotated, and additional information such as translations into other languages can be arranged accordingly. You can get a first impression of the editor by starting it as follows (figure 1):

  • Windows: select the appropriate element in the start menu or double click the link on your desktop
  • Mac OS X: double click on the application symbol
  • Linux: double click partitureditor.sh or call the shell script from a terminal window (you may have to make the file executable before being able to start it)

Figure 1: Partitur Editor
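The partitur idea itself can be illustrated with a few lines of Python. The sketch below is not how the editor is implemented. It assumes a simplified model in which every tier holds events that start at a numbered interval of a shared timeline, and it prints events that start at the same point underneath each other. The tier labels and utterances are invented for illustration.

```python
def render_partitur(tiers, n_intervals):
    """Render tiers as aligned text lines, one line per tier.

    tiers: dict mapping a tier label to a list of (start, end, text),
           where start and end are indices into a shared timeline.
    Simplification: the text is placed in the interval where the event
    starts; events spanning several intervals are not stretched.
    """
    grid = {label: [""] * n_intervals for label in tiers}
    for label, events in tiers.items():
        for start, _end, text in events:
            grid[label][start] = text

    # Each column is as wide as its longest cell, so simultaneous
    # events line up vertically, as in a musical score.
    widths = [max(len(cells[i]) for cells in grid.values())
              for i in range(n_intervals)]
    label_width = max(len(label) for label in grid)

    lines = []
    for label, cells in grid.items():
        row = " | ".join(cell.ljust(w) for cell, w in zip(cells, widths))
        lines.append(f"{label.ljust(label_width)} | {row}".rstrip())
    return "\n".join(lines)


# Invented example: speaker B starts talking while A is still speaking.
example = {
    "A [v]": [(0, 1, "Guten Tag."), (1, 2, "Ich suche")],
    "B [v]": [(1, 2, "Ja bitte?")],
    "A [nv]": [(0, 1, "nods")],
}
print(render_partitur(example, n_intervals=2))
```

In the printed result the two utterances in the second interval begin in the same column, which is how a partitur expresses that they overlap.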

4.1.1 Preparation

First, create a new transcription by clicking File > New. Then edit the meta information for your project (figure 2). After choosing Transcription > Meta Information you can enter a project name, a transcription name etc. You can also specify the name of the transcription convention you are using (e.g. HIAT, GAT)1. A link to the audio file you want to use can be created by clicking Edit …. It is also possible to assign one or more audio files later by clicking Transcription > recordings: just click add and choose the file(s) you are interested in. The selected audio file(s) will be loaded into the player and displayed as a waveform. Furthermore, you can add your own attributes (add attribute) and corresponding values (edit attribute…) to save extra information, e.g. date, source or type of conversation.

Figure 2: Edit meta information

As a next step, you have to create a speaker table by clicking Transcription > Speakertable. A new speaker can be created by clicking on add Speaker. You can change the speaker abbreviation in the field Abbreviation and add some properties to the speaker, e.g. his or her sex and information about the mother tongue.

Figure 3: Speaker table

Then, a tier has to be added for each speaker by clicking Tier > Add Tier, as shown in figure 4.

Figure 4: Adding a new tier

Besides the Speaker, a Type has to be selected for classifying the new tier. The different options and their meaning are described in figure 5.

Type              Description
T(ranscription)   Verbal tiers
D(escription)     Non-verbal tiers
A(nnotation)      Tier for annotations (e.g. translations)
L(ink)            Tiers containing links, e.g. to images
U(ser) D(efined)  Your choice

Figure 5: Overview of types

Furthermore, a Category can be entered, for example v for verbal tiers, nv for non-verbal tiers, or an abbreviation of the target language (e.g. ENG) for translations. Specifying the category is recommended for clarity if there is more than one tier for one speaker. You can also edit the tiers by clicking on Tier > Edit tier. After all the tiers needed have been added, the partitur appears as follows (figure 6):

Figure 6: Added tiers
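The combination of speaker, type and category that defines a tier can be pictured as a small record. The following Python sketch is only an illustration of that idea. The lowercase type codes, the class name Tier and the label format "speaker [category]" are assumptions made for this example, and HS is a hypothetical speaker abbreviation.

```python
from dataclasses import dataclass

# Tier types from the overview in figure 5. The short lowercase codes
# are an assumption for this sketch.
TIER_TYPES = {
    "t": "transcription (verbal tier)",
    "d": "description (non-verbal tier)",
    "a": "annotation (e.g. translation)",
    "l": "link (e.g. to an image)",
    "ud": "user defined",
}


@dataclass(frozen=True)
class Tier:
    speaker: str   # speaker abbreviation from the speaker table
    type: str      # one of the keys of TIER_TYPES
    category: str  # free text, e.g. "v", "nv" or "ENG"

    def __post_init__(self):
        if self.type not in TIER_TYPES:
            raise ValueError(f"unknown tier type: {self.type!r}")

    def label(self):
        # The category keeps several tiers of one speaker apart.
        return f"{self.speaker} [{self.category}]"


tiers = [
    Tier("HS", "t", "v"),
    Tier("HS", "d", "nv"),
    Tier("HS", "a", "ENG"),
]
print([tier.label() for tier in tiers])
```

The example shows why entering a category is useful: without it, the three tiers of the same speaker would all carry the same label.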

It is possible to remove, edit or insert tiers or to change their order by using the options in the menu Tier or by clicking one of the buttons below the menu bar.

4.1.2 Transcription

You can start by clicking the button Append interval underneath the timeline. This creates a new interval which immediately follows the part transcribed so far. The interval is initially two seconds long and highlighted in green in the timeline (figure 7). Clicking the button Play Selection plays the interval again.

Figure 7: Appending an interval
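What Append interval does to the timeline can be sketched in Python as follows. This is a simplified model under stated assumptions, not the editor's code: the timeline is represented as a plain list of time points in seconds, and the function names are hypothetical.

```python
DEFAULT_LENGTH = 2.0  # seconds, as described for "Append interval"


def append_interval(timeline, length=DEFAULT_LENGTH):
    """Append a new interval directly after the transcribed part.

    timeline is a list of time points in seconds; it is changed in place.
    Returns the (start, end) of the new interval.
    """
    if not timeline:
        timeline.append(0.0)  # an empty timeline starts at zero
    start = timeline[-1]
    end = start + length
    timeline.append(end)
    return start, end


def move_end_boundary(timeline, new_end):
    """Model dragging the right boundary of the last interval."""
    if len(timeline) < 2:
        raise ValueError("there is no interval to adjust")
    if new_end <= timeline[-2]:
        raise ValueError("the end must lie after the start of the interval")
    timeline[-1] = new_end


demo = []
print(append_interval(demo))   # first interval
move_end_boundary(demo, 1.5)   # the utterance ends earlier than proposed
print(append_interval(demo))   # next interval starts at the new boundary
```

The point of the sketch is that each appended interval starts exactly where the previous one ends, so adjusting a boundary first and appending afterwards yields a gapless timeline.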

As a next step, find the right boundary, which marks the end of the utterance, by dragging it with the mouse. To check your interval, click the button Play selection once more. Then enter the transcription into the appropriate cell in the partitur. The result of this first entry is shown in figure 8.

Figure 8: Entering a transcription

Now you can repeat these steps for the rest of the audio file. Besides the button Append interval, which creates a new event at the end of the part transcribed so far, there is another way of entering transcriptions: the button Add event creates a new event where no preceding event exists, for instance to transcribe passages in which different speakers speak simultaneously. After you select the corresponding tier for the new event, a new interval is created in the timeline. Then the transcription can be entered (figure 9).

There are numerous options for editing your events. For example, it is possible to extend events to the left or right, to merge several events or to split them by using the options in the menu Events. Alternatively, click one of the buttons below the menu bar.

Figure 9: Transcription overview
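Merging and splitting events, as mentioned above, can be modelled with a small data structure. The Python sketch below is an illustration under assumptions: an event is reduced to a start time, an end time and a text, and the names Event, merge and split are hypothetical rather than taken from EXMARaLDA.

```python
from dataclasses import dataclass


@dataclass
class Event:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge(first, second):
    """Join two adjacent events of one tier into a single event."""
    if first.end != second.start:
        raise ValueError("only adjacent events can be merged")
    return Event(first.start, second.end, first.text + second.text)


def split(event, at_time, at_char):
    """Cut one event into two at a time point and a text position."""
    if not event.start < at_time < event.end:
        raise ValueError("the split point must lie inside the event")
    left = Event(event.start, at_time, event.text[:at_char])
    right = Event(at_time, event.end, event.text[at_char:])
    return left, right


a = Event(0.0, 2.0, "Guten ")
b = Event(2.0, 3.5, "Tag.")
joined = merge(a, b)
print(joined)
print(split(joined, at_time=2.0, at_char=6))
```

Splitting the merged event at the original boundary gives back the two original events, which shows that the two operations are inverses of each other in this model.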

4.1.3 Saving Results and Generating an Output of a Transcription

You can save your transcription by clicking on File > Save as …. One advantage of EXMARaLDA is the variety of output formats for presenting results in a visually pleasing manner. To use them, click on File > Output and choose a format. Your transcription can, for example, be written to an HTML document (figure 10) or to a text file, which can then be opened in a word processor such as MS Word.

Figure 10: html output
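Because transcriptions are stored as XML, they can in principle also be read with standard tools outside EXMARaLDA. The following Python sketch shows the idea on a tiny embedded sample. The element and attribute names in the sample (common-timeline, tli, tier, event and so on) are assumptions modelled loosely on a timeline-based transcription file and are not confirmed by this tutorial, so inspect one of your own saved files before relying on them. The utterances are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified sample. The element and attribute
# names are assumptions made for illustration only.
SAMPLE = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0" time="0.0"/>
      <tli id="T1" time="2.0"/>
      <tli id="T2" time="3.5"/>
    </common-timeline>
    <tier id="TIE0" speaker="SPK0" category="v" type="t">
      <event start="T0" end="T1">Guten Tag.</event>
      <event start="T1" end="T2">Ich suche Arbeit.</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def read_events(xml_text):
    """Return a list of (speaker, start_seconds, end_seconds, text)."""
    root = ET.fromstring(xml_text)
    # Map timeline item ids to their time in seconds.
    times = {tli.get("id"): float(tli.get("time"))
             for tli in root.iter("tli")}
    events = []
    for tier in root.iter("tier"):
        for event in tier.iter("event"):
            events.append((
                tier.get("speaker"),
                times[event.get("start")],
                times[event.get("end")],
                event.text or "",
            ))
    return events


for speaker, start, end, text in read_events(SAMPLE):
    print(f"{speaker} {start:5.1f}-{end:5.1f}  {text}")
```

The sketch resolves the timeline references of each event to seconds, which is the step any external script would need before it can align text with the audio.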

4.2 Searching and Analyzing by Using EXAKT

After transcribing conversations, it is useful to have a tool for analyzing the data. The application EXAKT (EXMARaLDA Analysis and Concordancing Tool) was developed for searching and analyzing single files and whole corpora of spoken language transcriptions. It generates concordances showing a keyword in context. EXAKT can be accessed by clicking Edit > Exakt search…. As an example, we open the file Helge_Schneider_Arbeitsamt.exb, which can be found in the directory Arbeitsamt in the demo corpus. After opening EXAKT, we search for the word Arbeit.

Figure 11: Using EXAKT to search for Arbeit

After you enter the word, all occurrences of the word and their contexts of usage are listed. Extended search options are also provided: you can refine your search by clicking the button Search:. Figure 12 shows the search results for all words starting with Arbeit, which makes it possible to explore word forms and compounds as well as their distribution. The tool generates these results by using regular expressions.

Figure 12: Extended search options
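The principle behind such a keyword-in-context search with regular expressions can be reproduced in a few lines of Python. This is a generic sketch of the technique and makes no claim about how EXAKT works internally. The function name kwic is hypothetical and the sample sentence is invented rather than taken from the demo corpus.

```python
import re


def kwic(text, pattern, width=20):
    """Return (left context, keyword, right context) for every match."""
    rows = []
    for match in re.finditer(pattern, text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        rows.append((left, match.group(0), right))
    return rows


# Invented sample text.
TEXT = "Ich suche Arbeit. Das Arbeitsamt hat keine Arbeitsplätze."

# \b marks a word boundary, \w* allows any continuation of the word,
# so the pattern finds "Arbeit" and all words starting with it.
for left, keyword, right in kwic(TEXT, r"\bArbeit\w*"):
    print(f"{left:>20} [{keyword}] {right}")
```

Replacing the pattern with r"\bArbeit\b" restricts the search to the exact word form, which mirrors the difference between the simple search and the search for all words starting with Arbeit.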

An overview of the functionalities of EXAKT and more information about them are provided in the file EXACT_short_Intro, which can be found in the directory Documentation that comes with the application.

5 Summary

EXMARaLDA provides various functionalities for transcribing and analyzing spoken language data. It offers not only a partitur editor for making transcriptions but also a corpus manager (COMA) for organizing your data and an application for analyzing it (EXAKT). Besides these extensive options, EXMARaLDA can be recommended because of its platform independence, its compatibility with other tools and its sustainable data handling through the use of the XML format.

References

1 An overview of the HIAT transcription conventions is provided in the directory Documentation, file HIAT_EN. An introduction to GAT is provided by Selting et al. By clicking View > Keyboard, you can open a virtual keyboard which offers the special characters used in some transcription conventions.