The Principles of Teachers’ Speech Corpus Annotation

Keywords: teaching practices, classroom discourse, ethnographic methods in education studies, spoken corpus


The article describes the principles behind the creation of a corpus of teachers’ speech, which makes it possible to apply an ethnographic approach to the study of teaching practices. By analyzing a large dataset of real classroom recordings, the corpus is intended to help identify linguistic, psychological, and sociological factors that contribute to teaching effectiveness. The corpus includes audio recordings of lessons in grades 5–8 from several schools in Russia. The corpus is annotated in Praat. To determine which linguistic parameters may influence teachers’ effectiveness and should therefore be annotated, we conducted a survey to find out how students describe an ideal teacher and a poor one. Based on the survey results, together with an analysis of existing spoken corpora and of the literature in linguistics and education, we developed an annotation system comprising 19 levels. Some of these levels are common to most spoken corpora (orthographic transcription of words, lemmas, parts of speech, morphological annotation). The following levels are specific to our corpus: the parts of the lesson (organizational stage, introduction of new material, etc.), a level that separates fragments of reading aloud from the rest of the teacher’s speech, four levels for marking pauses, a phonetic transcription level, volume annotation, two levels for error annotation (phonetic and grammatical errors are marked separately), and four levels related to vocabulary (words with special derivational features, emotionally evaluative vocabulary, word usage domains, discourse markers). The corpus will make it possible to provide recommendations for improving teachers’ speech behavior.




