2026 Summer School:
Language Science Meets Linguistic Diversity

31 August to 4 September, 2026
Berlin, Germany

The Endangered Languages Archive (ELAR), the Humboldt-Universität zu Berlin (HU-Berlin) Institut für deutsche Sprache und Linguistik, and the Leibniz-Zentrum Allgemeine Sprachwissenschaft (ZAS) are offering a summer school in Linguistic Diversity and Language Science from Monday, August 31st through Friday, September 4th, 2026. The training will take place in person in Berlin, Germany.

The courses to be offered include:

  • Capturing Language Use with Mandana Seyfeddinipur (ELDP/ELAR/BBAW)
  • Languages of Central America with Elisabeth Verhoeven (HU)
  • Digital tools for annotation and analysis of language documentation data (ELAN, FLEx) with Kelsey Neely (ELDP/BBAW) and Christian Döhler (BBAW)
  • Gender within the domain of noun classification with Tom Güldemann (HU)
  • Data management and metadata creation for archiving in language documentation with Kelsey Neely (ELDP/BBAW)
  • Evolutionary approaches to language diversity with Russell Gray & colleagues (MPI-EVA)
  • Tenselessness: How languages without tense express temporal meanings with Ana Krajinovic (HU)
  • Oral Narratives worldwide: elicitation and evaluation with Natalia Gagarina (ZAS)
  • Extended/multiple exponence with Artemis Alexiadou (ZAS)
  • Establishing and tracking reference: Theory and annotation with Manfred Krifka (ZAS)
  • Supplementary workshop: Tools and methods for advanced data processing (Inception, Arborator, ANNIS) with Nico Lehmann (HU)

This five-day intensive training will involve hands-on practice, including homework assignments to be completed in the evenings. The language of instruction is English. Applicants should have demonstrated commitment and/or plans to carry out research on an under-resourced language.

Students may enrol in up to 3 courses. The cost for participation is 250 Euros. Participants will need to arrange their own travel and accommodations in Berlin.

Key dates:

15 January – Applications open

1 March – Applications due

1 April – Notification of acceptance sent

1 June – Registration payment due

31 August to 4 September – Summer school week

For more information about the training, please contact us at langdocinfo@gmail.com 

Course schedule and descriptions

9:30-11:00

Oral Narratives Worldwide: elicitation and evaluation -- Natalia Gagarina (ZAS)

The course provides theoretical foundations and practical skills for understanding acquisition of oral narratives and for eliciting, analyzing, and evaluating fictional and personal narratives across languages. It introduces the Multilingual Assessment Instrument for Narratives (MAIN) and integrates theory on narrative development with contemporary elicitation methods and systematic scoring procedures. Students gain hands-on experience in assessing oral narratives in different languages and adapting MAIN to new linguistic contexts. The course also highlights applications of narrative assessment in educational and research settings, enabling students to apply their knowledge to both academic and applied purposes.

TBA

11:30-13:00

Languages of Central America -- Elisabeth Verhoeven (HU)

Central America is a fascinating crossroads of linguistic diversity, where two distinct linguistic areas converge: the well-known Mesoamerican Sprachbund, home to the Mayan languages, and the lesser-studied Intermediate Area, which includes the Chibchan languages. These regions differ markedly in their nominal and verbal morphosyntax as well as in their sentence structures, making Central America an ideal setting for typological and areal-linguistic exploration. The course begins with an overview of the key areal features that shape the linguistic landscape of Central America. Building on this foundation, we will explore the characteristic typological patterns of both areas through case studies of selected languages. During the course, participants will focus on a language of their choice and contribute their findings to a collaborative cluster analysis of the Central American linguistic area.

Digital tools for annotation and analysis of language documentation data (ELAN, FLEx) -- Kelsey Neely (ELDP/BBAW) and Christian Döhler (BBAW)

This course introduces fundamental concepts and practices for the creation of structured data from documentary recordings of under-described and under-resourced languages. Participants will learn how to structure and curate corpora derived from language documentation projects so that they can be reused for comparative analysis, variationist studies, and corpus-informed typology. We will carefully consider the analytic and practical choices documenters must make at each stage of annotation and analysis, from segmenting natural language data to querying a corpus. We will use the transcription and annotation tool ELAN and the interlinear glossing and lexical database creation tool FieldWorks Language Explorer (FLEx). [Note: FLEx is only available for Windows and Linux systems.]

Extended/multiple exponence -- Artemis Alexiadou (ZAS)

Multiple exponence is defined as the occurrence of multiple realizations of a single morphosemantic feature, bundle of features, or derivational category within a word (Harris 2017). In this course, we will discuss data from a broad variety of typologically diverse languages in order to identify the main types of multiple exponence that have been discussed in the literature. We will then consider various treatments of multiple exponence and raise the question of whether all of these types can be handled by the same operation.

14:00-15:30

Gender within the wider domain of nominal classification -- Tom Güldemann (HU)

Gender is traditionally defined as noun classification that involves syntactic agreement (cf., for example, Hockett 1958, Corbett 1991). Since both criteria can at times be hard to pin down, the recent past saw the emergence of more complex approaches to gender that also aim to capture related phenomena of noun classification, notably classifiers, as within Corbett’s (2014) canonical typology or Wälchli and Di Garbo’s (2019: 330-1) “dynamic” characterization. On the basis of a wide range of cross-linguistic data, the course locates gender in its wider domain of noun classification, provides a unified analytical approach for achieving transparent cross-linguistic comparability of systemic structures, and also examines its diverse diachronic dynamics from simple to so-called “mature” gender. Particular topics to be scrutinized include the nature of agreement class and its partly problematic relation to gender, “overt” gender on nominal controllers, the typology of semantic gender assignment, and the very origin of gender as a grammaticalized system.

Data management and metadata creation for archiving in language documentation -- Kelsey Neely (ELDP/BBAW)

This course leads students through the key considerations for managing research data and creating metadata for language documentation. The content includes the selection of appropriate file formats for long-term preservation of materials, the organization of files, file naming and versioning, and the creation of rich metadata describing the files and the collection as a whole. These topics are considered through the lenses of both the FAIR Data Principles and the CARE Principles for Indigenous Data Governance. The course also introduces best practices for depositing materials in trusted language archives and related repositories, emphasizing how archive policies, access levels, and discovery infrastructures shape the long-term usability of documented language data for communities and researchers.

Establishing and tracking reference: Theory and annotation -- Manfred Krifka (ZAS)

All languages can refer to entities or concepts in the shared situation of the participants or in their shared background knowledge, they can introduce new entities into the discourse, and they can pick them up by anaphoric expressions. In the course, I will discuss important theoretical distinctions and the known means that languages use for these purposes, such as various types of demonstratives, definite and indefinite articles, pronominal expressions that mark lexical distinctions of their antecedents like gender and number, and their saliency status, as well as prosodic and syntactic features. This includes more subtle phenomena like partitive and associative anaphors and reference to manners and to speech acts. The goal is to enable course participants to identify such fine-grained distinctions in their documentary work. In particular, we will become familiar with annotation schemes for reference tracking, such as the RefLex Scheme by Riester and Baumann and the RefIND Annotation Guidelines by Schiborr, Schnell and Thiele.

16:00-17:30

Tenselessness: How languages without tense convey temporal meanings -- Ana Krajinovic (HU)

While tense is a widespread grammatical category, a third of the languages of the world do not encode tense (based on Grambank data). In this course, we will ask how tenseless languages, or languages without the grammatical category of tense, talk about temporal meanings, such as past, present, and future. We will look both at the broad typology of tenselessness and at individual underdescribed languages. While some languages rely primarily on aspectual categories to talk about time, other languages use mood categories, or leave verbs unmarked for TAM (tense, aspect, mood). We will also discuss potential language universals in this domain, e.g. unmarked verbs favoring past and present interpretations over future meanings cross-linguistically.

Capturing Language Use -- Mandana Seyfeddinipur (ELDP/ELAR/BBAW)

This practical course focuses on audiovisual methods for documenting language as it is used in real-life interaction. Participants will gain hands-on experience in planning and recording natural communicative events, learning camera techniques and audio recording suited to field conditions. The course integrates discussion of ethical practices, informed consent, and metadata standards for the creation of accessible, multipurpose records of under-documented languages. Emphasis is placed on producing recordings that illuminate gesture, gaze, and other multimodal aspects of communication, equipping participants to create rich, analyzable records of linguistic and social practice.

18:00-19:30

(Wednesday, Sept. 2nd and Thursday, Sept. 3rd only)

Supplementary workshop: Tools and methods for advanced data processing (Inception, Arborator, ANNIS) -- Nico Lehmann (HU)

This practical course supplements the introduction to digital tools for annotation and analysis and goes beyond ELAN/FLEx, introducing tools for advanced data processing, including dependency annotation, complex search queries, and the extraction of multi-layered annotations. Participants will practice a workflow for converting ELAN data to formats such as CoNLL-U, which are used for NLP applications and by annotation tools such as Inception (and Arborator). Participants will then discover the benefits of using tools like Inception for advanced syntactic, semantic and pragmatic multi-layered annotations, e.g. combining a Universal Dependencies (UD) layer with more specific, research-related span annotations. We will also explore the advanced search and visualisation tool ANNIS, with which participants can create multi-layered queries combining diverse types of annotations and inspect initial findings before extracting relevant layers for statistical analysis.
