The first edition of the workshop on "Corpus-based Research in the Humanities" (CRH) will be held in Warsaw (Poland) on December 10th 2015.
CRH will be co-located with the "Fourteenth International Workshop on Treebanks and Linguistic Theories" (TLT-14), which will be held on December 11th - 12th 2015.
The CRH workshop continues the series of workshops previously named "Annotation of Corpora for Research in the Humanities" (ACRH). Three editions of ACRH were held, respectively in 2011 (Heidelberg, Germany), 2012 (Lisbon, Portugal) and 2013 (Sofia, Bulgaria).

CRH aims to be a meeting place for scholars from both Computational Linguistics and the Humanities (especially Digital Humanities). Although the two research areas share a number of common topics, there is still limited collaboration between the two communities. Since the empirical evidence provided by corpora plays a central role in both disciplines, we believe that a workshop focussed on the different uses of (different kinds of) corpus data in the Humanities offers a valuable opportunity for the two communities to meet, discuss and compare their interests, methods and aims.

Submissions are invited for oral presentations and posters (with or without demonstrations) featuring high quality and previously unpublished research on the topics described below. Contributions should focus on results from completed as well as ongoing research, with an emphasis on novel approaches, methods, ideas, and perspectives, whether descriptive, theoretical, formal or computational.

Proceedings will be published in time for the workshop.

Research in the Humanities is predominantly text-based. For centuries scholars have studied documents such as historical manuscripts, literary works, legal contracts, diaries of important personalities, old tax records etc.
Manual analysis of such documents is still the dominant research paradigm in the Humanities. However, with the advent of the digital age this is increasingly complemented by approaches that utilise digital resources. More and more corpora are made available in digital form (theatrical plays, contemporary novels, critical literature, literary reviews etc.). This has a potentially profound impact on how research is conducted in the Humanities.
Digitised sources can be searched more easily than traditional, paper-based sources, allowing scholars to analyse texts more quickly and more systematically. Moreover, digital data can also be (semi-)automatically mined: important facts, trends and interdependencies can be detected, complex statistics can be calculated and the results can be visualised and presented to the scholars, who can then delve further into the data for verification and deeper analysis.
Digitisation encourages empirical research, opening the road for completely new research paradigms that exploit 'big data' for humanities research. This has also given rise to Digital Humanities (or E-Humanities) as a new research area.
Digitisation is only a first step, however. In their raw form, electronic corpora are of limited use to humanities researchers. The true potential of such resources is only unlocked if corpora are enriched with different layers of linguistic annotation (ranging from morphology to semantics). While corpus annotation can build on a long tradition in (corpus) linguistics and computational linguistics, corpus and computational linguistics on the one side and the Humanities on the other side have grown apart over the past decades.

The CRH workshop aims at building a tighter collaboration between people working in various areas of the Humanities (such as literature, philology, history etc.) and the research community involved in developing, using and making accessible different kinds of corpora. We believe that such a collaboration is now needed because of the increasingly important role played by the empirical evidence provided by corpora in research in the Humanities. In practice, such an interplay is still quite far from being achieved, as a gap remains between computational linguists (who sometimes do not involve humanists in developing and exploiting corpora for the Humanities) and humanists (who sometimes are not aware that such corpora exist and that automatic methods and standards to build and use them are available today).
Over the past few years a number of historical annotated corpora have been started, among which are treebanks for Middle, Early Modern and Old English, Early New High German, Medieval Portuguese, Ugaritic, Latin, Ancient Greek and several translations of the New Testament into Indo-European languages. The experience of this ever-growing set of projects can provide many suggestions on the methodology as well as on the practice of interaction between literary studies, philology and corpus linguistics.

To help overcome the above-mentioned issues, CRH aims at covering a wide range of topics related to the use of corpora for research in the Humanities.

The topics to be addressed in the workshop include (but are not limited to) the following:
- specific issues related to the annotation of corpora for research in the Humanities (annotation schemes and principles)
- corpora as a basis for research in the Humanities
- diachronic, historical and literary corpora
- use of corpora for stylometrics and authorship attribution
- philological issues, like different readings, textual variants, apparatus, non-standard orthography and spelling variation
- adaptation of NLP tools for older language varieties
- integration of corpora for the Humanities into language resources infrastructures
- tools for building and accessing corpora for the Humanities
- examples of fruitful collaboration between Computational Linguistics and Humanities in building and exploiting corpora
- theoretical aspects of the use of empirical evidence provided by corpora in the Humanities

INVITED SPEAKER: Reinhard Foertsch (Universität zu Köln, Germany)

Deadlines: always midnight, UTC ('Coordinated Universal Time'), ignoring DST ('Daylight Saving Time'):

- Deadline for paper submission: 20 September 2015 -> EXTENDED to 27 September 2015
- Notification of acceptance: 1 November 2015
- Final version of paper: 22 November 2015
- Workshop: 10 December 2015

We invite submissions of long abstracts describing original, unpublished research related to the topics of the workshop. Abstracts should not exceed 6 pages (references included).
The language of the workshop is English. All abstracts must be submitted in well-checked English.
Abstracts should be submitted in PDF format only. Submissions have to be made via the EasyChair page of the workshop. Please register at EasyChair first if you do not have an EasyChair account.
The style guidelines follow the specifications required by TLT. They can be found here:

Please note that, as reviewing will be double-blind, the abstract should not include the authors' names and affiliations or any references to websites, project names etc. revealing the authors' identity. Furthermore, any self-reference should be avoided. For instance, instead of "We previously showed (Brown, 2001)...", use citations such as "Brown previously showed (Brown, 2001)...". Each submitted abstract will be reviewed by three members of the program committee.

Submitted abstracts can be for oral or poster presentations (with or without demo). The different kinds of presentation are treated identically in terms of both the reviewing process and publication in the proceedings (the limit of 6 pages holds for abstracts intended for oral and for poster presentations alike).

The authors of the accepted abstracts will be required to submit the full version of their paper, which may be extended up to 10 pages (references included).

The oral presentations at the workshop will be 30 minutes long (25 minutes for presentation and 5 minutes for questions and discussion).

Francesco Mambrini (Deutsches Archäologisches Institut, Berlin, Germany)
Marco Passarotti (Università Cattolica del Sacro Cuore, Milan, Italy)
Caroline Sporleder (University of Trier, Germany)

Monica Berti (Germany)
Federico Boschetti (Italy)
David Bouvier (Switzerland)
Neil Coffee (USA)
Dag Haug (Norway)
Neven Jovanovic (Croatia)
Mike Kestemont (Belgium)
John Lee (Hong Kong)
Alexander Mehler (Germany)
Roland Meyer (Germany)
Willard McCarty (UK)
Tony McEnery (UK)
John Nerbonne (The Netherlands)
Bruce Robertson (Canada)
Neel Smith (USA)
Uwe Springmann (Germany)
Melissa Terras (UK)
Sara Tonelli (Italy)
Lonneke van der Plas (Malta)
Martin Wynne (UK)

Adam Przepiórkowski (chair)
Michał Ciesiołka
Konrad Gołuchowski
Mateusz Kopeć
Katarzyna Krasnowska
Agnieszka Patejuk
Marcin Woliński
Alina Wróblewska