Download the resources

All the language resources made available by the Index Thomisticus Treebank Project are downloadable for free and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based format defined by the Prague Markup Language (PML). The PML files of the treebanks are organized by annotation layer and linked to each other through stand-off annotation:
  • files.w: raw text (words and punctuation marks)
  • files.m: morphological layer (lemmatization and morphological tagging)
  • files.a: analytical layer (surface syntactic annotation)
  • files.t: tectogrammatical layer (semantic and pragmatic annotation)
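
Because each document is split across several layer files, it can help to check which layers are present for each document before processing. The following is a minimal sketch, assuming that the files of one document share a common stem and carry the layer letter as a dot-separated suffix (for example scg1.w, scg1.m, scg1.a); the function names and example filenames are hypothetical, and the actual naming in the distribution may differ.

```python
from collections import defaultdict
from pathlib import Path

# Layer suffixes as listed above; the exact file naming is an assumption.
LAYERS = ("w", "m", "a", "t")


def split_layer(name):
    """Return (stem, layer) for a name like 'doc.a' or 'doc.a.gz', else None."""
    parts = Path(name).name.split(".")
    for i in range(1, len(parts)):
        if parts[i] in LAYERS:
            return ".".join(parts[:i]), parts[i]
    return None


def group_by_document(filenames):
    """Map each document stem to a {layer: filename} dictionary."""
    docs = defaultdict(dict)
    for name in filenames:
        result = split_layer(name)
        if result is not None:
            stem, layer = result
            docs[stem][layer] = name
    return dict(docs)


def missing_layers(layers):
    """List the layers (in w, m, a, t order) absent from one document."""
    return [layer for layer in LAYERS if layer not in layers]
```

Calling group_by_document on a directory listing and then missing_layers on each entry shows, for instance, which documents have no tectogrammatical file.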
The analytical layer of annotation of the Index Thomisticus Treebank is also available in the following format:
  • CoNLL (proper names were assigned the "NP" value in the MISC field)
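
As an illustration of how the "NP" value might be used, the sketch below reads CoNLL-style data and collects the word forms marked as proper names. It assumes tab-separated columns with one token per line, blank lines between sentences, the word form in the second column and MISC as the last column; these column positions are assumptions and should be checked against the downloaded file. The sample tokens are invented for the example and are not taken from the treebank.

```python
def read_conll(lines):
    """Yield each sentence as a list of tokens (each token a list of columns)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            # Skip comment lines, split token lines on tabs.
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def proper_names(sentence, form_col=1, misc_col=-1):
    """Return the forms whose MISC column contains the value 'NP'.

    The column indices are assumptions about the file layout.
    """
    return [
        token[form_col]
        for token in sentence
        if "NP" in token[misc_col].split("|")
    ]


# Invented sample: two sentences, ten columns, unused columns set to "_".
SAMPLE = (
    "1\tThomas\tthomas\t_\t_\t_\t2\t_\t_\tNP\n"
    "2\tdocet\tdoceo\t_\t_\t_\t0\t_\t_\t_\n"
    "\n"
    "1\tforma\tforma\t_\t_\t_\t0\t_\t_\t_\n"
)

sentences = list(read_conll(SAMPLE.splitlines()))
names = [name for sent in sentences for name in proper_names(sent)]
```

The same loop can also be used to count sentences and nodes in a file, by taking len(sentences) and the sum of the sentence lengths.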

The Latin Valency Lexicon VALLEX is stored in a single XML file, whose structure is the same as that of PDT-VALLEX, the valency lexicon for Czech, which is described here.

The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 447,306 nodes in 26,831 sentences. These are taken from the Summa contra Gentiles (entire: 4 books) and from the concordances of the lemma forma in Summa contra Gentiles, Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae (part).

The analytical layer of annotation of the Index Thomisticus Treebank can also be accessed and downloaded:
  • in CLARIN-D, through the search tool TüNDRA (a free CLARIN account is required)
  • in HamleDT 3.0, where the treebank is available in both PDT-like and Universal Dependencies formats
  • in Universal Dependencies, through both SETS Treebank Search and PML-TQ