Download the resources

All the language resources made available by the Index Thomisticus Treebank Project are downloadable for free and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based format defined by the Prague Markup Language (PML). The PML files of the treebanks are organized by annotation layer and linked to each other through stand-off annotation:
  • files.w: raw text (words and punctuation marks)
  • files.m: morphological layer (lemmatization and morphological tagging)
  • files.a: analytical layer (surface syntactic annotation)
  • files.t: tectogrammatical layer (semantic and pragmatic annotation)
The analytical layer of annotation of the Index Thomisticus Treebank is also available in the following formats:
  • CoNLL (proper names were assigned the "NP" value in the MISC field)
  • TiGer
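
As a rough illustration of how the "NP" value in the CoNLL export might be used, the sketch below collects the tokens marked as proper names. It assumes a tab-separated layout with ten columns and MISC as the last one (as in CoNLL-U); the actual column layout of the distributed files should be checked before relying on it. The function name proper_names and the two sample tokens are made up for the example and are not taken from the treebank.

```python
# Made-up two-token sample in an assumed 10-column, tab-separated layout:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = (
    "1\tThomas\tThomas\t_\t_\t_\t2\t_\t_\tNP\n"
    "2\tdocet\tdoceo\t_\t_\t_\t0\t_\t_\t_\n"
)


def proper_names(lines):
    """Yield (form, lemma) for tokens whose MISC column contains the value NP."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that does not match the assumed 10-column layout.
        if len(cols) != 10:
            continue
        # MISC may hold several values separated by "|".
        if "NP" in cols[9].split("|"):
            yield cols[1], cols[2]


if __name__ == "__main__":
    for form, lemma in proper_names(SAMPLE.splitlines()):
        print(form, lemma)
```

To run it over a downloaded file, the sample can be replaced by an open file handle, since the function only iterates over lines.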

The Latin Valency Lexicon VALLEX is stored in a single XML file, whose structure is the same as that of the valency lexicon for Czech, PDT-VALLEX, which is described here.

The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 353,996 nodes in 21,059 sentences. These are taken from Summa contra Gentiles (SCG; entire books 01, 02 and 03) and from the concordances of the lemma forma in Summa contra Gentiles, Scriptum super Sententiis Magistri Petri Lombardi and Summa Theologiae (in part).

The analytical layer of annotation of the Index Thomisticus Treebank can also be accessed and downloaded:
  • in CLARIN-D, through the search tool TüNDRA (a free CLARIN account is required)
  • in HamleDT 3.0, where the treebank is available in both PDT-like and Universal Dependencies formats
  • in Universal Dependencies, through both SETS Treebank Search and PML-TQ