download the resources

All the language resources made available by the Index Thomisticus Treebank Project are downloadable for free and licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

The data for the Index Thomisticus Treebank and the Latin Dependency Treebank are available in the XML-based Prague Markup Language (PML) format. The PML files of the treebanks are organized by annotation layer and linked to each other through stand-off annotation:
  • files.w: raw text (words and punctuation marks)
  • files.m: morphological layer (lemmatization and morphological tagging)
  • files.a: analytical layer (surface syntactic annotation)
  • files.t: tectogrammatical layer (semantic and pragmatic annotation)
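
To make the idea of stand-off linking between layers more concrete, here is a minimal sketch in Python. It does not use the real PML schema: the element names (wdata, mdata, adata, node), the attributes (w_ref, m_ref, head, afun) and the three-token sample sentence are all hypothetical and serve only to illustrate how a higher layer can point at a lower one by identifier instead of repeating its content.

```python
import xml.etree.ElementTree as ET

# Toy documents. These are NOT real PML files: every element and
# attribute name below is a hypothetical stand-in for illustration.
W_LAYER = """<wdata>
  <w id="w1">Deus</w>
  <w id="w2">est</w>
  <w id="w3">.</w>
</wdata>"""

M_LAYER = """<mdata>
  <m id="m1" w_ref="w1" lemma="deus"/>
  <m id="m2" w_ref="w2" lemma="sum"/>
  <m id="m3" w_ref="w3" lemma="."/>
</mdata>"""

A_LAYER = """<adata>
  <node id="a1" m_ref="m1" head="a2" afun="Sb"/>
  <node id="a2" m_ref="m2" head="" afun="Pred"/>
  <node id="a3" m_ref="m3" head="" afun="AuxK"/>
</adata>"""


def index_by_id(xml_text, tag):
    """Parse one layer and map each element's id to the element."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el for el in root.iter(tag)}


def resolve(a_xml, m_xml, w_xml):
    """Follow the references a-layer -> m-layer -> w-layer.

    Returns one (word form, lemma, syntactic label) tuple per a-layer node.
    """
    words = index_by_id(w_xml, "w")
    morphs = index_by_id(m_xml, "m")
    rows = []
    for node in ET.fromstring(a_xml).iter("node"):
        m = morphs[node.get("m_ref")]   # syntactic node points at a morphological unit
        w = words[m.get("w_ref")]       # morphological unit points at a raw token
        rows.append((w.text, m.get("lemma"), node.get("afun")))
    return rows


rows = resolve(A_LAYER, M_LAYER, W_LAYER)
for form, lemma, afun in rows:
    print(form, lemma, afun)
```

Because each layer stores only references to the layer below it, the raw text is never duplicated, and a layer can be revised or redistributed without rewriting the others. Reading the actual treebank files would require following the PML schemas shipped with the data rather than the invented names used here.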
The analytical layer of annotation of the Index Thomisticus Treebank is also available in the following formats:

The Latin Valency Lexicon VALLEX is stored in a single XML file, whose structure is the same as that of PDT-VALLEX, the valency lexicon for Czech, which is described here.

The currently distributed release of the Index Thomisticus Treebank (analytical layer) includes 322,351 nodes in 19,154 sentences.

The analytical layer of annotation of the Index Thomisticus Treebank can also be accessed and downloaded:
  • in CLARIN-D, through the search tool TüNDRA (a free CLARIN account is required)
  • in HamleDT 3.0, where the treebank is available in both PDT-like and Universal Dependencies formats
  • in Universal Dependencies, through both SETS Treebank Search and PML-TQ