************************************************
***Use of the IT-TB Tagset in the CoNLL files***
************************************************

In the IT-TB, each word is assigned 11 morphological codes. If a code does not apply to a word (e.g. "mood" does not apply to nouns), the code "-" is assigned. The CoNLL files report only the codes that apply (i.e. no "-").

In this document, "position" refers to the first column (P) of the IT-TB tagset file ("Tagset_IT.pdf", available at http://itreebank.marginalia.it/doc/Tagset_IT.pdf). Each position corresponds to one attribute (second column of "Tagset_IT.pdf": ATT). The codes permitted in each position (i.e. for each attribute) and their respective values are reported in the columns "C" (codes) and "VAL" (values) of "Tagset_IT.pdf".

A) Fine-grained PoS (column 5 in CoNLL files)

A label formed by two codes:
- Flexional category: position 3
- Flexional type: position 1

EXAMPLE: fine-grained PoS "B1"
- flexional category: II decl. (B)
- flexional type: nominal (1)

NOTES:
- Column 4 in CoNLL files contains the coarse-grained PoS, which corresponds to the code for flexional type.
  Example: coarse-grained PoS "1" -> flexional type: nominal
- Punctuation marks are labelled with PoS "Punc".
- There are a few cases of PoS labels formed by one code (instead of two). In these cases, the code is a "flexional type" code (position 1).
  Example: fine-grained PoS "5" (equal to the coarse-grained PoS) -> flexional type: pseudo-lemma

B) Morph (column 6 in CoNLL files)

"Morph" contains one or more values separated by "|". Each value is formed by the "attribute name" (second column in "Tagset_IT.pdf": ATT) + "code" (fourth column in "Tagset_IT.pdf": C).
The attribute names are abbreviated as follows (three-letter abbreviations):
grn: position 2
mod: position 4
tem: position 5
grp: position 6
cas: position 7
gen: position 8
com: position 9
var: position 10
vgr: position 11

EXAMPLE: "grn1|casA|gen3":
- nominal degree ("grn"): positive (1)
- case/number ("cas"): singular nominative (A)
- gender ("gen"): neuter (3)

NOTES:
- Some words are not assigned any morphological value (e.g. prepositions and conjunctions). In these cases, the "Morph" column is filled with an underscore ("_").
- Expansion of the attribute name abbreviations:
  grn: NomDegree
  mod: Mood
  tem: Tense
  grp: PartDegree
  cas: CaseNum
  gen: GendNumPers
  com: Comp
  var: FVar
  vgr: GVar
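
As an illustration only, the conventions described in sections A and B could be decoded with a short Python sketch like the one below. The function names parse_fine_pos and parse_morph and the dictionary MORPH_ATTRIBUTES are hypothetical and are not part of the IT-TB distribution. The sketch assumes that every abbreviation is exactly three letters long (as stated above) and that a two-code PoS label consists of two single characters, category first, as in "B1". It does not map codes to their values, which are listed only in "Tagset_IT.pdf".

```python
# Hypothetical helper names; not part of the IT-TB distribution.
# Abbreviation -> full attribute name, copied from the list in section B.
MORPH_ATTRIBUTES = {
    "grn": "NomDegree",
    "mod": "Mood",
    "tem": "Tense",
    "grp": "PartDegree",
    "cas": "CaseNum",
    "gen": "GendNumPers",
    "com": "Comp",
    "var": "FVar",
    "vgr": "GVar",
}


def parse_morph(morph):
    """Return {full attribute name: code} for a Morph string (CoNLL column 6)."""
    if morph == "_":
        # Words with no morphological value (e.g. prepositions).
        return {}
    features = {}
    for value in morph.split("|"):
        # Assumption: the first three letters are the attribute abbreviation,
        # the remainder is the code. Unknown abbreviations raise KeyError.
        abbreviation, code = value[:3], value[3:]
        features[MORPH_ATTRIBUTES[abbreviation]] = code
    return features


def parse_fine_pos(pos):
    """Split a fine-grained PoS (CoNLL column 5) into
    (flexional category code, flexional type code)."""
    if pos == "Punc":
        # Punctuation carries no flexional codes.
        return (None, None)
    if len(pos) == 1:
        # One-code labels hold only the flexional type (position 1).
        return (None, pos)
    # Assumption: two single-character codes, category first (as in "B1").
    return (pos[0], pos[1])


print(parse_fine_pos("B1"))
print(parse_morph("grn1|casA|gen3"))
```

For the examples used in this document, parse_fine_pos("B1") separates the flexional category code "B" from the flexional type code "1", and parse_morph("grn1|casA|gen3") pairs NomDegree with "1", CaseNum with "A" and GendNumPers with "3".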