Title: Converting a dependency treebank to a categorial grammar treebank for Italian
Abstract: The Turin University Treebank (TUT) is a dependency-annotated treebank of 2,400 Italian sentences. Converting TUT to binary constituency trees makes it possible to produce a treebank of Combinatory Categorial Grammar (CCG) derivations. The conversion algorithm traverses each tree top-down, uses a stack to record argument structure, and uses part-of-speech tags to determine the lexical categories. This method reaches a coverage of 77%, resulting in a CCGbank for Italian comprising 1,837 sentences with an average length of 22.9 tokens. The CCGbank for English has proven to be a useful tool for developing efficient wide-coverage parsers for semantic interpretation, and the Italian CCGbank is expected to be an equally useful linguistic resource for training statistical parsers.
Publication Year: 2009
Publication Date: 2009-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 32