Title: Relating Lexicon and Corpus: Computational Support for Corpus-Based Lexicon Building in DELIS
Abstract: Lexicographers agree in favouring a corpus-based approach to lexicon building over one based on introspection. However, there is little tool support for corpus-based lexicography, especially for relating observations made in the corpus to the classifications or descriptions proposed in the dictionary. This problem is particularly relevant when relating the lexical semantic distinctions a lexicographer wants to make (readings, senses) to facts and data observed in the corpus. DELIS aims to bridge this gap, at least in part, by designing, implementing and integrating tools for corpus exploration and lexicon building into a toolbox. We give an overview of the DELIS approach and tools for corpus-based lexicon building, which aim to support the description of lexical items at the levels of lexical semantics, syntax and morphosyntax, paying particular attention to the interrelationship between these levels. The tools allow users to create, update and modify lexical specifications and to check these against corpus material. We illustrate our work with examples from the domain of perception verbs.
1. From corpus analysis to lexical modeling
Many dictionaries, for both natural language processing (NLP) and human use, are based more on lexicographers' introspection than on real text as it occurs in newspapers, books, spoken discourse, etc. Only recent work in British lexicography (cf. work by Sinclair 1991, Atkins/Fillmore 1991) and a few dictionary projects for other languages (e.g. Den Danske Ordbog) are based on corpora and accompanied by methodological work on corpus use in lexicography. In NLP, corpus-based lexicon construction is also not very common; among the few examples are some of the recent ARPA projects in the USA and the German VERBMOBIL project.
1.1. An approach to corpus lexicography
The new, corpus-driven approach to dictionary making has not yet received much support in terms of dedicated computational tools, except at COBUILD and within the HECTOR project. The main elements of the chain of corpus-based lexicon building are acquisition (e.g. from corpus text), formal modeling and representation, as well as the use of lexical information in
Publication Year: 1994
Publication Date: 1994-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 6