Title: Selection of Table Areas for Information Extraction
Abstract: Information contained in companies’ financial statements is valuable to support a variety of decisions. In such documents, much of the relevant information is contained in tables and is extracted mainly by hand. We propose a method that accomplishes a preliminary step of the task of automatically extracting information from tables in documents: selecting the lines which are likely to belong to the tables containing the information to be extracted. Our method has been developed by empirically analysing a set of Portuguese companies’ financial statements, using statistical and data mining techniques. Empirical evaluation indicates that more than 99% of the table lines are selected after discarding at least 50% of them. The method can cope with the complexity of styles used in assembling information on paper and adapt its performance accordingly, thus maximizing its results.
Publication Year: 2003
Publication Date: 2003-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 3
AI Researcher Chatbot
Get quick answers to your questions about the article from our AI researcher chatbot