Title: Cross-language Information Retrieval, Document Alignment and Visualization: A Study with Japanese and Chinese
Abstract: With the advent of the Internet and digital libraries, and with the proliferation of multilingual information, sophisticated methods for representing, indexing, and retrieving such information are essential. In recent years, the amount of electronically available information has escalated, and non-English information (in Asian and European languages) is growing rapidly. Although several reports are available on cross-language information retrieval (CLIR) with European languages, no research on Japanese-Chinese CLIR has so far been reported. In this thesis, I concentrate on these two languages, which differ markedly from European languages in that semantically rich ideographic Han characters (hereafter, Kanji) are used in the writing systems of both. I explore several strategies that use Kanji and Kanji-derived semantics for indexing and retrieving Japanese and Chinese information. The Kanji-based interlingual framework proposed in this thesis for Japanese-Chinese information retrieval is a flexible vector-space framework. Projection and dimensionality reduction techniques, such as the singular value decomposition (SVD), are therefore easy to incorporate into it. SVD, the technique underlying latent semantic indexing (LSI), is capable of term-to-concept mapping, and this thesis includes experimental results based on such conceptual enhancements. The goal of this research is to investigate the unique characteristics of Japanese and Chinese and to propose a suitable framework for Japanese-Chinese IR and CLIR that takes advantage of existing developments in Natural Language Processing (NLP) and Information Retrieval (IR) techniques. The proposed vector-space framework based on Kanji-semantics ∗ Doctor's Thesis, Department of Information Science, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-DT-9861207, August 30, 2001.
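Illustrative note: the abstract describes a vector-space framework in which Kanji act as shared index terms across Japanese and Chinese. The following is a minimal sketch of that general idea, not the thesis's actual method. It assumes that each document is represented as a bag of Han characters (taken here to be the CJK Unified Ideographs block, U+4E00 to U+9FFF) and that similarity is plain cosine similarity. The function names `kanji_vector` and `cosine`, and the sample strings, are hypothetical. The SVD/LSI step mentioned in the abstract is not shown.

```python
import math
from collections import Counter

def kanji_vector(text):
    """Bag of Han characters (assumed range U+4E00..U+9FFF); kana and Latin are ignored."""
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical example: a Japanese query scored against two Chinese documents.
query_ja = "情報検索の研究"                  # "research on information retrieval"
docs_zh = ["信息检索研究", "今天天气很好"]   # same topic / unrelated sentence

query_vec = kanji_vector(query_ja)
scores = [cosine(query_vec, kanji_vector(d)) for d in docs_zh]
for doc, score in zip(docs_zh, scores):
    print(f"{score:.3f}  {doc}")
```

In this toy case the first Chinese document shares the characters 索, 研, and 究 with the Japanese query, while 検/检 and 情報/信息 do not match at the character level. That gap is one reason a purely character-matching sketch like this falls short of the Kanji-derived semantics and concept-level mapping that the abstract refers to.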
Publication Year: 2001
Publication Date: 2001-09-28
Language: en
Type: dissertation
Access and Citation
Cited By Count: 1