Article
Publication date: 6 March 2009

Jyri Saarikoski, Jorma Laurikkala, Kalervo Järvelin and Martti Juhola

Abstract

Purpose

The aim of this paper is to explore the possibility of retrieving information with Kohonen self‐organising maps, which are known to be effective at grouping objects according to their similarity or dissimilarity.

Design/methodology/approach

After conventional preprocessing, such as transformation into a vector space, documents from a German document collection were used to train a neural network of the Kohonen self‐organising map type. Such an unsupervised network forms a document map from which relevant objects can be found according to queries.
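The approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the grid size, the linearly decaying learning rate and neighbourhood radius, and the function names (train_som, best_matching_unit, build_document_map, retrieve) are all assumptions chosen for illustration, and the input is assumed to be already preprocessed document vectors. The sketch trains a small map, assigns each document to its best-matching node, and answers a query by returning the documents on the node that the query vector maps to.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 1e-3     # assumed linear decay
            x = data[idx]
            bmu = np.array(best_matching_unit(weights, x))
            grid_dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winning node and its grid neighbours towards x.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

def build_document_map(weights, data):
    """Map each node to the list of document indices it wins."""
    doc_map = {}
    for i, x in enumerate(data):
        doc_map.setdefault(best_matching_unit(weights, x), []).append(i)
    return doc_map

def retrieve(weights, doc_map, query):
    """Return the documents located on the node the query vector maps to."""
    return doc_map.get(best_matching_unit(weights, query), [])

# Toy document vectors (made-up data): two loose groups in a 4-term space.
docs = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
som = train_som(docs, rows=3, cols=3)
doc_map = build_document_map(som, docs)
hits = retrieve(som, doc_map, docs[0])
```

In this sketch a query only sees the documents on its own node; a fuller search could also look at neighbouring nodes, which is a design choice the abstract does not specify.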

Findings

Self‐organising maps ordered documents into groups from which it was possible to find relevant targets.

Research limitations/implications

The number of documents used was moderate because only a limited number of documents were associated with the test topics. Training self‐organising maps entails rather long running times, which is their practical limitation. In future, the aim will be to build larger networks by compressing document matrices, and to develop document searching in them.

Practical implications

With self‐organising maps the distribution of documents can be visualised and relevant documents found in document collections of limited size.

Originality/value

The paper reports on an approach that can be used especially to group documents, and also for information search. So far, self‐organising maps have rarely been studied for information retrieval; instead, they have been applied to document grouping tasks.

Details

Journal of Documentation, vol. 65 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 May 2006

Tuomas Talvensaari, Jorma Laurikkala, Kalervo Järvelin and Martti Juhola

Abstract

Purpose

To present a method for creating a comparable document collection from two document collections in different languages.

Design/methodology/approach

The best query keys were extracted from a Finnish source collection (articles from the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary‐based query translation program. The resulting word lists were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date‐restricted alignment schemes, which were also combined.
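The alignment step can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: cosine similarity over raw term counts stands in for the nearest neighbor matching, the date window (max_days) is a hypothetical parameter, the function names and data are made up, and key extraction and dictionary translation are assumed to have already produced the English query terms. The abstract does not say how the two schemes were combined, so the sketch shows only the unrestricted and date‐restricted variants.

```python
from collections import Counter
from datetime import date
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def align(query_terms, query_date, targets, max_days=None):
    """Return (target_id, score) of the nearest target document, or None.

    max_days=None gives unrestricted alignment; an integer restricts
    candidates to targets published within that many days of the source.
    """
    query = Counter(query_terms)
    best = None
    for target_id, (target_date, words) in targets.items():
        if max_days is not None and abs((target_date - query_date).days) > max_days:
            continue  # outside the assumed date window
        score = cosine(query, Counter(words))
        if score > 0 and (best is None or score > best[1]):
            best = (target_id, score)
    return best

# Made-up target collection: id -> (publication date, tokens).
targets = {
    "t1": (date(1994, 3, 1), ["election", "vote", "president"]),
    "t2": (date(1994, 6, 10), ["election", "vote", "president", "campaign"]),
    "t3": (date(1994, 3, 2), ["earthquake", "damage"]),
}
# Made-up translated query keys for one source document.
query_terms = ["election", "vote", "campaign"]
query_date = date(1994, 3, 2)

unrestricted = align(query_terms, query_date, targets)
date_restricted = align(query_terms, query_date, targets, max_days=7)
```

With this toy data the unrestricted scheme picks t2, the lexically closest document, while the date‐restricted scheme picks t1, the closest document published within a week of the source article.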

Findings

The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed with a five‐degree relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related and 75 percent included at least lexical similarity.

Research limitations/implications

The number of alignment pairs was small due to the short common time period of the two collections and their geographical (and thus topical) remoteness. In future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for the purposes of cross‐language information retrieval (CLIR).

Practical implications

Readily available parallel corpora are scarce. With this method, two unrelated document collections can relatively easily be aligned to create a CLIR resource.

Originality/value

The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.

Details

Journal of Documentation, vol. 62 no. 3
Type: Research Article
ISSN: 0022-0418
