Search results

1 – 10 of over 19000
Article
Publication date: 1 January 1996

PETER INGWERSEN

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a…

2458

Abstract

The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full‐text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for full‐text retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.

Details

Journal of Documentation, vol. 52 no. 1
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 11 September 2020

Chien-Yi Hsiang and Julia Taylor Rayz

This study aims to predict popular contributors through text representations of user-generated content in open crowds.

1451

Abstract

Purpose

This study aims to predict popular contributors through text representations of user-generated content in open crowds.

Design/methodology/approach

Three text representation approaches – count vector, Tf-Idf vector, word embedding and supervised machine learning techniques – are used to generate popular contributor predictions.

Findings

The results of the experiments demonstrate that popular contributor predictions are considered successful. The F1 scores are all higher than the baseline model. Popular contributors in open crowds can be predicted through user-generated content.

Research limitations/implications

This research presents brand new empirical evidence drawn from text representations of user-generated content that reveals why some contributors' ideas are more viral than others in open crowds.

Practical implications

This research suggests that companies can learn from popular contributors in ways that help them improve customer agility and better satisfy customers' needs. In addition to boosting customer engagement and triggering discussion, popular contributors' ideas provide insights into the latest trends and customer preferences. The results of this study will benefit marketing strategy, new product development, customer agility and management of information systems.

Originality/value

The paper provides new empirical evidence for popular contributor prediction in an innovation crowd through text representation approaches.

Details

Information Technology & People, vol. 35 no. 2
Type: Research Article
ISSN: 0959-3845

Keywords

Article
Publication date: 28 February 2023

Meltem Aksoy, Seda Yanık and Mehmet Fatih Amasyali

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals…

Abstract

Purpose

When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to group proposals are primarily based on manual matching of similar topics, discipline areas and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically.

Design/methodology/approach

This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering and classification techniques. The proposed model uses FastText, BERT and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k-nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART) and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency.

Findings

The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest prediction accuracy (89.2%) with the FastText word embedding method. A comparison of manual grouping with automatic classification and clustering results revealed that both classification and clustering techniques have a high success rate.

Research limitations/implications

The proposed model automatically benefits from the rich information in project proposals and significantly reduces numerous time-consuming tasks that managers must perform manually. Thus, it eliminates the drawbacks of the current manual methods and yields significantly more accurate results. In the future, additional experiments should be conducted to validate the proposed method using data from other funding organizations.

Originality/value

This study presents the application of word embedding methods to effectively use the rich information in the titles and abstracts of Turkish project proposals. Existing research studies focus on the automatic grouping of proposals; traditional frequency-based word embedding methods are used for feature extraction methods to represent project proposals. Unlike previous research, this study employs two outperforming neural network-based textual feature extraction techniques to obtain terms representing the proposals: BERT as a contextual word embedding method and FastText as a static word embedding method. Moreover, to the best of our knowledge, there has been no research conducted on the grouping of project proposals in Turkish.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 16 no. 3
Type: Research Article
ISSN: 1756-378X

Keywords

Article
Publication date: 1 February 1990

BERND FROHMANN

A rule‐governed derivation of an indexing phrase from the text of a document is, in Wittgenstein's sense, a practice, rather than a mental operation explained by reference to…

Abstract

A rule‐governed derivation of an indexing phrase from the text of a document is, in Wittgenstein's sense, a practice, rather than a mental operation explained by reference to internally represented and tacitly known rules. Some mentalistic proposals for theory in information retrieval are criticised in light of Wittgenstein's remarks on following a rule. The conception of rules as practices shifts the theoretical significance of the social role of retrieval practices from the margins to the centre of enquiry into foundations of information retrieval. The abstracted notion of a cognitive act of ‘information processing’ deflects attention from fruitful directions of research.

Details

Journal of Documentation, vol. 46 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 29 April 2022

Eun Joo Park, Dong-Hyun Kim and Mi Jeong Kim

This study aims to examine whether a text stimulus could enhance students' imagination and thus enhance their creativity in the architectural design studio. The assumption is that…

Abstract

Purpose

This study aims to examine whether a text stimulus could enhance students' imagination and thus enhance their creativity in the architectural design studio. The assumption is that adopting the text stimulus in the conceptual design stage would support students' imagination through a nonlinear design process, and ultimately produce the creative values of design outcomes.

Design/methodology/approach

A curriculum that adopts a text stimulus was developed and used for first-year university students. The aim was to implement an architectural setting to stimulate students' imagination with a framework for creativity evaluation. The study focused not only on the design process that characterizes the generation of concepts and ideas, but also on the processes related to the creative practices that students need for developing their own expression methods to solve problems they encounter.

Findings

The results show that design education that emphasizes the imaginary could enhance students' creative thinking, thus leading to creative design. As a training tool in the design studio, the diversity of interpretation following the text stimulus was revealed to provoke a nonlinear design process and to eventually enhance students' originality, differential and inventiveness, which are associated with the creativity criteria for evaluation.

Originality/value

The study explores the translation of imaginary spaces from text into spatial design as a conceptual tool in order to characterize and support creativity throughout design education in the architectural design studio.

Details

Archnet-IJAR: International Journal of Architectural Research, vol. 16 no. 3
Type: Research Article
ISSN: 2631-6862

Keywords

Article
Publication date: 1 February 1978

W.J. HUTCHINS

The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977 conference on…

Abstract

The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977 conference on the same theme both included substantial sections on operational and experimental machine translation systems, and in its Plan of action the Commission announced its intention to introduce an operational machine translation system into its departments and to support research projects on machine translation. This revival of interest in machine translation may well have surprised many who have tended in recent years to dismiss it as one of the ‘great failures’ of scientific research. What has changed? What grounds are there now for optimism about machine translation? Or is it still a ‘utopian dream’ ? The aim of this review is to give a general picture of present activities which may help readers to reach their own conclusions. After a sketch of the historical background and general aims (section I), it describes operational and experimental machine translation systems of recent years (section II), it continues with descriptions of interactive (man‐machine) systems and machine‐assisted translation (section III), (and it concludes with a general survey of present problems and future possibilities section IV).

Details

Journal of Documentation, vol. 34 no. 2
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 30 March 2012

Marcelo Mendoza

Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among…

Abstract

Purpose

Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among others. Most information retrieval systems contain several components which use text categorization methods. One of the first text categorization methods was designed using a naïve Bayes representation of the text. Currently, a number of variations of naïve Bayes have been discussed. The purpose of this paper is to evaluate naïve Bayes approaches on text categorization introducing new competitive extensions to previous approaches.

Design/methodology/approach

The paper focuses on introducing a new Bayesian text categorization method based on an extension of the naïve Bayes approach. Some modifications to document representations are introduced based on the well‐known BM25 text information retrieval method. The performance of the method is compared to several extensions of naïve Bayes using benchmark datasets designed for this purpose. The method is compared also to training‐based methods such as support vector machines and logistic regression.

Findings

The proposed text categorizer outperforms state‐of‐the‐art methods without introducing new computational costs. It also achieves performance results very similar to more complex methods based on criterion function optimization as support vector machines or logistic regression.

Practical implications

The proposed method scales well regarding the size of the collection involved. The presented results demonstrate the efficiency and effectiveness of the approach.

Originality/value

The paper introduces a novel naïve Bayes text categorization approach based on the well‐known BM25 information retrieval model, which offers a set of good properties for this problem.

Details

International Journal of Web Information Systems, vol. 8 no. 1
Type: Research Article
ISSN: 1744-0084

Keywords

Article
Publication date: 1 July 1999

Du ko Vitas and Cvetana Krstev

Discusses the linguistic influences on an electronic publishing infrastructure in an environment with unstable linguistic standardization from the computational point of view…

Abstract

Discusses the linguistic influences on an electronic publishing infrastructure in an environment with unstable linguistic standardization from the computational point of view. Essentially, in Serbia in the last half of the century (at least) publishing is based on the following facts: two alphabetic systems are regularly in use with the possibility to mix both alphabets in the same document; the various dialects are accepted as a part of a linguistic norm; orthography is unstable ‐ presently, several linguistic attitudes that have different views of the orthographic norm are under discussion; and, in Serbia, many minority languages are in use, which makes it difficult to provide efficient contact between different communities through electronic publishing. In this context, a systematic solution that responds to this complex situation has not been developed in the frame of traditional Serbian linguistics and lexicography in a way that enables the adequate incorporation of the new publishing technologies. Owing to these constraints, the direct application of electronic publishing tools frequently causes the degradation of the linguistic message. In such an environment, the promotion of electronic publishing therefore needs specific solutions. The paper discusses the general frame based on the specifically encoded system of electronic dictionaries that makes electronic texts independent of some of the mentioned constraints. The objective of such a frame is to enable the linguistic normalization of texts at the level of their internal representation, and to establish bridges for communicating with other language societies. Some aspects of electronic text representation that ensures its correct interpretation in different graphical systems and in different dialects are described. This also allows text indexing and retrieval using the same techniques that are available for languages not burdened with these problems.

Details

New Library World, vol. 100 no. 4
Type: Research Article
ISSN: 0307-4803

Keywords

Article
Publication date: 8 July 2022

Chuanming Yu, Zhengang Zhang, Lu An and Gang Li

In recent years, knowledge graph completion has gained increasing research focus and shown significant improvements. However, most existing models only use the structures of…

Abstract

Purpose

In recent years, knowledge graph completion has gained increasing research focus and shown significant improvements. However, most existing models only use the structures of knowledge graph triples when obtaining the entity and relationship representations. In contrast, the integration of the entity description and the knowledge graph network structure has been ignored. This paper aims to investigate how to leverage both the entity description and the network structure to enhance the knowledge graph completion with a high generalization ability among different datasets.

Design/methodology/approach

The authors propose an entity-description augmented knowledge graph completion model (EDA-KGC), which incorporates the entity description and network structure. It consists of three modules, i.e. representation initialization, deep interaction and reasoning. The representation initialization module utilizes entity descriptions to obtain the pre-trained representation of entities. The deep interaction module acquires the features of the deep interaction between entities and relationships. The reasoning component performs matrix manipulations with the deep interaction feature vector and entity representation matrix, thus obtaining the probability distribution of target entities. The authors conduct intensive experiments on the FB15K, WN18, FB15K-237 and WN18RR data sets to validate the effect of the proposed model.

Findings

The experiments demonstrate that the proposed model outperforms the traditional structure-based knowledge graph completion model and the entity-description-enhanced knowledge graph completion model. The experiments also suggest that the model has greater feasibility in different scenarios such as sparse data, dynamic entities and limited training epochs. The study shows that the integration of entity description and network structure can significantly increase the effect of the knowledge graph completion task.

Originality/value

The research has a significant reference for completing the missing information in the knowledge graph and improving the application effect of the knowledge graph in information retrieval, question answering and other fields.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Keywords

Article
Publication date: 21 June 2023

Parvin Reisinezhad and Mostafa Fakhrahmad

Questionnaire studies of knowledge, attitude and practice (KAP) are effective research in the field of health, which have many shortcomings. The purpose of this research is to…

Abstract

Purpose

Questionnaire studies of knowledge, attitude and practice (KAP) are effective research in the field of health, which have many shortcomings. The purpose of this research is to propose an automatic questionnaire-free method based on deep learning techniques to address the shortcomings of common methods. Next, the aim of this research is to use the proposed method with public comments on Twitter to get the gaps in KAP of people regarding COVID-19.

Design/methodology/approach

In this paper, two models are proposed to achieve the mentioned purposes, the first one for attitude and the other for people’s knowledge and practice. First, the authors collect some tweets from Twitter and label them. After that, the authors preprocess the collected textual data. Then, the text representation vector for each tweet is extracted using BERT-BiGRU or XLNet-GRU. Finally, for the knowledge and practice problem, a multi-label classifier with 16 classes representing health guidelines is proposed. Also, for the attitude problem, a multi-class classifier with three classes (positive, negative and neutral) is proposed.

Findings

Labeling quality has a direct relationship with the performance of the final model, the authors calculated the inter-rater reliability using the Krippendorf alpha coefficient, which shows the reliability of the assessment in both problems. In the problem of knowledge and practice, 87% and in the problem of people’s attitude, 95% agreement was reached. The high agreement obtained indicates the reliability of the dataset and warrants the assessment. The proposed models in both problems were evaluated with some metrics, which shows that both proposed models perform better than the common methods. Our analyses for KAP are more efficient than questionnaire methods. Our method has solved many shortcomings of questionnaires, the most important of which is increasing the speed of evaluation, increasing the studied population and receiving reliable opinions to get accurate results.

Research limitations/implications

Our research is based on social network datasets. This data cannot provide the possibility to discover the public information of users definitively. Addressing this limitation can have a lot of complexity and little certainty, so in this research, the authors presented our final analysis independent of the public information of users.

Practical implications

Combining recurrent neural networks with methods based on the attention mechanism improves the performance of the model and solves the need for large training data. Also, using these methods is effective in the process of improving the implementation of KAP research and eliminating its shortcomings. These results can be used in other text processing tasks and cause their improvement. The results of the analysis on the attitude, practice and knowledge of people regarding the health guidelines lead to the effective planning and implementation of health decisions and interventions and required training by health institutions. The results of this research show the effective relationship between attitude, practice and knowledge. People are better at following health guidelines than being aware of COVID-19. Despite many tensions during the epidemic, most people still discuss the issue with a positive attitude.

Originality/value

To the best of our knowledge, so far, no text processing-based method has been proposed to perform KAP research. Also, our method benefits from the most valuable data of today’s era (i.e. social networks), which is the expression of people’s experiences, facts and free opinions. Therefore, our final analysis provides more realistic results.

Details

Kybernetes, vol. 52 no. 7
Type: Research Article
ISSN: 0368-492X

Keywords

1 – 10 of over 19000