Search results
1 – 10 of over 19,000
The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set…
Abstract
The objective of the paper is to amalgamate theories of text retrieval from various research traditions into a cognitive theory for information retrieval interaction. Set in a cognitive framework, the paper outlines the concept of polyrepresentation applied to both the user's cognitive space and the information space of IR systems. The concept seeks to represent the current user's information need, problem state, and domain work task or interest in a structure of causality. Further, it implies that we should apply different methods of representation and a variety of IR techniques of different cognitive and functional origin simultaneously to each semantic full‐text entity in the information space. The cognitive differences imply that by applying cognitive overlaps of information objects, originating from different interpretations of such objects through time and by type, the degree of uncertainty inherent in IR is decreased. Polyrepresentation and the use of cognitive overlaps are associated with, but not identical to, data fusion in IR. By explicitly incorporating all the cognitive structures participating in the interactive communication processes during IR, the cognitive theory provides a comprehensive view of these processes. It encompasses the ad hoc theories of text retrieval and IR techniques hitherto developed in mainstream retrieval research. It has elements in common with van Rijsbergen and Lalmas' logical uncertainty theory and may be regarded as compatible with that conception of IR. Epistemologically speaking, the theory views IR interaction as processes of cognition, potentially occurring in all the information processing components of IR, that may be applied, in particular, to the user in a situational context. The theory draws upon basic empirical results from information seeking investigations in the operational online environment, and from mainstream IR research on partial matching techniques and relevance feedback. By viewing users, source systems, intermediary mechanisms and information in a global context, the cognitive perspective attempts a comprehensive understanding of essential IR phenomena and concepts, such as the nature of information needs, cognitive inconsistency and retrieval overlaps, logical uncertainty, the concept of ‘document’, relevance measures and experimental settings. An inescapable consequence of this approach is to rely more on sociological and psychological investigative methods when evaluating systems and to view relevance in IR as situational, relative, partial, differentiated and non‐linear. The lack of consistency among authors, indexers, evaluators or users is of an identical cognitive nature. It is unavoidable, and indeed favourable to IR. In particular, for full‐text retrieval, alternative semantic entities, including Salton et al.'s ‘passage retrieval’, are proposed to replace the traditional document record as the basic retrieval entity. These empirically observed phenomena of inconsistency and of semantic entities and values associated with data interpretation support strongly a cognitive approach to IR and the logical use of polyrepresentation, cognitive overlaps, and both data fusion and data diffusion.
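As a rough illustration of the overlap idea only (not the paper's own formalism), the Python sketch below ranks documents by how many functionally or cognitively different representations retrieve them; the result sets are hypothetical stand-ins for separate retrieval runs against each representation.

from collections import Counter

# Hypothetical result sets from separate retrieval runs, one per representation
# of the same information space (titles, abstracts, citations, passages).
runs = {
    "title_terms":    {"d1", "d2", "d5"},
    "abstract_terms": {"d1", "d3", "d5"},
    "citation_links": {"d1", "d5", "d7"},
    "passage_match":  {"d2", "d5", "d9"},
}

overlap = Counter()
for representation, retrieved in runs.items():
    overlap.update(retrieved)  # one vote per representation that retrieved the document

# Documents found by the most representations are ranked first (the cognitive overlap).
for doc, votes in overlap.most_common():
    print(doc, "retrieved by", votes, "of", len(runs), "representations")

On this reading, documents retrieved by several cognitively distinct interpretations of the same object carry less uncertainty than documents retrieved by only one.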
Chien-Yi Hsiang and Julia Taylor Rayz
This study aims to predict popular contributors through text representations of user-generated content in open crowds.
Abstract
Purpose
This study aims to predict popular contributors through text representations of user-generated content in open crowds.
Design/methodology/approach
Three text representation approaches – count vectors, TF-IDF vectors and word embeddings – are combined with supervised machine learning techniques to generate popular contributor predictions.
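A minimal sketch of such a pipeline, assuming scikit-learn, a hypothetical set of labelled contributions and logistic regression as the supervised learner; the word-embedding representation, which would require a pretrained model, is omitted, and none of these specifics are claimed to match the study's own setup.

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Hypothetical user-generated posts and popularity labels (1 = popular contributor).
texts = [
    "great idea for a limited edition flavour",
    "the app keeps crashing on checkout",
    "love the new packaging, keep it up",
    "shipping took too long this time",
    "what about a plant-based version of this product",
    "my order arrived damaged",
]
labels = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

# Compare two text representations with the same supervised learner,
# scoring each with F1 as in the evaluation described above.
for vectorizer in (CountVectorizer(), TfidfVectorizer()):
    model = LogisticRegression(max_iter=1000).fit(
        vectorizer.fit_transform(X_train), y_train)
    predictions = model.predict(vectorizer.transform(X_test))
    print(type(vectorizer).__name__, "F1 =", f1_score(y_test, predictions))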
Findings
The experimental results show that popular contributor prediction is successful: the F1 scores are all higher than those of the baseline model. Popular contributors in open crowds can be predicted from user-generated content.
Research limitations/implications
This research presents brand new empirical evidence drawn from text representations of user-generated content that reveals why some contributors' ideas are more viral than others in open crowds.
Practical implications
This research suggests that companies can learn from popular contributors in ways that help them improve customer agility and better satisfy customers' needs. In addition to boosting customer engagement and triggering discussion, popular contributors' ideas provide insights into the latest trends and customer preferences. The results of this study will benefit marketing strategy, new product development, customer agility and management of information systems.
Originality/value
The paper provides new empirical evidence for popular contributor prediction in an innovation crowd through text representation approaches.
A rule‐governed derivation of an indexing phrase from the text of a document is, in Wittgenstein's sense, a practice, rather than a mental operation explained by reference…
Abstract
A rule‐governed derivation of an indexing phrase from the text of a document is, in Wittgenstein's sense, a practice, rather than a mental operation explained by reference to internally represented and tacitly known rules. Some mentalistic proposals for theory in information retrieval are criticised in light of Wittgenstein's remarks on following a rule. The conception of rules as practices shifts the theoretical significance of the social role of retrieval practices from the margins to the centre of enquiry into foundations of information retrieval. The abstracted notion of a cognitive act of ‘information processing’ deflects attention from fruitful directions of research.
Valery J. Frants, Jacob Shapiro and Vladimir G. Voiskunskii
N.J. Belkin, R.N. Oddy and H.M. Brooks
In ‘ASK for Information Retrieval: Part I’, we discussed the theory and background to a design study for an information retrieval (IR) system based on the attempt to…
Abstract
In ‘ASK for Information Retrieval: Part I’, we discussed the theory and background to a design study for an information retrieval (IR) system based on the attempt to represent the anomalous states of knowledge (ASKs) underlying information needs. In Part II, we report the methods and results of the design study, and our conclusions.
The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977…
Abstract
The recent report for the Commission of the European Communities on current multilingual activities in the field of scientific and technical information and the 1977 conference on the same theme both included substantial sections on operational and experimental machine translation systems, and in its Plan of action the Commission announced its intention to introduce an operational machine translation system into its departments and to support research projects on machine translation. This revival of interest in machine translation may well have surprised many who have tended in recent years to dismiss it as one of the ‘great failures’ of scientific research. What has changed? What grounds are there now for optimism about machine translation? Or is it still a ‘utopian dream’? The aim of this review is to give a general picture of present activities which may help readers to reach their own conclusions. After a sketch of the historical background and general aims (section I), it describes operational and experimental machine translation systems of recent years (section II), it continues with descriptions of interactive (man‐machine) systems and machine‐assisted translation (section III), and it concludes with a general survey of present problems and future possibilities (section IV).
Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused…
Abstract
Purpose
Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among others. Most information retrieval systems contain several components which use text categorization methods. One of the first text categorization methods was designed using a naïve Bayes representation of the text. Since then, a number of variations of naïve Bayes have been discussed. The purpose of this paper is to evaluate naïve Bayes approaches to text categorization, introducing new competitive extensions to previous approaches.
Design/methodology/approach
The paper focuses on introducing a new Bayesian text categorization method based on an extension of the naïve Bayes approach. Some modifications to document representations are introduced based on the well‐known BM25 text information retrieval method. The performance of the method is compared to several extensions of naïve Bayes using benchmark datasets designed for this purpose. The method is compared also to training‐based methods such as support vector machines and logistic regression.
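A minimal sketch of the general idea, assuming scikit-learn and a simplified BM25 weighting rather than the paper's exact formulation: raw term counts are replaced by BM25-style weights before fitting a multinomial naïve Bayes model. The documents and labels are hypothetical.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical documents and category labels (1 = spam, 0 = legitimate).
docs = [
    "cheap pills buy now",
    "meeting agenda attached",
    "win easy money now",
    "lunch tomorrow at noon",
]
y = [1, 0, 1, 0]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs).toarray().astype(float)

def bm25_weight(tf, k1=1.2, b=0.75):
    # Simplified BM25: length-normalised, saturated term frequency times idf.
    doc_len = tf.sum(axis=1, keepdims=True)
    avg_len = doc_len.mean()
    df = (tf > 0).sum(axis=0)
    idf = np.log((len(tf) - df + 0.5) / (df + 0.5) + 1.0)
    saturated_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return saturated_tf * idf

X = bm25_weight(counts)
model = MultinomialNB().fit(X, y)  # MultinomialNB also accepts fractional "counts"
print(model.predict(X))            # sanity check: re-scores the training documents

In practice the idf and document-length statistics would be estimated on the training collection and reused for unseen documents; the sanity check here simply re-scores the training data.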
Findings
The proposed text categorizer outperforms state‐of‐the‐art methods without introducing new computational costs. It also achieves performance very similar to that of more complex methods based on criterion function optimization, such as support vector machines or logistic regression.
Practical implications
The proposed method scales well with the size of the collection involved. The presented results demonstrate the efficiency and effectiveness of the approach.
Originality/value
The paper introduces a novel naïve Bayes text categorization approach based on the well‐known BM25 information retrieval model, which offers a set of good properties for this problem.
Duško Vitas and Cvetana Krstev
Discusses the linguistic influences on an electronic publishing infrastructure in an environment with unstable linguistic standardization from the computational point of…
Abstract
Discusses the linguistic influences on an electronic publishing infrastructure in an environment with unstable linguistic standardization from the computational point of view. Essentially, in Serbia in the last half of the century (at least) publishing is based on the following facts: two alphabetic systems are regularly in use, with the possibility of mixing both alphabets in the same document; the various dialects are accepted as a part of a linguistic norm; orthography is unstable: presently, several linguistic attitudes that have different views of the orthographic norm are under discussion; and, in Serbia, many minority languages are in use, which makes it difficult to provide efficient contact between different communities through electronic publishing. In this context, a systematic solution that responds to this complex situation has not been developed in the frame of traditional Serbian linguistics and lexicography in a way that enables the adequate incorporation of the new publishing technologies. Owing to these constraints, the direct application of electronic publishing tools frequently causes the degradation of the linguistic message. In such an environment, the promotion of electronic publishing therefore needs specific solutions. The paper discusses a general frame based on a specifically encoded system of electronic dictionaries that makes electronic texts independent of some of the mentioned constraints. The objective of such a frame is to enable the linguistic normalization of texts at the level of their internal representation, and to establish bridges for communicating with other language societies. Some aspects of an electronic text representation that ensures its correct interpretation in different graphical systems and in different dialects are described. This also allows text indexing and retrieval using the same techniques that are available for languages not burdened with these problems.
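As one small illustration of the script problem only (one of the constraints discussed above, not the authors' dictionary-based frame), the sketch below normalises Serbian text written in either alphabet to a single internal Latin representation before indexing, using the standard Cyrillic-to-Latin transliteration.

# Minimal sketch: map Serbian Cyrillic letters to their Latin equivalents so that
# the same indexing and retrieval machinery applies to text in either alphabet.
# This addresses only the script issue, not dialectal or orthographic variation.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def normalise(text: str) -> str:
    """Transliterate Cyrillic letters to Latin; leave everything else unchanged."""
    out = []
    for ch in text:
        repl = CYR_TO_LAT.get(ch.lower())
        if repl is None:
            out.append(ch)
        else:
            out.append(repl.capitalize() if ch.isupper() else repl)
    return "".join(out)

print(normalise("Београд"))  # -> "Beograd"
print(normalise("Beograd"))  # already Latin, unchanged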
The purpose of this paper is to examine what counts as knowledge in the organization/management field.
Abstract
Purpose
The purpose of this paper is to examine what counts as knowledge in the organization/management field.
Design/methodology/approach
Conventional, legitimated knowledge is analyzed through research into representations of an influential management text. Management and management accounting textbooks and research papers are investigated to establish the types of knowledge produced.
Findings
Mainstream representations of this book are partial, focusing on a “model” of what is likely to ensure successful organizational change – structural and systemic adaptations. What has been ignored is the problematization of structural change and the role of human agency. The foci and omissions of these representations cohere with divisions in the social sciences more generally – between “objectivist” and “subjectivist” ontologies and epistemologies.
Research limitations/implications
There is a need for further research into representations of texts about organizational change, the way the objectivist/subjectivist divide is played out, and its significance for organization/management studies and more widely for the social sciences.
Practical implications
Questions arise as to the validity and sustainability of such knowledge. Omissions about the difficulties in implementing structural change raise epistemological and practical difficulties for students, managers and consultants.
Social implications
The omission of human subjectivities and agency from mainstream knowledge is problematic with regard to successful organizational change and social issues more widely.
Originality/value
The paper's value lies in the in‐depth analysis of representations of a text in the organization/management area and the linking of the type of knowledge produced with broader epistemological and methodological issues in the social sciences.