Search results

1 – 10 of over 6000
Article
Publication date: 29 April 2021

Heng-Yang Lu, Yi Zhang and Yuntao Du

Abstract

Purpose

Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model, the Sense Unit based Phrase Topic Model (SenU-PTM), that addresses both the sparsity and readability problems.

Design/methodology/approach

SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm by exploiting word embeddings, which aims to generate phrases with the original corpus. The second phase introduces a new concept of sense unit, which consists of a set of semantically similar tokens for modeling topics with token vectors generated in the first phase. Finally, SenU-PTM infers topics based on the above two phases.

Findings

Experimental results on two real-world, publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. The results reveal that modeling topics on sense units can address the sparsity of short texts and improve the readability of topics at the same time.

Originality/value

The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.

Details

Data Technologies and Applications, vol. 55 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 April 1974

KAREN SPARCK JONES

Abstract

This article reviews the state of the art in automatic indexing, that is, automatic techniques for analysing and characterising documents, for manipulating their descriptions in searching, and for generating the index language used for these purposes. It concentrates on the literature from 1968 to 1973. Section I defines the topic and its context. Sections II and III consider work in syntax and semantics respectively in detail. Section IV comments on ‘indirect’ indexing. Section V briefly surveys operating mechanized systems. In Section VI major experiments in automatic indexing are reviewed, and Section VII attempts an overall conclusion on the current state of automatic indexing techniques.

Details

Journal of Documentation, vol. 30 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 6 November 2023

Shahram Sedghi and Somayeh Ghaffari Heshajin

Abstract

Purpose

Genetics, a discipline of biology, is one of the most recent and rapidly advancing disciplines in science. This study aims to present a bibliometric analysis of the genetics research output of Iranian authors, map the intellectual structure of these studies and investigate the development path of this literature and the interrelationships among the main topics.

Design/methodology/approach

This study searched the Web of Science database for genetics research by Iranian authors published up to 2020. It then used HistCite software to profile and analyze the most cited articles and references and to draw their historiographies.

Findings

A database search retrieved 21,329 documents, which formed the study population. The most highly cited publications based on the Global Citation Score (GCS) and Local Citation Score (LCS) achieved scores of 602 and 47, respectively. The study of the publication growth rate demonstrated consistent expansion over time. The scientific maps based on LCS and GCS had five and four clusters, respectively. Furthermore, journal articles emerged as the predominant type of publication.

Practical implications

The significance of this study is in its contribution to understanding the genetics research position in Iran, informing policymakers and researchers, helping scientific collaboration and its impact on public attitudes and quality of life. The results of the present study, with benefits for various groups of communities, such as policymakers, academic groups and public society, can bridge the gap between theoretical research and practical implications.

Social implications

The results of this study, by helping future advancement in health care, medical genetics and disease prevention, may have a direct and indirect positive influence on the quality of life. Furthermore, it may lead to more informed discussions on health care and biotechnology as well as influencing public attitudes and perceptions.

Originality/value

Ultimately, this study concludes that despite the proliferation of publications in terms of quantity and complexity, especially in areas such as disease diagnosis, prevention and treatment, there remains a need for more attention to other facets of genetics such as biology and biotechnology. Iranian publications are most related to population genetics, human genetics, molecular genetics, medical genetics, genomics, developmental genetics and evolutionary genetics out of 10 branches of genetics. This study reveals patterns in scientific outputs and authorship collaborations and plays an alternative and innovative role in revealing Iranian research trends in genetics.

Details

Global Knowledge, Memory and Communication, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9342

Article
Publication date: 1 March 1979

JOHN E. BURNETT, DAVID COOPER, MICHAEL F. LYNCH, PETER WILLETT and MAUREEN WYCHERLEY

Abstract

A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable‐length character strings chosen from the fronts of document and query terms so as to occur with approximate equifrequency. Sets containing between 120 and 720 members were tested both using an application of the Cluster Hypothesis and in a series of linear associative retrieval experiments. The effectiveness of the smaller sets is low but the larger ones exhibit retrieval characteristics comparable to those of words.

Details

Journal of Documentation, vol. 35 no. 3
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 16 February 2010

Charles Abiodun Robert and Maduka Kingsley Attamah

Abstract

Purpose

The purpose of this paper is to provide a solution to the discontinuity experienced when rendering web‐based multimedia due to buffering, and a solution for accessing a specific region of a web‐based multimedia rendering for annotation.

Design/methodology/approach

The methodology was based on a JavaScript multiplexer that segments and multiplexes a multimedia document so that sections of the document can be accessed separately.

Findings

The paper shows that without extra equipment or software, users of web multimedia documents can access documents just like cards are accessed in a card game.

Practical implications

With this approach, it is possible for a group of users to share a multimedia document on the internet with their specific comments on a specific segment of the document. It is also possible to evaluate users in a group based on their comments and their environment.

Originality/value

The paper shows a new way of sharing multimedia web documents, integrating users' perspectives for knowledge management.

Details

VINE, vol. 40 no. 1
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 1 April 1995

M.H. HEINE

Abstract

A simple notation for describing the internal structure of a document is presented, and contrasted with other, more conventional notations for describing documents, in particular those related to subject‐classification systems and document description for bibliographic purposes, as well as with document metalanguage codes such as those of SGML. It is suggested such a notation should assist the science of human messaging through (1) permitting hypotheses to be more readily expressed and/or tested concerning document structure, and (2) facilitating the formation of taxonomies of documents based on their structures. Such a notation should also be of practical value in contributing to the processes of document specification, building and testing, and possibly also contribute to new generations of IR systems which link retrieval against record databases to the search systems internal to specific documents. It is suggested that, following formative criticism, professional standards for describing document structure should be sought based on the notation. The notation is at present limited to linear documents, but extensions to it to accommodate documents in non‐linear form (e.g. hypertext documents) and/or existing in physically distributed form, could usefully be constructed. Examples of the application of the notation are provided.

Details

Journal of Documentation, vol. 51 no. 4
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 October 2003

Maria Pinto

Abstract

The technological revolution is affecting the structure, form and content of documents, reducing the effectiveness of traditional abstracts that, to some extent, are inadequate to the new documentary conditions. Aims to show the directions in which abstracting/abstracts can evolve to achieve the necessary adequacy in the new digital environments. Three research trends are proposed: theoretical, methodological and pragmatic. Theoretically, there is a need to expand the document concept, reengineer abstracting and design interdisciplinary models. Methodologically, the trend is toward the structuring, automating and qualifying of abstracts. Pragmatically, the networking of abstracts, combined with alternative and complementary models, opens a new and promising horizon. Automating, structuring and qualifying abstracting/abstracts offer some short‐term prospects for progress. Concludes that reengineering, networking and visualising would be fruitful medium‐term areas of research toward the full adequacy of abstracting in the new electronic age.

Details

Journal of Documentation, vol. 59 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 1 February 1983

Abstract

The recent copy of the ACM Computing Surveys, Vol. 14, 3, 1982, deals with some of the problems of the “User‐Interface”. All who use computer systems, and particularly the Cybernetician, are concerned at the way in which information is stored, retrieved and edited. All too often, untried methods are implemented, and software engineers fail to search the literature for established and efficient techniques. Two papers in this issue of Computing Surveys bring together details of current editing and user interface developments. The first paper is about using and implementing interactive editing systems, and the second is concerned with document formatting systems. Here interactive editing refers to the process of making changes to documents by direct, rather than batched, communication with the computer, during which the user's actions are interleaved with the computer's feedback on the results of each action.

Details

Kybernetes, vol. 12 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 14 November 2022

Iulian Vamanu

Abstract

Purpose

This study examined dossiers of informative pursual (DIPs), a particular type of secret police files, before and after the fall of Communism in Romania. These DIPs were often weaponized against citizens perceived to be anti-government.

Design/methodology/approach

Based on Buckland's (2017) concept of a document as an object with physical, mental and social parts, the study used thematic analysis to examine volumes of DIPs from 1945 to 1989 Communist Romania as well as several recorded reactions to the DIPs by the victims who were targeted by the Communist secret police.

Findings

Four themes were revealed by the study's findings and discussed within the manuscript: DIPs as unreliable epistemic tools, DIPs as tools to construct the identity of the “People's Enemy,” DIPs as weapons to fight the “People's Enemy” and DIPs as tools that could be used in counterattacks during post-Communism, including in political-economic blackmailing.

Research limitations/implications

There are two major limitations to research of DIPs. First, since many DIPs have been stolen, copied illicitly or even destroyed, it is difficult to articulate precisely their actual or potential social and political effects. Researchers may often detect these effects only indirectly, based on information leaks in the news. Second, many victims of surveillance practices during the Communist period have chosen not to leave records of their reactions to reading the DIPs that targeted them.

Social implications

Current and future comprehensive studies of DIPs can reveal possible parallels between surveillance by the Communist regime and the massive data-collection that occurs in democratic societies, particularly given the increased technical capabilities for processing data in these democratic societies.

Originality/value

Within documentation studies, secret police files and document weaponization have been particularly under-researched; this study therefore contributes to a small body of literature.

Book part
Publication date: 19 August 2019

Gary Mongiovi

Abstract

In Democracy in Chains, Nancy MacLean draws attention to the influence that James M. Buchanan’s work has had on the political economic discourse of the past half century. Buchanan and his collaborators in the Virginia Political Economy tradition have provided intellectual firepower for efforts to delegitimize democratically sanctioned policies aimed at alleviating the dysfunctional consequences of market activity. While MacLean’s account contains some well-documented inaccuracies, her characterization of Buchanan’s agenda is broadly accurate. This chapter assesses Buchanan’s economics in light of the themes raised by MacLean. His work, we shall argue, is a modern manifestation of what Marx termed “vulgar economy,” that is, ruling-class ideology posing as science.

Details

Including a Symposium on Ludwig Lachmann
Type: Book
ISBN: 978-1-78769-862-8
