Search results
1 – 10 of over 6000
Heng-Yang Lu, Yi Zhang and Yuntao Du
Abstract
Purpose
Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) to address both the sparsity and readability problems.
Design/methodology/approach
SenU-PTM is a novel phrase-based short-text topic model built on a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces the new concept of a sense unit, a set of semantically similar tokens, for modeling topics with the token vectors generated in the first phase. Finally, SenU-PTM infers topics based on these two phases.
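The sense-unit idea can be illustrated with a small sketch. The snippet below is a hypothetical Python illustration, not the authors' published code: the toy embedding vectors, the similarity threshold and the greedy merging strategy are all assumptions made for demonstration. It groups tokens into units whenever the cosine similarity of their embeddings exceeds a threshold.

```python
from itertools import combinations

# Toy word vectors standing in for learned embeddings (hypothetical values).
vectors = {
    "topic":  [0.9, 0.1, 0.0],
    "theme":  [0.85, 0.15, 0.05],
    "banana": [0.0, 0.1, 0.95],
    "mango":  [0.05, 0.05, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def sense_units(vecs, threshold=0.9):
    """Greedily merge tokens whose embeddings are similar enough.

    Each returned set is one 'sense unit' of semantically close tokens.
    """
    units = [{w} for w in vecs]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(units, 2):
            # Compare one representative member from each unit.
            wa, wb = next(iter(a)), next(iter(b))
            if cosine(vecs[wa], vecs[wb]) >= threshold:
                a |= b
                units.remove(b)
                merged = True
                break
    return units

print(sense_units(vectors))
```

With the toy vectors above, "topic"/"theme" and "banana"/"mango" each collapse into one unit; a topic model would then operate over these units rather than individual sparse tokens, which is the intuition behind mitigating short-text sparsity.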
Findings
Experimental results on two real-world and publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. The results reveal that modeling topics on sense units can alleviate the sparsity of short texts and improve the readability of topics at the same time.
Originality/value
The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.
Abstract
This article reviews the state of the art in automatic indexing, that is, automatic techniques for analysing and characterising documents, for manipulating their descriptions in searching, and for generating the index language used for these purposes. It concentrates on the literature from 1968 to 1973. Section I defines the topic and its context. Sections II and III consider work in syntax and semantics respectively in detail. Section IV comments on ‘indirect’ indexing. Section V briefly surveys operating mechanized systems. In Section VI major experiments in automatic indexing are reviewed, and Section VII attempts an overall conclusion on the current state of automatic indexing techniques.
Shahram Sedghi and Somayeh Ghaffari Heshajin
Abstract
Purpose
Genetics, a branch of biology, is one of the most recent and rapidly advancing disciplines in science. This study aims to present a bibliometric analysis of the genetics research output of Iranian authors, map the intellectual structure of these studies and investigate the development path of this literature and the interrelationships among the main topics.
Design/methodology/approach
This study searched the Web of Science database for genetics research published by Iranian authors up to 2020. Further, this study used HistCite software to profile and analyze the most cited articles and references and to draw their historiographies.
Findings
A database search revealed 21,329 documents that created the study population. The highest cited publications based on the Global Citation Score (GCS) and Local Citation Score (LCS) achieved scores of 602 and 47, respectively. The publication growth rate study demonstrated consistent expansion over time. The scientific maps based on LCS and GCS had five and four clusters, respectively. Furthermore, journal articles emerged as the predominant type of publication.
Practical implications
The significance of this study lies in its contribution to understanding the position of genetics research in Iran, informing policymakers and researchers, fostering scientific collaboration and shaping public attitudes and quality of life. The results of the present study, with benefits for various communities such as policymakers, academic groups and the general public, can bridge the gap between theoretical research and practical implications.
Social implications
The results of this study, by helping future advancement in health care, medical genetics and disease prevention, may have a direct and indirect positive influence on the quality of life. Furthermore, it may lead to more informed discussions on health care and biotechnology as well as influencing public attitudes and perceptions.
Originality/value
Ultimately, this study concludes that despite the proliferation of publications in terms of quantity and complexity, especially in areas such as disease diagnosis, prevention and treatment, there remains a need for more attention to other facets of genetics such as biology and biotechnology. Iranian publications are most related to population genetics, human genetics, molecular genetics, medical genetics, genomics, developmental genetics and evolutionary genetics out of 10 branches of genetics. This study reveals patterns in scientific outputs and authorship collaborations and plays an alternative and innovative role in revealing Iranian research trends in genetics.
John E. Burnett, David Cooper, Michael F. Lynch, Peter Willett and Maureen Wycherley
Abstract
A study has been made of the effect of controlled variations in indexing vocabulary size on retrieval performance using the Cranfield 200 and 1400 test collections. The vocabularies considered are sets of variable‐length character strings chosen from the fronts of document and query terms so as to occur with approximate equifrequency. Sets containing between 120 and 720 members were tested both using an application of the Cluster Hypothesis and in a series of linear associative retrieval experiments. The effectiveness of the smaller sets is low but the larger ones exhibit retrieval characteristics comparable to those of words.
Charles Abiodun Robert and Maduka Kingsley Attamah
Abstract
Purpose
The purpose of this paper is to provide a solution to the discontinuity experienced when rendering web‐based multimedia due to buffering, and a solution for accessing specific regions of a web‐based multimedia rendering for annotation.
Design/methodology/approach
The methodology was based on a JavaScript multiplexer that segments and multiplexes multimedia documents to allow separate access to sections of a document.
Findings
The paper shows that without extra equipment or software, users of web multimedia documents can access documents just like cards are accessed in a card game.
Practical implications
With this approach, it is possible for a group of users to share a multimedia document on the internet with their specific comments on a specific segment of the document. It is also possible to evaluate users in a group based on their comments and their environment.
Originality/value
The paper shows a new way of sharing multimedia web documents that integrates users' perspectives for knowledge management.
Abstract
A simple notation for describing the internal structure of a document is presented, and contrasted with other, more conventional notations for describing documents, in particular those related to subject‐classification systems and document description for bibliographic purposes, as well as with document metalanguage codes such as those of SGML. It is suggested that such a notation should assist the science of human messaging through (1) permitting hypotheses concerning document structure to be more readily expressed and/or tested, and (2) facilitating the formation of taxonomies of documents based on their structures. Such a notation should also be of practical value in contributing to the processes of document specification, building and testing, and possibly also contribute to new generations of IR systems which link retrieval against record databases to the search systems internal to specific documents. It is suggested that, following formative criticism, professional standards for describing document structure should be sought based on the notation. The notation is at present limited to linear documents, but extensions to accommodate documents in non‐linear form (e.g. hypertext documents) and/or existing in physically distributed form could usefully be constructed. Examples of the application of the notation are provided.
Abstract
The technological revolution is affecting the structure, form and content of documents, reducing the effectiveness of traditional abstracts that, to some extent, are inadequate to the new documentary conditions. Aims to show the directions in which abstracting/abstracts can evolve to achieve the necessary adequacy in the new digital environments. Three researching trends are proposed: theoretical, methodological and pragmatic. Theoretically, there are some needs for expanding the document concept, reengineering abstracting and designing interdisciplinary models. Methodologically, the trend is toward the structuring, automating and qualifying of the abstracts. Pragmatically, abstracts networking, combined with alternative and complementary models, open a new and promising horizon. Automating, structuring and qualifying abstracting/abstract offer some short‐term prospects for progress. Concludes that reengineering, networking and visualising would be middle‐term fruitful areas of research toward the full adequacy of abstracting in the new electronic age.
Abstract
The recent copy of the ACM Computing Surveys, Vol. 14, 3, 1982, deals with some of the problems of the "User‐Interface". All who use computer systems, and particularly the Cybernetician, are concerned with the way in which information is stored, retrieved and edited. All too often, untried methods are implemented, and software engineers fail to search the literature for established and efficient techniques. Two papers in this issue of Computing Surveys bring together details of current editing and user‐interface developments. The first paper is about using and implementing interactive editing systems; the second is concerned with document formatting systems. Here interactive editing refers to the process of making changes to documents by direct, rather than batched, communication with the computer, during which the user's actions are interleaved with the computer's feedback on the results of each action.
Abstract
Purpose
This study examined dossiers of informative pursual (DIPs), a particular type of secret police files, before and after the fall of Communism in Romania. These DIPs were often weaponized against citizens perceived to be anti-government.
Design/methodology/approach
Based on Buckland's (2017) concept of a document as an object with physical, mental and social parts, the study used thematic analysis to examine volumes of DIPs from 1945 to 1989 Communist Romania as well as several recorded reactions to the DIPs by the victims who were targeted by the Communist secret police.
Findings
Four themes were revealed by the study's findings and discussed within the manuscript: DIPs as unreliable epistemic tools, DIPs as tools to construct the identity of the “People's Enemy,” DIPs as weapons to fight the “People's Enemy” and DIPs as tools that could be used in counterattacks during post-Communism, including in political-economic blackmailing.
Research limitations/implications
There are two major limitations to research of DIPs. First, since many DIPs have been stolen, copied illicitly or even destroyed, it is difficult to articulate precisely their actual or potential social and political effects. Researchers may often detect these effects only indirectly, based on information leaks in the news. Second, many victims of surveillance practices during the Communist period have chosen not to leave records of their reactions to reading the DIPs that targeted them.
Social implications
Current and future comprehensive studies of DIPs can reveal possible parallels between surveillance by the Communist regime and the massive data-collection that occurs in democratic societies, particularly given the increased technical capabilities for processing data in these democratic societies.
Originality/value
Within documentation studies, secret police files and document weaponization have been particularly under-researched; therefore, this study contributes to a small body of literature.
Abstract
In Democracy in Chains, Nancy MacLean draws attention to the influence that James M. Buchanan’s work has had on the political economic discourse of the past half century. Buchanan and his collaborators in the Virginia Political Economy tradition have provided intellectual firepower for efforts to delegitimize democratically sanctioned policies aimed at alleviating the dysfunctional consequences of market activity. While MacLean’s account contains some well-documented inaccuracies, her characterization of Buchanan’s agenda is broadly accurate. This chapter assesses Buchanan’s economics in light of the themes raised by MacLean. His work, we shall argue, is a modern manifestation of what Marx termed “vulgar economy,” that is, ruling-class ideology posing as science.