Search results

1 – 10 of 139
Article
Publication date: 1 July 2014

Byung-Won On, Gyu Sang Choi and Soo-Mok Jung


Abstract

Purpose

The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs.

Design/methodology/approach

To find a sample of name variants across DLs (e.g. DBLP and ACM) and in a single DL (e.g. ACM), the approach is based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching.
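The two matching formulations can be illustrated with a small, stdlib-only sketch. The name records and the string-similarity edge weights below are hypothetical illustrations, not the paper's data or method, and a brute-force search over permutations stands in for a real Maximum Weighted Bipartite Matching algorithm (feasible only for tiny inputs):

```python
from difflib import SequenceMatcher
from itertools import permutations

# Hypothetical author name records from two digital libraries.
dl_a = ["B. W. On", "G. S. Choi", "S. M. Jung"]
dl_b = ["Byung-Won On", "Gyu Sang Choi", "Soo-Mok Jung"]

def similarity(x, y):
    """Edge weight: string similarity between two name strings."""
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def max_weight_matching(left, right):
    """Brute-force Maximum Weighted Bipartite Matching (tiny inputs only)."""
    best_total, best_pairs = -1.0, []
    for perm in permutations(range(len(right))):
        pairs = list(zip(range(len(left)), perm))
        total = sum(similarity(left[i], right[j]) for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return [(left[i], right[j]) for i, j in best_pairs]

for a, b in max_weight_matching(dl_a, dl_b):
    print(f"{a!r} <-> {b!r}")
```

For realistic list sizes, a polynomial-time algorithm such as the Hungarian method would replace the permutation search; the bipartite structure and weighting idea stay the same.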

Findings

First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL.

Originality/value

To the best of the authors' knowledge, little research effort has been devoted to understanding the nature of author name variants appearing in DLs. A thorough analysis can help focus research effort on the real problems that arise when duplicate detection methods are applied.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337


Article
Publication date: 9 December 2019

Noor Arshad, Abu Bakar, Saira Hanif Soroya, Iqra Safder, Sajjad Haider, Saeed-Ul Hassan, Naif Radi Aljohani, Salem Alelyani and Raheel Nawaz



Abstract

Purpose

The purpose of this paper is to present a novel approach for mining scientific trends using topics from Call for Papers (CFP). The work contributes a valuable input for researchers, academics, funding institutes and research administration departments by sharing trends that help set the direction of research paths.

Design/methodology/approach

The authors procure an innovative CFP data set to analyse the scientific evolution and prestige of conferences that set scientific trends, using scientific publications indexed in DBLP. Using the Field of Research code 804 from the Australian Research Council, the authors classify 146 conferences (from 2006 to 2015) into different thematic areas by matching the terms extracted from publication titles with the Association for Computing Machinery Computing Classification System. Furthermore, the authors enrich the vocabulary of terms from the WordNet dictionary and the Growbag data set. To measure the significance of terms, the authors adopt the following weighting schemas: probabilistic, gram, relative, accumulative and hierarchical.

Findings

The results indicate the rise of “big data analytics” in CFP topics over the last few years. Topics related to “privacy and security” show an exponential increase, whereas topics related to “semantic web” show a decline in recent years. Analysing publication output in DBLP matched against CFPs indexed in ERA Core A* to C rank conferences, the authors found that A* and A tier conferences do not single-handedly set publication trends, since B and C tier conferences target similar CFPs.

Originality/value

Overall, the analyses presented in this research are valuable for the scientific community and research administrators for studying research trends and improving the data management of digital libraries pertaining to the scientific literature.

Details

Library Hi Tech, vol. 40 no. 1
Type: Research Article
ISSN: 0737-8831


Article
Publication date: 17 April 2007

Hao Ding and Ingeborg Sølvberg


Abstract

Purpose

The purpose of this research is to describe a system to support querying across distributed digital libraries created in heterogeneous metadata schemas, without requiring the availability of a global schema.

Design/methodology/approach

The advantages and weaknesses of ontology-based applications were investigated, which justified the utility of inferential rules in expressing complex relations between metadata terms in different metadata schemas. A process for combining ontologies and rules for specifying complex relations between metadata schemas was designed. The process was broken into a set of working phases, with examples illustrating how to interrelate two similar bibliographic ontology fragments for subsequent query reformulation.
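The core idea of rule-based query reformulation between schemas can be sketched minimally; the term names, the mapping rules and the record below are hypothetical illustrations, not the paper's ontology fragments:

```python
# Hypothetical record expressed in Dublin-Core-style metadata terms.
record_a = {"dc:creator": "H. Ding", "dc:title": "Metadata interoperability"}

def map_dc_to_bibo(record):
    """Rule-based reformulation from one metadata schema to another.
    Each rule maps a source term to a target term; richer rule languages
    can also combine several source fields into one target field."""
    rules = {
        "dc:creator": "bibo:authorList",
        "dc:title": "bibo:shortTitle",
    }
    return {rules[k]: v for k, v in record.items() if k in rules}

print(map_dc_to_bibo(record_a))
```

In the paper's setting the mapping is expressed with ontologies plus inference rules rather than a flat dictionary, allowing complex, conditional relations between terms.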

Findings

Equipping ontologies with inferencing power can help describe more complex relations between metadata terms. This approach is critical for properly interpreting queries from one ontology to another.

Research limitations/implications

A prototype system was built based on examples instead of practical experience.

Practical implications

The approach assumes that relations between metadata sets, or ontologies in this approach, are provided by domain experts, with or without ontology tools.

Originality/value

A new approach has been proposed for facilitating heterogeneous metadata interoperation in digital libraries as a way of empowering ontologies with rich reasoning capabilities. The traditional approach assumes a global schema controlled by a central or virtual server to provide mapping between local and external metadata schemas. A more flexible and dynamic environment was studied, i.e. P2P‐based digital libraries, where peers may join and leave freely.

Details

The Electronic Library, vol. 25 no. 2
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 10 December 2018

Luciano Barbosa


Abstract

Purpose

Matching instances of the same entity, a task known as entity resolution, is a key step in the process of data integration. This paper aims to propose a deep learning network that learns different representations of Web entities for entity resolution.

Design/methodology/approach

To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the words in the entities in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in word sequences in the entities; and bag-of-words vectors, created by a bow layer that learns weights for words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is used as a feature for a binary classifier that identifies a possible match. In addition to those features, the classifier also uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.
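The idea of feeding representation similarity to a match classifier can be sketched with plain bag-of-words vectors; the product records below are invented, and this cosine-over-counts sketch stands in for, rather than reproduces, the learned representations in the paper:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector as a word -> count mapping."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * \
           math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical product records from two web sources.
a = "apple iphone 12 64gb black smartphone"
b = "iphone 12 black 64 gb by apple"
c = "samsung galaxy s21 128gb grey"

sim_ab = cosine(bow(a), bow(b))   # candidate match pair
sim_ac = cosine(bow(a), bow(c))   # non-match pair
print(sim_ab, sim_ac)
```

A binary classifier would consume such similarity scores (one per learned representation) as features and output match / non-match.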

Findings

The proposed approach was evaluated on two commercial and two academic entity resolution benchmarking data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and achieves results similar to its competitors' on the academic data sets.

Originality/value

No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.

Details

International Journal of Web Information Systems, vol. 15 no. 3
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 28 February 2019

Muhammad Farooq, Hikmat Ullah Khan, Tassawar Iqbal and Saqib Iqbal


Abstract

Purpose

Bibliometrics is one of the research fields in library and information science that deals with the analysis of academic entities. In this regard, publication counts and citation counts are common bibliometric measures for gauging the productivity and popularity of authors. Similarly, the significance of a journal is measured with another bibliometric measure, the impact factor. However, scarce attention has been paid to measuring the impact and productivity of conferences using these bibliometric measures, and existing techniques rarely capture the impact of conferences in a distinctive manner. The purpose of this paper is to propose the DS-index and compare it with existing bibliometric indices, such as the h-index, g-index and R-index, to study and rank conferences distinctively based on their significance.
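For reference, two of the baseline indices have standard definitions that can be sketched directly; the DS-index itself is not specified in the abstract, so it is not reproduced here, and the per-paper citation counts below are hypothetical:

```python
def h_index(citations):
    """h = largest h such that h papers have at least h citations each."""
    cs = sorted(citations, reverse=True)
    return sum(1 for i, c in enumerate(cs, start=1) if c >= i)

def g_index(citations):
    """g = largest g such that the top g papers together have >= g^2 citations."""
    cs = sorted(citations, reverse=True)
    total, g = 0, 0
    for i, c in enumerate(cs, start=1):
        total += c
        if total >= i * i:
            g = i
    return g

venue_citations = [10, 8, 5, 4, 3, 1, 0]  # hypothetical per-paper citations
print(h_index(venue_citations), g_index(venue_citations))
```

Because many venues share the same h or g value, such indices produce ties; a distinctive index like the DS-index aims to break those ties and assign unique ranks.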

Design/methodology/approach

The DS-index is applied to a large self-developed DBLP data set containing publication data spanning over 50 years and covering more than 10,000 conferences.

Findings

The empirical results of the proposed index are compared with the existing indices using the standard performance evaluation measures. The results confirm that the DS-index performs better than other indices in ranking the conferences in a distinctive manner.

Originality/value

Scarce attention has been paid to ranking conferences in a distinctive manner using bibliometric measures. In addition, exploiting the DS-index to assign unique ranks to different conferences makes this research work novel.

Details

The Electronic Library, vol. 37 no. 1
Type: Research Article
ISSN: 0264-0473


Article
Publication date: 16 February 2022

Maedeh Mosharraf


Abstract

Purpose

The purpose of the paper is to propose a semantic model for describing open source software (OSS) in a machine- and human-understandable format. The model is derived through a systematic review of related documents to support source code reuse and revision, the two primary targets of OSS.

Design/methodology/approach

Through a systematic review, all the software reuse criteria are identified and introduced to the web of data by an ontology for OSS (O4OSS). The software semantic model introduced in this paper describes OSS through triple expressions in which the O4OSS properties are predicates.

Findings

This model improves the quality of web data by describing software in a structured, machine- and human-readable profile, which is linked to related data previously published on the web. The OSS semantic model is evaluated by comparing it with previous approaches, comparing the structured software metadata with the profile indexes of software in some well-known repositories, calculating the software retrieval rank and surveying domain experts.

Originality/value

Considering context-specific information and authority levels, the proposed software model would be applicable to any open or closed software. Using this model to publish software provides an infrastructure of connected, meaningful data and helps developers overcome some specific challenges. By navigating software data, many questions that could previously be answered only by reading multiple documents can be answered automatically on the web of data.

Details

Aslib Journal of Information Management, vol. 75 no. 4
Type: Research Article
ISSN: 2050-3806


Article
Publication date: 17 August 2015

Takahiro Komamizu, Toshiyuki Amagasa and Hiroyuki Kitagawa


Abstract

Purpose

The purpose of this paper is to extract appropriate terms to summarize the current results in terms of the contents of textual facets. Faceted search on XML data helps users find necessary information from XML data by giving attribute–content pairs (called facet-value pairs) about the current search results. However, if most of the contents of a facet are long texts on average (such facets are called textual facets), it is not easy to get an overview of the current results.

Design/methodology/approach

The proposed approach is based upon subsumption relationships of terms among the contents of a facet. A subsumption relationship can be extracted using co-occurrences of terms across a number of documents (in this paper, each content of a facet is treated as a document). Subsumption relationships compose hierarchies, and the authors utilize these hierarchies to extract facet-values from textual facets. In the faceted search context, users have ambiguous search demands and expect broader terms. Thus, the authors extract high-level terms in the hierarchies as facet-values.
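A co-occurrence subsumption test can be sketched in the spirit of the classic Sanderson-Croft criterion (term x subsumes term y when x appears in nearly every document containing y, but not vice versa); the threshold value and the sample facet contents are illustrative assumptions, not the paper's data:

```python
from collections import defaultdict

def subsumptions(docs, threshold=0.8):
    """Return (broader, narrower) term pairs: x subsumes y when
    P(x|y) >= threshold while P(y|x) < 1, estimated from co-occurrence."""
    doc_sets = defaultdict(set)
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            doc_sets[term].add(i)
    pairs = []
    for x in doc_sets:
        for y in doc_sets:
            if x == y:
                continue
            both = doc_sets[x] & doc_sets[y]
            p_x_given_y = len(both) / len(doc_sets[y])
            p_y_given_x = len(both) / len(doc_sets[x])
            if p_x_given_y >= threshold and p_y_given_x < 1.0:
                pairs.append((x, y))  # x is broader than y
    return pairs

# Hypothetical facet contents (each string is one content/document).
docs = ["xml query processing", "xml schema validation", "xml indexing",
        "relational query optimisation"]
print(subsumptions(docs))
```

The resulting pairs form a hierarchy; high-level terms in that hierarchy (here, e.g. "xml") are the broader terms suitable as facet-values.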

Findings

The main finding of this paper is that the extracted terms improve users’ search experiences, especially when the search demands are ambiguous.

Originality/value

One original aspect of this paper is the way the textual contents of XML data are utilized to improve users’ search experiences in faceted search. The other is the design of tasks for evaluating exploratory search approaches such as faceted search.

Details

International Journal of Web Information Systems, vol. 11 no. 3
Type: Research Article
ISSN: 1744-0084


Article
Publication date: 11 October 2018

Ali Daud, Tehmina Amjad, Muazzam Ahmed Siddiqui, Naif Radi Aljohani, Rabeeh Ayaz Abbasi and Muhammad Ahtisham Aslam



Abstract

Purpose

Citation analysis is an important measure for the assessment of quality and impact of academic entities (authors, papers and publication venues) used for ranking of research articles, authors and publication venues. It is a common observation that high-level publication venues, with few exceptions (Nature, Science and PLOS ONE), are usually topic specific. The purpose of this paper is to investigate this claim through a correlation analysis between topic specificity and citation count for different types of publication venues (journals, conferences and workshops).

Design/methodology/approach

The topic specificity was calculated using the information theoretic measure of entropy (which tells us about the disorder of the system). The authors computed the entropy of the titles of the papers published in each venue type to investigate their topic specificity.
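The entropy computation over titles can be sketched as follows; the venue titles are invented, and word-level splitting is an assumption since the abstract does not describe the exact preprocessing:

```python
import math
from collections import Counter

def title_entropy(titles):
    """Shannon entropy (bits) of the word distribution across a venue's titles."""
    words = [w for t in titles for w in t.lower().split()]
    counts = Counter(words)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical venues: one topic-specific, one broad.
specific = ["graph matching algorithms", "graph colouring algorithms",
            "graph partitioning algorithms"]
broad = ["quantum cryptography advances", "social network privacy",
         "compiler optimisation techniques"]

print(title_entropy(specific))  # lower entropy: concentrated vocabulary
print(title_entropy(broad))     # higher entropy: dispersed vocabulary
```

The topic-specific venue reuses the same words across titles, so its word distribution is concentrated and its entropy low, matching the paper's use of entropy as a (inverse) topic-specificity signal.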

Findings

It was observed that venues with higher citation counts (high-level publication venues) usually have low entropy, while venues with fewer citations (not-high-level publication venues) have high entropy. Low entropy means less disorder, i.e. greater topic specificity, and vice versa. The input data were the DBLP-V7 data set for the last 10 years. Experimental analysis shows that topic specificity and citation count of publication venues are negatively correlated.

Originality/value

This paper is the first attempt to discover correlation between topic sensitivity and citation counts of publication venues. It also used topic specificity as a feature to rank academic entities.

Details

Library Hi Tech, vol. 37 no. 1
Type: Research Article
ISSN: 0737-8831


Article
Publication date: 9 September 2014

Somu Renugadevi, T.V. Geetha, R.L. Gayathiri, S. Prathyusha and T. Kaviya


Abstract

Purpose

The purpose of this paper is to propose the Collaborative Search System that attempts to achieve collaboration by implicitly identifying and reflecting search behaviour of collaborators in an academic network that is automatically and dynamically formed. By using the constructed Collaborative Hit Matrix (CHM), results are obtained that are based on the search behaviour and earned preferences of specialist communities of researchers, which are relevant to the user's need and reduce the time spent on bad links.

Design/methodology/approach

By using the Digital Bibliography Library Project (DBLP), the research communities are formed implicitly and dynamically based on the users’ research presence in the search environment and in the publication scenario, which is also used to assign users’ roles and establish links between the users. The CHM, to store the hit count and hit list of page results for queries, is also constructed and updated after every search session to enhance the collaborative search among the researchers.
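A minimal sketch of such a hit matrix, assuming a simple query-to-page count structure (the class and method names are hypothetical, not the paper's implementation):

```python
from collections import defaultdict

class CollaborativeHitMatrix:
    """Minimal sketch of a CHM: per-query hit counts for result pages,
    updated after each search session and used to re-rank results."""
    def __init__(self):
        self.hits = defaultdict(lambda: defaultdict(int))

    def record_session(self, query, clicked_pages):
        """Update hit counts from the pages a researcher selected."""
        for page in clicked_pages:
            self.hits[query][page] += 1

    def hit_list(self, query):
        """Pages for a query, most-clicked first."""
        pages = self.hits[query]
        return sorted(pages, key=pages.get, reverse=True)

chm = CollaborativeHitMatrix()
chm.record_session("xml retrieval", ["pageA", "pageB"])
chm.record_session("xml retrieval", ["pageB"])
print(chm.hit_list("xml retrieval"))
```

In the paper's system, such counts would additionally be weighted by the researcher's role in the implicitly formed community before re-ranking results.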

Findings

The implicit formation of researcher communities, the assignment and dynamic updating of researchers’ roles based on research, search presence and search behaviour on the web, as well as the usage of these roles during Collaborative Web Search, have greatly improved the relevancy of results. The CHM, which holds the collaborative responses provided by researchers on search query results, distinguishes this system from others. The proposed system thus considerably improves relevancy and reduces the time spent on bad links, improving both recall and precision.

Originality/value

The research findings illustrate the better performance of the system, by connecting researchers working in the same field and allowing them to help each other in a web search environment.

Details

Aslib Journal of Information Management, vol. 66 no. 5
Type: Research Article
ISSN: 2050-3806


Article
Publication date: 8 August 2008

Alexander Ivanyukovich, Maurizio Marchese and Fausto Giunchiglia



Abstract

Purpose

The purpose of this paper is to provide support for automation of the annotation process of large corpora of digital content.

Design/methodology/approach

The paper presents and discusses an information extraction pipeline from digital document acquisition to information extraction, processing and management. An overall architecture that supports such an extraction pipeline is detailed and discussed.

Findings

The proposed pipeline is implemented in a working prototype of an autonomous digital library (A‐DL) system called ScienceTreks that: supports a broad range of methods for document acquisition; does not rely on any external information sources, being based solely on the existing information in the document itself and in the overall set in a given digital archive; and provides application programming interfaces (APIs) to support easy integration of external systems and tools into the existing pipeline.

Practical implications

The proposed A‐DL system can be used in automating end‐to‐end information retrieval and processing, supporting the control and elimination of error‐prone human intervention in the process.

Originality/value

High quality automatic metadata extraction is a crucial step in the move from linguistic entities to logical entities, relation information and logical relations, and therefore to the semantic level of digital library usability. This in turn creates the opportunity for value‐added services within existing and future semantic‐enabled digital library systems.

Details

Online Information Review, vol. 32 no. 4
Type: Research Article
ISSN: 1468-4527

