Search results

1 – 10 of over 2000
Article
Publication date: 14 September 2015

Michael John Khoo, Jae-wook Ahn, Ceri Binding, Hilary Jane Jones, Xia Lin, Diana Massam and Douglas Tudhope

Abstract

Purpose

The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query.

Design/methodology/approach

The approach involves creating new Dewey Decimal Classification (DDC) terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms, taking account of DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal rank (MRR) evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.
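
The ranking evaluation lends itself to a compact illustration. The sketch below computes mean reciprocal rank over pairs of a librarian-assigned class and a ranked candidate list; the helper names and sample DDC numbers are illustrative assumptions, not data from the paper.

```python
# Minimal MRR sketch: each sample pairs a librarian-assigned DDC class with an
# automatically ranked list of candidate classes. Values are invented examples.

def reciprocal_rank(gold_class, ranked_classes):
    """Return 1/rank of the gold class in the ranked list, or 0 if absent."""
    for rank, candidate in enumerate(ranked_classes, start=1):
        if candidate == gold_class:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(samples):
    """Average reciprocal rank over (gold class, ranked list) pairs."""
    return sum(reciprocal_rank(g, r) for g, r in samples) / len(samples)

# The gold class ranks first for one record and third for the other.
samples = [("372.6", ["372.6", "371.3", "028.5"]),
           ("551.5", ["550", "551", "551.5"])]
print(mean_reciprocal_rank(samples))  # (1 + 1/3) / 2 ≈ 0.667
```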

Findings

The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies.

Research limitations/implications

The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.

Practical implications

The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, to enhance recall or precision.

Social implications

The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.

Originality/value

The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.

Details

Journal of Documentation, vol. 71 no. 5
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 12 April 2011

Alejandra Segura, Christian Vidal‐Castro, Víctor Menéndez‐Domínguez, Pedro G. Campos and Manuel Prieto

Abstract

Purpose

This paper aims to present the results obtained from applying data mining techniques to learning object (LO) metadata.

Design/methodology/approach

A general review of the literature was carried out. The authors gathered and pre-processed the data, and then analyzed the results of data mining techniques applied to the LO metadata.

Findings

It is possible to extract new knowledge from learning objects stored in repositories. For example, it is possible to identify distinctive features and group learning objects according to them. Semantic relationships can also be found among the attributes that describe learning objects.
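
As an illustration of the kind of grouping reported here, the sketch below clusters learning objects by TF-IDF features of their title and description fields using k-means. The records, field names and cluster count are invented for illustration; the paper does not specify this particular technique.

```python
# Group learning objects by metadata-derived features: TF-IDF vectors of
# title + description, clustered with k-means. Requires scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

records = [
    {"title": "Intro to fractions", "description": "Arithmetic lesson for primary school"},
    {"title": "Cell biology basics", "description": "Interactive biology module"},
    {"title": "Decimal arithmetic drill", "description": "Practice exercises in arithmetic"},
    {"title": "Photosynthesis explained", "description": "Biology animation for secondary school"},
]

texts = [r["title"] + " " + r["description"] for r in records]
vectors = TfidfVectorizer().fit_transform(texts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for record, label in zip(records, labels):
    print(label, record["title"])  # the arithmetic and biology objects separate
```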

Research limitations/implications

In the first section, four test repositories are included as case studies. In the second section, the analysis focuses on the repository that is most complete from the pedagogical point of view.

Originality/value

Many publications report results of analysis on repositories, mainly focused on the number, evolution and growth of the learning objects. However, there is a shortage of research using data mining techniques to extract new semantic knowledge from learning object metadata.

Details

The Electronic Library, vol. 29 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 27 September 2011

Aleksandar Kovačević, Dragan Ivanović, Branko Milosavljević, Zora Konjović and Dušan Surla

Abstract

Purpose

The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS).

Design/methodology/approach

The system is based on machine learning and performs automatic extraction and classification of metadata into eight pre-defined categories. The extraction task is realised as a classification process. For the purpose of classification, each row of text is represented by a vector of features: formatting, position, word-level characteristics, etc. Experiments were performed with standard classification models. Both a single classifier covering all eight categories and eight individual classifiers were tested. Classifiers were evaluated using five-fold cross-validation on a manually annotated corpus of 100 scientific papers in PDF format, collected from various conferences, journals and authors' personal web pages.
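
A minimal sketch of this row-classification setup, assuming scikit-learn, is given below. The feature set, the toy corpus and the focus on a single "title" category are illustrative placeholders; the paper's actual features and categories differ.

```python
# Each line of text becomes a feature vector (formatting, position, word-level
# characteristics); one binary SVM per metadata category is evaluated with
# five-fold cross-validation. Data and features are invented placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def line_features(line, y_position):
    """Toy feature vector for one row of text."""
    words = line.split()
    return [
        y_position,                                                # position on page
        len(words),                                                # word count
        sum(w[:1].isupper() for w in words) / max(len(words), 1),  # capitalisation ratio
        sum(any(c.isdigit() for c in w) for w in words),           # digit-bearing words
    ]

# (line text, vertical position, 1 if the line belongs to the "title" category)
corpus = [
    ("Automatic Metadata Extraction From PDF", 0.95, 1),
    ("University of Novi Sad", 0.90, 0),
    ("Abstract. We present a system for parsing headers.", 0.80, 0),
    ("A Machine Learning Approach to Header Parsing", 0.94, 1),
] * 10  # repeated so each cross-validation fold has enough samples

X = np.array([line_features(text, y) for text, y, _ in corpus])
y = np.array([label for _, _, label in corpus])

scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5, scoring="f1")
print("per-fold F1 for the title classifier:", scores)
```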

Findings

Based on the performance obtained in the classification experiments, eight separate support vector machine (SVM) models, each recognising its corresponding category, were chosen. All eight models performed well: the F-measure was over 85 per cent for almost all of the classifiers and over 90 per cent for most of them.

Research limitations/implications

Automatically extracted metadata cannot be entered directly into CRIS UNS but requires review by curators.

Practical implications

The proposed system for automatic metadata extraction using SVM models was integrated into the software system CRIS UNS. Metadata extraction has been tested on the publications of researchers from the Department of Mathematics and Informatics of the Faculty of Sciences in Novi Sad. Analysis of the metadata extracted from these publications showed that the system's performance on previously unseen data is in accordance with that obtained by cross-validation of the eight separate SVM classifiers. This system will help in synchronising metadata from CRIS UNS with other institutional repositories.

Originality/value

The paper documents the development of a fully automated system for metadata extraction from scientific papers. The system is based on SVM classifiers and open source tools, and is capable of extracting eight types of metadata from scientific articles in any format that can be converted to PDF. Although developed as part of CRIS UNS, the proposed system can be integrated into other CRIS systems, as well as institutional repositories and library management systems.

Article
Publication date: 9 August 2011

Li Sun

Abstract

Purpose

The purpose of this paper is to describe a workflow for automated batch loading of metadata from existing text into a database.
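
As a rough illustration of such a batch-loading step, the sketch below reads tab-delimited records and loads them into an SQLite table in a single transaction. The file layout, column names and database target are assumptions for illustration, not details from the paper.

```python
# Batch-load metadata rows from a delimited text file into a database.
import csv
import sqlite3

conn = sqlite3.connect("digital_collection.db")
conn.execute("""CREATE TABLE IF NOT EXISTS metadata (
                    identifier TEXT PRIMARY KEY,
                    title      TEXT,
                    creator    TEXT,
                    date       TEXT)""")

with open("records.tsv", newline="", encoding="utf-8") as f:
    rows = [(r["identifier"], r["title"].strip(), r["creator"].strip(), r["date"])
            for r in csv.DictReader(f, delimiter="\t")]

# One transaction: a failure part-way leaves the table unchanged.
with conn:
    conn.executemany("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)", rows)

print(f"loaded {len(rows)} records")
conn.close()
```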

Design/methodology/approach

It presents a case study of metadata creation at Rutgers University Libraries in a collaborative digital project with the Hoboken Public Library in New Jersey.

Findings

It is found that a well-designed workflow is crucial to the success of metadata batch loading. It is also found that the metadata manager needs to collaborate with people in different roles and work carefully with data reorganization and transfer.

Practical implications

Metadata creation and management are an integral component of any digital project. The experience gained in metadata batch loading has practical significance and may be incorporated into other metadata projects. The workflow introduced in this paper provides a valuable example for librarians and information professionals to consider when designing or redesigning their own digital efforts.

Originality/value

Based on a real exercise, this workflow has proven to be unique and useful. After this paper was written, it was applied to a new collaborative digital project and once again fulfilled the requirements of another batch-transfer process.

Details

The Electronic Library, vol. 29 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 12 April 2011

Thomas Zschocke and Jan Beniest

Abstract

Purpose

The paper seeks to introduce a process for assuring the creation of quality educational metadata based on the ISO/IEC 19796‐1 standard to describe the agricultural learning resources in the repository of the Consultative Group on International Agricultural Research (CGIAR).

Design/methodology/approach

The paper describes the general notion of quality in education and in the creation of educational metadata. It introduces a quality framework based on the ISO/IEC 19796-1 standard on quality management and quality assurance for learning, education and training. This standard provides a Reference Framework for the Description of Quality Approaches (RFDQ) for describing, comparing and analyzing quality management and quality assurance approaches; here, the framework has been adapted to the creation of educational metadata in the context of the CGIAR learning object repository.

Findings

In order to achieve consistency in the description of learning resources in a repository through quality educational metadata, a standardized process for metadata creators is essential. The reference framework of the ISO/IEC 19796‐1 standard provides a flexible approach that allows the optimization of the metadata creation process while assuring quality of the descriptive information.

Practical implications

The paper proposes a standardized process for the creation of learning object metadata based on the ISO/IEC 19796‐1 standard, and makes suggestions on how to use the reference framework when adapting a quality model for educational metadata.

Originality/value

ISO/IEC 19796‐1 is a very recent standard with a flexible reference framework to develop a quality model in education and training. It provides a novel approach for organizations maintaining learning repositories that are interested in standardizing the educational metadata creation process, especially when multiple stakeholders are involved.

Details

The Electronic Library, vol. 29 no. 2
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 6 November 2007

Chris Awre and Alma Swan

Abstract

Purpose

The purpose of the linking repositories study was to conduct research to identify appropriate sustainable technical and organisational models to support the development of end‐user oriented services across repositories. The work covered four overlapping strands: user and community requirements, roles and responsibilities, technical architecture and infrastructure, and business and management models.

Design/methodology/approach

Interviews, focus groups and a questionnaire were used to elicit the knowledge held by the communities concerned. This information was combined with a literature review and reported alongside the proposed models derived from an analysis of the information gathered.

Findings

Five distinct groups of end-users were identified, along with their respective roles and responsibilities. Relevant services to serve these groups were also identified, and a services model was constructed showing the relationships between them. An aggregation model is proposed to support technical development. A range of business models is suggested, each of which may be applicable in different circumstances.

Research limitations/implications

The models contain a series of recommendations for subsequent research and testing to establish the relative merits of the models proposed and develop these further.

Practical implications

The technical model in particular makes a number of practical recommendations for how repositories need to be structured so as to best support end‐user services. These are complementary to recommendations on repository management.

Originality/value

The research reported in this paper represents a consolidation of views reported previously, and a novel analysis of this information to assist in taking repository service development further.

Details

OCLC Systems & Services: International digital library perspectives, vol. 23 no. 4
Type: Research Article
ISSN: 1065-075X

Article
Publication date: 31 August 2005

Harold Boley, Virendrakumar C. Bhavsar, David Hirtle, Anurag Singh, Zhongwei Sun and Lu Yang

Abstract

We have proposed and implemented AgentMatcher, an architecture for match‐making in e‐Business applications. It uses arc‐labeled and arc‐weighted trees to match buyers and sellers via our novel similarity algorithm. This paper adapts the architecture for match‐making between learners and learning objects (LOs). It uses the Canadian Learning Object Metadata (CanLOM) repository of the eduSource e‐Learning project. Through AgentMatcher’s new indexing component, known as Learning Object Metadata Generator (LOMGen), metadata is extracted from HTML LOs for use in CanLOM. LOMGen semi‐automatically generates the LO metadata by combining a word frequency count and dictionary lookup. A subset of these metadata terms can be selected from a query interface, which permits adjustment of weights that express user preferences. Web‐based pre‐filtering is then performed over the CanLOM metadata kept in a relational database. Using an XSLT (Extensible Stylesheet Language Transformations) translator, the pre‐filtered result is transformed into an XML representation, called Weighted Object‐Oriented (WOO) RuleML (Rule Markup Language). This is compared to the WOO RuleML representation obtained from the query interface by AgentMatcher’s core Similarity Engine. The final result is presented as a ranked LO list with a user‐specified threshold.
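
The core matching idea can be illustrated with a simplified similarity over arc-labelled, arc-weighted trees. The recursion below is a loose sketch of the general technique, not the paper's actual algorithm, and the learner and LO trees are invented examples.

```python
# A tree is {arc_label: (weight, subtree)}; a leaf is a plain string.
def tree_similarity(t1, t2):
    """Similarity in [0, 1], assuming arc weights at each node sum to 1."""
    if isinstance(t1, str) or isinstance(t2, str):   # leaf comparison
        return 1.0 if t1 == t2 else 0.0
    score = 0.0
    for label in set(t1) & set(t2):                  # only shared arc labels match
        (w1, s1), (w2, s2) = t1[label], t2[label]
        score += ((w1 + w2) / 2) * tree_similarity(s1, s2)
    return score  # unmatched arcs contribute nothing in this simplification

# A learner's weighted preferences vs. a learning object's metadata.
learner = {"subject": (0.6, "biology"), "language": (0.4, "en")}
lo      = {"subject": (0.6, "biology"), "language": (0.4, "fr")}
print(tree_similarity(learner, lo))  # 0.6: subject matches, language does not
```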

Details

Interactive Technology and Smart Education, vol. 2 no. 3
Type: Research Article
ISSN: 1741-5659

Article
Publication date: 1 June 2005

Details

Library Hi Tech News, vol. 22 no. 5
Type: Research Article
ISSN: 0741-9058

Article
Publication date: 1 September 2001

Timothy W. Cole, William H. Mischo, Thomas G. Habing and Robert H. Ferrer

Describes an approach to the processing and presentation of online full‐text journals that utilizes several evolving information technologies, including extensible markup language…

Abstract

Describes an approach to the processing and presentation of online full-text journals that utilizes several evolving information technologies, including Extensible Markup Language (XML) and Extensible Stylesheet Language Transformations (XSLT). Discusses major issues and trade-offs associated with these technologies, as well as specific lessons learned from the use of these technologies in the Illinois Testbed of full-text journal articles. Focuses especially on issues associated with the representation of documents in XML, techniques to create and normalize metadata describing XML document instances, XSLT features employed in the Illinois Testbed, and trade-offs of different XSLT implementation options. Pays special attention to techniques for transforming between XML and HTML for rendering in today's commercial web browsers.
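
The XML-to-HTML rendering step described here can be sketched with lxml's XSLT support. The article markup and stylesheet below are invented examples, not the Illinois Testbed's actual document formats.

```python
# Apply an XSLT stylesheet to an XML article and emit HTML.
from lxml import etree

article = etree.XML("""
<article>
  <title>Metadata in the Illinois Testbed</title>
  <abstract>Full-text journal processing with XML and XSLT.</abstract>
</article>""")

stylesheet = etree.XML("""
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/article">
    <html><body>
      <h1><xsl:value-of select="title"/></h1>
      <p><xsl:value-of select="abstract"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>""")

transform = etree.XSLT(stylesheet)  # compile once, reuse across documents
print(str(transform(article)))
```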

Details

Library Hi Tech, vol. 19 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 21 April 2020

Kushal Ajaybhai Anjaria

Abstract

Purpose

The progress of life science and social science research is contingent on effective modes of data storage, data sharing and data reproducibility. In the present digital era, data storage and data sharing play a vital role. For productive data-centric tasks, the findable, accessible, interoperable and reusable (FAIR) principles have been developed as a standard convention. However, the FAIR principles pose specific challenges from a computational implementation perspective. The purpose of this paper is to identify these challenges and then to address them.

Design/methodology/approach

This paper deploys a Petri net-based formal model and Petri net algebra to implement and analyze the FAIR principles. The proposed Petri net-based model, theorems and corollaries may assist computer system architects in implementing and analyzing the FAIR principles.
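
To make the Petri net formalism concrete, the sketch below implements a minimal place/transition net with token firing. The "annotate then publish" net loosely evokes a FAIR-style step (a dataset must be described before it becomes findable); none of it is taken from the paper's model.

```python
class PetriNet:
    """Minimal place/transition Petri net with integer token markings."""

    def __init__(self, marking):
        self.marking = dict(marking)   # place -> token count
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= n for p, n in inputs.items())

    def fire(self, name):
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for p, n in inputs.items():
            self.marking[p] -= n                          # consume tokens
        for p, n in outputs.items():
            self.marking[p] = self.marking.get(p, 0) + n  # produce tokens

net = PetriNet({"raw_dataset": 1, "metadata_written": 0, "findable": 0})
net.add_transition("annotate", {"raw_dataset": 1}, {"metadata_written": 1})
net.add_transition("publish", {"metadata_written": 1}, {"findable": 1})

net.fire("annotate")
net.fire("publish")
print(net.marking)  # {'raw_dataset': 0, 'metadata_written': 0, 'findable': 1}
```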

Findings

To demonstrate the use of the derived Petri net-based theorems and corollaries, two existing data stewardship platforms, FAIRDOM and Dataverse, have been analyzed in this paper. Moreover, a data stewardship model, "Datalection", has been developed and discussed in the present paper. Datalection has been designed on the basis of the Petri net-based theorems and corollaries.

Originality/value

This paper aims to bridge information science and life science using the formalism of data stewardship principles. It not only provides new dimensions to data stewardship but also systematically analyzes two existing data stewardship platforms, FAIRDOM and Dataverse.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288
