Search results
1 – 10 of over 2000
Michael John Khoo, Jae-wook Ahn, Ceri Binding, Hilary Jane Jones, Xia Lin, Diana Massam and Douglas Tudhope
Abstract
Purpose
The purpose of this paper is to describe a new approach to a well-known problem for digital libraries, how to search across multiple unrelated libraries with a single query.
Design/methodology/approach
The approach involves creating new Dewey Decimal Classification terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal ranking evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.
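The mean reciprocal rank (MRR) evaluation described above can be sketched in a few lines of Python; the generated DDC rankings and librarian-assigned classes below are hypothetical illustrations, not data from the study:

```python
def mean_reciprocal_rank(ranked_lists, gold_classes):
    """MRR: average over records of 1/rank of the first generated
    class that matches the librarian-assigned class (0 if absent)."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_classes):
        for rank, cls in enumerate(ranked, start=1):
            if cls == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Hypothetical generated DDC rankings vs. librarian-assigned classes
generated = [["025.04", "020", "370"], ["510", "511.3"], ["004", "025.04"]]
assigned = ["020", "511.3", "005"]
print(mean_reciprocal_rank(generated, assigned))  # (1/2 + 1/2 + 0) / 3
```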
Findings
The best results combined weighted key terms from the title, description and subject fields. Performance declines with increased specificity of DDC level. The results compare favorably with similar studies.
Research limitations/implications
The metadata harvest required manual intervention and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.
Practical implications
The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example, recall or precision enhancing.
Social implications
The approach can provide centralized access to information from multiple domains currently provided by individual digital libraries.
Originality/value
The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower level matches to broader parents and thus approximates the practices of a human cataloger.
Alejandra Segura, Christian Vidal‐Castro, Víctor Menéndez‐Domínguez, Pedro G. Campos and Manuel Prieto
Abstract
Purpose
This paper aims to show the results obtained from applying data mining techniques to learning object (LO) metadata.
Design/methodology/approach
A general review of the literature was carried out. The authors gathered and pre‐processed the data, and then analyzed the results of data mining techniques applied upon the LO metadata.
Findings
It is possible to extract new knowledge from learning objects stored in repositories. For example, it is possible to identify distinctive features and to group learning objects according to them. Semantic relationships can also be found among the attributes that describe learning objects.
Research limitations/implications
In the first section, four test repositories are included as case studies. In the second section, the analysis focuses on the repository that is most complete from the pedagogical point of view.
Originality/value
Many publications report results of analyses of repositories, mainly focused on the number, evolution and growth of the learning objects. However, there is a shortage of research using data mining techniques to extract new semantic knowledge from learning object metadata.
Aleksandar Kovačević, Dragan Ivanović, Branko Milosavljević, Zora Konjović and Dušan Surla
Abstract
Purpose
The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS).
Design/methodology/approach
The system is based on machine learning and performs automatic extraction and classification of metadata in eight pre‐defined categories. The extraction task is realised as a classification process. For the purpose of classification each row of text is represented with a vector that comprises different features: formatting, position, characteristics related to the words, etc. Experiments were performed with standard classification models. Both a single classifier with all eight categories and eight individual classifiers were tested. Classifiers were evaluated using the five‐fold cross validation, on a manually annotated corpus comprising 100 scientific papers in PDF format, collected from various conferences, journals and authors' personal web pages.
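The per-row classification setup can be sketched as follows, assuming scikit-learn and a simple character n-gram encoding; the paper's actual feature set (formatting, position, word characteristics) and its eight categories and 100-paper corpus are far richer than this toy example:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Each row of text gets a feature vector; a linear SVM assigns it to
# one of the metadata categories (toy labels, not the paper's eight).
lines = [
    "Automatic Metadata Extraction from PDF Documents",
    "John Smith, Jane Doe, University of Somewhere",
    "Abstract: We present a system for metadata extraction.",
    "Keywords: metadata, support vector machines, extraction",
]
labels = ["title", "authors", "abstract", "keywords"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
clf.fit(lines, labels)
print(clf.predict(["Keywords: digital libraries, classification"]))
```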
Findings
Based on the performances obtained on classification experiments, eight separate support vector machines (SVM) models (each of which recognises its corresponding category) were chosen. All eight models were established to have a good performance. The F‐measure was over 85 per cent for almost all of the classifiers and over 90 per cent for most of them.
Research limitations/implications
Automatically extracted metadata cannot be entered directly into CRIS UNS but requires review by curators.
Practical implications
The proposed system for automatic metadata extraction using support vector machines model was integrated into the software system, CRIS UNS. Metadata extraction has been tested on the publications of researchers from the Department of Mathematics and Informatics of the Faculty of Sciences in Novi Sad. Analysis of extracted metadata from these publications showed that the performance of the system for the previously unseen data is in accordance with that obtained by the cross‐validation from eight separate SVM classifiers. This system will help in the process of synchronising metadata from CRIS UNS with other institutional repositories.
Originality/value
The paper documents the development of a fully automated system for metadata extraction from scientific papers. The system is based on the SVM classifier and open source tools, and is capable of extracting eight types of metadata from scientific articles in any format that can be converted to PDF. Although developed as part of CRIS UNS, the proposed system can be integrated into other CRIS systems, as well as institutional repositories and library management systems.
The purpose of this paper is to describe a workflow of automated batch‐loading metadata from existing text to a database.
Abstract
Purpose
The purpose of this paper is to describe a workflow for automated batch loading of metadata from existing text into a database.
Design/methodology/approach
It presents a case study of metadata creation at Rutgers University Libraries in a collaborative digital project with the Hoboken Public Library in New Jersey.
Findings
It is found that a well‐designed workflow is crucial to the success of metadata batch loading. It is also found that the metadata manager needs to collaborate with people of different roles and work carefully with data reorganization and transfer.
Practical implications
Metadata creation and management are an integrated component of any digital project. The experience of metadata batch loading reported here has practical significance and may be incorporated into the practice of other metadata projects. The workflow introduced in this paper provides a valuable example for librarians and information professionals considering or redesigning their own digital efforts.
Originality/value
Based on a real exercise, this workflow has proven to be unique and useful. After the writing of this paper, it was applied to a new collaborative digital project and once again fulfilled the requirements of another batch-transfer process.
Thomas Zschocke and Jan Beniest
Abstract
Purpose
The paper seeks to introduce a process for assuring the creation of quality educational metadata based on the ISO/IEC 19796‐1 standard to describe the agricultural learning resources in the repository of the Consultative Group on International Agricultural Research (CGIAR).
Design/methodology/approach
The paper describes the general notion of quality in education and in the creation of educational metadata. It introduces a quality framework based on the ISO/IEC 19796‐1 standard on quality management and quality assurance for learning, education and training. This standard provides a reference framework for the description of quality approaches (RFDQ), used to describe, compare and analyze quality management and quality assurance approaches, which has been adapted to the creation of educational metadata in the context of the CGIAR learning object repository.
Findings
In order to achieve consistency in the description of learning resources in a repository through quality educational metadata, a standardized process for metadata creators is essential. The reference framework of the ISO/IEC 19796‐1 standard provides a flexible approach that allows the optimization of the metadata creation process while assuring quality of the descriptive information.
Practical implications
The paper proposes a standardized process for the creation of learning object metadata based on the ISO/IEC 19796‐1 standard, and makes suggestions on how to use the reference framework when adapting a quality model for educational metadata.
Originality/value
ISO/IEC 19796‐1 is a very recent standard with a flexible reference framework to develop a quality model in education and training. It provides a novel approach for organizations maintaining learning repositories that are interested in standardizing the educational metadata creation process, especially when multiple stakeholders are involved.
Chris Awre and Alma Swan
Abstract
Purpose
The purpose of the linking repositories study was to conduct research to identify appropriate sustainable technical and organisational models to support the development of end‐user oriented services across repositories. The work covered four overlapping strands: user and community requirements, roles and responsibilities, technical architecture and infrastructure, and business and management models.
Design/methodology/approach
Interviews, focus groups and a questionnaire were used to elicit the knowledge held. This information was combined with a literature review and reported alongside the proposed models derived from an analysis of the information gathered.
Findings
Five distinct groups of end‐users were identified, along with their respective roles and responsibilities. Relevant services to serve these groups were also identified, and a services model was constructed showing the relationships between them. An aggregation model is proposed to support technical development. A range of business models is suggested, each of which may be applicable in different circumstances.
Research limitations/implications
The models contain a series of recommendations for subsequent research and testing to establish the relative merits of the models proposed and develop these further.
Practical implications
The technical model in particular makes a number of practical recommendations for how repositories need to be structured so as to best support end‐user services. These are complementary to recommendations on repository management.
Originality/value
The research reported in this paper represents a consolidation of views reported previously, and a novel analysis of this information to assist in taking repository service development further.
Harold Boley, Virendrakumar C. Bhavsar, David Hirtle, Anurag Singh, Zhongwei Sun and Lu Yang
Abstract
We have proposed and implemented AgentMatcher, an architecture for match‐making in e‐Business applications. It uses arc‐labeled and arc‐weighted trees to match buyers and sellers via our novel similarity algorithm. This paper adapts the architecture for match‐making between learners and learning objects (LOs). It uses the Canadian Learning Object Metadata (CanLOM) repository of the eduSource e‐Learning project. Through AgentMatcher’s new indexing component, known as Learning Object Metadata Generator (LOMGen), metadata is extracted from HTML LOs for use in CanLOM. LOMGen semi‐automatically generates the LO metadata by combining a word frequency count and dictionary lookup. A subset of these metadata terms can be selected from a query interface, which permits adjustment of weights that express user preferences. Web‐based pre‐filtering is then performed over the CanLOM metadata kept in a relational database. Using an XSLT (Extensible Stylesheet Language Transformations) translator, the pre‐filtered result is transformed into an XML representation, called Weighted Object‐Oriented (WOO) RuleML (Rule Markup Language). This is compared to the WOO RuleML representation obtained from the query interface by AgentMatcher’s core Similarity Engine. The final result is presented as a ranked LO list with a user‐specified threshold.
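The word-frequency-plus-dictionary-lookup step attributed to LOMGen can be sketched roughly as below; the stop-word list and sample text are hypothetical stand-ins for the component's actual dictionary and HTML input:

```python
import re
from collections import Counter

# Hypothetical stop-word "dictionary"; LOMGen's real lookup is richer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with"}

def extract_candidate_terms(text, top_n=5):
    """Rank non-stop-words by frequency as candidate metadata terms."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]

sample = ("Learning objects support reuse. A learning object is a "
          "reusable unit of learning content described with metadata.")
print(extract_candidate_terms(sample))
```

In a real pipeline, a human would then select and weight a subset of these candidate terms, as the abstract describes.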
Timothy W. Cole, William H. Mischo, Thomas G. Habing and Robert H. Ferrer
Abstract
Describes an approach to the processing and presentation of online full‐text journals that utilizes several evolving information technologies, including extensible markup language (XML) and extensible stylesheet language transformations (XSLT). Discusses major issues and trade‐offs associated with these technologies, and also specific lessons learned from our use of these technologies in the Illinois Testbed of full‐text journal articles. Focuses especially on issues associated with the representation of documents in XML, techniques to create and normalize metadata describing XML document instances, XSLT features employed in the Illinois Testbed, and trade‐offs of different XSLT implementation options. Pays special attention to techniques for transforming between XML and HTML formats for rendering in today’s commercial Web browsers.
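An XML-to-HTML transformation of the kind described can be sketched with lxml's XSLT support; the stylesheet and article markup below are hypothetical, not the Illinois Testbed's actual schemas:

```python
from lxml import etree

# Hypothetical stylesheet: render an <article> document as HTML.
xslt_doc = etree.XML(b"""\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/article">
    <html><body>
      <h1><xsl:value-of select="title"/></h1>
      <p><xsl:value-of select="abstract"/></p>
    </body></html>
  </xsl:template>
</xsl:stylesheet>
""")
article = etree.XML(
    b"<article><title>Full-Text Journals Online</title>"
    b"<abstract>XML and XSLT in a journal testbed.</abstract></article>"
)
transform = etree.XSLT(xslt_doc)
print(str(transform(article)))
```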
The progress of life science and social science research is contingent on effective modes of data storage, data sharing and data reproducibility. In the present digital era, data…
Abstract
Purpose
The progress of life science and social science research is contingent on effective modes of data storage, data sharing and data reproducibility. In the present digital era, data storage and data sharing play a vital role. For productive data-centric tasks, the findable, accessible, interoperable and reusable (FAIR) principles have been developed as a standard convention. However, FAIR principles pose specific challenges from a computational implementation perspective. The purpose of this paper is to identify the challenges related to computational implementation of FAIR principles and then to address them.
Design/methodology/approach
This paper deploys a Petri net-based formal model and Petri net algebra to implement and analyze FAIR principles. The proposed Petri net-based model, theorems and corollaries may assist computer system architects in implementing and analyzing FAIR principles.
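The basic Petri net mechanics underlying such a model can be sketched briefly: places hold tokens, and a transition fires only when every input place is marked. The deposit/publish flow below is a hypothetical illustration, not the paper's formal model:

```python
class PetriNet:
    def __init__(self, marking):
        self.marking = dict(marking)   # place -> token count
        self.transitions = {}          # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name):
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        if not self.enabled(name):
            raise RuntimeError(f"{name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:                       # consume input tokens
            self.marking[p] -= 1
        for p in outputs:                      # produce output tokens
            self.marking[p] = self.marking.get(p, 0) + 1

# Hypothetical data-stewardship flow: deposit requires both a dataset
# and its metadata; publish requires a completed deposit.
net = PetriNet({"dataset_ready": 1, "metadata_ready": 1})
net.add_transition("deposit", ["dataset_ready", "metadata_ready"], ["deposited"])
net.add_transition("publish", ["deposited"], ["published"])
net.fire("deposit")
net.fire("publish")
print(net.marking)  # the "published" place now holds one token
```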
Findings
To demonstrate the use of the derived Petri net-based theorems and corollaries, existing data stewardship platforms, FAIRDOM and Dataverse, have been analyzed in this paper. Moreover, a data stewardship model, "Datalection", has been developed and discussed in the present paper. Datalection has been designed based on the Petri net-based theorems and corollaries.
Originality/value
This paper aims to bridge information science and life science using the formalism of data stewardship principles. It not only provides new dimensions to data stewardship but also systematically analyzes two existing data stewardship platforms, FAIRDOM and Dataverse.