Search results

1 – 10 of 965
Article
Publication date: 25 March 2020

Jihong Liang, Hao Wang and Xiaojing Li

Abstract

Purpose

The purpose of this paper is to explore the task design and assignment of full-text generation for mass Chinese historical archives (CHAs) through crowdsourcing, with special attention to how full-text generation tasks can best be divided into smaller ones assigned to crowdsourced volunteers, so as to improve the digitization of mass CHAs and the data-oriented processing of the digital humanities.

Design/methodology/approach

This paper starts from the complexities of character recognition in mass CHAs, takes the Sheng Xuanhuai archives crowdsourcing project of the Shanghai Library as a case study, and uses theories of archival science, including the diplomatics of Chinese archival documents and the historical approach of Chinese archival traditions, as its theoretical basis and analysis methods. The results are generated through this comprehensive research.

Findings

This paper points out that the volunteer tasks of full-text generation include transcription, punctuation, proofreading, metadata description, segmentation, and attribute annotation in the digital humanities. It provides a metadata element set for volunteers to use in creating or revising metadata descriptions, as well as an attribute tag set. The two sets can be used across the humanities to construct overall observations about texts and the archives of which they are a part. Along these lines, the paper offers significant insights for application by outlining the principles, methods, activities, and procedures of crowdsourced full-text generation for mass CHAs; a task-assignment sketch follows.
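
As a rough illustration of the task division described above, the following Python sketch splits each document into one small task per task type and assigns the tasks to volunteers round-robin. The task types are taken from the findings; the class, the field names and the assignment policy are hypothetical, not the paper's actual scheme.

from dataclasses import dataclass
from itertools import cycle

# Task types named in the findings above.
TASK_TYPES = ["transcription", "punctuation", "proofreading",
              "metadata_description", "segmentation", "attribute_annotation"]

@dataclass
class VolunteerTask:
    document_id: str   # one archival document (identifier is hypothetical)
    task_type: str     # one of TASK_TYPES
    assignee: str      # volunteer the task is routed to

def divide_and_assign(document_ids, volunteers):
    """Create one small task per (document, task type), assigned round-robin."""
    pool = cycle(volunteers)
    return [VolunteerTask(doc, task, next(pool))
            for doc in document_ids
            for task in TASK_TYPES]

tasks = divide_and_assign(["doc-0001", "doc-0002"], ["vol_a", "vol_b", "vol_c"])
print(len(tasks), tasks[0].task_type, tasks[0].assignee)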

Originality/value

This study is the first to explore and identify the effective design and allocation of tasks for crowdsourced volunteers completing full-text generation on CHAs in digital humanities.

Details

Aslib Journal of Information Management, vol. 72 no. 2
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 30 August 2013

Vanessa El‐Khoury, Martin Jergler, Getnet Abebe Bayou, David Coquil and Harald Kosch

Abstract

Purpose

Fine-grained video content indexing, retrieval, and adaptation require accurate metadata describing the video structure and semantics down to the lowest granularity, i.e. the object level. The authors address these requirements by proposing the semantic video content annotation tool (SVCAT) for structural and high-level semantic video annotation. SVCAT is a semi-automatic, MPEG-7 standard-compliant annotation tool which produces metadata according to a new object-based video content model introduced in this work. Videos are temporally segmented into shots, and shot-level concepts are detected automatically using ImageNet as background knowledge. These concepts are used as a guide to easily locate and select objects of interest, which are then tracked automatically to generate object-level metadata. The integration of shot-based concept detection with object localization and tracking drastically alleviates the annotator's task. The paper aims to discuss these issues.

Design/methodology/approach

A systematic classification of keyframes into ImageNet categories is used as the basis for automatic concept detection in temporal units. This is then followed by an object tracking algorithm to obtain exact spatial information about the objects; see the sketch below.
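
A minimal sketch of this two-stage idea, assuming a torchvision ImageNet classifier for the keyframe step and an OpenCV tracker for the spatial step; the model choice and the tracker are stand-ins, not SVCAT's actual components.

import cv2                                   # opencv-contrib-python
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT
model = resnet50(weights=weights).eval()
preprocess = weights.transforms()

def shot_concepts(keyframe_path, top_k=3):
    """Top-k ImageNet category names for one shot keyframe."""
    image = Image.open(keyframe_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))
    top = logits.softmax(dim=1).topk(top_k)
    return [weights.meta["categories"][int(i)] for i in top.indices[0]]

def track_object(video_path, initial_box):
    """Follow a selected (x, y, w, h) box through the video with a CSRT tracker."""
    capture = cv2.VideoCapture(video_path)
    ok, frame = capture.read()
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, initial_box)
    boxes = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        found, box = tracker.update(frame)
        boxes.append(box if found else None)
    capture.release()
    return boxes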

Findings

Experimental results showed that SVCAT is able to provide accurate object-level video metadata.

Originality/value

The new contribution of this paper is an approach that uses ImageNet to obtain shot-level annotations automatically. This approach assists video annotators significantly by minimizing the effort required to locate salient objects in the video.

Details

International Journal of Pervasive Computing and Communications, vol. 9 no. 3
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 30 March 2012

José L. Navarro‐Galindo and José Samos

Abstract

Purpose

Nowadays, the use of WCMS (web content management systems) is widespread. The conversion of this infrastructure into its semantic equivalent (a semantic WCMS) is a critical issue, as it enables the benefits of the semantic web to be extended. The purpose of this paper is to present FLERSA (Flexible Range Semantic Annotation), a tool for flexible-range semantic annotation.

Design/methodology/approach

FLERSA is presented as a user-centred annotation tool for Web content expressed in natural language. The tool has been built to illustrate how a WCMS called Joomla! can be converted into its semantic equivalent.

Findings

The development of the tool shows that it is possible to build a semantic WCMS through a combination of semantic components and other resources such as ontologies and emerging technologies, including XML, RDF, RDFa and OWL.

Practical implications

The paper provides a starting‐point for further research in which the principles and techniques of the FLERSA tool can be applied to any WCMS.

Originality/value

The tool allows both manual and automatic semantic annotation, as well as providing enhanced search capabilities. For manual annotation, a new flexible-range markup technique is used, based on the RDFa standard, to support the evolution of annotated Web documents more effectively than XPointer. For automatic annotation, a hybrid approach based on machine learning techniques (vector-space model + n-grams) is used to determine the concepts that the content of a Web document deals with (drawn from an ontology which provides a taxonomy), based on previous annotations that serve as a training corpus; a sketch of this step follows.
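
As a hedged sketch of the automatic-annotation step only: a TF-IDF vector-space model over unigrams and bigrams assigns a new page the ontology concept of its most similar previously annotated text. The concept labels and documents here are invented for illustration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Previous annotations act as the training corpus (invented examples).
train_docs = ["grape harvest and wine fermentation notes",
              "olive oil pressing and cold storage"]
train_concepts = ["onto:Wine", "onto:OliveOil"]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))   # unigrams + bigrams
train_matrix = vectorizer.fit_transform(train_docs)

def suggest_concept(text):
    """Return the ontology concept of the most similar annotated text."""
    similarities = cosine_similarity(vectorizer.transform([text]), train_matrix)
    return train_concepts[similarities.argmax()]

print(suggest_concept("fermentation of red wine"))   # -> onto:Wine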

Article
Publication date: 9 September 2014

Rüdiger Rolf, Hannah Reuter, Martin Abel and Kai-Christoph Hamborg

Abstract

Purpose

To improve the use of annotations in lecture recordings.

Design/methodology/approach

Requirements analysis using scenario-based design (SBD) with focus groups.

Findings

These seven points have been extracted from the feedback of the focus groups: (1) Control of the annotation feature (turn on/turn off). (2) An option to decide who is able to see their comments (groups, lecturer, friends). (3) An easy and paper-like experience in creating a comment. (4) An option to discuss comments. (5) An option to import already existing comments. (6) Color-coding of the different types of comments. (7) An option to print their annotations within the context of the recording.

Research limitations/implications

The study was performed to improve the open-source lecture recording system Opencast Matterhorn.

Originality/value

Annotations can help students who use lecture recordings move from passive watching to active viewing and reflecting.

Details

Interactive Technology and Smart Education, vol. 11 no. 3
Type: Research Article
ISSN: 1741-5659

Article
Publication date: 1 August 2002

François Bry and Michael Kraus

Abstract

While the World Wide Web (WWW or Web) is steadily expanding, electronic books (e-books) remain a niche market. In this article, it is first postulated that specialized contents and device independence can make Web-based e-books compete with paper prints, and that adaptive features that can be implemented by client-side computing are relevant for e-books, while more complex forms of adaptation requiring server-side computation are not. Then, enhancements of the WWW standards (specifically of XML, XHTML, the style-sheet languages CSS and XSL, and the linking language XLink) are proposed to better support client-side adaptation and device-independent content modeling. Finally, advanced browsing functionalities desirable for e-books, as well as their implementation in the WWW context, are described.

Details

The Electronic Library, vol. 20 no. 4
Type: Research Article
ISSN: 0264-0473

Details

Management for Scientists
Type: Book
ISBN: 978-1-78769-203-9

Article
Publication date: 7 August 2017

Sathyavikasini Kalimuthu and Vijaya Vijayakumar

Abstract

Purpose

Diagnosing a genetic neuromuscular disorder such as muscular dystrophy is complicated when the imperfection occurs during splicing. This paper aims at predicting the type of muscular dystrophy from gene sequences by extracting well-defined descriptors related to splicing mutations. An automatic model is built to classify the disease through pattern recognition techniques coded in Python using the scikit-learn framework.

Design/methodology/approach

In this paper, cloned gene sequences are synthesized based on the mutation position and its location on the chromosome by using the positional cloning approach. For instance, in the Human Gene Mutation Database (HGMD), the mutational information for a splicing mutation is specified as IVS1-5 T > G, which indicates (IVS: intervening sequence, i.e. intron) the first intron, five nucleotides before the consensus intron site AG, where the nucleotide G is altered to T. IVS (+ve) denotes the forward strand 3′, with positive numbers counted from the G of the invariant donor site, and IVS (−ve) denotes the backward strand 5′, with negative numbers counted from the G of the acceptor site. The key idea in this paper is to identify discriminative descriptors in diseased gene sequences based on splicing variants and to provide an effective machine learning solution for predicting the type of muscular dystrophy disease from the splicing mutations. Multi-class classification is worked out through data modeling of gene sequences. Synthetic mutational gene sequences are created, as diseased gene sequences are not readily obtainable for this intricate disease; the positional cloning approach supports generating disease gene sequences based on mutational information acquired from the HGMD. SNP-, gene- and exon-based discriminative features are identified and used to train the model. A muscular dystrophy disease prediction model is built using supervised learning techniques in the scikit-learn environment, as sketched below. The data frame is built with the extracted features as a NumPy array. The data are normalized by transforming the feature values into the range between 0 and 1, which scales the input attributes for the model. Naïve Bayes, decision tree, K-nearest neighbor and SVM models are developed using the scikit-learn Python library.
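
A minimal sketch of these modelling steps, with random stand-in data in place of the paper's SNP-, gene- and exon-based descriptors: features are scaled into [0, 1] and the four named classifiers are trained in scikit-learn.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((120, 20))        # stand-in descriptor matrix
y = rng.integers(0, 4, 120)      # stand-in muscular dystrophy type labels

X_scaled = MinMaxScaler().fit_transform(X)   # feature values into [0, 1]

models = {"naive_bayes": GaussianNB(),
          "decision_tree": DecisionTreeClassifier(),
          "knn": KNeighborsClassifier(),
          "svm": SVC(kernel="rbf", C=1.0, gamma="scale")}
for name, classifier in models.items():
    classifier.fit(X_scaled, y)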

Findings

To the best of the authors' knowledge, this is the first pattern recognition model to classify muscular dystrophy disease pertaining to splicing mutations. Essential SNP-, gene- and exon-based descriptors related to splicing mutations are proposed and extracted from the cloned gene sequences. A model is built using statistical learning techniques through scikit-learn in the Anaconda framework. This paper also discusses the results of statistical learning carried out on the same set of gene sequences with synonymous and non-synonymous mutational descriptors.

Research limitations/implications

The data frame is built as a NumPy array. The data are normalized by transforming the feature values into the range between 0 and 1, which scales the input attributes for the model. Naïve Bayes, decision tree, K-nearest neighbor and SVM models are developed using the scikit-learn Python library. While training the SVM model, the cost, gamma and kernel parameters are tuned to attain good results, as in the sketch below. Scoring parameters of the classifiers are evaluated with tenfold cross-validation using the metric functions of the scikit-learn library. Results of the disease identification model based on non-synonymous, synonymous and splicing mutations were analyzed.
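
A sketch of the tuning and evaluation step, reusing the stand-in X_scaled and y arrays from the previous sketch; the parameter grid is an assumption, not the paper's.

from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Tune cost (C), gamma and kernel with tenfold cross-validation.
search = GridSearchCV(SVC(),
                      param_grid={"C": [0.1, 1, 10],
                                  "gamma": ["scale", 0.01, 0.1],
                                  "kernel": ["rbf", "linear"]},
                      cv=10, scoring="accuracy")
search.fit(X_scaled, y)
print(search.best_params_)

# Tenfold cross-validated accuracy of the tuned model.
scores = cross_val_score(search.best_estimator_, X_scaled, y, cv=10)
print(scores.mean())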

Practical implications

Essential SNP-, gene- and exon-based descriptors related to splicing mutations are proposed and extracted from the cloned gene sequences. A model is built using statistical learning techniques through scikit-learn in the Anaconda framework. The performance of the classifiers is improved by using different estimators from the scikit-learn library. Several types of mutations, such as missense, nonsense and silent mutations, are also considered for building models through statistical learning techniques, and their results are analyzed.

Originality/value

To the best of the authors' knowledge, this is the first pattern recognition model to classify muscular dystrophy disease pertaining to splicing mutations.

Details

World Journal of Engineering, vol. 14 no. 4
Type: Research Article
ISSN: 1708-5284

Content available
Article
Publication date: 1 August 2002

Details

Kybernetes, vol. 31 no. 6
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 10 December 2018

Bruno C.N. Oliveira, Alexis Huf, Ivan Luiz Salvadori and Frank Siqueira

Abstract

Purpose

This paper describes a software architecture that automatically adds semantic capabilities to data services. The proposed architecture, called OntoGenesis, is able to semantically enrich data services so that they can dynamically provide both semantic descriptions and data representations.

Design/methodology/approach

The enrichment approach is designed to intercept the requests made to data services. A domain ontology is constructed and evolved in accordance with the syntactic representations provided by such services in order to define the data concepts. In addition, a property matching mechanism is proposed to exploit the potential data intersection observed between data service representations and external data sources, so as to enhance the domain ontology with new equivalence triples; see the sketch below. Finally, the enrichment approach is capable of deriving, on demand, a semantic description and data representations that link to the domain ontology concepts.
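
A rough sketch of the property-matching idea under stated assumptions: when a service property and an external property share enough values (Jaccard overlap here, with invented URIs, data and threshold), an owl:equivalentProperty triple is added to the domain ontology. OntoGenesis's real mechanism is not reproduced here.

from rdflib import Graph, URIRef
from rdflib.namespace import OWL

# Observed values per property (invented sample data).
service_props = {URIRef("http://example.org/svc/cityName"):
                 {"Berlin", "Paris", "Rome"}}
external_props = {URIRef("http://example.org/geo/name"):
                  {"Paris", "Rome", "Madrid"}}

THRESHOLD = 0.5   # minimum value overlap to assert equivalence
ontology = Graph()

for prop_a, values_a in service_props.items():
    for prop_b, values_b in external_props.items():
        overlap = len(values_a & values_b) / len(values_a | values_b)
        if overlap >= THRESHOLD:
            ontology.add((prop_a, OWL.equivalentProperty, prop_b))

print(ontology.serialize(format="turtle"))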

Findings

Experiments were performed using real-world datasets such as DBpedia and GeoNames, as well as open government data. The results show the applicability of the proposed architecture and that it can boost the development of semantic data services. Moreover, the matching approach achieved better performance than other existing approaches found in the literature.

Research limitations/implications

This work only considers services designed as data providers, i.e. services that provide an interface for accessing data sources. In addition, the approach assumes that both the data services and the external sources used to enhance the domain ontology have some potential data intersection. This assumption only requires that services and external sources share particular property values.

Originality/value

Unlike most approaches found in the literature, the architecture proposed in this paper is meant to semantically enrich data services in such a way that human intervention is minimal. Furthermore, an automata-based index is presented as a novel method that significantly improves the performance of the property matching mechanism; a toy illustration follows.
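
As a toy stand-in for that index (the paper's automata-based structure is not described here), a trie over property values lets candidate matches be found by walking characters once instead of comparing every value pair:

class TrieIndex:
    """Prefix automaton mapping each stored value to the properties using it."""

    def __init__(self):
        self.root = {}

    def insert(self, value, prop):
        node = self.root
        for char in value:
            node = node.setdefault(char, {})
        node.setdefault("$props", set()).add(prop)

    def lookup(self, value):
        node = self.root
        for char in value:
            if char not in node:
                return set()
            node = node[char]
        return node.get("$props", set())

index = TrieIndex()
index.insert("Paris", "geo:name")
index.insert("Paris", "svc:cityName")
print(index.lookup("Paris"))   # both properties share this value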

Details

International Journal of Web Information Systems, vol. 15 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 June 2017

Mohammed Ourabah Soualah, Yassine Ait Ali Yahia, Abdelkader Keita and Abderrezak Guessoum

Abstract

Purpose

The purpose of this paper is to provide online access to digitised images of Arabic manuscripts, which requires the use of a catalogue. Bibliographic cataloguing is unsuitable for old Arabic manuscripts, so it is imperative to establish a new cataloguing model. The authors propose a new cataloguing model based on manuscript annotations and transcriptions, which can be an effective solution for dynamically cataloguing old Arabic manuscripts. In this work, the authors use automatic extraction of metadata based on the structural similarity of the documents.

Design/methodology/approach

This work is based on an experimental methodology. All of the proposed concepts and formulas were tested for validation, which allows the authors to draw concise conclusions.

Findings

Cataloguing old Arabic manuscripts faces the problem of unavailable information. However, this information may be found elsewhere, in a copy of the original manuscript. Thus, cataloguing an Arabic manuscript cannot be done in one pass; it is a continual process which requires information updating. The idea is to pre-catalogue a manuscript and then complete and improve the record through a specific platform, as sketched below. Consequently, the authors propose a new cataloguing model, which they call “dynamic cataloguing”.
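
A hedged sketch of this pre-cataloguing-then-updating cycle, assuming a minimal XML record and a naive 'field: value' convention for annotations; the element names and the extraction rule are illustrative, not the authors' schema.

import re
import xml.etree.ElementTree as ET

def pre_catalogue(manuscript_id):
    """Create a skeleton record whose fields are filled in later."""
    record = ET.Element("manuscript", id=manuscript_id)
    for field_name in ("title", "author", "copyist", "date"):
        ET.SubElement(record, field_name)
    return record

def update_from_annotation(record, annotation):
    """Fill any empty field mentioned as 'field: value' in a user annotation."""
    for field_name, value in re.findall(r"(\w+)\s*:\s*([^;]+)", annotation):
        element = record.find(field_name)
        if element is not None and not element.text:
            element.text = value.strip()

record = pre_catalogue("ms-0042")
update_from_annotation(record, "author: Ibn Khaldun; date: 1377")
print(ET.tostring(record, encoding="unicode"))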

Research limitations/implications

The success of the proposed model depends on the involvement of all of the model's actors. It relies on the conviction and motivation of the actors of the collaborative platform.

Practical implications

The model can be used in several cataloguing fields where the encoding model is based on XML. It is innovative and implements a smart cataloguing model, is usable through a web platform, and allows automatic updating of a catalogue.

Social implications

The model prompts users to participate in enriching the catalogue. Users could thereby improve their status from passive to active.

Originality/value

The dynamic cataloguing model is a new concept that has not previously been proposed in the literature. The proposed model is based on the automatic extraction of metadata from user annotations and transcriptions. It is a smart system which automatically updates or fills the catalogue with the extracted metadata.
