Search results

1 – 10 of over 21000
Article
Publication date: 20 June 2016

Götz Hatop

Abstract

Purpose

The academic tradition of adding a reference section with references to cited and otherwise related academic material to an article provides a natural starting point for finding links to other publications. These links can then be published as linked data. Natural language processing technologies are available today that can perform the task of bibliographical reference extraction from text. Publishing references by the means of semantic web technologies is a prerequisite for a broader study and analysis of citations and thus can help to improve academic communication in a general sense. The paper aims to discuss these issues.

Design/methodology/approach

This paper examines the overall workflow required to extract, analyze and semantically publish bibliographical references within an Institutional Repository with the help of open source software components.
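
A minimal sketch of the kind of pipeline this abstract describes, not the paper's actual implementation: split a plain-text reference section into entries and publish each citation link as an RDF statement in N-Triples syntax, using the CiTO ontology's `cites` property (CiTO is a real ontology; the `[n]` entry markers and all URIs below are illustrative assumptions).

```python
import re

# Hedged sketch: extract reference entries from plain text and emit
# one cito:cites triple per reference in N-Triples syntax.
CITO_CITES = "<http://purl.org/spar/cito/cites>"

def extract_references(section_text):
    # Assume entries are introduced by "[1]", "[2]", ... markers.
    parts = re.split(r"\[\d+\]\s*", section_text)
    return [p.strip() for p in parts if p.strip()]

def to_ntriples(article_uri, reference_uris):
    # One triple per reference: <article> cito:cites <reference> .
    return "\n".join(f"<{article_uri}> {CITO_CITES} <{ref}> ."
                     for ref in reference_uris)

refs = extract_references(
    "[1] Smith, J. (2010). Linked data basics. "
    "[2] Lee, K. (2012). Citation analysis."
)
triples = to_ntriples("http://example.org/article/1",
                      ["http://example.org/ref/1", "http://example.org/ref/2"])
```

Real pipelines would use a trained reference-string parser rather than a regex, but the output shape (dereferenceable triples) is the same.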

Findings

A publication infrastructure in which references are available to software agents would enable additional benefits such as citation analysis, e.g. collecting the citations of a known paper and investigating citation sentiment. The publication of reference information as demonstrated in this article is possible with existing semantic web technologies based on established ontologies and open source software components.

Research limitations/implications

Only a limited number of metadata extraction programs have been considered for performance evaluation, and reference extraction was tested for journal articles only, whereas Institutional Repositories usually contain a large amount of other material, such as monographs. Also, citation analysis is in an experimental state, and citation sentiment is currently not published at all. For future work, distributing reference information between repositories is an important problem that needs to be tackled.

Originality/value

Publishing reference information as linked data is new within the academic publishing domain.

Details

Library Hi Tech, vol. 34 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 July 2014

Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas

Abstract

Purpose

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document formats (PDFs).

Design/methodology/approach

The authors use PDFBox to extract text and font size information, a rule-based method to identify titles, and a Hidden Markov Model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS, and Google Scholar) to retrieve the rest of the metadata.
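
To illustrate the HMM component, here is a toy two-state model (TITLE vs AUTHOR) decoded with the Viterbi algorithm. All states, observations, and probabilities are invented for the sketch; the paper's actual model is trained on the Cora header dataset and uses richer word features.

```python
import math

# Toy HMM: label each token as coming from a TITLE or an AUTHOR line.
# Observations are coarse token shapes ("word" vs "initial", e.g. "J.").
states = ["TITLE", "AUTHOR"]
start = {"TITLE": 0.8, "AUTHOR": 0.2}
trans = {"TITLE": {"TITLE": 0.7, "AUTHOR": 0.3},
         "AUTHOR": {"TITLE": 0.1, "AUTHOR": 0.9}}
emit = {"TITLE": {"word": 0.8, "initial": 0.2},
        "AUTHOR": {"word": 0.4, "initial": 0.6}}

def viterbi(obs):
    # Standard log-space Viterbi decoding with backpointers.
    v = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            best = max(states, key=lambda p: v[-1][p] + math.log(trans[p][s]))
            col[s] = v[-1][best] + math.log(trans[best][s]) + math.log(emit[s][o])
            ptr[s] = best
        v.append(col)
        back.append(ptr)
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

labels = viterbi(["word", "word", "initial", "initial"])
# -> ['TITLE', 'TITLE', 'AUTHOR', 'AUTHOR']
```

The one-state-per-field structure mirrors the paper's preference for a simple model that copes with unknown real-world layouts.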

Findings

Four experiments are conducted to examine the feasibility of the proposed system. The first compares two different HMM models: a multi-state model and a one-state model (the proposed model). The result shows that the one-state model achieves performance comparable to the multi-state model but is better suited to dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online queries) can match the performance of other researchers' models on the Cora paper-header dataset. The third experiment examines the performance of the system on a small dataset of 43 real PDF research papers. The result shows that the proposed system (with online queries) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, a fourth experiment with a larger dataset of 103 papers compares the system with Zotero 4.0; the system significantly outperforms Zotero 4.0. The feasibility of the proposed model is thus justified.

Research limitations/implications

In academic terms, the system is unique in two respects: first, it uses only the Cora header set for HMM training, without other tagged datasets or gazetteer resources, which makes the system light and scalable. Second, the system is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers' productivity.

Practical implications

In practical terms, the system can outperform the existing tool Zotero v4.0. This gives practitioners a good opportunity to develop similar products for real applications, though it may require some knowledge of HMM implementation.

Originality/value

The HMM implementation is not novel. What is innovative is that it combines two HMM models: the main model is adapted from Freitag and McCallum (1999), and the authors add word features of the Nymble HMM (Bikel et al., 1997) to it. The system is workable even without manually tagging the datasets before training the model (the authors use only the Cora dataset for training and test on real-world PDF papers), which differs significantly from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 28 October 2014

Vasundhara Mahajan, Pramod Agarwal and Hari Om Gupta

Abstract

Purpose

The active power filter with a two-level inverter needs a high-rating coupling transformer for high-power applications. This complicates the control, and the system becomes bulky and expensive. The purpose of this paper is to motivate the use of a multilevel inverter as a harmonic filter, which eliminates the coupling transformer and allows direct control of the power circuit. Advances in artificial intelligence (AI) for computation are explored for the controller design.

Design/methodology/approach

The proposed scheme has a five-level cascaded H-bridge multilevel inverter (CHBMLI) as a harmonic filter. The control scheme includes one neural network controller and two fuzzy logic-based controllers for harmonic extraction, dc capacitor voltage balancing, and compensating current adjustment, respectively. The topology is modeled in MATLAB/SIMULINK and implemented using dSPACE DS1103 interface for experimentation.

Findings

The exhaustive simulation and experimental results demonstrate the robustness and effectiveness of the proposed topology and controllers for harmonic minimization under RL/RC loads and load changes. A comparison between the traditional PI controller and the proposed AI-based controller is presented. It indicates that the AI-based controller is fast, dynamic, and adaptive in accommodating load changes. The total harmonic distortion obtained with the AI-based controllers is well within the IEEE 519 standard limits.
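
The IEEE 519 compliance claim rests on the total harmonic distortion (THD) figure: the RMS of the harmonic components relative to the fundamental. A minimal sketch with invented current magnitudes:

```python
import math

# THD in percent: sqrt(sum of squared harmonic magnitudes) / fundamental.
def thd_percent(fundamental, harmonics):
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# Example (invented values): 10 A fundamental; 5th, 7th, 11th harmonics.
value = thd_percent(10.0, [0.3, 0.2, 0.1])
print(round(value, 2))  # 3.74
```

A harmonic filter's job is to drive this figure below the limit the standard assigns to the installation.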

Originality/value

The simulation of a high-power, medium-voltage system is presented, and a downscaled prototype is designed and developed for implementation. The laboratory module of the CHBMLI-based harmonic filter and the AI-based controllers modeled in SIMULINK is executed using the dSPACE DS1103 interface through Real-Time Workshop.

Details

COMPEL: The International Journal for Computation and Mathematics in Electrical and Electronic Engineering, vol. 33 no. 6
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 5 July 2021

Rajakumar B.R., Gokul Yenduri, Sumit Vyas and Binu D.

Abstract

Purpose

This paper aims to propose a new assessment system module for handling the comprehensive answers written through the answer interface.

Design/methodology/approach

The working principle comprises three major phases. (1) Preliminary semantic processing: in this pre-processing step, keywords are extracted from the answer given by the course instructor; this answer serves as the key against which the answers written by the e-learners are evaluated. (2) Keyword and semantic processing of e-learners' answers for hierarchical-clustering-based ontology construction: for each answer given by each student, the keywords and semantic information are extracted and clustered (hierarchical clustering) using a new improved rider optimization algorithm known as Rider with Randomized Overtaker Update (RR-OU). (3) Ontology matching evaluation: once the ontology structures are complete, a new alignment procedure is used to measure the similarity between two documents; the objectives defined in this work focus on how exactly the matching process is performed when evaluating a document. Finally, the e-learners are classified based on their grades.
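
The matching step can be loosely illustrated with a much simpler stand-in (this is not the RR-OU algorithm or the paper's alignment procedure): score each learner's answer by the Jaccard overlap between its keyword set and the instructor's key, then assign a grade band. All keywords and thresholds are invented.

```python
# Jaccard overlap between two keyword sets as a crude similarity score.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative grade bands (assumed, not from the paper).
def grade(score):
    return "A" if score >= 0.75 else "B" if score >= 0.5 else "C"

key = ["photosynthesis", "chlorophyll", "light", "glucose"]
answer = ["photosynthesis", "light", "glucose", "oxygen"]
s = jaccard(key, answer)
print(round(s, 2), grade(s))  # 0.6 B
```

The paper's contribution is precisely that it goes beyond such set overlap, clustering keywords into ontologies and aligning those structures.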

Findings

On observing the outcomes, the proposed model shows a lower relative mean squared error when the weights were (0.5, 0, 0.5); this error value was 71.78% and 16.92% better than those attained for (0, 0.5, 0.5) and (0.5, 0.5, 0), respectively. The error values attained for (1, 0, 0) were likewise found to be lower than those for weights (0, 0, 1) and (0, 1, 0): the mean absolute error (MAE) for weights (1, 0, 0) was 33.99% and 51.52% better than the MAE for weights (0, 0, 1) and (0, 1, 0), respectively. In the overall error analysis, the mean absolute percentage error of the implemented RR-OU model was 3.74% and 56.53% better than those of the k-means and collaborative filtering + Onto + sequential pattern mining models, respectively.

Originality/value

This paper adopts the latest optimization algorithm called RR-OU for proposing a new assessment system module for handling the comprehensive answers written through the answer interface. To the best of the authors’ knowledge, this is the first work that uses RR-OU-based optimization for developing a new ontology alignment-based online assessment of e-learners.

Details

Kybernetes, vol. 51 no. 2
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 1 June 2012

Carolin Kaiser and Freimut Bodendorf

Abstract

Purpose

The paper's aim is to mine and analyze opinion formation on the basis of consumer dialogs in online forums.

Design/methodology/approach

The study identifies opinions, communication relationships, and dialog acts of forum users using different text mining methods. Utilizing this data, social networks can be derived and analyzed to detect influential users and opinion tendencies. The approach is applied to sample online forums discussing the iPhone.
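
The network step can be sketched in a few lines (a simplification of the paper's approach, with invented forum data): build a directed reply graph from who-replied-to-whom pairs and rank users by in-degree as a crude proxy for influence.

```python
from collections import defaultdict

# Invented example data: (replier, replied-to) pairs from a forum thread.
replies = [("bob", "alice"), ("carol", "alice"), ("alice", "carol"),
           ("dave", "alice"), ("carol", "bob")]

# Count how often each user is replied to (in-degree).
in_degree = defaultdict(int)
for source, target in replies:
    in_degree[target] += 1

most_influential = max(in_degree, key=in_degree.get)
print(most_influential, in_degree[most_influential])  # alice 3
```

The study combines such structural measures with the mined opinions and dialog acts attached to each user.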

Findings

Combining text mining and social network analysis enables the study of opinion formation and yields encouraging results. Out of the four methods employed for text mining, support vector machines performed best.

Research limitations/implications

The data set applied here is fairly small. More threads on different products will be considered in future work to improve validation.

Practical implications

The approach represents a valuable instrument for online market research. It enables companies to recognize opportunities and risks and to initiate appropriate marketing actions.

Originality/value

This work is one of the first studies that combine communication content, relationships and dialog acts for analyzing opinion formation.

Article
Publication date: 15 June 2020

Yang Zhang, Wei Liu, Yongkang Lu, Xikang Cheng, Weiqi Luo, Hongtu Di and Fuji Wang

Abstract

Purpose

Profile measurement with boundary information plays a vital role in quality inspection for the assembly of aviation parts. The purpose of this paper is to improve the evaluation accuracy of the aerodynamic shapes of airplanes, for which the profiles of large-sized parts need to be measured accurately.

Design/methodology/approach

In this paper, an accurate profile measurement method based on boundary reference points is proposed for an industrial stereo-vision system. Based on the boundary reference points, the authors establish a priori constraint for extracting the boundary of the measured part. Combining the image features of the background and the measured part, an image-edge compensation model is established to extract the boundary of the measured part. The critical points of a laser stripe on the edge of the measured part are extracted according to the boundary constraint. Finally, following the principle of binocular vision, the profile of the measured part is reconstructed.
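
The final reconstruction step rests on binocular triangulation. As a back-of-envelope sketch (assuming rectified cameras; focal length, baseline, and disparity values are invented): depth follows from focal length f, baseline B, and pixel disparity d as Z = f·B/d.

```python
# Depth of a matched point from a rectified stereo pair.
# f_px: focal length in pixels; baseline_mm: camera separation;
# disparity_px: horizontal offset of the point between the two images.
def depth_mm(f_px, baseline_mm, disparity_px):
    return f_px * baseline_mm / disparity_px

print(depth_mm(2400.0, 300.0, 48.0))  # 15000.0 -> a point 15 m away
```

The paper's contribution is upstream of this formula: making sure the matched points on a smooth, low-texture boundary are the right ones.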

Findings

Laboratory experiments validate the measurement accuracy of the proposed method, which is 0.33 mm. In comparing the measured data with the theoretical model, the measuring accuracy of the proposed method was found to be significantly higher than that of other traditional methods.

Practical implications

An aviation part was measured in the part-assembly shop by the proposed method, which verified its feasibility and effectiveness. The research enables the measurement of smooth surface boundaries, which solves existing profile-reconstruction problems for aviation parts.

Originality/value

According to the two-dimensional contour constraint, the critical points of the laser-stripe sequence at the edge of the measured part are extracted, and accurate profile reconstruction with the boundary is realized.

Details

Sensor Review, vol. 40 no. 4
Type: Research Article
ISSN: 0260-2288

Article
Publication date: 27 September 2011

Aleksandar Kovačević, Dragan Ivanović, Branko Milosavljević, Zora Konjović and Dušan Surla

Abstract

Purpose

The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS).

Design/methodology/approach

The system is based on machine learning and performs automatic extraction and classification of metadata into eight pre‐defined categories. The extraction task is realised as a classification process. For the purpose of classification, each row of text is represented by a vector comprising different features: formatting, position, word-related characteristics, etc. Experiments were performed with standard classification models. Both a single classifier covering all eight categories and eight individual classifiers were tested. Classifiers were evaluated using five‐fold cross-validation on a manually annotated corpus comprising 100 scientific papers in PDF format, collected from various conferences, journals and authors' personal web pages.
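
A simplified version of the row-to-vector step (the feature choices below are illustrative assumptions, not the paper's exact feature set): each text row becomes a numeric vector of formatting, position, and lexical cues that a classifier such as an SVM can consume.

```python
import re

# Turn one row of extracted text into a feature vector.
def row_features(text, font_size, y_position, page_height):
    words = text.split()
    return [
        font_size,                                          # formatting cue
        y_position / page_height,                           # relative position
        len(words),                                         # row length
        sum(w[:1].isupper() for w in words) / max(len(words), 1),
        1 if re.search(r"@", text) else 0,                  # email-like row?
        1 if re.search(r"\d{4}", text) else 0,              # contains a year?
    ]

# An author line near the top of the page (invented values).
vec = row_features("John Smith and Jane Doe", 11.0, 120.0, 800.0)
print(vec)  # [11.0, 0.15, 5, 0.8, 0, 0]
```

Vectors like these, labeled with the eight categories, are what the five-fold cross-validation in the paper is run over.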

Findings

Based on the performances obtained in the classification experiments, eight separate support vector machine (SVM) models (each of which recognises its corresponding category) were chosen. All eight models were found to have good performance: the F‐measure was over 85 per cent for almost all of the classifiers and over 90 per cent for most of them.

Research limitations/implications

Automatically extracted metadata cannot be entered directly into CRIS UNS but requires review by the curators.

Practical implications

The proposed system for automatic metadata extraction using support vector machines model was integrated into the software system, CRIS UNS. Metadata extraction has been tested on the publications of researchers from the Department of Mathematics and Informatics of the Faculty of Sciences in Novi Sad. Analysis of extracted metadata from these publications showed that the performance of the system for the previously unseen data is in accordance with that obtained by the cross‐validation from eight separate SVM classifiers. This system will help in the process of synchronising metadata from CRIS UNS with other institutional repositories.

Originality/value

The paper documents a fully automated system for metadata extraction from scientific papers that was developed. The system is based on the SVM classifier and open source tools, and is capable of extracting eight types of metadata from scientific articles of any format that can be converted to PDF. Although developed as part of CRIS UNS, the proposed system can be integrated into other CRIS systems, as well as institutional repositories and library management systems.

Article
Publication date: 12 July 2013

Chandana P. Dinesh, Abdul U. Bari, Ranjith P.G. Dissanayake and Mazayuki Tamura

Abstract

Purpose

The purpose of this paper is to present a method for, and the results of, evaluating damaged-building extraction using an object recognition task on pre‐ and post‐tsunami imagery. The advantages of remote sensing and its applications have made it possible to extract damaged-building imagery and to perform vulnerability assessment of wide urban areas affected by natural disasters.

Design/methodology/approach

The proposed approach involves several advanced morphological operators, among which are adaptive transforms with varying size, shape and grey level of the structuring elements. IKONOS‐2 satellite images of the Kalmunai area on the east coast of Sri Lanka, from before and after the 2004 Indian Ocean tsunami, were used. Morphological operations using structuring elements are applied to the segmented images; the remaining building footprints are then extracted using a random forest classification method. The work is further extended to road-line extraction using the Hough transform.
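
As a toy version of one such morphological operator (not the paper's adaptive transforms): binary dilation of a segmented image with a 3×3 square structuring element, implemented on plain lists for illustration rather than real raster data.

```python
# Binary dilation: a cell is set if any neighbour under the 3x3
# structuring element is set in the input grid.
def dilate(grid):
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and grid[rr][cc]:
                        out[r][c] = 1
    return out

img = [[0, 0, 0],
       [0, 1, 0],
       [0, 0, 0]]
print(dilate(img))  # every cell becomes 1
```

Dilation and its dual, erosion, are the building blocks of the opening/closing sequences used to clean up segmented building footprints.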

Findings

The result was validated against geographic information system (GIS) data and a global positioning system (GPS) ground survey in the field, and it showed high accuracy: the object-based confidence measure for completely destroyed structures reached 86 percent after the tsunami in one segment of the Maruthamune GN Division.

Research limitations/implications

This study has also identified significant limitations due to the resolution and clarity of the satellite images and the vegetation canopy over building footprints.

Originality/value

The authors develop an automated method to detect damaged buildings and compare the results with GIS‐based ground survey.

Details

International Journal of Disaster Resilience in the Built Environment, vol. 4 no. 2
Type: Research Article
ISSN: 1759-5908

Article
Publication date: 20 September 2022

Jinzhu Zhang, Yue Liu, Linqi Jiang and Jialu Shi

Abstract

Purpose

This paper aims to propose a method for better discovering topic evolution path and semantic relationship from the perspective of patent entity extraction and semantic representation. On the one hand, this paper identifies entities that have the same semantics but different expressions for accurate topic evolution path discovery. On the other hand, this paper reveals semantic relationships of topic evolution for better understanding what leads to topic evolution.

Design/methodology/approach

Firstly, a Bi-LSTM-CRF (bidirectional long short-term memory with conditional random field) model is designed for patent entity extraction and a representation learning method is constructed for patent entity representation. Secondly, a method based on knowledge outflow and inflow is proposed for discovering topic evolution path, by identifying and computing semantic common entities among topics. Finally, multiple semantic relationships among patent entities are pre-designed according to a specific domain, and then the semantic relationship among topics is identified through the proportion of different types of semantic relationships belonging to each topic.
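
The knowledge outflow/inflow idea can be sketched with a deliberately simplified measure (the entity sets, field, and exact ratio below are illustrative assumptions, not the paper's formula): the flow from an earlier topic into a later one is the share of the later topic's entities that already appear in the earlier topic.

```python
# Fraction of a later topic's entities inherited from an earlier topic.
def inflow(earlier_entities, later_entities):
    later = set(later_entities)
    return len(set(earlier_entities) & later) / len(later)

# Invented UAV-patent topic entities for two time slices.
t2019 = {"rotor", "gimbal", "lidar", "flight controller"}
t2021 = {"lidar", "flight controller", "obstacle avoidance", "slam"}
print(inflow(t2019, t2021))  # 0.5
```

With entity representations learned as in the paper, "lidar" and "laser radar" would map to one semantic common entity before this overlap is computed, which is exactly what plain string matching misses.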

Findings

In the field of UAVs (unmanned aerial vehicles), this method identifies semantic common entities that have the same semantics but different expressions. In addition, the method discovers topic evolution paths better than a traditional method, as shown by comparison. Finally, it identifies different semantic relationships among topics, giving a detailed description for understanding and interpreting topic evolution. These results prove that the proposed method is effective and useful. At the same time, this is a preliminary study that still needs to be investigated further on other datasets using multiple emerging deep learning methods.

Originality/value

This work provides a new perspective for topic evolution analysis by considering semantic representation of patent entities. The authors design a method for discovering topic evolution paths by considering knowledge flow computed by semantic common entities, which can be easily extended to other patent mining-related tasks. This work is the first attempt to reveal semantic relationships among topics for a precise and detailed description of topic evolution.

Details

Aslib Journal of Information Management, vol. 75 no. 3
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 20 November 2009

Martin Hofman‐Apitius, Erfan Younesi and Vinod Kasam

Abstract

Purpose

The purpose of this paper is to demonstrate how information extracted from scientific text can be directly used in support of life science research projects. In modern digital research and academic libraries, librarians should be able to support data discovery and the organization of digital entities in order to foster research projects effectively; the paper therefore argues that text mining and knowledge discovery tools could be of great assistance to librarians. Such tools enable librarians to cope with the increasing complexity in both the number and the contents of scientific publications, especially in emerging interdisciplinary fields of science. This paper presents an example of how evidence extracted from the scientific literature can be directly integrated into in silico disease models in support of drug discovery projects.

Design/methodology/approach

The application of text‐mining and knowledge discovery tools is explained in the form of a knowledge‐based workflow for drug target candidate identification. Moreover, an in silico experimentation framework is proposed to enhance efficiency and productivity in the early steps of the drug discovery workflow.
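
One elementary building block of such a text-mining workflow is co-occurrence extraction: flagging sentences in which a known target name and a compound name appear together. The dictionaries and sentences below are invented examples; real systems use curated gazetteers and named-entity recognition rather than word lookup.

```python
# Invented dictionaries of target and compound names (lowercased).
targets = {"plasmepsin", "dhfr"}
compounds = {"chloroquine", "pyrimethamine"}

def cooccurrences(sentences):
    # Report (target, compound) pairs mentioned in the same sentence.
    pairs = []
    for s in sentences:
        words = {w.strip(".,").lower() for w in s.split()}
        for t in targets & words:
            for c in compounds & words:
                pairs.append((t, c))
    return pairs

text = ["Pyrimethamine inhibits DHFR in the parasite.",
        "The assay screened unrelated kinases."]
print(cooccurrences(text))  # [('dhfr', 'pyrimethamine')]
```

Evidence pairs like these are what get aggregated into the knowledge-based disease models the workflow feeds.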

Findings

The in silico experimentation workflow has been successfully applied to searching for hit and lead compounds in the World‐wide In Silico Docking On Malaria (WISDOM) project and to finding novel inhibitor candidates.

Practical implications

Direct extraction of biological information from text will ease the task of librarians in managing digital objects and supporting research projects. It is expected that textual data will play an increasingly important role in evidence‐based approaches taken by biomedical and translational researchers.

Originality/value

The proposed approach provides a practical example for the direct integration of text‐ and knowledge‐based data into life science research projects, with the emphasis on their application by academic and research libraries in support of scientific projects.

Details

Library Hi Tech, vol. 27 no. 4
Type: Research Article
ISSN: 0737-8831
