Search results

1 – 10 of over 9000
Article
Publication date: 1 July 2014

Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas

Abstract

Purpose

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document formats (PDFs).

Design/methodology/approach

The authors use PDFBox to extract text and font size information, a rule-based method to identify titles, and a Hidden Markov Model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS and Google Scholar) to retrieve the rest of the metadata.
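
As a concrete, hedged illustration of the first step: the authors work in Java with PDFBox, but the font-size heuristic for title identification can be sketched in Python with the pdfminer.six library instead. The largest-average-font-size rule below is an illustrative stand-in, not the paper's actual rule set.

```python
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar

def guess_title(pdf_path):
    """Illustrative heuristic: treat the first-page text block with
    the largest average font size as the title."""
    best, best_size = None, 0.0
    for page in extract_pages(pdf_path):
        for element in page:
            if not isinstance(element, LTTextContainer):
                continue
            sizes = [ch.size for line in element
                     for ch in line if isinstance(ch, LTChar)]
            if sizes and sum(sizes) / len(sizes) > best_size:
                best_size = sum(sizes) / len(sizes)
                best = element.get_text().strip()
        break  # the title heuristic only needs the first page
    return best
```

The guessed title, however imperfect, can then be sent as a query string to a digital library, mirroring the online-query step described above.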

Findings

Four experiments are conducted to examine the feasibility of the proposed system. The first compares two HMM variants: a multi-state model and a one-state model (the proposed model). The results show that the one-state model performs comparably to the multi-state model while being better suited to real-world unknown states. The second experiment shows that the proposed model (without the aid of online queries) performs as well as other researchers' models on the Cora paper-header dataset. The third experiment examines the system's performance on a small dataset of 43 real PDF research papers; the proposed system (with online queries) extracts bibliographical data well and even outperforms the free citation management tool Zotero 3.0. Finally, a fourth experiment on a larger dataset of 103 papers compares the system with Zotero 4.0, which it significantly outperforms. The feasibility of the proposed model is thus justified.

Research limitations/implications

Academically, the system is unique in two respects. First, it uses only the Cora header set for HMM training, without other tagged datasets or gazetteer resources, which keeps the system light and scalable. Second, it is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers' productivity.

Practical implications

Practically, the system outperforms an existing tool, Zotero 4.0. This gives practitioners a good opportunity to develop similar products for real applications, though doing so may require some knowledge of HMM implementation.

Originality/value

The HMM implementation itself is not novel; what is innovative is the combination of two HMM models. The main model is adapted from Freitag and McCallum (1999), to which the authors add the word features of the Nymble HMM (Bikel et al., 1997). The system works without manually tagging datasets before training the model (the authors use only the Cora dataset for training and then test on real-world PDF papers), which differs significantly from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 29 March 2024

Sihao Li, Jiali Wang and Zhao Xu

Abstract

Purpose

The compliance checking of Building Information Modeling (BIM) models is crucial throughout the lifecycle of construction. The increasing amount and complexity of information carried by BIM models have made compliance checking more challenging, and manual methods are prone to errors. Therefore, this study aims to propose an integrative conceptual framework for automated compliance checking of BIM models, allowing for the identification of errors within BIM models.

Design/methodology/approach

This study first analyzes typical building standards in the fields of architecture and fire protection, and an ontology of their elements is developed. On this basis, a building standard corpus is built, and deep learning models are trained to automatically label the building standard texts. Neo4j is used for knowledge graph construction and storage, and a Dynamo-based data extraction method is designed to obtain the checking data files. Finally, a matching algorithm is devised to express the logical rules of the knowledge graph triples, resulting in automated compliance checking for BIM models.
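
To make the final matching step concrete, the sketch below compares data extracted from a BIM model against rule triples stored in Neo4j, using the official neo4j Python driver. The node labels, the HAS_REQUIREMENT relationship and the property names are hypothetical placeholders, not the schema from the study.

```python
from neo4j import GraphDatabase

# Hypothetical local instance and credentials.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

RULES = """
MATCH (e:Element {name: $element})-[:HAS_REQUIREMENT]->(c:Constraint)
RETURN c.property AS prop, c.operator AS op, c.value AS value
"""

OPS = {">=": lambda a, b: a >= b,
       "<=": lambda a, b: a <= b,
       "==": lambda a, b: a == b}

def check_element(name, extracted):
    """Return every stored constraint the extracted BIM data violates."""
    with driver.session() as session:
        return [(r["prop"], r["op"], r["value"], extracted.get(r["prop"]))
                for r in session.run(RULES, element=name)
                if extracted.get(r["prop"]) is None
                or not OPS[r["op"]](extracted[r["prop"]], r["value"])]

# e.g. check_element("FireDoor", {"clear_width_mm": 850})
```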

Findings

Case validation showed that the theoretical framework achieves automatic construction of domain knowledge graphs and automatic compliance checking of BIM models. Compared with traditional methods, it offers a higher degree of automation and portability.

Originality/value

This study introduces knowledge graphs and natural language processing technology into the field of BIM model checking and automates the full process of constructing domain knowledge graphs and checking BIM model data. Its functionality and usability are validated through two case studies on a self-developed BIM checking platform.

Details

Engineering, Construction and Architectural Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0969-9988

Article
Publication date: 6 August 2019

Abir Boujelben and Ikram Amous

Abstract

Purpose

One key issue in maintaining Web information systems is guaranteeing the consistency of their knowledge bases, in particular the rules governing them. Few current methods ensure that rule base management scales to the amount of knowledge in these systems' environments.

Design/methodology/approach

In this paper, the authors propose a method to detect correct dependencies between rules, a preliminary step toward a proposal for eliminating rule base anomalies. The authors previously developed a method that improved the extraction of rule dependency relationships using a new technique; here, they extend that proposal with further techniques to increase the number of extracted rule dependency relationships, and they add modules to filter and represent them.
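
As a toy illustration of what a rule dependency is (this is not the authors' extraction technique): rule B depends on rule A when a predicate concluded by A appears among B's premises.

```python
rules = {
    "R1": {"if": ["person(X)"], "then": ["mortal(X)"]},
    "R2": {"if": ["mortal(X)", "famous(X)"], "then": ["remembered(X)"]},
}

def predicate(atom):
    # "mortal(X)" -> "mortal"
    return atom.split("(", 1)[0]

def dependencies(rule_base):
    """Return (dependent, provider) pairs of rule names."""
    deps = []
    for a, ra in rule_base.items():
        produced = {predicate(t) for t in ra["then"]}
        for b, rb in rule_base.items():
            if a != b and produced & {predicate(p) for p in rb["if"]}:
                deps.append((b, a))
    return deps

print(dependencies(rules))  # [('R2', 'R1')]
```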

Findings

The authors evaluated their method against other semantic methods. The results show that this work extracts a larger number of correct rule dependency relationships. The authors also observed that the rule groups deduced from the method's results are very close to those provided by the rule base developers.

Originality/value

This work can be applied to knowledge bases that include a fact base and a rule base. In addition, it is independent of the field of application.

Details

International Journal of Web Information Systems, vol. 15 no. 5
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 June 2015

Quang-Minh Nguyen and Tuan-Dung Cao

Abstract

Purpose

The purpose of this paper is to propose an automatic method for generating semantic annotations of football transfers in the news. Current automatic news integration systems on the Web constantly face the challenge of diverse, heterogeneous sources, and syntax-based approaches to information representation and storage have certain limitations for searching, sorting, organizing and linking news appropriately. Models of semantic representation are promising keys to solving these problems.

Design/methodology/approach

The authors' approach leverages Semantic Web technologies to improve the detection of hidden annotations in the news. The paper proposes an automatic method for generating semantic annotations based on named entity recognition and rule-based information extraction. The authors built a domain ontology and knowledge base, integrated with the Knowledge and Information Management (KIM) platform, to implement the former task (named entity recognition). The semantic extraction rules are constructed from defined language models and the developed ontology.
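
A minimal illustration of the rule-based step follows; the lexical pattern, namespace and entity handling are invented for the example (the paper's rules operate over KIM annotations and a football ontology). A transfer verb triggers a rule that emits an RDF triple via rdflib.

```python
import re
from rdflib import Graph, Namespace, URIRef

EX = Namespace("http://example.org/football#")  # hypothetical namespace
PATTERN = re.compile(r"(?P<club>[A-Z][\w ]+?) sign(?:s|ed) (?P<player>[A-Z][\w ]+)")

def annotate(sentence, graph):
    """Fire a single transfer rule and add the resulting triple."""
    m = PATTERN.search(sentence)
    if m:
        club = URIRef(EX[m.group("club").replace(" ", "_")])
        player = URIRef(EX[m.group("player").replace(" ", "_")])
        graph.add((club, EX.signs, player))

g = Graph()
annotate("Manchester United signed Angel Di Maria", g)
print(g.serialize(format="turtle"))
```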

Findings

The proposed method is implemented as part of BKAnnotation, a prototype that generates semantic annotations for sport news within BKSport, a Semantic Web-based news integration system. The generated semantic annotations are used to improve news searching, sorting and association. Experiments on news data from the SkySport (2014) channel showed positive results: precision is over 80 per cent both with and without the pronoun recognition method integrated, and integrating it increases recall by around 10 per cent.

Originality/value

This is one of the first proposals for the automatic creation of semantic data about news, football news in particular and sport news in general. The combination of ontology, knowledge base and language-model patterns allows detection not only of entities with their corresponding types but also of semantic triples. In addition, the authors propose a pronoun recognition method using extraction rules to improve the relation recognition process.

Details

International Journal of Pervasive Computing and Communications, vol. 11 no. 2
Type: Research Article
ISSN: 1742-7371

Open Access
Article
Publication date: 14 August 2020

Paramita Ray and Amlan Chakrabarti

Abstract

Social networks have changed communication patterns significantly, and the information available from different social networking sites can be well utilized to analyze users' opinions. Hence, organizations would benefit from a platform that analyzes public sentiment in social media about their products and services, providing value addition to their business processes. Over the last few years, deep learning has become very popular in areas such as image classification and speech recognition; however, research on the use of deep learning methods in sentiment analysis is limited. It has been observed that in some cases the existing machine learning methods for sentiment analysis fail to extract some implicit aspects and might not be very useful. Therefore, we propose a deep learning approach for extracting aspects from text and analyzing users' sentiment toward each aspect. A seven-layer deep convolutional neural network (CNN) tags each aspect in the opinionated sentences. We combine this deep learning approach with a set of rule-based approaches to improve both the aspect extraction method and the sentiment scoring method. We also improve the existing rule-based approach to aspect extraction by categorizing aspects into a predefined set of aspect categories using a clustering method, and we compare the proposed method with some state-of-the-art methods. The overall accuracy of the proposed method is 0.87, while those of the state-of-the-art methods, a modified rule-based method and a CNN, are 0.75 and 0.80 respectively, an improvement of 7–12%.
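
By way of illustration, a CNN sequence tagger in this spirit can be written compactly in Keras. The sketch below is not the authors' exact seven-layer network; the vocabulary size, sequence length, filter sizes and tag set are placeholders.

```python
from tensorflow.keras import layers, models

VOCAB, EMB, MAXLEN, N_TAGS = 20000, 100, 60, 2  # placeholder sizes

# Per-token softmax marks each token as aspect or non-aspect.
model = models.Sequential([
    layers.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB, EMB),
    layers.Conv1D(128, 5, padding="same", activation="relu"),
    layers.Conv1D(128, 3, padding="same", activation="relu"),
    layers.Dropout(0.5),
    layers.TimeDistributed(layers.Dense(N_TAGS, activation="softmax")),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Rule-based corrections of the kind described above would then be layered on top of the network's token-level predictions.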

Details

Applied Computing and Informatics, vol. 18 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 14 June 2019

Nora Madi, Rawan Al-Matham and Hend Al-Khalifa

Abstract

Purpose

The purpose of this paper is to provide an overall review of the grammar checking and relation extraction (RE) literature, their techniques and the open challenges associated with them and, finally, to suggest future directions.

Design/methodology/approach

The review of grammar checking and RE was carried out using the following protocol: we prepared research questions, planned the search strategy, defined paper selection criteria to distinguish relevant works, extracted data from these works and, finally, analyzed and synthesized the data.

Findings

The output of error detection models could be used to create a profile of a given writer. Such profiles can serve author identification, native language identification or even estimation of education level, to name a few applications. The automatic extraction of relations could be used to build or complete electronic lexical thesauri and knowledge bases.

Originality/value

Grammar checking is the process of detecting and sometimes correcting erroneous words in the text, while RE is the process of detecting and categorizing predefined relationships between entities or words that were identified in the text. The authors found that the most obvious challenge is the lack of data sets, especially for low-resource languages. Also, the lack of unified evaluation methods hinders the ability to compare results.

Details

Data Technologies and Applications, vol. 53 no. 3
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 23 August 2013

Auhood Alfaries, David Bell and Mark Lycett

Abstract

Purpose

The purpose of the research is to speed up the creation of Semantic Web services by transforming current Web services into Semantic Web services. This can be achieved by applying ontology learning techniques to automatically extract domain ontologies.

Design/methodology/approach

The work presents a Service Ontology Learning Framework (SOLF), the core aspect of which extracts Structured Interpretation Patterns (SIPs). These patterns are used to automate the acquisition, from production domain-specific Web services, of ontological concepts and the relations between those concepts.
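
As a simplified illustration (SOLF's pattern machinery is richer than this), candidate ontology concepts can be harvested from a WSDL document by splitting operation names on camel case. The namespace below is the standard WSDL 1.1 namespace; everything else is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

def candidate_concepts(wsdl_path):
    """Collect lower-cased camel-case tokens from operation names."""
    tree = ET.parse(wsdl_path)
    concepts = set()
    for op in tree.findall(".//wsdl:operation", WSDL_NS):
        for token in re.findall(r"[A-Z][a-z]+|[a-z]+", op.get("name") or ""):
            concepts.add(token.lower())
    return concepts

# e.g. an operation named "GetAccountBalance" yields
# {"get", "account", "balance"} as candidate concept terms.
```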

Findings

A Semantic Web of accessible and re-usable software services can support the increasingly dynamic and time-limited development process. This is premised on the efficient and effective creation of a supporting domain ontology.

Research limitations/implications

Though WSDL documents provide important application-level service descriptions, they alone are not sufficient for ontology learning (OL): they typically provide technical descriptions only, and in many cases Web services rely on XSD files for data type definitions. The need to include (and combine) other Web service resources in the OL process is therefore an important one.

Practical implications

Web service domain ontologies are the general means by which semantics are added to Web services; they are typically used as a common domain model and referenced by annotated or externally described Web artefacts (e.g. Web services). The development and deployment of Semantic Web services by enterprises and the wider business community has the potential to radically improve planned and ad hoc service re-use. Adoption has been slower in reality, in good part because developing an appropriate ontology is an expensive, error-prone and labor-intensive task. The proposed SOLF framework aims to overcome this problem by contributing a framework and a tool that build Web service domain ontologies automatically.

Originality/value

The output of the SOLF process is an automatically generated OWL domain ontology, a basis from which future Semantic Web services can be delivered using existing Web services. The ontology created moves beyond a basic taxonomy, extracting and relating concepts at a number of levels. More importantly, the approach integrates knowledge (represented by the individual WSDL documents) from a number of domain experts across a group of banks.

Article
Publication date: 20 November 2017

Jia-Yen Huang

Abstract

Purpose

The prediction of pre-election polls is an issue of concern for both politicians and voters. The Taiwan nine-in-one election held in 2014 ended with jaw-dropping results; apparently, traditional polls did not work well. As a remedy, the purpose of this paper is to utilize comments posted on social media to analyze civilians' views on the two candidates for mayor of Taichung City, Chih-chiang Hu and Chia-Lung Lin.

Design/methodology/approach

After conducting word segmentation and part-of-speech tagging for the collected reviews, this study constructs the opinion phrase extraction rules for identifying the opinion words associated with the attribute words. Next, this study classifies the attribute words into six municipal governance-related topics and calculates the opinion scores for each candidate. Finally, this study uses correspondence analysis to transform opinion information on the candidates into a graphical display to facilitate the interpretation of voters’ views.
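
A schematic version of an attribute-opinion pairing rule is sketched below. The paper's rules target segmented, POS-tagged Chinese text; this toy runs on already-tagged English tokens with an invented two-entry sentiment lexicon.

```python
# Rule: an adjective within two tokens after a noun is taken as that
# attribute's opinion word and scored from a sentiment lexicon.
SENT = [("traffic", "NN"), ("is", "VBZ"), ("terrible", "JJ"),
        ("but", "CC"), ("policy", "NN"), ("seems", "VBZ"), ("sound", "JJ")]

LEXICON = {"terrible": -2, "sound": 1}  # invented scores

def opinion_pairs(tagged, window=2):
    pairs = []
    for i, (word, tag) in enumerate(tagged):
        if tag.startswith("NN"):
            for w, t in tagged[i + 1:i + 1 + window]:
                if t == "JJ":
                    pairs.append((word, w, LEXICON.get(w, 0)))
    return pairs

print(opinion_pairs(SENT))
# [('traffic', 'terrible', -2), ('policy', 'sound', 1)]
```

Per-topic opinion scores for each candidate can then be aggregated from such pairs before the correspondence analysis step.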

Findings

The results show that the topics of candidates' backgrounds and transport infrastructure were the two most critical factors for the election prediction. Based on the prediction, Lin outscores Hu by 17.74 percent, which is close to the real election results.

Research limitations/implications

This study proposes new rules for the extraction of Chinese opinion words associated with attribute words.

Practical implications

This study applies Chinese semantic analysis to assist in predicting election results and investigating the topics of concern to voters.

Originality/value

The proposed opinion phrase extraction rules for Chinese social media, as well as the election forecast process, can provide valuable references for political parties and candidates to plan better nomination and election strategies.

Details

Aslib Journal of Information Management, vol. 69 no. 6
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 7 August 2009

Arash Joorabchi and Abdulhussain E. Mahdi

Abstract

Purpose

With the significant growth in electronic education materials such as syllabus documents and lecture notes, available on the internet and intranets, there is a need for robust central repositories of such materials to allow both educators and learners to conveniently share, search and access them. The purpose of this paper is to report on the work to develop a national repository for course syllabi in Ireland.

Design/methodology/approach

The paper describes a prototype syllabus repository system for higher education in Ireland, developed using a number of information extraction and document classification techniques, including a new, fully unsupervised document classification method that uses a web search engine to automatically collect a training set for the classification algorithm.
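
The web-search-based training idea can be sketched as follows. Here web_search is a hypothetical stand-in for whatever search-engine API is used, and the classifier choice (TF-IDF plus naive Bayes via scikit-learn) is an assumption for illustration, not necessarily the paper's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def web_search(query, n=20):
    """Hypothetical stand-in: return the text of the top-n result
    pages for the query."""
    raise NotImplementedError

def build_classifier(class_names):
    # Build a pseudo-labelled training set by querying a search engine
    # with each class name, then train a conventional text classifier.
    docs, labels = [], []
    for name in class_names:
        for page in web_search(name):
            docs.append(page)
            labels.append(name)
    clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                        MultinomialNB())
    clf.fit(docs, labels)
    return clf
```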

Findings

Preliminary experimental results for evaluating the performance of the system and its various units, particularly the information extractor and the classifier, are presented and discussed.

Originality/value

In this paper, three major obstacles associated with creating a large-scale syllabus repository are identified, and a comprehensive review of published research related to addressing these problems is provided. Two different types of syllabus documents are identified, and a rule-based information extraction system capable of extracting structured information from unstructured syllabus documents is described. Finally, the importance of classifying resources in a syllabus digital library is highlighted, a number of standard education classification schemes are introduced, and the unsupervised automated document classification system, which classifies syllabus documents based on an extended version of the International Standard Classification of Education, is described.

Details

The Electronic Library, vol. 27 no. 4
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 August 2016

Peiman Alipour Sarvari, Alp Ustundag and Hidayet Takci

Abstract

Purpose

The purpose of this paper is to determine the best approach to customer segmentation and to extrapolate the associated rules based on recency, frequency and monetary (RFM) considerations as well as demographic factors. In this study, the impacts of RFM and demographic attributes are examined in order to enrich the factors that inform customer segmentation. Different scenarios were designed, performed and evaluated meticulously under uniform test conditions. The data were extracted from the database of a global pizza restaurant chain in Turkey. The paper summarizes the findings and provides evidence of their empirical implications for improving the performance of customer segmentation as well as the quality of the extracted rules via effective model factors and variations. Accordingly, marketing and service processes can work more effectively and efficiently for customers and society. A further implication is a clearer concept of the interaction between producers and consumers.

Design/methodology/approach

Customer relationship management, which aims to manage, record and evaluate customer interactions, is generally regarded as a vital tool for companies that wish to succeed in the rapidly changing global market. Predicting customer behavior is strategically important and difficult because of the high variance and wide range of customer orders and preferences, so an effective tool for extracting rules from customer purchasing behavior must consider both tangible and intangible criteria. To overcome the challenges imposed by the multifaceted nature of this problem, the authors utilized artificial intelligence methods, including k-means clustering, Apriori association rule mining (ARM) and neural networks. The main idea was that customer clusters are better formed when segmentation is based on RFM analysis accompanied by demographic data. Weighted RFM (WRFM) and unweighted RFM values/scores were applied with and without demographic factors and used to compose different types and numbers of clusters. The Apriori algorithm was used to extract association rules, and the performance of each scenario was analyzed on the basis of these extracted rules, using the number of rules, elapsed time and prediction accuracy as evaluation criteria. The results were compared with the outputs of another available technique.
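
A compressed sketch of such a pipeline, using scikit-learn and mlxtend, is given below. The column names, RFM weights and thresholds are invented for the example; the study's actual data and WRFM weights differ.

```python
import pandas as pd
from sklearn.cluster import KMeans
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical input: customer_id, days_since, n_orders, spend, age_band.
orders = pd.read_csv("orders.csv")

# Percentile-rank R, F and M; recency is inverted so recent = better.
rfm = orders[["days_since", "n_orders", "spend"]].rank(pct=True)
rfm["days_since"] = 1 - rfm["days_since"]
weighted = rfm * [0.2, 0.3, 0.5]          # illustrative WRFM weights

# Cluster on WRFM scores plus one-hot demographic attributes.
features = pd.concat([weighted, pd.get_dummies(orders["age_band"])], axis=1)
orders["segment"] = KMeans(n_clusters=4, n_init=10).fit_predict(features)

# Mine association rules between segments and demographic attributes.
basket = pd.get_dummies(orders[["segment", "age_band"]].astype(str))
itemsets = apriori(basket, min_support=0.05, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "confidence"]].head())
```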

Findings

The results showed that an appropriate segmentation approach is vital for obtaining strong association rules, and the weights of the RFM attributes were found to affect rule association performance positively. Moreover, to capture more accurate customer segments, a combination of RFM and demographic attributes is recommended for clustering. The analyses indicate the undeniable importance of demographic data merged with WRFM, and the study identifies the best possible sequence of factors for clustering and ARM analysis based on RFM and demographic data.

Originality/value

The work compared k-means and Kohonen clustering methods in its segmentation phase to establish the superiority of the adopted segmentation technique. In addition, this study indicated that customer segments combining WRFM scores and demographic data in the same clusters yield stronger and more accurate association rules for understanding customer behavior. These achievements were compared with the results of classical approaches to support the credibility of the proposed methodology. Previous classical methods for customer segmentation have overlooked combining demographic data with WRFM during clustering before proceeding to rule extraction.
