Search results

1 – 10 of over 5000
Article
Publication date: 31 January 2022

Yejun Wu, Xiaxian Wang, Peilin Yu and YongKai Huang

Abstract

Purpose

The purpose of this research is to achieve automatic and accurate book purchase forecasting for university libraries and to improve the efficiency of manual book purchasing.

Design/methodology/approach

The authors present a Book Purchase Forecast model with A Lite BERT (ALBERT-BPF) to achieve their goals. First, the authors process all the book data to unify the format of book features, such as ISBN, title, authors and brief introduction. Second, they exploit the book order data to label every book supplied by booksellers as "purchased" or "non-purchased"; the labelled data are used for model training. Last, the authors treat book purchase forecasting as a text classification problem and present ALBERT-BPF, which applies ALBERT to extract textual features of books and a BPF classification layer to forecast which books to purchase.
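Below is a minimal sketch of this kind of classifier, assuming the Hugging Face transformers library; the field names, checkpoint and inference step are illustrative, not the authors' actual implementation.

```python
# Sketch: ALBERT-based binary "purchase" classifier over unified book features.
import torch
from transformers import AlbertTokenizer, AlbertForSequenceClassification

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = AlbertForSequenceClassification.from_pretrained(
    "albert-base-v2", num_labels=2  # "purchased" vs "non-purchased"
)

def book_to_text(book: dict) -> str:
    # Unify the book's features (ISBN, title, authors, introduction ...)
    # into one text sequence, as in the data-processing step described above.
    return " [SEP] ".join(
        str(book.get(field, "")) for field in ("isbn", "title", "authors", "intro")
    )

book = {"isbn": "9780262046305", "title": "Introduction to Algorithms",
        "authors": "Cormen et al.", "intro": "A comprehensive textbook..."}
inputs = tokenizer(book_to_text(book), truncation=True,
                   max_length=256, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("purchase probability:", torch.softmax(logits, dim=-1)[0, 1].item())
```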

Findings

The application of deep learning to the book purchase task is effective. The data the authors exploited are historical book purchase data from their university library. The authors' experiments on these data show that ALBERT-BPF can identify the books that need to be purchased with an accuracy of over 82%, reaching 88.06% at best. This indicates that a deep learning model can usefully assist the traditional manual book purchase process.

Originality/value

This research applies ALBERT, which is based on the Transformer, the latest Natural Language Processing (NLP) architecture, to the library book purchase task.

Details

Aslib Journal of Information Management, vol. 74 no. 4
Type: Research Article
ISSN: 2050-3806

Article
Publication date: 10 January 2020

Qingqing Zhou and Chengzhi Zhang

As with academic papers, the customary methods for assessing the impact of books are based on citations, which are straightforward but limited by the coverage of databases…

Abstract

Purpose

As with academic papers, the customary methods for assessing the impact of books are based on citations, which are straightforward but limited by the coverage of databases. Alternative metrics, such as blog citations and library holdings, can be used to avoid such limitations. However, content-level information is generally ignored, thus overlooking users' intentions. Meanwhile, abundant academic reviews express scholars' opinions on books, which can be used to assess books' impact via fine-grained review mining. Hence, this study aims to assess books' use impact by automatically mining the content of academic reviews, thereby confirming the usefulness of academic reviews to libraries and readers.

Design/methodology/approach

First, 61,933 academic reviews in Choice: Current Reviews for Academic Libraries were collected, along with three metadata metrics. Then, review contents were mined to obtain content metrics. Finally, to verify the reliability of academic reviews, the Choice review metrics were compared and analysed against other assessment metrics for use impact.
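As a hedged illustration of the final comparison step, the sketch below correlates a review-derived metric with an external impact metric using rank correlation; the metric names and values are placeholders, not the study's data.

```python
# Sketch: compare a mined review metric against an alternative impact metric.
from scipy.stats import spearmanr

# One row per book: a Choice-review-derived score and library holdings count.
review_scores = [4.2, 3.1, 4.8, 2.5, 3.9]    # e.g. mined review sentiment
library_holdings = [880, 310, 1250, 150, 700]

rho, p_value = spearmanr(review_scores, library_holdings)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```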

Findings

The analysis results reveal that fine-grained mining of academic reviews can help users quickly understand the multi-dimensional features of books and judge or predict the impact of books at scale, providing references for different types of users (e.g. libraries and public readers) in book selection.

Originality/value

Book impact assessment via content mining can provide more detailed information for a wide range of users and remedy the shortcomings of traditional methods. It offers a new perspective and method for research on use impact assessment. Moreover, the proposed method might also be used to measure publications other than books.

Details

The Electronic Library, vol. 38 no. 1
Type: Research Article
ISSN: 0264-0473

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query streams, to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach for better truth discovery over multi-valued predicates is proposed.
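The sketch below illustrates the general idea of truth discovery over a source-claim graph with multi-valued predicates, iterating between source reliability and value confidence; it is a generic illustration under simplifying assumptions, not the paper's specific graph-based algorithm.

```python
# Sketch: iterative truth discovery allowing several true values per predicate.
claims = {  # (entity, predicate) -> {value: set of sources asserting it}
    ("Alan Turing", "field"): {"logic": {"s1", "s2"}, "cs": {"s2", "s3"},
                               "biology": {"s4"}},
}
sources = {"s1", "s2", "s3", "s4"}
trust = {s: 0.5 for s in sources}           # initial source reliability

for _ in range(20):                          # iterate toward a fixed point
    # Value confidence = total trust of the sources asserting it.
    conf = {(key, v): sum(trust[s] for s in sup)
            for key, values in claims.items() for v, sup in values.items()}
    # Normalise confidences within each predicate.
    for key, values in claims.items():
        total = sum(conf[(key, v)] for v in values) or 1.0
        for v in values:
            conf[(key, v)] /= total
    # Source trust = average confidence of the values it supports.
    for s in sources:
        supported = [conf[(key, v)] for key, values in claims.items()
                     for v, sup in values.items() if s in sup]
        trust[s] = sum(supported) / len(supported) if supported else 0.0

# Multi-valued output: accept every value above a confidence threshold,
# instead of assuming a single true value per predicate.
accepted = [(entity_pred, value, round(c, 2))
            for (entity_pred, value), c in conf.items() if c >= 0.3]
print(accepted)
```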

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed massive knowledge-driven applications with Resource Description Framework (RDF) triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been devoted to both problems. However, the existing KBs are far from comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume that each predicate has only one true value. This paper focuses on the problem of generating actionable knowledge from Big Data. A system consisting of two phases, namely, knowledge extraction and truth discovery, is proposed to construct a broader KB, called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Article
Publication date: 20 April 2012

Mohamed Morsey, Jens Lehmann, Sören Auer, Claus Stadler and Sebastian Hellmann

DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However…

Abstract

Purpose

DBpedia extracts structured information from Wikipedia, interlinks it with other knowledge bases and freely publishes the results on the web using Linked Data and SPARQL. However, the DBpedia release process is heavyweight, and releases are sometimes based on data that is several months old. DBpedia-Live solves this problem by providing a live synchronization method based on the update stream of Wikipedia. This paper seeks to address these issues.

Design/methodology/approach

Wikipedia provides DBpedia with a continuous stream of updates, i.e. a stream of recently updated articles. DBpedia-Live processes that stream on the fly to obtain RDF data and stores the extracted data back into DBpedia. DBpedia-Live publishes the newly added and deleted triples in files in order to enable synchronization between the DBpedia endpoint and other DBpedia mirrors.

Findings

During the realization of DBpedia-Live the authors learned that it is crucial to process Wikipedia updates in a priority queue: recently updated Wikipedia articles should have the highest priority, over mapping changes and unmodified pages. An overall finding is that the emerging Web of Data presents plenty of opportunities for librarians.
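A minimal sketch of such a priority queue, using Python's heapq; the three priority levels mirror the ordering described above, while the queue mechanics are illustrative.

```python
# Sketch: prioritised re-extraction queue for Wikipedia updates.
import heapq
import itertools

LIVE_EDIT, MAPPING_CHANGE, UNMODIFIED = 0, 1, 2  # lower value = higher priority
counter = itertools.count()                       # tie-breaker keeps FIFO order
queue = []

def enqueue(priority: int, page: str) -> None:
    heapq.heappush(queue, (priority, next(counter), page))

enqueue(UNMODIFIED, "Old_Article")
enqueue(LIVE_EDIT, "Recently_Edited_Article")
enqueue(MAPPING_CHANGE, "Article_With_New_Mapping")

while queue:
    _, _, page = heapq.heappop(queue)
    print("re-extract:", page)   # live edits are processed first
```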

Practical implications

DBpedia has had, and continues to have, a great effect on the Web of Data, and it has become a crystallization point for it. Many companies and researchers use DBpedia and its public services to improve their applications and research approaches. The DBpedia-Live framework improves DBpedia further by synchronizing it with Wikipedia in a timely fashion, which is relevant for many use cases requiring up-to-date information.

Originality/value

The new DBpedia-Live framework adds new features to the old DBpedia-Live framework, e.g. abstract extraction, ontology changes and changeset publication.

Details

Program, vol. 46 no. 2
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 21 June 2011

Yi‐ling Lin, Peter Brusilovsky and Daqing He

The goal of the research is to explore whether the use of higher‐level semantic features can help us to build better self‐organising map (SOM) representation as measured from a…

Abstract

Purpose

The goal of the research is to explore whether the use of higher‐level semantic features can help us to build better self‐organising map (SOM) representation as measured from a human‐centred perspective. The authors also explore an automatic evaluation method that utilises human expert knowledge encapsulated in the structure of traditional textbooks to determine map representation quality.

Design/methodology/approach

Two types of document representations involving semantic features have been explored – i.e. using only one individual semantic feature, and mixing a semantic feature with keywords. Experiments were conducted to investigate the impact of semantic representation quality on the map. The experiments were performed on data collections from a single book corpus and a multiple book corpus.
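The following sketch shows one plausible way to mix keyword (TF-IDF) vectors with a semantic feature vector at a chosen ratio before training a SOM; it assumes the minisom and scikit-learn packages, and the corpus, ratio and semantic features are placeholders rather than the authors' data.

```python
# Sketch: train a SOM on a keyword/semantic mixture representation.
import numpy as np
from minisom import MiniSom
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["sorting algorithms and complexity", "graph traversal methods",
        "hash tables and dictionaries", "greedy and dynamic programming"]
keyword_vecs = TfidfVectorizer().fit_transform(docs).toarray()

# Placeholder semantic features (e.g. topic or named-entity vectors);
# in the paper these come from higher-level semantic annotation.
semantic_vecs = np.random.RandomState(0).rand(len(docs), 8)

ratio = 0.7  # weight on keywords vs the semantic feature (a tunable mix ratio)
mixed = np.hstack([ratio * keyword_vecs, (1 - ratio) * semantic_vecs])

som = MiniSom(4, 4, mixed.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(mixed, 500)
for doc, vec in zip(docs, mixed):
    print(som.winner(vec), doc)   # map cell each document lands on
```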

Findings

Combining keywords with certain semantic features achieves a significant improvement in representation quality over the keywords-only approach in a relatively homogeneous single-book corpus. Changing the ratios when combining different features also affects performance. While semantic mixtures can work well in a single-book corpus, they lose their advantages over keywords in the multiple-book corpus. This raises a concern about whether the semantic representations in the multiple-book corpus are homogeneous and coherent enough for applying semantic features. Terminology differences among textbooks affect the ability of the SOM to generate a high-quality map for heterogeneous collections.

Originality/value

The authors explored the use of higher-level document representation features for the development of better-quality SOMs. In addition, the authors piloted a specific method for evaluating SOM quality based on the organisation of information content in the map.

Details

Online Information Review, vol. 35 no. 3
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 2 April 2019

Hei Chia Wang, Yu Hung Chiang and Yi Feng Sun

This paper aims to improve a sentiment analysis (SA) system to help users (i.e. customers or hotel managers) understand hotel evaluations. There are three main purposes in this…

Abstract

Purpose

This paper aims to improve a sentiment analysis (SA) system to help users (i.e. customers or hotel managers) understand hotel evaluations. There are three main purposes in this paper: designing an unsupervised method for extracting feature and opinion pairs from online Chinese reviews, distinguishing different intensities of polarity in opinion words and examining changes in polarity over time.

Design/methodology/approach

In this paper, a review analysis system is proposed that automatically captures the feature opinions of other tourists presented in review documents. In the system, a feature-level SA is designed to determine the polarity of these features. Moreover, an unsupervised method using part-of-speech pattern clarification queries and multi-lexicon SA is adopted to summarize all Chinese reviews.
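A minimal sketch of the unsupervised extraction idea: a part-of-speech pattern (noun, optional degree adverb, adjective) paired with small polarity and intensifier lexicons. It assumes the jieba package; the lexicons and pattern are illustrative, not the authors' exact rules.

```python
# Sketch: POS-pattern feature-opinion pair extraction with a multi-lexicon score.
import jieba.posseg as pseg

polarity = {"干净": 1.0, "舒适": 1.0, "差": -1.0}   # opinion word -> base polarity
intensifiers = {"非常": 1.5, "有点": 0.5}            # degree adverb -> weight

def extract_pairs(review):
    words = [(w.word, w.flag) for w in pseg.cut(review)]
    pairs = []
    for i, (word, flag) in enumerate(words):
        if not flag.startswith("n"):        # pattern anchor: a noun (the feature)
            continue
        weight, j = 1.0, i + 1
        if j < len(words) and words[j][0] in intensifiers:
            weight = intensifiers[words[j][0]]   # scale polarity by the adverb
            j += 1
        if j < len(words) and words[j][1].startswith("a"):  # adjective = opinion
            pairs.append((word, words[j][0],
                          weight * polarity.get(words[j][0], 0.0)))
    return pairs

# Feature-opinion pairs with intensity-weighted polarity scores:
print(extract_pairs("房间非常干净，服务有点差"))
```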

Findings

The authors expect this method to help travellers search for what they want and make decisions more efficiently. The experimental results show an F-measure of 0.628 for the proposed method, which thus outperforms the methods used in previous studies.

Originality/value

The study is useful for travellers who want to quickly retrieve and summarize helpful information from the pool of messy hotel reviews. Meanwhile, the system will assist hotel managers in comprehensively understanding the service qualities with which guests are satisfied or dissatisfied.

Details

The Electronic Library, vol. 37 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 1 July 2014

Wen-Feng Hsiao, Te-Min Chang and Erwin Thomas

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable…

Abstract

Purpose

The purpose of this paper is to propose an automatic metadata extraction and retrieval system to extract bibliographical information from digital academic documents in portable document format (PDF).

Design/methodology/approach

The authors use PDFBox to extract text and font size information, a rule-based method to identify candidate titles, and a hidden Markov model (HMM) to extract the titles and authors. Finally, the extracted titles and authors (possibly incorrect or incomplete) are sent as query strings to digital libraries (e.g. ACM, IEEE, CiteSeerX, SDOS and Google Scholar) to retrieve the rest of the metadata.
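To make the HMM step concrete, here is a toy Viterbi decoder that labels the first lines of a document as title, author or other from discretised font-size observations; all probabilities are invented for illustration and are not the paper's trained parameters.

```python
# Sketch: Viterbi decoding of line labels from font-size observations.
import math

states = ["title", "author", "other"]
start = {"title": 0.8, "author": 0.1, "other": 0.1}
trans = {"title":  {"title": 0.3, "author": 0.6, "other": 0.1},
         "author": {"title": 0.05, "author": 0.45, "other": 0.5},
         "other":  {"title": 0.05, "author": 0.05, "other": 0.9}}
# Observation = discretised font size of a text line.
emit = {"title":  {"large": 0.8, "medium": 0.15, "small": 0.05},
        "author": {"large": 0.1, "medium": 0.7, "small": 0.2},
        "other":  {"large": 0.05, "medium": 0.25, "small": 0.7}}

def viterbi(obs):
    V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
            row[s] = V[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
            ptr[s] = prev
        V.append(row)
        back.append(ptr)
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):      # follow back-pointers to recover the path
        path.append(ptr[path[-1]])
    return list(reversed(path))

print(viterbi(["large", "medium", "small", "small"]))
# -> ['title', 'author', 'other', 'other'] with these toy parameters
```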

Findings

Four experiments were conducted to examine the feasibility of the proposed system. The first compares two HMM designs: a multi-state model and the proposed one-state model. The results show that the one-state model performs comparably to the multi-state model but is better suited to dealing with real-world unknown states. The second experiment shows that the proposed model (without the aid of online queries) can match the performance of other researchers' models on the Cora paper-header dataset. The third experiment examines the system's performance on a small dataset of 43 real PDF research papers; the proposed system (with online queries) performs well on bibliographical data extraction and even outperforms the free citation management tool Zotero 3.0. Finally, a fourth experiment with a larger dataset of 103 papers compares the system with Zotero 4.0, which it significantly outperforms. The feasibility of the proposed model is thus justified.

Research limitations/implications

For academic implications, the system is unique in two respects: first, it uses only the Cora header set for HMM training, without other tagged datasets or gazetteer resources, which makes the system light and scalable; second, the system is workable and can be applied to extracting metadata from real-world PDF files. The extracted bibliographical data can then be imported into citation software such as EndNote or RefWorks to increase researchers' productivity.

Practical implications

For practical implications, the system can outperform the existing tool Zotero 4.0. This gives practitioners good opportunities to develop similar products for real applications, though doing so might require some knowledge of HMM implementation.

Originality/value

The HMM implementation is not novel; what is innovative is that it combines two HMM models. The main model is adapted from Freitag and McCallum (1999), to which the authors add the word features of the Nymble HMM (Bikel et al., 1997). The system works without manually tagging datasets before training the model (the authors use only the Cora dataset for training and test on real-world PDF papers), which differs significantly from what other works have done so far. The experimental results provide sufficient evidence of the feasibility of the proposed method in this respect.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337

Article
Publication date: 22 October 2021

Na Pang, Li Qian, Weimin Lyu and Jin-Dong Yang

In computational chemistry, the chemical bond energy (pKa) is essential, but most pKa-related data are submerged in scientific papers, with only a few data that have been…

Abstract

Purpose

In computational chemistry, the chemical bond energy (pKa) is essential, but most pKa-related data are submerged in scientific papers, and only a small portion has been extracted manually by domain experts. This loss of scientific data hinders in-depth and innovative scientific data analysis. To address this problem, this study aims to use natural language processing methods to extract pKa-related scientific data from chemistry papers.

Design/methodology/approach

Building on a previous BERT-CRF model that combined dictionaries and rules to handle the large number of unknown words in professional vocabulary, the authors propose an end-to-end BERT-CRF model whose input includes domain wordpiece tokens constructed with text mining methods. The authors use standard high-frequency string extraction techniques to construct domain wordpiece tokens for specific domains, and these domain features are added to the input in the subsequent deep learning work.
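A hedged sketch of the domain-wordpiece idea (the CRF layer is omitted): mine high-frequency strings from a domain corpus and register them as extra tokens so the tokenizer stops shattering professional vocabulary. It assumes the Hugging Face transformers library; the corpus, threshold and naive n-gram counting are simplistic stand-ins for the authors' string-frequency technique.

```python
# Sketch: add mined domain wordpiece tokens to a BERT tokenizer and model.
from collections import Counter
from transformers import BertTokenizer, BertModel

corpus = ["the pKa of acetic acid in DMSO", "bond dissociation in DMSO",
          "pKa measurement of phenol in DMSO"]

# Naive high-frequency string extraction as a stand-in for the real technique.
ngrams = Counter()
for sent in corpus:
    for tok in sent.split():
        ngrams[tok] += 1
domain_tokens = [g for g, c in ngrams.items() if c >= 2]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
added = tokenizer.add_tokens(domain_tokens)      # duplicates are skipped
model = BertModel.from_pretrained("bert-base-uncased")
model.resize_token_embeddings(len(tokenizer))    # make room for new tokens
print(f"added {added} domain tokens:", domain_tokens)
```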

Findings

The experiments show that the end-to-end BERT-CRF model achieves relatively good results and can easily be transferred to other domains, because it reduces the reliance on experts: automatic high-frequency wordpiece-token extraction constructs the domain wordpiece tokenization rules, and the resulting domain features are fed into the BERT model.

Originality/value

By decomposing the many unknown words into domain feature-based wordpiece tokens, the authors resolve the problem of extensive professional vocabulary and achieve a relatively ideal extraction result compared to the baseline model. The end-to-end model explores low-cost migration of entity and relation extraction to professional fields, reducing the requirements for experts.

Details

Data Technologies and Applications, vol. 56 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 19 July 2024

Kuoyi Lin, Xiaoyang Kan and Meilian Liu

This study develops and validates an innovative approach for extracting knowledge from online user reviews by integrating textual content and emojis. Recognizing the pivotal role…

Abstract

Purpose

This study develops and validates an innovative approach for extracting knowledge from online user reviews by integrating textual content and emojis. Recognizing the pivotal role emojis play in enhancing the expressiveness and emotional depth of digital communication, this study aims to address the significant gap in existing sentiment analysis models, which have largely overlooked the contribution of emojis in interpreting user preferences and sentiments. By constructing a comprehensive model that synergizes emotional and semantic information conveyed through emojis and text, this study seeks to provide a more nuanced understanding of user preferences, thereby enhancing the accuracy and depth of knowledge extraction from online reviews. The goal is to offer a robust framework that enables more effective and empathetic engagement with user-generated content on digital platforms, paving the way for improved service delivery, product development and customer satisfaction through informed insights into consumer behavior and sentiments.

Design/methodology/approach

This study uses a structured methodology to integrate and analyze text and emojis from online reviews for effective knowledge extraction, focusing on user preferences and sentiments. This methodology consists of four key stages.

First, this study leverages high-frequency noun analysis to identify and extract product attributes mentioned in online user reviews. By focusing on nouns that appear frequently, the authors can systematically discern the primary features or aspects of products that users discuss, thereby providing a foundation for a more detailed sentiment and preference analysis.

Second, a foundational sentiment dictionary is established that incorporates sentiment-bearing words, intensifiers and negation terms to analyze the textual part of the reviews. This dictionary is used to assign sentiment scores to phrases and sentences within reviews, allowing the quantification of textual sentiments based on the presence and combination of these predefined lexical items.

Third, an emoticon sentiment dictionary is developed to address the emotional content conveyed through emojis. This dictionary categorizes emojis based on their associated sentiments, thus enabling the quantification of emotional expressions in reviews. The sentiment scores derived from the emojis are then integrated with those from the textual analysis. This integration considers the weights of text- and emoji-based emotions to compute a comprehensive attribute sentiment score that reflects a nuanced understanding of user sentiments and preferences.

Finally, the authors conduct an empirical study to validate the effectiveness of the proposed methodology in mining user preferences from online reviews by applying the approach to a data set of online reviews and evaluating its ability to accurately identify product attributes and user sentiments. The validation process assessed the reliability and accuracy of the methodology in extracting meaningful insights from the complex interplay between text and emojis.

This study offers a holistic and nuanced framework for knowledge extraction from online reviews, capturing both explicit and implicit sentiments expressed by users through text and emojis. By integrating these elements, this study seeks to provide a comprehensive understanding of user preferences, contributing to improved consumer insight and strategic decision-making for businesses and researchers.
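A minimal sketch of the integration step described in the third stage: an attribute's sentiment is computed as a weighted sum of its text-based and emoji-based scores. The lexicons, weights and review fragment are illustrative placeholders, not the study's dictionaries.

```python
# Sketch: weighted combination of text and emoji sentiment for one attribute.
text_lexicon = {"great": 1.0, "slow": -0.8}    # sentiment-bearing words
emoji_lexicon = {"😍": 1.0, "😡": -1.0}         # emoji -> sentiment score
W_TEXT, W_EMOJI = 0.6, 0.4                     # channel weights

def attribute_sentiment(text_tokens, emojis):
    text_score = sum(text_lexicon.get(t, 0.0) for t in text_tokens)
    emoji_score = sum(emoji_lexicon.get(e, 0.0) for e in emojis)
    return W_TEXT * text_score + W_EMOJI * emoji_score

# Review fragment about the attribute "battery": "battery life is great 😍"
print(attribute_sentiment(["battery", "life", "is", "great"], ["😍"]))  # 1.0
```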

Findings

The application of the proposed methodology for integrating emojis with text in online reviews yields significant findings that underscore the feasibility and value of extracting realistic user knowledge to gain insights from user-generated content. The analysis successfully captured consumer preferences, which are instrumental in informing service decisions and driving innovation. This achievement is largely attributed to the development and utilization of a comprehensive emotion-sentiment dictionary tailored to interpret the complex interplay between textual and emoji-based expressions in online reviews.

By implementing a sentiment calculation model that intricately combines textual sentiment analysis with emoji sentiment analysis, this study was able to accurately determine the final attribute emotion for various product features discussed in the reviews. This model effectively characterized the emotional knowledge of online users and provided a nuanced understanding of their sentiments and preferences. The emotional knowledge extracted is not only quantifiable but also rich in context, offering deeper insights into consumer behavior and attitudes.

Furthermore, a case analysis is conducted to rigorously test the validity of the proposed model in a real-world scenario. This practical examination revealed that the model is not only capable of accurately extracting and analyzing user preferences but is also adaptable to different contexts and product categories. The case analysis highlights the robustness and flexibility of the model, demonstrating its potential to enhance the precision of knowledge extraction processes significantly.

Overall, the results confirm the effectiveness of the proposed approach in integrating text and emojis for comprehensive knowledge extraction from online reviews. The findings validate the model's capability to offer actionable insights into consumer preferences, thereby supporting more informed and strategic decision-making by businesses. This study contributes to the broader field of sentiment analysis by showcasing the untapped potential of emojis as valuable indicators of user sentiments, opening new avenues for research and applications in digital marketing and consumer behavior analysis.

Originality/value

This study introduces a pioneering approach to extract knowledge from Web user interactions, notably through the integration of online reviews that incorporate both textual content and emoticons. This innovative methodology stands out because it holistically considers the dual channels of communication, text and emojis, to comprehensively mine Web user preferences. The key contribution of this study lies in its novel insights into the extraction of consumer preferences, advancing beyond traditional text-based analysis to embrace nuanced expressions conveyed through emoticons.

The originality of this study is underpinned by its acknowledgment of emoticons as a significant and untapped source of sentiment and preference indicators in online reviews. By effectively merging emoticon analysis and emoji emotion scoring with textual sentiment analysis, this study enriches the understanding of Web user preferences and enhances the accuracy and depth of consumer preference insights. This dual-analysis approach represents a significant leap forward in sentiment analysis, setting a new standard for how digital communication can be leveraged to derive meaningful insights into consumer behavior.

Furthermore, the results have practical implications for businesses and marketers. The insights gained from this integrated analytical approach offer a more granular and emotionally nuanced view of customer feedback, which can inform more effective marketing strategies, product development and customer service practices. By pioneering this comprehensive method of knowledge extraction, this study paves the way for future research and practice to interpret and respond more accurately to the complex landscape of online consumer expressions. This study's originality and value lie in its innovative method of capturing and analyzing the rich tapestry of Web user communication, offering a ground-breaking perspective on consumer preference extraction that promises to enhance both academic research and practical applications in the digital era.

Details

Journal of Knowledge Management, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 1367-3270

Article
Publication date: 7 December 2022

Peyman Jafary, Davood Shojaei, Abbas Rajabifard and Tuan Ngo

Building information modeling (BIM) is a striking development in the architecture, engineering and construction (AEC) industry, which provides in-depth information on different…

Abstract

Purpose

Building information modeling (BIM) is a striking development in the architecture, engineering and construction (AEC) industry, which provides in-depth information on different stages of the building lifecycle. Real estate valuation, as a field fully interconnected with the AEC industry, can benefit from the 3D technical achievements of BIM technologies. Some studies have attempted to use BIM in real estate valuation procedures. However, there is still a limited understanding of appropriate mechanisms for utilizing BIM for valuation purposes and of the consequent impact BIM can have on decreasing the existing uncertainties in valuation methods. Therefore, the paper aims to analyze the literature on BIM for real estate valuation practices.

Design/methodology/approach

This paper presents a systematic review to analyze existing utilizations of BIM for real estate valuation practices, discovers the challenges, limitations and gaps of the current applications and presents potential domains for future investigations. Research was conducted on the Web of Science, Scopus and Google Scholar databases to find relevant references that could contribute to the study. A total of 52 publications including journal papers, conference papers and proceedings, book chapters and PhD and master's theses were identified and thoroughly reviewed. There was no limitation on the starting date of research, but the end date was May 2022.

Findings

Four domains of application have been identified: (1) developing machine learning-based valuation models using the variables that could directly be captured through BIM and industry foundation classes (IFC) data instances of building objects and their attributes; (2) evaluating the capacity of 3D factors extractable from BIM and 3D GIS in increasing the accuracy of existing valuation models; (3) employing BIM for accurate estimation of components of cost approach-based valuation practices; and (4) extraction of useful visual features for real estate valuation from BIM representations instead of 2D images through deep learning and computer vision.

Originality/value

This paper contributes to research efforts on utilization of 3D modeling in real estate valuation practices. In this regard, this paper presents a broad overview of the current applications of BIM for valuation procedures and provides potential ways forward for future investigations.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 4
Type: Research Article
ISSN: 0969-9988

1 – 10 of over 5000