Search results

1 – 10 of 346
Article
Publication date: 13 September 2018

Jian Zhan, Xin Janet Ge, Shoudong Huang, Liang Zhao, Johnny Kwok Wai Wong and Sean XiangJian He

Automated technologies have been applied to facility management (FM) practices to address labour demands of, and time consumed by, inputting and processing manual data…

Abstract

Purpose

Automated technologies have been applied to facility management (FM) practices to address labour demands of, and time consumed by, inputting and processing manual data. Less attention has been focussed on automation of visual information, such as images, when improving timely maintenance decisions. This study aims to develop image classification algorithms to improve information flow in the inspection-repair process through building information modelling (BIM).

Design/methodology/approach

To improve and automate the inspection-repair process, image classification algorithms were used to connect images with a corresponding image database in a BIM knowledge repository. Quick response (QR) code decoding and Bag of Words were chosen to classify images in the system. Graphical user interfaces (GUIs) were developed to facilitate activity collaboration and communication. A pilot case study in an inspection-repair process was applied to demonstrate the applications of this system.
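
The abstract does not describe the Bag of Words pipeline in detail. As a minimal sketch, assuming the common "bag of visual words" formulation, each local image descriptor is assigned to its nearest codebook entry, the image is summarised as a histogram of those assignments, and the histogram is matched against reference histograms in the repository. The function names, the toy two-dimensional descriptors, the codebook and the labels are all hypothetical and are not taken from the paper.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bovw_histogram(descriptors, codebook):
    """Normalised histogram of visual-word assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty image
    return [c / total for c in counts]

def classify(histogram, references):
    """Label of the reference histogram closest to the query histogram."""
    return min(references, key=lambda label: dist(histogram, references[label]))

# Toy 2-D "descriptors"; real local descriptors have many more dimensions.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
# Hypothetical reference histograms standing in for the image database.
references = {
    "door": [0.8, 0.2, 0.0],
    "air_vent": [0.0, 0.2, 0.8],
}
query = bovw_histogram([(0.1, 0.0), (0.2, 0.1), (0.9, 1.1), (0.0, 0.2)], codebook)
label = classify(query, references)
```

In practice the codebook would be learned by clustering descriptors from training images; here it is fixed by hand to keep the sketch short.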

Findings

The system developed in this study associates the inspection-repair process with a digital three-dimensional (3D) model, GUIs, a BIM knowledge repository and image classification algorithms. By implementing the proposed application in a case study, the authors found that improvement of the inspection-repair process and automated image classification with a BIM knowledge repository (such as the one developed in this study) can enhance FM practices by increasing productivity and reducing time and costs associated with decision-making.

Originality/value

This study introduces an innovative approach that applies image classification and leverages a BIM knowledge repository to enhance the inspection-repair process in FM practice. The system designed provides automated image-classifying data from a smart phone, eliminates time required to input image data manually and improves communication and collaboration between FM personnel for maintenance in the decision-making process.

Details

Facilities, vol. 37 no. 7/8
Type: Research Article
ISSN: 0263-2772

Article
Publication date: 23 August 2013

Ivo Lašek and Peter Vojtáš

The purpose of this paper is to focus on the problem of named entity disambiguation. The paper disambiguates named entities on a very detailed level. To each entity is…

Abstract

Purpose

This paper addresses the problem of named entity disambiguation at a very detailed level: each entity is assigned the concrete identifier of the Wikipedia article that describes it.

Design/methodology/approach

For such a fine‐grained disambiguation a correct representation of the context is crucial. The authors compare various context representations: bag of words representation, linguistic representation and structured co‐occurrence representation. Models for each representation are described and evaluated. They also investigate the possibilities of multilingual named entity disambiguation.
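
As a rough illustration of the bag of words context representation only (not the authors' models), the sketch below compares the words around a mention with a short description of each candidate entity and picks the candidate with the highest cosine similarity. The candidate identifiers and descriptions are invented stand-ins for Wikipedia article text, and the helper names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lower-cased token counts; word order is discarded."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def disambiguate(context, candidates):
    """Pick the candidate whose description best matches the mention context."""
    ctx = bag_of_words(context)
    return max(candidates, key=lambda name: cosine(ctx, bag_of_words(candidates[name])))

# Hypothetical candidate descriptions standing in for Wikipedia article text.
candidates = {
    "Jaguar_(animal)": "large cat species native to the americas jungle predator",
    "Jaguar_Cars": "british manufacturer of luxury cars and sports vehicles",
}
choice = disambiguate("The jaguar is a predator that hunts in the jungle", candidates)
```

A linguistic or structured co‐occurrence representation would replace `bag_of_words` with a richer context model while keeping the same candidate-ranking step.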

Findings

Based on this evaluation, the structured co‐occurrence representation provides the best disambiguation results. The method could also be applied successfully to languages other than English.

Research limitations/implications

Despite its good results, the structured co‐occurrence context representation has several limitations. It trades precision for recall, which might not be desirable in some use cases. It also cannot disambiguate two entities of different types that are mentioned under the same name in the same text. These limitations can be overcome by combining it with the other described methods.

Practical implications

The authors provide a ready‐made web service that can be plugged directly into existing applications through a REST interface.

Originality/value

The paper proposes a new approach to named entity disambiguation exploiting various context representation models (bag of words, linguistic and structural representation). The authors constructed a comprehensive dataset based on all English Wikipedia articles for named entity disambiguation. They evaluated and compared the individual context representation models on this dataset. They evaluate the support of multiple languages.

Details

International Journal of Web Information Systems, vol. 9 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 6 September 2022

Elena Fedorova, Pavel Drogovoz, Anna Popova and Vladimir Shiboldenkov

The paper examines whether, along with the financial performance, the disclosure of research and development (R&D) expenses, patent portfolios, patent citations and…

Abstract

Purpose

The paper examines whether, along with the financial performance, the disclosure of research and development (R&D) expenses, patent portfolios, patent citations and innovation activities affect the market capitalization of Russian companies.

Design/methodology/approach

The paper opted for a set of techniques including bag-of-words (BoW) to retrieve additional innovation-related data from companies' annual reports, self-organizing maps (SOM) to perform visual exploratory analysis and panel data regression (PDR) to conduct confirmatory analysis using data on 74 Russian publicly traded companies for the period 2013–2019.
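
The bag-of-words step can be pictured as counting lexicon terms in report text. The sketch below is an assumption about how such counting might look, not the paper's procedure: the two lexicons are hypothetical placeholders (the compiled lexicons are not reproduced here), and normalising by report length is an illustrative choice.

```python
import re

# Hypothetical lexicons; the paper's compiled lexicons are not reproduced here.
LEXICONS = {
    "rnd": ["research and development", "r&d", "laboratory"],
    "patents": ["patent", "patents", "intellectual property"],
}

def count_term(text, term):
    """Occurrences of a term as a whole word or phrase, case-insensitive."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def disclosure_indicators(text, lexicons=LEXICONS):
    """Lexicon hits per 1,000 words for each topic."""
    n_words = len(re.findall(r"\w+", text)) or 1
    return {
        topic: 1000 * sum(count_term(text, t) for t in terms) / n_words
        for topic, terms in lexicons.items()
    }

report = "The company expanded R&D spending and filed a patent for a new turbine."
indicators = disclosure_indicators(report)
```

The resulting per-topic scores are the kind of numeric feature that could then be fed into an exploratory or regression analysis.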

Findings

The paper observes that the disclosure of nonfinancial data on R&D, patents and primarily product and marketing innovations positively affects the market capitalization of the largest Russian companies, which are mainly focused on energy, raw materials and utilities and are operating on international markets. The study suggests that these companies are financially well-resourced to innovate at risk and thus to provide positive signals to stakeholders and external agents.

Research limitations/implications

Our findings are important to management, investors, financial analysts, regulators and various agencies providing guidance on corporate governance and sustainability reporting. However, the authors acknowledge that the research results may lack generalizability due to the sample covering a single national context. Researchers are encouraged to test the proposed approach further on other countries' data by using the compiled lexicons.

Originality/value

The study aims to expand the domains of signaling theory and market valuation by providing new insights into the impact that companies' reporting on R&D, patents and innovation activities has on market capitalization. New nonfinancial factors that previous research does not investigate – innovation disclosure indicators (IDI) – are tested.

Details

Kybernetes, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 17 May 2021

Sayeh Bagherzadeh, Sajjad Shokouhyar, Hamed Jahani and Marianna Sigala

Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and…

Abstract

Purpose

Research analyzing online travelers’ reviews has boomed over the past years, but it lacks efficient methodologies that can provide useful end-user value within time and budget. This study aims to contribute to the field by developing and testing a new methodology for sentiment analysis that surpasses the standard dictionary-based method by creating two hotel-specific word lexicons.

Design/methodology/approach

Big data of hotel customer reviews posted on the TripAdvisor platform were collected and appropriately prepared for conducting a binary sentiment analysis by developing a novel bag-of-words weighted approach. The latter provides a transparent and replicable procedure to prepare, create and assess lexicons for sentiment analysis. This approach resulted in two lexicons (a weighted lexicon, L1 and a manually selected lexicon, L2), which were tested and validated by applying classification accuracy metrics to the TripAdvisor big data. Two popular methodologies (a public dictionary-based method and a complex machine-learning algorithm) were used for comparing the accuracy metrics of the study’s approach for creating the two lexicons.
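
The abstract does not give the weighting formula, so the following is only a sketch of one plausible way to build a weighted lexicon from labelled reviews: each word receives a smoothed log-odds weight, and a review is classified by the sign of its summed weights. The log-odds scheme, the function names and the toy reviews are assumptions made for illustration.

```python
import re
from collections import Counter
from math import log

def tokens(text):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_weighted_lexicon(reviews):
    """Map each word to a smoothed log-odds weight (positive vs negative reviews)."""
    pos, neg = Counter(), Counter()
    for text, label in reviews:
        (pos if label == "pos" else neg).update(tokens(text))
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return {
        w: log((pos[w] + 1) / (n_pos + len(vocab)))
        - log((neg[w] + 1) / (n_neg + len(vocab)))
        for w in vocab
    }

def classify(text, lexicon):
    """Binary sentiment from the summed weights of known words."""
    score = sum(lexicon.get(t, 0.0) for t in tokens(text))
    return "pos" if score >= 0 else "neg"

# Toy training data standing in for labelled hotel reviews.
train = [
    ("clean room and friendly staff", "pos"),
    ("great location friendly service", "pos"),
    ("dirty room and rude staff", "neg"),
    ("noisy location rude service", "neg"),
]
lexicon = build_weighted_lexicon(train)
```

Words that appear equally often in both classes, such as "room" here, end up with a weight of zero and do not affect the decision.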

Findings

The results of the accuracy metrics confirmed that the study’s methodology significantly outperforms the dictionary-based method in comparison to the machine-learning algorithm method. The findings also provide evidence that the study’s methodology is generalizable for predicting users’ sentiment.

Practical implications

The study developed and validated a methodology for generating reliable lexicons that can be used for big data analysis aiming to understand and predict customers’ sentiment. The L2 hotel dictionary generated by the study provides a reliable method and a useful tool for analyzing guests’ feedback and enabling managers to understand, anticipate and re-actively respond to customers’ attitudes and changes. The study also proposed a simplified methodology for understanding the sentiment of each user, which, in turn, can be used for conducting comparisons aiming to detect and understand guests’ sentiment changes across time, as well as across users based on their profiles and experiences.

Originality/value

This study contributes to the field by proposing and testing a new methodology for conducting sentiment analysis that addresses previous methodological limitations, as well as the contextual specificities of the tourism industry. Based on the paper’s literature review, this is the first research study using a bag-of-words approach for conducting a sentiment analysis and creating a field-specific lexicon.

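On a generalizable sentiment analysis method for creating a hotel dictionary: a big data analysis using TripAdvisor hotel reviews as the sample

Abstract

Purpose

Research on online traveler reviews has grown steadily over the past few years, but effective methods that deliver end-user value within limited time and budget are still lacking. This paper develops and tests a new method for sentiment analysis that creates two hotel-related lexicons and surpasses the standard dictionary-based approach.

Design/methodology/approach

The sample is big data of hotel customer reviews from TripAdvisor, on which a binary sentiment analysis is carried out by developing a novel weighted bag-of-words method. This method offers a transparent and replicable procedure for preparing, creating and testing lexicons for sentiment analysis. It yields two lexicons (a weighted lexicon, L1, and a manually selected lexicon, L2), which the paper tests and validates by applying classification accuracy metrics to the TripAdvisor big data. Two popular methods (a public dictionary method and a complex machine-learning algorithm) are used to compare the accuracy of the lexicons.

Findings

The accuracy comparison confirms that the paper's method, relative to the machine-learning algorithm, significantly outperforms the dictionary-based method. The results also indicate that the method can be generalized to predicting users' sentiment.

Practical implications

The paper develops and validates a method that creates reliable lexicons for big data analysis in order to determine users' sentiment. The L2 hotel lexicon created in the paper is a reliable and useful tool for analyzing guest feedback, and it can help hotel managers understand, anticipate and actively respond to guests' attitudes and changes. The paper also proposes a simple method for understanding each user's sentiment, which can be used, through comparison, to detect and understand changes in guests' sentiment over time as well as differences across users with different backgrounds and experiences.

Originality/value

The paper proposes and tests a new sentiment analysis method that addresses the limitations of earlier methods and is grounded in the tourism industry. Based on the literature review, this is the first study to use a bag-of-words approach to conduct sentiment analysis and create a domain-specific lexicon.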

Details

Journal of Hospitality and Tourism Technology, vol. 12 no. 2
Type: Research Article
ISSN: 1757-9880

Article
Publication date: 5 June 2017

Eugene Yujun Fu, Hong Va Leong, Grace Ngai and Stephen C.F. Chan

Social signal processing under affective computing aims at recognizing and extracting useful human social interaction patterns. Fight is a common social interaction in…

Abstract

Purpose

Social signal processing under affective computing aims at recognizing and extracting useful human social interaction patterns. Fighting is a common social interaction in real life, and fight detection systems have a wide range of applications. This paper aims to detect fights in a natural and low-cost manner.

Design/methodology/approach

Research works on fight detection are often based on visual features, demanding substantive computation and good video quality. In this paper, the authors propose an approach to detect fight events through motion analysis. Most existing works evaluated their algorithms on public data sets manifesting simulated fights, where the fights are acted out by actors. To evaluate real fights, the authors collected videos involving real fights to form a data set. Based on the two types of data sets, the authors evaluated the performance of their motion signal analysis algorithm, which was then compared with the state-of-the-art approach based on MoSIFT descriptors with Bag-of-Words mechanism, and basic motion signal analysis with Bag-of-Words.

Findings

The experimental results indicate that the proposed approach accurately detects fights in real scenarios and performs better than the MoSIFT approach.

Originality/value

By collecting and annotating real surveillance videos containing real fight events and augmenting with well-known data sets, the authors proposed, implemented and evaluated a low computation approach, comparing it with the state-of-the-art approach. The authors uncovered some fundamental differences between real and simulated fights and initiated a new study in discriminating real against simulated fight events, with very good performance.

Details

International Journal of Pervasive Computing and Communications, vol. 13 no. 2
Type: Research Article
ISSN: 1742-7371

Article
Publication date: 29 June 2021

Daejin Kim, Hyoung-Goo Kang, Kyounghun Bae and Seongmin Jeon

To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification, North…

Abstract

Purpose

To overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification, North American Industry Classification System, and Global Industry Classification Standard, the authors explore industry classifications using machine learning methods as an application of interpretable artificial intelligence (AI).

Design/methodology/approach

The authors propose a text-based industry classification combined with a machine learning technique by extracting distinguishable features from business descriptions in financial reports. The proposed method can reduce the dimensions of word vectors to avoid the curse of dimensionality when measuring the similarities of firms.

Findings

Using the proposed method, the sample firms form clusters of distinctive industries, thus overcoming the limitations of existing classifications. The method also clarifies industry boundaries based on lower-dimensional information. The graphical closeness between industries can reflect the industry-level relationship as well as the closeness between individual firms.

Originality/value

The authors’ work contributes to the industry classification literature by empirically investigating the effectiveness of machine learning methods. The text mining method resolves issues concerning the timeliness of traditional industry classifications by capturing new information in annual reports. In addition, the authors’ approach can solve the computing concerns of high dimensionality.

Details

Internet Research, vol. 32 no. 2
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 25 October 2018

Panagiotis Stamolampros and Nikolaos Korfiatis

Although the literature has established the effect of online reviews on customer purchase intentions, the influence of psychological factors on online ratings is…

Abstract

Purpose

Although the literature has established the effect of online reviews on customer purchase intentions, the influence of psychological factors on online ratings is overlooked. This paper aims to examine these factors under the perspective of construal level theory (CLT).

Design/methodology/approach

Using review data from TripAdvisor and Booking.com, the authors study three dimensions of psychological distances (temporal, spatial and social) and their direct and interaction effects on review valence, using regression analysis. The authors examine the effect of these distances on the information content of online reviews using a novel bag-of-words model to assess its concreteness.

Findings

Temporal distance and spatial distance have positive direct effects on review valence. Social distance, on the other hand, has a negative direct effect. However, its interaction with the other two distances has a positive effect, suggesting that consumers tend to “zoom-out” to less concrete things in their ratings.

Practical implications

The findings provide implications for the interpretation of review ratings by the service providers and their information content.

Originality/value

This study extends the CLT and electronic word-of-mouth literature by jointly exploring the effect of all three psychological distances that are applicable in post-purchase evaluations. Methodologically, it provides a novel application of the bag-of-words model in evaluating the concreteness of online reviews.

Details

International Journal of Contemporary Hospitality Management, vol. 30 no. 10
Type: Research Article
ISSN: 0959-6119

Article
Publication date: 13 March 2020

Jinwook Choi, Yongmoo Suh and Namchul Jung

The purpose of this study is to investigate the effectiveness of qualitative information extracted from firm’s annual report in predicting corporate credit rating…

Abstract

Purpose

The purpose of this study is to investigate the effectiveness of qualitative information extracted from firms’ annual reports in predicting corporate credit ratings. In practice, qualitative information from published reports or management interviews is known to be an important input to credit rating assignment, alongside the quantitative information carried by financial values. Nevertheless, prior studies have rarely employed qualitative information when developing prediction models of corporate credit ratings, which leaves room for further research.

Design/methodology/approach

This study adopted three document vectorization methods, Bag-Of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data into numeric vectors that Machine Learning (ML) algorithms can accept as input. For the experiments, the authors used a corpus of Management’s Discussion and Analysis (MD&A) sections from 10-K financial reports, together with financial variables and corporate credit rating data.
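
To make the vectorization step concrete, here is a minimal sketch of the bag-of-words variant only: a vocabulary is fitted on a set of documents, and each document is then mapped to a fixed-length count vector. The function names and the two toy sentences are hypothetical; the study's actual preprocessing is not described in the abstract.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def fit_vocabulary(documents):
    """Assign each distinct token a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def transform(document, vocabulary):
    """Fixed-length count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Toy sentences standing in for MD&A text.
docs = ["Revenue increased and margins increased", "Debt increased"]
vocabulary = fit_vocabulary(docs)
vectors = [transform(d, vocabulary) for d in docs]
```

Each vector has one column per vocabulary entry, so it can be concatenated with numeric financial variables before being passed to a classifier.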

Findings

Experimental results from a series of multi-class classification experiments show the predictive models trained by both financial variables and vectors extracted from MD&A data outperform the benchmark models trained only by traditional financial variables.

Originality/value

This study proposed a new approach for corporate credit rating prediction by using qualitative information extracted from MD&A documents as an input to ML-based prediction models. Also, this research adopted and compared three textual vectorization methods in the domain of corporate credit rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.

Details

Data Technologies and Applications, vol. 54 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 18 April 2017

Mahmoud Al-Ayyoub, Ahmed Alwajeeh and Ismail Hmeidi

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the…

Abstract

Purpose

The authorship authentication (AA) problem is concerned with correctly attributing a text document to its corresponding author. Historically, this problem has been the focus of various studies focusing on the intuitive idea that each author has a unique style that can be captured using stylometric features (SF). Another approach to this problem, known as the bag-of-words (BOW) approach, uses keywords occurrences/frequencies in each document to identify its author. Unlike the first one, this approach is more language-independent. This paper aims to study and compare both approaches focusing on the Arabic language which is still largely understudied despite its importance.

Design/methodology/approach

Being a supervised learning problem, the authors start by collecting a very large data set of Arabic documents to be used for training and testing purposes. For the SF approach, they compute hundreds of SF, whereas, for the BOW approach, the popular term frequency-inverse document frequency technique is used. Both approaches are compared under various settings.
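
For readers unfamiliar with term frequency-inverse document frequency, the sketch below computes it in its plain form, tf × log(N / df), where tf is a term's relative frequency in a document, N is the number of documents and df is the number of documents containing the term. Smoothed variants are common, and the abstract does not say which one the authors used, so this form is an assumption; the toy English documents are placeholders for the Arabic corpus.

```python
import re
from collections import Counter
from math import log

def tokenize(text):
    """Lower-cased word tokens; \\w also matches non-Latin letters such as Arabic."""
    return re.findall(r"\w+", text.lower())

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights

docs = ["the sea the storm", "the desert wind", "the sea breeze"]
weights = tf_idf(docs)
```

A term that occurs in every document, like "the" above, receives a weight of zero, which is why the scheme emphasises words that distinguish one author's documents from another's.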

Findings

The results show that the SF approach, which is much cheaper to train, can generate more accurate results under most settings.

Practical implications

Numerous advantages of efficiently solving the AA problem are obtained in different fields of academia as well as the industry including literature, security, forensics, electronic markets and trading, etc. Another practical implication of this work is the public release of its sources. Specifically, some of the SF can be very useful for other problems such as sentiment analysis.

Originality/value

This is the first study of its kind to compare the SF and BOW approaches for authorship analysis of Arabic articles. Moreover, many of the computed SF are novel, while other features are inspired by the literature. As SF are language-dependent and most existing papers focus on English, extra effort must be invested to adapt such features to Arabic text.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 26 August 2021

Dominika Hadro, Justyna Fijałkowska, Karolina Daszyńska-Żygadło, Ilze Zumente and Svetlana Mjakuškina

This study aims to verify whether non-financial disclosure in the construction industry (CI) responds to stakeholders’ information needs and explores the most frequent…

Abstract

Purpose

This study aims to verify whether non-financial disclosure in the construction industry (CI) responds to stakeholders’ information needs and explores the most frequent topics disclosed in terms of the environmental, social and governance (ESG) pillars.

Design/methodology/approach

This study uses a bag-of-words method and latent Dirichlet allocation to match stakeholders’ expectations with information disclosed by companies. This paper assesses the publicly available non-financial disclosure of the 46 European CI companies covered by the Refinitiv database with ESG scores.

Findings

This study provides two main findings. First, it shows the mismatch between stakeholders’ information needs and what they get in non-financial reporting. Despite non-financial information in CI disclosure, the information disclosed by many CI companies does not meet their users’ information needs. CI companies commonly focus on their sustainable products and health policy while omitting other topics of interest – the circular economy, unethical business behaviour, migrant policy and human trafficking. Second, this study indicates the defects of simple disclosure analysis based on keywords and highlights the importance of context in information analysis.

Practical implications

The proposed novel approach to text analysis offers several practical applications. It is a more effective tool for evaluating companies’ sustainability performance. It may be especially important to ESG rating providers. Additionally, the results may be of interest to companies wishing to improve their communication, and, in particular, to regulators and standard setters in two matters. The first is the need for more pressure to increase awareness among issuers to shift from disclosing large amounts of non-financial information to disclosing good quality non-financial information, which would be appropriate for meeting stakeholders’ expectations. The second is the necessity for deepening issuers’ understanding of the diverse stakeholders’ information needs, considering the substantial differences among industries and improving communication to meet them.

Originality/value

This study introduces text analysis that, apart from keywords, considers the context of these keywords’ appearances in a report’s narration. It allows a significantly improved understanding of the information disclosed and a more stable grounding for reasoning, leading to better and informed decisions. Moreover, this study verifies how the information disclosed matches stakeholders’ needs. Finally, it enriches the literature on sectoral analysis concerning non-financial disclosure.

Details

Meditari Accountancy Research, vol. 30 no. 3
Type: Research Article
ISSN: 2049-372X
