Search results

1 – 10 of over 2000

Article

Publication date: 11 May 2020

Comparing tagging suggestion models on discrete corpora

Bojan Bozic, Andre Rios and Sarah Jane Delany

Abstract

Purpose

This paper aims to investigate the methods for the prediction of tags on a textual corpus that describes diverse data sets based on short messages; as an example, the authors demonstrate the usage of methods based on hotel staff inputs in a ticketing system as well as the publicly available StackOverflow corpus. The aim is to improve the tagging process and find the most suitable method for suggesting tags for a new text entry.

Design/methodology/approach

The paper consists of two parts: exploration of existing sample data, which includes statistical analysis and visualisation of the data to provide an overview, and evaluation of tag prediction approaches. The authors have included different approaches from different research fields to cover a broad spectrum of possible solutions. As a result, the authors have tested a machine learning model for multi-label classification (using gradient boosting), a statistical approach (using frequency heuristics) and three similarity-based classification approaches (nearest centroid, k-nearest neighbours (k-NN) and naive Bayes). The experiment that compares the approaches uses recall to measure the quality of results. Finally, the authors provide a recommendation of the modelling approach that produces the best accuracy in terms of tag prediction on the sample data.

Findings

The authors have calculated the performance of each method against the test data set by measuring recall. The authors show recall for each method with different features (except for frequency heuristics, which does not provide the option to add additional features) for the dmbook pro and StackOverflow data sets. k-NN clearly provides the best recall. Because k-NN provided the best results, the authors performed further experiments with values of k from 1 to 10. This helped the authors to observe the impact of the number of neighbours on performance and to identify the best value for k.
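To make the k-NN tag suggestion and the recall measure described above concrete, here is a minimal sketch in Python. It is not the authors' implementation: the bag-of-words features, the cosine similarity, the similarity-weighted vote and the toy `TRAIN` messages are all assumptions chosen for illustration.

```python
from collections import Counter
import math

# Hypothetical toy corpus of (short message, tags); not data from the paper.
TRAIN = [
    ("towel missing in room 12", ["housekeeping"]),
    ("need extra towels room 7", ["housekeeping"]),
    ("air conditioning broken room 3", ["maintenance"]),
    ("shower leaking in room 5", ["maintenance"]),
]

def vectorize(text):
    """Turn a short message into bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_tags(query, train, k=3, n_tags=2):
    """Suggest tags by a similarity-weighted vote over the k nearest messages."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), tags) for text, tags in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, tags in scored[:k]:
        for tag in tags:
            votes[tag] += sim
    return [tag for tag, _ in votes.most_common(n_tags)]

def recall(suggested, true_tags):
    """Fraction of the true tags that appear among the suggested tags."""
    if not true_tags:
        return 0.0
    return len(set(suggested) & set(true_tags)) / len(true_tags)
```

Calling `suggest_tags` in a loop over `k` from 1 to 10 and averaging `recall` over a held-out set would mirror the kind of sweep over k that the abstract mentions.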

Originality/value

The value and originality of the paper are given by extensive experiments with several methods from different domains. The authors have used probabilistic methods, such as naive Bayes, statistical methods, such as frequency heuristics, and similarity approaches, such as k-NN. Furthermore, the authors have produced results on an industrial-scale data set that has been provided by a company and used directly in their project, as well as a community-based data set with a large amount of data and dimensionality. The study results can be used to select a model based on diverse corpora for a specific use case, taking into account advantages and disadvantages when applying the model to your data.

Details

International Journal of Web Information Systems, vol. 16 no. 2

Type: Research Article

ISSN: 1744-0084

Article

Publication date: 30 August 2022

Safety tag generation and training material recommendation for construction workers: a persona-based approach

Pinsheng Duan, Jianliang Zhou and Wenhan Fan

Abstract

Purpose

Effective construction safety training has been considered to play a significant role in reducing the incidence of accidents. However, the current safety training methods pay less attention to the relationship between workers' personalized characteristics and their learning needs, which results in workers' low learning participation and poor training effect. The purpose of this paper is to improve the participation and effect of safety training for construction workers with a persona-based approach.

Design/methodology/approach

This paper presents a persona-based approach to safety tag generation and training material recommendation. By extracting the demographic characteristics and behavior patterns tags of construction workers, a neural network algorithm is introduced to calculate the learning needs tags of workers, and the collaborative filtering recommendation method is integrated to enrich the innovation of recommendation results. Offline experiments and online experiments are designed to verify the rationality of the proposed method.

Findings

The results show that the learning needs of workers are closely related to their background. The proposed method can effectively improve workers' interest in materials and the training effect compared with conventional safety training methods. The research provides a theoretical and practical reference for promoting active safety management and achieving worker-centered safety management.

Originality/value

First, a persona-based approach is introduced to establish a novel framework for solving the problem of personalized construction safety management. Second, an artificial intelligence algorithm is used to automatically extract the learning needs tag values and design a hybrid recommendation method for construction workers' personalized safety training. The collaborative filtering method is integrated to enrich the innovation of recommendation results.

Details

Engineering, Construction and Architectural Management, vol. 31 no. 1

Type: Research Article

ISSN: 0969-9988

Article

Publication date: 5 May 2023

Identifying business information through deep learning: analyzing the tender documents of an Internet-based logistics bidding platform

Ying Yu and Jing Ma

Abstract

Purpose

The tender documents, an essential data source for internet-based logistics tendering platforms, incorporate massive fine-grained data, ranging from information on the tenderee to shipping locations and shipping items. Automated information extraction in this area is, however, under-researched, making the extraction process time-consuming and laborious. For Chinese logistics tender entities, in particular, existing named entity recognition (NER) solutions are mostly unsuitable as they involve domain-specific terminologies and possess different semantic features.

Design/methodology/approach

To tackle this problem, a novel lattice long short-term memory (LSTM) model, combining a variant contextual feature representation and a conditional random field (CRF) layer, is proposed in this paper for identifying valuable entities from logistic tender documents. Instead of traditional word embedding, the proposed model uses the pretrained Bidirectional Encoder Representations from Transformers (BERT) model as input to augment the contextual feature representation. Subsequently, with the Lattice-LSTM model, the information of characters and words is effectively utilized to avoid error segmentation.

Findings

The proposed model is then verified by the Chinese logistic tender named entity corpus. Moreover, the results suggest that the proposed model excels in the logistics tender corpus over other mainstream NER models. The proposed model underpins the automatic extraction of logistics tender information, enabling logistic companies to perceive the ever-changing market trends and make far-sighted logistic decisions.

Originality/value

(1) A practical model for logistic tender NER is proposed in the manuscript. By employing and fine-tuning BERT into the downstream task with a small amount of data, the experiment results show that the model has a better performance than other existing models. This is the first study, to the best of the authors' knowledge, to extract named entities from Chinese logistic tender documents. (2) A real logistic tender corpus for practical use is constructed and a program of the model for online-processing real logistic tender documents is developed in this work. The authors believe that the model will facilitate logistic companies in converting unstructured documents to structured data and further perceive the ever-changing market trends to make far-sighted logistic decisions.

Details

Data Technologies and Applications, vol. 58 no. 1

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 2 January 2023

Order and disorder in the evolution of online knowledge community: an investigation of the chaotic behavior in social tagging systems with evidence of stack overflow

Yanqing Shi, Hongye Cao and Si Chen

Abstract

Purpose

Online question-and-answer (Q&A) communities serve as important channels for knowledge diffusion. The purpose of this study is to investigate the dynamic development process of online knowledge systems and explore the final or progressive state of system development. By measuring the nonlinear characteristics of knowledge systems from the perspective of complexity science, the authors aim to enrich the perspective and method of the research on the dynamics of knowledge systems, and to deeply understand the behavior rules of knowledge systems.

Design/methodology/approach

The authors collected data from the programming-related Q&A site Stack Overflow for a ten-year period (2008–2017) and included 48,373 tags in the analyses. The number of tags is taken as the time series; the correlation dimension and the maximum Lyapunov exponent are used to test the system for chaos, and the Volterra series multistep forecast method is used to predict the system state.
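As an illustration of what the maximum Lyapunov exponent measures, the following sketch computes it for the logistic map, a textbook system that is not part of the study. The authors estimate the exponent from an observed tag-count series, which requires a different, data-driven procedure; the function name, parameter values and starting point here are assumptions for demonstration only. A positive exponent signals chaos (nearby trajectories diverge), and a negative one signals convergence to a stable state.

```python
import math

def logistic_lyapunov(r, x0=0.3, burn_in=1000, steps=10000):
    """Estimate the Lyapunov exponent of the logistic map x -> r*x*(1-x).

    The exponent is the long-run average of log|f'(x)| along the orbit,
    where f'(x) = r*(1 - 2x).
    """
    x = x0
    # Discard the transient so the average reflects long-run behaviour.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(steps):
        slope = abs(r * (1.0 - 2.0 * x))
        # Guard against log(0) if the orbit lands exactly on x = 0.5.
        total += math.log(max(slope, 1e-300))
        x = r * x * (1.0 - x)
    return total / steps
```

With `r = 2.5` the orbit settles on the fixed point x = 0.6, where the slope has magnitude 0.5, so the estimate approaches ln 0.5 (negative). With `r = 3.9` the map is in its chaotic regime and the estimate is positive.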

Findings

There are strange attractors in the system: the whole system is complex but bounded, and its evolution is bound to approach a relatively stable range. Empirical analyses indicate that chaos exists in the process of knowledge sharing in this social labeling system, and the period of change over time is about one week.

Originality/value

This study contributes to revealing the evolutionary cycle of knowledge stock in online knowledge systems and further indicates how this dynamic evolution can help in the setting of platform mechanics and resource inputs.

Details

Aslib Journal of Information Management, vol. 76 no. 1

Type: Research Article

ISSN: 2050-3806

Article

Publication date: 9 February 2018

A survey on mining stack overflow: question and answering (Q&A) community

Arshad Ahmad, Chong Feng, Shi Ge and Abdallah Yousif

Abstract

Purpose

Software developers extensively use stack overflow (SO) for knowledge sharing on software development. Thus, software engineering researchers have started mining the structured/unstructured data present in certain software repositories, including the Q&A software developer community SO, with the aim of improving software development. The purpose of this paper is to show how academics/practitioners can benefit from the valuable user-generated content shared on various online social networks, specifically from the Q&A community SO, for software development.

Design/methodology/approach

A comprehensive literature review was conducted and 166 research papers on SO were categorized about software development from the inception of SO till June 2016.

Findings

Most of the studies revolve around a limited number of software development tasks; approximately 70 percent of the papers used data from millions of posts, applied basic machine learning methods and conducted semi-automatic, quantitative investigations. Thus, future research should focus on overcoming the identified challenges and gaps.

Practical implications

The work on SO is classified into two main categories; “SO design and usage” and “SO content applications.” These categories not only give insights to Q&A forum providers about the shortcomings in design and usage of such forums but also provide ways to overcome them in future. It also enables software developers to exploit such forums for the identified under-utilized tasks of software development.

Originality/value

The study is the first of its kind to explore the work on SO about software development and makes an original contribution by presenting a comprehensive review, design/usage shortcomings of Q&A sites, and future research challenges.

Details

Data Technologies and Applications, vol. 52 no. 2

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 9 November 2010

RFID‐Env: methods and software simulation for RFID environments

Marcelo Cunha de Azambuja, Carlos Fernando Jung, Carla Schwengber ten Caten and Fabiano Passuelo Hessel

Abstract

Purpose

The purpose of this paper is to present the results of an analytical and experimental research for the development of an innovative product designated RFID environment (RFID‐Env). This software is designed for the use of professionals in computer systems and plant engineering who are engaged in research and development (R&D) of ultra high frequency (UHF) passive radio frequency identification (RFID) systems as applied to the management and operation of logistic supply chains.

Design/methodology/approach

The RFID‐Env makes it possible to simulate on computer screens a complete RFID‐Env by processing user data on the technical and physical characteristics of real or virtual RFID‐Envs. Information outputted can include descriptions of the performance to be expected from a given configuration and detailed reports as to whether that particular configuration will succeed in reading all the RFID tags flowing through a defined system.

Findings

The paper shows the models and methods on how these simulations can be performed, and this is the major scientific contribution of this work, i.e. what are the logical and physical models that enable the development of software simulators for RFID‐Envs.

Research limitations/implications

This work will be continued to take more of the physical environment into account, such as the interference produced by the tagged products themselves scattering the radio frequency (RF) signals, and the models, positioning and focusing of the antennas. New RF prediction models will be created as this work continues, with the aim of increasing the number of environments that can be simulated.

Practical implications

The product is intended for use by developers in computer sciences, and by engineers doing R&D for the solution of RFID problems, and makes it possible to simulate a complete range of virtual RFID‐Envs so that R&D can proceed in a non‐factory atmosphere.

Originality/value

Only a few related papers consider, in isolation, some of the problems addressed here; no models were found that integrate all the processing for an RFID‐Env simulation as presented here.

Details

Business Process Management Journal, vol. 16 no. 6

Type: Research Article

ISSN: 1463-7154

Article

Publication date: 8 March 2021

Temporal evolution of tagging subnetwork features and motif under different activity levels – take the Q&A community Zhihu as an example

Xin Feng, Liangxuan Li, Jiapei Li, Meiru Cui, Liming Sun and Ye Wu

Abstract

Purpose

This paper aims to study the characteristics and evolution rules of tagging knowledge network for users with different activity levels in question-and-answer (Q&A) community represented by Zhihu.

Design/methodology/approach

A random sample of issue tag data generated by topics in the Zhihu network environment is selected. By defining user quality and selecting the top 20% and bottom 20% of users to focus on, i.e. top users and bottom users, the authors apply time slicing to both types of data to construct label knowledge networks, use Q-Q plots and ARIMA models to analyze network indicators and introduce the theory and methods of network motifs.

Findings

This study shows that when the power index of degree distribution is less than or equal to 3.1, the ARIMA model with rank index of label network has a higher fitting degree. With the development of the community, the correlation between tags in the tagging knowledge network is very weak.

Research limitations/implications

It is not comprehensive and sufficient to classify users only according to their activity levels. And traditional statistical analysis is not applicable to large data sets. In the follow-up work, the authors will further explore the characteristics of the network at a larger scale and longer timescale and consider adding more node features, including some edge features. Then, users are statistically classified according to the attributes of nodes and edges to construct complex networks, and algorithms such as machine learning and deep learning are used to calculate large-scale data sets to deeply study the evolution of knowledge networks.

Practical implications

This paper uses real data from the Zhihu community, divides users according to their activity and combines statistical testing, time series analysis and network motifs to trace the evolution of the Q&A community's knowledge network over time. These research methods offer new ideas for other network problems. Research has found that user activity has a certain impact on the evolution of the tagging network. The tagging network followed by users with high activity level tends to be stable, and the tagging network followed by users with low activity level gradually fluctuates.

Social implications

For the community, understanding the formation mechanism of its network structure and key nodes in the network is conducive to improving the knowledge system of the content, finding user behavior preferences and improving user experience. Future research work will focus on identifying outbreak points from a large number of topics, predicting topical trends and conducting timely public opinion guidance and control.

Originality/value

In terms of data selection, the user quality is defined; the Zhihu tags are divided into two categories for time slicing; and network indicators and network motifs are compared and analyzed. In addition, statistical tests, time series analysis and network modality theory are used to analyze the tags.

Details

Information Discovery and Delivery, vol. 49 no. 2

Type: Research Article

ISSN: 2398-6247

Article

Publication date: 18 October 2011

Facets of user‐assigned tags and their effectiveness in image retrieval

Nicola Ransom and Pauline Rafferty

Abstract

Purpose

This study aims to consider the value of user‐assigned image tags by comparing the facets that are represented in image tags with those that are present in image queries to see if there is a similarity in the way that users describe and search for images.

Design/methodology/approach

A sample dataset was created by downloading a selection of images and associated tags from Flickr, the online photo‐sharing web site. The tags were categorised using image facets from Shatford's matrix, which has been widely used in previous research into image indexing and retrieval. The facets present in the image tags were then compared with the results of previous research into image queries.

Findings

The results reveal that there are broad similarities between the facets present in image tags and queries, with people and objects being the most common facet, followed by location. However, the results also show that there are differences in the level of specificity between tags and queries, with image tags containing more generic terms and image queries consisting of more specific terms. The study concludes that users do describe and search for images using similar image facets, but that measures to close the gap between specific queries and generic tags would improve the value of user tags in indexing image collections.

Originality/value

Research into tagging has tended to focus on textual resources with less research into non‐textual documents. In particular, little research has been undertaken into how user tags compare to the terms used in search queries, particularly in the context of digital images.

Details

Journal of Documentation, vol. 67 no. 6

Type: Research Article

ISSN: 0022-0418

Article

Publication date: 7 August 2017

Predicting users’ demographic characteristics in a Chinese social media network

Qiangbing Wang, Shutian Ma and Chengzhi Zhang

Abstract

Purpose

Based on user-generated content from a Chinese social media platform, this paper aims to investigate multiple methods of constructing user profiles and their effectiveness in predicting their gender, age and geographic location.

Design/methodology/approach

This investigation collected 331,634 posts from 4,440 users of Sina Weibo. The data were divided into two parts, for training and testing. First, a vector space model and topic models were applied to construct user profiles. A classification model was then learned by a support vector machine from the training data set. Finally, the authors used the classification model to predict users’ gender, age and geographic location in the testing data set.

Findings

The results revealed that in constructing user profiles, latent semantic analysis performed better on the task of predicting gender and age. By contrast, the method based on a traditional vector space model worked better in making predictions regarding the geographic location. In the process of applying a topic model to construct user profiles, the authors found that different prediction tasks should use different numbers of topics.

Originality/value

This study explores different user profile construction methods to predict Chinese social media network users’ gender, age and geographic location. The results of this paper will help to improve the quality of personal information gathered from social media platforms, and thereby improve personalized recommendation systems and personalized marketing.

Details

The Electronic Library, vol. 35 no. 4

Type: Research Article

ISSN: 0264-0473

Open Access

Article

Publication date: 17 November 2023

Blockchain-based digital twin data provenance for predictive asset management in building facilities

Peiman Tavakoli, Ibrahim Yitmen, Habib Sadri and Afshin Taheri

Abstract

Purpose

The purpose of this study is to focus on structured data provision and asset information model maintenance and develop a data provenance model on a blockchain-based digital twin smart and sustainable built environment (DT) for predictive asset management (PAM) in building facilities.

Design/methodology/approach

Qualitative research data were collected through a comprehensive scoping review of secondary sources. Additionally, primary data were gathered through interviews with industry specialists. The analysis of the data served as the basis for developing blockchain-based DT data provenance models and scenarios. A case study involving a conference room in an office building in Stockholm was conducted to assess the proposed data provenance model. The implementation utilized the Remix Ethereum platform and Sepolia testnet.

Findings

Based on the analysis of results, a data provenance model on blockchain-based DT which ensures the reliability and trustworthiness of data used in PAM processes was developed. This was achieved by providing a transparent and immutable record of data origin, ownership and lineage.
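The idea of a transparent, tamper-evident record of data origin and lineage can be sketched with a simple hash chain. This is only an illustration in plain Python, not the authors' Ethereum implementation: the record fields (`source`, `payload`), the helper names and the sensor values are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record

def record_hash(body):
    """SHA-256 digest of a record body, serialized with sorted keys."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def append_record(chain, source, payload):
    """Append a provenance record that commits to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {
        "index": len(chain),
        "source": source,      # hypothetical: who or what produced the data
        "payload": payload,    # hypothetical: the data item itself
        "prev_hash": prev_hash,
    }
    chain.append({**body, "hash": record_hash(body)})
    return chain

def verify(chain):
    """Return True only if every record matches its hash and its predecessor."""
    prev_hash = GENESIS_HASH
    for record in chain:
        body = {key: value for key, value in record.items() if key != "hash"}
        if record["prev_hash"] != prev_hash or record["hash"] != record_hash(body):
            return False
        prev_hash = record["hash"]
    return True
```

Because each record stores the hash of its predecessor, changing any earlier payload invalidates that record's hash and breaks the chain, which is what makes the lineage auditable. A blockchain adds distributed consensus on top of this structure.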

Practical implications

The proposed model enables decentralized applications (DApps) to publish real-time data obtained from dynamic operations and maintenance processes, enhancing the reliability and effectiveness of data for PAM.

Originality/value

The research presents a data provenance model on a blockchain-based DT, specifically tailored to PAM in building facilities. The proposed model enhances decision-making processes related to PAM by ensuring data reliability and trustworthiness and providing valuable insights for specialists and stakeholders interested in the application of blockchain technology in asset management and data provenance.

Details

Smart and Sustainable Built Environment, vol. 13 no. 1

Type: Research Article

ISSN: 2046-6099
