Search results

1–10 of over 20,000
Open Access
Article
Publication date: 15 February 2022

Martin Nečaský, Petr Škoda, David Bernhauer, Jakub Klímek and Tomáš Skopal

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking…

Abstract

Purpose

Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth.

Design/methodology/approach

In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery.
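
As a rough illustration of the kind of pipeline such a framework composes, the sketch below wires one hypothetical representation component (TF-IDF over textual metadata) to a cosine-similarity discovery step. The class and function names are illustrative assumptions and are not taken from the authors' GitHub implementation.

```python
# Minimal sketch of a component-based dataset-discovery pipeline (hypothetical API,
# not the authors' actual framework). A representation component maps dataset metadata
# to vectors; a similarity step ranks catalogued datasets against a free-text query.
from dataclasses import dataclass
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


@dataclass
class Dataset:
    identifier: str
    title: str
    description: str


def tfidf_representation(datasets: List[Dataset]):
    """One interchangeable 'representation' component: TF-IDF over textual metadata."""
    corpus = [f"{d.title} {d.description}" for d in datasets]
    vectorizer = TfidfVectorizer()
    return vectorizer, vectorizer.fit_transform(corpus)


def discover(query: str, datasets: List[Dataset], top_k: int = 5) -> List[str]:
    """Similarity-based discovery: rank catalogued datasets against a query string."""
    vectorizer, matrix = tfidf_representation(datasets)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, matrix).ravel()
    ranked = sorted(zip(scores, datasets), key=lambda pair: pair[0], reverse=True)
    return [d.identifier for _, d in ranked[:top_k]]
```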

Findings

The study presents several proof-of-concept pipelines, including an experimental evaluation, which showcase the usage of the framework.

Originality/value

To the best of the authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 15 March 2011

C.Y. Lam and W.H. Ip

Scheduling needs to be concise and well‐determined but able to respond to the ever‐changing and uncertain market or environment against the constraints of production capacity…

Abstract

Purpose

Scheduling needs to be concise and well‐determined yet able to respond to an ever‐changing and uncertain market or environment within the constraints of production capacity, resources, time frame, etc. The purpose of this paper is to model and solve a scheduling problem from a different domain perspective that adopts the concept of the agent: an agent‐based scheduling environment is proposed for solving the scheduling problem, in which three agents are developed, i.e. a sales agent, a scheduling agent and a production agent.

Design/methodology/approach

The modeling and development of the proposed agent‐based scheduling environment and its agents under constraints are discussed. Constraint priority scheduling concepts are applied to the environment and its agents, and the feature of responding to customer change orders is included in the model. The proposed agent‐based scheduling environment with three agents is applied to a lamp‐manufacturing company in China as a case study, and the integrated agent‐based approach is also illustrated in the case study.
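
For readers unfamiliar with the agent-based framing, the toy sketch below shows one way three cooperating agents could be wired together in code. The classes and the greedy constraint-priority rule are illustrative assumptions, not the paper's actual model.

```python
# Toy sketch of a three-agent scheduling setup (generic agent pattern, not the paper's
# model): a sales agent submits orders and change orders, a scheduling agent builds a
# constraint-prioritised schedule, and a production agent exposes daily capacity.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    order_id: str
    quantity: int
    due_day: int
    priority: int = 1          # higher = more constrained / more urgent


@dataclass
class ProductionAgent:
    daily_capacity: int        # assumed positive


@dataclass
class SchedulingAgent:
    production: ProductionAgent
    schedule: dict = field(default_factory=dict)   # day -> list of (order_id, qty)

    def build_schedule(self, orders: List[Order]) -> dict:
        # schedule the most constrained (highest priority, earliest due) orders first
        self.schedule.clear()
        for order in sorted(orders, key=lambda o: (-o.priority, o.due_day)):
            remaining, day = order.quantity, 1
            while remaining > 0:
                used = sum(qty for _, qty in self.schedule.get(day, []))
                free = self.production.daily_capacity - used
                if free > 0:
                    lot = min(free, remaining)
                    self.schedule.setdefault(day, []).append((order.order_id, lot))
                    remaining -= lot
                day += 1
        return self.schedule


@dataclass
class SalesAgent:
    scheduler: SchedulingAgent
    orders: List[Order] = field(default_factory=list)

    def place_order(self, order: Order) -> dict:
        self.orders.append(order)
        return self.scheduler.build_schedule(self.orders)

    def change_order(self, order_id: str, new_quantity: int) -> dict:
        # customer change order: adjust the quantity and re-schedule everything
        for order in self.orders:
            if order.order_id == order_id:
                order.quantity = new_quantity
        return self.scheduler.build_schedule(self.orders)
```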

Findings

Through the autonomous communication between agents in the proposed model, a constraint‐prioritized schedule is generated that fulfills customer orders and customer change orders and achieves better scheduling performance. The simulation results and analysis in the case study confirm that the proposed model can generate a constraint‐prioritized schedule for the studied company that completely fulfills customer orders, adjusts to and fulfills customer change orders, and achieves a better scheduling result.

Originality/value

In this paper, the scheduling problem is modeled and solved from the domain perspective of an agent‐based approach. By using an agent‐based approach, agents can be implemented to represent manufacturing resources or aggregations of resources. Under the proposed modeling approach, collaboration across the entire set of scheduling activities can be enhanced, and the efficiency and effectiveness of the scheduling activities can also be increased.

Details

Industrial Management & Data Systems, vol. 111 no. 2
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 20 June 2022

Lokesh Singh, Rekh Ram Janghel and Satya Prakash Sahu

Automated skin lesion analysis plays a vital role in early detection. Having relatively small-sized imbalanced skin lesion datasets impedes learning and dominates research in…

Abstract

Purpose

Automated skin lesion analysis plays a vital role in early detection. The relatively small, imbalanced skin lesion datasets available impede learning and dominate research in automated skin lesion analysis. The lack of adequate data, combined with the skewed class distribution, makes it difficult to develop classification methods.

Design/methodology/approach

Boosting-based transfer learning (TL) paradigms like the Transfer AdaBoost algorithm can compensate for such a lack of samples by taking advantage of auxiliary data. However, in such methods, beneficial source instances representing the target exhibit fast, stochastic weight convergence, which results in “weight-drift” that negates transfer. In this paper, a framework is designed around “Rare-Transfer” (RT), a boosting-based TL algorithm that prevents “weight-drift” and simultaneously addresses absolute rarity in skin lesion datasets. RT prevents the weights of source samples from converging too quickly, and it addresses absolute rarity through an instance-transfer approach that incorporates the best-fit set of auxiliary examples, thereby compensating for class imbalance and the scarcity of training samples at the same time and inducing balanced error optimization.
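
To make the boosting-based instance-transfer idea concrete, the following is a simplified TrAdaBoost-style loop in the spirit of the description above. It is not the exact Rare-Transfer algorithm: the weight-update constants are the standard TrAdaBoost choices, and the paper's dedicated weight-drift correction is only noted in a comment.

```python
# Simplified TrAdaBoost-style instance-transfer sketch (not the paper's Rare-Transfer
# formulation). Misclassified source instances are down-weighted, misclassified target
# instances are up-weighted; Rare-Transfer additionally corrects the source weights so
# that their total mass does not collapse ("weight-drift").
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def boosted_transfer(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    n_src, n_tgt = len(X_src), len(X_tgt)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(n_src + n_tgt) / (n_src + n_tgt)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        clf = DecisionTreeClassifier(max_depth=1)
        clf.fit(X, y, sample_weight=w)
        err_mask = (clf.predict(X) != y).astype(float)
        # weighted error measured on the target portion only
        eps = np.sum(w[n_src:] * err_mask[n_src:]) / np.sum(w[n_src:])
        eps = min(max(eps, 1e-10), 0.49)
        beta_tgt = eps / (1.0 - eps)
        w[:n_src] *= beta_src ** err_mask[:n_src]        # shrink bad source weights
        w[n_src:] *= beta_tgt ** (-err_mask[n_src:])     # boost bad target weights
        # normalise; the paper additionally applies a correction factor to the source
        # weights here to prevent them from converging too quickly
        w /= w.sum()
        learners.append(clf)
        betas.append(beta_tgt)
    return learners, betas
```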

Findings

Promising results are obtained using RT compared with state-of-the-art techniques on absolute-rare skin lesion datasets, with an accuracy of 92.5%. A Wilcoxon signed-rank test examines the significant differences between the proposed RT algorithm and the conventional algorithms used in the experiment.

Originality/value

Experimentation is performed on four absolute-rare skin lesion datasets, and the effectiveness of RT is assessed based on accuracy, sensitivity, specificity and area under the curve. The performance is compared with existing ensemble and boosting-based TL methods.

Details

Data Technologies and Applications, vol. 57 no. 1
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 28 February 2022

Yi-Cheng Chen

Recently, more and more attention has been paid to the application of deep learning, due to the widespread practicability of neural network computation. The purpose of this…

Abstract

Purpose

Recently, more and more attention has been paid to the application of deep learning, due to the widespread practicability of neural network computation. The purpose of this paper is to develop an effective algorithm that automatically discovers the optimal neural network architecture for several real applications.

Design/methodology/approach

The author proposes a novel algorithm, namely, progressive genetic-based neural architecture search (PG-NAS), as a solution to efficiently find the optimal neural network structure for given data. PG-NAS also employs several operations to effectively shrink the search space, which reduces the computation cost and improves accuracy validation.
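
The following sketch illustrates the general shape of a genetic architecture search driven by a surrogate accuracy predictor. The genome encoding, operators and the predict_accuracy callable are illustrative assumptions, not the paper's PG-NAS specification.

```python
# Conceptual sketch of genetic NAS with a surrogate predictor (hypothetical encoding,
# not the paper's exact operators). Each candidate is a fixed-length vector of layer
# choices; a predictor estimates accuracy so only promising candidates survive.
import random
from typing import Callable, List, Sequence

LayerChoices = Sequence[Sequence[int]]  # allowed option ids per layer position


def mutate(genome: List[int], choices: LayerChoices, rate: float = 0.2) -> List[int]:
    return [random.choice(opts) if random.random() < rate else gene
            for gene, opts in zip(genome, choices)]


def crossover(a: List[int], b: List[int]) -> List[int]:
    point = random.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def genetic_nas(choices: LayerChoices,
                predict_accuracy: Callable[[List[int]], float],
                population_size: int = 20,
                generations: int = 10) -> List[int]:
    population = [[random.choice(opts) for opts in choices]
                  for _ in range(population_size)]
    for _ in range(generations):
        # selector step: keep the candidates the predictor scores highly,
        # pruning non-promising structures without fully training them
        scored = sorted(population, key=predict_accuracy, reverse=True)
        parents = scored[: population_size // 2]
        children = [mutate(crossover(random.choice(parents), random.choice(parents)),
                           choices)
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=predict_accuracy)
```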

Findings

The proposed PG-NAS can be utilized on several tasks for discovering the optimal network structure. The author reduces the need for manual settings when implementing artificial intelligence (AI) models; hence, PG-NAS requires less human intervention than traditional machine learning. The average and top-1 metrics, such as error, loss and accuracy, are used to compare the neural architectures discovered by the proposed model against all baselines. The experimental results show that, on several real datasets, the proposed PG-NAS model consistently outperforms the state-of-the-art models in all metrics.

Originality/value

Generally, the size and complexity of the search space for the neural network dominate the computation time and resources required. In this study, PG-NAS utilizes genetic operations to generate a compact candidate set, i.e. fewer combinations need to be generated when constructing the candidate set. Moreover, through the proposed selector in PG-NAS, non-promising network structures can be significantly pruned. In addition, the accuracy derivation of each combination in the candidate set is also a performance bottleneck, so the author develops a predictor network to efficiently estimate the accuracy and avoid the time-consuming derivation. The learning of the prediction process is also adjusted dynamically; this adaptive learning of the predictor can capture the patterns in the training data effectively and efficiently. Furthermore, the proposed PG-NAS algorithm is applied to several real datasets to show its practicability and scalability.

Details

Industrial Management & Data Systems, vol. 122 no. 3
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 29 December 2023

B. Vasavi, P. Dileep and Ulligaddala Srinivasarao

Aspect-based sentiment analysis (ASA) is a task of sentiment analysis that requires predicting aspect sentiment polarity for a given sentence. Many traditional techniques use…

Abstract

Purpose

Aspect-based sentiment analysis (ASA) is a task of sentiment analysis that requires predicting the aspect sentiment polarity for a given sentence. Many traditional techniques use graph-based mechanisms, which reduce prediction accuracy and introduce large amounts of noise. The other problem with graph-based mechanisms is that, for some context words, the expressed sentiment changes depending on the aspect, so their polarity cannot be determined in isolation. ASA is challenging because a given sentence can reveal complicated feelings about multiple aspects.

Design/methodology/approach

This research proposes an optimized attention-based deep learning (DL) model known as optimized aspect and self-attention aware long short-term memory for target-based semantic analysis (OAS-LSTM-TSA). The proposed model goes through three phases: preprocessing, aspect extraction and classification. Aspect extraction is done using a double-layered convolutional neural network (DL-CNN). The optimized aspect and self-attention embedded LSTM (OAS-LSTM) is used to classify aspect sentiment into three classes: positive, neutral and negative.
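
A generic self-attention-over-LSTM classifier of the kind described above might look like the PyTorch sketch below. It does not reproduce the paper's exact OAS-LSTM layers, the DL-CNN aspect extractor or the pelican-optimised training; layer sizes are placeholders.

```python
# Minimal sketch of an attention-aware LSTM sentiment classifier (generic design,
# not the paper's OAS-LSTM). Outputs three classes: positive, neutral, negative.
import torch
import torch.nn as nn


class AttentionLSTMClassifier(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 128, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden_dim, 1)   # self-attention score per token
        self.classifier = nn.Linear(2 * hidden_dim, 3)  # positive / neutral / negative

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)            # (batch, seq, embed)
        hidden, _ = self.lstm(embedded)                 # (batch, seq, 2 * hidden)
        weights = torch.softmax(self.attention(hidden), dim=1)
        context = (weights * hidden).sum(dim=1)         # attention-pooled sentence vector
        return self.classifier(context)                 # (batch, 3) class logits
```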

Findings

The optimized aspect and self-attention embedded LSTM (OAS-LSTM) model is used to detect and classify the sentiment polarity of each aspect. The results of the proposed method reveal that it achieves a high accuracy of 95.3 per cent on the restaurant dataset and 96.7 per cent on the laptop dataset.

Originality/value

The novelty of the research work lies in the addition of two effective attention layers to the network model and in the reduction of the loss function and enhancement of accuracy using a recent, efficient optimization algorithm. The loss function in OAS-LSTM is minimized using the adaptive pelican optimization algorithm, thus increasing the accuracy rate. The performance of the proposed method is validated on four real-time datasets, Rest14, Lap14, Rest15 and Rest16, across various performance metrics.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 1 January 2008

Peng Liu, Elia El‐Darzi, Lei Lei, Christos Vasilakis, Panagiotis Chountas and Wei Huang

Purpose – Data preparation plays an important role in data mining, as most real-life data sets contain missing data. This paper aims to investigate different treatment methods…

Abstract

Purpose

Data preparation plays an important role in data mining, as most real-life data sets contain missing data. This paper aims to investigate different treatment methods for missing data.

Design/methodology/approach

This paper introduces, analyses and compares well‐established treatment methods for missing data and proposes new methods based on the naïve Bayesian classifier. These methods have been implemented and compared using a real-life geriatric hospital dataset.

Findings

In the case where a large proportion of the data is missing and many attributes have missing data, treatment methods based on the naïve Bayesian classifier perform very well.

Originality/value

This paper proposes an effective missing data treatment method and offers a viable approach to predicting inpatient length of stay from a data set with many missing values.
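
A minimal sketch of a naïve-Bayes-style imputation step in the spirit of the methods described above (generic code, not the paper's exact procedure): a classifier is trained on the rows where a categorical attribute is observed and then predicts that attribute for the rows where it is missing. The function name and signature are illustrative.

```python
# Naive-Bayes-based missing value imputation for one categorical column (illustrative,
# not the paper's method). Predictor categories are ordinal-encoded; rows with the
# target observed train the model, rows with it missing receive the predicted value.
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder


def impute_with_naive_bayes(df: pd.DataFrame, target: str, predictors: list) -> pd.DataFrame:
    """Fill missing values of `target` using a naïve Bayesian classifier over `predictors`."""
    df = df.copy()
    known = df[target].notna().to_numpy()
    if known.all():
        return df  # nothing to impute
    # encode predictor categories as integers; missing predictor values become a "nan" category
    encoder = OrdinalEncoder()
    encoded = encoder.fit_transform(df[predictors].astype(str))
    # min_categories keeps prediction valid even for categories absent from the training rows
    model = CategoricalNB(min_categories=[len(cats) for cats in encoder.categories_])
    model.fit(encoded[known], df.loc[known, target].astype(str))
    df.loc[~known, target] = model.predict(encoded[~known])
    return df
```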

Details

Journal of Enterprise Information Management, vol. 21 no. 1
Type: Research Article
ISSN: 1741-0398

Article
Publication date: 24 March 2022

Shu-Ying Lin, Duen-Ren Liu and Hsien-Pin Huang

Financial price forecast issues are always a concern of investors. However, the financial applications based on machine learning methods mainly focus on stock market predictions…

Abstract

Purpose

Financial price forecasting is always a concern of investors. However, financial applications based on machine learning methods mainly focus on stock market predictions. Few studies have explored credit risk prediction. Understanding credit risk trends can help investors avoid market risks. The purpose of this study is to investigate a prediction model that can effectively predict credit default swaps (CDS).

Design/methodology/approach

A novel generative adversarial network (GAN) for CDS prediction is proposed. The authors take three features into account that are highly relevant to the future trends of CDS: historical CDS price, news and financial leverage. The main goal of this model is to improve the existing GAN-based regression model by adding finance and news feature extraction approaches. The proposed model adopts an attentional long short-term memory network and convolution network to process historical CDS data and news information, respectively. In addition to enhancing the effectiveness of the GAN model, the authors also design a data sampling strategy to alleviate the overfitting issue.
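
To give a sense of the overall shape of such a model, here is a compact PyTorch sketch of a regression-GAN for sequence forecasting. The layer sizes are placeholders, and the news and leverage feature extraction is abstracted into a single aux_features vector, so this is not the paper's exact generator/discriminator design.

```python
# Conceptual regression-GAN sketch for forecasting the next value of a price series
# (generic design, not the paper's architecture): an LSTM generator predicts the next
# CDS value from a price window plus auxiliary features; a discriminator judges whether
# a (window, next value) pair looks real.
import torch
import torch.nn as nn


class Generator(nn.Module):
    def __init__(self, aux_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden + aux_dim, 1)

    def forward(self, price_window: torch.Tensor, aux_features: torch.Tensor) -> torch.Tensor:
        _, (h, _) = self.lstm(price_window)              # price_window: (batch, seq, 1)
        return self.head(torch.cat([h[-1], aux_features], dim=1))  # predicted next value


class Discriminator(nn.Module):
    def __init__(self, seq_len: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(seq_len + 1, hidden), nn.LeakyReLU(0.2),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, price_window: torch.Tensor, next_value: torch.Tensor) -> torch.Tensor:
        flat = torch.cat([price_window.squeeze(-1), next_value], dim=1)
        return self.net(flat)                            # probability the pair is real
```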

Findings

The authors conduct an experiment with a real dataset and evaluate the performance of the proposed model. The components and selected features of the model are evaluated for their ability to improve the prediction performance. The experimental results show that the proposed model performs better than other machine learning algorithms and traditional regression GAN.

Originality/value

There are very few studies on prediction models for CDS. With the proposed novel approach, the authors can improve the performance of CDS predictions. The proposed work can thereby increase the commercial value of CDS predictions to support trading decisions.

Details

Data Technologies and Applications, vol. 56 no. 5
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 3 March 2022

Ceilyn Boyd

A definition of data called data as assemblage is presented. The definition accommodates different forms and meanings of data; emphasizes data subjects and data workers; and…

Abstract

Purpose

A definition of data called data as assemblage is presented. The definition accommodates different forms and meanings of data; emphasizes data subjects and data workers; and reflects the sociotechnical aspects of data throughout its lifecycle of creation and use. A scalable assemblage model describing the anatomy and behavior of data, datasets and data infrastructures is also introduced.

Design/methodology/approach

Data as assemblage is compared to common meanings of data. The assemblage model's elements and relationships also are defined, mapped to the anatomy of a US Census dataset and used to describe the structure of research data repositories.

Findings

Replacing common data definitions with data as assemblage enriches information science and research data management (RDM) frameworks. Also, the assemblage model is shown to describe datasets and data infrastructures despite their differences in scale, composition and outward appearance.

Originality/value

Data as assemblage contributes a definition of data as mutable, portable, sociotechnical arrangements of material and symbolic components that serve as evidence. The definition is useful in information science and research data management contexts. The assemblage model contributes a scale-independent way to describe the structure and behavior of data, datasets and data infrastructures and supports analyses and comparisons involving them.

Details

Journal of Documentation, vol. 78 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 30 April 2021

Henry Lau, Yung Po Tsang, Dilupa Nakandala and Carman K.M. Lee

In the cold supply chain (SC), effective risk management is regarded as an essential component to address the risky and uncertain SC environment in handling time- and…

Abstract

Purpose

In the cold supply chain (SC), effective risk management is regarded as an essential component to address the risky and uncertain SC environment in handling time- and temperature-sensitive products. However, existing multi-criteria decision-making (MCDM) approaches greatly rely on expert opinions for pairwise comparisons. Despite the fact that machine learning models can be customised to conduct pairwise comparisons, it is difficult for small and medium enterprises (SMEs) to intelligently measure the ratings between risk criteria without sufficiently large datasets. Therefore, this paper aims at developing an enterprise-wide solution to identify and assess cold chain risks.

Design/methodology/approach

A novel federated learning (FL)-enabled multi-criteria risk evaluation system (FMRES) is proposed, which integrates FL and the best–worst method (BWM) to measure firm-level cold chain risks under the suggested risk hierarchical structure. The factors of technologies and equipment, operations, external environment, and personnel and organisation are considered. Furthermore, a case analysis of an e-grocery SC in Australia is conducted to examine the feasibility of the proposed approach.
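
As background for the MCDM component, the sketch below solves the linear variant of the best–worst method (BWM) as a small linear program. The federated aggregation of expert comparisons described in the paper is not shown, and the function signature is illustrative.

```python
# Linear best-worst method (BWM) weight solver (standard MCDM formulation; the paper's
# federated-learning layer is not modelled here). best_to_others[j] is the preference of
# the best criterion over criterion j; others_to_worst[j] is the preference of criterion j
# over the worst criterion.
import numpy as np
from scipy.optimize import linprog


def bwm_weights(best_to_others, others_to_worst, best_idx, worst_idx):
    """Return criteria weights and the consistency indicator xi."""
    n = len(best_to_others)
    c = np.zeros(n + 1)
    c[-1] = 1.0                                    # minimise xi (last variable)
    A_ub, b_ub = [], []

    def add_abs_constraint(expr):
        # |expr . w| <= xi  ->  expr.w - xi <= 0  and  -expr.w - xi <= 0
        A_ub.append(np.append(expr, -1.0)); b_ub.append(0.0)
        A_ub.append(np.append(-expr, -1.0)); b_ub.append(0.0)

    for j in range(n):
        expr = np.zeros(n)
        expr[best_idx] += 1.0
        expr[j] -= best_to_others[j]               # |w_B - a_Bj * w_j| <= xi
        add_abs_constraint(expr)
        expr = np.zeros(n)
        expr[j] += 1.0
        expr[worst_idx] -= others_to_worst[j]      # |w_j - a_jW * w_W| <= xi
        add_abs_constraint(expr)

    A_eq = [np.append(np.ones(n), 0.0)]            # weights sum to one
    b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return res.x[:n], res.x[-1]


# Example: four criteria, criterion 0 is best and criterion 3 is worst.
# weights, xi = bwm_weights([1, 2, 4, 8], [8, 4, 2, 1], best_idx=0, worst_idx=3)
```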

Findings

Throughout this study, it is found that embedding the FL mechanism into the MCDM process is effective in acquiring knowledge of pairwise comparisons from experts. A trusted federation in a cold chain network is therefore formulated to identify and assess cold SC risks in a systematic manner.

Originality/value

A novel hybridisation between horizontal FL and MCDM process is explored, which enhances the autonomy of the MCDM approaches to evaluate cold chain risks under the structured hierarchy.

Details

Industrial Management & Data Systems, vol. 121 no. 7
Type: Research Article
ISSN: 0263-5577

Book part
Publication date: 21 February 2008

Junni L. Zhang, Donald B. Rubin and Fabrizia Mealli

In an evaluation of a job training program, the causal effects of the program on wages are often of more interest to economists than the program's effects on employment or on…

Abstract

In an evaluation of a job training program, the causal effects of the program on wages are often of more interest to economists than the program's effects on employment or on income. The reason is that the effects on wages reflect the increase in human capital due to the training program, whereas the effects on total earnings or income may be simply reflecting the increased likelihood of employment without any effect on wage rates. Estimating the effects of training programs on wages is complicated by the fact that, even in a randomized experiment, wages are truncated by nonemployment, i.e., are only observed and well-defined for individuals who are employed. We present a principal stratification approach applied to a randomized social experiment that classifies participants into four latent groups according to whether they would be employed or not under treatment and control, and argue that the average treatment effect on wages is only clearly defined for those who would be employed whether they were trained or not. We summarize large sample bounds for this average treatment effect, and propose and derive a Bayesian analysis and the associated Bayesian Markov Chain Monte Carlo computational algorithm. Moreover, we illustrate the application of new code checking tools to our Bayesian analysis to detect possible coding errors. Finally, we demonstrate our Bayesian analysis using simulated data.
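
In standard principal-stratification notation (a sketch of the setup described above, not the chapter's exact formalism), with S_i(t) indicating employment status and W_i(t) the wage under treatment assignment t, the four latent groups and the estimand that is only well-defined on the always-employed stratum can be written as:

```latex
\[
  G_i =
  \begin{cases}
    EE & S_i(1)=1,\; S_i(0)=1 \quad\text{(employed whether trained or not)}\\
    EN & S_i(1)=1,\; S_i(0)=0\\
    NE & S_i(1)=0,\; S_i(0)=1\\
    NN & S_i(1)=0,\; S_i(0)=0
  \end{cases}
  \qquad
  \tau_{EE} \;=\; \mathbb{E}\bigl[\,W_i(1)-W_i(0)\;\bigm|\;G_i=EE\,\bigr].
\]
```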

Details

Modelling and Evaluating Treatment Effects in Econometrics
Type: Book
ISBN: 978-0-7623-1380-8
