Search results

1 – 10 of 56
Article
Publication date: 1 August 2005

Nguyen Hong Quang and Wenny Rahayu

Abstract

This paper presents a systematic XML Schema design approach that captures the semantics of the problem domain at the conceptual level and represents those semantics in XML Schema at the schema level. At the conceptual level, objects, their inter-relationships and constraints are given semantics by object-oriented models. At the schema level, these conceptual semantics are represented comprehensively in the text-based notation of XML Schema using various schema components and design styles, each of which offers different quality characteristics. The two primary design styles in use are nesting and linking. The nesting design styles are developed based on the choice of schema components and their definition/declaration scopes (global vs. local), whereas the linking design styles use referencing facilities provided by XML Schema and other XML technologies such as XLink and XPointer. Motivated by an in-depth analysis of outstanding problems in existing approaches, the proposed design approach aims to improve the quality and robustness of XML documents in large-scale XML-based applications.
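
The contrast between the two styles can be shown in miniature. Below is a minimal sketch (the element names and the use of lxml for validation are illustrative assumptions, not taken from the paper): the same instance document validates against a nested schema, where the author declaration is local to book, and a linked schema, where author is declared globally and reused via ref.

```python
from lxml import etree

# Nesting style: "author" is declared locally, scoped inside "book".
NESTED_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Linking style: "author" is declared globally and referenced with "ref",
# so other element declarations can reuse the same definition.
LINKED_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author" type="xs:string"/>
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element ref="author"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

doc = etree.fromstring("<book><title>T</title><author>A</author></book>")
for xsd in (NESTED_XSD, LINKED_XSD):
    schema = etree.XMLSchema(etree.fromstring(xsd))
    print(schema.validate(doc))  # True both times: same instance, two styles
```

The instance document is identical in both cases; only the schema organization differs, which is precisely the design-style choice the paper analyzes.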

Details

International Journal of Web Information Systems, vol. 1 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 May 2019

Ahmad Ali Abin

Abstract

Purpose

Constrained clustering is an important recent development in the clustering literature. The goal of an algorithm in constrained clustering research is to improve the quality of clustering by making use of background knowledge. The purpose of this paper is to suggest a new perspective for constrained clustering: finding an effective transformation of the data into a target space, guided by background knowledge given in the form of pairwise must-link and cannot-link constraints.

Design/methodology/approach

Most existing methods in constrained clustering are limited to learning a distance metric or kernel matrix from the background knowledge when looking for a transformation of the data into the target space. Unlike previous efforts, the author presents a non-linear method for constrained clustering whose basic idea is to use a different non-linear function for each dimension of the target space.

Findings

The outcome of the paper is a novel non-linear method for constrained clustering that uses a different non-linear function for each dimension of the target space. The method is formulated and explained for the particular case of quadratic functions. To reduce the number of optimization parameters, the quadratic function is relaxed and approximated by a factorized version that is easier to solve. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.
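
A toy sketch may help fix the idea (the objective, constants and constraints below are illustrative assumptions, not the author's formulation): fit one quadratic map per dimension so that must-link pairs contract and cannot-link pairs separate, then cluster in the transformed space.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.cluster import KMeans

def transform(X, params):
    a, b = params.reshape(2, -1)        # one (a_d, b_d) pair per dimension d
    return a * X**2 + b * X             # quadratic map applied column-wise

def objective(params, X, must, cannot):
    Z = transform(X, params)
    pull = sum(np.sum((Z[i] - Z[j]) ** 2) for i, j in must)    # shrink these
    push = sum(np.sum((Z[i] - Z[j]) ** 2) for i, j in cannot)  # grow these
    return pull + 1.0 / (1.0 + push) + 1e-3 * params @ params  # keep bounded

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
must, cannot = [(0, 1), (2, 3)], [(0, 4), (1, 5)]  # toy pairwise constraints

p0 = np.concatenate([np.zeros(X.shape[1]), np.ones(X.shape[1])])  # ~identity
res = minimize(objective, p0, args=(X, must, cannot))
labels = KMeans(n_clusters=2, n_init=10).fit_predict(transform(X, res.x))
```

The key point mirrored from the abstract is that each dimension gets its own non-linear (here quadratic) function, rather than a single metric or kernel shared across dimensions.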

Originality/value

This study proposes a new direction for the problem of constrained clustering by learning a non-linear transformation of the data into a target space without using kernel functions. This work will assist researchers in developing new methods based on the proposed framework, potentially opening up new research topics.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 2
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 10 August 2021

Elham Amirizadeh and Reza Boostani

Abstract

Purpose

The aim of this study is to propose a deep neural network (DNN) method that uses side information to improve clustering results for big datasets. The authors also show that applying this information improves clustering performance and increases the speed of network training convergence.

Design/methodology/approach

In data mining, semisupervised learning is an interesting approach because good performance can be achieved with a small subset of labeled data; data labeling is expensive, and semisupervised learning does not need all labels. One type of semisupervised learning is constrained clustering, which does not use class labels for clustering. Instead, it uses information about pairs of instances (side information) that may be in the same cluster (must-link [ML]) or in different clusters (cannot-link [CL]). Constrained clustering has been studied extensively; however, few works have focused on constrained clustering for big datasets. In this paper, the authors present a constrained clustering method for big datasets that uses a DNN. The authors inject the constraints (ML and CL) into this DNN to promote clustering performance and call the result constrained deep embedded clustering (CDEC). An autoencoder is first trained to elicit informative low-dimensional features in the latent space; the encoder network is then retrained using a proposed Kullback–Leibler divergence objective function, which captures the constraints in order to cluster the projected samples. CDEC was compared with the adversarial autoencoder, constrained 1-spectral clustering and autoencoder + k-means on the well-known MNIST, Reuters-10k and USPS datasets, with performance assessed in terms of clustering accuracy. Empirical results confirmed the statistical superiority of CDEC over these counterparts in terms of clustering accuracy.
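
The general recipe can be sketched compactly. This is a hedged approximation: the layer sizes, the DEC-style Student-t soft assignments and the exact form of the constraint penalties below are assumptions for illustration, not the authors' published architecture or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, d_in, d_z, k):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, d_z))
        self.centers = nn.Parameter(torch.randn(k, d_z))  # cluster centers

    def soft_assign(self, x):              # Student-t kernel, as in DEC
        z = self.net(x)
        d2 = torch.cdist(z, self.centers) ** 2
        q = (1.0 + d2).reciprocal()
        return q / q.sum(dim=1, keepdim=True)

def target_dist(q):                        # sharpened target P derived from Q
    p = q**2 / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

def cdec_like_loss(q, must, cannot):
    p = target_dist(q).detach()
    kl = F.kl_div(q.log(), p, reduction="batchmean")          # clustering term
    ml = sum(F.mse_loss(q[i], q[j]) for i, j in must)         # same assignment
    cl = sum(F.relu(1.0 - (q[i] - q[j]).abs().sum()) for i, j in cannot)
    return kl + ml + cl

enc = Encoder(d_in=784, d_z=10, k=10)      # e.g. MNIST-sized flattened input
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
x = torch.randn(64, 784)                   # stand-in batch
loss = cdec_like_loss(enc.soft_assign(x), must=[(0, 1)], cannot=[(2, 3)])
loss.backward(); opt.step()
```

In this sketch the ML penalty pulls the soft assignments of a pair together and the CL penalty pushes them apart, which is one simple way of "injecting" side information into the clustering objective.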

Findings

First, this is the first DNN-based constrained clustering method that uses side information to improve clustering performance, without using labels, on big, high-dimensional datasets. Second, the authors define a formula for injecting side information into the DNN. Third, the proposed method improves clustering performance and network convergence speed.

Originality/value

Few works have focused on constrained clustering for big datasets, and studies of DNNs for clustering with a specific loss function that simultaneously extracts features and clusters the data are rare. The method improves the performance of big data clustering without using labels, which is important because data labeling is expensive and time-consuming, especially for big datasets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 14 no. 4
Type: Research Article
ISSN: 1756-378X

Article
Publication date: 9 May 2016

Chao-Lung Yang and Thi Phuong Quyen Nguyen

Abstract

Purpose

Class-based storage has been studied extensively and proved to be an efficient storage policy. However, little literature has addressed how to cluster stored items for class-based storage. The purpose of this paper is to develop a constrained clustering method integrated with principal component analysis (PCA) to meet the need to cluster stored items while taking practical storage constraints into consideration.

Design/methodology/approach

To consider item characteristics and the associated storage restrictions, must-link and cannot-link constraints were constructed to meet the storage requirements. The cube-per-order index (COI), which has been used for location assignment in class-based warehouses, was analyzed by PCA. The proposed constrained clustering method uses the principal component loadings as item sub-group features to identify the COI distribution of item sub-groups. The clustering results are then used to allocate storage through a heuristic assignment model based on COI.
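
A toy sketch can make the pipeline concrete (the feature matrix and constraint handling below are assumptions; in particular, PCA scores stand in for the paper's use of principal component loadings): reduce an item-by-COI-feature matrix with PCA, then run a COP-KMeans-style assignment that refuses cluster choices violating the constraints.

```python
import numpy as np
from sklearn.decomposition import PCA

def violates(i, c, labels, done, must, cannot):
    for a, b in must:                      # must-link partner in another cluster?
        j = b if a == i else a if b == i else None
        if j is not None and j in done and labels[j] != c:
            return True
    for a, b in cannot:                    # cannot-link partner in this cluster?
        j = b if a == i else a if b == i else None
        if j is not None and j in done and labels[j] == c:
            return True
    return False

def constrained_kmeans(X, k, must, cannot, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(iters):
        done = set()
        for i in range(len(X)):
            order = np.argsort(((centers - X[i]) ** 2).sum(axis=1))
            for c in order:                # nearest feasible center first
                if not violates(i, c, labels, done, must, cannot):
                    labels[i] = c
                    break
            else:
                labels[i] = order[0]       # fall back if no feasible cluster
            done.add(i)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

items = np.random.default_rng(1).random((50, 4))   # hypothetical COI features
scores = PCA(n_components=2).fit_transform(items)  # low-dimensional features
labels = constrained_kmeans(scores, k=3, must=[(0, 1)], cannot=[(2, 3)])
```

The resulting item clusters would then feed a COI-based location assignment step, which this sketch omits.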

Findings

The clustering results showed that the proposed method provides better compactness among item clusters. The simulation results also show that the new location assignment produced by the proposed method improves retrieval efficiency by 33 percent.

Practical implications

When the number of items in a warehouse is tremendously large, relying on human intervention to reveal storage constraints becomes impossible. The developed method can easily be applied to the problem regardless of the size of the data.

Originality/value

The case study demonstrated a practical location assignment problem with constraints. This paper also sheds light on developing a data clustering method that can be applied directly to practical data analysis issues.

Details

Industrial Management & Data Systems, vol. 116 no. 4
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 3 April 2009

Maria Soledad Pera and Yiu‐Kai Ng

Abstract

Purpose

Tens of thousands of news articles are posted online each day, covering topics from politics to science to current events. To better cope with this overwhelming volume of information, RSS (news) feeds are used to categorize newly posted articles. Nonetheless, most RSS users must filter through many articles within the same or different RSS feeds to locate articles pertaining to their particular interests. Due to the large number of news articles in individual RSS feeds, there is a need for further organizing articles to aid users in locating non‐redundant, informative, and related articles of interest quickly. This paper aims to address these issues.

Design/methodology/approach

The paper presents a novel approach that uses word-correlation factors in a fuzzy set information retrieval model to: filter out redundant news articles from RSS feeds; shed less-informative articles from the non-redundant ones; and cluster the remaining informative articles according to fuzzy equivalence classes on the news articles.
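
The flavor of such a pipeline can be sketched briefly. In the sketch below, TF-IDF cosine similarity stands in for the paper's word-correlation factors, and the thresholds are arbitrary assumptions: drop near-duplicates, then group the survivors by the transitive closure of a "similar enough" relation, a rough analogue of fuzzy equivalence classes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

articles = ["markets rallied sharply today", "stocks rose sharply today",
            "new vaccine trial begins", "vaccine trial enters a new phase"]
S = cosine_similarity(TfidfVectorizer().fit_transform(articles))

# 1) Filter redundancy: drop any article nearly identical to an earlier one.
keep = [i for i in range(len(articles))
        if all(S[i, j] < 0.9 for j in range(i))]

# 2) Cluster the rest by transitive closure of "similar enough" pairs
#    (union-find), standing in for fuzzy equivalence classes.
parent = {i: i for i in keep}
def find(i):
    while parent[i] != i:
        i = parent[i]
    return i
for i in keep:
    for j in keep:
        if i < j and S[i, j] > 0.3:
            parent[find(j)] = find(i)

clusters = {}
for i in keep:
    clusters.setdefault(find(i), []).append(i)
print(clusters)   # groups of related, non-redundant articles
```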

Findings

The clustering approach requires little overhead or computational costs, and experimental results have shown that it outperforms other existing, well‐known clustering approaches.

Research limitations/implications

The clustering approach as proposed in this paper applies only to RSS news articles; however, it can be extended to other application domains.

Originality/value

The developed clustering tool is highly efficient and effective in filtering and classifying RSS news articles and does not employ any labor‐intensive user‐feedback strategy. Therefore, it can be implemented in real‐world RSS feeds to aid users in locating RSS news articles of interest.

Details

International Journal of Web Information Systems, vol. 5 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 6 September 2018

Pengfei Zhao, Ji Wu, Zhongsheng Hua and Shijian Fang

Abstract

Purpose

The purpose of this paper is to identify electronic word-of-mouth (eWOM) customers from customer reviews. Thus, firms can precisely leverage eWOM customers to increase their product sales.

Design/methodology/approach

This research proposes a framework for analyzing the content of consumer-generated product reviews. Specific algorithms are used to identify potential eWOM reviewers, and an evaluation method then validates the relationship between product sales and the eWOM reviewers identified by the proposed method.
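
Only the evaluation step lends itself to a short sketch. Everything below, including the helpful-vote selection rule and the synthetic data, is a hypothetical stand-in rather than the paper's algorithm: it compares how strongly reviews from a candidate eWOM group and from the remaining reviewers correlate with a sales signal.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n = 200
helpful_share = rng.random(n)          # per-reviewer helpful-vote share (toy)
review_score = rng.normal(size=n)      # per-reviewer average rating signal
# Synthetic sales: influenced by reviews only for high-visibility reviewers.
sales = 0.8 * review_score * (helpful_share > 0.7) + rng.normal(size=n)

ewom = helpful_share > 0.7             # hypothetical eWOM selection rule
r_ewom, _ = pearsonr(review_score[ewom], sales[ewom])
r_rest, _ = pearsonr(review_score[~ewom], sales[~ewom])
print(f"eWOM r={r_ewom:.2f} vs non-eWOM r={r_rest:.2f}")
```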

Findings

The results corroborate that online product reviews made by the eWOM customers identified by the proposed method are more closely related to product sales than reviews made by non-eWOM customers, and that the predictive power of reviews generated by eWOM customers is significantly higher than that of reviews generated by non-eWOM customers.

Research limitations/implications

The proposed method was validated on a data set based on one type of product; for other products, its validity must still be tested. In addition, past eWOM customers may have no significant influence on product sales in the future, so the proposed method should be re-tested in new market environments.

Practical implications

By combining the method with previous customer segmentation methods, a new customer segmentation framework is proposed to help firms understand customer value more precisely.

Originality/value

This study is the first to identify eWOM customers from online reviews and to evaluate the relationship between reviewers and product sales.

Details

Industrial Management & Data Systems, vol. 119 no. 1
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 31 July 2023

Xinzhi Cao, Yinsai Guo, Wenbin Yang, Xiangfeng Luo and Shaorong Xie

Abstract

Purpose

Unsupervised domain adaptation for object detection not only mitigates the poor model performance caused by the domain gap but also makes it possible to apply knowledge trained on one domain to a distinct domain. However, aligning whole features may confuse object and background information, making it challenging to extract discriminative features. This paper aims to propose an improved approach, called intrinsic feature extraction domain adaptation (IFEDA), to extract discriminative features effectively.

Design/methodology/approach

IFEDA consists of an intrinsic feature extraction (IFE) module and an object consistency constraint (OCC). The IFE module, designed at the instance level, mainly addresses the difficulty of extracting discriminative object features; specifically, more attention can be paid to the discriminative regions of objects. Meanwhile, the OCC is deployed to ensure that category predictions in the target domain correspond to those in the source domain.
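
One plausible form of such a consistency constraint is easy to write down (this is an assumed formulation for illustration, not the paper's exact OCC): penalize the divergence between the category distributions predicted on source and target instances.

```python
import torch
import torch.nn.functional as F

def occ_like_loss(src_logits, tgt_logits):
    # Mean class distribution over detected instances in each domain.
    p_src = F.softmax(src_logits, dim=1).mean(dim=0)
    p_tgt = F.softmax(tgt_logits, dim=1).mean(dim=0)
    # Symmetric KL pushes the two category distributions to agree.
    kl = lambda p, q: (p * (p / q).log()).sum()
    return 0.5 * (kl(p_src, p_tgt) + kl(p_tgt, p_src))

src = torch.randn(32, 8)   # 32 source instances, 8 hypothetical classes
tgt = torch.randn(40, 8)   # 40 target instances
print(occ_like_loss(src, tgt))
```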

Findings

Experimental results demonstrate the validity of the proposed approach, which achieves good outcomes on challenging data sets.

Research limitations/implications

A limitation of this research is that only one target domain is considered; the model's generalization ability may change when data sets are insufficient or unseen domains appear.

Originality/value

This paper solves the issue of critical information defects by tackling the difficulty of extracting discriminative features, and the categories in both domains are compelled to be consistent for better object detection.

Details

International Journal of Web Information Systems, vol. 19 no. 5/6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 May 2006

Rajugan Rajagopalapillai, Elizabeth Chang, Tharam S. Dillon and Ling Feng

Abstract

In data engineering, view formalisms are used to provide flexibility to users and user applications by allowing them to extract and elaborate data from the stored data sources. Meanwhile, since its introduction, the eXtensible Markup Language (XML) has fast emerged as the dominant standard for storing, describing and interchanging data among various web and heterogeneous data sources. In combination with XML Schema, XML provides rich facilities for defining and constraining user-defined data semantics and properties, a feature that is unique to XML. In this context, it is interesting to investigate traditional database features, such as view models and view design techniques, for XML. However, traditional view formalisms are strongly coupled to the data language and its syntax, so supporting views over semi-structured data models proves to be a difficult task. Therefore, in this paper we propose a Layered View Model (LVM) for XML with conceptual and schemata extensions. Our work is three-fold: first, we propose an approach that separates the implementation and conceptual aspects of views, providing a clear separation of concerns and thus allowing the analysis and design of views to be decoupled from their implementation. Second, we define representations to express and construct these views at the conceptual level. Third, we define a view transformation methodology for XML views in the LVM, which carries out automated transformation to a view schema and a view query expression in an appropriate query language. To validate and apply the LVM concepts, methods and transformations developed, we also propose a view-driven application development framework with the flexibility to develop web and database applications for XML at varying levels of abstraction.
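
The core idea of deriving a view query and materializing the view can be shown in miniature (the document, element names and the use of XPath via lxml below are illustrative assumptions, not the LVM notation):

```python
from copy import deepcopy
from lxml import etree

source = etree.fromstring("""
<library>
  <book year="2005"><title>XML Design</title><author>Quang</author></book>
  <book year="2001"><title>Old Title</title><author>Smith</author></book>
</library>
""")

VIEW_QUERY = "//book[@year > 2003]/title"   # the derived view query expression

view = etree.Element("recentTitles")        # the materialized view document
for node in source.xpath(VIEW_QUERY):
    view.append(deepcopy(node))             # copy so the source stays intact

print(etree.tostring(view, pretty_print=True).decode())
```

In LVM terms, the conceptual view ("recent titles") is kept separate from this concrete query expression, which is one possible automated translation of it.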

Details

International Journal of Web Information Systems, vol. 2 no. 2
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 6 September 2021

Gerd Hübscher, Verena Geist, Dagmar Auer, Nicole Hübscher and Josef Küng

Abstract

Purpose

Knowledge- and communication-intensive domains still long for better support of creativity that also takes legal requirements, compliance rules and administrative tasks into account, because current systems focus either on knowledge representation or on business process management. The purpose of this paper is to discuss the authors' model of integrated knowledge and business process representation and its presentation to users.

Design/methodology/approach

The authors follow a design science approach in the environment of patent prosecution, which is characterized by a highly standardized, legally prescribed process and individual knowledge work. The research is thus based on knowledge work, BPM, graph-based knowledge representation and user interface design. The authors iteratively designed and built a model and a prototype. To evaluate the approach, they used an analytical proof of concept, real-world test scenarios and case studies in real-world settings, where they conducted observations and open interviews.

Findings

The authors designed a model and implemented a prototype for evolving and storing the static and dynamic aspects of knowledge. The proposed solution leverages the flexibility of a graph-based model to support not only open, continuously evolving user-centered processes but also pre-defined ones. The authors further propose a user interface concept that supports users in benefiting from the richness of the model while providing sufficient guidance.
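
A tiny sketch illustrates the kind of structure being described (the node and edge labels below are invented for illustration; the authors' model is considerably richer): one graph holds both process tasks and knowledge objects, so a task's full context is simply its neighborhood.

```python
import networkx as nx

g = nx.DiGraph()
g.add_node("draft_claims", kind="task")        # process task
g.add_node("file_response", kind="task")       # process task
g.add_node("prior_art_X", kind="knowledge")    # knowledge object

g.add_edge("draft_claims", "file_response", kind="precedes")  # process edge
g.add_edge("draft_claims", "prior_art_X", kind="uses")        # knowledge edge

# Everything relevant to a task, whether process or knowledge, comes from
# the same traversal of the one integrated graph:
print(list(g.successors("draft_claims")))
```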

Originality/value

The balanced integration of the data and task perspectives distinguishes the model significantly from other approaches such as BPM or knowledge graphs. The authors further provide a sophisticated user interface design, which allows users to use the graph-based knowledge representation effectively and efficiently in their daily work.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 30 January 2015

Aravindhan Arunagiri and Parthasarathy Ramachandran

Abstract

Purpose

Most literature on workflow (WF) adaptation has considered control-flow correctness during adaptation, such as the absence of deadlock and livelock. The data aspects of WF adaptation, such as data flow, database schema changes and their correctness, are less studied. When the WF schema is modified, its data flow and the underlying database schema change. Existing approaches to adapting these data changes in the underlying database schema are time-consuming and/or affect the persistence of old data. The purpose of this paper is to address the dynamic adaptation of the WF schema and the implementation of its data changes in the existing database schema.

Design/methodology/approach

A conceptual framework was developed to adapt, on the fly, the concomitant data changes during WF adaptation. The framework consists of a set of data schema compliance criteria (DSC) that identify the data changes that can be accommodated directly in the existing database schema. A data adaptation algorithm (DAA) is developed to handle the data changes that do not conform to the DSC.
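
The gist can be illustrated with a toy relational sketch (the compliance criterion below is invented and far simpler than the paper's DSC): check whether a changed workflow attribute already fits the existing table and, if not, evolve the schema in place so that old rows persist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (id INTEGER, amount REAL)")
conn.execute("INSERT INTO claim VALUES (1, 99.5)")

def existing_columns(table):
    # Map column name -> declared type from the live schema.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

def adapt_attribute(table, name, sql_type):
    cols = existing_columns(table)
    if name in cols and cols[name] == sql_type:
        return "compliant: no schema change needed"
    if name not in cols:
        # Evolve the schema in place; existing rows are untouched.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type}")
        return "adapted: column added, old data preserved"
    return "non-compliant: type conflict, manual migration required"

print(adapt_attribute("claim", "priority", "TEXT"))    # new WF data attribute
print(conn.execute("SELECT * FROM claim").fetchone())  # old row still there
```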

Findings

In this approach, the existing database schema is evolved dynamically after WF schema adaptation without re-creating it. The WF schema changes can therefore be implemented on the fly without stopping the running system. The approach also ensures the persistence of old data residing in the existing database.

Originality/value

A novel approach is developed to adapt the data changes in the existing database schema without requiring re-creation of the schema or migration of the data. It automates the consistency checking of data attribute changes against the database schema and implements them dynamically.

Details

Business Process Management Journal, vol. 21 no. 1
Type: Research Article
ISSN: 1463-7154
