Search results

1 – 10 of over 12000

Article

Publication date: 18 April 2017

Pushing similarity joins down to the storage layer in XML databases

Leonardo Andrade Ribeiro and Theo Härder

Abstract

Purpose

This article aims to explore how to incorporate similarity joins into XML database management systems (XDBMSs). The authors aim to provide seamless and efficient integration of similarity joins on tree-structured data into an XDBMS architecture.

Design/methodology/approach

The authors exploit XDBMS-specific features to efficiently generate XML tree representations for similarity matching. In particular, the authors push down a large part of the structural similarity evaluation close to the storage layer.

Findings

Empirical experiments were conducted to measure and compare the accuracy, performance and scalability of the tree similarity join using different similarity functions and on top of different storage models. The results show that the authors’ proposal delivers performance and scalability without hurting accuracy.

Originality/value

Similarity join is a fundamental operation for data integration. Unfortunately, none of the XDBMS architectures proposed so far provides efficient support for this operation. Evaluating similarity joins on XML is challenging because it requires similarity matching on both text and structure. In this work, the authors integrate similarity joins into an XDBMS. To the best of the authors’ knowledge, this work is the first to leverage the storage scheme of an XDBMS to support XML similarity join processing.
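For readers unfamiliar with the operation, the general idea of a similarity join can be sketched as follows. This is only a minimal illustration on flat token sets with Jaccard similarity and a naive nested loop; it is not the authors' tree-based method, and the function names `jaccard` and `similarity_join` are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(left, right, threshold):
    """Naive nested-loop join: return (i, j, sim) for every pair whose
    Jaccard similarity is at least `threshold`."""
    left_tokens = [set(s.lower().split()) for s in left]
    right_tokens = [set(s.lower().split()) for s in right]
    result = []
    for i, lt in enumerate(left_tokens):
        for j, rt in enumerate(right_tokens):
            sim = jaccard(lt, rt)
            if sim >= threshold:
                result.append((i, j, sim))
    return result


if __name__ == "__main__":
    left = ["xml similarity join", "storage layer"]
    right = ["similarity join on xml", "query optimizer"]
    print(similarity_join(left, right, threshold=0.7))
```

The nested loop compares every pair, which is exactly the cost that index-based or storage-level approaches try to avoid.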

Details

International Journal of Web Information Systems, vol. 13 no. 1

Type: Research Article

ISSN: 1744-0084

Open Access

Article

Publication date: 4 August 2020

Handling data-skewness in character based string similarity join using Hadoop

Kanak Meena, Devendra K. Tayal, Oscar Castillo and Amita Jain

Abstract

The scalability of similarity joins is threatened by data skewness, an often unexpected characteristic of the data. This is a pervasive problem in scientific data. Skewness causes an uneven distribution of attributes, which can lead to a severe load imbalance problem. When database join operations are applied to these datasets, skewness increases exponentially. All the algorithms developed to date for implementing database joins are highly skew sensitive. This paper presents a new approach for handling data skewness in character-based string similarity joins using the MapReduce framework. No work in the literature handles data skewness in character-based string similarity joins, although such work exists for set-based string similarity joins. The proposed work is divided into three stages, and every stage is further divided into mapper and reducer phases, each dedicated to a specific task. The first stage finds the lengths of the strings in a dataset. In the second stage, the MR-Pass Join framework is suggested for valid candidate pair generation. In the third stage, which is further divided into four MapReduce phases, MRFA concepts are incorporated for the string similarity join, named “MRFA-SSJ” (MapReduce Frequency Adaptive – String Similarity Join). Hence, MRFA-SSJ is proposed to handle skewness in the string similarity join. The experiments were run on three datasets, namely DBLP, Query log and a real dataset of IP addresses and cookies, using the Hadoop framework. The proposed algorithm was compared with three known algorithms; all of them fail when the data is highly skewed, whereas the proposed method handles highly skewed data without any problem. A 15-node cluster was used in the experiments, and the Zipf distribution law was followed for the analysis of the skewness factor. A comparison among existing and proposed techniques is also shown. Existing techniques survived up to Zipf factor 0.5, whereas the proposed algorithm survives up to Zipf factor 1. Hence, the proposed algorithm is skew insensitive and ensures scalability with a reasonable query processing time for string similarity database joins. It also ensures an even distribution of attributes.
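The role of string lengths in candidate generation can be illustrated with a small single-machine sketch. It relies on the standard fact that the edit distance between two strings is at least the difference of their lengths, so pairs whose lengths differ by more than the threshold can be skipped without verification. This is not MRFA-SSJ or MR-Pass Join and involves no MapReduce; `edit_distance`, `length_filtered_join` and the threshold `tau` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def length_filtered_join(strings, tau):
    """Return pairs within edit distance `tau`, verifying only pairs whose
    lengths differ by at most `tau` (the length filter)."""
    by_length = defaultdict(list)
    for s in strings:
        by_length[len(s)].append(s)

    matches = []
    lengths = sorted(by_length)
    for la in lengths:
        # Pairs of equal length.
        for a, b in combinations(by_length[la], 2):
            if edit_distance(a, b) <= tau:
                matches.append((a, b))
        # Pairs where the second string is longer by at most tau.
        for lb in lengths:
            if la < lb <= la + tau:
                for a in by_length[la]:
                    for b in by_length[lb]:
                        if edit_distance(a, b) <= tau:
                            matches.append((a, b))
    return matches


if __name__ == "__main__":
    print(length_filtered_join(["hadoop", "hadop", "hadoops", "mapreduce"], tau=1))
```

In a skewed dataset many strings share the same length, so a length bucket can become very large; that kind of uneven bucket size is one way load imbalance can arise when buckets are assigned to workers.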

Details

Applied Computing and Informatics, vol. 18 no. 1/2

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 17 December 2021

Preventive maintenance planning considering machines’ reliability using group technology

Farouq Alhourani, Jean Essila and Bernie Farkas

Abstract

Purpose

The purpose of this paper is to develop an efficient and effective preventive maintenance (PM) plan that considers machines’ maintenance needs in addition to their reliability factor.

Design/methodology/approach

The similarity coefficient method in group technology (GT) philosophy is used. Machines’ reliability factor is considered to develop virtual machine cells based on their need for maintenance according to the type of failures they encounter.

Findings

Using the similarity coefficient method in GT philosophy for PM planning results in grouping machines based on their common failures and maintenance needs. Using machines’ reliability factor makes the plan more efficient since machines will be maintained at the same time intervals and when their maintenance is due. This helps to schedule a standard and efficient maintenance process where maintenance material, tools and labor are scheduled accordingly.

Practical implications

The proposed procedure will assist maintenance managers in developing efficient and effective PM plans. These maintenance plans provide better inventory management for the maintenance materials and tools needed using the developed virtual machine cells.

Originality/value

This paper presents a new procedure to implement PM using the similarity coefficient method in GT. A new similarity coefficient equation that considers machines’ reliability is developed. A clustering algorithm that calculates the similarity between machine groups and forms virtual machine cells is also developed. A numerical example adopted from the literature is solved to demonstrate the proposed heuristic method.
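The abstract does not give the new similarity coefficient equation, so the sketch below substitutes the classic Jaccard similarity coefficient computed on the sets of failure types each machine encounters, followed by a simple greedy grouping. It is an assumption-laden illustration of similarity-based cell formation in general, not the paper's reliability-aware method; the machine names, failure types and the functions `jaccard_coefficient` and `group_machines` are hypothetical.

```python
def jaccard_coefficient(failures_a: set, failures_b: set) -> float:
    """Shared failure types divided by all failure types seen on either machine."""
    union = failures_a | failures_b
    return len(failures_a & failures_b) / len(union) if union else 0.0


def group_machines(failures: dict, threshold: float) -> list:
    """Greedy grouping: a machine joins the first cell that already contains
    a machine with similarity >= threshold; otherwise it starts a new cell."""
    cells = []
    for machine in sorted(failures):
        for cell in cells:
            if any(jaccard_coefficient(failures[machine], failures[m]) >= threshold
                   for m in cell):
                cell.append(machine)
                break
        else:
            cells.append([machine])
    return cells


if __name__ == "__main__":
    failures = {
        "M1": {"bearing", "belt"},
        "M2": {"bearing", "belt", "motor"},
        "M3": {"hydraulic"},
        "M4": {"hydraulic", "seal"},
    }
    print(group_machines(failures, threshold=0.5))
```

Machines that land in the same cell share most of their failure types, which is the property that lets their maintenance materials and tools be planned together.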

Details

Journal of Quality in Maintenance Engineering, vol. 29 no. 1

Type: Research Article

ISSN: 1355-2511

Article

Publication date: 1 July 2014

A case study for understanding the nature of redundant entities in bibliographic digital libraries

Byung-Won On, Gyu Sang Choi and Soo-Mok Jung

Abstract

Purpose

The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs.

Design/methodology/approach

To find a sample of name variants across DLs (e.g. DBLP and ACM) and in a single DL (e.g. ACM), the approach is based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching.

Findings

First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL.

Originality/value

To the best of the authors’ knowledge, little research effort has been made to understand the nature of author name variants shown in DLs. A thorough analysis can help focus research effort on the real problems that arise when duplicate detection methods are applied.
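The idea of maximum weighted bipartite matching between two lists of author names can be sketched on a toy scale as follows. The abstract does not say which string similarity or solver the authors used, so this example assumes `difflib.SequenceMatcher` ratios as edge weights and an exhaustive search over assignments, which is only feasible for a handful of names. The function names and sample names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def name_similarity(a: str, b: str) -> float:
    """Edge weight: case-insensitive SequenceMatcher ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def max_weight_matching(left, right):
    """Exhaustive maximum weighted bipartite matching.
    Assumes len(left) <= len(right) and very small inputs."""
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(right)), len(left)):
        score = sum(name_similarity(left[i], right[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(left[i], right[j]) for i, j in enumerate(perm)]
    return best_pairs, best_score


if __name__ == "__main__":
    names_a = ["J. Smith", "Wei Wang"]
    names_b = ["Wang, Wei", "John Smith", "A. Kumar"]
    pairs, score = max_weight_matching(names_a, names_b)
    print(pairs)
```

For realistic list sizes a polynomial-time assignment algorithm such as the Hungarian method would replace the exhaustive search.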

Details

Program, vol. 48 no. 3

Type: Research Article

ISSN: 0033-0337

Article

Publication date: 1 July 2006

Professionals' relationships with clients in the apparel industry

Diana Saiki and Marilyn R. DeLong

Abstract

Purpose

The purpose of this paper is to analyze patterns of client and professional interaction as reported by professionals in the apparel industry.

Design/methodology/approach

This qualitative research involved 23 professionals or individuals who had worked for more than ten years in an executive position. The participants, who worked in a variety of positions in the US apparel industry, were interviewed extensively about their professional experiences. Data were analyzed by identifying themes in the interview transcripts using a grounded approach methodology.

Findings

The participants described their professional relationships with clients. Clients included individuals in the general public and other industry professionals who used the service or bought the product. The participants, all women, showed similarity or homophily with clients' values, fashion level, age, gender, economic level, and body size. All participants emphasized differences or heterophily with clients in expertise and level of innovation.

Practical implications

This information is helpful for new professionals in the apparel industry and other business professionals to understand how to succeed and what to emphasize when relating to clients.

Originality/value

This study demonstrates how a grounded approach to interview analysis can add to theory and provide useful information about succeeding in a business environment. Limited research exists about professionals' use of homophily and heterophily to relate to their clients. Homophily and heterophily dimensions (e.g. age, gender, and expertise) used by apparel industry professionals in relating with clients are identified. Also, strategies that these professionals used to create homophily and heterophily are discussed.

Details

Qualitative Market Research: An International Journal, vol. 9 no. 3

Type: Research Article

ISSN: 1352-2752

Article

Publication date: 30 March 2012

The rhetoric of synergy in a global corporation: Visual and oral narratives of mimesis and similarity

Hugo Gaggiotti

Abstract

Purpose

The purpose of this paper is to expand understanding about the rhetoric of synergy and how it is manifested in a global corporation, Tubworld (name changed), during a period of mergers and acquisitions.

Design/methodology/approach

The methodology is based on an analysis of visual and oral material collected during a four‐year, intermittent fieldwork project in four companies of the same corporation across four countries.

Findings

There were three main findings. First, the rhetoric of synergy is evidenced in the oral and the written, as well as the visual, and is part of the organizational experience of those involved in mergers – particularly expatriate managers. Second, the rhetoric of synergy operates not only in a prospective dimension (in order to justify the mergers or the takeovers and the future of the organization), but also in a retrospective dimension (in order to create a uniform mythical past). Third, the rhetoric of synergy is visualised and experienced not only in the private domain of the factories, but also in the public and semi‐public spaces that are part of the managers' visuals (gates, perimeter walls, signals, roads, water tanks), creating a synergetic world that is not limited to the organizational experience.

Research limitations/implications

The most notable limitations of the study are the temporal framework and the limited number of locations; the study was restricted to a moment in the organizational life of Tubworld and its expatriates – to the Tubworld factories that were part of the nuclear constitution of the corporation. On the positive side, this study contributes to a better understanding of the role of the rhetoric of synergy in organizational discourses that support change in general, and mergers and takeovers in particular, thereby providing a broader perception of synergy. The three findings contribute to a better understanding of the impact of the rhetoric of synergy in the organizational representation and practices of expatriate managers – particularly those involved in mergers, acquisitions and takeovers.

Originality/value

The method and approach – visual and oral narrative – make an original contribution to the literature, illuminating the problematic of mergers and acquisition at a material and symbolic level.

Details

Journal of Organizational Change Management, vol. 25 no. 2

Type: Research Article

ISSN: 0953-4814

Open Access

Article

Publication date: 20 July 2020

Data reconciliation and fusion methods: a survey

Abdelghani Bakhtouchi

Abstract

With the progress of new information and communication technologies, more and more producers of data exist, and the web hosts a huge volume of all these kinds of data. Unfortunately, existing data is often unreliable because the same information appears in different sources and because some data is erroneous or incomplete. The aim of data integration systems is to offer the user a single interface for querying a number of sources. A key challenge for such systems is dealing with conflicting information from the same source or from different sources. In this paper, we present the resolution of conflicts at the instance level in two stages: reference reconciliation and data fusion. Reference reconciliation methods seek to decide whether two data descriptions refer to the same real-world entity. We define the principles of reconciliation methods and then distinguish the methods of reference reconciliation, first by how they use the descriptions of references and then by how they acquire knowledge. We finish this section by discussing some data reconciliation issues that are the subject of current research. Data fusion, in turn, aims to merge duplicates into a single representation while resolving conflicts between the data. We first define the classification of conflicts, the strategies for dealing with conflicts and the implementation of conflict management strategies. We then present the relational operators and data fusion techniques. Likewise, we finish this section by discussing some data fusion issues that are the subject of current research.

Details

Applied Computing and Informatics, vol. 18 no. 3/4

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 4 April 2016

De-duplicating a large crowd-sourced catalogue of bibliographic records

Ilija Subasic, Nebojsa Gvozdenovic and Kris Jack

Abstract

Purpose

The purpose of this paper is to describe a large-scale algorithm for generating a catalogue of scientific publication records (citations) from crowd-sourced data, demonstrate how to learn an optimal combination of distance metrics for duplicate detection and introduce a parallel duplicate clustering algorithm.

Design/methodology/approach

The authors developed the algorithm and compared it with state-of-the-art systems tackling the same problem. The authors used benchmark data sets (3k data points) to test the effectiveness of the algorithm and a real-life data set (> 90 million data points) to test its efficiency and scalability.

Findings

The authors show that duplicate detection can be improved by an additional step called duplicate clustering. The authors also show how to improve the efficiency of the map/reduce similarity calculation algorithm by introducing a sampling step. Finally, the authors find that the system is comparable to the state-of-the-art systems for duplicate detection, and that it can scale to deal with hundreds of millions of data points.

Research limitations/implications

Academic researchers can use this paper to understand some of the issues of transitivity in duplicate detection, and its effects on digital catalogue generations.

Practical implications

Industry practitioners can use this paper as a use case study for generating a large-scale real-life catalogue generation system that deals with millions of records in a scalable and efficient way.

Originality/value

In contrast to other similarity calculation algorithms developed for m/r frameworks, the authors present a specific variant of similarity calculation that is optimized for duplicate detection of bibliographic records by extending a previously proposed e-algorithm based on inverted index creation. In addition, the authors are concerned with more than duplicate detection, and investigate how to group detected duplicates. The authors develop distinct algorithms for duplicate detection and duplicate clustering and use the canopy clustering idea for multi-pass clustering. The work extends the current state-of-the-art by including the duplicate clustering step and demonstrates new strategies for speeding up m/r similarity calculations.
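How an inverted index narrows down duplicate candidates can be sketched in a few lines. The abstract does not describe the e-algorithm or the authors' extension, so this is a generic single-machine illustration only: records that share at least `min_shared` title tokens become candidate pairs, and everything else is never compared. The function name `candidate_pairs`, the parameter `min_shared` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records: dict, min_shared: int = 2) -> set:
    """Build a token -> record-id inverted index, then return the pairs of
    record ids that co-occur under at least `min_shared` tokens."""
    index = defaultdict(set)
    for rid, title in records.items():
        for token in set(title.lower().split()):
            index[token].add(rid)

    shared = defaultdict(int)
    for ids in index.values():
        # Every pair listed under the same token shares that token.
        for a, b in combinations(sorted(ids), 2):
            shared[(a, b)] += 1
    return {pair for pair, count in shared.items() if count >= min_shared}


if __name__ == "__main__":
    records = {
        1: "Similarity joins in XML databases",
        2: "similarity joins for XML databases",
        3: "Preventive maintenance planning",
    }
    print(candidate_pairs(records))
```

Only the surviving candidate pairs would then be passed to a more expensive distance function and, in a second step, grouped into clusters of duplicates.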

Details

Program, vol. 50 no. 2

Type: Research Article

ISSN: 0033-0337

Article

Publication date: 9 April 2021

Why do consumers engage in online brand communities – and why should brands care?

Danita van Heerden and Melanie Wiese

Abstract

Purpose

The purpose of this paper is to explore consumers’ motivations for engaging in Facebook brand communities, and what outcomes brands can gain from online engagement.

Design/methodology/approach

An online consumer panel was used to collect data through convenience sampling; 497 useable questionnaires were collected.

Findings

The results of the structural equation modelling show that hedonic motivations are more prevalent in Facebook brand communities than utilitarian motivations. When considering the outcomes of online engagement, loyalty towards the brand community is the strongest outcome, followed by word-of-mouth and purchase intention.

Research limitations/implications

This research indicates that marketers should focus on creating content on Facebook brand communities that appeals to the hedonic needs of consumers, such as brand likeability, entertainment and interpersonal utility. This type of content will motivate members of these brand communities to engage online. When consumers engage online, it creates benefits for the brand such as loyalty, word-of-mouth and purchase intention.

Originality/value

This study presents a framework for investigating consumers’ motivation to engage online, based on a theoretical underpinning of both sense of community theory and uses and gratification theory. It also identifies three outcomes for brands that explain why it is worthwhile for firms to invest in engaging with consumers in Facebook brand communities while including a wide range of brand communities.

Details

Journal of Consumer Marketing, vol. 38 no. 4

Type: Research Article

ISSN: 0736-3761

Article

Publication date: 3 June 2019

An efficient semantic recommender method for Arabic text

Bilal Hawashin, Shadi Alzubi, Tarek Kanan and Ayman Mansour

Abstract

Purpose

This paper aims to propose a new efficient semantic recommender method for Arabic content.

Design/methodology/approach

Three semantic similarities were proposed to be integrated with the recommender system to improve its ability to recommend based on the semantic aspect. The proposed similarities are CHI-based semantic similarity, singular value decomposition (SVD)-based semantic similarity and Arabic WordNet-based semantic similarity. These similarities were compared with the existing similarities used by recommender systems from the literature.

Findings

Experiments show that the proposed semantic methods using CHI-based similarity and SVD-based similarity are more efficient than the existing methods on Arabic text in terms of accuracy and execution time.

Originality/value

Although many previous works proposed recommender system methods for English text, very few concentrated on Arabic text. The field of Arabic recommender systems is largely understudied in the literature. In addition, there is a vital need to consider the semantic relationships behind user preferences to improve the accuracy of the recommendations. The contributions of this work are the following. First, as many recommender methods were proposed for English text and have never been tested on Arabic text, this work compares the performance of these widely used methods on Arabic text. Second, it proposes a novel semantic recommender method for Arabic text. As this method uses semantic similarity, three novel base semantic similarities were proposed and evaluated. Third, this work aims to direct attention to further studies of this understudied topic.

Details

The Electronic Library, vol. 37 no. 2

Type: Research Article

ISSN: 0264-0473
