Search results

1 – 10 of 807
Article
Publication date: 14 June 2013

Yousuke Watanabe, Hidetaka Kamigaito and Haruo Yokota

Abstract

Purpose

Office documents are widely used in daily activities, and their number keeps increasing, so the demand for sophisticated search over office documents is growing. Recent office document formats are based on a package of multiple XML files, which include not only body text but also page-structure and style data. The purpose of this paper is to utilize these files to find similar office documents.

Design/methodology/approach

The authors propose SOS, a similarity search method based on the structures and styles of office documents. SOS computes similarity values between multiple pairs of XML files contained in the office documents. The authors also propose LAX+, an algorithm that calculates a similarity value for a pair of XML files by extending an existing XML leaf-node clustering algorithm.
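
As a rough illustration of this idea, and not the authors' LAX+ algorithm, the sketch below treats an office document as a zip package of XML parts, compares the parts two packages share using Jaccard similarity over root-to-leaf tag paths, and averages the per-part scores. The input file names are hypothetical.

    import zipfile
    import xml.etree.ElementTree as ET

    def leaf_paths(xml_bytes):
        """Collect root-to-leaf tag paths as a set (a structure/style signal)."""
        paths, stack = set(), [(ET.fromstring(xml_bytes), ())]
        while stack:
            node, prefix = stack.pop()
            tag = node.tag.split('}')[-1]        # drop the XML namespace
            children = list(node)
            if not children:
                paths.add(prefix + (tag,))
            for child in children:
                stack.append((child, prefix + (tag,)))
        return paths

    def part_similarity(a, b):
        """Jaccard similarity of two parts' leaf-path sets."""
        union = a | b
        return len(a & b) / len(union) if union else 1.0

    def package_similarity(doc1, doc2):
        """Average the similarities of the XML parts both packages contain."""
        with zipfile.ZipFile(doc1) as z1, zipfile.ZipFile(doc2) as z2:
            shared = ({n for n in z1.namelist() if n.endswith('.xml')}
                      & set(z2.namelist()))
            scores = [part_similarity(leaf_paths(z1.read(n)),
                                      leaf_paths(z2.read(n)))
                      for n in shared]
        return sum(scores) / len(scores) if scores else 0.0

    print(package_similarity('a.docx', 'b.docx'))    # hypothetical input files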

Findings

SOS and LAX+ are evaluated in experiments on three types of office documents (docx, xlsx and pptx). The results of LAX+ and SOS are better than those of the existing algorithms.

Originality/value

Existing text-based search engines do not take the structure and style of documents into account. SOS can find similar documents by calculating similarities between the multiple XML files corresponding to body text, structure and style.

Article
Publication date: 18 April 2017

Leonardo Andrade Ribeiro and Theo Härder

Abstract

Purpose

This article aims to explore how to incorporate similarity joins into XML database management systems (XDBMSs). The authors aim to provide seamless and efficient integration of similarity joins on tree-structured data into an XDBMS architecture.

Design/methodology/approach

The authors exploit XDBMS-specific features to efficiently generate XML tree representations for similarity matching. In particular, the authors push down a large part of the structural similarity evaluation close to the storage layer.
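
A toy stand-in for tree similarity matching, not the authors' XDBMS-integrated operator: represent each XML tree by its set of parent-child tag pairs and join the documents whose Jaccard overlap reaches a threshold. The mini-corpus is hypothetical.

    import xml.etree.ElementTree as ET
    from itertools import combinations

    def branch_tokens(xml_text):
        """Represent a tree by its set of (parent tag, child tag) pairs."""
        root, tokens, stack = ET.fromstring(xml_text), set(), []
        stack.append(root)
        while stack:
            node = stack.pop()
            for child in node:
                tokens.add((node.tag, child.tag))
                stack.append(child)
        return tokens

    def similarity_join(docs, threshold=0.5):
        """Yield all pairs whose Jaccard similarity reaches the threshold."""
        sigs = {name: branch_tokens(text) for name, text in docs.items()}
        for (n1, s1), (n2, s2) in combinations(sigs.items(), 2):
            union = s1 | s2
            jac = len(s1 & s2) / len(union) if union else 1.0
            if jac >= threshold:
                yield n1, n2, jac

    docs = {                      # hypothetical mini-corpus
        'd1': '<a><b/><c><d/></c></a>',
        'd2': '<a><b/><c><e/></c></a>',
        'd3': '<x><y/></x>',
    }
    for pair in similarity_join(docs):
        print(pair)               # ('d1', 'd2', 0.5)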

Findings

Empirical experiments were conducted to measure and compare the accuracy, performance and scalability of the tree similarity join using different similarity functions and on top of different storage models. The results show that the authors' proposal delivers performance and scalability without sacrificing accuracy.

Originality/value

Similarity join is a fundamental operation for data integration. Unfortunately, none of the XDBMS architectures proposed so far provides efficient support for this operation. Evaluating similarity joins on XML is challenging because it requires similarity matching on both text and structure. In this work, the authors integrate similarity joins into an XDBMS. To the best of the authors' knowledge, this work is the first to leverage the storage scheme of an XDBMS to support XML similarity join processing.

Details

International Journal of Web Information Systems, vol. 13 no. 1
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 29 August 2008

Wilma Penzo

Abstract

Purpose

The semantic and structural heterogeneity of large Extensible Markup Language (XML) digital libraries emphasizes the need to support approximate queries, i.e. queries whose matching conditions are relaxed so as to retrieve results that may only partially satisfy the user's request. The paper proposes a flexible query-answering framework that efficiently supports complex approximate queries on XML data.

Design/methodology/approach

To reduce the number of relaxations applicable to a query, the paper relies on user-specified preferences about the types of approximation allowed. A specifically devised index structure that efficiently supports both semantic and structural approximations, according to these preferences, is proposed, together with a ranking model that quantifies the approximations present in the results.
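
A minimal sketch of preference-weighted relaxation scoring, assuming hypothetical relaxation types and penalties; the paper's actual index structure and ranking model are more involved. Disallowed relaxations filter a result out, and smaller total penalties rank higher.

    prefs = {                # user preferences: penalty per relaxation type
        'synonym':   0.1,    # semantic approximation (e.g. tag synonyms)
        'ancestor':  0.3,    # structural approximation (edge generalisation)
        'drop_leaf': None,   # None = this relaxation is not allowed
    }

    def score(relaxations):
        """Total penalty of a relaxed query; None if a forbidden step is used."""
        total = 0.0
        for r in relaxations:
            if prefs.get(r) is None:
                return None
            total += prefs[r]
        return total

    candidates = {           # hypothetical results -> relaxations applied
        'doc1': [],                      # exact match
        'doc2': ['synonym'],
        'doc3': ['synonym', 'ancestor'],
        'doc4': ['drop_leaf'],           # uses a forbidden relaxation
    }
    ranked = sorted((score(r), doc) for doc, r in candidates.items()
                    if score(r) is not None)
    print(ranked)   # exact matches first, heavier approximations later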

Findings

Personalized queries, on the one hand, effectively narrow the space of query reformulations and, on the other, enhance the user's query capabilities with a great deal of flexibility and control over requests. As to the quality of results, the retrieval process benefits considerably from the presence of user preferences in the queries. Experiments demonstrate the effectiveness, efficiency and scalability of the proposal.

Research limitations/implications

Future work concerns evaluating the effectiveness of query personalization by further examining how varying the parameters that express user preferences affects the results.

Originality/value

The paper is intended for the research community and proposes a novel query model which incorporates user preferences about query relaxations on large heterogeneous XML data collections.

Details

International Journal of Web Information Systems, vol. 4 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 June 2017

Mohammed Ourabah Soualah, Yassine Ait Ali Yahia, Abdelkader Keita and Abderrezak Guessoum

Abstract

Purpose

The purpose of this paper is to provide online access to digitised images of Arabic manuscripts, which requires a catalogue. Bibliographic cataloguing is unsuitable for old Arabic manuscripts, so a new cataloguing model must be established. The authors propose a new cataloguing model based on manuscript annotations and transcriptions, which can be an effective solution for dynamically cataloguing old Arabic manuscripts. For this purpose, the authors use automatic extraction of metadata based on the structural similarity of the documents.

Design/methodology/approach

This work follows an experimental methodology: all of the proposed concepts and formulas were tested for validation, which allows the authors to draw concise conclusions.

Findings

Cataloguing old Arabic manuscripts faces the problem of unavailable information. However, this information may be found elsewhere, in a copy of the original manuscript. Thus, cataloguing an Arabic manuscript cannot be done in one pass; it is a continual process that requires updating the information. The idea is to pre-catalogue a manuscript and then complete and improve the record through a specific platform. Consequently, the authors propose a new cataloguing model, which they call "dynamic cataloguing".
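
A minimal sketch of this incremental "fill the gaps" style of cataloguing, with hypothetical field names: a pre-catalogued record is completed whenever a copy of the manuscript, or a user annotation/transcription, supplies a field that is still missing.

    def update_catalogue(record, new_metadata):
        """Fill only the gaps; never overwrite fields already catalogued."""
        for field, value in new_metadata.items():
            if record.get(field) in (None, ''):
                record[field] = value
        return record

    # hypothetical pre-catalogued record with missing fields
    record = {'title': 'Example manuscript', 'author': None, 'copy_date': None}
    # metadata extracted from a copy's annotations (placeholder values)
    from_copy = {'author': 'Example author', 'copy_date': '1243 AH'}
    print(update_catalogue(record, from_copy))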

Research limitations/implications

The success of the proposed model depends on the involvement of all of its actors; it rests on the conviction and motivation of the actors of the collaborative platform.

Practical implications

The model can be used in several cataloguing fields where the encoding is based on XML. It is innovative, implements smart cataloguing and is accessed through a web platform, which allows the catalogue to be updated automatically.

Social implications

The model prompts users to participate in and enrich the catalogue, allowing them to move from a passive to an active social role.

Originality/value

The dynamic cataloguing model is a new concept that has not previously been proposed in the literature. It is based on automatic extraction of metadata from user annotations/transcriptions and is a smart system that automatically updates or fills the catalogue with the extracted metadata.

Article
Publication date: 30 August 2011

Gilbert Tekli, Richard Chbeir and Jacques Fayolle

Abstract

Purpose

XML has spread beyond computer science and reached other areas such as e-commerce, identification, information storage and instant messaging. Data communicated in these domains are now mainly based on XML, so allowing non-expert programmers to manipulate and control their XML data is essential. The purpose of this paper is to present the XA2C framework, intended for both non-expert and expert programmers, which provides them with means to write/draw their XML data manipulation operations.

Design/methodology/approach

In the literature, this issue has been dealt with from two perspectives. First, XML alteration/adaptation techniques require a certain level of expertise to implement and are not yet unified. Second, Mashups are not yet formally defined and are not specific to XML data, while XML-oriented visual languages are based mainly on structural transformations and data extraction and do not allow manipulation of XML textual data. The paper discusses these existing approaches and presents the XA2C framework.

Findings

The framework is based on the dataflow paradigm (visual diagram composition) and draws on both Mashups and XML-oriented visual languages, defining a well-founded modular architecture and an XML-oriented visual functional composition language based on colored Petri nets. It incorporates existing XML alteration/adaptation techniques by defining them as XML-oriented manipulation functions. A prototype, XA2C, is developed and presented for testing and validating the authors' approach.
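
As a much simplified stand-in for the visual composition language, plain function chaining rather than colored Petri nets, the sketch below wires hypothetical XML manipulation functions into a dataflow pipeline.

    import xml.etree.ElementTree as ET

    def parse(text):
        return ET.fromstring(text)

    def rename_tag(old, new):
        """A manipulation 'box': rename every element with the given tag."""
        def step(root):
            for node in root.iter(old):
                node.tag = new
            return root
        return step

    def serialize(root):
        return ET.tostring(root, encoding='unicode')

    def compose(*steps):
        """Chain steps left to right, like wiring boxes in a dataflow diagram."""
        def pipeline(data):
            for step in steps:
                data = step(data)
            return data
        return pipeline

    pipeline = compose(parse, rename_tag('item', 'entry'), serialize)
    print(pipeline('<list><item>a</item><item>b</item></list>'))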

Originality/value

This paper presents a detailed description of an XML‐oriented manipulation framework implementing the XML‐oriented composition definition language.

Details

International Journal of Web Information Systems, vol. 7 no. 3
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 19 June 2009

Imam Machdi, Toshiyuki Amagasa and Hiroyuki Kitagawa

Abstract

Purpose

The purpose of this paper is to propose Extensible Markup Language (XML) data partitioning schemes that can cope with static and dynamic allocation for parallel holistic twig joins: grid metadata model for XML (GMX) and streams‐based partitioning method for XML (SPX).

Design/methodology/approach

GMX exploits the relationships between XML documents and query patterns to perform workload‐aware partitioning of XML data. Specifically, the paper constructs a two‐dimensional model with a document dimension and a query dimension, in which each object in a dimension is composed of XML metadata related to that dimension. GMX provides a set of XML data partitioning methods that include document clustering, query clustering, document‐based refinement, query‐based refinement and query‐path refinement, thereby enabling XML data partitioning based on the static information of XML metadata. In contrast, SPX explores the structural relationships of query elements and a range‐containment property of XML streams to generate partitions and allocate them to cluster nodes on the fly.
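
A minimal sketch of the workload-aware flavour of such partitioning: a greedy longest-processing-time assignment of documents to cluster nodes by estimated query cost. The cost figures are hypothetical, and the paper's cost model is far richer.

    import heapq

    def partition(doc_costs, n_nodes):
        """Greedily place each document on the currently least-loaded node."""
        heap = [(0.0, node, []) for node in range(n_nodes)]
        heapq.heapify(heap)
        # place the most expensive documents first for better balance
        for doc, cost in sorted(doc_costs.items(), key=lambda x: -x[1]):
            load, node, docs = heapq.heappop(heap)
            docs.append(doc)
            heapq.heappush(heap, (load + cost, node, docs))
        return sorted(heap, key=lambda x: x[1])

    doc_costs = {'d1': 9.0, 'd2': 7.0, 'd3': 4.0, 'd4': 3.0, 'd5': 2.0}
    for load, node, docs in partition(doc_costs, 2):
        print(f'node {node}: {docs} (estimated load {load})')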

Findings

GMX provides several salient features: a set of partition granularities that statically balance query-processing workloads among cluster nodes; both inter-query and intra-query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously according to their probability of occurrence in the system. SPX offers the following features: minimal computation time to generate partitions; dynamic balancing of skewed workloads; higher intra-query parallelism; and better parallel query performance.

Research limitations/implications

The current status of the proposed XML data partitioning schemes does not take into account XML data updates, e.g. new XML documents and query pattern changes submitted by users on the system.

Practical implications

Note that the effectiveness of the XML data partitioning schemes relies mainly on the accuracy of the cost model used to estimate query processing costs. The cost model must be adjusted to reflect the characteristics of the system platform used in the implementation.

Originality/value

This paper proposes novel schemes of conducting XML data partitioning to achieve both static and dynamic workload balance.

Details

International Journal of Web Information Systems, vol. 5 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 1 June 2004

Remco Verdegem and Jacqueline Slats

Abstract

The Digital Preservation test‐bed is a three‐year practical research project whose overall goal is to investigate options for securing sustained accessibility to authentic archival records over the long term by carrying out experiments in a controlled and secure environment. This makes it possible to ascertain the effects of a preservation action on archival records. The test‐bed is researching three different approaches to long‐term digital preservation: migration, XML and emulation. Not only will the effectiveness of each approach be evaluated, but also its limits, costs and application potential. Experiments take place on four record types: text documents, spreadsheets, emails and databases of differing size, complexity and nature. At the end of 2003 the project was to provide: advice on how to deal with current digital records; recommendations for an appropriate preservation approach, or combination of approaches, per record type; functional requirements for a preservation function; cost models for the various preservation strategies; a decision model for selecting the right preservation strategy; and recommendations concerning archival guidelines and regulations.

Details

VINE, vol. 34 no. 2
Type: Research Article
ISSN: 0305-5728

Article
Publication date: 1 September 2002

Zahiruddin Khurshid

Abstract

The paper aims to review major developments in the MARC format, including a brief description of metadata schemes and cross‐walks. It also offers an assessment of how well MARC works for Arabic script materials, a description of the degree to which MARC is used in Saudi Arabia, and the prospects for the use of XML versions of MARC in the Arab world.
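
For readers unfamiliar with the XML side, the sketch below builds a minimal MARCXML record, the Library of Congress XML serialisation of MARC 21. The 245 (title statement) tag and subfield code are standard MARC; the title value is a placeholder.

    import xml.etree.ElementTree as ET

    NS = 'http://www.loc.gov/MARC21/slim'
    ET.register_namespace('', NS)      # serialise with a default namespace

    record = ET.Element(f'{{{NS}}}record')
    field = ET.SubElement(record, f'{{{NS}}}datafield',
                          {'tag': '245', 'ind1': '1', 'ind2': '0'})
    subfield = ET.SubElement(field, f'{{{NS}}}subfield', {'code': 'a'})
    subfield.text = 'Example title'    # placeholder value

    print(ET.tostring(record, encoding='unicode'))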

Details

Library Hi Tech, vol. 20 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 18 October 2019

Jairo Francisco de Souza, Sean Wolfgand Matsui Siqueira and Bernardo Nunes

Abstract

Purpose

Although new ontology matchers are proposed every year to address different aspects of the semantic heterogeneity problem, finding the most suitable alignment approach is still an issue. This study aims to propose a computational solution for ontology meta-matching (OMM) and a framework designed for developers to make use of alignment techniques in their applications.

Design/methodology/approach

The framework includes several similarity functions that developers can choose from, and it then automatically sets weights for each function to obtain better alignments. To evaluate the framework, several simulations were performed with a data set from the Ontology Alignment Evaluation Initiative. Simple similarity functions were used, rather than aligners known in the literature, to demonstrate that the results are influenced more by the proposed meta-alignment approach than by the functions used.
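
One way to picture the automatic weight setting is as a least-squares fit of the function weights against reference alignments, in the spirit of the paper's linear-equation formulation; the score matrix below is hypothetical.

    import numpy as np

    # rows: candidate entity pairs; columns: scores from three similarity
    # functions (hypothetical values, e.g. string, structural, lexical)
    scores = np.array([[0.9, 0.7, 0.8],
                       [0.2, 0.4, 0.1],
                       [0.8, 0.9, 0.6],
                       [0.3, 0.1, 0.2]])
    truth = np.array([1.0, 0.0, 1.0, 0.0])   # 1 = pair is in the reference

    # solve the (overdetermined) linear system in the least-squares sense
    weights, *_ = np.linalg.lstsq(scores, truth, rcond=None)
    combined = scores @ weights              # weighted meta-matcher score
    print(weights, combined.round(2))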

Findings

The results showed that the framework is able to adapt to different test cases. The approach achieved better results when compared with existing ontology meta-matchers.

Originality/value

Although approaches for OMM have been proposed, they are not easy to use during software development. This work, in contrast, presents a framework that developers can use to align ontologies; new ontology matchers can be added, and the framework is extensible to new methods. Moreover, this work presents a novel OMM approach, modeled as a linear equation system, which can be computed easily.

Details

International Journal of Web Information Systems, vol. 16 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 31 August 2005

Harold Boley, Virendrakumar C. Bhavsar, David Hirtle, Anurag Singh, Zhongwei Sun and Lu Yang

Abstract

We have proposed and implemented AgentMatcher, an architecture for match‐making in e‐Business applications. It uses arc‐labeled and arc‐weighted trees to match buyers and sellers via our novel similarity algorithm. This paper adapts the architecture for match‐making between learners and learning objects (LOs). It uses the Canadian Learning Object Metadata (CanLOM) repository of the eduSource e‐Learning project. Through AgentMatcher’s new indexing component, known as Learning Object Metadata Generator (LOMGen), metadata is extracted from HTML LOs for use in CanLOM. LOMGen semi‐automatically generates the LO metadata by combining a word frequency count and dictionary lookup. A subset of these metadata terms can be selected from a query interface, which permits adjustment of weights that express user preferences. Web‐based pre‐filtering is then performed over the CanLOM metadata kept in a relational database. Using an XSLT (Extensible Stylesheet Language Transformations) translator, the pre‐filtered result is transformed into an XML representation, called Weighted Object‐Oriented (WOO) RuleML (Rule Markup Language). This is compared to the WOO RuleML representation obtained from the query interface by AgentMatcher’s core Similarity Engine. The final result is presented as a ranked LO list with a user‐specified threshold.
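
A toy recursive similarity for arc-labelled, arc-weighted trees, in the spirit of this matching but not the authors' exact algorithm: matched arcs contribute their weight times the similarity of their subtrees, unmatched arcs contribute zero. The tree contents are hypothetical.

    # A tree is (value, {arc_label: (weight, subtree)}).
    def tree_sim(t1, t2):
        v1, arcs1 = t1
        v2, arcs2 = t2
        if not arcs1 and not arcs2:          # two leaves: compare values
            return 1.0 if v1 == v2 else 0.0
        total, sim = 0.0, 0.0
        for label in set(arcs1) | set(arcs2):
            w1, w2 = arcs1.get(label), arcs2.get(label)
            if w1 and w2:                    # arc present in both trees
                weight = (w1[0] + w2[0]) / 2
                sim += weight * tree_sim(w1[1], w2[1])
            else:                            # unmatched arc: similarity 0
                weight = (w1 or w2)[0]
            total += weight
        return sim / total if total else 0.0

    learner = ('LO', {'topic': (0.7, ('XML', {})),
                      'level': (0.3, ('intro', {}))})
    obj     = ('LO', {'topic': (0.7, ('XML', {})),
                      'level': (0.3, ('advanced', {}))})
    print(tree_sim(learner, obj))   # 0.7*1 + 0.3*0 = 0.7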

Details

Interactive Technology and Smart Education, vol. 2 no. 3
Type: Research Article
ISSN: 1741-5659
