Search results

1 – 10 of over 2000
Article
Publication date: 1 February 1997

T.A. Spedding, W.L. Lee, R. de Souza and S.S.G. Lee

Describes the development of an adaptive simulation model for a keyboard assembly cell for real‐time decision support. Discusses the architecture of the modelling and control…

Abstract

Describes the development of an adaptive simulation model for a keyboard assembly cell for real‐time decision support. Discusses the architecture of the modelling and control system, including the movement of entities and conveyors, describing how up to four different keyboard types may be modelled, with a PC cell controller continually monitoring the state changes of the assembly line, passing the data captured to the simulation model created in ARENA.
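
As a rough illustration of the control loop described above, the sketch below shows a cell controller capturing state changes from the assembly line and forwarding them to a simulation interface. All class and function names here are hypothetical stand-ins; the paper's actual controller and the ARENA interface are not reproduced.

    import queue
    from dataclasses import dataclass

    @dataclass
    class StateChange:
        station_id: str
        keyboard_type: str   # up to four keyboard types are modelled
        event: str           # e.g. "entity_arrived", "conveyor_stopped"
        timestamp: float

    class SimulationLink:
        """Stand-in for whatever passes captured data on to the simulation model."""
        def send(self, change: StateChange) -> None:
            print(f"forwarding {change.event} at {change.station_id} to the model")

    def run_cell_controller(events: "queue.Queue[StateChange]", link: SimulationLink) -> None:
        # Continually monitor the line; a None entry ends the loop.
        while True:
            change = events.get()
            if change is None:
                break
            link.send(change)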

Details

Integrated Manufacturing Systems, vol. 8 no. 1
Type: Research Article
ISSN: 0957-6061

Article
Publication date: 4 April 2016

Ilija Subasic, Nebojsa Gvozdenovic and Kris Jack

The purpose of this paper is to describe a large-scale algorithm for generating a catalogue of scientific publication records (citations) from crowd-sourced data, demonstrate…

Abstract

Purpose

The purpose of this paper is to describe a large-scale algorithm for generating a catalogue of scientific publication records (citations) from crowd-sourced data, demonstrate how to learn an optimal combination of distance metrics for duplicate detection and introduce a parallel duplicate clustering algorithm.

Design/methodology/approach

The authors developed the algorithm and compared it with state-of-the-art systems tackling the same problem. The authors used benchmark data sets (3k data points) to test the effectiveness of the algorithm and a real-life data set (>90 million data points) to test its efficiency and scalability.

Findings

The authors show that duplicate detection can be improved by an additional step they call duplicate clustering. The authors also show how to improve the efficiency of the map/reduce similarity calculation algorithm by introducing a sampling step. Finally, the authors find that the system is comparable to state-of-the-art systems for duplicate detection, and that it can scale to deal with hundreds of millions of data points.

Research limitations/implications

Academic researchers can use this paper to understand some of the issues of transitivity in duplicate detection and its effects on digital catalogue generation.

Practical implications

Industry practitioners can use this paper as a case study of a large-scale, real-life catalogue generation system that deals with millions of records in a scalable and efficient way.

Originality/value

In contrast to other similarity calculation algorithms developed for m/r frameworks, the authors present a specific variant of similarity calculation that is optimized for duplicate detection of bibliographic records, extending the previously proposed e-algorithm based on inverted index creation. In addition, the authors are concerned with more than duplicate detection, and investigate how to group detected duplicates. The authors develop distinct algorithms for duplicate detection and duplicate clustering and use the canopy clustering idea for multi-pass clustering. The work extends the current state of the art by including the duplicate clustering step and demonstrates new strategies for speeding up m/r similarity calculations.
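
Since the abstract names canopy clustering as the basis for the multi-pass duplicate clustering, a minimal sketch of that technique follows. The record contents, the cheap token-overlap similarity and the two thresholds are illustrative assumptions, not the authors' actual configuration.

    from typing import Dict, List, Set

    def cheap_sim(a: Set[str], b: Set[str]) -> float:
        """Cheap Jaccard similarity over title tokens."""
        return len(a & b) / len(a | b) if a | b else 0.0

    def canopy_clusters(records: Dict[str, Set[str]], loose: float = 0.3, tight: float = 0.7) -> List[Set[str]]:
        remaining = set(records)
        canopies = []
        while remaining:
            center = remaining.pop()
            canopy = {center}
            for rid in list(remaining):
                s = cheap_sim(records[center], records[rid])
                if s >= loose:
                    canopy.add(rid)          # candidate duplicate: same canopy
                if s >= tight:
                    remaining.discard(rid)   # close enough: never becomes a new center
            canopies.append(canopy)
        return canopies

    # The expensive pairwise similarity then only runs inside each canopy.
    citations = {
        "c1": {"parallel", "duplicate", "clustering"},
        "c2": {"parallel", "duplicate", "detection"},
        "c3": {"keyboard", "assembly", "cell"},
    }
    print(canopy_clusters(citations))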

Details

Program, vol. 50 no. 2
Type: Research Article
ISSN: 0033-0337

Open Access
Article
Publication date: 29 June 2020

Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate…

Abstract

Purpose

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to consuming statistics for monitoring and providing feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections or address local topology-driven challenges, and therefore cannot be re-used in other contexts.

Design/methodology/approach

This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system and reports numbers from real-case experiments.

Findings

GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph.
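
As a sketch of the "identify and merge duplicates" step the abstract describes, the snippet below groups pairwise duplicate decisions with a union-find structure and collapses each group into one representative entity. The pair list and the merge policy are illustrative assumptions, not GDup's actual algorithms.

    from collections import defaultdict
    from typing import Dict, List, Tuple

    def merge_duplicates(nodes: List[str], duplicate_pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
        parent = {n: n for n in nodes}

        def find(x: str) -> str:
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        def union(a: str, b: str) -> None:
            parent[find(a)] = find(b)

        for a, b in duplicate_pairs:
            union(a, b)

        groups: Dict[str, List[str]] = defaultdict(list)
        for n in nodes:
            groups[find(n)].append(n)
        return dict(groups)   # one representative per disambiguated entity

    print(merge_duplicates(["p1", "p2", "p3", "p4"], [("p1", "p2"), ("p2", "p3")]))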

Originality/value

To our knowledge, GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 8 March 2011

Darryl Plecas, Amanda V. McCormick, Jason Levine, Patrick Neal and Irwin M. Cohen

The aim of this study is to test a technological solution to two traditional limitations of information sharing between law enforcement agencies: data quality and privacy concerns.

Abstract

Purpose

The aim of this study is to test a technological solution to two traditional limitations of information sharing between law enforcement agencies: data quality and privacy concerns.

Design/methodology/approach

Entity Analytics Software (EAS) was tested in two studies with North American law enforcement agencies. In the first test, EAS identified a greater proportion of duplicated cases held in a police record system (4.0 percent) than the traditionally used software program did (1.5 percent). This resulted in a difference of 11,954 cases that otherwise would not have been identified as duplications. In the second test, entity information held separately by police and border officials was shared anonymously between these two organizations. This resulted in 1,827 alerts regarding entities that appeared in both systems; traditionally, this information could not have been shared, given privacy concerns, and neither agency would have been aware of the relevant information held by the other. Data duplication resulted in an additional 1,041 alerts, which highlights the need to use technological solutions to improve data quality prior to and during information sharing.
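
The abstract does not specify how EAS anonymizes identifiers, but one common way to compare entities across agencies without exchanging raw records is to share keyed hashes of identifiers and intersect them, as in the hedged sketch below; the pre-shared key and the identifiers are hypothetical.

    import hmac
    import hashlib
    from typing import Iterable, Set

    SHARED_KEY = b"agreed-between-agencies"   # hypothetical pre-shared key

    def anonymize(identifiers: Iterable[str]) -> Set[str]:
        # Normalize then hash each identifier so raw values never leave the agency.
        return {
            hmac.new(SHARED_KEY, i.strip().lower().encode(), hashlib.sha256).hexdigest()
            for i in identifiers
        }

    police = anonymize(["JOHN DOE 1980-01-01", "Jane Roe 1975-07-12"])
    border = anonymize(["john doe 1980-01-01"])

    # Entities appearing in both systems trigger an alert without either agency
    # seeing the other's raw records.
    alerts = police & border
    print(len(alerts))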

Findings

The current study demonstrated that EAS has the potential to merge data from different technologically based systems, while identifying errors and reducing privacy concerns through anonymization of identifiers.

Originality/value

While only one potential technological solution (EAS) was tested, and organizations must consider the potential expense associated with implementing such technology, the implications of both studies for improved awareness and greater efficiency support and facilitate information sharing between law enforcement organizations.

Details

Policing: An International Journal of Police Strategies & Management, vol. 34 no. 1
Type: Research Article
ISSN: 1363-951X

Article
Publication date: 1 July 2014

Byung-Won On, Gyu Sang Choi and Soo-Mok Jung

The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case…

Abstract

Purpose

The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs.

Design/methodology/approach

To find a sample of name variants across DLs (e.g. DBLP and ACM) and in a single DL (e.g. ACM), the approach is based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching.
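
A minimal sketch of maximum weighted bipartite matching between two author-name lists follows, using a simple string-similarity score as the edge weight; the name lists and the similarity measure are illustrative assumptions rather than the paper's own weighting scheme.

    from difflib import SequenceMatcher
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    dblp_names = ["B.-W. On", "G. S. Choi", "S.-M. Jung"]
    acm_names = ["Byung-Won On", "Gyu Sang Choi", "Soo-Mok Jung"]

    # Edge weights: pairwise name similarity between the two digital libraries.
    weights = np.array([
        [SequenceMatcher(None, a, b).ratio() for b in acm_names]
        for a in dblp_names
    ])

    # Maximum weighted matching on the bipartite similarity matrix.
    rows, cols = linear_sum_assignment(weights, maximize=True)
    for r, c in zip(rows, cols):
        print(f"{dblp_names[r]}  <->  {acm_names[c]}  (score {weights[r, c]:.2f})")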

Findings

First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL.

Originality/value

To the best of the authors' knowledge, there has been little research effort to understand the nature of author name variants appearing in DLs. A thorough analysis can help focus research effort on the real problems that arise when duplicate detection methods are applied.

Details

Program, vol. 48 no. 3
Type: Research Article
ISSN: 0033-0337

Open Access
Article
Publication date: 20 July 2020

Abdelghani Bakhtouchi

With the progress of new information and communication technologies, more and more data producers exist. At the same time, the web forms a huge repository for all these kinds…

Abstract

With the progress of new information and communication technologies, more and more data producers exist. At the same time, the web forms a huge repository for all these kinds of data. Unfortunately, existing data is often unreliable, because the same information appears in different sources alongside erroneous and incomplete data. The aim of data integration systems is to offer the user a single interface for querying a number of sources. A key challenge for such systems is dealing with conflicting information from the same source or from different sources. We present, in this paper, the resolution of conflicts at the instance level in two stages: reference reconciliation and data fusion. Reference reconciliation methods seek to decide whether two data descriptions refer to the same real-world entity. We define the principles of reconciliation methods and then distinguish reference reconciliation methods, first by how they use the descriptions of references and then by the way they acquire knowledge. We close this section by discussing some current data reconciliation issues that are the subject of ongoing research. Data fusion, in turn, aims to merge duplicates into a single representation while resolving conflicts between the data. We first define the classification of conflicts, the strategies for dealing with them and the implementation of conflict management strategies. We then present the relational operators and data fusion techniques. Likewise, we close this section by discussing some current data fusion issues that are the subject of ongoing research.
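
As a small, hedged illustration of the data fusion stage described above, the sketch below merges duplicate records into a single representation and resolves attribute conflicts with one simple strategy (most frequent value, falling back to the most recent source). The records and the strategy are illustrative only.

    from collections import Counter
    from typing import Any, Dict, List

    def fuse(records: List[Dict[str, Any]]) -> Dict[str, Any]:
        """Merge duplicate records; records are assumed ordered oldest -> newest."""
        fused: Dict[str, Any] = {}
        keys = {k for r in records for k in r}
        for key in keys:
            values = [r[key] for r in records if r.get(key) not in (None, "")]
            if not values:
                continue
            top, freq = Counter(values).most_common(1)[0]
            # Take the most frequent value; if no value repeats, prefer the most recent source.
            fused[key] = top if freq > 1 else values[-1]
        return fused

    sources = [
        {"name": "A. Bakhtouchi", "city": "Algiers", "year": "2018"},
        {"name": "Abdelghani Bakhtouchi", "city": "Algiers", "year": "2020"},
    ]
    print(fuse(sources))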

Details

Applied Computing and Informatics, vol. 18 no. 3/4
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 14 October 2013

Trond Aalberg and Maja Žumer

Bibliographic records should now be used in innovative end-user applications that enable users to learn about, discover and exploit available content, and this information should…

Abstract

Purpose

Bibliographic records should now be used in innovative end-user applications that enable users to learn about, discover and exploit available content, and this information should also be interpreted and reused beyond the library domain. New conceptual models such as FRBR offer the foundation for such developments. The main motivation for this research is to contribute to the adoption of the FRBR model in future bibliographic standards and systems, by analysing limitations in existing bibliographic information and looking for short- and long-term solutions that can improve the data quality in terms of expressing the FRBR model.

Design/methodology/approach

MARC records in three collections (BIBSYS catalogue, Slovenian National Bibliography and BTJ catalogue) were first analysed by looking at statistics of field and subfield usage to determine common patterns that express FRBR. Based on this, different rules for interpreting the information were developed. Finally, typical problems/errors found in MARC records were analysed.
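
A hedged sketch of the kind of field-usage statistics this step involves is shown below: it scans a file of MARC records with pymarc, counts field tags and flags 7XX added entries carrying a $t subfield, a pattern that typically signals a related work or expression. The input file name is a placeholder and this is not the authors' analysis code.

    from collections import Counter
    from pymarc import MARCReader

    tag_counts: Counter = Counter()
    added_entries_with_title = 0

    with open("catalogue.mrc", "rb") as fh:          # placeholder input file
        for record in MARCReader(fh):
            if record is None:                       # skip unreadable records
                continue
            for field in record.get_fields():
                tag_counts[field.tag] += 1
                # 700/710/711 with $t usually express a work embodied in the manifestation.
                if field.tag in ("700", "710", "711") and field.get_subfields("t"):
                    added_entries_with_title += 1

    print(tag_counts.most_common(10))
    print("7XX fields carrying $t:", added_entries_with_title)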

Findings

Different types of FRBR entity-relationship structures that can typically be found in bibliographic records are identified. Problems related to interpreting these from bibliographic records are analysed. FRBRisation of consistent and complete MARC records is relatively successful, particularly if all entities are systematically described and relationships among them are clearly indicated.

Research limitations/implications

Advanced matching was not used for clustering of identical entities.

Practical implications

Cataloguing guidelines are proposed to enable better FRBRisation of MARC records in the interim period, before new formats are developed and implemented.

Originality/value

This is the first in-depth analysis of manifestations embodying several expressions and of works and agents as subjects.

Details

Journal of Documentation, vol. 69 no. 6
Type: Research Article
ISSN: 0022-0418

Article
Publication date: 3 October 2023

Haklae Kim

Despite ongoing research into archival metadata standards, digital archives are unable to effectively represent records in their appropriate contexts. This study aims to propose a…

Abstract

Purpose

Despite ongoing research into archival metadata standards, digital archives are unable to effectively represent records in their appropriate contexts. This study aims to propose a knowledge graph that depicts the diverse relationships between heterogeneous digital archive entities.

Design/methodology/approach

This study introduces and describes a method for applying knowledge graphs to digital archives in a step-by-step manner. It examines archival metadata standards, such as Records in Context Ontology (RiC-O), for characterising digital records; explains the process of data refinement, enrichment and reconciliation with examples; and demonstrates the use of knowledge graphs constructed using semantic queries.

Findings

This study introduced the 97imf.kr archive as a knowledge graph, enabling meaningful exploration of relationships within the archive’s records. This approach facilitated comprehensive descriptions across the different record entities. Applying archival ontologies together with general-purpose vocabularies to digital records is advised to enhance metadata coherence and semantic search.

Originality/value

Most digital archives serviced in Korea are limited in their proper use of archival metadata standards. The contribution of this study is to propose a practical application of knowledge graph technology for linking and exploring digital records. This study details the process of collecting raw data on archives, data preprocessing and data enrichment, and demonstrates how to build a knowledge graph connected to external data. In particular, a knowledge graph built from the RiC-O vocabulary, Wikidata and the Schema.org vocabulary, together with semantic queries over it, can be applied to supplement keyword search in conventional digital archives.
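
A minimal sketch of the kind of semantic query that can supplement keyword search is given below, using rdflib: a few triples describing an archival record are loaded into an in-memory graph and queried with SPARQL. The RiC-O namespace URI, the example URIs and the properties chosen are assumptions for illustration, not the study's actual data model.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    RICO = Namespace("https://www.ica.org/standard/RiC/ontology#")   # assumed RiC-O namespace
    SCHEMA = Namespace("https://schema.org/")
    EX = Namespace("http://example.org/97imf/")                      # hypothetical archive namespace

    g = Graph()
    g.add((EX.record1, RDF.type, RICO.Record))
    g.add((EX.record1, SCHEMA.name, Literal("IMF bailout negotiation memo")))
    g.add((EX.record1, RICO.hasProvenance, EX.agent1))               # illustrative property choice
    g.add((EX.agent1, SCHEMA.name, Literal("Ministry of Finance")))

    # Find records together with the agent they came from.
    query = """
    PREFIX rico: <https://www.ica.org/standard/RiC/ontology#>
    PREFIX schema: <https://schema.org/>
    SELECT ?recordName ?agentName WHERE {
        ?record a rico:Record ;
                schema:name ?recordName ;
                rico:hasProvenance ?agent .
        ?agent schema:name ?agentName .
    }
    """
    for row in g.query(query):
        print(row.recordName, "<-", row.agentName)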

Details

The Electronic Library, vol. 42 no. 1
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 24 April 2007

R.N. Rustom and A. Yahia

Recently, there has been increased interest in the use of simulation for real‐time planning, scheduling and control of construction projects and for obtaining optimum productivity. The…

Abstract

Purpose

Recently, there has been increased interest in the use of simulation for real‐time planning, scheduling and control of construction projects and for obtaining optimum productivity. The purpose of this case study is to demonstrate the use of simulation as an effective tool for estimating production rates in an attempt to prepare optimal time schedules.

Design/methodology/approach

The Gaza Beach‐Camp Shore Protection Project was taken as a case study. The case study is used to demonstrate how to effectively estimate the production rates of labour and equipment during the implementation of the project activities and to estimate the duration of the project using process simulation. The model simulates the construction of 1,600 m of gabions divided into 32 identical stations. Probabilistic distribution functions were used to fit the time functions for each process and sub‐process based on 100 replications.
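
The replication logic described above can be illustrated with a small Monte Carlo sketch: per-station process times are drawn from assumed probability distributions, the model is replicated 100 times and completion-time percentiles are reported. The distributions, their parameters and the sequential station assumption are placeholders for the paper's fitted time functions.

    import numpy as np

    rng = np.random.default_rng(42)

    N_STATIONS = 32          # the gabion works are divided into 32 identical stations
    N_REPLICATIONS = 100     # the paper reports 100 replications

    def one_replication() -> float:
        # Assumed per-station process times (hours): excavation, placing, filling.
        excavation = rng.triangular(3.0, 4.0, 6.0, size=N_STATIONS)
        placing = rng.normal(2.5, 0.4, size=N_STATIONS).clip(min=0.5)
        filling = rng.lognormal(mean=1.2, sigma=0.3, size=N_STATIONS)
        return float(np.sum(excavation + placing + filling))   # stations built in sequence (assumption)

    totals = np.array([one_replication() for _ in range(N_REPLICATIONS)])

    # Three probabilistic values for completion, as in the abstract:
    # optimistic, most likely and pessimistic estimates.
    print(np.percentile(totals, [10, 50, 90]))
    print("average production rate (m of gabion per hour):", 1600 / totals.mean())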

Findings

The simulation output generated three probabilistic values for completing each activity, upon which the overall project completion time is determined. The resource utilization for all processes was also generated and used in determining the average production rates.

Originality/value

The computation of productivity based on effective resource utilization has been demonstrated to give better results than estimating productivity based on aggregate resource assignments.

Details

Construction Innovation, vol. 7 no. 2
Type: Research Article
ISSN: 1471-4175

Article
Publication date: 1 January 2005

Louis H. Kauffman

Discusses the notion of eigenform as explicated by Heinz von Foerster wherein an object is seen to be a token for those behaviors that lend the object its apparent stability in a…

Abstract

Purpose

Discusses the notion of eigenform as explicated by Heinz von Foerster wherein an object is seen to be a token for those behaviors that lend the object its apparent stability in a changing world.

Design/methodology/approach

Describes von Foerster's model for eigenforms and recursions and puts this model in the context of mathematical recursions, fractals, set theory, logic, quantum mechanics, the lambda calculus of Church and Curry, and the categorical framework of fixed points of Lawvere.

Findings

Determines that iterating a transformation upon itself is seen to be a key to understanding the nature of objects and the relationship of an observer and the apparent world of the observer.
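
As a toy numerical illustration of this finding, the sketch below iterates a transformation on itself until the value it produces reproduces itself, i.e. a fixed point X = F(X), which is the structure of an eigenform; the choice of the cosine map is arbitrary and purely illustrative.

    import math

    def iterate_to_eigenform(f, x0: float, tol: float = 1e-12, max_steps: int = 10_000) -> float:
        # Repeatedly apply f until the output no longer changes (a fixed point).
        x = x0
        for _ in range(max_steps):
            nxt = f(x)
            if abs(nxt - x) < tol:
                return nxt
            x = nxt
        return x

    x_star = iterate_to_eigenform(math.cos, 1.0)
    print(x_star, "=?", math.cos(x_star))   # the value is a token that reproduces itself under F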

Originality/value

Contemplates the concept of recursion in the context of second‐order cybernetics.

Details

Kybernetes, vol. 34 no. 1/2
Type: Research Article
ISSN: 0368-492X
