Search results
1 – 10 of over 10,000
Abstract
Purpose
Matching instances of the same entity, a task known as entity resolution, is a key step in the process of data integration. This paper aims to propose a deep learning network that learns different representations of Web entities for entity resolution.
Design/methodology/approach
To match Web entities, the proposed network learns the following representations of entities: embeddings, which are vector representations of the words in the entities in a low-dimensional space; convolutional vectors from a convolutional layer, which capture short-distance patterns in word sequences in the entities; and bag-of-words vectors, created by a bag-of-words (BoW) layer that learns weights for words in the vocabulary based on the task at hand. Given a pair of entities, the similarity between their learned representations is fed as a feature to a binary classifier that identifies a possible match. In addition to those features, the classifier also uses a modification of inverse document frequency for pairs, which identifies discriminative words in pairs of entities.
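To make the architecture concrete, the following is a minimal PyTorch sketch of how the three representations and the pairwise features might be organised. The layer sizes, pooling choices and the `pair_idf` input are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class EntityEncoder(nn.Module):
    """Encodes a tokenised entity into three representations: averaged word
    embeddings, a max-pooled convolutional vector, and a learned bag-of-words
    vector (a sketch; dimensions are assumptions, not the paper's settings)."""
    def __init__(self, vocab_size, emb_dim=100, conv_filters=100, kernel_size=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(emb_dim, conv_filters, kernel_size, padding=1)
        # one learnable weight per vocabulary word (the "BoW layer")
        self.bow_weights = nn.Parameter(torch.ones(vocab_size))

    def forward(self, token_ids):                     # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)               # (batch, seq_len, emb_dim)
        emb_repr = emb.mean(dim=1)                    # averaged embeddings
        conv_repr = torch.relu(self.conv(emb.transpose(1, 2))).max(dim=2).values
        # weighted bag-of-words: scatter learned word weights into a vocab vector
        bow = torch.zeros(token_ids.size(0), self.bow_weights.size(0))
        bow.scatter_add_(1, token_ids, self.bow_weights[token_ids])
        return emb_repr, conv_repr, bow

cos = nn.CosineSimilarity(dim=1)

def pair_features(encoder, a_ids, b_ids, pair_idf):
    """Similarity of each learned representation plus the pair-IDF value;
    the resulting feature vector feeds a binary match/non-match classifier."""
    a_reprs, b_reprs = encoder(a_ids), encoder(b_ids)
    sims = [cos(a, b) for a, b in zip(a_reprs, b_reprs)]
    return torch.stack(sims + [pair_idf], dim=1)      # (batch, 4)
```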
Findings
The proposed approach was evaluated on two commercial and two academic entity resolution benchmark data sets. The results show that the proposed strategy outperforms previous approaches on the commercial data sets, which are more challenging, and performs comparably to its competitors on the academic data sets.
Originality/value
No previous work has used a single deep learning framework to learn different representations of Web entities for entity resolution.
Eduardo Krawietz Ramos, Rosa María Aguilar Chinea and Pedro Juan Baquero Pérez
Abstract
Purpose
This paper aims to study the competition problems and market failures in the Canary Islands and to propose an alternative management model for the telecommunication transmission network. This model is based on a wholesale-only open-access transmission network, available to all the retail service providers of the region and managed by a single entity subject to regulation with cost-based prices. The proposal is intended to inform the debate on implementing such regulatory models in the network industries, particularly for telecommunication submarine cables connecting archipelagos.
Design/methodology/approach
An empirical approach has been used, based on the observation and analysis of the regulatory policies applied to the wholesale transmission networks in the Canary Islands, Azores and Madeira archipelagos.
Findings
Results show a persistent margin squeeze on the retail broadband market in the Canary Islands, caused by the pricing strategy on the Spanish mainland-Canaries wholesale market, which in turn delays the entry of alternative operators and limits the development and efficiency of competition. The risk of duopoly collusion is also present on this wholesale market. Additionally, public aid will be needed to replace the systems connecting the non-capital islands and to provide redundancy to El Hierro. The proposed alternative might help prevent these problems. Finally, several insights are identified for further investigation.
Originality/value
Little attention has been paid in the literature to the analysis of regulatory policies applied to fiber-optic submarine cable infrastructures in fragmented territories such as archipelagos. Consequently, an empirical analysis of the regulatory policies adopted has been carried out to support this research.
Sigal Arie Erez, Tobias Blanke, Mike Bryant, Kepa Rodriguez, Reto Speck and Veerle Vanden Daelen
Abstract
Purpose
This paper aims to describe the European Holocaust Research Infrastructure (EHRI) project's ongoing efforts to virtually integrate trans-national archival sources via the reconstruction of collection provenance as it relates to copy collections (material copied from one archive to another) and the co-referencing of subject and authority terms across material held by distinct institutions.
Design/methodology/approach
This paper is a case study of approximately 6,000 words. The authors describe the scope of the problem of archival fragmentation from both cultural and technical perspectives, with particular focus on Holocaust-related material, and describe, with graph-based visualisations, two ways in which EHRI seeks to better integrate information about fragmented material.
Findings
As a case study, the principal contributions of this paper are reports on the authors' experience with extracting provenance-based connections between archival descriptions from encoded finding aids, and on the challenges of co-referencing access points in the absence of domain-specific controlled vocabularies.
Originality/value
Record linking in general is an important technique in computational approaches to humanities research and one that has rightly received significant attention from scholars. In the context of historical archives, however, the material itself is in most cases not digitised, meaning that computational attempts at linking must rely on finding aids, which are far less rich data sources. The EHRI project's work in this area is therefore quite pioneering and has implications for archival integration on a larger scale, where the disruptive potential of Linked Open Data is most obvious.
Amed Leiva-Mederos, Jose A. Senso, Yusniel Hidalgo-Delgado and Pedro Hipola
Abstract
Purpose
Information from Current Research Information Systems (CRIS) is stored in different formats, on platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology that allows management data to be processed from a single site, so as to take advantage of the capacity to link dispersed data found in different systems, platforms, sources and/or formats. Based on the functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources.
Design/methodology/approach
After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of re-utilization (copyright and licenses); and the "dimensions" included in the data set, as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework presented here implements semantic alignment and metadata crosswalks to convert information from three different systems (ABCD, Moodle and DSpace), integrating all the databases in a single RDF file.
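To illustrate the metadata-crosswalk idea, here is a minimal Python sketch using rdflib that maps fields from the three systems onto shared Dublin Core properties and merges everything into one RDF file. The field names and mappings are hypothetical; the paper's actual crosswalk tables may differ.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DC

# Hypothetical crosswalk: each source system's field name mapped to a shared
# Dublin Core property (illustrative only; not the paper's exact mappings).
CROSSWALK = {
    "abcd":   {"titulo": DC.title, "autor": DC.creator},
    "moodle": {"fullname": DC.title, "teacher": DC.creator},
    "dspace": {"dc.title": DC.title, "dc.contributor.author": DC.creator},
}

BASE = Namespace("http://example.org/cris/")

def merge_records(records):
    """records: iterable of (system, record_id, {field: value}) tuples.
    Returns a single RDF graph integrating records from all three systems."""
    g = Graph()
    for system, rec_id, fields in records:
        subject = BASE[f"{system}/{rec_id}"]
        mapping = CROSSWALK[system]
        for field, value in fields.items():
            if field in mapping:                  # aligned field -> shared property
                g.add((subject, mapping[field], Literal(value)))
    return g

g = merge_records([
    ("abcd", "42", {"titulo": "Semantic interoperability"}),
    ("dspace", "7", {"dc.title": "Semantic interoperability"}),
])
g.serialize(destination="cris.rdf", format="xml")  # one RDF file for all sources
```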
Findings
The paper also includes an evaluation, based on calculations of recall and precision, that compares the proposed model with identical queries made via the Open Archives Initiative and SQL, in order to estimate its efficiency. The results have been satisfactory, as semantic interoperability facilitates the exact retrieval of information.
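For reference, the recall and precision used in such comparisons are the standard set-based measures; a minimal Python sketch:

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_pos = len(retrieved & relevant)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(relevant) if relevant else 0.0
    return precision, recall
```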
Originality/value
The proposed model enhances the management of syntactic and semantic interoperability in the CRIS system designed. In a real usage setting, it achieves very positive results.
Jacopo Carmassi and Richard John Herring
Abstract
Purpose
The purpose of this paper is to analyze whether and how “living wills” and public disclosure of such resolution plans contribute to market discipline and the effective resolution of too big and too complex to fail banks.
Design/methodology/approach
The disorderly collapse of Lehman Brothers is analyzed. Large, systemically important banks are now required to prepare resolution plans (living wills), and in the USA, parts of these living wills must be disclosed to the public. The public component is analyzed with respect to its contribution to market discipline and to the effective resolution of banks considered too big and too complex to fail. In a statistical analysis of the publicly available sections of living wills, this information is contrasted with the legislative requirements.
Findings
The analysis of public disclosures of resolution plans shows that they are insufficient to facilitate market discipline and, in some instances, fail to enhance public understanding of the financial institution and its business. When coupled with the uncertainty over how an internationally active financial institution will be resolved, the paper concludes that these reforms will do little to reduce market expectations that some financial firms are simply too big or too complex to fail.
Research limitations/implications
The analysis is limited by a very small data set and by the necessity of cross-checking the authors' observations against all publicly available sources. The authors have also had to infer a purpose for the public disclosure of parts of resolution plans: the authorities are remarkably vague on the issue, and so the authors have assumed they did have a specific intent that would strengthen the system.
Practical implications
The inference from the publicly available portion of living wills is that the authorities are a very long way from abolishing too-big-to-fail.
Originality/value
So far as the authors know, this is the first in-depth analysis of the information available in the public sections of living wills.
Tsung Teng Chen and David C. Yen
Abstract
Purpose
The paper's aim is to document the development of a novel tool to address the inadequacies of existing cocitation visualization tools.
Design/methodology/approach
The paper demonstrates the visualized effects of this tool and supplements the results with a case study that utilizes a large data set to explore the cross‐field studies among different computer science fields.
Findings
The tool displays cocitation graphs with latent visual cues and allows direct manipulation of the visualized graphs. The tool also facilitates the exploration of the relationships between articles in the graphs.
Research limitations/implications
The indirect cocitation relationships are vividly visualized by the citation network itself, so the context that a conventional cocitation network loses may be preserved. Instead of being linked by explicit lines, the implicit cocitation relationships are shown by the closeness among the cocited nodes.
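The underlying relationship being visualised can be made concrete: two articles are cocited once for every document that cites both, and in a tool of this kind that count can drive how close their nodes are drawn. A minimal Python sketch of the computation, assuming the citation network is available as a simple adjacency mapping:

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citations):
    """citations: dict mapping each citing paper to the papers it cites.
    Returns a Counter of cocitation counts over unordered article pairs."""
    counts = Counter()
    for cited in citations.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

counts = cocitation_counts({"p1": ["a", "b", "c"], "p2": ["a", "b"]})
print(counts[("a", "b")])   # 2: both p1 and p2 cite a and b
```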
Practical implications
The preserved context of a cocitation network may facilitate the exploration of latent cross‐field studies. The cocitation visualization tool demonstrates that the context of the cocitation graph is preserved by using the citation network itself to reveal cocitation relationships.
Originality/value
The cocitation relationships are implied by the closeness among cocited nodes in a citation graph. The paper documents this novel approach.
Abstract
With the progress of new information and communication technologies, more and more producers of data exist, and the Web forms a huge repository for all these kinds of data. Unfortunately, existing data is often unreliable, because the same information appears in different sources and because data can be erroneous and incomplete. The aim of data integration systems is to offer the user a single interface to query a number of sources. A key challenge for such systems is dealing with conflicting information, whether from the same source or from different sources. In this paper, we present the resolution of conflicts at the instance level in two stages: reference reconciliation and data fusion. Reference reconciliation methods seek to decide whether two data descriptions refer to the same real-world entity. We define the principles of reconciliation methods and then classify reference reconciliation methods, first by how they use the descriptions of references and then by how they acquire knowledge. We finish this discussion with some open data reconciliation issues that are the subject of ongoing research. Data fusion, in turn, has the objective of merging duplicates into a single representation while resolving conflicts between the data. We first define the classification of conflicts, the strategies for dealing with them and the implementation of conflict management strategies. We then present the relational operators and data fusion techniques, and likewise finish by discussing some open data fusion issues that are the subject of ongoing research.
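To make the two stages concrete, the following is a minimal Python sketch of one possible pipeline: string similarity as the reconciliation criterion and a "most recent value wins" rule for fusion. Both choices are illustrative assumptions; the survey covers many alternative methods and strategies.

```python
from difflib import SequenceMatcher

def reconcile(records, threshold=0.85):
    """Reference reconciliation: group records whose descriptions are similar
    enough to be considered references to the same real-world entity.
    (String similarity is only one possible decision criterion.)"""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if SequenceMatcher(None, rec["name"], cluster[0]["name"]).ratio() >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def fuse(cluster):
    """Data fusion: merge duplicates into a single representation, resolving
    value conflicts here with a simple 'most recent source wins' strategy."""
    merged = {}
    for rec in sorted(cluster, key=lambda r: r["updated"]):
        merged.update({k: v for k, v in rec.items() if v})
    return merged
```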
Abstract
Two trends mark contemporary international scholarship on conflict and resolution. The scholarship on conflict has begun to look systematically at intra-state conflicts and to track the role of non-state actors, alongside the more established trend of analysing inter-state conflict. Conflict resolution has also moved beyond looking at states and national and global-level NGOs to the role of local, non-state actors in preventing and/or minimising conflict. While the "mainstream" scholarly work emphasises a linear process of reaching resolutions in the aftermath of a conflict (e.g. Burton, 1990; Galtung, 1965), a range of "related" scholarship has begun to focus on factors that prevent conflicts and their rapid diffusion over wider areas, as well as factors that contribute to longer-term, peaceful resolution (e.g. Das, Kleinman, Lock, Ramphele, & Reynolds, 2001; Sabet, 1998; Varshney, 2001). This related literature looks beyond political solutions such as conflict management, boundary adjustments and treaties, and beyond the role of international and national formal bodies in resolving and managing conflict; its emphasis is on conflict prevention, the healing of conflict victims, and building and sustaining peace. With the recognition, in the 21st century, of the escalating production and spread of weaponry, the power of non-state actors to generate significant conflict, and the rapidly growing proportion of people who suffer from and cope with the aftermath of such conflict, these expanded frames for understanding conflict and resolution require further attention.
Byung-Won On, Gyu Sang Choi and Soo-Mok Jung
Abstract
Purpose
The purpose of this paper is to collect and understand the nature of real cases of author name variants that have often appeared in bibliographic digital libraries (DLs) as a case study of the name authority control problem in DLs.
Design/methodology/approach
To find a sample of name variants across DLs (e.g. DBLP and ACM) and in a single DL (e.g. ACM), the approach is based on two bipartite matching algorithms: Maximum Weighted Bipartite Matching and Maximum Cardinality Bipartite Matching.
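As an illustration of weighted bipartite matching applied to name lists, here is a minimal Python sketch that uses string similarity as the edge weight and SciPy's assignment solver. The similarity function and the solver are assumptions for illustration; the abstract does not give the paper's exact formulation of the two matching algorithms.

```python
import numpy as np
from difflib import SequenceMatcher
from scipy.optimize import linear_sum_assignment

def match_name_lists(names_a, names_b):
    """Maximum weighted bipartite matching between the author names of two
    digital libraries, with string similarity as the edge weight."""
    weights = np.array([[SequenceMatcher(None, a, b).ratio() for b in names_b]
                        for a in names_a])
    rows, cols = linear_sum_assignment(weights, maximize=True)
    return [(names_a[i], names_b[j], weights[i, j]) for i, j in zip(rows, cols)]

pairs = match_name_lists(["J. Smith", "A. Kumar"], ["John Smith", "Anil Kumar"])
for a, b, w in pairs:
    print(f"{a} <-> {b} (similarity {w:.2f})")
```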
Findings
First, the authors validated the effectiveness and efficiency of the bipartite matching algorithms. The authors also studied the nature of real cases of author name variants that had been found across DLs (e.g. ACM, CiteSeer and DBLP) and in a single DL.
Originality/value
To the best of the authors' knowledge, there has been little research effort to understand the nature of author name variants found in DLs. A thorough analysis can help focus research effort on the real problems that arise when duplicate detection methods are applied.