Search results

1 – 4 of 4
Article

David Martín-Moncunill, Miguel Angel Sicilia-Urban, Elena García-Barriocanal and Christian M. Stracke

Abstract

Purpose

The common understanding of generalization/specialization relations assumes the relation to be equally strong between a classifier and any of its related classifiers, and also at every level of the hierarchy. This assumption has been considered an oversimplification of the psychological account of real-world relations. Assigning a grade of relative distance to represent the level of similarity between related pairs of classifiers could correct it. The paper aims to discuss these issues.

Design/methodology/approach

The evaluation followed an end-user perspective. To obtain a consistent data set of specialization distances, a group of 21 people was asked to assign values to a set of relations from a selection of terms from the AGROVOC thesaurus. Two sets of representations of the relations between the terms were then built, one according to the calculated concept specialization weights and the other following the original order of the thesaurus. In total, 40 people were asked to choose between the two sets in an A/B test-like experiment. Finally, short interviews were carried out after the test to inquire about their decisions.
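The step from rater values to an ordering of terms can be illustrated with a minimal sketch. It assumes the specialization weight of a relation is the mean of the values assigned by the raters, which the abstract does not state; the term names, the ratings and the functions `specialization_weights` and `order_narrower` are hypothetical.

```python
from statistics import mean

# Hypothetical ratings: each (broader, narrower) relation maps to the
# distance values assigned by individual raters (made-up numbers).
ratings = {
    ("broader_term", "narrower_x"): [2, 3, 2],
    ("broader_term", "narrower_y"): [1, 1, 2],
    ("broader_term", "narrower_z"): [4, 5, 4],
}


def specialization_weights(ratings):
    """Aggregate rater values per relation (assumption: arithmetic mean)."""
    return {pair: mean(values) for pair, values in ratings.items()}


def order_narrower(broader, weights):
    """List the narrower terms of `broader`, closest (smallest weight) first."""
    children = [(n, w) for (b, n), w in weights.items() if b == broader]
    return [n for n, _ in sorted(children, key=lambda item: item[1])]


weights = specialization_weights(ratings)
ordered = order_narrower("broader_term", weights)
print(ordered)
```

An ordering like this could stand in for the weight-based representation, to be compared against the original thesaurus order.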

Findings

The results show that the use of this information could be a valuable tool for search and information retrieval purposes and for the visual representation of knowledge organization systems (KOS). Furthermore, the methodology followed in the study turned out to be useful for detecting inconsistencies in the thesaurus and could thus be used for quality control and optimization of the hierarchical relations.

Originality/value

The use of this relative distance information, namely, “concept specialization distance,” has been proposed mainly at a theoretical level. In the current experiment, the authors evaluate the potential use of this information from an end-user perspective, not only for text-based interfaces but also its application for the visual representation of KOS. Finally, the methodology followed for the elaboration of the concept specialization distance data set showed potential for detecting possible inconsistencies in KOS.

Details

Online Information Review, vol. 41 no. 6
Type: Research Article
ISSN: 1468-4527

Article

Alberto Nogales, Miguel Angel Sicilia-Urban and Elena García-Barriocanal

Abstract

Purpose

This paper reports on a quantitative study of data gathered from the Linked Open Vocabularies (LOV) catalogue, including the use of network analysis and metrics. The purpose of this paper is to gain insights into the structure of LOV and the use of vocabularies in the Web of Data. It is important to note that not all the vocabularies used in the Web of Data are registered in LOV. Given the de-centralised and collaborative nature of the use and adoption of these vocabularies, the results of the study can be used to identify emergent important vocabularies that are shaping the Web of Data.

Design/methodology/approach

The methodology is based on an analytical approach to a data set that captures a complete snapshot of the LOV catalogue dated April 2014. An initial analysis of the data is presented in order to obtain insights into the characteristics of the vocabularies found in LOV. This is followed by an analysis of the use of Vocabulary of a Friend properties that describe relations among vocabularies. Finally, the study is complemented with an analysis of the usage of the different vocabularies, and concludes by proposing a number of metrics.
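As a rough illustration of the kind of network metric such an analysis might use, the sketch below counts how many vocabularies point to each vocabulary (in-degree). This is an assumed example metric, not necessarily one the paper proposes; the vocabulary names and edges are placeholders, not LOV data.

```python
from collections import Counter

# Hypothetical (source, target) pairs meaning
# "the source vocabulary relies on the target vocabulary".
edges = [
    ("vocab_a", "vocab_core"),
    ("vocab_b", "vocab_core"),
    ("vocab_c", "vocab_core"),
    ("vocab_c", "vocab_a"),
]


def in_degree(edge_list):
    """Count incoming relations per vocabulary."""
    return Counter(target for _, target in edge_list)


# Vocabularies sorted from most to least referenced.
ranking = in_degree(edges).most_common()
print(ranking)
```

Vocabularies with a high in-degree are candidates for the "important" vocabularies the abstract mentions, under this simplified reading.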

Findings

The most relevant insight is that, unsurprisingly, the vocabularies with the greatest presence are those used to model Semantic Web data, such as Resource Description Framework, RDF Schema and OWL, as well as broadly used standards such as Simple Knowledge Organization System, DCTERMS and DCE. It was also discovered that the most used language is English and that the vocabularies are not considered to be highly specialised in a field. Also, the vocabularies have no dominant scope. Regarding the structural analysis, it is concluded that LOV is a heterogeneous network.

Originality/value

The paper provides an empirical analysis of the structure of LOV and the relations between its vocabularies, together with some metrics that may be of help to determine the important vocabularies from a practical perspective. The results are of interest for a better understanding of the evolution and dynamics of the Web of Data, and for applications that attempt to retrieve data in the Linked Data Cloud. These applications can benefit from the insights into the important vocabularies to be supported and the value added when mapping between and using the vocabularies.

Details

Online Information Review, vol. 41 no. 2
Type: Research Article
ISSN: 1468-4527

Article

Rutilio Rodolfo López Barbosa, Salvador Sánchez-Alonso and Miguel Angel Sicilia-Urban

Abstract

Purpose

The purpose of this paper is to assess the reliability of numerical ratings of hotels calculated by three sentiment analysis algorithms.

Design/methodology/approach

More than one million reviews and numerical ratings of hotels in seven cities in four countries were extracted from the TripAdvisor website. Reviews were classified as positive or negative using three sentiment analysis tools. The percentage of positive reviews was used to predict numerical ratings that were then compared with actual ratings.
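The prediction step can be sketched as follows. The abstract does not say how the percentage of positive reviews is converted into a rating, so the linear mapping onto a 1 to 5 scale below is an assumption, as are the function names, the sample labels and the use of mean absolute error for the comparison.

```python
from statistics import mean


def predicted_rating(labels, low=1.0, high=5.0):
    """Map the share of positive reviews linearly onto [low, high] (assumed mapping)."""
    if not labels:
        raise ValueError("at least one review label is required")
    share_positive = sum(1 for label in labels if label == "positive") / len(labels)
    return low + (high - low) * share_positive


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual ratings."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Made-up example: classifier labels per hotel and the hotel's actual rating.
hotels = {
    "hotel_a": (["positive", "positive", "positive", "negative"], 4.0),
    "hotel_b": (["positive", "negative"], 3.5),
}

predictions = {name: predicted_rating(labels) for name, (labels, _) in hotels.items()}
actuals = [rating for _, rating in hotels.values()]
error = mean_absolute_error(predictions.values(), actuals)
print(predictions, error)
```

Running the same comparison with the labels from each sentiment tool would give one error figure per tool, which is one simple way to contrast them.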

Findings

All tools classified reviews as positive or negative in a way that correlated positively with numerical ratings. More complex algorithms worked better, yet predicted ratings showed reasonable agreement with actual ratings for most cities. Predictions for hotels were less reliable if based on less than 50-60 percent of available reviews.

Practical implications

These results validate that sentiment analysis can be used to transform unstructured qualitative data on user opinion into quantitative ratings. Current tools may be useful for summarizing the opinions in user reviews of products and services on web sites that do not require users to post numerical ratings, such as traveler forums. Such summaries may be valuable not just to potential users but also to service and product providers, and they offer validation and benchmarking for future improvement of opinion mining and prediction techniques.

Originality/value

This work assesses the correlation between sentiment analysis of hotels’ reviews and their actual ratings. The authors also evaluated the reliability of results of sentiment analysis calculated by three different algorithms.

Details

Aslib Journal of Information Management, vol. 67 no. 4
Type: Research Article
ISSN: 2050-3806

Article

David Martín-Moncunill, Miguel-Ángel Sicilia-Urban, Elena García-Barriocanal and Salvador Sánchez-Alonso

Abstract

Purpose

Large terminologies usually contain a mix of terms that are either generic or domain specific, which makes the use of the terminology itself a difficult task that may limit the positive effects of these systems. The purpose of this paper is to systematically evaluate the degree of domain specificity of the AGROVOC controlled vocabulary terms as a representative of a large terminology in the agricultural domain and discuss the generic/specific boundaries across its hierarchy.

Design/methodology/approach

The paper combines a user-oriented study with domain experts and a quantitative, systematic analysis. First, an in-depth analysis of AGROVOC was carried out to make a proper selection of terms for the experiment. Then domain experts were asked to classify the terms according to their domain specificity. An evaluation was conducted to analyse the domain experts' results. Finally, the resulting data set was automatically compared with the terms in SUMO, an upper ontology, and MILO, a mid-level ontology, to analyse the coincidences.
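The automatic comparison in the last step could look roughly like the sketch below. It assumes coincidences are found by matching term labels after lower-casing and whitespace normalisation, which the abstract does not specify; the term lists are placeholders, not actual AGROVOC, SUMO or MILO content, and the function names are hypothetical.

```python
def normalise(term):
    """Lower-case a label and collapse internal whitespace."""
    return " ".join(term.lower().split())


def coincidences(expert_labels, ontology_terms):
    """Return the expert-classified terms that also appear in the ontology."""
    ontology = {normalise(t) for t in ontology_terms}
    return {
        term: label
        for term, label in expert_labels.items()
        if normalise(term) in ontology
    }


# Placeholder data: expert classification and an ontology term list.
expert_labels = {"Water": "generic", "Crop rotation": "specific", "Time": "generic"}
upper_terms = ["water", "time", "process"]

matched = coincidences(expert_labels, upper_terms)
# Share of matched terms that the experts had labelled generic.
generic_share = (
    sum(1 for label in matched.values() if label == "generic") / len(matched)
    if matched
    else 0.0
)
print(matched, generic_share)
```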

Findings

Results show the existence of a high number of generic terms. The motivation for several of the unclear cases is also described. The automatic evaluation showed that there is no direct way to assess the specificity degree of a term by using the SUMO and MILO ontologies. However, it provided additional validation of the results gathered from the domain experts.

Research limitations/implications

The “domain-analysis” concept has long been discussed and can be addressed from different perspectives. A summary of these perspectives and an explanation of the approach followed in this experiment are included in the background section.

Originality/value

The authors propose an approach to identify the domain specificity of terms in large domain-specific terminologies and a criterion to measure the overall domain specificity of a knowledge organisation system, based on domain-expert analysis. The authors also provide a first insight into using automated measures to determine the degree to which a given term can be considered domain specific. The resulting data set from the domain experts' evaluation can be reused as a gold standard for further research on these automatic measures.

Details

Online Information Review, vol. 39 no. 3
Type: Research Article
ISSN: 1468-4527
