Search results

1 – 10 of over 2000
Open Access
Article
Publication date: 8 February 2023

Edoardo Ramalli and Barbara Pernici

Abstract

Purpose

Experiments are the backbone of the development process of data-driven predictive models for scientific applications. The quality of the experiments directly impacts the model performance. Uncertainty inherently affects experiment measurements and is often missing in the available data sets due to its estimation cost. For similar reasons, experiments are very few compared to other data sources. Discarding experiments based on the missing uncertainty values would preclude the development of predictive models. Data profiling techniques are fundamental to assess data quality, but some data quality dimensions are challenging to evaluate without knowing the uncertainty. In this context, this paper aims to predict the missing uncertainty of the experiments.

Design/methodology/approach

This work presents a methodology to forecast the experiments’ missing uncertainty, given a data set and its ontological description. The approach is based on knowledge graph embeddings and leverages the task of link prediction over a knowledge graph representation of the experiments database. The validity of the methodology is first tested in multiple conditions using synthetic data and then applied to a large data set of experiments in the chemical kinetic domain as a case study.
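As a rough illustration of link prediction over knowledge graph embeddings, the sketch below scores candidate uncertainty classes for an experiment with a TransE-style distance. TransE is an assumption here, since the abstract does not name the embedding model, and the entity names, relation name and 2-D vectors are hypothetical, hand-picked values rather than learned embeddings.

```python
import math

# Hypothetical 2-D embeddings, hand-picked for illustration (not learned).
entity = {
    "exp_1": (0.0, 0.0),
    "low_uncertainty": (1.0, 0.0),
    "high_uncertainty": (0.0, 1.0),
}
relation = {"has_uncertainty": (0.9, 0.1)}


def transe_score(head, rel, tail):
    """TransE-style plausibility: negative distance between head + rel and tail."""
    h, r, t = entity[head], relation[rel], entity[tail]
    translated = [h[i] + r[i] for i in range(len(h))]
    return -math.dist(translated, t)


def predict_tail(head, rel, candidates):
    """Link prediction: return the candidate tail with the highest score."""
    return max(candidates, key=lambda c: transe_score(head, rel, c))


best = predict_tail("exp_1", "has_uncertainty",
                    ["low_uncertainty", "high_uncertainty"])
```

In a real setting the vectors would be trained on the triples of the experiments knowledge graph, and the missing uncertainty link would be filled with the top-ranked candidate.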

Findings

The analysis results of different test case scenarios suggest that knowledge graph embedding can be used to predict the missing uncertainty of the experiments when there is a hidden relationship between the experiment metadata and the uncertainty values. The link prediction task is also resilient to random noise in the relationship. The knowledge graph embedding outperforms the baseline results if the uncertainty depends upon multiple metadata.

Originality/value

The employment of knowledge graph embedding to predict the missing experimental uncertainty is a novel alternative to the current and more costly techniques in the literature. Such contribution permits a better data quality profiling of scientific repositories and improves the development process of data-driven models based on scientific experiments.

Open Access
Article
Publication date: 13 October 2022

Neha Keshan, Kathleen Fontaine and James A. Hendler

Abstract

Purpose

This paper aims to describe the “InDO: Institute Demographic Ontology” and demonstrates the InDO-based semiautomated process for both generating and extending a knowledge graph to provide a comprehensive resource for marginalized US graduate students. The knowledge graph currently consists of instances related to the semistructured National Science Foundation Survey of Earned Doctorates (NSF SED) 2019 analysis report data tables. These tables contain summary statistics of an institute’s doctoral recipients based on a variety of demographics. Incorporating institute Wikidata links ultimately produces a table of unique, clearly readable data.

Design/methodology/approach

The authors use a customized semantic extract transform and loader (SETLr) script to ingest data from 2019 US doctoral-granting institute tables and preprocessed NSF SED Tables 1, 3, 4 and 9. The generated InDO knowledge graph is evaluated using two methods. First, the authors compare the SPARQL results of competency questions across the semiautomatically and manually generated graphs. Second, the authors expand the questions to provide a better picture of an institute’s doctoral-recipient demographics within study fields.

Findings

With some preprocessing and restructuring of the NSF SED highly interlinked tables into a more parsable format, one can build the required knowledge graph using a semiautomated process.

Originality/value

The InDO knowledge graph allows the integration of US doctoral-granting institutes’ demographic data, based on NSF SED data tables, and its presentation in machine-readable form using a new semiautomated methodology.

Details

International Journal of Web Information Systems, vol. 18 no. 5/6
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 29 June 2020

Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi

Abstract

Purpose

Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from searching and browsing content to consuming statistics for monitoring and providing feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections or address local topology-driven challenges, and therefore cannot be re-used in other contexts.

Design/methodology/approach

This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrarily large information graphs. The paper presents its high-level architecture and its implementation as a service used within the OpenAIRE infrastructure system, and reports figures from real-case experiments.

Findings

GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph.

Originality/value

To our knowledge, GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.

Details

Data Technologies and Applications, vol. 54 no. 4
Type: Research Article
ISSN: 2514-9288

Open Access
Book part
Publication date: 4 May 2018

Wilda Sitorus, Saib Suwilo and Mardiningsih

Abstract

The Hamming distance of two bit strings u and v of length n is defined to be the number of positions at which u and v have different digits. If G is a simple graph on n vertices and m edges and B is an edge–vertex incidence matrix of G, then every edge e of G can be labeled using a binary digit string of length n from the row of B which corresponds to the edge e. We discuss the Hamming distance of two different edges of the graph G. Then, we present formulae for the sum of all Hamming distances between two different edges of G, particularly when G is a path, a cycle, a wheel, or one of some composite graphs.
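The definitions above can be checked by brute force on small graphs. The following sketch labels each edge with its incidence-matrix row and sums the Hamming distances over all pairs of distinct edges. The function names and the 0-based vertex numbering are assumptions made for illustration, and the code does not reproduce the closed-form formulae of the chapter.

```python
from itertools import combinations


def incidence_rows(n, edges):
    """Return one 0/1 row of length n per edge (edge-vertex incidence matrix)."""
    rows = []
    for u, v in edges:
        row = [0] * n
        row[u] = 1
        row[v] = 1
        rows.append(row)
    return rows


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def total_edge_hamming(n, edges):
    """Sum of Hamming distances over all unordered pairs of distinct edges."""
    rows = incidence_rows(n, edges)
    return sum(hamming(a, b) for a, b in combinations(rows, 2))


def path_edges(n):
    """Edges of the path on vertices 0..n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def cycle_edges(n):
    """Edges of the cycle on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]
```

In a simple graph two distinct edges share at most one vertex, so each pair contributes 2 when the edges are adjacent and 4 when they are disjoint. For the path on 4 vertices this gives 2 + 2 + 4 = 8.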

Open Access
Article
Publication date: 1 June 2016

Hilda Freimuth

Abstract

This study analyzed 24 IELTS Task One (data explanation) prompts for task type, diagram type, subject matter, level of critical thought, and geographical references, in order to determine whether Emirati university students’ anecdotal claims of cultural bias on the IELTS academic writing exam (as experienced by the researcher in the past decade of teaching IELTS in the United Arab Emirates) are valid. The analysis found that the majority of the task types (88%) were non-process in nature (i.e. required the description of data in the form of a chart or graph, rather than the description of a process); 40% of the non-process prompts consisted of more than one diagram. The analysis revealed that 33% of the non-process prompts included bar graphs and 29% included line graphs. Pie charts appeared in 25% of the prompts and tables in only 17%. An Emirati student English preparatory program survey indicated the pie chart as the easiest to understand – a finding that may highlight a difference between the most commonly used IELTS prompt and the students’ prompt preference. A content analysis of topics found a high percentage (58%) of subject matter related to the social sciences, with 79% of the geographical references pertaining to Western contexts. An analysis of the amount of critical thought needed for graph interpretation revealed 52% of non-process prompts required some form of critical thought. The study therefore found that the cultural bias perceived by Emirati students has some validity, given the students’ socio-cultural and educational background.

Details

Learning and Teaching in Higher Education: Gulf Perspectives, vol. 13 no. 1
Type: Research Article
ISSN: 2077-5504

Open Access
Article
Publication date: 24 February 2021

Bikash Barman and Kukil Kalpa Rajkhowa

Abstract

Purpose

The authors study the interdisciplinary relation between graphs and the algebraic structure of rings by defining a new graph, namely the “non-essential sum graph”. The non-essential sum graph, denoted by NES(R), of a commutative ring R with unity is an undirected graph whose vertex set is the collection of all non-essential ideals of R, and any two vertices are adjacent if and only if their sum is also a non-essential ideal of R.

Design/methodology/approach

The method is theoretical.

Findings

The authors obtain some properties of NES(R) related to connectedness, diameter, girth, completeness, cut vertex, r-partition and regular character. The clique number, independence number and domination number of NES(R) are also found.

Originality/value

The paper is original.

Details

Arab Journal of Mathematical Sciences, vol. 28 no. 1
Type: Research Article
ISSN: 1319-5166

Open Access
Article
Publication date: 6 September 2021

Gerd Hübscher, Verena Geist, Dagmar Auer, Nicole Hübscher and Josef Küng

Abstract

Purpose

Knowledge- and communication-intensive domains still long for a better support of creativity that considers legal requirements, compliance rules and administrative tasks as well, because current systems focus either on knowledge representation or business process management. The purpose of this paper is to discuss our model of integrated knowledge and business process representation and its presentation to users.

Design/methodology/approach

The authors follow a design science approach in the environment of patent prosecution, which is characterized by a highly standardized, legally prescribed process and individual knowledge work. Thus, the research is based on knowledge work, BPM, graph-based knowledge representation and user interface design. The authors iteratively designed and built a model and a prototype. To evaluate the approach, the authors used analytical proof of concept, real-world test scenarios and case studies in real-world settings, where the authors conducted observations and open interviews.

Findings

The authors designed a model and implemented a prototype for evolving and storing static and dynamic aspects of knowledge. The proposed solution leverages the flexibility of a graph-based model to enable not only open, continuously developing user-centered processes but also pre-defined ones. The authors further propose a user interface concept which helps users benefit from the richness of the model while providing sufficient guidance.

Originality/value

The balanced integration of the data and task perspectives distinguishes the model significantly from other approaches such as BPM or knowledge graphs. The authors further provide a sophisticated user interface design, which allows the users to effectively and efficiently use the graph-based knowledge representation in their daily work.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 9 October 2023

Aya Khaled Youssef Sayed Mohamed, Dagmar Auer, Daniel Hofer and Josef Küng

Abstract

Purpose

Data protection requirements have increased considerably due to the rising awareness of data security, legal requirements and technological developments. Today, NoSQL databases are increasingly used in security-critical domains. Current survey works on databases and data security only consider authorization and access control in a very general way and do not address most of today’s sophisticated requirements. Accordingly, the purpose of this paper is to discuss authorization and access control for relational and NoSQL database models in detail with respect to requirements and the current state of the art.

Design/methodology/approach

This paper follows a systematic literature review approach to study authorization and access control for different database models. Starting with a review of survey works on authorization and access control in databases, the study continues with the identification and definition of advanced authorization and access control requirements, which are generally applicable to any database model. This paper then discusses and compares current database models based on these requirements.

Findings

As no survey works consider requirements for authorization and access control in different database models so far, the authors define their requirements. Furthermore, the authors discuss the current state of the art for the relational, key-value, column-oriented, document-based and graph database models in comparison to the defined requirements.

Originality/value

This paper focuses on authorization and access control for various database models, not concrete products. This paper identifies today’s sophisticated – yet general – requirements from the literature and compares them with research results and access control features of current products for the relational and NoSQL database models.

Details

International Journal of Web Information Systems, vol. 20 no. 1
Type: Research Article
ISSN: 1744-0084

Open Access
Article
Publication date: 18 October 2022

Ramy Shaheen, Suhail Mahfud and Ali Kassem

Abstract

Purpose

This paper aims to study irreversible conversion processes, which examine the spread of a one-way change of state (from state 0 to state 1) through a specified society (the spread of disease through populations, the spread of opinion through social networks, etc.) where the conversion rule is determined at the beginning of the study. These processes can be modeled as graph-theoretical models where the vertex set V(G) represents the set of individuals on which the conversion is spreading.

Design/methodology/approach

The irreversible k-threshold conversion process on a graph G = (V, E) is an iterative process which starts by choosing a set S_0 ⊆ V, and for each step t (t = 1, 2, …), S_t is obtained from S_(t−1) by adjoining all vertices that have at least k neighbors in S_(t−1). S_0 is called the seed set of the k-threshold conversion process and is called an irreversible k-threshold conversion set (IkCS) of G if S_t = V(G) for some t ≥ 0. The minimum cardinality of all the IkCSs of G is referred to as the irreversible k-threshold conversion number of G and is denoted by C_k(G).
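The process described above can be simulated directly on small graphs. The sketch below assumes a graph given as an adjacency dictionary and finds the conversion number by exhaustive search over seed sets, which is only feasible for a handful of vertices. The function names are hypothetical and this is not the authors' method for the graph families studied in the paper.

```python
from itertools import combinations


def converts_all(adj, seed, k):
    """Run the irreversible k-threshold process from `seed`.

    Returns True if every vertex of the graph is eventually converted.
    """
    converted = set(seed)
    while True:
        # Vertices not yet converted that have at least k converted neighbors.
        new = {
            v for v in adj
            if v not in converted
            and sum(1 for u in adj[v] if u in converted) >= k
        }
        if not new:
            return len(converted) == len(adj)
        converted |= new


def conversion_number(adj, k):
    """Smallest size of a seed set that converts the whole graph (brute force)."""
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for seed in combinations(vertices, size):
            if converts_all(adj, seed, k):
                return size


# Path on 4 vertices and cycle on 4 vertices as adjacency dictionaries.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

On the path with k = 2, both endpoints have only one neighbor and so must be seeded, and one interior vertex is also needed, which gives a conversion number of 3. On the 4-cycle, two opposite vertices suffice.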

Findings

In this paper the authors determine C_k(G) for the generalized Jahangir graph J_(s,m) for 1 < k ≤ m and arbitrary s and m. The authors also determine C_k(G) for strong grids P_2 ⊠ P_n when k = 4, 5. Finally, the authors determine C_2(G) for P_n ⊠ P_n when n is arbitrary.

Originality/value

This work is original and has important uses in real-life problems such as anti-bioterrorism.

Details

Arab Journal of Mathematical Sciences, vol. 30 no. 1
Type: Research Article
ISSN: 1319-5166

Open Access
Article
Publication date: 30 March 2023

Sofia Baroncini, Bruno Sartini, Marieke Van Erp, Francesca Tomasi and Aldo Gangemi

Abstract

Purpose

In the last few years, the size of Linked Open Data (LOD) describing artworks, in general or domain-specific Knowledge Graphs (KGs), has been gradually increasing. This provides (art-)historians and Cultural Heritage professionals with a wealth of information to explore. Specifically, structured data about iconographical and iconological (icon) aspects, i.e. information about the subjects, concepts and meanings of artworks, are extremely valuable for state-of-the-art computational tools, e.g. content recognition through computer vision. Nevertheless, a data quality evaluation for art domains, fundamental for data reuse, is still missing. The purpose of this study is to fill this gap with an overview of art-historical data quality in current KGs with a focus on the icon aspects.

Design/methodology/approach

This study’s analyses are based on established KG evaluation methodologies, adapted to the domain by addressing requirements from art historians’ theories. The authors first select several KGs according to Semantic Web principles. Then, the authors evaluate (1) their structures’ suitability to describe icon information through quantitative and qualitative assessment and (2) their content, qualitatively assessed in terms of correctness and completeness.

Findings

This study’s results reveal several issues on the current expression of icon information in KGs. The content evaluation shows that these domain-specific statements are generally correct but often not complete. The incompleteness is confirmed by the structure evaluation, which highlights the unsuitability of the KG schemas to describe icon information with the required granularity.

Originality/value

The main contribution of this work is an overview of the current landscape of the icon information expressed in LOD. Therefore, it is valuable to cultural institutions by providing them with a first domain-specific data quality evaluation. Since this study’s results suggest that the selected domain information is underrepresented in Semantic Web datasets, the authors highlight the need for the creation and fostering of such information to provide a more thorough art-historical dimension to LOD.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418
