Search results
1 – 10 of over 2000
Edoardo Ramalli and Barbara Pernici
Abstract
Purpose
Experiments are the backbone of the development process of data-driven predictive models for scientific applications. The quality of the experiments directly impacts the model performance. Uncertainty inherently affects experiment measurements and is often missing in the available data sets due to its estimation cost. For similar reasons, experiments are very few compared to other data sources. Discarding experiments based on the missing uncertainty values would preclude the development of predictive models. Data profiling techniques are fundamental to assess data quality, but some data quality dimensions are challenging to evaluate without knowing the uncertainty. In this context, this paper aims to predict the missing uncertainty of the experiments.
Design/methodology/approach
This work presents a methodology to forecast the experiments’ missing uncertainty, given a data set and its ontological description. The approach is based on knowledge graph embeddings and leverages the task of link prediction over a knowledge graph representation of the experiments database. The validity of the methodology is first tested in multiple conditions using synthetic data and then applied to a large data set of experiments in the chemical kinetic domain as a case study.
Findings
The analysis results of different test case scenarios suggest that knowledge graph embedding can be used to predict the missing uncertainty of the experiments when there is a hidden relationship between the experiment metadata and the uncertainty values. The link prediction task is also resilient to random noise in the relationship. The knowledge graph embedding outperforms the baseline results if the uncertainty depends upon multiple metadata.
Originality/value
The employment of knowledge graph embedding to predict the missing experimental uncertainty is a novel alternative to the current and more costly techniques in the literature. Such contribution permits a better data quality profiling of scientific repositories and improves the development process of data-driven models based on scientific experiments.
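The link-prediction idea in this entry can be sketched with a toy TransE-style scoring function. Everything below is illustrative and assumed, not from the paper's dataset: the entity and relation names are hypothetical, and the hand-set 2-d embeddings stand in for vectors a real model would learn.

```python
# Minimal sketch of link prediction with translational (TransE-style)
# embeddings. All names and vectors are hypothetical illustrations.

def score(h, r, t):
    """TransE plausibility score: negative L1 distance of h + r - t.
    Higher (closer to 0) means the triple is more plausible."""
    return -sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

# Toy 2-d embeddings: an experiment entity, a "hasUncertainty" relation
# and two candidate uncertainty-level entities.
entities = {
    "experiment_1":     [0.0, 0.0],
    "uncertainty_low":  [1.0, 0.0],
    "uncertainty_high": [0.0, 1.0],
}
relation_has_uncertainty = [1.0, 0.0]

# Link prediction: rank candidate tails for (experiment_1, hasUncertainty, ?)
candidates = ["uncertainty_low", "uncertainty_high"]
ranked = sorted(
    candidates,
    key=lambda t: score(entities["experiment_1"],
                        relation_has_uncertainty,
                        entities[t]),
    reverse=True,
)
print(ranked[0])  # → uncertainty_low (best-scoring missing link)
```

A trained model would learn these embeddings from the observed triples of the experiments knowledge graph; the prediction step, however, is exactly this ranking of candidate tails by score.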
Neha Keshan, Kathleen Fontaine and James A. Hendler
Abstract
Purpose
This paper aims to describe the “InDO: Institute Demographic Ontology” and demonstrates the InDO-based semiautomated process for both generating and extending a knowledge graph to provide a comprehensive resource for marginalized US graduate students. The knowledge graph currently consists of instances related to the semistructured National Science Foundation Survey of Earned Doctorates (NSF SED) 2019 analysis report data tables. These tables contain summary statistics of an institute’s doctoral recipients based on a variety of demographics. Incorporating institute Wikidata links ultimately produces a table of unique, clearly readable data.
Design/methodology/approach
The authors use a customized semantic extract transform and loader (SETLr) script to ingest data from 2019 US doctoral-granting institute tables and preprocessed NSF SED Tables 1, 3, 4 and 9. The generated InDO knowledge graph is evaluated using two methods. First, the authors compare the SPARQL results of competency questions over both the semiautomatically and manually generated graphs. Second, the authors expand the questions to provide a better picture of an institute’s doctoral-recipient demographics within study fields.
Findings
With some preprocessing and restructuring of the NSF SED highly interlinked tables into a more parsable format, one can build the required knowledge graph using a semiautomated process.
Originality/value
The InDO knowledge graph allows the integration of US doctoral-granting institutes demographic data based on NSF SED data tables and presentation in machine-readable form using a new semiautomated methodology.
Paolo Manghi, Claudio Atzori, Michele De Bonis and Alessia Bardi
Abstract
Purpose
Several online services offer functionalities to access information from “big research graphs” (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from search and browse functionality to the consumption of statistics for monitoring and provision of feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and current problem, existing solutions are dedicated to specific scenarios, operate on flat collections, address local topology-driven challenges and cannot therefore be re-used in other contexts.
Design/methodology/approach
This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrary large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system and reports numbers of real-case experiments.
Findings
GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph.
Originality/value
To our knowledge GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, national funders and institutions.
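The generic workflow this entry describes, identifying candidate duplicates and merging them into disambiguated entities, can be sketched in a few lines. The records and the blocking key below are hypothetical illustrations; GDup's actual matching functions are configurable and far richer.

```python
# Sketch of a deduplication workflow: block records by a crude similarity
# key, then merge each block into one entity. Records are hypothetical.

from collections import defaultdict

def normalize(title):
    """Crude blocking key: lowercase alphanumeric characters only."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

records = [
    {"id": "r1", "title": "Knowledge Graphs: An Overview"},
    {"id": "r2", "title": "knowledge graphs - an overview"},
    {"id": "r3", "title": "Entity Deduplication at Scale"},
]

# Group records sharing a key; each group becomes one merged entity.
groups = defaultdict(list)
for rec in records:
    groups[normalize(rec["title"])].append(rec["id"])

merged = sorted(sorted(ids) for ids in groups.values())
print(merged)  # → [['r1', 'r2'], ['r3']]
```

A production system replaces the single blocking key with configurable similarity functions and adds the Ground Truth and curator-feedback steps the abstract mentions, but the identify-then-merge shape stays the same.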
Wilda Sitorus, Saib Suwilo and Mardiningsih
Abstract
The Hamming distance of two bit strings u and v of length n is defined to be the number of positions at which u and v differ. If G is a simple graph on n vertices and m edges and B is an edge–vertex incidence matrix of G, then every edge e of G can be labeled with a binary string of length n, namely the row of B that corresponds to e. We discuss the Hamming distance of two distinct edges of the graph G. Then, we present formulae for the sum of all Hamming distances between two distinct edges of G, particularly when G is a path, a cycle, a wheel, and some composite graphs.
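The construction above is concrete enough to compute directly: label each edge by its row in the incidence matrix and sum pairwise Hamming distances. The path P_4 used here is just a worked example, not a result claimed by the paper.

```python
# Edge labels from the edge-vertex incidence matrix, and the sum of
# pairwise Hamming distances, for the path P_4 as an illustration.

from itertools import combinations

def incidence_row(edge, n):
    """Bit string of length n with 1s at the edge's two endpoints."""
    return [1 if v in edge else 0 for v in range(n)]

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Path P_4: vertices 0..3, edges (0,1), (1,2), (2,3).
n = 4
edges = [(0, 1), (1, 2), (2, 3)]
labels = [incidence_row(e, n) for e in edges]

# Adjacent edges share one endpoint, so their rows differ in 2 positions;
# vertex-disjoint edges differ in all 4 endpoint positions.
total = sum(hamming(u, v) for u, v in combinations(labels, 2))
print(total)  # → 8  (2 + 4 + 2 for the three edge pairs of P_4)
```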
Abstract
This study analyzed 24 IELTS Task One (data explanation) prompts for task type, diagram type, subject matter, level of critical thought, and geographical references, in order to determine whether Emirati university students’ anecdotal claims of cultural bias on the IELTS academic writing exam (as experienced by the researcher in the past decade of teaching IELTS in the United Arab Emirates) are valid. The analysis found that the majority of the task types (88%) were non-process in nature (i.e. required the description of data in the form of a chart or graph, rather than the description of a process); 40% of the non-process prompts consisted of more than one diagram. The analysis revealed that 33% of the non-process prompts included bar graphs and 29% included line graphs. Pie charts appeared in 25% of the prompts and tables in only 17%. An Emirati student English preparatory program survey indicated the pie chart as the easiest to understand – a finding that may highlight a difference between the most commonly used IELTS prompt and the students’ prompt preference. A content analysis of topics found a high percentage (58%) of subject matter related to the social sciences, with 79% of the geographical references pertaining to Western contexts. An analysis of the amount of critical thought needed for graph interpretation revealed 52% of non-process prompts required some form of critical thought. The study therefore found that the cultural bias perceived by Emirati students has some validity, given the students’ socio-cultural and educational background.
Bikash Barman and Kukil Kalpa Rajkhowa
Abstract
Purpose
The authors study the interdisciplinary relation between graphs and the algebraic structure of rings by defining a new graph, namely the “non-essential sum graph”. The non-essential sum graph of a commutative ring R with unity, denoted by NES(R), is an undirected graph whose vertex set is the collection of all non-essential ideals of R, with any two vertices adjacent if and only if their sum is also a non-essential ideal of R.
Design/methodology/approach
The method is theoretical.
Findings
The authors obtain some properties of NES(R) related with connectedness, diameter, girth, completeness, cut vertex, r-partition and regular character. The clique number, independence number and domination number of NES(R) are also found.
Originality/value
The paper is original.
Gerd Hübscher, Verena Geist, Dagmar Auer, Nicole Hübscher and Josef Küng
Abstract
Purpose
Knowledge- and communication-intensive domains still long for a better support of creativity that considers legal requirements, compliance rules and administrative tasks as well, because current systems focus either on knowledge representation or business process management. The purpose of this paper is to discuss our model of integrated knowledge and business process representation and its presentation to users.
Design/methodology/approach
The authors follow a design science approach in the environment of patent prosecution, which is characterized by a highly standardized, legally prescribed process and individual knowledge work. Thus, the research is based on knowledge work, BPM, graph-based knowledge representation and user interface design. The authors iteratively designed and built a model and a prototype. To evaluate the approach, the authors used analytical proof of concept, real-world test scenarios and case studies in real-world settings, where the authors conducted observations and open interviews.
Findings
The authors designed a model and implemented a prototype for evolving and storing static and dynamic aspects of knowledge. The proposed solution leverages the flexibility of a graph-based model to enable open and not only continuously developing user-centered processes but also pre-defined ones. The authors further propose a user interface concept which supports users to benefit from the richness of the model but provides sufficient guidance.
Originality/value
The balanced integration of the data and task perspectives distinguishes the model significantly from other approaches such as BPM or knowledge graphs. The authors further provide a sophisticated user interface design, which allows the users to effectively and efficiently use the graph-based knowledge representation in their daily work.
Aya Khaled Youssef Sayed Mohamed, Dagmar Auer, Daniel Hofer and Josef Küng
Abstract
Purpose
Data protection requirements heavily increased due to the rising awareness of data security, legal requirements and technological developments. Today, NoSQL databases are increasingly used in security-critical domains. Current survey works on databases and data security only consider authorization and access control in a very general way and do not regard most of today’s sophisticated requirements. Accordingly, the purpose of this paper is to discuss authorization and access control for relational and NoSQL database models in detail with respect to requirements and current state of the art.
Design/methodology/approach
This paper follows a systematic literature review approach to study authorization and access control for different database models. Starting with a research on survey works on authorization and access control in databases, the study continues with the identification and definition of advanced authorization and access control requirements, which are generally applicable to any database model. This paper then discusses and compares current database models based on these requirements.
Findings
As no survey works consider requirements for authorization and access control in different database models so far, the authors define their requirements. Furthermore, the authors discuss the current state of the art for the relational, key-value, column-oriented, document-based and graph database models in comparison to the defined requirements.
Originality/value
This paper focuses on authorization and access control for various database models, not concrete products. This paper identifies today’s sophisticated – yet general – requirements from the literature and compares them with research results and access control features of current products for the relational and NoSQL database models.
Ramy Shaheen, Suhail Mahfud and Ali Kassem
Abstract
Purpose
This paper aims to study irreversible conversion processes, which examine the spread of a one-way change of state (from state 0 to state 1) through a specified society (the spread of disease through populations, the spread of opinion through social networks, etc.), where the conversion rule is determined at the beginning of the study. These processes can be modeled as graph-theoretical models in which the vertex set V(G) represents the set of individuals over which the conversion spreads.
Design/methodology/approach
The irreversible k-threshold conversion process on a graph G = (V, E) is an iterative process which starts by choosing a set S_0 ⊆ V; for each step t (t = 1, 2, …), S_t is obtained from S_(t−1) by adjoining all vertices that have at least k neighbors in S_(t−1). S_0 is called the seed set of the k-threshold conversion process, and it is called an irreversible k-threshold conversion set (IkCS) of G if S_t = V(G) for some t ≥ 0. The minimum cardinality over all the IkCSs of G is referred to as the irreversible k-threshold conversion number of G and is denoted by C_k(G).
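The iterative process defined here is straightforward to simulate. The cycle graph C_4 and the seed sets below are illustrative choices, not examples taken from the paper.

```python
# Minimal simulation of the irreversible k-threshold conversion process:
# starting from seed set S_0, a vertex converts once at least k of its
# neighbors are already converted. C_4 and the seeds are illustrative.

def k_threshold_conversion(adj, seed, k):
    """Iterate S_t from S_0 = seed; return the final converted set."""
    converted = set(seed)
    while True:
        newly = {v for v in adj
                 if v not in converted
                 and sum(1 for u in adj[v] if u in converted) >= k}
        if not newly:
            return converted
        converted |= newly

# Cycle C_4: each vertex has exactly two neighbors.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# {0, 2} is an irreversible 2-threshold conversion set of C_4:
# vertices 1 and 3 each see both their neighbors converted at step 1.
print(k_threshold_conversion(c4, {0, 2}, k=2) == set(c4))  # → True

# A single seed vertex is not: no other vertex ever reaches 2 neighbors.
print(k_threshold_conversion(c4, {0}, k=2))  # → {0}
```

So C_2(C_4) = 2 in this toy case; the paper establishes such numbers for the far less obvious Jahangir graphs and strong grids.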
Findings
In this paper the authors determine C_k(G) for the generalized Jahangir graph J_(s,m) for 1 < k ≤ m and arbitrary s, m. The authors also determine C_k(G) for the strong grids P_2 ⊠ P_n when k = 4, 5. Finally, the authors determine C_2(G) for P_n ⊠ P_n when n is arbitrary.
Originality/value
This work is 100% original and has important uses in real-life problems such as anti-bioterrorism.
Sofia Baroncini, Bruno Sartini, Marieke Van Erp, Francesca Tomasi and Aldo Gangemi
Abstract
Purpose
In the last few years, the size of Linked Open Data (LOD) describing artworks, in general or domain-specific Knowledge Graphs (KGs), is gradually increasing. This provides (art-)historians and Cultural Heritage professionals with a wealth of information to explore. Specifically, structured data about iconographical and iconological (icon) aspects, i.e. information about the subjects, concepts and meanings of artworks, are extremely valuable for the state-of-the-art of computational tools, e.g. content recognition through computer vision. Nevertheless, a data quality evaluation for art domains, fundamental for data reuse, is still missing. The purpose of this study is filling this gap with an overview of art-historical data quality in current KGs with a focus on the icon aspects.
Design/methodology/approach
This study’s analyses are based on established KG evaluation methodologies, adapted to the domain by addressing requirements from art historians’ theories. The authors first select several KGs according to Semantic Web principles. Then, the authors evaluate (1) their structures’ suitability to describe icon information through quantitative and qualitative assessment and (2) their content, qualitatively assessed in terms of correctness and completeness.
Findings
This study’s results reveal several issues on the current expression of icon information in KGs. The content evaluation shows that these domain-specific statements are generally correct but often not complete. The incompleteness is confirmed by the structure evaluation, which highlights the unsuitability of the KG schemas to describe icon information with the required granularity.
Originality/value
The main contribution of this work is an overview of the actual landscape of the icon information expressed in LOD. Therefore, it is valuable to cultural institutions by providing them a first domain-specific data quality evaluation. Since this study’s results suggest that the selected domain information is underrepresented in Semantic Web datasets, the authors highlight the need for the creation and fostering of such information to provide a more thorough art-historical dimension to LOD.