Search results

1 – 10 of 475
Open Access
Article
Publication date: 20 August 2021

Daniel Hofer, Markus Jäger, Aya Khaled Youssef Sayed Mohamed and Josef Küng

For aiding computer security experts in their study, log files are a crucial piece of information. Especially the time domain is very important for us because in most cases…

2537

Abstract

Purpose

For aiding computer security experts in their study, log files are a crucial piece of information. Especially the time domain is very important for us because in most cases, timestamps are the only linking points between events caused by attackers, faulty systems or simple errors and their corresponding entries in log files. With the idea of storing and analyzing this log information in graph databases, we need a suitable model to store and connect timestamps and their events. This paper aims to find and evaluate different approaches how to store timestamps in graph databases and their individual benefits and drawbacks.

Design/methodology/approach

We analyse three different approaches, how timestamp information can be represented and stored in graph databases. For checking the models, we set up four typical questions that are important for log file analysis and tested them for each of the models. During the evaluation, we used the performance and other properties as metrics, how suitable each of the models is for representing the log files’ timestamp information. In the last part, we try to improve one promising looking model.

Findings

We come to the conclusion, that the simplest model with the least graph database-specific concepts in use is also the one yielding the simplest and fastest queries.

Research limitations/implications

Limitations to this research are that only one graph database was studied and also improvements to the query engine might change future results.

Originality/value

In the study, we addressed the issue of storing timestamps in graph databases in a meaningful, practical and efficient way. The results can be used as a pattern for similar scenarios and applications.

Details

International Journal of Web Information Systems, vol. 17 no. 5
Type: Research Article
ISSN: 1744-0084

Keywords

Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different…

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users made searches to an article collection of Finnish newspaper Uusi Suometar 1869–1918 which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and evaluated subjectively the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Keywords

Content available
Book part
Publication date: 29 April 1994

Abstract

Details

Using Subject Headings for Online Retrieval: Theory, Practice and Potential
Type: Book
ISBN: 978-0-12221-570-4

Open Access
Article
Publication date: 8 July 2021

Johann Eder and Vladimir A. Shekhovtsov

Medical research requires biological material and data collected through biobanks in reliable processes with quality assurance. Medical studies based on data with unknown or…

1779

Abstract

Purpose

Medical research requires biological material and data collected through biobanks in reliable processes with quality assurance. Medical studies based on data with unknown or questionable quality are useless or even dangerous, as evidenced by recent examples of withdrawn studies. Medical data sets consist of highly sensitive personal data, which has to be protected carefully and is available for research only after the approval of ethics committees. The purpose of this research is to propose an architecture to support researchers to efficiently and effectively identify relevant collections of material and data with documented quality for their research projects while observing strict privacy rules.

Design/methodology/approach

Following a design science approach, this paper develops a conceptual model for capturing and relating metadata of medical data in biobanks to support medical research.

Findings

This study describes the landscape of biobanks as federated medical data lakes such as the collections of samples and their annotations in the European federation of biobanks (Biobanking and Biomolecular Resources Research Infrastructure – European Research Infrastructure Consortium, BBMRI-ERIC) and develops a conceptual model capturing schema information with quality annotation. This paper discusses the quality dimensions for data sets for medical research in-depth and proposes representations of both the metadata and data quality documentation with the aim to support researchers to effectively and efficiently identify suitable data sets for medical studies.

Originality/value

This novel conceptual model for metadata for medical data lakes has a unique focus on the high privacy requirements of the data sets contained in medical data lakes and also stands out in the detailed representation of data quality and metadata quality of medical data sets.

Details

International Journal of Web Information Systems, vol. 17 no. 5
Type: Research Article
ISSN: 1744-0084

Keywords

Open Access
Article
Publication date: 16 April 2019

Mohammad Moradi

As a relatively new computing paradigm, crowdsourcing has gained enormous attention in the recent decade. Its compliance with the Web 2.0 principles, also, puts forward…

2462

Abstract

Purpose

As a relatively new computing paradigm, crowdsourcing has gained enormous attention in the recent decade. Its compliance with the Web 2.0 principles, also, puts forward unprecedented opportunities to empower the related services and mechanisms by leveraging humans’ intelligence and problem solving abilities. With respect to the pivotal role of search engines in the Web and information community, this paper aims to investigate the advantages and challenges of incorporating people – as intelligent agents – into search engines’ workflow.

Design/methodology/approach

To emphasize the role of the human in computational processes, some specific and related areas are studied. Then, through studying the current trends in the field of crowd-powered search engines and analyzing the actual needs and requirements, the perspectives and challenges are discussed.

Findings

As the research on this topic is still in its infancy, it is believed that this study can be considered as a roadmap for future works in the field. In this regard, current status and development trends are delineated through providing a general overview of the literature. Moreover, several recommendations for extending the applicability and efficiency of next generation of crowd-powered search engines are presented. In fact, becoming aware of different aspects and challenges of constructing search engines of this kind can shed light on the way of developing working systems with respect to essential considerations.

Originality/value

The present study was aimed to portrait the big picture of crowd-powered search engines and possible challenges and issues. As one of the early works that provided a comprehensive report on different aspects of the topic, it can be regarded as a reference point.

Details

International Journal of Crowd Science, vol. 3 no. 1
Type: Research Article
ISSN: 2398-7294

Keywords

Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

2121

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed.

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. The future research directions regarding GrandBase construction and extension has also been discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, considerable KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and different-structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been contributed on both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Keywords

Open Access
Article
Publication date: 26 April 2018

Antti Mikael Rousi, Reijo Savolainen, Maaria Harviainen and Pertti Vakkari

The purpose of this paper is to elaborate the picture of situational relevance of music information from a performing musician’s point of view by delving into its diverse layers…

2130

Abstract

Purpose

The purpose of this paper is to elaborate the picture of situational relevance of music information from a performing musician’s point of view by delving into its diverse layers within the context of Doctor of Music students’ information seeking.

Design/methodology/approach

Music-related information is approached through six modes that categorize music information sources based on their levels of abstraction. Situational relevance of the modes of music information is examined in relation to the situational requirements of accomplishing a dissertation on music task consisting of both a series of concerts and a written thesis. The empirical material was collected by interviewing Finnish doctoral students in the field of music performance.

Findings

A set of situational relevance types related to each mode of music information were identified. As a whole, the differences between the perceived importance of the modes varied a little.

Research limitations/implications

The goal of the present paper is not to create a generalizable list of situational relevance types suggested by modes of music information, but to show that the modes may suggest diverse situational relevance types of their own when evaluated by performing musicians.

Originality/value

The present paper provides a rare account on performing musicians’ vocational and school-related information seeking. For studies of music information retrieval, the present paper offers new contextual facets explaining why diverse music information could be relevant to musicians. For studies of music-related information seeking, the present study offers new insights on why performing musicians have information needs regarding certain types of music information sources.

Details

Journal of Documentation, vol. 74 no. 5
Type: Research Article
ISSN: 0022-0418

Keywords

Content available
Book part
Publication date: 29 April 1994

Abstract

Details

Using Subject Headings for Online Retrieval: Theory, Practice and Potential
Type: Book
ISBN: 978-0-12221-570-4

Open Access
Article
Publication date: 18 August 2021

Maria Giovanna Confetto and Claudia Covucci

For companies that intend to respond to the modern conscious consumers' needs, a great competitive advantage is played on the ability to incorporate sustainability messages in…

4429

Abstract

Purpose

For companies that intend to respond to the modern conscious consumers' needs, a great competitive advantage is played on the ability to incorporate sustainability messages in marketing communications. The aim of this paper is to address this important priority in the web context, building a semantic algorithm that allows content managers to evaluate the quality of sustainability web contents for search engines, considering the current semantic web development.

Design/methodology/approach

Following the Design Science (DS) methodological approach, the study develops the algorithm as an artefact capable of solving a practical problem and improving the operation of content managerial process.

Findings

The algorithm considers multiple factors of evaluation, grouped in three parameters: completeness, clarity and consistency. An applicability test of the algorithm was conducted on a sample of web pages of the Google blog on sustainability to highlight the correspondence between the established evaluation factors and those actually used by Google.

Practical implications

Studying content marketing for sustainability communication constitutes a new field of research that offers exciting opportunities. Writing sustainability contents in an effective way is a fundamental step to trigger stakeholder engagement mechanisms online. It could be a positive social engineering technique in the hands of marketers to make web users able to pursue sustainable development in their choices.

Originality/value

This is the first study that creates a theoretical connection between digital content marketing and sustainability communication focussing, especially, on the aspects of search engine optimization (SEO). The algorithm of “Sustainability-contents SEO” is the first operational software tool, with a regulatory nature, that is able to analyse the web contents, detecting the terms of the sustainability language and measuring the compliance to SEO requirements.

Details

The TQM Journal, vol. 33 no. 7
Type: Research Article
ISSN: 1754-2731

Keywords

Open Access
Article
Publication date: 18 January 2024

Puyu Yang and Giovanni Colavizza

Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of…

1497

Abstract

Purpose

Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of external sources. News media outlets constitute nearly one-third of all citations within Wikipedia. However, embracing such a radically open approach also poses the challenge of the potential introduction of biased content or viewpoints into Wikipedia. The authors conduct an investigation into the integrity of knowledge within Wikipedia, focusing on the dimensions of source political polarization and trustworthiness. Specifically, the authors delve into the conceivable presence of political polarization within the news media citations on Wikipedia, identify the factors that may influence such polarization within the Wikipedia ecosystem and scrutinize the correlation between political polarization in news media sources and the factual reliability of Wikipedia's content.

Design/methodology/approach

The authors conduct a descriptive and regression analysis, relying on Wikipedia Citations, a large-scale open dataset of nearly 30 million citations from English Wikipedia. Additionally, this dataset has been augmented with information obtained from the Media Bias Monitor (MBM) and the Media Bias Fact Check (MBFC).

Findings

The authors find a moderate yet significant liberal bias in the choice of news media sources across Wikipedia. Furthermore, the authors show that this effect persists when accounting for the factual reliability of the news media.

Originality/value

The results contribute to Wikipedia’s knowledge integrity agenda in suggesting that a systematic effort would help to better map potential biases in Wikipedia and find means to strengthen its neutral point of view policy.

Details

Online Information Review, vol. 48 no. 5
Type: Research Article
ISSN: 1468-4527

Keywords

1 – 10 of 475