Search results
Daniel Hofer, Markus Jäger, Aya Khaled Youssef Sayed Mohamed and Josef Küng
Abstract
Purpose
Log files are a crucial source of information for computer security experts. The time domain is especially important because, in most cases, timestamps are the only links between events caused by attackers, faulty systems or simple errors and their corresponding log file entries. To store and analyze this log information in graph databases, a suitable model is needed for storing and connecting timestamps and their events. This paper aims to identify and evaluate different approaches to storing timestamps in graph databases, along with their individual benefits and drawbacks.
Design/methodology/approach
We analyse three different approaches to representing and storing timestamp information in graph databases. To check the models, we set up four questions typical of log file analysis and tested each model against them. During the evaluation, we used performance and other properties as metrics for how suitable each model is for representing the log files' timestamp information. In the last part, we attempt to improve one promising-looking model.
Findings
We conclude that the simplest model, the one using the fewest graph database-specific concepts, also yields the simplest and fastest queries.
Research limitations/implications
This research is limited in that only one graph database was studied; improvements to the query engine might also change future results.
Originality/value
In the study, we addressed the issue of storing timestamps in graph databases in a meaningful, practical and efficient way. The results can be used as a pattern for similar scenarios and applications.
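The abstract does not include code; as a hedged sketch of the "simplest model" it favours, log events can be held as nodes that carry their timestamp directly as a numeric property, so range queries and cross-log correlation need no extra time-tree structure. The in-memory graph and all names below are illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

# Hypothetical minimal property graph: each log event is a node whose
# timestamp is stored directly as a numeric property (the "simplest model").
@dataclass
class EventNode:
    source: str    # e.g. the log file or host the entry came from
    message: str
    ts: float      # UNIX epoch seconds, stored on the node itself

@dataclass
class LogGraph:
    events: list = field(default_factory=list)

    def add_event(self, source, message, ts):
        self.events.append(EventNode(source, message, ts))

    def in_range(self, start, end):
        """Range query: all events whose timestamp falls in [start, end)."""
        return [e for e in self.events if start <= e.ts < end]

    def correlated(self, event, window=1.0):
        """Events from *other* sources within `window` seconds of the given
        event -- the kind of cross-log linking the abstract describes."""
        return [e for e in self.events
                if e.source != event.source and abs(e.ts - event.ts) <= window]

g = LogGraph()
g.add_event("auth.log", "failed login", 100.0)
g.add_event("app.log", "error: db timeout", 100.4)
g.add_event("auth.log", "failed login", 250.0)

hits = g.in_range(90.0, 200.0)    # the two events near t=100
near = g.correlated(g.events[0])  # the app.log entry 0.4 s later
```

In a real graph database the same design would mean one timestamp property per event node plus an index on it, rather than dedicated time nodes and relationships.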
Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen
Abstract
Purpose
This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.
Design/methodology/approach
This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different-quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences between the otherwise identical articles.
Findings
The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.
Originality/value
To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.
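The graded 0–3 assessment the study describes can be illustrated with a small, purely hypothetical comparison; the grade lists below are invented for illustration and are not the study's data:

```python
# Hypothetical sketch: the same top-10 result list judged on a 0-3 graded
# relevance scale for two OCR versions of each article.
def mean_graded_relevance(judgments):
    """judgments: list of 0-3 relevance grades for one query's top-10."""
    return sum(judgments) / len(judgments)

better_ocr = [3, 3, 2, 2, 2, 1, 1, 1, 0, 0]  # invented example grades
worse_ocr  = [3, 2, 2, 1, 1, 1, 1, 0, 0, 0]  # invented example grades

# A positive difference would indicate higher subjective relevance for the
# better-quality OCR, which is the direction of the study's finding.
diff = mean_graded_relevance(better_ocr) - mean_graded_relevance(worse_ocr)
```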
Johann Eder and Vladimir A. Shekhovtsov
Abstract
Purpose
Medical research requires biological material and data collected through biobanks in reliable processes with quality assurance. Medical studies based on data with unknown or questionable quality are useless or even dangerous, as evidenced by recent examples of withdrawn studies. Medical data sets consist of highly sensitive personal data, which has to be protected carefully and is available for research only after the approval of ethics committees. The purpose of this research is to propose an architecture to support researchers to efficiently and effectively identify relevant collections of material and data with documented quality for their research projects while observing strict privacy rules.
Design/methodology/approach
Following a design science approach, this paper develops a conceptual model for capturing and relating metadata of medical data in biobanks to support medical research.
Findings
This study describes the landscape of biobanks as federated medical data lakes such as the collections of samples and their annotations in the European federation of biobanks (Biobanking and Biomolecular Resources Research Infrastructure – European Research Infrastructure Consortium, BBMRI-ERIC) and develops a conceptual model capturing schema information with quality annotation. This paper discusses the quality dimensions for data sets for medical research in-depth and proposes representations of both the metadata and data quality documentation with the aim to support researchers to effectively and efficiently identify suitable data sets for medical studies.
Originality/value
This novel conceptual model for metadata for medical data lakes has a unique focus on the high privacy requirements of the data sets contained in medical data lakes and also stands out in the detailed representation of data quality and metadata quality of medical data sets.
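As a hedged illustration of the kind of metadata-with-quality record the abstract describes, the sketch below models per-attribute quality annotations and a researcher's suitability check; the attribute and field names are assumptions for illustration, not taken from BBMRI-ERIC:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: schema metadata plus data quality documentation
# for one biobank collection, queried without touching the sensitive data.
@dataclass
class AttributeQuality:
    completeness: float  # fraction of samples with a value, 0..1
    provenance: str      # e.g. "validated", "self-reported"

@dataclass
class CollectionMetadata:
    biobank: str
    attributes: dict = field(default_factory=dict)   # name -> AttributeQuality
    access_policy: str = "ethics-approval-required"  # privacy constraint

    def usable_for(self, required, min_completeness=0.9):
        """A researcher's query: does every required attribute meet the
        completeness threshold?"""
        return all(a in self.attributes and
                   self.attributes[a].completeness >= min_completeness
                   for a in required)

coll = CollectionMetadata("ExampleBiobank", {
    "age": AttributeQuality(0.98, "validated"),
    "smoking_status": AttributeQuality(0.61, "self-reported"),
})
ok = coll.usable_for(["age"])                     # meets the 0.9 threshold
bad = coll.usable_for(["age", "smoking_status"])  # 0.61 falls short
```

The point of such a record is that relevance and quality can be assessed from metadata alone, before any ethics-gated access to the underlying data.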
Abstract
Purpose
As a relatively new computing paradigm, crowdsourcing has gained enormous attention in the recent decade. Its compliance with Web 2.0 principles also puts forward unprecedented opportunities to empower related services and mechanisms by leveraging humans' intelligence and problem-solving abilities. With respect to the pivotal role of search engines in the Web and information community, this paper aims to investigate the advantages and challenges of incorporating people – as intelligent agents – into search engines' workflow.
Design/methodology/approach
To emphasize the role of the human in computational processes, some specific and related areas are studied. Then, through studying the current trends in the field of crowd-powered search engines and analyzing the actual needs and requirements, the perspectives and challenges are discussed.
Findings
As research on this topic is still in its infancy, this study can serve as a roadmap for future work in the field. To that end, the current status and development trends are delineated through a general overview of the literature. Moreover, several recommendations for extending the applicability and efficiency of the next generation of crowd-powered search engines are presented. Awareness of the different aspects and challenges of constructing search engines of this kind can guide the development of working systems.
Originality/value
The present study aimed to portray the big picture of crowd-powered search engines and their possible challenges and issues. As one of the early works providing a comprehensive report on different aspects of the topic, it can be regarded as a reference point.
Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang
Abstract
Purpose
This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.
Design/methodology/approach
In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query stream to augment the ontology of the existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is also proposed.
Findings
Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension have also been discussed.
Originality/value
To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed the massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research efforts have been devoted to both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.
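The paper's graph-based truth discovery method is not detailed in the abstract; the sketch below illustrates only the core idea it generalises – accepting several true values per predicate instead of exactly one – using simple source voting as a stand-in:

```python
from collections import Counter

def discover_values(claims, min_support=0.5):
    """Multi-valued truth discovery by source voting (illustrative stand-in).

    claims: list of (source, value) pairs for one (entity, predicate).
    Returns every value asserted by at least `min_support` of the sources,
    so more than one value can be accepted as true.
    """
    sources = {s for s, _ in claims}
    votes = Counter(value for _, value in claims)
    threshold = min_support * len(sources)
    return {v for v, n in votes.items() if n >= threshold}

# "children of X" is naturally multi-valued: several values can all be true.
claims = [("src1", "Alice"), ("src1", "Bob"),
          ("src2", "Alice"), ("src2", "Bob"),
          ("src3", "Alice"), ("src3", "Carol")]
true_vals = discover_values(claims)  # Alice and Bob clear the threshold
```

A single-truth approach would be forced to pick one of these values; relaxing that assumption is exactly what the abstract identifies as missing in existing systems.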
Antti Mikael Rousi, Reijo Savolainen, Maaria Harviainen and Pertti Vakkari
Abstract
Purpose
The purpose of this paper is to elaborate the picture of situational relevance of music information from a performing musician’s point of view by delving into its diverse layers within the context of Doctor of Music students’ information seeking.
Design/methodology/approach
Music-related information is approached through six modes that categorize music information sources based on their levels of abstraction. Situational relevance of the modes of music information is examined in relation to the situational requirements of accomplishing a dissertation on music task consisting of both a series of concerts and a written thesis. The empirical material was collected by interviewing Finnish doctoral students in the field of music performance.
Findings
A set of situational relevance types related to each mode of music information was identified. On the whole, the differences in the perceived importance of the modes were small.
Research limitations/implications
The goal of the present paper is not to create a generalizable list of situational relevance types suggested by modes of music information, but to show that the modes may suggest diverse situational relevance types of their own when evaluated by performing musicians.
Originality/value
The present paper provides a rare account of performing musicians' vocational and school-related information seeking. For studies of music information retrieval, it offers new contextual facets explaining why diverse music information could be relevant to musicians. For studies of music-related information seeking, it offers new insights into why performing musicians have information needs regarding certain types of music information sources.
Maria Giovanna Confetto and Claudia Covucci
Abstract
Purpose
For companies that intend to respond to modern conscious consumers' needs, a great competitive advantage lies in the ability to incorporate sustainability messages into marketing communications. The aim of this paper is to address this important priority in the web context by building a semantic algorithm that allows content managers to evaluate the quality of sustainability web content for search engines, considering current semantic web development.
Design/methodology/approach
Following the Design Science (DS) methodological approach, the study develops the algorithm as an artefact capable of solving a practical problem and improving the content management process.
Findings
The algorithm considers multiple factors of evaluation, grouped in three parameters: completeness, clarity and consistency. An applicability test of the algorithm was conducted on a sample of web pages of the Google blog on sustainability to highlight the correspondence between the established evaluation factors and those actually used by Google.
Practical implications
Studying content marketing for sustainability communication constitutes a new field of research that offers exciting opportunities. Writing sustainability content effectively is a fundamental step in triggering stakeholder engagement mechanisms online. In the hands of marketers, it could be a positive social engineering technique that enables web users to pursue sustainable development in their choices.
Originality/value
This is the first study to create a theoretical connection between digital content marketing and sustainability communication, focusing especially on aspects of search engine optimization (SEO). The "Sustainability-contents SEO" algorithm is the first operational software tool, with a regulatory nature, able to analyse web content, detect the terms of the sustainability language and measure compliance with SEO requirements.
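The algorithm itself is not reproduced in the abstract; the sketch below is an assumption-laden illustration of how its three named parameters – completeness, clarity and consistency – could be combined into a single score. The term list, thresholds and equal weighting are all invented:

```python
# Illustrative vocabulary only; the actual sustainability language lexicon is
# part of the paper's algorithm and is not published in the abstract.
SUSTAINABILITY_TERMS = {"sustainability", "renewable", "emissions", "recycling"}

def score_content(title, body):
    words = [w.strip(".,;:").lower() for w in body.split()]
    hits = {w for w in words if w in SUSTAINABILITY_TERMS}

    # completeness: how much of the sustainability vocabulary the page covers
    completeness = len(hits) / len(SUSTAINABILITY_TERMS)

    # clarity: penalise keyword stuffing (assumed proxy, invented threshold)
    density = sum(w in SUSTAINABILITY_TERMS for w in words) / max(len(words), 1)
    clarity = 1.0 if density < 0.4 else 0.5

    # consistency: title and body should address the same topic
    consistency = 1.0 if any(t in title.lower() for t in hits) else 0.0

    return round((completeness + clarity + consistency) / 3, 2)

s = score_content("Our renewable energy roadmap",
                  "We cut emissions by investing in renewable power and recycling.")
```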
Puyu Yang and Giovanni Colavizza
Abstract
Purpose
Wikipedia's inclusive editorial policy permits unrestricted participation, enabling individuals to contribute and disseminate their expertise while drawing upon a multitude of external sources. News media outlets constitute nearly one-third of all citations within Wikipedia. However, embracing such a radically open approach also poses the challenge of the potential introduction of biased content or viewpoints into Wikipedia. The authors conduct an investigation into the integrity of knowledge within Wikipedia, focusing on the dimensions of source political polarization and trustworthiness. Specifically, the authors delve into the conceivable presence of political polarization within the news media citations on Wikipedia, identify the factors that may influence such polarization within the Wikipedia ecosystem and scrutinize the correlation between political polarization in news media sources and the factual reliability of Wikipedia's content.
Design/methodology/approach
The authors conduct a descriptive and regression analysis, relying on Wikipedia Citations, a large-scale open dataset of nearly 30 million citations from English Wikipedia. Additionally, this dataset has been augmented with information obtained from the Media Bias Monitor (MBM) and the Media Bias Fact Check (MBFC).
Findings
The authors find a moderate yet significant liberal bias in the choice of news media sources across Wikipedia. Furthermore, the authors show that this effect persists when accounting for the factual reliability of the news media.
Originality/value
The results contribute to Wikipedia’s knowledge integrity agenda in suggesting that a systematic effort would help to better map potential biases in Wikipedia and find means to strengthen its neutral point of view policy.
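As a hedged sketch of the kind of analysis described, the snippet below joins an article's citation domains against MBFC-style bias labels and computes a mean leaning per article; the domains, labels and numeric scale are illustrative assumptions, not the authors' dataset:

```python
# Invented MBFC-style lookup: domain -> leaning score.
# -1 = left/liberal, 0 = centre, +1 = right/conservative (assumed scale).
BIAS = {"examplenews.com": -1.0, "othertimes.com": 0.0, "rightpost.com": 1.0}

def article_leaning(cited_domains):
    """Mean political leaning of an article's news citations; domains without
    a label are skipped, mirroring incomplete bias-monitor coverage."""
    scored = [BIAS[d] for d in cited_domains if d in BIAS]
    return sum(scored) / len(scored) if scored else None

lean = article_leaning(["examplenews.com", "othertimes.com", "unlabeled.org"])
# negative mean -> citations lean liberal, the direction of the reported bias
```

Aggregating such per-article scores over millions of citations, and regressing them on article-level factors, is the shape of the descriptive and regression analysis the abstract outlines.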