Search results

1 – 10 of 324
Open Access
Article
Publication date: 14 August 2017

Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Anne H.H. Ngu and Yihong Zhang

Abstract

Purpose

This paper aims to propose a system for generating actionable knowledge from Big Data and use this system to construct a comprehensive knowledge base (KB), called GrandBase.

Design/methodology/approach

In particular, this study extracts new predicates from four types of data sources, namely, Web texts, Document Object Model (DOM) trees, existing KBs and query streams, to augment the ontology of an existing KB (i.e. Freebase). In addition, a graph-based approach to conduct better truth discovery for multi-valued predicates is proposed.

Findings

Empirical studies demonstrate the effectiveness of the approaches presented in this study and the potential of GrandBase. Future research directions regarding GrandBase construction and extension are also discussed.

Originality/value

To revolutionize our modern society by using the wisdom of Big Data, numerous KBs have been constructed to feed massive knowledge-driven applications with Resource Description Framework triples. The important challenges for KB construction include extracting information from large-scale, possibly conflicting and differently structured data sources (i.e. the knowledge extraction problem) and reconciling the conflicts that reside in the sources (i.e. the truth discovery problem). Tremendous research effort has been devoted to both problems. However, the existing KBs are far from being comprehensive and accurate: first, existing knowledge extraction systems retrieve data from limited types of Web sources; second, existing truth discovery approaches commonly assume each predicate has only one true value. In this paper, the focus is on the problem of generating actionable knowledge from Big Data. A system is proposed, which consists of two phases, namely, knowledge extraction and truth discovery, to construct a broader KB, called GrandBase.

Details

PSU Research Review, vol. 1 no. 2
Type: Research Article
ISSN: 2399-1747

Open Access
Article
Publication date: 6 March 2017

Zhuoxuan Jiang, Chunyan Miao and Xiaoming Li

Abstract

Purpose

Recent years have witnessed the rapid development of massive open online courses (MOOCs). With more and more courses being produced by instructors and attended by learners all over the world, an unprecedented volume of educational resources has been aggregated. The educational resources include videos, subtitles, lecture notes, quizzes, etc., on the teaching side, and forum contents, Wiki, logs of learning behavior, logs of homework, etc., on the learning side. However, the data are both unstructured and diverse. To facilitate knowledge management and mining on MOOCs, extracting keywords from the resources is important. This paper aims to adapt state-of-the-art techniques to MOOC settings and evaluate their effectiveness on real data. In terms of practice, this paper also tries to answer, for the first time, the questions of to what extent MOOC resources can support keyword extraction models and how much human effort is required to make the models work well.

Design/methodology/approach

Based on which side generates the data, i.e. instructors or learners, the data are classified into teaching resources and learning resources, respectively. The approach used on teaching resources is based on machine learning models with labels, while the approach used on learning resources is based on a graph model without labels.
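
As an illustration of what a label-free graph model for keyword extraction can look like, the sketch below ranks words by a PageRank-style score over a word co-occurrence graph. This is a generic TextRank-style technique chosen for illustration, not the authors' model; the function name `rank_keywords` and the parameters `window`, `damping` and `iterations` are assumptions.

```python
from collections import defaultdict


def rank_keywords(tokens, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on an undirected co-occurrence graph."""
    # Link every word to the words that follow it within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iteratively redistribute each word's score evenly among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    # Highest-scoring words first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical forum snippet, already tokenized and lowercased.
    post = "graph model ranks graph nodes graph edges".split()
    print(rank_keywords(post))
```

No labeled data is needed here: the ranking depends only on which words appear near each other, which is why approaches of this kind suit learner-generated text such as forum posts.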

Findings

From the teaching resources, the methods used by the authors can accurately extract keywords with only 10 per cent labeled data. The authors find a characteristic of the data: resources of various forms, e.g. subtitles and PPTs, should be considered separately because they differ in modeling ability. From the learning resources, the keywords extracted from MOOC forums are not as domain-specific as those extracted from teaching resources, but they can reflect the topics that are actively discussed in forums, from which instructors can get feedback. The authors implement two applications with the extracted keywords: generating a concept map and generating a learning path. The visual demos show they have the potential to improve learning efficiency when they are integrated into a real MOOC platform.

Research limitations/implications

Conducting keyword extraction on MOOC resources is quite difficult because teaching resources are hard to obtain owing to copyright. Also, getting labeled data is tough because expertise in the corresponding domain is usually required.

Practical implications

The experimental results indicate that MOOC resources are good enough for building keyword extraction models, and that an acceptable balance between human effort and model accuracy can be achieved.

Originality/value

This paper presents a pioneering study on keyword extraction from MOOC resources and obtains some new findings.

Details

International Journal of Crowd Science, vol. 1 no. 1
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 28 July 2020

Prabhat Pokharel, Roshan Pokhrel and Basanta Joshi

Abstract

Analysis of log messages is very important for the identification of suspicious system and network activity. This analysis requires the correct extraction of variable entities. The variable entities are extracted by comparing the log messages against the log patterns. Each of these log patterns can be represented in the form of a log signature. In this paper, we present a hybrid approach for log signature extraction. The approach consists of two modules. The first module identifies log patterns by generating log clusters. The second module uses Named Entity Recognition (NER) to extract signatures from the extracted log clusters. Experiments were performed on event logs from the Windows operating system, Exchange and Unix, and the results were validated by comparing the signatures and the variable entities against the standard log documentation. The experiments showed that the extracted signatures were ready to be used with a high degree of accuracy.
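
The relationship between log clusters, signatures and variable entities can be sketched in a few lines. The example below is a deliberate simplification and not the paper's method: it clusters lines by token count only and marks varying token positions with a wildcard, in place of the clustering and NER modules described in the abstract. The names `extract_signatures`, `extract_variables` and the `<*>` wildcard are assumptions made for illustration.

```python
from collections import defaultdict


def extract_signatures(lines, wildcard="<*>"):
    """Cluster log lines by token count, then mask positions whose tokens vary."""
    clusters = defaultdict(list)
    for line in lines:
        tokens = line.split()
        clusters[len(tokens)].append(tokens)

    signatures = []
    for length in sorted(clusters):
        group = clusters[length]
        # A position is constant if every line in the cluster has the same token there.
        template = [
            column[0] if len(set(column)) == 1 else wildcard
            for column in zip(*group)
        ]
        signatures.append(" ".join(template))
    return signatures


def extract_variables(line, signature, wildcard="<*>"):
    """Return the tokens at wildcard positions, or None if the line does not match."""
    tokens, pattern = line.split(), signature.split()
    if len(tokens) != len(pattern):
        return None
    values = []
    for token, slot in zip(tokens, pattern):
        if slot == wildcard:
            values.append(token)
        elif slot != token:
            return None
    return values


if __name__ == "__main__":
    # Hypothetical log lines, not taken from the paper's data sets.
    logs = [
        "User alice logged in from 10.0.0.1",
        "User bob logged in from 10.0.0.2",
        "Disk full",
    ]
    for signature in extract_signatures(logs):
        print(signature)
```

Once a signature exists, matching a new message against it yields the variable entities directly, which is the comparison step the abstract refers to.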

Details

Applied Computing and Informatics, vol. 19 no. 1/2
Type: Research Article
ISSN: 2634-1964

Open Access
Article
Publication date: 12 June 2017

Lichao Zhu, Hangzhou Yang and Zhijun Yan

Abstract

Purpose

The purpose of this paper is to develop a new method to extract medical temporal information from online health communities.

Design/methodology/approach

The authors trained a conditional random field model for the extraction of temporal expressions. Temporal relation identification is treated as a classification task, and several support vector machine classifiers are built in the proposed method. For model training, the authors extracted some high-level semantic features, including co-reference relationships of medical concepts and the semantic similarity among words.

Findings

For the extraction of TIMEX, the authors find that well-formatted expressions are easy to recognize, and the main challenge is relative TIMEX such as “three days after onset”. The same difficulty is also seen in the normalization of absolute dates or well-formatted durations, whereas frequency is easier to normalize. For the identification of DocTimeRel, the result is fairly good, although the relation is difficult to identify when it involves a relative TIMEX or a hypothetical concept.

Originality/value

The authors proposed a new method to extract temporal information from online clinical data and evaluated the usefulness of different levels of syntactic features in this task.

Details

International Journal of Crowd Science, vol. 1 no. 2
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 21 June 2021

Bufei Xing, Haonan Yin, Zhijun Yan and Jiachen Wang

Abstract

Purpose

The purpose of this paper is to propose a new approach to retrieve similar questions in online health communities to improve the efficiency of health information retrieval and sharing.

Design/methodology/approach

This paper proposes a hybrid approach that combines domain knowledge similarity and topic similarity to retrieve similar questions in online health communities. The domain knowledge similarity evaluates the domain distance between different questions. The topic similarity measures the relationship between questions based on the extracted latent topics.
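
One simple way to combine two such similarity signals is a weighted sum. The sketch below is an assumption-laden illustration, not the paper's formulation: it stands in Jaccard overlap of entity sets for domain knowledge similarity and cosine similarity of topic-weight vectors for topic similarity, and the names `jaccard`, `cosine`, `hybrid_similarity` and the weight `alpha` are all hypothetical.

```python
import math


def jaccard(a, b):
    """Overlap of two sets of domain entities (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(p, q):
    """Cosine similarity of two equal-length topic-weight vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def hybrid_similarity(entities_a, entities_b, topics_a, topics_b, alpha=0.5):
    """Weighted mix: alpha * domain similarity + (1 - alpha) * topic similarity."""
    domain = jaccard(entities_a, entities_b)
    topic = cosine(topics_a, topics_b)
    return alpha * domain + (1 - alpha) * topic


if __name__ == "__main__":
    # Hypothetical questions: they share one entity and have identical topic mixes.
    score = hybrid_similarity(
        {"diabetes", "insulin"}, {"diabetes", "metformin"}, [1.0, 0.0], [1.0, 0.0]
    )
    print(round(score, 3))
```

Because the topic term does not depend on shared surface words, two questions phrased differently can still score as similar, while the entity term keeps the comparison anchored to the medical concepts each question names.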

Findings

The experiment results show that the proposed method outperforms the baseline methods.

Originality/value

This method overcomes the problem of word mismatch and considers the named entities included in questions, which most existing studies did not.

Details

International Journal of Crowd Science, vol. 5 no. 2
Type: Research Article
ISSN: 2398-7294

Open Access
Article
Publication date: 19 August 2022

Marlon Santiago Viñán-Ludeña and Luis M. de Campos

Abstract

Purpose

The main purpose of this paper is to analyze a tourist destination using sentiment analysis techniques with data from Twitter and Instagram to find the most representative entities (or places) and perceptions (or aspects) of the users.

Design/methodology/approach

The authors used 90,725 Instagram posts and 235,755 Twitter tweets to analyze tourism in Granada (Spain) to identify the important places and perceptions mentioned by travelers on both social media sites. The authors used several approaches for sentiment classification for English and Spanish texts, including deep learning models.

Findings

The best results in a test set were obtained using a bidirectional encoder representations from transformers (BERT) model for Spanish texts and Tweeteval for English texts, and these were subsequently used to analyze the data sets. It was then possible to identify the most important entities and aspects, and this, in turn, provided interesting insights for researchers, practitioners, travelers and tourism managers so that services could be improved and better marketing strategies formulated.

Research limitations/implications

The authors propose a Spanish-Tourism-BERT model for performing sentiment classification together with a process to find places through hashtags and to reveal the important negative aspects of each place.

Practical implications

The study enables managers and practitioners to implement the Spanish-BERT model with the Spanish Tourism data set that the authors released for adoption in applications to find both positive and negative perceptions.

Originality/value

This study presents a novel approach to applying sentiment analysis in the tourism domain. First, a way to evaluate the different existing models and tools is presented; second, a model is trained using BERT (a deep learning model); third, an approach for identifying the acceptance of the places of a destination through hashtags is presented; and, finally, the reasons why users express positivity (negativity) are evaluated through the identification of entities and aspects.

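Research purpose

The main purpose of this work is to analyze a tourist destination using sentiment analysis techniques and data from Twitter and Instagram, in order to find the most representative entities (or places) and user perceptions (or aspects).

Research design/methodology/approach

We used 90,725 Instagram posts and 235,755 Twitter tweets to analyze tourism in Granada (Spain) and to identify the important places and perceptions mentioned by travelers on both social media sites. We used several approaches for sentiment classification of English and Spanish texts, including deep learning models.

Research findings

The best results on the test set were obtained using a bidirectional encoder representations from transformers (BERT) model for Spanish texts and Tweeteval for English texts; these were subsequently used to analyze our data sets. It was then possible to identify the most important entities and aspects, which in turn provided interesting insights for researchers, practitioners, travelers and tourism managers, so that services can be improved and better marketing strategies formulated.

Research limitations

We propose a Spanish tourism BERT model for performing sentiment classification, together with a process to find places through hashtags and to reveal the important negative aspects of each place.

Practical implications

The study enables managers and practitioners to implement the Spanish-BERT model with the Spanish tourism data set that we released for adoption in applications, in order to find both positive and negative perceptions.

Research originality

This study presents a new approach to applying sentiment analysis in the tourism domain. First, a method for evaluating the different existing models and tools is introduced; second, a model is trained using BERT (a deep learning model); third, a method for identifying the acceptance of the places of a destination through hashtags is proposed; and, finally, the reasons why users express positivity (negativity) are evaluated through the identification of entities and aspects.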

Details

Journal of Hospitality and Tourism Technology, vol. 13 no. 5
Type: Research Article
ISSN: 1757-9880

Open Access
Article
Publication date: 23 May 2023

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen and Juha Rautiainen

Abstract

Purpose

This study aims to identify user perception of different qualities of optical character recognition (OCR) in texts. The purpose of this paper is to study the effect of different quality OCR on users' subjective perception through an interactive information retrieval task with a collection of one digitized historical Finnish newspaper.

Design/methodology/approach

This study is based on the simulated work task model used in interactive information retrieval. Thirty-two users searched an article collection of the Finnish newspaper Uusi Suometar 1869–1918, which consists of ca. 1.45 million autosegmented articles. The article search database had two versions of each article with different quality OCR. Each user performed six pre-formulated and six self-formulated short queries and subjectively evaluated the top 10 results using a graded relevance scale of 0–3. Users were not informed about the OCR quality differences of the otherwise identical articles.

Findings

The main result of the study is that improved OCR quality affects subjective user perception of historical newspaper articles positively: higher relevance scores are given to better-quality texts.

Originality/value

To the best of the authors’ knowledge, this simulated interactive work task experiment is the first one showing empirically that users' subjective relevance assessments are affected by a change in the quality of an optically read text.

Details

Journal of Documentation, vol. 79 no. 7
Type: Research Article
ISSN: 0022-0418

Open Access
Article
Publication date: 10 May 2018

Daniel Stefan Hain and Roman Jurowetzki

Abstract

Purpose

The purpose of this paper is to shed light on the changing pattern and characteristics of international financial flows in the emerging entrepreneurial ecosystems of Sub-Saharan Africa (SSA), provide a novel taxonomy to classify and analyze them, and discuss how such investments contribute to competence building and sustainable development.

Design/methodology/approach

In an exploratory study, the authors analyze the characteristics of international venture capital investors and the start-ups receiving funding in Kenya and map their interaction. The authors proceed by developing a novel taxonomy, classifying investors according to their main rationales (for-profit-for-impact), and start-ups according to the locus of needs and markets addressed by the start-up (local-global) and the locus of the start-up's capacity and knowledge (local-global).

Findings

The authors observe a new type of mainly western investors who support innovative ideas in SSA by identifying and investing in domestically developed technical innovations with the potential to address global market needs. The authors find such innovations to be mainly developed at the intersect of global and local knowledge.

Originality/value

The authors shed light on the – up to now – under-researched emerging phenomenon of international high-tech investments in SSA, and develop a novel taxonomy of technology investments in low-income countries, guiding further research on the conditions, impact, practical, and policy implications of this new form of finance flows.

Details

Journal of Small Business and Enterprise Development, vol. 25 no. 3
Type: Research Article
ISSN: 1462-6004

Open Access
Article
Publication date: 9 April 2024

Lilian Gheyathaldin Salih

Abstract

Purpose

This study investigated the visibility of carbon emissions allowances accounting in the financial reports of 32 clean development mechanism (CDM) projects in the UAE to uncover the obstacles to setting consistent standards for carbon emission accounting. As carbon emissions are monetized as credits, consistent accounting standards can aid decision-makers in the development of carbon emission mitigation strategies.

Design/methodology/approach

This study used a grounded theoretical framework for exploring the terms used in the policy documents of international accounting bodies regarding accounting standards and guidelines for carbon emission credits. Raw qualitative data were gathered, and an inductive approach was used by analyzing documents from various sources using the qualitative data text analysis software QDA Miner 6.

Findings

The findings showed that the financial statement reports of the corporations did not include disclosure of the carbon credit account. This omission was due to the lack of global standardization of carbon credit accounts and emission allowance recognition. This may hinder the production of a comprehensive report containing accurate and valuable financial information relevant to all stakeholders.

Originality/value

The study is among the first to use a grounded theoretical framework to investigate whether corporations are applying common standards and guidelines for carbon emissions accounting.

Details

Asian Journal of Accounting Research, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2459-9700

Open Access
Article
Publication date: 16 January 2023

Javier A. Sánchez-Torres, Yuri Lorene Hernández Fernández and Carolina Perlaza Lopera

Abstract

Purpose

This study examines the factors that influence the ecotourist behavior of university students. The understanding of what motivates these students can inform future suggestions for strategies and actions in ecotourism.

Design/methodology/approach

The study was conducted with students of the University of Medellín, Colombia. It was an exploratory empirical study that surveyed a total of 696 students.

Findings

The results show that students with a positive attitude toward ecology tend to be interested in nature-related activities, therefore generating an intention to engage in ecotourism. The authors found that those who view ecotourism as an activity that promotes fun and happiness tend to engage more frequently in these activities.

Originality/value

This study is of great interest for research in motivational theory, specifically the analysis of personality profiles and how these relate to specific tourism behaviors. The findings of this study strongly suggest that those interested in the management and development of ecotourism should establish practices and programs that consider factors such as tourist segmentation, effective communication of the positive qualities of ecotourism and environmental stewardship involved in these activities.

Details

Journal of Tourism Futures, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 2055-5911
