Search results

1 – 10 of over 8000
Article
Publication date: 8 February 2013

Fattane Zarrinkalam and Mohsen Kahani

Abstract

Purpose

The purpose of this paper is to propose a novel citation recommendation system that takes a text as input and recommends publications it should cite. Its goal is to help researchers find related work. Further, this paper seeks to explore the effect of using relational features, in addition to textual features, on the quality of recommended citations.

Design/methodology/approach

To build this system, a new relational similarity measure is first proposed for calculating the relatedness of two publications. Then, a recommendation algorithm is presented that uses both relational and textual features to compute the semantic distance between each publication in a bibliographic dataset and the input text.
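
The abstract does not give the exact formula, so the following is only a minimal sketch of how a textual similarity and a relational similarity might be combined into one score; the Jaccard-over-references relational feature and the weights w_text and w_rel (standing in for the genetic-algorithm weights mentioned under Originality/value) are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: combining textual and relational similarity for citation
# recommendation. The relational feature and the weights are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relational_similarity(refs_a, refs_b):
    """Toy relational feature: Jaccard overlap of two reference lists."""
    a, b = set(refs_a), set(refs_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(query, candidates, w_text=0.6, w_rel=0.4, top_k=5):
    """Rank candidates by a weighted textual + relational score.

    query / candidates: dicts with 'title', 'abstract', 'references'.
    w_text and w_rel play the role of the GA-assigned weights.
    """
    corpus = [query["abstract"]] + [c["abstract"] for c in candidates]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)
    text_sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
    scored = [
        (w_text * ts
         + w_rel * relational_similarity(query["references"], c["references"]),
         c)
        for ts, c in zip(text_sims, candidates)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c["title"] for _, c in scored[:top_k]]
```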

Findings

The evaluation of the proposed system shows that combining relational features with textual features leads to better recommendations, in comparison with relying only on the textual features. It also demonstrates that citation context plays an important role among textual features. In addition, it is concluded that different relational features have different contributions to the proposed similarity measure.

Originality/value

A new citation recommendation system is proposed that uses a novel semantic distance measure, based on textual similarities and a new relational similarity concept. Another contribution of this paper is that it sheds more light on the importance of citation context in citation recommendation by providing further evidence through analysis of the results. In addition, a genetic algorithm is developed for assigning weights to the relational features in the similarity measure.

Article
Publication date: 12 October 2021

Didem Ölçer and Tuğba Taşkaya Temizel

Abstract

Purpose

This paper proposes a framework that automatically assesses content coverage and information quality of health websites for end-users.

Design/methodology/approach

The study investigates the impact of textual and content-based features in predicting the quality of health-related texts. Content-based features were acquired using an evidence-based practice guideline in diabetes. A set of textual features inspired by professional health literacy guidelines and the features commonly used for assessing information quality in other domains were also used. In this study, 60 websites about type 2 diabetes were methodically selected for inclusion. Two general practitioners used DISCERN to assess each website in terms of its content coverage and quality.
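
As a rough illustration of the pipeline described above, the sketch below derives two simple textual features from a health text and fits a classifier against expert labels; the features, labels and data are placeholders, not the paper's feature set or the DISCERN protocol.

```python
# Hedged sketch: predicting an expert quality label for a health text from
# simple textual features. The paper's actual feature set (readability,
# coverage against a diabetes guideline, etc.) is richer; the two features
# below are illustrative stand-ins.
import re
from sklearn.ensemble import RandomForestClassifier

def textual_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    return [avg_sentence_len, avg_word_len]

# X: one feature vector per website text; y: expert labels
# (e.g. "high" / "low" quality derived from DISCERN-style scores).
texts = ["Type 2 diabetes is managed with diet...", "Miracle cure! Buy now..."]
y = ["high", "low"]
X = [textual_features(t) for t in texts]
clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict([textual_features("Discuss insulin options with your GP.")]))
```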

Findings

The proposed framework's outputs were compared with the experts' evaluation scores. For coverage assessment, the best accuracies were 88% with textual features and 92% with content-based features; using both types of features together, the proposed framework achieved 90% accuracy. For information quality assessment, content-based features achieved a higher accuracy of 92%, against 88% with textual features.

Research limitations/implications

The experiments were conducted for websites about type 2 diabetes. As the whole process is costly and requires extensive expert human labelling, the study was carried out in a single domain. However, the methodology is generalizable to other health domains for which evidence-based practice guidelines are available.

Practical implications

Finding high-quality online health information is becoming increasingly difficult owing to the high volume of information generated by non-experts. Search engines often fail to rank objective health websites higher within search results. The proposed framework can help search engine and information platform developers implement better retrieval techniques, in turn facilitating end-users' access to high-quality health information.

Social implications

Erroneous, biased or partial health information is a serious problem for end-users who need access to objective information about their health problems. Such information may cause patients to stop treatments provided by professionals. It might also have adverse financial implications by causing unnecessary expenditure on ineffective treatments. The ability to access high-quality health information has a positive effect on the health of both individuals and society as a whole.

Originality/value

The paper demonstrates that automatic assessment of health websites is a domain-specific problem, which cannot be addressed with the general information quality assessment methodologies in the literature. Content coverage of health websites has also been studied in the health domain for the first time in the literature.

Details

Online Information Review, vol. 46 no. 4
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 1 November 2005

Mohamed Hammami, Youssef Chahir and Liming Chen

Abstract

Alongside the ever-growing Web is the proliferation of objectionable content, such as sex, violence and racism. Efficient tools are needed for classifying and filtering undesirable web content. In this paper, we investigate this problem through WebGuard, our automatic machine-learning-based pornographic website classification and filtering system. As the Internet becomes increasingly visual and multimedia-rich, as exemplified by pornographic websites, we focus on the use of skin-colour-related visual content-based analysis, alongside textual and structural content-based analysis, to improve pornographic website filtering. While most commercial filtering products on the market rely mainly on textual content-based analysis, such as detecting indicative keywords or checking manually collected blacklists, the originality of our work lies in adding structural and visual content-based analysis to classical textual analysis, together with several major data mining techniques for learning and classification. Tested on a testbed of 400 websites, comprising 200 adult sites and 200 non-pornographic ones, WebGuard, our web filtering engine, scored a 96.1% classification accuracy rate when only textual and structural content-based analysis was used, and 97.4% when skin-colour-related visual content-based analysis was added. Further experiments on a blacklist of 12,311 adult websites manually collected and classified by the French Ministry of Education showed that WebGuard scored an 87.82% classification accuracy rate using only textual and structural content-based analysis, and 95.62% when visual content-based analysis was added. The basic framework of WebGuard can be applied to other website categorization problems that combine, as most websites do today, textual and visual content.
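
For the visual part, a skin-colour ratio is the kind of feature such a system could feed into its classifier. The sketch below uses a widely cited RGB skin heuristic (Peer et al.); whether WebGuard uses this exact rule is not stated in the abstract.

```python
# Hedged sketch: a skin-colour ratio feature of the kind WebGuard's visual
# analysis could supply. The RGB thresholds below are a common heuristic,
# not necessarily the paper's rule.
import numpy as np

def skin_pixel_ratio(rgb):
    """rgb: H x W x 3 uint8 array; returns fraction of likely skin pixels."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    skin = (
        (r > 95) & (g > 40) & (b > 20)
        & (rgb.max(axis=-1).astype(int) - rgb.min(axis=-1).astype(int) > 15)
        & (np.abs(r - g) > 15) & (r > g) & (r > b)
    )
    return float(skin.mean())

# The ratio would be appended to the textual/structural feature vector
# before classification.
image = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
print(round(skin_pixel_ratio(image), 3))
```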

Details

International Journal of Web Information Systems, vol. 1 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 15 June 2021

Runyu Chen

Abstract

Purpose

Micro-video platforms have gained attention in recent years and have become an important new channel for merchants to advertise their products. Since little research has studied micro-video advertising, this paper aims to fill the gap by exploring the determinants of micro-video advertising clicks. We build a micro-video advertising click prediction model and demonstrate the effectiveness of multimodal information, extracted from the advertisement producers, the commodities being sold and the micro-video content, in the prediction task.

Design/methodology/approach

A multimodal analysis framework was developed based on real-world micro-video advertisement datasets. To better capture the relations between different modalities, we adopt a cooperative learning model to predict advertising clicks.
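
The cooperative learning model itself is not specified in the abstract; as a point of reference, the sketch below shows only a plain early-fusion baseline that concatenates the four modality feature blocks (synthetic placeholders) before a linear classifier.

```python
# Hedged sketch: an early-fusion baseline for multimodal click prediction.
# The paper's cooperative learning model explicitly models cross-modal
# relations; plain concatenation below is only a baseline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
visual = rng.normal(size=(n, 8))     # e.g. frame embeddings
acoustic = rng.normal(size=(n, 4))   # e.g. audio statistics
textual = rng.normal(size=(n, 6))    # e.g. caption embeddings
numerical = rng.normal(size=(n, 3))  # e.g. seller follower counts
clicks = rng.integers(0, 2, size=n)  # 1 = advertisement clicked

X = np.hstack([visual, acoustic, textual, numerical])
model = LogisticRegression(max_iter=1000).fit(X, clicks)
print("train accuracy:", model.score(X, clicks))
```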

Findings

The experimental results show that the features extracted from the different data sources improve prediction performance. Furthermore, the combination of different modal features (visual, acoustic, textual and numerical) is also worth studying. Compared with classical baseline models, the proposed cooperative learning model achieves significantly better prediction results, which demonstrates that the relations between modalities are also important in generating advertising micro-videos.

Originality/value

To the best of our knowledge, this is the first study to analyse the effects of micro-video advertising. With the help of our advertising click prediction model, advertisement producers (merchants or their partners) can generate more effective micro-video advertisements. Furthermore, micro-video platforms can apply our prediction results to optimise their advertisement allocation algorithms and better manage network traffic. This research can greatly support the development of a more effective micro-video advertisement industry.

Details

Internet Research, vol. 32 no. 2
Type: Research Article
ISSN: 1066-2243

Article
Publication date: 6 November 2017

Yanti Idaya Aspura M.K. and Shahrul Azman Mohd Noah

Abstract

Purpose

The purpose of this study is to reduce the semantic distance by proposing a model that integrates indexes of textual and visual features via a multi-modality ontology, and by using DBpedia to improve the comprehensiveness of the ontology and thereby enhance semantic retrieval.

Design/methodology/approach

A multi-modality ontology-based approach was developed to integrate high-level concepts and low-level features, as well as integrate the ontology base with DBpedia to enrich the knowledge resource. A complete ontology model was also developed to represent the domain of sport news, with image caption keywords and image features. Precision and recall were used as metrics to evaluate the effectiveness of the multi-modality approach, and the outputs were compared with those obtained using a single-modality approach (i.e. textual ontology and visual ontology).
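
As an illustration of the DBpedia enrichment step, the sketch below retrieves the English abstract of one sport concept from the public SPARQL endpoint (it requires the SPARQLWrapper package and network access); the paper's actual ontology integration is considerably richer.

```python
# Hedged sketch: enriching a sport-news ontology concept with knowledge
# from DBpedia. This shows only the kind of lookup involved.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?abstract WHERE {
      <http://dbpedia.org/resource/Association_football>
          <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = "en")
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["abstract"]["value"][:200])
```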

Findings

The results for ten queries show the superior performance of the multi-modality ontology-based IMR system integrated with DBpedia in retrieving images that match user queries. The system achieved 100 per cent precision for six of the queries and greater than 80 per cent precision for the other four. The text-based system achieved 100 per cent precision for only one query; all other queries yielded precision below 50 per cent.

Research limitations/implications

This study focused only on the BBC Sport News collection from 2009.

Practical implications

The paper has implications for the development of ontology-based retrieval over image collections.

Originality/value

This study demonstrates the strength of using a multi-modality ontology integrated with DBpedia for image retrieval to overcome the deficiencies of text-based and ontology-based systems. The results validate semantic text-based retrieval with a multi-modality ontology and DBpedia as a useful model for reducing the semantic distance.

Details

The Electronic Library, vol. 35 no. 6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 31 May 2022

Osamah M. Al-Qershi, Junbum Kwon, Shuning Zhao and Zhaokun Li

Abstract

Purpose

Given many candidate content features, this paper aims to investigate which content features in video and text ads contribute more to accurately predicting the success of crowdfunding, by comparing prediction models.

Design/methodology/approach

With 1,368 features extracted from 15,195 Kickstarter campaigns in the USA, the authors compare base models such as logistic regression (LR) with tree-based homogeneous ensembles such as eXtreme gradient boosting (XGBoost) and heterogeneous ensembles such as XGBoost + LR.
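
The abstract does not describe how XGBoost and LR are combined; the sketch below assumes the classic "GBDT leaves as features for LR" stacking on synthetic data, which is one common way to build such a heterogeneous ensemble.

```python
# Hedged sketch: an XGBoost + LR heterogeneous ensemble, built by feeding
# the leaf indices of each boosted tree into a logistic regression.
# Whether the paper stacks the models exactly this way is an assumption.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

xgb = XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
xgb.fit(X_tr, y_tr)

# Encode each sample by the leaf it lands in, per tree, then fit the LR.
enc = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(enc.fit_transform(xgb.apply(X_tr)), y_tr)

print("XGBoost alone:", xgb.score(X_te, y_te))
print("XGBoost + LR :", lr.score(enc.transform(xgb.apply(X_te)), y_te))
```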

Findings

XGBoost shows higher prediction accuracy than LR (82% vs 69%), in contrast to the findings of a previous relevant study. Regarding important content features, humans (e.g. founders) are more important than visual objects (e.g. products). In both spoken and written language, words related to experience (e.g. eat) or perception (e.g. hear) are more important than cognitive words (e.g. causation). In addition, a focus on the future is more important than a present or past time orientation. Speech aids (e.g. "see" and "compare") that complement visual content are also effective, and a positive tone matters in speech.

Research limitations/implications

This research makes theoretical contributions by identifying the more important visual (human) and language (experience, perception and future time orientation) features. In a multimodal context, complementary cues (e.g. speech aids) across different modalities also help. Furthermore, non-content aspects of speech, such as a positive tone or the pace of speech, are important.

Practical implications

Founders are encouraged to assess and revise the content of their video or text ads, as well as their basic campaign features (e.g. goal, duration and reward), before launching their campaigns. Note also that overly complex ensembles may suffer from overfitting; in practice, model validation using unseen data is recommended.

Originality/value

Rather than reducing the number of content feature dimensions (Kaminski and Hopp, 2020), enabling advanced prediction models to accommodate many content features raises prediction accuracy substantially.

Article
Publication date: 29 September 2023

Shasha Deng, Xuan Cheng and Rong Hu

Abstract

Purpose

Owing to their convenience and anonymity, social media platforms are increasingly used by people with mental illness to communicate, share information and receive emotional and spiritual support. The purpose of this paper is to identify the degree of depression based on people's behavioral patterns and discussion content on the Internet.

Design/methodology/approach

Based on previous studies of depression, the severity of depression is divided into four categories, each of which is defined: no significant depressive symptoms, mild MDD, moderate MDD and severe MDD. Next, to automatically identify the severity, the authors propose social media digital cues comprising textual lexical features, depressive language features and social behavioral features. Finally, the authors evaluate a system developed on the basis of these cues in an experiment using social media data.
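
A minimal sketch of the classification step is shown below, with synthetic placeholders for the three cue groups (F1 textual lexical, F2 depressive language, F3 social behavioral) and the four severity labels; the feature dimensions and the classifier are assumptions, not the paper's configuration.

```python
# Hedged sketch: classifying four depression-severity levels from the three
# cue groups the paper names. The synthetic features are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 400
f1_lexical = rng.normal(size=(n, 10))   # e.g. word-frequency statistics
f2_language = rng.normal(size=(n, 5))   # e.g. depressive-lexicon hits
f3_behaviour = rng.normal(size=(n, 4))  # e.g. posting time and frequency
severity = rng.integers(0, 4, size=n)   # none / mild / moderate / severe

X = np.hstack([f1_lexical, f2_language, f3_behaviour])
clf = GradientBoostingClassifier(random_state=0)
print(cross_val_score(clf, X, severity, cv=3).mean())
```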

Findings

The combination of social media digital cues comprising textual lexical features, depressive language features and social behavioral features (F1, F2 and F3) performs best in classifying the four levels of depression.

Originality/value

This paper proposes a novel social media data-based framework (SMDF) to identify and predict different degrees of depression through social media digital cues, and evaluates the accuracy of the detection using social media data, offering a useful approach to the identification of, and intervention in, depression.

Details

Industrial Management & Data Systems, vol. 123 no. 12
Type: Research Article
ISSN: 0263-5577

Article
Publication date: 7 October 2021

Juan Yang, Xu Du, Jui-Long Hung and Chih-hsiung Tu

Abstract

Purpose

Critical thinking is considered important in psychological science because it enables students to make effective decisions and optimizes their performance. To address the challenges of understanding students' critical thinking, the objective of this study is to analyze online discussion data through an advanced multi-feature fusion modeling (MFFM) approach in order to automatically and accurately identify students' critical thinking levels.

Design/methodology/approach

An advanced MFFM approach is proposed in this study. Specifically, considering the time-series characteristics of discussion content and the high correlation between adjacent words, a long short-term memory–convolutional neural network (LSTM-CNN) architecture is proposed to extract deep semantic features. These semantic features are then combined with the linguistic and psychological features generated by the LIWC2015 tool as inputs to fully connected layers, to automatically and accurately predict the critical thinking levels hidden in online discussion data.
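
A minimal Keras sketch of an LSTM-CNN branch fused with LIWC-style features, in the spirit of the description above; the layer sizes, the 93-dimension LIWC vector and the three output levels are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of an LSTM-CNN + LIWC fusion network; all hyperparameters
# are assumptions.
from tensorflow.keras import Input, Model, layers

vocab_size, seq_len, liwc_dim, n_levels = 10000, 200, 93, 3

tokens = Input(shape=(seq_len,), name="post_tokens")
x = layers.Embedding(vocab_size, 128)(tokens)
x = layers.LSTM(64, return_sequences=True)(x)   # time-series semantics
x = layers.Conv1D(64, 3, activation="relu")(x)  # local n-gram patterns
x = layers.GlobalMaxPooling1D()(x)

liwc = Input(shape=(liwc_dim,), name="liwc_features")
fused = layers.concatenate([x, liwc])           # semantic + psycholinguistic
fused = layers.Dense(64, activation="relu")(fused)
out = layers.Dense(n_levels, activation="softmax")(fused)

model = Model([tokens, liwc], out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```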

Findings

A series of experiments on 7,691 posts from 94 students was conducted to verify the effectiveness of the proposed approach. The experimental results show that the proposed MFFM approach, which combines the two types of textual features, outperforms baseline methods, and that semantic-based padding further improves its prediction performance. It achieves 0.8205 overall accuracy and a 0.6172 F1 score for the "high" category on the validation dataset. Furthermore, the semantic features extracted by the LSTM-CNN are more powerful for identifying self-introductions or off-topic discussions, while the linguistic and psychological features better distinguish the discussion posts with the highest critical thinking level.

Originality/value

With the support of the proposed MFFM approach, online teachers can conveniently and effectively understand the interaction quality of online discussions, which supports instructional decision-making to better promote students' knowledge construction and improve learning performance.

Details

Data Technologies and Applications, vol. 56 no. 2
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 6 February 2023

Xiaobo Tang, Heshen Zhou and Shixuan Li

Abstract

Purpose

Predicting highly cited papers enables the evaluation of a paper's potential and the early detection and determination of academic achievement value. However, most studies of highly cited paper prediction rely on early citation information, so predicting highly cited papers at the time of publication is challenging. Therefore, the authors propose a method for predicting early highly cited papers based on the papers' own features.

Design/methodology/approach

This research analyzed academic papers published in the Journal of the Association for Computing Machinery (ACM) from 2000 to 2013. Five types of features were extracted: paper features, journal features, author features, reference features and semantic features. Subsequently, the authors applied a deep neural network (DNN), a support vector machine (SVM), a decision tree (DT) and logistic regression (LGR) to predict highly cited papers 1–3 years after publication.
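
A compact way to reproduce this kind of model comparison on stand-in data is sketched below; MLPClassifier stands in for the DNN, and the synthetic features are placeholders for the five feature types.

```python
# Hedged sketch: comparing the four model families the authors apply
# (DNN, SVM, decision tree, logistic regression) on synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=30, random_state=0)
models = {
    "DNN": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(random_state=0),
    "LGR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=3).mean().round(3))
```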

Findings

Experimental results showed that early highly cited academic papers are predictable at the time of first publication, and the authors' prediction models performed well. The study further confirmed that reference and author features play an important role in predicting early highly cited papers. In addition, the proportion of high-quality journal references has a particularly significant impact on prediction.

Originality/value

Based on the information available at the time of publication, this study proposes an effective model for the early prediction of highly cited papers, facilitating the early discovery and realization of the value of scientific and technological achievements.

Details

Library Hi Tech, vol. ahead-of-print no. ahead-of-print
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 19 September 2022

Srishti Sharma, Mala Saraswat and Anil Kumar Dubey

Abstract

Purpose

Owing to the increased accessibility of the internet and related technologies, more and more individuals across the globe now turn to social media for their daily dose of news rather than traditional news outlets. Given the global nature of social media and the near absence of checks on posted content, fake news can spread exponentially. Businesses propagate fake news to improve their economic standing and to influence consumers and demand, and individuals spread fake news for personal gains such as popularity and life goals. The content of fake news is diverse in terms of topics, styles and media platforms, and fake news attempts to distort the truth in diverse linguistic styles while simultaneously mocking true news. All these factors together make fake news detection an arduous task. This work attempts to check the spread of disinformation on Twitter.

Design/methodology/approach

This study carries out fake news detection using user characteristics and tweet textual content as features. To classify user characteristics, this study uses the XGBoost algorithm. To classify the tweet text, this study uses various natural language processing techniques to pre-process the tweets and then applies a hybrid convolutional neural network–recurrent neural network (CNN-RNN) and a state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) model.
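
A minimal Keras sketch of a hybrid CNN-RNN text branch of the kind described; the vocabulary size, sequence length and layer sizes are assumptions, and the XGBoost user-feature branch and the BERT model are omitted.

```python
# Hedged sketch of a hybrid CNN-RNN classifier for tweet text; the exact
# architecture and hyperparameters are assumptions.
from tensorflow.keras import Sequential, layers

vocab_size, seq_len = 20000, 64  # assumed tokenizer settings

model = Sequential([
    layers.Input(shape=(seq_len,)),
    layers.Embedding(vocab_size, 100),
    layers.Conv1D(128, 5, activation="relu"),  # local phrase patterns
    layers.MaxPooling1D(2),
    layers.LSTM(64),                           # sequential context
    layers.Dense(1, activation="sigmoid"),     # fake vs genuine
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```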

Findings

This study uses a combination of machine learning and deep learning approaches for fake news detection, namely XGBoost, a hybrid CNN-RNN and BERT. The models have been evaluated and compared with various baseline models to show that this approach tackles the problem effectively.

Originality/value

This study proposes a novel framework that exploits news content and social contexts to learn useful representations for predicting fake news. The model is based on a transformer architecture, which facilitates representation learning from fake news data and helps detect fake news easily. The study also investigates the relative importance of content and social-context features for detecting false news, and whether the absence of one of these feature categories hampers the effectiveness of the resulting system. This investigation can go a long way in aiding further research on the subject and in detecting fake news in the presence of extremely noisy or unusable data.

Details

International Journal of Web Information Systems, vol. 18 no. 5/6
Type: Research Article
ISSN: 1744-0084
