Search results

1 – 10 of over 1000
Article
Publication date: 8 March 2011

Fatih Oguz and Michael Holt

Abstract

Purpose

The purpose of this research is to identify and describe the impact of comment spam in library blogs. Three research questions guided the study: current level of commenting in library blogs; librarians' perception of comment spam; and techniques used to address the comment spam problem.

Design/methodology/approach

A quantitative approach is used to investigate research questions. Informal interviews were conducted with four academic and three public libraries with active blogs to develop a better understanding of the problem and then to develop an appropriate data collection instrument. Based on the feedback received from these blog administrators, a survey questionnaire was developed and then distributed online via direct e‐mailing and mailing lists. A total of 108 responses were received.

Findings

Regardless of the type of library with which the blogs were affiliated and the size of the community they served, user participation in library blogs, measured by comments left, was very limited. Over 80 percent of libraries reported receiving five or fewer comments in a given week. Comment spam was not perceived to be a major problem by blog administrators. Detection‐based techniques were the most commonly used approaches to combat comment spam in library blogs.

Research limitations/implications

The research focuses on the comment spam problem in blogs affiliated with libraries where the library is responsible for content published on the blog. The comment spam problem is investigated from the library blog administrator's perspective.

Practical implications

Results of this study provide empirical evidence regarding level of commenting and the impact of comment spam in library blogs. The results and findings of the study can offer guidance to libraries that are reconsidering whether to allow commenting in their blogs and to those that are planning to establish a blog to reach out to their users, while keeping this online environment engaging and interactive.

Originality/value

The study provides empirical evidence that level of commenting is very limited, comment spam is not regarded as an important problem, and it does not interfere with the communication process in library blogs.

Details

Library Hi Tech, vol. 29 no. 1
Type: Research Article
ISSN: 0737-8831

Open Access
Article
Publication date: 23 July 2020

Rami Mustafa A. Mohammad

Abstract

Spam email classification using data mining and machine learning approaches has attracted researchers' attention due to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet spammers know that the longer they use the same set of features to trick email users, the more likely it is that anti-spam parties will develop tools to combat this kind of annoying email. Spammers therefore adapt by continuously changing the group of features used to compose spam emails. For that reason, even though traditional classification methods produce sound classification results, they are ineffective for lifelong classification of spam emails because they are prone to so-called “Concept Drift”. In the current study, an enhanced model is proposed for lifelong spam classification. For evaluation purposes, the overall performance of the suggested model is compared against various other stream mining classification techniques. The results show the success of the suggested model as a lifelong spam email classification method.
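The abstract does not describe the enhanced model itself, so the following is only a minimal sketch of the underlying problem, assuming a much simpler stand-in technique: a sliding window that forgets old training examples. The class name, the window size, and the sample messages are all hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


class WindowedSpamScorer:
    """Toy word-count scorer that only remembers the most recent examples.

    This is an illustrative stand-in for drift handling, not the model
    proposed in the article.
    """

    def __init__(self, window=2):
        # deque with maxlen silently drops the oldest example when full.
        self.window = deque(maxlen=window)

    def learn(self, text, is_spam):
        self.window.append((set(text.lower().split()), is_spam))

    def predict(self, text):
        spam, ham = Counter(), Counter()
        for words, is_spam in self.window:
            (spam if is_spam else ham).update(words)
        tokens = set(text.lower().split())
        # Positive score: the words were seen more often in recent spam.
        return sum(spam[w] - ham[w] for w in tokens) > 0


scorer = WindowedSpamScorer(window=2)
scorer.learn("cheap pills", True)
scorer.learn("meeting agenda", False)
before_drift = scorer.predict("cheap pills today")

# Spammers change vocabulary; the window now holds only the new examples.
scorer.learn("crypto giveaway", True)
scorer.learn("lunch agenda", False)
new_style = scorer.predict("crypto giveaway now")
old_style = scorer.predict("cheap pills today")
```

In this toy run the scorer picks up the new spam vocabulary but no longer recognises the old one, which illustrates the trade-off a lifelong classifier has to manage: adapting to new features without discarding knowledge that may become relevant again.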

Details

Applied Computing and Informatics, vol. 20 no. 1/2
Type: Research Article
ISSN: 2634-1964

Article
Publication date: 1 May 2006

Tanzila Ahmed and Charles Oppenheim

Abstract

Purpose

The purpose of this research is to show how spam is generated and what methods have been proposed to combat it.

Design/methodology/approach

An experiment was conducted in which a number of e‐mail accounts using different ISPs were set up and then checked for spam over a period of nine weeks. The results were compared to two pre‐existing e‐mail accounts. The types of spam received were classified into broad headings.

Findings

Financial spam was the biggest single type of spam received, with health‐related spam second. The growth in spam over time was noted, as were the volumes of spam received by different Internet Service Providers. The effects of using “obvious” names versus unusual ones in the e‐mail address were measured, as were those of using spam‐filtering software. In the former case, no significant differences were found, but filtering software certainly helped to reduce the volume of spam received. Active involvement in a pornographic site did not, surprisingly, greatly influence the amount of spam received. The biggest single factors affecting the volume of spam received are the length of time the e‐mail account has been active and the use, or non‐use, of filtering software. It is by no means certain that responding to spam increases the volume of spam received.

Research limitations/implications

The research was conducted over a relatively short time period and a small number of accounts were examined.

Practical implications

Methods of combating spam and some urban myths about it are examined.

Originality/value

To those tasked with dealing with spam, the paper provides some ideas on the scale of the problem and how to address it.

Details

Aslib Proceedings, vol. 58 no. 3
Type: Research Article
ISSN: 0001-253X

Article
Publication date: 1 March 2013

Hu Xia, Yan Fu, Junlin Zhou and Qi Xia

Abstract

Purpose

The purpose of this paper is to provide an intelligent spam filtering method to meet the real‐time processing requirement of the massive short message stream and reduce manual operation of the system.

Design/methodology/approach

An integrated framework based on a series of algorithms is proposed. The framework consists of a message filtering module, a log analysis module and a rules handling module, and dynamically filters short message spam while generating the filtering rules. The experiments are implemented in Java.

Findings

The experiments are carried out both on the simulation model (off‐line) and on the actual plant (on‐line). All experimental data consist of real short messages, both normal and spam. The results show that use of the integrated framework leads to comparable accuracy and meets the real‐time filtration requirement.

Originality/value

The approach in the design of the filtering system is novel. In addition, implementation of the proposed integrated framework allows the method not only to reduce the computational cost which leads to a high processing speed but also to filter spam messages with a high accuracy.

Details

COMPEL - The international journal for computation and mathematics in electrical and electronic engineering, vol. 32 no. 2
Type: Research Article
ISSN: 0332-1649

Article
Publication date: 1 June 2006

Janet Durgin and Joseph S. Sherif

Abstract

Purpose

This paper aims to advance research that accurately portrays the alarming rate at which spam is infiltrating and eroding the security of the internet.

Design/methodology/approach

The paper discusses the political, legal and ethical controversy surrounding the spam dilemma as well as the high costs of spam to telecommunications bandwidth, QoS and e‐commerce effectiveness.

Findings

The spam problem is a technological epidemic that multiplies exponentially each day. A dynamic digital jam is in prospect.

Practical implications

Presents viable options for a quick resolution, and unveils the changing strategies that integrity‐driven marketers are facing in light of the raging battle.

Originality/value

Tackles one of the most pressing issues in the business world today.

Details

Kybernetes, vol. 35 no. 5
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 15 June 2015

Bundit Manaskasemsak and Arnon Rungsawang

Abstract

Purpose

This paper aims to present a machine learning approach for solving the problem of Web spam detection. Based on an adoption of the ant colony optimization (ACO), three algorithms are proposed to construct rule-based classifiers to distinguish between non-spam and spam hosts. Moreover, the paper also proposes an adaptive learning technique to enhance the spam detection performance.

Design/methodology/approach

The Trust-ACO algorithm is designed to let an ant start from a non-spam seed, and afterwards, decide to walk through paths in the host graph. Trails (i.e. trust paths) discovered by ants are then interpreted and compiled to non-spam classification rules. Similarly, the Distrust-ACO algorithm is designed to generate spam classification ones. The last Combine-ACO algorithm aims to accumulate rules given from the former algorithms. Moreover, an adaptive learning technique is introduced to let ants walk with longer (or shorter) steps by rewarding them when they find desirable paths or penalizing them otherwise.

Findings

Experiments are conducted on two publicly available datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007. The results show that the proposed algorithms outperform well-known rule-based classification baselines. In particular, the proposed adaptive learning technique helps improve the AUC scores up to 0.899 and 0.784 on the former and the latter datasets, respectively.

Originality/value

To the best of our knowledge, this is the first comprehensive study that adopts the ACO learning approach to solve the problem of Web spam detection. In addition, we have improved the traditional ACO by using the adaptive learning technique.

Details

International Journal of Web Information Systems, vol. 11 no. 2
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 25 November 2020

Hei Chia Wang, Yu Hung Chiang and Si Ting Lin

Abstract

Purpose

In community question and answer (CQA) services, because of user subjectivity and the limits of knowledge, the distribution of answer quality can vary drastically – from highly related to irrelevant or even spam answers. Previous studies of CQA portals have faced two important issues: answer quality analysis and spam answer filtering. Therefore, the purposes of this study are to filter spam answers in advance using two-phase identification methods and then automatically classify the different types of question and answer (QA) pairs by deep learning. Finally, this study proposes a comprehensive study of answer quality prediction for different types of QA pairs.

Design/methodology/approach

This study proposes an integrated model with a two-phase identification method that filters spam answers in advance and uses a deep learning method [recurrent convolutional neural network (R-CNN)] to automatically classify various types of questions. Logistic regression (LR) is further applied to examine which answer quality features significantly indicate high-quality answers to different types of questions.

Findings

There are four prominent findings. (1) This study confirms that conducting spam filtering before an answer quality analysis can reduce the proportion of high-quality answers that are misjudged as spam answers. (2) The experimental results show that answer quality is better when question types are included. (3) The analysis results for different classifiers show that the R-CNN achieves the best macro-F1 scores (74.8%) in the question type classification module. (4) Finally, the experimental results by LR show that author ranking, answer length and common words could significantly impact answer quality for different types of questions.
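For readers unfamiliar with the macro-F1 score reported in finding (3), the following is a minimal sketch of how it is computed: the F1 score is calculated per class and then averaged without weighting by class size. The question-type labels and predictions below are invented for illustration and have no connection to the study's data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical question-type labels, for illustration only.
y_true = ["fact", "fact", "opinion", "opinion", "howto", "howto"]
y_pred = ["fact", "opinion", "opinion", "opinion", "howto", "fact"]
score = macro_f1(y_true, y_pred)
```

Here the per-class F1 scores are 0.5 ("fact"), 0.8 ("opinion") and 2/3 ("howto"), so the macro average is roughly 0.656. Because every class counts equally, a classifier cannot reach a high macro-F1 by doing well only on the most frequent question type.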

Originality/value

The proposed system is simultaneously able to detect spam answers and provide users with quick and efficient retrieval mechanisms for high-quality answers to different types of questions in CQA. Moreover, this study further validates that crucial features exist among the different types of questions that can impact answer quality. Overall, an identification system automatically summarises high-quality answers for each type of question from the pool of messy answers in CQA, which can be very useful in helping users make decisions.

Details

The Electronic Library, vol. 38 no. 5/6
Type: Research Article
ISSN: 0264-0473

Article
Publication date: 20 November 2009

Maria Soledad Pera and Yiu‐Kai Ng

Abstract

Purpose

The web provides its users with abundant information. Unfortunately, when a web search is performed, both users and search engines must deal with an annoying problem: the presence of spam documents that are ranked among legitimate ones. The mixed results downgrade the performance of search engines and frustrate users who are required to filter out useless information. To improve the quality of web searches, the number of spam documents on the web must be reduced, if they cannot be eradicated entirely. This paper aims to present a novel approach for identifying spam web documents, which have mismatched titles and bodies and/or low percentage of hidden content in markup data structure.

Design/methodology/approach

The paper shows that by considering the degree of similarity among the words in the title and body of a web document D, which is computed by using their word‐correlation factors; using the percentage of hidden content in the markup data structure within D; and/or considering the bigram or trigram phrase‐similarity values of D, it is possible to determine whether D is spam with high accuracy.
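The abstract does not give the formula for title and body similarity, so the following is only a rough sketch of the general idea, assuming a simple scoring rule: each title word is matched to its most strongly correlated body word and the results are averaged. The function names, the scoring rule, and the tiny correlation table are hypothetical; the paper's pre-computed word-correlation factors are not reproduced here.

```python
def correlation(w1, w2, table):
    """Look up a symmetric word-correlation factor (1.0 for identical words)."""
    if w1 == w2:
        return 1.0
    return table.get(frozenset((w1, w2)), 0.0)


def title_body_similarity(title, body, table):
    """Average, over title words, of the best correlation with any body word."""
    title_words = title.lower().split()
    body_words = set(body.lower().split())
    if not title_words or not body_words:
        return 0.0
    best = [
        max(correlation(t, b, table) for b in body_words)
        for t in title_words
    ]
    return sum(best) / len(best)


# Hypothetical pre-computed factors; real values would come from a corpus.
TABLE = {frozenset(("cheap", "discount")): 0.6}

matched = title_body_similarity("cheap flights", "discount flights to rome", TABLE)
mismatched = title_body_similarity("cheap flights", "win casino bonus now", TABLE)
```

With this table, the matching page scores 0.8 and the mismatched page scores 0.0. A detector along these lines would flag documents whose score falls below some threshold, which would have to be tuned on labelled data.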

Findings

By considering the content and markup of web documents, this paper develops a spam‐detection tool that is: reliable, since we can accurately detect 84.5 percent of spam/legitimate web documents; and computationally inexpensive, since the word‐correlation factors used for content analysis are pre‐computed.

Research limitations/implications

Since the bigram‐correlation values employed in the spam‐detection approach are computed by using the unigram‐correlation factors, the approach incurs additional computational time during the spam‐detection process and could generate a higher number of misclassified spam web documents.

Originality/value

The paper verifies that the spam‐detection approach outperforms existing anti‐spam methods by at least 3 percent in terms of F‐measure.

Details

International Journal of Web Information Systems, vol. 5 no. 4
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 5 October 2012

Zac Sadan and David G. Schwartz

Abstract

Purpose

IP reputation systems, which filter e‐mail based on the sender's IP address, are located at the perimeter – before the messages reach the mail server's anti‐spam filters. To increase IP reputation system efficacy and overcome the shortcomings of individual IP‐based filtering, recent studies have suggested exploiting the properties of IP clusters, such as those of Autonomous Systems (AS). Cluster‐based techniques can enhance accuracy and reduce false negative rates. However, clusters generally contain enormous numbers of IP addresses, which hinders cluster‐based systems from reaching their full spam filtering potential. The purpose of this paper is to exploit social network metrics to obtain a more granular, i.e. sub‐divided, view of cluster‐based reputation, and thus enhance spam filtering accuracy.

Design/methodology/approach

The authors examined the performance of various social network metrics, including nodal degree, betweenness centrality, closeness centrality and valued graphs, to find an optimal element that enhances IP reputation prediction in AS clusters.

Findings

It was found that all measures contributed to prediction, yet the best predictor of spam reputation was the out‐degree metric, which showed a strong positive correlation with spam reputation prediction. This implies that more granular information can increase the accuracy of IP reputation prediction in AS clusters.
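As a minimal sketch of the out‐degree metric named above, the following counts the distinct destinations each node sends to in a directed graph. The abstract does not say how the authors built their graph, so the edge semantics here (sending IP address to receiving mail server) and the sample addresses, which are drawn from reserved documentation ranges, are assumptions for illustration only.

```python
from collections import defaultdict


def out_degree(edges):
    """Return the number of distinct targets for every node in the graph."""
    targets = defaultdict(set)
    for src, dst in edges:
        targets[src].add(dst)
        # Make sure pure receivers also appear, with out-degree 0.
        targets.setdefault(dst, set())
    return {node: len(dsts) for node, dsts in targets.items()}


# Hypothetical (sender IP, receiving server) pairs.
edges = [
    ("192.0.2.1", "198.51.100.5"),
    ("192.0.2.1", "198.51.100.6"),
    ("192.0.2.1", "203.0.113.9"),
    ("192.0.2.2", "198.51.100.5"),
    ("192.0.2.1", "198.51.100.5"),  # repeated edge, counted once
]
degrees = out_degree(edges)
```

In this toy graph, 192.0.2.1 has an out‐degree of 3 and 192.0.2.2 has an out‐degree of 1. A per‐address metric of this kind is one way to sub‐divide a large cluster, since addresses inside the same Autonomous System can behave very differently.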

Practical implications

Used in conjunction with other technologies, the granular cluster‐based reputation system can be a valuable addition to commercial and open‐source spam filtering systems, or to standalone DNS‐based blacklists.

Originality/value

The authors' approach can promote mitigation of larger spam volumes at the perimeter, save bandwidth, and conserve valuable system resources.

Article
Publication date: 2 October 2007

Christine Sund

Abstract

Purpose

The purpose of this paper is to show that the full potential of the internet has not yet been realised. One of the key reasons for this is users' declining trust in the internet. Over the past two decades, the internet has transformed many aspects of modern life. With an estimated four million users worldwide at the end of 2006, the use of the internet continues to grow. Building trust and confidence is one of the main enablers for the future growth and use of the internet. The paper aims to review some of the reasons behind the declining trust, the changing nature of cyber‐threats, and aims to look at cybersecurity in the context of developing countries and the specific problems these countries are facing when dealing with a growing number of cyber‐threats.

Design/methodology/approach

This contribution gives an overview of some of the evolving cyber‐threats and their potential impact in order to determine whether the growth of the information society is really at risk. It further considers what the different stakeholders can do to build a safer and more secure information society. The paper poses questions, outlines possible options for a way forward and based on this gives the readers a better understanding of the issues and challenges involved in building confidence and security in the use of ICTs. The paper proposes a framework with increased co‐operation, collaboration, and information sharing, to connect the individual cybersecurity communities and single initiatives, in order to allow stakeholders to build together a roadmap for cybersecurity.

Findings

During the discussions leading up to and during the two phases of the World Summit on the Information Society, country representative participants re‐affirmed their commitment to deal effectively with the significant and growing problems posed by spam and other cyber‐threats. As no single country or entity can alone create trust, confidence and security in the use of ICTs, it is clear that increased international action is needed to address the issues involved.

Practical implications

This paper tries to provide readers with a simple overview of the state of cybersecurity, and with a framework for further considering how new technologies and the growing use of the internet will impact upon stakeholders' trust in the use of ICTs.

Originality/value

Along with increasing dependency on ICTs, new threats to network and information security have emerged. These include growing misuse of electronic networks for criminal purposes or for objectives that can furthermore adversely affect the integrity of critical infrastructures within states. This paper puts forward some concrete suggestions on how countries could look at the issues related to cybersecurity.

Details

Online Information Review, vol. 31 no. 5
Type: Research Article
ISSN: 1468-4527
