Abstract
Purpose
The purpose of this research is to identify and describe the impact of comment spam in library blogs. Three research questions guided the study: the current level of commenting in library blogs; librarians' perceptions of comment spam; and the techniques used to address the comment spam problem.
Design/methodology/approach
A quantitative approach is used to investigate the research questions. Informal interviews were first conducted with blog administrators at four academic and three public libraries with active blogs to develop a better understanding of the problem and to inform the design of an appropriate data collection instrument. Based on the feedback received from these blog administrators, a survey questionnaire was developed and then distributed online via direct e‐mailing and mailing lists. A total of 108 responses were received.
Findings
Regardless of the library type with which the blogs were affiliated and the size of the community they served, user participation in library blogs was very limited in terms of comments left. Over 80 percent of libraries reported receiving five or fewer comments in a given week. Comment spam was not perceived to be a major problem by blog administrators. Detection‐based techniques were the most commonly used approaches to combat comment spam in library blogs.
Research limitations/implications
The research focuses on the comment spam problem in blogs affiliated with libraries where the library is responsible for content published on the blog. The comment spam problem is investigated from the library blog administrator's perspective.
Practical implications
Results of this study provide empirical evidence regarding level of commenting and the impact of comment spam in library blogs. The results and findings of the study can offer guidance to libraries that are reconsidering whether to allow commenting in their blogs and to those that are planning to establish a blog to reach out to their users, while keeping this online environment engaging and interactive.
Originality/value
The study provides empirical evidence that level of commenting is very limited, comment spam is not regarded as an important problem, and it does not interfere with the communication process in library blogs.
Abstract
Spam email classification using data mining and machine learning approaches has attracted researchers' attention due to its obvious positive impact in protecting internet users. Several features can be used for creating data mining and machine learning based spam classification models. Yet spammers know that the longer they use the same set of features for tricking email users, the more likely it is that anti-spam parties will develop tools for combating this kind of annoying email message. Spammers therefore adapt by continuously reforming the set of features used to compose spam emails. For that reason, even though traditional classification methods achieve sound classification results, they are ineffective for lifelong classification of spam emails because they are prone to the so-called “Concept Drift”. In the current study, an enhanced model is proposed to ensure lifelong spam classification. For evaluation purposes, the overall performance of the suggested model is contrasted against various other stream mining classification techniques. The results prove the success of the suggested model as a lifelong spam email classification method.
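The lifelong (stream) learning the abstract contrasts with one-shot training can be illustrated with a minimal incrementally updated Naive Bayes classifier: every newly labelled message updates the counts, so the model can follow a drifting spam vocabulary. This is a sketch of the general stream-learning idea only, not the paper's actual model.

```python
from collections import defaultdict
import math

class IncrementalNB:
    """Incrementally updated multinomial Naive Bayes: each labelled
    message folds into the counts, so the model keeps adapting as
    spammers change their feature set (concept drift)."""

    def __init__(self):
        self.class_counts = defaultdict(int)                      # label -> #messages
        self.word_counts = defaultdict(lambda: defaultdict(int))  # label -> word -> count
        self.vocab = set()

    def update(self, words, label):
        """Fold one labelled message into the model."""
        self.class_counts[label] += 1
        for w in words:
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def predict(self, words):
        """Return the most probable label under Laplace smoothing."""
        total = sum(self.class_counts.values())
        best_label, best_lp = None, float("-inf")
        for label, count in self.class_counts.items():
            lp = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best_label, best_lp = label, lp
        return best_label
```

Because `update` can be called after every newly labelled message, training never stops, which is the essence of the lifelong setting the abstract describes.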
Tanzila Ahmed and Charles Oppenheim
Abstract
Purpose
The purpose of this research is to show how spam is generated and what methods have been proposed to combat it.
Design/methodology/approach
An experiment was conducted whereby a number of e‐mail accounts using different ISPs were set up and then checked for spam over a period of nine weeks. The results were compared to two pre‐existing e‐mail accounts. The types of spam received were classified into broad headings.
Findings
Financial spam was the biggest single type of spam received, with health‐related spam second. The growth in spam over time was noted, as were the volumes of spam received by different Internet Service Providers. The effects of using “obvious” names versus unusual ones in the e‐mail address were measured, as were those of using spam‐filtering software. In the former case, no significant differences were found, but filtering software certainly helped to reduce the volume of spam received. Active involvement in a pornographic site did not, surprisingly, greatly influence the amount of spam received. The biggest single factors affecting the volume of spam received are the length of time the e‐mail account has been active and the use, or non‐use, of filtering software. It is by no means certain that responding to spam increases the volume of spam received.
Research limitations/implications
The research was conducted over a relatively short time period, and only a small number of accounts were examined.
Practical implications
Methods of combating spam and some urban myths about it are examined.
Originality/value
To those tasked with dealing with spam, the paper provides some ideas on the scale of the problem and how to address it.
Hu Xia, Yan Fu, Junlin Zhou and Qi Xia
Abstract
Purpose
The purpose of this paper is to provide an intelligent spam filtering method to meet the real‐time processing requirement of the massive short message stream and reduce manual operation of the system.
Design/methodology/approach
An integrated framework based on a series of algorithms is proposed. The framework consists of a message filtering module, a log analysis module and a rules handling module, and dynamically filters short message spam while generating the filtering rules. The framework is implemented in Java and evaluated experimentally.
Findings
The experiments are carried out both on a simulation model (off‐line) and on the actual plant (on‐line). The experimental data comprise both normal and spam real short messages. The results show that use of the integrated framework achieves comparable accuracy and meets the real‐time filtering requirement.
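The interplay of the filtering and rule-generation modules might be sketched as follows; the token-promotion heuristic and the `min_count` threshold are illustrative assumptions, not the paper's algorithms.

```python
import re
from collections import Counter

class MessageFilter:
    """Sketch of a rules-driven short-message filter: the filtering
    module applies the current rules, while a log-analysis step
    promotes frequently seen spam tokens into new keyword rules."""

    def __init__(self):
        self.rules = []  # compiled regex patterns

    def add_rule(self, pattern):
        self.rules.append(re.compile(pattern, re.IGNORECASE))

    def is_spam(self, message):
        return any(r.search(message) for r in self.rules)

    def analyse_log(self, flagged_messages, min_count=3):
        """Hypothetical log analysis: any token appearing in at least
        min_count flagged messages becomes a new keyword rule."""
        counts = Counter(t for m in flagged_messages
                         for t in set(m.lower().split()))
        for token, c in counts.items():
            if c >= min_count and not self.is_spam(token):
                self.add_rule(re.escape(token))
```

Keeping rule generation out of the per-message filtering path is one way such a system can stay fast enough for a massive message stream while still reducing manual rule maintenance.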
Originality/value
The approach taken in the design of the filtering system is novel. In addition, implementation of the proposed integrated framework allows the method not only to reduce computational cost, leading to high processing speed, but also to filter spam messages with high accuracy.
Janet Durgin and Joseph S. Sherif
Abstract
Purpose
This paper aims to advance research that accurately portrays the alarming rate at which spam is infiltrating and eroding the security of the internet.
Design/methodology/approach
The paper discusses the political, legal and ethical controversy surrounding the spam dilemma as well as the high costs of spam to telecommunications bandwidth, QoS and e‐commerce effectiveness.
Findings
The spam problem is a technological epidemic that multiplies exponentially each day. A dynamic digital jam is in prospect.
Practical implications
Presents viable options for a quick resolution, and unveils the changing strategies that integrity‐driven marketers are adopting in light of the raging battle.
Originality/value
Tackles one of the most pressing issues in the business world today.
Bundit Manaskasemsak and Arnon Rungsawang
Abstract
Purpose
This paper aims to present a machine learning approach for solving the problem of Web spam detection. Based on an adoption of the ant colony optimization (ACO), three algorithms are proposed to construct rule-based classifiers to distinguish between non-spam and spam hosts. Moreover, the paper also proposes an adaptive learning technique to enhance the spam detection performance.
Design/methodology/approach
The Trust-ACO algorithm is designed to let an ant start from a non-spam seed and then walk through paths in the host graph. Trails (i.e. trust paths) discovered by ants are then interpreted and compiled into non-spam classification rules. Similarly, the Distrust-ACO algorithm is designed to generate spam classification rules. The last algorithm, Combine-ACO, accumulates the rules produced by the former two. Moreover, an adaptive learning technique is introduced that lets ants walk with longer (or shorter) steps, rewarding them when they find desirable paths and penalizing them otherwise.
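A drastically reduced sketch of the trust-propagation idea behind Trust-ACO: ants random-walk from known non-spam seeds, biased by pheromone, and hosts on frequently walked trails accumulate trust. The function name, parameters and update rule are illustrative guesses, not the paper's algorithm, which additionally compiles the discovered trails into classification rules.

```python
import random

def trust_aco(graph, seeds, n_ants=100, steps=3, seed=0):
    """Ants start at non-spam seed hosts and random-walk the host
    graph (dict: host -> list of linked hosts), choosing the next hop
    in proportion to pheromone. Visited hosts gain trust; returns the
    pheromone (trust) map."""
    rng = random.Random(seed)
    pheromone = {host: 1.0 for host in graph}
    for _ in range(n_ants):
        node = rng.choice(seeds)
        for _ in range(steps):
            neighbours = graph.get(node, [])
            if not neighbours:
                break
            weights = [pheromone[n] for n in neighbours]
            node = rng.choices(neighbours, weights=weights)[0]
            pheromone[node] += 0.1  # reward the visited host
    return pheromone
```

Hosts whose trust stays near the initial value after many ants would be candidates for the spam side, mirroring how Distrust-ACO works from the opposite direction.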
Findings
Experiments are conducted on the two publicly available WEBSPAM-UK2006 and WEBSPAM-UK2007 datasets. The results show that the proposed algorithms outperform well-known rule-based classification baselines. In particular, the proposed adaptive learning technique helps improve the AUC scores to 0.899 and 0.784 on the former and the latter datasets, respectively.
Originality/value
To the best of our knowledge, this is the first comprehensive study that adopts the ACO learning approach to solve the problem of Web spam detection. In addition, we have improved the traditional ACO by using the adaptive learning technique.
Hei Chia Wang, Yu Hung Chiang and Si Ting Lin
Abstract
Purpose
In community question and answer (CQA) services, because of user subjectivity and the limits of knowledge, the distribution of answer quality can vary drastically – from highly related to irrelevant or even spam answers. Previous studies of CQA portals have faced two important issues: answer quality analysis and spam answer filtering. Therefore, the purposes of this study are to filter spam answers in advance using two-phase identification methods and then automatically classify the different types of question and answer (QA) pairs by deep learning. Finally, this study proposes a comprehensive study of answer quality prediction for different types of QA pairs.
Design/methodology/approach
This study proposes an integrated model with a two-phase identification method that filters spam answers in advance and uses a deep learning method [recurrent convolutional neural network (R-CNN)] to automatically classify various types of questions. Logistic regression (LR) is further applied to examine which answer quality features significantly indicate high-quality answers to different types of questions.
Findings
There are four prominent findings. (1) This study confirms that conducting spam filtering before an answer quality analysis can reduce the proportion of high-quality answers that are misjudged as spam answers. (2) The experimental results show that answer quality is better when question types are included. (3) The analysis results for different classifiers show that the R-CNN achieves the best macro-F1 scores (74.8%) in the question type classification module. (4) Finally, the experimental results by LR show that author ranking, answer length and common words could significantly impact answer quality for different types of questions.
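The final LR step can be illustrated with a minimal from-scratch logistic regression over the three features the findings highlight (author ranking, answer length, common words). The feature layout and toy data below are assumptions for illustration, not the study's dataset.

```python
import math

def train_logreg(X, y, lr=0.1, epochs=2000):
    """Plain stochastic-gradient logistic regression; w[0] is the bias."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the log-loss
            w[0] -= lr * g
            for j, xj in enumerate(xi):
                w[j + 1] -= lr * g * xj
    return w

def predict(w, x):
    """Predicted probability that an answer is high quality."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical feature vectors: [author_rank, answer_length, common_words]
X = [[0.9, 0.2, 0.1], [0.8, 0.5, 0.3], [0.1, 0.4, 0.2], [0.2, 0.1, 0.4]]
y = [1, 1, 0, 0]  # 1 = high-quality answer
w = train_logreg(X, y)
```

Inspecting the learned weights (here, a strongly positive weight on the first feature) is the same kind of analysis the study uses to ask which features significantly indicate high-quality answers.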
Originality/value
The proposed system is simultaneously able to detect spam answers and provide users with quick and efficient retrieval mechanisms for high-quality answers to different types of questions in CQA. Moreover, this study further validates that crucial features exist among the different types of questions that can impact answer quality. Overall, the identification system automatically summarises high-quality answers for each type of question from the pool of messy answers in CQA, which can be very useful in helping users make decisions.
Maria Soledad Pera and Yiu‐Kai Ng
Abstract
Purpose
The web provides its users with abundant information. Unfortunately, when a web search is performed, both users and search engines must deal with an annoying problem: the presence of spam documents that are ranked among legitimate ones. The mixed results downgrade the performance of search engines and frustrate users, who are required to filter out useless information. To improve the quality of web searches, the number of spam documents on the web must be reduced, if they cannot be eradicated entirely. This paper aims to present a novel approach for identifying spam web documents, which have mismatched titles and bodies and/or a low percentage of hidden content in the markup data structure.
Design/methodology/approach
The paper shows that by considering the degree of similarity among the words in the title and body of a web document D, which is computed by using their word‐correlation factors; using the percentage of hidden content in the markup data structure within D; and/or considering the bigram or trigram phrase‐similarity values of D, it is possible to determine whether D is spam with high accuracy.
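A crude stand-in for the title/body mismatch test can use plain cosine similarity over word counts instead of the paper's pre-computed word-correlation factors; the threshold below is an illustrative guess, not the paper's value.

```python
import math
from collections import Counter

def cosine_sim(a_words, b_words):
    """Cosine similarity between two bags of words."""
    a, b = Counter(a_words), Counter(b_words)
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def looks_like_spam(title, body, threshold=0.15):
    """Flag a page whose title shares almost no vocabulary with its
    body; exact word match here stands in for the word-correlation
    factors, which also credit similar (non-identical) words."""
    return cosine_sim(title.lower().split(), body.lower().split()) < threshold
```

The word-correlation factors the paper uses generalise this exact-match similarity by also crediting related words, which is why they can be pre-computed once and reused cheaply at detection time.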
Findings
By considering the content and markup of web documents, this paper develops a spam‐detection tool that is: reliable, since we can accurately detect 84.5 percent of spam/legitimate web documents; and computationally inexpensive, since the word‐correlation factors used for content analysis are pre‐computed.
Research limitations/implications
Since the bigram‐correlation values employed in the spam‐detection approach are computed by using the unigram‐correlation factors, additional computational time is imposed during the spam‐detection process, which could generate a higher number of misclassified spam web documents.
Originality/value
The paper verifies that the spam‐detection approach outperforms existing anti‐spam methods by at least 3 percent in terms of F‐measure.
Zac Sadan and David G. Schwartz
Abstract
Purpose
IP reputation systems, which filter e‐mail based on the sender's IP address, are located at the perimeter – before the messages reach the mail server's anti‐spam filters. To increase IP reputation system efficacy and overcome the shortcomings of individual IP‐based filtering, recent studies have suggested exploiting the properties of IP clusters, such as those of Autonomous Systems (AS). Cluster‐based techniques can enhance accuracy and reduce false negative rates. However, clusters generally contain enormous amounts of IP addresses, which hinder cluster‐based systems from reaching their full spam filtering potential. The purpose of this paper is to exploit social network metrics to obtain a more granular, i.e. sub‐divided, view of cluster‐based reputation, and thus enhance spam filtering accuracy.
Design/methodology/approach
The authors examined the performance of various social network metrics, including nodal degree, betweenness centrality, closeness centrality and valued graphs, to find an optimal element that enhances IP reputation prediction in AS clusters.
Findings
It was found that all measures contributed to prediction, yet the best predictor of spam reputation was the out‐degree metric, which showed a strong positive correlation with spam reputation prediction. This implies that more granular information can increase the accuracy of IP reputation prediction in AS clusters.
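Computing the out-degree metric over a directed AS-level graph given as an edge list is straightforward; the node names below are hypothetical, and ranking by out-degree is only the first step of the reputation prediction the paper describes.

```python
from collections import Counter

def out_degrees(edges):
    """Out-degree of each node in a directed graph given as (src, dst) pairs."""
    return Counter(src for src, _ in edges)

def rank_by_out_degree(edges):
    """Nodes sorted by descending out-degree, the metric the study
    found most predictive of spam reputation within an AS cluster."""
    return [node for node, _ in out_degrees(edges).most_common()]
```

Because out-degree correlated positively with spam reputation, the highest-ranked nodes in a cluster are the natural candidates for closer scrutiny, giving the sub-divided view of the cluster the paper aims for.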
Practical implications
Used in conjunction with other technologies, the granular cluster‐based reputation system can be a valuable addition to commercial and open‐source spam filtering systems, or to standalone DNS‐based blacklists.
Originality/value
The authors' approach can promote mitigation of larger spam volumes at the perimeter, save bandwidth, and conserve valuable system resources.
Abstract
Purpose
The purpose of this paper is to show that the full potential of the internet has not yet been realised. One of the key reasons for this is users' declining trust in the internet. Over the past two decades, the internet has transformed many aspects of modern life. With an estimated four million users worldwide at the end of 2006, the use of the internet continues to grow. Building trust and confidence is one of the main enablers for the future growth and use of the internet. The paper reviews some of the reasons behind the declining trust and the changing nature of cyber‐threats, and looks at cybersecurity in the context of developing countries and the specific problems these countries face when dealing with a growing number of cyber‐threats.
Design/methodology/approach
This contribution gives an overview of some of the evolving cyber‐threats and their potential impact in order to determine whether the growth of the information society is really at risk. It further considers what the different stakeholders can do to build a safer and more secure information society. The paper poses questions, outlines possible options for a way forward and, based on this, gives readers a better understanding of the issues and challenges involved in building confidence and security in the use of ICTs. The paper proposes a framework with increased co‐operation, collaboration and information sharing to connect the individual cybersecurity communities and single initiatives, so that stakeholders can build together a roadmap for cybersecurity.
Findings
During the discussions leading up to and during the two phases of the World Summit on the Information Society, participating country representatives re‐affirmed their commitment to deal effectively with the significant and growing problems posed by spam and other cyber‐threats. As no single country or entity can alone create trust, confidence and security in the use of ICTs, it is clear that increased international action is needed to address the issues involved.
Practical implications
This paper tries to provide readers with a simple overview of the state of cybersecurity, and with a framework for further considering how new technologies and the growing use of the internet will impact upon stakeholders' trust in the use of ICTs.
Originality/value
Along with increasing dependency on ICTs, new threats to network and information security have emerged. These include growing misuse of electronic networks for criminal purposes or for objectives that can furthermore adversely affect the integrity of critical infrastructures within states. This paper puts forward some concrete suggestions on how countries could look at the issues related to cybersecurity.