Search results

1 – 10 of over 1000
Article
Publication date: 3 August 2021

Irvin Dongo, Yudith Cardinale, Ana Aguilera, Fabiola Martinez, Yuni Quintero, German Robayo and David Cabeza

Abstract

Purpose

This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both data-extraction alternatives and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.

Design/methodology/approach

As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either Twitter API or Web scraping to extract the data to perform the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings

The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights further problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.

Originality/value

Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084
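The finding that API-extracted and scraped copies of a tweet only produce identical credibility values after robust text normalization can be illustrated with a minimal sketch. The abstract does not specify the pipeline; the steps below (entity decoding, URL stripping, case folding, whitespace collapsing) are common assumptions, not the authors' method:

```python
import html
import re

def normalize_tweet(text: str) -> str:
    """Normalize a tweet so API- and scraper-extracted copies compare equal."""
    text = html.unescape(text)                # decode entities: "&amp;" -> "&"
    text = re.sub(r"https?://\S+", "", text)  # drop shortened links
    text = text.lower()                       # case-fold
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace

# The same tweet as returned by the API (entity-encoded, with a t.co link)
# and as rendered on the scraped page:
api_copy = "Breaking: rain &amp; wind expected  https://t.co/abc123"
scraped_copy = "breaking: rain & wind expected"
assert normalize_tweet(api_copy) == normalize_tweet(scraped_copy)
```

With both copies reduced to the same canonical string, a credibility model scoring the text would yield the same value regardless of the extraction route.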

Book part
Publication date: 15 March 2021

Reto Hofstetter

Abstract

Every second, vast amounts of data are generated and stored on the Internet. Data scraping makes these data accessible and usable for business and scientific purposes. Web-scraped data are of high value to businesses as they can be used to inform many strategic decisions such as pricing or market positioning. Although it is not difficult to scrape data, particularly when they come from public websites, there are six key steps that analysts should ideally consider and follow. Following these steps can help to better harness the business value of online data.
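The abstract does not enumerate the six steps the chapter proposes, but the core mechanics of scraping a public page can be sketched with the Python standard library alone. The markup and `class="price"` attribute here are illustrative stand-ins; a real page would first be fetched over HTTP:

```python
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text of elements tagged with class="price" (assumed markup)."""
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if ("class", "price") in attrs:
            self.in_price = True

    def handle_data(self, data):
        if self.in_price:
            # Clean the raw string ("$19.99") into a usable number.
            self.prices.append(float(data.strip().lstrip("$")))
            self.in_price = False

# A static snippet stands in for a fetched page:
page = ('<ul><li><span class="price">$19.99</span></li>'
        '<li><span class="price">$4.50</span></li></ul>')
parser = PriceParser()
parser.feed(page)
assert parser.prices == [19.99, 4.5]
```

The parse-then-clean pattern is what makes scraped data "accessible and usable": the raw markup is only an intermediate form, and the business value lies in the structured numbers extracted from it.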

Article
Publication date: 14 May 2019

Teresa Scassa

Abstract

Purpose

The purpose of this paper is to examine how claims to “ownership” are asserted over publicly accessible platform data and critically assess the nature and scope of rights to reuse these data.

Design/methodology/approach

Using Airbnb as a case study, this paper examines the data ecosystem that arises around publicly accessible platform data. It analyzes current statute and case law in order to understand the state of the law around the scraping of such data.

Findings

This paper demonstrates that there is considerable uncertainty about the practice of data scraping, and that there are risks in allowing the law to evolve in the context of battles between business competitors without a consideration of the broader public interest in data scraping. It argues for a data ecosystem approach that can keep the public dimension issues more squarely within the frame when data scraping is judicially considered.

Practical implications

The nature of some sharing economy platforms requires that a large subset of their data be publicly accessible. These data can be used to understand how platform companies operate, to assess their compliance with laws and regulations and to evaluate their social and economic impacts. They can also be used in different kinds of data analytics. Such data are therefore sought after by civil society organizations, researchers, entrepreneurs and regulators. This paper considers who has a right to control access to and use of these data, addresses current uncertainties in how the law will apply to scraping activities, and builds an argument for considering the public interest in data scraping.

Originality/value

The issue of ownership/control over publicly accessible information is of growing importance; this paper offers a framework for approaching these legal questions.

Details

Online Information Review, vol. 43 no. 6
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 12 November 2019

Judith Hillen

Abstract

Purpose

The purpose of this paper is to discuss web scraping as a method for extracting large amounts of data from online sources. The author wants to raise awareness of the method’s potential in the field of food price research, hoping to enable fellow researchers to apply this method.

Design/methodology/approach

The author explains the technical procedure of web scraping, reviews the existing literature, and identifies areas of application and limitations for food price research.

Findings

The author finds that web scraping is a promising method to collect customised, high-frequency data in real time, overcoming several limitations of currently used food price data sources. With today’s applications mostly focussing on (online) consumer prices, the scope of applications for web scraping broadens as more and more price data are published online.

Research limitations/implications

To better deal with the technical and legal challenges of web scraping and to exploit its scalability, joint data collection projects in the field of agricultural and food economics should be considered.

Originality/value

In agricultural and food economics, web scraping as a data collection technique has received little attention. This is one of the first articles to address this topic with particular focus on food price analysis.

Details

British Food Journal, vol. 121 no. 12
Type: Research Article
ISSN: 0007-070X
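The customised, high-frequency collection the paper advocates amounts to repeatedly scraping prices and appending timestamped observations to a growing panel. A minimal sketch, assuming a stub of one day's scrape (real observations would come from retailer websites):

```python
import csv
import io
from datetime import date

def record_prices(observed: dict, on: date, sink) -> None:
    """Append one timestamped CSV row per product.

    `observed` maps product name to price; running this once per day
    builds the high-frequency food-price panel the method describes.
    """
    writer = csv.writer(sink)
    for product, price in observed.items():
        writer.writerow([on.isoformat(), product, f"{price:.2f}"])

buf = io.StringIO()  # stands in for an append-mode file
record_prices({"milk 1l": 1.09, "bread 500g": 1.85}, date(2019, 11, 12), buf)
print(buf.getvalue())
```

Scheduling such a job daily (e.g. via cron) yields real-time data at a frequency that conventional survey-based food price sources cannot match.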

Content available
Article
Publication date: 10 August 2021

Krystian Jaworski

Abstract

Purpose

The purpose of this study is to develop novel ways to monitor an economy in real time during the COVID-19 pandemic. A fully automated framework is proposed for collecting and analyzing online food prices in Poland. This is important, as the COVID-19 outbreak in Europe in 2020 led many governments to impose lockdowns that prevented manual price data collection from food outlets. The study primarily addresses whether food price inflation can be accurately measured during the pandemic using only a laptop and an Internet connection, without needing to rely on official statistics.

Design/methodology/approach

The big data approach was adopted to track food price inflation in Poland. Using the web-scraping technique, daily price information about individual food and non-alcoholic beverage products sold in online stores was gathered.

Findings

Based on raw online data, reliable estimates of monthly and annual food inflation were provided about 30 days before final official indexes were published.

Originality/value

This is the first paper to focus on measuring inflation in real time during the COVID-19 pandemic. Monthly and annual food price inflation are estimated in real time and updated daily, thereby improving on previous forecasting solutions that rely on weekly or monthly indicators. Using daily-frequency price data deepens understanding of price developments and enables more timely detection of inflation trends, both of which are useful for policymakers and market participants. This study also provides a review of crucial issues regarding inflation that emerged during the COVID-19 pandemic.

Details

British Food Journal, vol. 123 no. 13
Type: Research Article
ISSN: 0007-070X
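The abstract does not state which index formula the framework uses; studies of scraped online prices often aggregate matched products with a Jevons (geometric-mean) index, which can serve as an illustrative sketch of turning raw daily prices into an inflation estimate:

```python
from math import exp, log

def jevons_index(base: dict, current: dict) -> float:
    """Geometric mean of price relatives over products priced in both periods."""
    common = base.keys() & current.keys()
    return exp(sum(log(current[p] / base[p]) for p in common) / len(common))

# Stub observations for three matched products (not the paper's data):
base_prices = {"milk": 2.00, "bread": 3.00, "eggs": 5.00}
todays_prices = {"milk": 2.20, "bread": 3.30, "eggs": 5.50}  # each up 10%

inflation_pct = (jevons_index(base_prices, todays_prices) - 1) * 100
assert round(inflation_pct, 6) == 10.0
```

Restricting the computation to products present in both periods handles the product churn that scraped store assortments exhibit; recomputing the index each day against a fixed base period yields the daily-updated inflation track the paper describes.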

Content available
Book part
Publication date: 15 March 2021

Abstract

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6

Article
Publication date: 4 April 2016

Alain Yee Loong Chong, Boying Li, Eric W.T. Ngai, Eugene Ch'ng and Filbert Lee

Downloads
7964

Abstract

Purpose

The purpose of this paper is to investigate if online reviews (e.g. valence and volume), online promotional strategies (e.g. free delivery and discounts) and sentiments from user reviews can help predict product sales.

Design/methodology/approach

The authors designed a big data architecture and deployed Node.js agents to scrape Amazon.com pages using asynchronous input/output calls. The completed web crawling and scraping data sets were then preprocessed for sentiment and neural network analysis. The neural network was employed to examine which variables in the study are important predictors of product sales.

Findings

This study found that although online reviews, online promotional strategies and online sentiments can all predict product sales, some variables are more important predictors than others. The authors found that the interplay effects of these variables are more important predictors than the individual variables themselves. For example, the interactions of online volume with sentiments and discounts are more important predictors than discounts, sentiments or online volume individually.

Originality/value

This study designed a big data architecture, in combination with sentiment and neural network analysis, that can facilitate future business research for predicting product sales in an online environment. This study also employed a predictive analytic approach (e.g. a neural network) to examine the variables; this approach is useful for future data analysis in a big data environment, where prediction can have more practical implications than significance testing. This study also examined the interplay between online reviews, sentiments and promotional strategies, which until now have mostly been examined individually in previous studies.

Details

International Journal of Operations & Production Management, vol. 36 no. 4
Type: Research Article
ISSN: 0144-3577
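The Node.js agents themselves are not published with the abstract, but the asynchronous-I/O idea they exploit — issuing many page fetches concurrently instead of waiting on each in turn — can be sketched in Python with asyncio. The URL list and fetcher below are stand-ins; a real crawler would make HTTP requests:

```python
import asyncio

async def fetch_page(url: str) -> str:
    """Stand-in for a non-blocking HTTP fetch of one product page."""
    await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
    return f"<html>review data for {url}</html>"

async def crawl(urls: list) -> list:
    # Launch every fetch concurrently; results come back in input order.
    # This mirrors the asynchronous input/output calls of the Node.js agents.
    return await asyncio.gather(*(fetch_page(u) for u in urls))

pages = asyncio.run(crawl(["item/1", "item/2", "item/3"]))
assert len(pages) == 3 and "item/2" in pages[1]
```

Because fetches overlap while each request waits on the network, throughput scales with the number of in-flight requests rather than with per-page latency, which is what makes large-scale review scraping practical.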

Article
Publication date: 15 August 2017

Lyn Robinson and David Bawden

Downloads
1227

Abstract

Purpose

The purpose of this paper is to describe a new approach to education for library/information students in data literacy – the principles and practice of data collection, manipulation and management – as a part of the Masters programmes in library and information science (CityLIS) at City, University of London.

Design/methodology/approach

The course takes a socio-technical approach, integrating, and giving equal importance to, technical and social/ethical aspects. Topics covered include: the relation between data, information and documents; representation of digital data; network technologies; information architecture; metadata; data structuring; search engines, databases and specialised retrieval tools; text and data mining, web scraping; data cleaning, manipulation, analysis and visualisation; coding; data metrics and analytics; artificial intelligence; data management and data curation; data literacy and data ethics; and constructing data narratives.

Findings

The course, which was well received by students in its first iteration, gives a basic grounding in data literacy, to be extended by further study, professional practice and lifelong learning.

Originality/value

This is one of the first accounts of an introductory course to equip all new entrants to the library/information professions with the understanding and skills to take on roles in data librarianship and data management.

Details

Library Management, vol. 38 no. 6/7
Type: Research Article
ISSN: 0143-5124

Article
Publication date: 14 August 2017

Wei Xu, Lingyu Liu and Wei Shang

Abstract

Purpose

Timely detection of emergency events and effective tracking of corresponding public opinions are critical in emergency management. As media are immediate sources of information on emergencies, the purpose of this paper is to propose cross-media analytics to detect and track emergency events and provide decision support for government and emergency management departments.

Design/methodology/approach

In this paper, a novel emergency event detection and opinion mining method is proposed for emergency management using cross-media analytics. In the proposed approach, an event detection module is constructed to discover emergency events based on cross-media analytics, and after the detected event is confirmed as an emergency event, an opinion mining module is used to analyze public sentiments and then generate public sentiment time series for early warning via a semantic expansion technique.

Findings

Empirical results indicate that a specific emergency can be detected and that public opinion can be tracked effectively and efficiently using cross-media analytics. In addition, the proposed system can be used for decision support and real-time response for government and emergency management departments.

Research limitations/implications

This paper takes full advantage of cross-media information and proposes novel emergency event detection and opinion mining methods for emergency management using cross-media analytics. The empirical analysis results illustrate the efficiency of the proposed method.

Practical implications

The proposed method can be applied for detection of emergency events and tracking of public opinions for emergency decision support and governmental real-time response.

Originality/value

This research work contributes to the design of a decision support system for emergency event detection and opinion mining. In the proposed approaches, emergency events are detected by leveraging cross-media analytics, and public sentiments are measured using an auto-expansion of the domain dictionary in the field of emergency management to eliminate the misclassification of the general dictionary and to make the quantization more accurate.

Details

Online Information Review, vol. 41 no. 4
Type: Research Article
ISSN: 1468-4527
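The auto-expanded domain dictionary described in the Originality/value section can be illustrated with a minimal lexicon-based sketch. All entries and the scoring rule below are hypothetical stand-ins for the paper's method; they show only why a general dictionary misclassifies domain text:

```python
# A small general sentiment lexicon and a domain expansion for
# emergency-related vocabulary (both illustrative, not the paper's).
GENERAL = {"good": 1, "calm": 1, "bad": -1, "panic": -1}
EMERGENCY_EXPANSION = {"evacuated": -1, "rescued": 1, "aftershock": -1}

def score(text: str, lexicon: dict) -> int:
    """Sum the polarities of known words; unknown words contribute zero."""
    return sum(lexicon.get(w, 0) for w in text.lower().split())

post = "residents evacuated after aftershock but many rescued"

# The general dictionary knows none of the domain terms, so the post
# looks neutral; the expanded dictionary captures its negative tone.
assert score(post, GENERAL) == 0
assert score(post, {**GENERAL, **EMERGENCY_EXPANSION}) == -1
```

Scoring a stream of such posts over time and plotting the running sum would yield the public sentiment time series the system uses for early warning.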

Article
Publication date: 18 May 2020

Petros Kostagiolas, Artur Strzelecki, Christina Banou and Charilaos Lavranos

Abstract

Purpose

The purpose of this paper is to discuss Google visibility of five large STM publishers (Elsevier, Emerald Publishing, Springer, Taylor & Francis and John Wiley & Sons) with the aim to focus on and investigate various upcoming current issues and challenges of the publishing industry regarding discoverability, promotion strategies, competition, information-seeking behavior and the impact of new information technologies on scholarly information.

Design/methodology/approach

The study is based on data retrieved through two commercial online tools that specialize in retrieving and saving data on a domain's visibility in search engines: SEMrush (“SEMrush – Online Visibility Management Platform”) and Ahrefs (“Ahrefs – SEO Tools & Resources To Grow Your Search Traffic”). All data gathering took place between April 15 and May 29, 2019.

Findings

The study exhibits the significance of Google visibility in the STM publishing industry taking into consideration current issues and challenges of the publishing activity.

Originality/value

Google visibility is a new trend of great significance in the publishing industry. This paper conducts research on the issue and offers a theoretical background for its study.

Details

Collection and Curation, vol. 40 no. 1
Type: Research Article
ISSN: 2514-9326
