Search results

1 – 10 of 743

Book part

Publication date: 15 March 2021

A Step-by-Step Guide for Data Scraping

Abstract

Every second, vast amounts of data are generated and stored on the Internet. Data scraping makes these data accessible and usable for business and scientific purposes. Web-scraped data are of high value to businesses as they can be used to inform many strategic decisions such as pricing or market positioning. Although it is not difficult to scrape data, particularly when they come from public websites, there are six key steps that analysts should ideally consider and follow. Following these steps can help to better harness the business value of online data.

Details

The Machine Age of Customer Insight

Type: Book

ISBN: 978-1-83909-697-6

Article

Publication date: 14 May 2019

Ownership and control over publicly accessible platform data

Teresa Scassa

Downloads

908

Abstract

Purpose

The purpose of this paper is to examine how claims to “ownership” are asserted over publicly accessible platform data and critically assess the nature and scope of rights to reuse these data.

Design/methodology/approach

Using Airbnb as a case study, this paper examines the data ecosystem that arises around publicly accessible platform data. It analyzes current statute and case law in order to understand the state of the law around the scraping of such data.

Findings

This paper demonstrates that there is considerable uncertainty about the practice of data scraping, and that there are risks in allowing the law to evolve in the context of battles between business competitors without a consideration of the broader public interest in data scraping. It argues for a data ecosystem approach that can keep the public dimension issues more squarely within the frame when data scraping is judicially considered.

Practical implications

The nature of some sharing economy platforms requires that a large subset of their data be publicly accessible. These data can be used to understand how platform companies operate, to assess their compliance with laws and regulations and to evaluate their social and economic impacts. They can also be used in different kinds of data analytics. Such data are therefore sought after by civil society organizations, researchers, entrepreneurs and regulators. This paper considers who has a right to control access to and use of these data, addresses current uncertainties in how the law will apply to scraping activities and argues that the public interest in data scraping should be taken into account.

Originality/value

The issue of ownership/control over publicly accessible information is of growing importance; this paper offers a framework for approaching these legal questions.

Details

Online Information Review, vol. 43 no. 6

Type: Research Article

ISSN: 1468-4527

Open Access

Article

Publication date: 10 August 2021

Measuring food inflation during the COVID-19 pandemic in real time using online data: a case study of Poland

Krystian Jaworski

Downloads

5961

Abstract

Purpose

The purpose of this paper is to develop novel ways to monitor an economy in real time during the COVID-19 pandemic. A fully automated framework is proposed for collecting and analyzing online food prices in Poland. This is important, as the COVID-19 outbreak in Europe in 2020 led many governments to impose lockdowns that prevented manual price data collection from food outlets. The study primarily addresses whether food price inflation can be accurately measured during the pandemic using only a laptop and an Internet connection, without needing to rely on official statistics.

Design/methodology/approach

A big data approach was adopted to track food price inflation in Poland. Using web scraping, daily price information about individual food and non-alcoholic beverage products sold in online stores was gathered.
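As an illustration only, the sketch below shows one way scraped daily prices could be turned into a month-over-month inflation estimate. The abstract does not state which index formula the study uses, so the unweighted geometric mean of price relatives (a Jevons-type index) is an assumption made here, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from math import exp, log


def monthly_average_prices(observations):
    """Average the daily prices of each product within each month.

    observations: iterable of (product_id, "YYYY-MM-DD", price) tuples.
    Returns a dict keyed by (product_id, "YYYY-MM").
    """
    sums = defaultdict(lambda: [0.0, 0])
    for product, day, price in observations:
        key = (product, day[:7])  # keep only the "YYYY-MM" part of the date
        sums[key][0] += price
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def jevons_inflation(observations, base_month, current_month):
    """Percent price change between two months.

    Uses an unweighted geometric mean of price relatives over the products
    observed in both months (an assumed formula, not the study's own).
    """
    avg = monthly_average_prices(observations)
    log_relatives = [
        log(avg[(product, current_month)] / avg[(product, base_month)])
        for (product, month) in avg
        if month == base_month and (product, current_month) in avg
    ]
    if not log_relatives:
        raise ValueError("no products observed in both months")
    return (exp(sum(log_relatives) / len(log_relatives)) - 1.0) * 100.0


# Hypothetical scraped observations: (product, date, price).
sample = [
    ("bread", "2020-03-01", 2.00), ("bread", "2020-03-02", 2.00),
    ("bread", "2020-04-01", 2.20),
    ("milk", "2020-03-01", 1.00),
    ("milk", "2020-04-01", 1.10), ("milk", "2020-04-02", 1.10),
]
rate = jevons_inflation(sample, "2020-03", "2020-04")
```

Matching products across months before averaging keeps items that appear or disappear in the online stores from distorting the estimate; a production index would typically also apply expenditure weights, which this sketch omits.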

Findings

Based on raw online data, reliable estimates of monthly and annual food inflation were provided about 30 days before final official indexes were published.

Originality/value

This is the first paper to focus on measuring inflation in real time during the COVID-19 pandemic. Monthly and annual food price inflation are estimated in real time and updated daily, thereby improving previous forecasting solutions with weekly or monthly indicators. Using daily frequency price data deepens understanding of price developments and enables more timely detection of inflation trends, both of which are useful for policymakers and market participants. This study also provides a review of crucial issues regarding inflation that emerged during the COVID-19 pandemic.

Details

British Food Journal, vol. 123 no. 13

Type: Research Article

ISSN: 0007-070X

Article

Publication date: 19 January 2022

How agricultural economists are using big data: a review

Liang Lu, Guang Tian and Patrick Hatzenbuehler

Downloads

596

Abstract

Purpose

The purpose of this paper is to describe the main ways in which large amounts of information have been integrated to provide new measures of food consumption and agricultural production, and new methods for gathering and analyzing internet-based data.

Design/methodology/approach

This study reviews some of the recent developments and applications of big data, which is becoming increasingly popular in agricultural economics research. In particular, this study focuses on applications of new types of data such as text and graphics in consumers' online reviews emerging from e-commerce transactions and Normalized Difference Vegetation Index (NDVI) data as well as other producer data that are gaining popularity in precision agriculture. This study then reviews data gathering techniques such as web scraping and data analytics tools such as textual analysis and machine learning.

Findings

This study provides a comprehensive review of applications of big data in agricultural economics and discusses some potential future uses of big data.

Originality/value

This study documents some new types of data that are being utilized in agricultural economics, sources and methods to gather and store such data, existing applications of these new types of data and techniques to analyze these new data.

Details

China Agricultural Economic Review, vol. 14 no. 3

Type: Research Article

ISSN: 1756-137X

Article

Publication date: 15 August 2023

Linking social media marketing to restaurant performance – the moderating role of advertising expenditure

Wenjia Han, Ozgur Ozdemir and Shivam Agarwal

Downloads

445

Abstract

Purpose

Built upon customer engagement marketing theory and uses and gratification theory, this study examines the link between individual social media marketing (SMM) performance indicators and restaurant sales performance at the firm level. Moreover, the study investigates the moderating effect of advertising expenditure on this proposed relationship.

Design/methodology/approach

Random effect regression models were developed in Stata to examine the associations between SMM performance indicators, advertising expenditure, and restaurant firm revenue. Twelve years of SMM data from brands' Facebook pages were collected with a web scraper built in Python. Natural language processing was used to analyze the sentiment of user-generated content (UGC).

Findings

The results suggest that restaurant annual sales revenue increases as the volume of brand posts, “like”s, “share”s and positive comments on restaurants' Facebook pages increases. However, the total number of comments and the number of negative comments show non-significant associations with revenue. Firm advertising expenditure negatively moderates the relationships between sales revenue and the number of “like”s, “share”s, total comments and positive comments.

Practical implications

Restaurants benefit from posting frequently on social networking sites (SNSs). Promotions that motivate online users to “like”, share and comment on brand posts should be implemented. Firms with limited advertising budgets are encouraged to actively create buzz on SNSs, as the evidence shows stronger effects of UGC on sales performance for them than for large advertisers.

Originality/value

This research bridges the gap by studying the effects of individual SMM performance indicators on restaurant financial outcomes. The findings support the effectiveness of SMM; and, for the first time, demonstrate that SMM could generate a more profound impact for firms with low advertising budgets.

Details

Journal of Hospitality and Tourism Insights, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9792

Article

Publication date: 21 April 2022

Critical care for the early web: ethical digital methods for archived youth data

Katie Mackinnon

Downloads

332

Abstract

Purpose

This paper aims to provide a brief overview of the ethical challenges facing researchers engaging with web archival materials and demonstrates a framework and method for conducting research with historical web data created by young people.

Design/methodology/approach

This paper’s methodology is informed by the conceptual framing of data materials in research on the “right to be forgotten” (Crossen-White, 2015; GDPR, 2018; Tsesis, 2014), data afterlives (Agostinho, 2019; Stevenson and Gehl, 2019; Sutherland, 2017), indigenous data sovereignty and governance (Wemigwans, 2018) and feminist ethics of care (Cifor et al., 2019; Cowan, 2020; Franzke et al., 2020; Luka and Millette, 2018). It demonstrates a new method called an archive promenade, which builds on the walkthrough and scroll-back methods (Light et al., 2018; Robards and Lincoln, 2017).

Findings

The archive promenades demonstrate how individual attachments to digital traces vary and are often unpredictable, which necessitates further steps to ensure that privacy and data sovereignty are maintained through research with web archives.

Originality/value

This paper demonstrates how the archive promenade methodological intervention can lead to better practices of care with sensitive web materials and brings together previous work on ethical fabrications (Markham, 2012), speculation (Luka and Millette, 2018) and thick context (Marzullo et al., 2018), to yield new insights for research on the experiences of growing up online.

Details

Journal of Information, Communication and Ethics in Society, vol. 20 no. 3

Type: Research Article

ISSN: 1477-996X

Article

Publication date: 21 March 2024

How accurate are drug cryptomarket listings by content, weight, purity and repeat purchase?

Monica J. Barratt, Ross Coomber, Michala Kowalski, Judith Aldridge, Rasmus Munksgaard, Jason Ferris, Aili Malm, James Martin and David Décary-Hétu

Abstract

Purpose

Drug cryptomarkets increase information available to market actors, which should reduce information asymmetry and increase market efficiency. This study aims to determine whether cryptomarket listings accurately represent the advertised substance, weight or number and purity, and whether there are differences in products purchased from the same listing multiple times.

Design/methodology/approach

Law enforcement drug purchases – predominantly cocaine, methamphetamine, MDMA and heroin – from Australian cryptomarket vendors (n = 38 in 2016/2017) were chemically analysed and matched with cryptomarket listings (n = 23). Descriptive and comparative analyses were conducted.

Findings

Almost all samples contained the advertised substance. In most of these cases, drugs were supplied either at the advertised weight or number, or above it. All listings that quantified purity overestimated the actual purity. There was no consistent relationship between advertised purity terms and actual purity. Across the six listings purchased from multiple times, repeat purchases from the same listing varied in purity, sometimes drastically, with wide variation detected even between purchases made only one month apart.

Research limitations/implications

In this data set, cryptomarket listings were mostly accurate, but the system was far from perfect, with purity overestimated. A newer, larger, globally representative sample should be obtained to test the applicability of these findings to currently operating cryptomarkets.

Originality/value

This paper reports on the largest data set of forensic analysis of drug samples obtained from cryptomarkets, where data about advertised drug strength/dose were obtained.

Details

Drugs, Habits and Social Policy, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2752-6739

Article

Publication date: 18 May 2021

Accounting for unadjusted news sentiment for asset pricing

Prajwal Eachempati and Praveen Ranjan Srivastava

Downloads

492

Abstract

Purpose

A composite sentiment index (CSI) from quantitative proxy sentiment indicators is likely to be a lagging sentiment measure as it reflects only the information absorbed in the market. Information theories and behavioral finance research suggest that market prices may not adjust to all the available information at a point in time. This study hypothesizes that the sentiment from the unincorporated information may provide possible market leads. Thus, this paper aims to discuss a method to identify the unincorporated qualitative sentiment from information unadjusted in the market price to test whether sentiment polarity from the information can impact stock returns. Factoring market sentiment extracted from unincorporated information (residual sentiment or sentiment backlog) in CSI is an essential step for developing an integrated sentiment index to explain deviation in asset prices from their intrinsic value. Identifying the unincorporated sentiment also helps in text analytics to distinguish between current and future market sentiment.

Design/methodology/approach

Initially, this study collects the news from various textual sources and runs the NVivo tool to compute the corpus data’s sentiment polarity. Subsequently, using the predictability horizon technique, this paper mines the unincorporated component of the news’s sentiment polarity. This study regresses three months’ sentiment polarity (the current period and its lags for two months) on the NIFTY50 index of the National Stock Exchange of India. If the three-month lags are significant, it indicates that news sentiment from the three months is unabsorbed and is likely to impact the future NIFTY50 index. The sentiment is also conditionally tested for firm size, volatility and specific industry sector-dependence. This paper discusses the implications of the results.

Findings

Based on information theories and empirical findings, the paper demonstrates that it is possible to identify unincorporated information and extract the sentiment polarity to predict future market direction. The sentiment polarity variables are significant for the current period and two-month lags. The magnitude of the sentiment polarity coefficient has decreased from the current period to lag one and lag two. This study finds that the unabsorbed component or backlog of news consisted of mainly negative market news or unconfirmed news of the previous period, as illustrated in Tables 1 and 2 and Figure 2. The findings on unadjusted news effects vary with firm size, volatility and sectoral indices as depicted in Figures 3, 4, 5 and 6.

Originality/value

The related literature on sentiment index describes top-down/bottom-up models using quantitative proxy sentiment indicators and natural language processing (NLP)/machine learning approaches to compute the sentiment from qualitative information to explain variance in market returns. NLP approaches use current period sentiment to understand market trends, ignoring the unadjusted sentiment carried from the previous period. The underlying assumption here is that the market adjusts to all available information instantly, which various empirical studies backed by information theories have shown to be false. The paper discusses a novel approach to identify and extract sentiment from unincorporated information, which is a critical sentiment measure for developing a holistic sentiment index, both in text analytics and in top-down quantitative models. Practitioners may use the methodology in algorithmic trading models and in stock market research.

Details

Qualitative Research in Financial Markets, vol. 13 no. 3

Type: Research Article

ISSN: 1755-4179

Article

Publication date: 27 December 2022

A systematic examination of the family business contributions: is this domain a legitimate field of research?

Chelsea Sherlock, Erik Markin, R. Gabrielle Swab and Victoria Antin Yates

Downloads

385

Abstract

Purpose

The purpose of this study is to systematically analyze family business research, which has experienced tremendous growth. Through this study’s categorization and evaluation of research, the authors illustrate the evolution of family business research in management, entrepreneurship and family business domains over the past decade.

Design/methodology/approach

This study provides an interdisciplinary systematic review of family business literature between 2008 and 2022 to analyze the family business field. Following similar previous reviews (Chrisman et al., 2003; Debicki et al., 2009), this study’s final sample includes 1,443 studies, which the authors categorize into six broad topics and 21 subcategories of management topics.

Findings

This study’s analysis reveals the field has grown nearly fivefold since 2007. As such, the authors examine the growth and decline of specific research topics. The authors also find in the past decade family business research has experienced rapid growth across a variety of outlets, signaling increasing reach, richness and legitimacy of the field.

Originality/value

By reviewing and analyzing 1,443 family business articles, the results illustrate the evolution of family business research over the past decade and what this means for its future. Based on this study’s systematic review, the authors offer insights into the state of the field and propose avenues for future research so the field can continue to prosper.

Details

Journal of Management History, vol. 29 no. 3

Type: Research Article

ISSN: 1751-1348

Article

Publication date: 3 August 2021

A qualitative and quantitative comparison between Web scraping and API methods for Twitter credibility analysis

Irvin Dongo, Yudith Cardinale, Ana Aguilera, Fabiola Martinez, Yuni Quintero, German Robayo and David Cabeza

Downloads

546

Abstract

Purpose

This paper aims to perform an exhaustive review of relevant and recent related studies, which reveals that both extraction methods (Web scraping and the Twitter API) are currently used to analyze credibility on Twitter. Thus, there is clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both alternatives for data extraction and implements a previously proposed credibility model, by adding a qualitative evaluation and a Twitter Application Programming Interface (API) performance analysis from different locations.

Design/methodology/approach

As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either Twitter API or Web scraping to extract the data to perform the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings

The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights many more problems in this area that must be addressed to pursue true transparency and legitimacy of information on the Web.

Originality/value

Results report that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. tweet). Moreover, concerning time performance, Web scraping is faster than the Twitter API and is more flexible in terms of obtaining data; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.

Details

International Journal of Web Information Systems, vol. 17 no. 6

Type: Research Article

ISSN: 1744-0084
