Search results

1 – 10 of over 1000
Book part
Publication date: 15 March 2021

Reto Hofstetter

Abstract

Every second, vast amounts of data are generated and stored on the Internet. Data scraping makes these data accessible and usable for business and scientific purposes. Web-scraped data are of high value to businesses as they can be used to inform many strategic decisions such as pricing or market positioning. Although it is not difficult to scrape data, particularly when they come from public websites, there are six key steps that analysts should ideally consider and follow. Following these steps can help to better harness the business value of online data.

Article
Publication date: 14 May 2019

Teresa Scassa

Abstract

Purpose

The purpose of this paper is to examine how claims to “ownership” are asserted over publicly accessible platform data and critically assess the nature and scope of rights to reuse these data.

Design/methodology/approach

Using Airbnb as a case study, this paper examines the data ecosystem that arises around publicly accessible platform data. It analyzes current statute and case law in order to understand the state of the law around the scraping of such data.

Findings

This paper demonstrates that there is considerable uncertainty about the practice of data scraping, and that there are risks in allowing the law to evolve in the context of battles between business competitors without a consideration of the broader public interest in data scraping. It argues for a data ecosystem approach that can keep the public dimension issues more squarely within the frame when data scraping is judicially considered.

Practical implications

The nature of some sharing economy platforms requires that a large subset of their data be publicly accessible. These data can be used to understand how platform companies operate, to assess their compliance with laws and regulations and to evaluate their social and economic impacts. They can also be used in different kinds of data analytics. Such data are therefore sought after by civil society organizations, researchers, entrepreneurs and regulators. This paper considers who has a right to control access to and use of these data, addresses current uncertainties about how the law will apply to scraping activities and builds an argument for considering the public interest in data scraping.

Originality/value

The issue of ownership/control over publicly accessible information is of growing importance; this paper offers a framework for approaching these legal questions.

Details

Online Information Review, vol. 43 no. 6
Type: Research Article
ISSN: 1468-4527

Article
Publication date: 3 August 2021

Irvin Dongo, Yudith Cardinale, Ana Aguilera, Fabiola Martinez, Yuni Quintero, German Robayo and David Cabeza

Abstract

Purpose

This paper performs an exhaustive review of relevant recent studies, which reveals that both extraction methods (the Twitter Application Programming Interface (API) and Web scraping) are currently used to analyze credibility on Twitter. There is thus clear evidence of the need for different options to extract different data for this purpose. Nevertheless, none of these studies performs a comparative evaluation of both extraction techniques. Moreover, the authors extend a previous comparison, which uses a recently developed framework that offers both data-extraction alternatives and implements a previously proposed credibility model, by adding a qualitative evaluation and an analysis of Twitter API performance from different locations.

Design/methodology/approach

As one of the most popular social platforms, Twitter has been the focus of recent research aimed at analyzing the credibility of the shared information. To do so, several proposals use either the Twitter API or Web scraping to extract the data for the analysis. Qualitative and quantitative evaluations are performed to discover the advantages and disadvantages of both extraction methods.

Findings

The study demonstrates the differences in accuracy and efficiency between the two extraction methods and highlights further problems in this area that need attention in the pursuit of true transparency and legitimacy of information on the Web.

Originality/value

The results show that some Twitter attributes cannot be retrieved by Web scraping. Both methods produce identical credibility values when a robust normalization process is applied to the text (i.e. the tweet). Moreover, in terms of time performance, Web scraping is faster than the Twitter API and more flexible in the data it can obtain; however, Web scraping is very sensitive to website changes. Additionally, the response time of the Twitter API is proportional to the distance from the central server in San Francisco.
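The abstract attributes identical credibility values to "a robust normalization process" but does not list its steps. As a rough illustration only, the following Python sketch assumes a plausible set of steps (Unicode NFKC normalization, lowercasing, and removal of URLs, @mentions and punctuation) so that a tweet obtained by scraping and the same tweet obtained from the API reduce to the same string. The function name `normalize_tweet` and every step are assumptions, not the authors' procedure.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+")   # links often differ between API and scraped copies
MENTION_RE = re.compile(r"@\w+")       # @user mentions
NON_WORD_RE = re.compile(r"[^\w\s]")   # punctuation, '#', emoji and other symbols
SPACE_RE = re.compile(r"\s+")


def normalize_tweet(text: str) -> str:
    """Hypothetical normalization so that both extraction methods yield the same text."""
    text = unicodedata.normalize("NFKC", text)  # e.g. turns '…' into '...'
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)           # hashtag words survive, the '#' does not
    return SPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    scraped = "Check this… https://t.co/x"
    from_api = "check THIS https://twitter.com/i/web/status/1"
    print(normalize_tweet(scraped) == normalize_tweet(from_api))
```

Which steps are "robust enough" depends on the credibility model being fed; dropping URLs and mentions, for instance, would be wrong if the model scores them.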

Details

International Journal of Web Information Systems, vol. 17 no. 6
Type: Research Article
ISSN: 1744-0084

Article
Publication date: 12 November 2019

Judith Hillen

Abstract

Purpose

The purpose of this paper is to discuss web scraping as a method for extracting large amounts of data from online sources. The author wants to raise awareness of the method’s potential in the field of food price research, hoping to enable fellow researchers to apply this method.

Design/methodology/approach

The author explains the technical procedure of web scraping, reviews the existing literature, and identifies areas of application and limitations for food price research.
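The abstract does not reproduce the procedure itself. As a minimal sketch of the extraction step of a price scraper, the following uses Python's standard `html.parser` to pull product names and prices out of a page that has already been downloaded. The markup (elements with class `name` and class `price`), the class names and the accepted price formats ("3,49 zł", "4.20") are hypothetical; real shops need selectors adapted to their own HTML.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (product name, price) pairs from a hypothetical product-listing page."""

    def __init__(self):
        super().__init__()
        self._field = None   # "name" or "price" while inside such an element
        self._name = None    # last product name seen, waiting for its price
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "name" in classes:
            self._field = "name"
        elif "price" in classes:
            self._field = "price"

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "name":
            self._name = text
        elif self._name is not None:
            # Assumed formats: decimal comma or point, optional currency after a space.
            number = text.replace(",", ".").split()[0]
            self.rows.append((self._name, float(number)))
            self._name = None


def scrape_prices(html: str) -> list[tuple[str, float]]:
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    page = ('<ul><li><span class="name">Milk 1L</span><span class="price">3,49 zł</span></li>'
            '<li><span class="name">Bread</span><span class="price">4.20</span></li></ul>')
    print(scrape_prices(page))
```

In a full pipeline the HTML would be fetched on a schedule and the rows stored with a timestamp, which is what makes the high-frequency series mentioned in the findings possible.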

Findings

The author finds that web scraping is a promising method for collecting customised, high-frequency data in real time, overcoming several limitations of currently used food price data sources. While today's applications focus mostly on (online) consumer prices, the scope for web scraping broadens as more and more price data are published online.

Research limitations/implications

To better deal with the technical and legal challenges of web scraping and to exploit its scalability, joint data collection projects in the field of agricultural and food economics should be considered.

Originality/value

In agricultural and food economics, web scraping as a data collection technique has received little attention. This is one of the first articles to address this topic with particular focus on food price analysis.

Details

British Food Journal, vol. 121 no. 12
Type: Research Article
ISSN: 0007-070X

Article
Publication date: 14 February 2022

Stevan Milovanović, Zorica Bogdanović, Aleksandra Labus, Marijana Despotović-Zrakić and Svetlana Mitrović

Abstract

Purpose

The paper aims to study social recruiting for finding suitable candidates on social networks. The main goal is to develop a methodological approach that would enable preselection of candidates using social network analysis. The research focus is on the automated collection of data using the web scraping method. Based on the information collected from the users' profiles, three clusters of skills and interests are created: technical, empirical and education-based. The identified clusters enable the recruiter to search effectively for suitable candidates.

Design/methodology/approach

This paper proposes a new methodological approach for the preselection of candidates based on social network analysis (SNA). The approach includes the following phases: (1) selection of a social network according to the defined preselection goals; (2) automatic data collection from the selected social network using the web scraping method; (3) filtering, processing and statistical analysis of the data; (4) data analysis to identify relevant information for candidate preselection using attribute clustering and SNA; and (5) preselection of candidates based on the information obtained.

Findings

It is possible to contribute to candidate preselection in the recruiting process by identifying key categories of skills and interests of candidates. The defined methodological approach allows recruiters to identify candidates who possess the skills and interests defined by the search. The method automates the verification of the existence, or absence, of a particular category of skills or interests on the profiles of potential candidates. Its primary purpose is the screening and filtering of the skills and interests of potential candidates, which contributes to a more effective preselection process.
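The presence/absence check described above can be pictured with a small Python sketch. It is a deliberate simplification: the paper builds its three clusters (technical, empirical, education-based) from scraped profile data with attribute clustering and SNA, whereas here each cluster is stood in for by a fixed, hypothetical keyword set. The keyword lists and the function names `categories_present` and `preselect` are illustrative assumptions.

```python
# Hypothetical keyword sets standing in for the paper's data-derived clusters.
CATEGORY_KEYWORDS = {
    "technical": {"python", "java", "sql", "docker", "javascript"},
    "empirical": {"project management", "team leadership", "consulting"},
    "education-based": {"bsc", "msc", "phd", "certification"},
}


def categories_present(profile_skills: list[str]) -> dict[str, bool]:
    """For each category, report whether any of the profile's skills falls into it."""
    normalized = {skill.strip().lower() for skill in profile_skills}
    return {cat: bool(normalized & words) for cat, words in CATEGORY_KEYWORDS.items()}


def preselect(profiles: dict[str, list[str]], required: set[str]) -> list[str]:
    """Return the candidates whose profiles cover every required category."""
    return sorted(
        name
        for name, skills in profiles.items()
        if all(categories_present(skills)[cat] for cat in required)
    )


if __name__ == "__main__":
    scraped = {
        "ana": ["Python", "MSc"],
        "ben": ["Consulting"],
        "eva": ["SQL", "PhD", "Team leadership"],
    }
    print(preselect(scraped, {"technical", "education-based"}))
```

Because only the scraping step depends on the source network, the same filtering would apply unchanged to profiles collected from another site, in line with the practical implications noted for this paper.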

Research limitations/implications

The preliminary evaluation involved only a small sample of participants, and the collected skills and interests were revised manually. Recruiters need basic knowledge of the SNA methodology to understand its application in the described method. The reliability of the collected data must be assessed, because users provide the data themselves when filling out their social network profiles.

Practical implications

The presented method could be applied to other social networks, such as GitHub or AngelList, for clustering profile skills. For a different social network, only the web scraping instructions would change. The method is composed of mutually independent steps, so each step can be implemented differently without changing the whole process. The results of a pilot project evaluation indicate that HR experts are interested in the proposed method and would be willing to include it in their practice.

Social implications

The social implication should be the determination of relevant skills and interests during the preselection phase of candidates in the process of social recruitment.

Originality/value

In contrast to the previous studies discussed in the paper, this paper defines a method for automatic data collection using a web scraper tool. The method allows more data to be collected in a shorter period. It also reduces the cost of creating an initial data set by removing the cost of hiring interviewers, questioners and people who collect data from social networks. A completely automated process of data collection from a particular social network distinguishes this model from currently available solutions. Given the data collection method implemented in this paper, the proposed method also makes it possible to extend the scope of collected data to implicit data, which is not possible with the tools presented in other papers.

Details

Data Technologies and Applications, vol. 56 no. 4
Type: Research Article
ISSN: 2514-9288

Article
Publication date: 15 May 2023

Catherine Prentice and Adam Pawlicz

Abstract

Purpose

This paper aims to examine the primary supply data sources that have been used for research into the sharing economy, and the advantages and limitations of these sources in the literature.

Design/methodology/approach

To address the research aims, this study conducted a systematic literature review and content analysis of all relevant articles. Following the review, the methodological sections of the selected papers were examined to identify the characteristics and limitations of all data sources used in the papers.

Findings

This study revealed several limitations of the three major data sources used in sharing economy research, namely web scraping with self-made bots, Inside Airbnb and AirDNA. The review shows that the majority of the selected papers neither acknowledged any limitations nor discussed the quality of their data sources.

Research limitations/implications

The findings of this paper can serve as guidelines for selecting appropriate data sources for research into the sharing economy and caution researchers to address the limitations of the data sources they use.

Originality/value

To the best of the authors’ knowledge, this is the first study that explores the advantages and limitations of data sources used in short-term rental market research.

Details

International Journal of Contemporary Hospitality Management, vol. 36 no. 3
Type: Research Article
ISSN: 0959-6119

Open Access
Article
Publication date: 10 August 2021

Krystian Jaworski

Abstract

Purpose

The purpose of this paper is to develop novel ways to monitor an economy in real time during the COVID-19 pandemic. A fully automated framework is proposed for collecting and analyzing online food prices in Poland. This is important because the COVID-19 outbreak in Europe in 2020 led many governments to impose lockdowns that prevented manual collection of price data from food outlets. The study primarily addresses whether food price inflation can be measured accurately during the pandemic using only a laptop and an Internet connection, without relying on official statistics.

Design/methodology/approach

A big data approach was adopted to track food price inflation in Poland. Using the web-scraping technique, daily price information about individual food and non-alcoholic beverage products sold in online stores was gathered.

Findings

Based on raw online data, reliable estimates of monthly and annual food inflation were provided about 30 days before final official indexes were published.
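The abstract does not state how daily scraped prices are aggregated into monthly and annual inflation. One common choice in price statistics is the Jevons elementary index, the geometric mean of price relatives over products observed in both periods; the Python sketch below assumes that choice and is not necessarily Jaworski's method. Monthly prices are taken here as the mean of each product's daily prices, which is also an assumption, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def jevons_index(base: dict[str, float], current: dict[str, float]) -> float:
    """Geometric mean of price relatives for products priced in both periods."""
    matched = base.keys() & current.keys()  # scraped assortments change, so match products
    if not matched:
        raise ValueError("no products observed in both periods")
    log_sum = sum(math.log(current[p] / base[p]) for p in matched)
    return math.exp(log_sum / len(matched))


def monthly_mean_prices(daily: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each product's daily prices: {day: {product: price}} -> {product: mean}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for prices in daily.values():
        for product, price in prices.items():
            totals[product] += price
            counts[product] += 1
    return {p: totals[p] / counts[p] for p in totals}


def inflation_pct(base: dict[str, float], current: dict[str, float]) -> float:
    """Percentage change implied by the Jevons index."""
    return (jevons_index(base, current) - 1.0) * 100.0


if __name__ == "__main__":
    previous = {"d1": {"milk": 2.0}, "d2": {"milk": 2.0}}
    this_month = {"d1": {"milk": 2.0}, "d2": {"milk": 2.4}}
    print(inflation_pct(monthly_mean_prices(previous), monthly_mean_prices(this_month)))
```

Because the month-to-date mean can be recomputed every day, an estimate of this kind can be refreshed daily, before the official index for the month is released.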

Originality/value

This is the first paper to focus on measuring inflation in real time during the COVID-19 pandemic. Monthly and annual food price inflation are estimated in real time and updated daily, thereby improving previous forecasting solutions with weekly or monthly indicators. Using daily frequency price data deepens understanding of price developments and enables more timely detection of inflation trends, both of which are useful for policymakers and market participants. This study also provides a review of crucial issues regarding inflation that emerged during the COVID-19 pandemic.

Details

British Food Journal, vol. 123 no. 13
Type: Research Article
ISSN: 0007-070X

Article
Publication date: 20 February 2023

Zakaria Sakyoud, Abdessadek Aaroud and Khalid Akodadi

Abstract

Purpose

The main goal of this research work is the optimization of the purchasing business process in the Moroccan public sector in terms of transparency and budgetary optimization. The authors have worked on the public university as an implementation field.

Design/methodology/approach

The design of the research work followed the design science research (DSR) methodology for information systems. DSR is a research paradigm wherein a designer answers questions relevant to human problems through the creation of innovative artifacts, thereby contributing new knowledge to the body of scientific evidence. The authors have adopted a techno-functional approach. The technical part consists of the development of an intelligent recommendation system that supports the choice of optimal information technology (IT) equipment for decision-makers. This intelligent recommendation system relies on a set of functional and business concepts, namely the Moroccan normative laws and Control Objectives for Information and Related Technology's (COBIT) guidelines in information system governance.

Findings

The modeling of business processes in public universities is established using business process model and notation (BPMN) in accordance with official regulations. The set of BPMN models constitutes a powerful repository not only for business process execution but also for further optimization. Governance generally aims to reduce budgetary waste, and the authors' recommendation system demonstrates a technical and methodological approach enabling this. Implementing artificial intelligence techniques can bring great value in terms of transparency and fluidity in the execution of the purchasing business process.

Research limitations/implications

Business limitations: First, the proposed system was modeled to handle only one type of product, namely computer-related equipment; the authors intend to extend the model to other product types in future work. Second, the system proposes an optimal purchasing order and assumes that decision-makers will rely on it to choose between offers; as future work, the authors plan to automate the workflow completely so that it also covers vendor selection and offer validation. Technical limitations: Natural language processing (NLP) is a widely used sentiment analysis (SA) technique that enabled the authors to validate the proposed system. Even when working on samples of datasets, the authors noticed that NLP depends on substantial computing power. They intend to experiment with learning-based and knowledge-based SA and to assess their computing power consumption and accuracy compared with NLP. Another technical limitation relates to web scraping: users' reviews are crucial to the system, and to guarantee timely and reliable reviews, the system has to search websites automatically, which exposes it to the limitations of web scraping, such as constantly changing website structures and scraping restrictions.
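One common form of the "scraping restrictions" mentioned above is a site's robots.txt file. As a hedged sketch, Python's standard `urllib.robotparser` can filter a list of review-page URLs and read any requested crawl delay before collection starts. The robots.txt content, the URLs and the helper name `crawl_plan` are hypothetical; in practice the file would be fetched with `RobotFileParser.set_url()` and `read()` rather than parsed from a string, and robots.txt does not settle the separate question of a site's terms of use.

```python
from urllib import robotparser


def crawl_plan(robots_txt: str, urls: list[str], user_agent: str = "*"):
    """Return (URLs permitted by robots.txt, requested crawl delay in seconds or None)."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text already in hand; read() would fetch it
    allowed = [u for u in urls if parser.can_fetch(user_agent, u)]
    return allowed, parser.crawl_delay(user_agent)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\nCrawl-delay: 5\n"
    candidates = ["https://example.com/private/x", "https://example.com/reviews/1"]
    print(crawl_plan(robots, candidates))
```

Checking permissions once per run and honouring the delay keeps the collector polite, but it does not protect it against the other limitation named here: changes in website structure still require the extraction rules to be maintained.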

Practical implications

The modeling of business processes in public universities is established using BPMN in accordance with official regulations. The set of BPMN models constitutes a powerful repository not only for business process execution but also for further optimization. Governance generally aims to reduce budgetary waste, and the authors' recommendation system demonstrates a technical and methodological approach enabling this.

Originality/value

The adopted techno-functional approach enabled the authors to bring information system governance from a highly abstract level to a practical implementation where the theoretical best practices and guidelines are transformed to a tangible application.

Details

Kybernetes, vol. 53 no. 5
Type: Research Article
ISSN: 0368-492X

Content available
Book part
Publication date: 15 March 2021

Details

The Machine Age of Customer Insight
Type: Book
ISBN: 978-1-83909-697-6

Open Access
Article
Publication date: 11 July 2023

Carolina Nicolas, Angelica Urrutia and Gonzalo González

Abstract

Purpose

This study explores the use of Gender-Fair Language (GFL) by influencers on Instagram.

Design/methodology/approach

The study uses a clustering methodology: a digital Bag-of-Words (BoW) method, called the GFL Clustering BoW Methodology, to identify whether an inclusive marketing (IM) strategy can be used. This research thus makes both a methodological and a practical contribution by adding to the available marketing technology tools.
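The abstract names a bag-of-words method but not its vocabulary. As a sketch under stated assumptions, the following Python code turns Spanish captions into counts of a few commonly cited gender-fair forms: "x" plurals such as "todxs", doublets such as "todos y todas", and a small hypothetical lexicon of "-e" forms such as "todes". It then sums them into a fixed-order vector per profile, which a clustering algorithm such as k-means could consume. The marker set, the regular expressions and the function names are illustrative assumptions; the patterns will also miss some GFL forms and occasionally match unrelated words.

```python
import re
from collections import Counter

# Hypothetical GFL markers for Spanish; not the paper's actual vocabulary.
X_FORM = re.compile(r"\b\w+xs\b")                          # e.g. "todxs", "amigxs"
DOUBLET = re.compile(r"\b(\w+)(?:os y \1as|as y \1os)\b")  # e.g. "todos y todas"
E_FORMS = {"todes", "elles", "amigues", "chiques", "niñes"}
VOCAB = ("x_form", "doublet", "e_form")


def gfl_bag_of_words(caption: str) -> Counter:
    """Count gender-fair language markers in one caption."""
    text = caption.lower()
    counts = Counter()
    counts["x_form"] = len(X_FORM.findall(text))
    counts["doublet"] = len(DOUBLET.findall(text))
    counts["e_form"] = sum(1 for token in re.findall(r"\w+", text) if token in E_FORMS)
    return +counts  # unary plus drops zero counts


def profile_vector(captions: list[str]) -> list[int]:
    """Sum marker counts over a profile's captions into a fixed-order feature vector."""
    total = Counter()
    for caption in captions:
        total += gfl_bag_of_words(caption)
    return [total[key] for key in VOCAB]


if __name__ == "__main__":
    posts = ["Todos y todas", "Hola todxs", "sin marcadores"]
    print(profile_vector(posts))
```

With the API limit reported for this study, each profile's vector would be built from at most its last 12 captions.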

Findings

This study is original as it proposes an inclusive digital marketing strategy and contributes methods associated with digital transfers to improve marketing strategies, tactics and operations for inclusive content with a data integrity approach.

Research limitations/implications

Because of the limitations of the Instagram application programming interface (API), only a limited amount of text data could be used: the API allowed retrieval of the last 12 publications of each studied profile. In addition, the study covers only the Spanish language and is applied to a sample of influencers from Chile.

Practical implications

The practical contribution of this study is a key finding for defining communication strategies in both public and private organizations.

Originality/value

The originality of this work lies in its attractive implications for nonprofit and for-profit organizations, government bodies and private enterprises, both for measuring the success of campaigns that follow an IM communication strategy and for incorporating inclusive, non-sexist content for their consumers, thereby contributing to society.

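Abstract (Chinese version, translated)

Purpose

This study aims to explore how influencers use gender-neutral language on Instagram.

Design/methodology/approach

The study uses cluster analysis; specifically, the researchers adopt a digital bag-of-words method called the GFL clustering bag-of-words method to determine whether an inclusive marketing (IM) strategy can be used. The study thereby adds a tool to marketing technology and, in this respect, contributes both to academic research methods and to practice.

Findings

The study proposes an inclusive digital marketing strategy; it also develops several methods related to digital transfers that apply a data integrity approach to improve the inclusiveness of content in marketing strategies, tactics and operations.

Research limitations/implications

Because of the limitations of the Instagram application programming interface, only a small amount of text data was used, which allowed the last 12 posts of each studied profile to be retrieved. It should also be noted that the study covers only Spanish and that its sample consists only of influencers from Chile.

Practical implications

The practical contribution of this study is that it offers important insights and findings for defining the communication strategies used in public and private organizations.

Originality/value

The originality of this study lies in its attractive implications for for-profit and nonprofit organizations, government bodies and private enterprises, concerning how to measure the success of campaigns run with an inclusive marketing communication strategy. In addition, inclusive and non-sexist content is incorporated for consumers, thereby benefiting society.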