Flickr data for analysing tourists’ spatial behaviour and movement patterns: A comparison of clustering techniques
Journal of Hospitality and Tourism Technology
Article publication date: 25 February 2020
Issue publication date: 20 May 2020
The purpose of this study is twofold: to analyse the suitability of photo-sharing platforms, such as Flickr, for extracting relevant knowledge on tourists’ spatial movement and point of interest (POI) visitation behaviour, and to compare the most prominent clustering approaches for identifying POIs in various application scenarios.
The study first extracts photo metadata from Flickr, such as upload time, location and user. Photo uploads are then assigned to latent POIs using two clustering algorithms: density-based spatial clustering of applications with noise (DBSCAN) and k-means. Finally, association rule analysis (FP-growth algorithm) and sequential pattern mining (generalised sequential pattern algorithm) are used to identify tourists’ behavioural patterns.
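The clustering step can be illustrated with a minimal, standard-library sketch of DBSCAN applied to photo coordinates. It is not the study's implementation: the coordinates are made-up points near two Munich locations, and the names `haversine_m` and `dbscan`, the radius `eps_m=100` and the threshold `min_pts=3` are assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; the point itself is included.
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical photo locations (lat, lon): two dense groups and one stray photo.
photos = [
    (48.1374, 11.5755), (48.1375, 11.5756), (48.1373, 11.5754),
    (48.1755, 11.5518), (48.1756, 11.5519), (48.1754, 11.5517),
    (48.2500, 11.7000),
]
labels = dbscan(photos, eps_m=100, min_pts=3)
print(labels)
```

In this sketch each dense group of uploads becomes one latent POI and the isolated photo is labelled as noise. Unlike k-means, DBSCAN does not need the number of clusters in advance, although the radius and minimum point count must be tuned to the data.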
The approach was demonstrated for the city of Munich, using 13,545 photos extracted for the year 2015. The clusters identified by DBSCAN and k-means could be meaningfully assigned to well-known POIs, and each technique shows specific advantages for different usage scenarios. Association rule analysis revealed strong rules (support: 1.0-4.6 per cent; lift: 1.4-32.1 per cent), and sequential pattern mining identified relevant frequent visitation sequences (support: 0.6-1.7 per cent).
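To make the reported measures concrete, the following sketch computes support and lift for a co-visitation rule, and support for an ordered visitation sequence, on a toy data set. The eight trips and the POI names are hypothetical, and the helper names `support`, `lift` and `sequence_support` are assumptions, not the study's code. Support is the share of tourists whose visits contain the given POIs. Lift divides the joint support by the product of the individual supports, so values above 1 indicate POIs that are visited together more often than independence would predict.

```python
# Hypothetical trips: each list is one tourist's POIs in order of visit.
trips = [
    ["Marienplatz", "Frauenkirche"],
    ["Marienplatz", "Frauenkirche"],
    ["Frauenkirche", "Marienplatz", "Olympiapark"],
    ["Marienplatz"],
    ["Frauenkirche", "Olympiapark"],
    ["Olympiapark"],
    ["Olympiapark"],
    ["Olympiapark"],
]
visits = [set(trip) for trip in trips]  # order ignored for association rules


def support(visits, items):
    """Share of tourists whose visited POIs include all of `items`."""
    items = set(items)
    return sum(items <= v for v in visits) / len(visits)


def lift(visits, antecedent, consequent):
    """Joint support divided by the product of the individual supports."""
    joint = support(visits, set(antecedent) | set(consequent))
    return joint / (support(visits, antecedent) * support(visits, consequent))


def sequence_support(trips, pattern):
    """Share of trips that contain `pattern` as an ordered subsequence."""
    def contains(trip):
        remaining = iter(trip)
        # `poi in remaining` consumes the iterator, which enforces the order.
        return all(poi in remaining for poi in pattern)
    return sum(contains(trip) for trip in trips) / len(trips)


rule_support = support(visits, ["Marienplatz", "Frauenkirche"])
rule_lift = lift(visits, ["Marienplatz"], ["Frauenkirche"])
seq_support = sequence_support(trips, ["Marienplatz", "Frauenkirche"])
print(rule_support, rule_lift, seq_support)
```

In the toy data, three of the eight tourists visit both POIs, but only two visit them in the order Marienplatz then Frauenkirche, which is why the sequence support is lower than the rule support.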
As a theoretical contribution, this study comparatively analyses the suitability of different clustering techniques for identifying POIs from photo upload data. The identified POIs serve as input to association rule analysis and sequential pattern mining, two alternative but complementary techniques for analysing tourists’ spatial behaviour.
From a practical perspective, the study highlights that big data sources, such as Flickr, have the potential to effectively substitute traditional data sources for analysing tourists’ spatial behaviour and movement patterns within a destination. In particular, the approach offers the advantage of being fully automatic and executable in a real-time environment.
The study presents an approach to identify POIs by clustering photo uploads on social media platforms and to analyse tourists’ spatial behaviour by association rule analysis and sequential pattern mining. It provides novel insights into the suitability of different clustering techniques for identifying POIs in different application scenarios.
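This paper analyses the suitability of the photo-sharing platform Flickr for extracting information on tourists’ spatial movement and point of interest (POI) visitation behaviour, and compares the most prominent clustering approaches for identifying POIs in different scenarios.
The paper first extracts photo metadata from Flickr, such as upload time, location and user. Next, the DBSCAN and k-means clustering algorithms are used to assign photo uploads to latent POIs. Finally, association rule analysis (FP-growth algorithm) and sequential pattern mining (GSP algorithm) are used to identify tourists’ behavioural patterns.
The paper takes the city of Munich as its case, extracting 13,545 photos for the year 2015. The POIs identified by DBSCAN and k-means clustering were assigned to well-known POIs, which shows the respective advantages of the two techniques for different uses. Association rule analysis revealed strong rules (support: 1%-4.6%; lift: 1.4%-32.1%), and sequential pattern mining identified relevant frequent visitation sequences (support: 0.6%-1.7%).
The theoretical contribution of the paper lies in comparatively analysing different clustering techniques for identifying POIs from photo data, and in showing that association rule analysis and sequential pattern mining are distinct yet complementary techniques for analysing tourists’ spatial behaviour.
The practical significance of the paper lies in highlighting that big data sources, such as Flickr, have the potential to effectively replace traditional data for analysing tourists’ spatial behaviour and movement patterns within a destination. In particular, the approach offers the advantage of being automatic and executable in real time.
The paper presents an approach that identifies POIs by clustering photos uploaded to social media and analyses tourists’ spatial behaviour by association rule analysis and sequential pattern mining. It offers original insights into which clustering techniques suit the identification of POIs in different application scenarios.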
This paper forms part of the special section “Big data in tourism and hospitality”, guest edited by Marianna Sigala and Roya Rahimi.
Höpken, W., Müller, M., Fuchs, M. and Lexhagen, M. (2020), "Flickr data for analysing tourists’ spatial behaviour and movement patterns: A comparison of clustering techniques", Journal of Hospitality and Tourism Technology, Vol. 11 No. 1, pp. 69-82. https://doi.org/10.1108/JHTT-08-2017-0059
Emerald Publishing Limited
Copyright © 2020, Emerald Publishing Limited