Bruns, A. and Weller, K. (2014), "Twitter data analytics – or: the pleasures and perils of studying Twitter", Aslib Journal of Information Management, Vol. 66 No. 3. https://doi.org/10.1108/AJIM-02-2014-0027
Emerald Group Publishing Limited
Twitter data analytics – or: the pleasures and perils of studying Twitter
Article Type: Guest editorial From: Aslib Journal of Information Management, Volume 66, Issue 3
It might still sound strange to dedicate an entire journal issue exclusively to a single internet platform. But it is not the company Twitter Inc. that draws our attention; this issue is not about a platform and its features and services. It is about its users and the ways in which they interact with one another via the platform, about the situations that motivate people to share their thoughts publicly, using Twitter as a means to reach out to one another. And it is about the digital traces people leave behind when interacting with Twitter, and most of all about the ways in which these traces – as a new type of research data – can also enable new types of research questions and insights.
Since its launch in 2006, Twitter has not only attracted over 500 million registered users, but has also turned into a popular subject of scholarly research, resulting in more than 1,400 scientific publications indexed in Scopus, 470 in Web of Science, and 10,000 more available via Google Scholar (all figures refer to searches for documents with “Twitter” explicitly in the title). These publications emerge from different scholarly fields and disciplines and address various questions. Some focus on Twitter users’ roles in contemporary internet dynamics and Twitter's impact on different areas of private and public life (from popular culture to crises and disasters), while other studies use Twitter-based data sets to test models of network dynamics and communication behaviour. This diversity is as fascinating as it is challenging.
Researchers working with Twitter data may find themselves surrounded by a certain hype about their activities, while at other times they may have to defend themselves for studying a platform that is supposedly used only for “pointless babble” (Pear Analytics, 2009). Although Twitter is used by fewer people than Facebook, there are almost as many studies of Twitter as there are of Facebook (as measured by a Scopus search for publications with either “Twitter” or “Facebook” in their title). One possible reason for this is that it is relatively easy to access Twitter data, at least in comparison to Facebook. Twitter research is often (but not exclusively) based on rich data sets that can be retrieved via the Twitter API and subsequently mined with specialised tools. In some cases, this is considered working with “big data”. Yet, in this context, “big” refers to the number of actual units in a data set (the number of collected tweets or monitored users) rather than to the capacity of a device or the storage needed for the respective files. And while handling large sets of Twitter data also poses some challenges for computational infrastructure, the more important challenges at this stage lie in developing theories, methods and models for this particular research field, in order to make sense of thousands or millions of tweets.
Although researchers in many disciplines have made progress in establishing both theories and methods for studying Twitter, there is still a lack of comparability and compatibility across individual studies, and a need for commonly accepted standards in data collection, analysis and interpretation. Since its rise to public prominence in 2009, Twitter research has been much like an experimental playground, allowing researchers to try out manifold research designs: case studies based on single tweets as well as complex network analyses; qualitative, quantitative and mixed-methods approaches; linguistic analyses focusing, for example, on sentiment or event detection; and even artistic visualisations and sonifications of data. Building on this rich body of literature, it is now time to advance to a new level by addressing the following five key challenges:
Representativeness and validity: researchers need to identify the appropriate way to collect data that match their research question, which also means considering potential biases and the degree to which their work is representative. Assessments of validity have to account for the adoption rate of Twitter in different countries, age groups and communities. Choosing a specific hashtag for data collection, for example, will introduce further biases of which researchers need to be aware. Restrictions on data collection imposed by Twitter itself may also prevent the collection of the “ideal” data set. The current hype around big data has already provoked a critical response, which seeks to determine the limitations of working with data sets that are only apparently comprehensive and highlights the persisting value of research that draws only on “small data”.
Cross-platform studies: many studies focus exclusively on Twitter, without considering a broader theoretical framework that takes other online services into account. Some challenges – e.g. dealing with user data from a commercial company – have already been addressed in other contexts such as search engine research.
Comparisons: more context is also needed in order to compare the various Twitter studies. Comparing similar case studies (e.g. different national elections) is one dimension. Another is long-term research that follows Twitter use in specific usage scenarios over several years rather than single days or weeks.
Multi-method approaches: given the ever-growing number of methods that have already been applied in Twitter data analysis, it is now time to apply several approaches to the same data set in order to develop a truly comprehensive picture.
Context and meaning: finally, Twitter Inc.'s design decisions, and especially the look and feel of Twitter's website, apps and third-party clients, affect the user experience and should be taken into consideration to add context and meaning to user behaviour. In the future, for example, it will be hard to retrace which studies counted retweets at a time when there was no retweet button in the interface. User-influenced features in Twitter, such as hashtags and retweets (Halavais, 2014), demonstrate that practices are evolving, and so is the meaning of a specific action on Twitter. Also, there are some “forgotten features”: images embedded in tweets and favourites are only rarely the focus of Twitter data analytics.
It is beyond the scope of this special issue to solve these problems; indeed, this will take time and joint effort. But we hope that some of the papers in this issue will stimulate future discussions in the field. We have selected those submissions which either contribute to the general understanding of Twitter data analytics or apply innovative approaches and describe them in detail and with careful discussion. On this basis, we have chosen the following seven papers, each of which contributes to the growing field of Twitter research in its own specific way.
Michael Zimmer and Nicholas Proferes open this special issue with their “Topology of Twitter Research”, which gives an excellent overview of emerging research practices. Their meta-study investigates 382 publications on Twitter research in order to identify, among other things, the tools and methods used for data collection, the size of the respective data sets, and ethical concerns in studying Twitter users. This paper is particularly valuable as it is one of the still relatively few contributions on ethics in Twitter research, and it might help to raise awareness of these issues when working with user-generated content from Twitter. These issues are far from trivial if we consider the paper's finding that “at least 300 million user accounts were subjected to academic research between 2007 and 2012”.
The second paper, by Erik Borra and Bernhard Rieder, is an exceptionally rich presentation of a tool for Twitter data collection and analysis: it goes well beyond describing the features of this particular tool to address both epistemological and practical issues of Twitter research. It exemplifies the overlapping fields of digital methods, web social science and computational social science. The authors provide useful information about the challenges of capturing URLs from tweets, the limitations on accessing Twitter data, and the legal implications of sharing research data and making it reusable.
The remaining papers are case studies of specific Twitter usage practices. Each applies a different approach, even though their scenarios overlap to some extent: several papers deal with Twitter as a television backchannel and/or with political communication on Twitter.
Stefanie Haustein, Timothy Bowman, Kim Holmberg, Isabella Peters and Vincent Larivière work in the field of altmetrics (Priem et al., 2012), i.e. alternative indicators of scholarly activity, and compare the tweeting behaviour and publication output of a selected sample of astrophysicists. This study is innovative in its effort to combine online and offline behaviour to gain a fuller picture of different user types, and the authors are indeed able to identify distinct user types based on their activity. They also compare term frequencies in tweets and in the abstracts of the astrophysicists’ published papers in order to identify possible shifts in topics of discussion. Thus, although obviously limited by the selected sample, this study provides valuable insights into the role of tweeting in the ecology of scholarly communication.
With the next paper we move on to the use of Twitter in popular culture and its role as a means of commenting on TV shows and televised events, and at the same time enter another dimension of using Twitter in scholarly contexts. Magdalena Bober discusses her approach to using Twitter to engage students in feasible research projects. Although there are other cases where students have carried out exemplary case studies with Twitter data in class, Bober is among the first to reflect critically on the pros and cons of doing so.
The remaining three papers all deal with political discussions on Twitter. This is probably the most popular topic among case studies of Twitter usage, yet one that still offers room for new ideas and approaches. The selected papers differ in the countries they study, their research questions and methods, and the tools they use for data collection. This illustrates how complicated it can be to compare different studies, even within similar domains of interest. The paper by Bente Kalsnes, Arne Krumsvik and Tanja Storsul also deals with televised events, focusing on Twitter as a backchannel for election debates. This case study is taken from Norway, a country with a relatively high Twitter adoption rate (15 per cent of the internet population). The authors apply an interesting approach of manually coding tweets at different levels of categorisation. The next case is taken from Sweden. Unlike most other papers on political discussions on Twitter, this article by David Gunnarsson Lorentzen does not look at elections, but investigates political communication outside election periods. Gunnarsson Lorentzen finds a degree of dominance by ordinary Twitter users (whereas election debates seem to be much more dominated by politicians), which also leads him to take extra steps in anonymising the data set. Finally, Yu-Chung Cheng and Pai-Lin Chen contribute the case of the 2012 Taiwan election, which is interesting as most (but not all) other papers on Twitter and elections focus on European countries or the USA. This paper also reminds us not to overlook the variety of user communities within Twitter: the authors explicitly address ways of identifying sub-communities based on the different language systems in use.
We thank all contributors and hope that this special issue encourages its readers to work with Twitter data, either to develop further novel approaches or to help connect and harmonise existing methods.
Axel Bruns, Queensland University of Technology, Brisbane, Australia
Katrin Weller, GESIS Leibniz Institute for the Social Sciences, Cologne, Germany
References
Halavais, A. (2014), “Structure of Twitter: social and technical”, in Weller, K., Bruns, A., Burgess, J., Mahrt, M. and Puschmann, C. (Eds), Twitter and Society, Peter Lang, New York, NY, pp. 29-42.
Pear Analytics (2009), “Twitter study – August 2009”, available at: http://www.pearanalytics.com/blog/wp-content/uploads/2010/05/Twitter-Study-August-2009.pdf (accessed 31 January 2014)
Priem, J., Piwowar, H.A. and Hemminger, B.M. (2012), “Altmetrics in the wild – using social media to explore scholarly impact”, available at: http://arxiv.org/abs/1203.4745v1 (accessed 31 January 2014)
About the Guest Editors
Dr Axel Bruns is an Associate Professor in the Creative Industries Faculty at the Queensland University of Technology, and a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation (http://cci.edu.au/). He is the author of Blogs, Wikipedia, Second Life and Beyond (2008) and Gatewatching (2005), and a co-editor of Twitter and Society (2014). See http://mappingonlinepublics.net/ for more details on his current social media research.
Dr Katrin Weller is an Information Scientist at GESIS Leibniz Institute for the Social Sciences, where she is responsible for new approaches to handling social media data in the Data Archive for the Social Sciences. She is co-editor of Twitter and Society (2014) and author of Knowledge Representation in the Social Semantic Web (2010). Dr Katrin Weller is the corresponding author and can be contacted at: Katrin.Weller@gesis.org