Rhythmicity of health information behaviour: Utilizing the infodemiology approach to study temporal patterns and variations

Jonas Tana (Department of Healthcare, Arcada University of Applied Sciences, Helsinki, Finland) (Department of Information Studies, Faculty of Social Sciences, Business and Economics, Åbo Akademi University, Turku, Finland)
Emil Eirola (Department of Business Management and Analytics, Arcada University of Applied Sciences, Helsinki, Finland)
Kristina Eriksson-Backa (Department of Information Studies, Faculty of Social Sciences, Business and Economics, Åbo Akademi University, Turku, Finland)

Aslib Journal of Information Management

ISSN: 2050-3806

Publication date: 18 November 2019

Abstract

Purpose

This paper brings focus and attention to the aspect of time within health information behaviour. The purpose of this paper is to critically assess and present strengths and weaknesses of utilising the infodemiology approach and metrics as a novel way to examine temporal variations and patterns of online health information behaviour. The approach is briefly exemplified by presenting empirical evidence for temporal patterns of health information behaviour on different time-scales.

Design/methodology/approach

A short review of online health information behaviour is presented and methodological barriers to studying the temporal nature of this behaviour are emphasised. To exemplify how the infodemiology approach and metrics can be utilised to examine temporal patterns, and to test the hypothesis of existing rhythmicity of health information behaviour, a brief analysis of longitudinal data from a large discussion forum is presented.

Findings

Clear evidence of robust temporal patterns and variations of online health information behaviour is shown. The paper highlights that focussing on time and the question of when people engage in health information behaviour can have significant consequences.

Practical implications

Studying temporal patterns and trends for health information behaviour can help in creating optimal interventions and health promotion campaigns at optimal times. This can be highly beneficial for positive health outcomes.

Originality/value

A new methodological approach to study online health information behaviour from a temporal perspective, a phenomenon that has previously been neglected, is presented. Providing evidence for rhythmicity can complement existing epidemiological data for a more holistic picture of health and diseases, and their behavioural aspects.

Citation

Tana, J., Eirola, E. and Eriksson-Backa, K. (2019), "Rhythmicity of health information behaviour: Utilizing the infodemiology approach to study temporal patterns and variations", Aslib Journal of Information Management, Vol. 71 No. 6, pp. 773-788. https://doi.org/10.1108/AJIM-01-2019-0029

Publisher: Emerald Publishing Limited

Copyright © 2019, Jonas Tana, Emil Eirola and Kristina Eriksson-Backa

License

Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial & non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode


Introduction

Time touches every dimension of our being; it is everywhere in human life and is constitutive of life in nature as well as society (Bender and Wellbery, 1991; Luckmann, 1991; Roenneberg, 2012). All organisms, from single cells to human beings, follow rhythmic behaviour that is the key to the time-world of nature (Adam, 1990). The rhythmicity or temporal structures of hours, days, weeks, months, seasons and years deeply affect the earth’s environment and govern and contextualise many, if not all, aspects of life, including health and illness (Mogilner et al., 2018; Ayers et al., 2014). This rhythmicity is a universal phenomenon (Adam, 1990). However, in spite of time being so omnipresent and pervasive in everyday life, as well as important to our health and well-being and behaviours related to them, time has in many aspects been treated as a phenomenon of secondary importance and even thoroughly neglected (Adam, 1990; Davies and McKenzie, 2002; Giddens, 1979; Luckmann, 1991; Roenneberg, 2012).

These temporal structures can be divided into circadian, circaseptan, circa-monthly and circannual. Circadian rhythms are associated with the 24-h cycle arising from the rotation of the earth around its axis, circaseptan rhythms describe the cyclic seven-day phenomena, whereas circa-monthly rhythms are associated with the complete lunar orbit around the earth every 30 (29.53) days (Reinberg et al., 2017). Seasonal, or circannual time-periods, again, are entrained by seasonal and annual changes caused by the revolution of the earth around the sun (Reinberg et al., 2017). All these rhythms have been found to affect health, making them significant to study in relation to the temporal variations of health-related behaviours (Ayers et al., 2014; Reinberg et al., 2017). Many health behaviours, issues and disorders, both somatic and psychological, follow different temporal patterns on circannual, circa-monthly, circaseptan, as well as circadian, levels. Variations in temporal structures have been reported for a myriad of symptoms and diseases as well as health-related behaviours, from sleeping disorders and depression to risk behaviours and diseases like, for example, diabetes, cardiovascular diseases and cancer (Basnet et al., 2016; Gabarron et al., 2015; Madden, 2017; Reinberg et al., 2017). This is the case for health information behaviour, or how people seek, obtain, evaluate, categorise and use health-related information, as well (Ek, 2013). Health information behaviour can be conceptualised as a process initiated by a health-related stimulus, an information need or a knowledge gap (Lambert and Loiselle, 2007; Ormandy, 2011). These gaps can be seen as a discontinuity condition for the individual, and are a fundamental aspect of reality (Savolainen, 1993, 2006). They are also temporal, affected and changing over time (Ormandy, 2011).

Research on the temporal patterns and variations of health information behaviour is scarce (Davies and McKenzie, 2002; Savolainen, 2006; Savolainen, 2018). Reasons for this have been suggested to be methodological. Traditional health-related data are usually gathered through surveys or interviews on an annual or semi-annual level, leaving temporal variations and rhythms, except some annual or seasonal, outside the scope of data collection and only providing snapshots of health behaviour in relation to time (Anker et al., 2011; Ayers et al., 2014). Research on health-related behaviour and conditions is, furthermore, often based on clinical data, and there is limited use of preclinical data on behavioural patterns, including health information behaviour, in a population. While traditional data gathering methods remain invaluable, new data and novel methods such as infodemiology, that is, aggregation and analysis of health-related information especially on the internet (Eysenbach, 2009), can complement these data and help shed some light on the temporal rhythms related to health information behaviour. More and more people turn to the internet in health-related matters to be better informed, seek peer-support, find alternative treatments, take charge of their health and manage minor conditions they perceive they have (Eysenbach, 2009; Eysenbach, 2011; Lee et al., 2014). Utilising novel methods and metrics, in this case the infodemiology approach, and mining data from this type of online health information behaviour can complement and broaden our understanding of health-related knowledge gaps, and provide new insights into the temporal patterns of health and disease on a macro scale (Ayers et al., 2014; Eysenbach, 2009; Guy et al., 2011). Measures of these temporal rhythms could also be an important determinant of health status (Basnet et al., 2018).

Aim and objective

The aim of this paper is to present the strengths and limitations of utilising the infodemiology approach and metrics to examine temporal variations and patterns of health information behaviour in a Finnish online discussion forum. The intention of the paper is to bring focus and attention to the aspect of time within health information behaviour, and to focus on answering the question of when Finnish people are engaged in health information behaviour, by using infodemiology metrics. We hypothesised that there are temporal variations and patterns in health information behaviour on circadian, circaseptan, circa-monthly and circannual levels. Temporal variations are exemplified briefly by presenting a few interesting results of longitudinal time tracking from the largest Finnish online discussion forum, Suomi24. Finland, with its northern location and intense fluctuations in both seasonal temperature and daylight, makes an interesting setting in which to study health-related phenomena from a temporal aspect (Basnet et al., 2018). Moreover, internet use in Finland is ubiquitous, and the internet is and has been an important source for information related to health or illness (Ek et al., 2013; Tilastokeskus, 2017).

Studying online health information behaviour

The internet is, as mentioned, important as a source for health-related information in Finland. A nation-wide survey conducted in 2009 showed that already then, nearly 70 per cent of 18–65 year old Finns had visited some health-related site during the past 12 months. The most used sources were health portals, visited by 45 per cent (Ek et al., 2013). Recent statistics show that 64 per cent of Finns in the ages 16–89 years searched for information on illness, nutrition or health online in 2017. There are, however, variations between age groups: as many as 80 per cent of those aged 25–34 years used online health information, whereas the use was decreasing with higher age (Tilastokeskus, 2017, p. 36; cf. Ek and Niemelä, 2010). Similar figures are found elsewhere: for example, in the UK, 69 per cent searched health information online in 2013 (Dutton et al., 2013). The importance of the internet as a source for this type of information varies, however; an older survey conducted in Hong Kong showed that only 44 per cent had looked for health information on the internet (Yan, 2010).

Health information behaviour, and in particular health information seeking and needs, have typically been studied utilising cross-sectional study design, with either surveys or structured interviews as the main method of documentation, to characterize the behaviour of the general population (Anker et al., 2011; Ramsey et al., 2017). This shows that health information behaviour has strong methodological traditions. This is also the case with health information behaviour in different online settings, such as social networking sites and social media (Zhao and Zhang, 2017; Kim and Syn, 2014). This type of research design provides a “snapshot” of health information behaviour, as it characterises the behaviour at a single point in time (Anker et al., 2011). Health information behaviour has also been approached by utilising retrospective reviews that rely on previously collected data sets or cohort designs that compare similar information on samples collected at alternate points of time (Anker et al., 2011). Moreover, some studies have applied naturalistic interventions or observations that take place in a natural setting and that characterize the information seeking behaviours of a group, for instance, patients with specific ailments or illnesses, or within a specific demography (Anker et al., 2011; Kim and Syn, 2014; Zhao and Zhang, 2017). The least applied methods used for health information seeking were, according to Anker et al. (2011) experiments or quasi-experiments and longitudinal designs.

When studying online health information behaviour, factors possibly affecting use are often in focus, and these are often socio-demographic variables, health conditions and attributes of websites (Marton and Wei Choo, 2012). Socio-demographic variables often include gender (e.g. Ybarra and Suman, 2006; Renahy et al., 2010; Yan, 2010; Hallyburton and Evarts, 2014; Bidmon and Terlutter, 2015; Torrent-Sellens et al., 2016; Tilastokeskus, 2017, p. 36), education level (e.g. Flynn et al., 2006; Ek and Niemelä, 2010; Torrent-Sellens et al., 2016; Tilastokeskus, 2017, p. 36; Yan, 2010) and age (e.g. Ek and Niemelä, 2010; Tilastokeskus, 2017, p. 36; Torrent-Sellens et al., 2016; Yan, 2010; Ybarra and Suman, 2006). Health status as a variable behind online health information behaviour is often studied, as well (e.g. Flynn et al., 2006; Renahy et al., 2010; Torrent-Sellens et al., 2016). Other variables include, for example, internal factors like trust (Sbaffi and Rowley, 2017; Lee et al., 2014), and eHealth literacy skills (Lee et al., 2014) that can facilitate or hinder use. Motivations for and outcomes of using online discussion forums, including health-related forums, have been examined, as well (Pendry and Salvatore, 2015). Marton and Wei Choo (2012) note that most studies seem to be producing descriptive statistics reporting the extent and nature of online health information seeking. Time as a factor, however, does not seem to occur in these studies.

As always, all methods have their strengths and weaknesses. For instance, a limitation that the use of either surveys or interviews poses is that it only examines behaviour at a single point in time in a non-naturalistic setting (Anker et al., 2011). As proposed by Anker et al. (2011), studies would benefit from applying a longitudinal study design, examining health information seeking over an extended period of time, as well as tracking the temporal nature and variations of health information seeking on the internet. This could potentially present a more comprehensive picture of health information seeking and behaviour, especially as time has previously been neglected within research on health information behaviour, and information science in general. In light of recent developments in novel methods, some of these weaknesses and recommendations could be dealt with.

Infodemiology and infoveillance

Infodemiology, a portmanteau of information and epidemiology, has been defined by Eysenbach (2009) as the science of distribution and determinants of information in an electronic medium, specifically the internet, or in a population, with the ultimate aim to inform public health and public policy. Infodemiology is an emerging discipline that tackles the problem that in today’s society it is not about the availability of information anymore, but its aggregation and analysis. The field is highly interdisciplinary and requires the collaboration of many different domains, including information and computer scientists. Methodologically, infodemiology has been divided into supply-based and demand-based infodemiology (Eysenbach, 2009). Methods within supply-based infodemiology, under which category this study falls, include analysing discussion forum postings, social media status updates or the amount of websites by topic and obtaining indicators on changes in relation to time. Such supply-based infodemiology metrics are especially useful when tracked longitudinally (Eysenbach, 2009). The scope of infodemiology also includes demand-based infodemiology, which entails analysing online health information seeking and needs, and metrics associated with this, such as patterns of activity or visits on health-related websites or queries and query volumes in search engines. As Eysenbach (2009) concludes, supply- and demand-based infodemiology methods are quite similar and employ similar workflows from selecting, filtering and constructing meaning of large data sets by various quantitative and qualitative methods to employing basic descriptive and analytical statistical methods, or more advanced temporal statistical methods to detect patterns and trends. As such, infodemiology could be described as an approach that entails different methods and metrics.

Infodemiology is rooted in the idea that there is a relationship between population health on one hand, and information and communication patterns on the internet on the other. Therefore, changes in these patterns could be an early symptom of changes in population health (Eysenbach, 2009). An increasing number of people engage in online health information behaviour, and the rapidly rising amount of user-generated infodemiological data, or metrics, from social media, search engines and websites opens up the possibility to systematically study and measure health-related patterns, trends and knowledge gaps over time, something that has previously been immeasurable (Eysenbach, 2006, 2011). The related term infoveillance has been defined as the longitudinal tracking of infodemiology metrics for surveillance and trend analysis. Infoveillance is especially relevant when the infodemiology approach, with both the supply- and demand-side methods and metrics, is employed in applications and services where the primary aim is to provide automated and continuous real-time surveillance (Eysenbach, 2009). This falls outside the scope of this study, as data utilised in this study are not available in real time.

Infodemiology research is needed to develop a set of methodologies to understand patterns and trends for online health information behaviour (Eysenbach, 2006). Analysing infodemiology metrics on when people engage in health information behaviour, including when they seek, communicate and share information, can provide novel and valuable insights into health-related behaviour and can inform public health officials as well as other actors within the field of health (Eysenbach, 2009). This is particularly significant, as an important role of public health is the surveillance of population health status, including the temporal distribution of diseases and behavioural risk factors (Rothman and Greenland, 2014). However, the vast complexity and breadth of public health tasks today call for new ways to strengthen the capacity of health agencies (Mamiya et al., 2017). The infodemiology approach adds a novel set of methods, and provides unmatched opportunities for the management of health-related web data and information generated by online users, as well as unique opportunities for insights into health information behaviour (Eysenbach, 2011; Zeraatkar and Ahmadi, 2018). Given the rapidly growing use of the internet as a source for health information, monitoring online health information behaviour is not only useful, but also necessary (Mamiya et al., 2017).

The infodemiology approach has been utilised to study many different health-related phenomena, like mining tweets for pandemics, different ailments and public health issues (Chew and Eysenbach, 2010; Paul and Dredze, 2011), analysing internet search trends for a multitude of illnesses and health issues, to complement epidemiological research (Abedi et al., 2015; Brigo and Trinka, 2015; Carneiro and Mylonakis, 2009; Seifter et al., 2010) as well as studying accessing and sharing information related to different topics (Wong et al., 2013; Matsuda et al., 2017). Temporal variations and patterns of health information behaviour have also been investigated using infodemiology metrics, mostly for specific health issues or diseases, from mental health problems (Arendt and Scherr, 2017; Ayers et al., 2013; Chen et al., 2018; Tana, 2018; Tana et al., 2018), to somatic diseases like Lyme disease (Pesälä et al., 2017), diabetes (Tkachenko et al., 2017) and disease and influenza outbreaks (Bragazzi et al., 2017; Kraut et al., 2017; Ortiz-Martínez and Jiménez-Arcia, 2017; Osuka et al., 2018; Seo and Shin, 2017). However, research on temporal variations and patterns of general health and wellness utilising infodemiology metrics is scarce, as most infodemiology research has focussed on specific diseases and their symptoms (Guy et al., 2011; Zeraatkar and Ahmadi, 2018). Ayers et al. (2014) investigated the circaseptan rhythms of health considerations by monitoring internet search queries, while Jadhav et al. (2014) touched upon the temporal patterns (morning, afternoon/evening, night) while analysing how online health information seeking behaviour may differ by accessing device. To the best of the authors’ knowledge, this current study is the first case where infodemiology metrics are used to examine the temporal variations and patterns of various health-related phenomena, as well as health in general, by utilising a big, longitudinal and unique social media dataset.

Data and methods

Suomi24 (www.suomi24.fi/) is the largest, and most popular, free and anonymous discussion forum in Finland. The Suomi24 discussion forum data, which are available for research purposes, contain the entire database and all messages of the Suomi24 discussion forum. The data set can be downloaded for research purposes through FinCLARIN’s Kielipankki Korp-interface (http://urn.fi/urn:nbn:fi:lb-2019010801). The corpus contains all messages posted in the Suomi24 discussion forum, ranging from 1 January 2001 to 31 December 2017. The Suomi24 data set is exceptionally large and comprehensive, with over a hundred million words. Overall, there are over 82m (82,858,608) messages posted in Suomi24 for the time-period (1 January 2001–31 December 2017) analysed in this study. The Suomi24 discussion forum had 832,000 unique visitors per week in 2015, each of whom on average spent 7 min on the forum in one session. The total number of page loads per week in 2015 was over 17m (Aller, 2016). During the first half of 2018, Suomi24 still had a monthly reach of over 2m internet users, making it the eighth most popular Finnish internet site (FIAM, 2018). According to statistics, 24 per cent of the Finnish population aged 16–89 had posted something in a discussion forum in 2014. The percentage of participation was clearly higher for the younger demographic, as over 40 per cent in the age category 16–34, and 35 per cent in the age category 35–44, had posted messages in a discussion forum in 2014 (Tilastokeskus, 2017). The forum contains several different main topics such as pets, fashion and beauty, tourism, cars and vehicles and health. The health (fi. terveys) category, with 3,788,224 messages, represents 4.57 per cent of posts in the forum and is divided into sixteen sub-topics, ranging from plastic surgery and diseases, to medication and mental health. These sub-topics are then further divided into varying numbers of sub-categories that relate to the sub-topic.

The Suomi24 data are available in a custom VRT format (“verticalized text”), which represents a parsed version of the text for each posted message on the forum. The current analysis focusses on the metadata of the posts, in particular the date and time of posting, and the topic or sub-topic of the message board. The text content of the messages is not included in the scope of this study. The data files are processed using Python with the Python Data Analysis Library “pandas” and plotting library “matplotlib”. Temporal patterns are studied by comparing the relative share of messages (when filtering by time according to various criteria) to the overall average, to emphasise when and by how much certain categories are over-represented or under-represented. This also corrects for the natural increase and decrease in the number of messages in accordance with the usage of the forum over time. To study the circannual distributions of messages, the data are binned by the day and month. A rolling triangular window of 15 days is then applied in order to reduce the effect of noise in the data, and extract more interpretable patterns.
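
The two steps described above could be sketched roughly as follows. This is a minimal illustration and not the authors' actual code: the column names `timestamp` and `topic`, the function names, and the exact definition of the ratio (a topic's share of messages within a time bin divided by its share of all messages) are assumptions made for the example. The triangular window is written out with NumPy and wraps around the turn of the year, which is likewise an assumption, since the handling of the window at the edges is not specified in the text.

```python
import numpy as np
import pandas as pd


def relative_share(posts: pd.DataFrame, topic: str, unit: str) -> pd.Series:
    """Over- or under-representation of a topic per time bin.

    `posts` is assumed to have a datetime column "timestamp" and a string
    column "topic" (hypothetical names). `unit` is a pandas datetime
    attribute such as "hour" (circadian) or "dayofweek" (circaseptan).
    A value above 1 means the topic takes a larger share of the messages
    in that bin than it does overall; below 1 means a smaller share.
    """
    bins = getattr(posts["timestamp"].dt, unit)
    is_topic = posts["topic"].eq(topic)
    share_per_bin = is_topic.groupby(bins).mean()  # topic share within each bin
    return share_per_bin / is_topic.mean()         # relative to overall share


def circannual_profile(timestamps: pd.Series, window: int = 15) -> pd.Series:
    """Message counts binned by (month, day), smoothed with a triangular window.

    The smoothing wraps around the year boundary so that late December and
    early January inform each other (an assumption of this sketch).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it can be centred")
    month = timestamps.dt.month.rename("month")
    day = timestamps.dt.day.rename("day")
    counts = timestamps.groupby([month, day]).size()

    half = window // 2
    # Triangular weights, e.g. 1, 2, ..., 8, ..., 2, 1 for a 15-day window.
    weights = (half + 1) - np.abs(np.arange(-half, half + 1))
    weights = weights / weights.sum()

    padded = np.pad(counts.to_numpy(dtype=float), half, mode="wrap")
    smoothed = np.convolve(padded, weights, mode="valid")
    return pd.Series(smoothed, index=counts.index)


if __name__ == "__main__":
    # Tiny synthetic example; real input would be the parsed forum metadata.
    demo = pd.DataFrame({
        "timestamp": pd.to_datetime([
            "2017-01-01 23:10", "2017-01-02 23:40",
            "2017-01-03 06:05", "2017-01-04 06:30",
        ]),
        "topic": ["alkoholi", "alkoholi", "laihdutus", "laihdutus"],
    })
    print(relative_share(demo, "alkoholi", "hour"))
```

Dividing by the overall share rather than plotting raw counts is what keeps the result comparable across years in which total forum activity differed. In pandas, a similar smoothing is available through `rolling(..., win_type="triang")`, which requires SciPy and does not wrap around the year boundary.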

Results

The results presented in this study are few and limited to a small number of topics. A more thorough review of the temporal patterns and variations of health information behaviour in the Suomi24 discussion forum will be presented in another upcoming paper. For this paper, we chose to present three interesting patterns from the health category, on three different time-scales, namely the circadian, circaseptan and circannual (seasonal) levels. The examples in this study were chosen from the larger sub-topics within the health category, based on the interest and number of messages, as well as the recurring robust patterns presented in regard to temporal variations. Forum popularity based on the number of messages for the whole discussion forum as well as the health category between 2001 and 2017 can be seen in Figure 1. A clear peak in posted messages for the whole forum, as well as the health category, can be seen for the year 2006, after which the number of messages has been decreasing.

On a circadian level, analysis of the messages in the Suomi24 discussion forum shows that health-related discussions as a whole follow an hourly distribution similar to that of all other messages in the forum, with a clear unimodal curve (Figure 2).

However, when examining the distribution of messages to the various sub-topics within the health category, clear temporal variations on a circadian level can be identified. Especially topics related to depression (masennus), loneliness (yksinäisyys), weight loss (laihdutus), alcohol (alkoholi), drugs (huumeet) and birth control (ehkäisy) show that there are clear and recurring patterns and variations (Figure 3). Of these topics, weight loss shows a different peak time, starting around six o’clock in the morning and continuing during the forenoon, while depression, loneliness, alcohol and drugs, which are more related to mental health and substance abuse issues, show a higher message rate during night time. Messages in the birth control topic again show a more evenly distributed curve, with a clear trough in the early morning.

On a circaseptan time-scale, the discussion forum as a whole shows a clear and recurring pattern for message distribution (Figure 4). The health category follows this pattern, with higher activity in the beginning of the week, and a clear trough on Saturday.

Again, when examining topics within the health category, temporal variations in health information behaviour become visible on a circaseptan time-scale as well. As Figure 5 shows, the patterns for alcohol (alkoholi) and birth control (ehkäisy) are clearly opposed, with birth control showing increased behaviour during the beginning of the week while alcohol discussions are less active. This pattern is then turning during the middle of the week, and during the weekend, interest in alcohol is clearly peaking, while activity for birth control is at its lowest.

On a seasonal or circannual time-scale, a comparison between the health category and all other messages shows that health-related discussions have a slightly higher number between January and September (Figure 6). More generally, all discussions follow a bimodal distribution, with peaks in the beginning of the year and in late summer and during autumn. The peaks during both early year and late summer and early autumn are slightly higher for health-related messages. Clear troughs are visible during summer and the end of the year, around Christmas.

There are also clear differences in circannual rhythmicity within the health category. Health information behaviour in relation to loneliness (yksinäisyys), weight loss (laihdutus) and alcohol (alkoholi) shows clear and recurring patterns (Figure 7). These are more evident for loneliness and alcohol, where bimodal peaks are visible during the end of the year, around the Christmas holidays, and in summertime, in contrast to the overall patterns of the discussion forum. These topics also present clearer troughs in between the peaks. For weight loss, there is a peak in mid-January and the beginning of July.

Discussion

The findings presented in Figures 1–7 are the initial steps towards understanding recurring temporal rhythms in health information behaviour. The few presented examples of longitudinal temporal analysis of health-related discussions in the Suomi24 discussion forum clearly show that there are robust, recurring patterns for online health information behaviour on all different time-scales. This rhythmicity and these temporal patterns and variations are difficult to capture by surveys, interviews or questionnaires, and are therefore not easily comparable to previous findings on health information behaviour. Some of these temporal patterns, with increased behaviour at the beginning of the day, week, year or after summer vacation, could be seen as aspirational behaviour, and interpreted as a kind of “healthy new start” or “fresh start” (Dai et al., 2014; Gabarron et al., 2015). This is particularly evident on a circaseptan and circannual level for health-related discussions. This implies that these days (such as the beginning of the workweek or year) would be when people are most motivated to pursue their aspirations or most likely to think about their health (Gabarron et al., 2015). Especially for health information behaviour related to weight loss, there are clear implications for this effect, as also noted by Dai et al. (2014). With regard to topics related to mental health and substance abuse, such as depression, loneliness, drugs and alcohol, the Suomi24 data present peaks during night time, similar to previous findings, especially for depression related health information behaviour (Tana et al., 2018). On a circadian, circaseptan and circannual level, rhythmicity in health information behaviour for substance abuse, in this case alcohol, follows previously recognised patterns with peaks in the evening and night, as well as in the weekend and December (Mustonen et al., 2010). 
As for the examples above, on a circadian and circannual level, there seems to be a clear division between rhythmicity in physical and mental health interest. As the aim of this paper is to present the infodemiology approach and assess it critically, further analysis of the results falls outside the scope of this aim. However, a more thorough and in-depth analysis of health-related discussions will, as mentioned, be presented in an upcoming article by the same authors.

Limitations and critical assessment of the approach

As with all other methods, the infodemiology approach also presents certain limitations that need to be discussed. It needs to be noted that the strengths and limitations of the infodemiology approach presented in this paper are not based on a systematic review, but are instead identified and evaluated based on the literature cited in this paper. The first and foremost limitation of using infodemiology metrics in general to study health information behaviour is that this method only monitors the segment of the population that uses the internet for health information behaviour. However, as statistics show, internet adoption is globally reaching high penetration, and especially in developed countries, the majority of the population is using the internet as a means for health information behaviour (Zeraatkar and Ahmadi, 2018). Nonetheless, the populations using the internet are not representative of the whole population, and as noted earlier, not all Finns use the internet for health-related matters, so there are methodological biases present. This is especially evident for discussion forums, where statistics show that participation has been decreasing, and today, less than 8 per cent of the population contribute to discussion forums in Finland, compared to 24 per cent in 2014 (Tilastokeskus, 2017). This can lead to underrepresentation of some population segments in the online society, so that their interests and needs are not represented or reflected (Mamiya et al., 2017; Guy et al., 2011). Another related limitation is the lack of demographic data. This is common for infodemiology metrics, and as there are no demographic data available, we have no way of knowing exactly who has produced the digital traces aggregated and analysed (Eysenbach, 2011). This prevents researchers from drawing conclusions on different demographic characteristics in relation to health information behaviour.
This limitation is important to bear in mind, as socio-demographic variables such as gender and level of education have, as noted above, have an impact on health information behaviour (e.g. Ek and Niemelä, 2010; Ek et al., 2013; Marton and Wei Choo, 2012; Tilastokeskus, 2017, p. 36; Yan, 2010). For this specific study, where focus lies on temporal variations and patterns, these limitations are of a lesser concern.

Another key limitation arising from conducting quantitative analysis is the limitation of semantics (Eysenbach, 2011; Mavragani et al., 2018). Without analysing content, there is no way of knowing the underlying reasons for, in this case, posting something to the discussion forum. It is therefore unclear why an individual writes a message in a specific thread at a specific time, which leaves the purpose of the health information behaviour unclear. However, as earlier research on health-related discussions in Suomi24 has shown, it is reasonable to assume that people seek and share health-related information in the Suomi24 forum either because of an information need or in order to share similar personal experiences about the issue in question (Savolainen, 2011). Moreover, the purpose of this study is not to find out the whys, hows or whats of health information behaviour, but rather to bring focus on the question of when. Therefore, qualitative analysis of message content is left for further research.

For this particular study, a further limitation is that it only analyses one discussion forum. There are of course many other highly active discussion forums in Finland, some of which are specifically related to different health concerns and issues. In addition, some social media sites have become increasingly important in information and knowledge sharing for specific health-related issues. However, the Suomi24 discussion forum is still the largest and most popular discussion forum, dating back to a time before social media giants like Facebook and Twitter. It is also the only one available as open data, and is therefore suited to the infodemiology approach. In the future, it would be valuable to compare different discussion forums for a more comprehensive picture of the temporal variations of health information behaviour.

Other issues arising from the novel use of web data and sources are those of ethics, anonymity and privacy (McKee, 2013). Although automatic tracking and aggregation of published, public, open data sources on the internet do not normally raise privacy issues or even require ethics board approval, in some cases or digital venues users may expect that privacy issues be taken into account (Eysenbach, 2011). Anonymity on the internet can also be seen as a backbone for infodemiology research and metrics, as the internet provides users with protection against stigma and stereotyping when communicating (Mamiya et al., 2017). Some of these issues are more evident when conducting qualitative research, for instance when analysing user-created textual content in virtual communities or discussion groups related to health, and when personally identifiable information, such as usernames, geographic origin or internet protocol address, remains in the data set (Eysenbach, 2011). This limitation, as relevant as it is, is less important for the current study, as neither messages nor identifying factors are taken into account, and the data analysed are public (McKee, 2013).

Strengths and implications

Despite the limitations of utilising the infodemiology approach with its methods and metrics to study health information behaviour, the strengths and implications of the approach outweigh many of its weaknesses. Using large data sets such as the Suomi24 data allows for more longitudinal studies on temporal variations of online health information behaviour, in contrast to the “snapshots” provided by surveys or questionnaires. This allows identifying patterns of interaction, which, as Giddens (1979) states, cannot be identified or revealed unless examined over time. As an increasing amount of digital data is gathered in relation to health and health information behaviour, a clearer picture of how people interact and behave in relation to their health is emerging. Studying and analysing these data can yield a more holistic view of health and health behaviours, and even reveal completely new trends, patterns and correlations. Identifying clear and robust patterns and rhythms also facilitates prediction, making precautionary action possible within health promotion and prevention (Adam, 1990; Ayers et al., 2014). Discovering and understanding these health-related temporal rhythms for the Finnish population therefore has strong potential for behavioural gains as well as for improving public health. Health promotion campaigns could immediately be made more cost-effective by targeting the population at the right time. Taking into account the temporal patterns of health information behaviour on different time-scales, and timing interventions to coincide with periods when more individuals are contemplating their health and engaging in health-related behaviour, could have a substantial impact on interventions (Ayers et al., 2014).

Utilising infodemiology metrics also has another methodological strength, as it avoids interviewer bias, whereby what is said is influenced by the researcher (McKee, 2013). Data from the Suomi24 discussion forum involve no direct contact between the subjects and the researcher. Time stamps for messages are generated automatically, making data collection and analysis rapid, even real-time, with the right tools.
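To illustrate how automatically generated time stamps lend themselves to rapid analysis, the following is a minimal sketch of how message time stamps could be aggregated on circadian, circaseptan and circannual time-scales. It is not the procedure used in this study: the function name `temporal_shares`, the ISO-style time stamp format and the sample values are assumptions made purely for illustration.

```python
from collections import Counter
from datetime import datetime


def temporal_shares(timestamps):
    """Return the share of messages per hour, weekday and month.

    `timestamps` is assumed to be an iterable of ISO-formatted strings
    (hypothetical input format, e.g. "2017-01-02 23:15:00").
    """
    parsed = [datetime.fromisoformat(ts) for ts in timestamps]
    total = len(parsed)
    if total == 0:
        raise ValueError("no timestamps given")

    def shares(keys, size, offset=0):
        # Count messages per bin and normalise by the total volume,
        # so that categories of different sizes can be compared.
        counts = Counter(keys)
        return [counts.get(i + offset, 0) / total for i in range(size)]

    return {
        # Hours 0-23
        "circadian": shares((t.hour for t in parsed), 24),
        # Weekdays 0-6, where 0 is Monday
        "circaseptan": shares((t.weekday() for t in parsed), 7),
        # Months 1-12, stored at list indices 0-11
        "circannual": shares((t.month for t in parsed), 12, offset=1),
    }


# Invented sample time stamps, for illustration only
sample = [
    "2017-01-02 23:15:00",  # Monday
    "2017-01-02 23:40:00",  # Monday
    "2017-06-10 09:05:00",  # Saturday
    "2017-12-31 23:59:00",  # Sunday
]
result = temporal_shares(sample)
```

Normalising counts to shares, rather than comparing raw counts, is one way to compare a small category with the much larger volume of all other messages on the same scale.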

Further research

As previously noted, an in-depth analysis of the temporal variations of health-related discussions will be presented in an upcoming article. Apart from this, it would, as noted earlier, be fruitful to conduct content analysis on the messages, in order to see if and how message content differs across time-scales. It would also be useful to categorise the messages into, for instance, information needs and information sharing, to identify rhythmicity and temporal variations of these specific categories. For this kind of analysis, novel research methods have started to emerge, such as the use of natural language processing to discover themes or topics in text within large data sets (Dredze and Paul, 2014) or sentiment analysis to detect emotion or expressed peer-support within discussion forums (Biyani et al., 2014).

Conclusions

By utilising the infodemiology approach and metrics, it is possible to study time and temporal variations and patterns in health information behaviour. Adding the aspect of time to research on health information behaviour can potentially complement epidemiological data, as well as provide insights into health-related behaviours that fall outside the scope of traditional epidemiological data collection methods. This gives a more holistic picture of health and diseases, including data on preclinical events and their behavioural aspects (Eysenbach, 2009). Surveillance is a backbone of public health activities that provides insights into population health status to guide the development, management and evaluation of interventions (Gostin, 2006). Time and the timing of events are crucial aspects of this surveillance, as well as, more broadly, of epidemiological research (Rothman and Greenland, 2014). A broader understanding of the different temporal patterns and trends for health information behaviour, based on behavioural web data, can therefore aid in creating optimal interventions and health promotion campaigns at times when behaviours are active. This in turn can aid in early intervention, which can be beneficial for positive health outcomes (Lambert and Loiselle, 2007). The field of information science, and health information behaviour, has a key role in infodemiology, by bringing evidence for health-related issues that traditional research has not been able to capture. Focussing on time and studying when health information behaviour happens can thus have far-reaching consequences.

Figures

Figure 1

Distribution of the number of messages in the health (terveys) category compared to all other messages in the discussion forum between 1 January 2001 and 31 December 2017

Figure 2

Comparison of hourly distribution of messages in the health (terveys) category compared to all other messages in the discussion forum

Figure 3

Comparisons between message volumes for depression (masennus), loneliness (yksinäisyys), weight loss (laihdutus), alcohol (alkoholi), drugs (huumeet) and birth control (ehkäisy) on a circadian time-scale

Figure 4

Message distribution in the discussion forum and the health (terveys) category on a circaseptan level

Figure 5

Comparisons between message volumes for alcohol (alkoholi) and birth control (ehkäisy) on a circaseptan time-scale

Figure 6

Circannual comparison between all messages and messages in the health (terveys) category in the discussion forum

Figure 7

Comparisons between message volumes for loneliness (yksinäisyys), weight loss (laihdutus) and alcohol (alkoholi) on a circannual time-scale

References

Abedi, V., Mbaye, M., Tsivgoulis, G., Male, S., Goyal, N., Alexandrov, A.V. and Zand, R. (2015), “Internet-based information-seeking behavior for transient ischemic attack”, International Journal of Stroke, Vol. 10 No. 8, pp. 1212-1216.

Adam, B. (1990), Time and Social Theory, Polity Press, Cambridge.

Aller (2016), “Suomi24 media card”, available at: www.aller.fi/wp-content/uploads/2016/01/Suomi24-Tuotekortti-1_2016.pdf (accessed 4 January 2019).

Anker, A.E., Reinhart, M.R. and Feeley, T.H. (2011), “Health information seeking: a review of measures and methods”, Patient Education and Counseling, Vol. 82 No. 3, pp. 346-354.

Arendt, F. and Scherr, S. (2017), “Optimizing online suicide prevention: a search engine-based tailored approach”, Health Communication, Vol. 32 No. 11, pp. 1403-1408.

Ayers, J.W., Althouse, B.M., Allem, J.P., Rosenquist, J.N. and Ford, D.E. (2013), “Seasonality in seeking mental health information on Google”, American Journal of Preventive Medicine, Vol. 44 No. 5, pp. 520-525.

Ayers, J.W., Althouse, B.M., Johnson, M., Dredze, M. and Cohen, J.E. (2014), “What’s the healthiest day? Circaseptan (weekly) rhythms in healthy considerations”, American Journal of Preventive Medicine, Vol. 47 No. 1, pp. 73-76.

Basnet, S., Merikanto, I., Lahti, T., Männistö, S., Laatikainen, T., Vartiainen, E. and Partonen, T. (2016), “Seasonal variations in mood and behavior associate with common chronic diseases and symptoms in a population-based study”, Psychiatry Research, Vol. 238, pp. 181-188.

Basnet, S., Merikanto, I., Lahti, T., Männistö, S., Laatikainen, T., Vartiainen, E. and Partonen, T. (2018), “Seasonality, morningness-eveningness, and sleep in common non-communicable medical conditions and chronic diseases in a population”, Sleep Science, Vol. 11 No. 2, pp. 85-91.

Bender, J.B. and Wellbery, D.E. (1991), Chronotypes: The Construction of Time, Stanford University Press, Stanford.

Bidmon, S. and Terlutter, R. (2015), “Gender differences in searching for health information on the internet and the virtual patient-physician relationship in Germany: exploratory results on how men and women differ and why”, Journal of Medical Internet Research, Vol. 17 No. 6.

Biyani, P., Caragea, C., Mitra, P. and Yen, J. (2014), “Identifying emotional and informational support in online health communities”, Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 827-836.

Bragazzi, N.L., Alicino, C., Trucchi, C., Paganino, C., Barberis, I., Martini, M., Sticchi, L., Trinka, E., Brigo, F., Ansaldi, F. and Icardi, G. (2017), “Global reaction to the recent outbreaks of Zika virus: insights from a big data analysis”, PloS One, Vol. 12 No. 9.

Brigo, F. and Trinka, E. (2015), “Google search behavior for status epilepticus”, Epilepsy & Behavior, Vol. 49, pp. 146-149.

Carneiro, H.A. and Mylonakis, E. (2009), “Google trends: a web-based tool for real-time surveillance of disease outbreaks”, Clinical Infectious Diseases, Vol. 49 No. 10, pp. 1557-1564.

Chen, X., Sykora, M.D., Jackson, T.W. and Elayan, S. (2018), “What about mood swings: identifying depression on Twitter with temporal measures of emotions”, Companion Proceedings of The Web Conference 2018, pp. 1653-1660.

Chew, C. and Eysenbach, G. (2010), “Pandemics in the age of Twitter: content analysis of Tweets during the 2009 H1N1 outbreak”, PloS One, Vol. 5 No. 11.

Dai, H., Milkman, K.L. and Riis, J. (2014), “The fresh start effect: temporal landmarks motivate aspirational behavior”, Management Science, Vol. 60 No. 10, pp. 2563-2582.

Davies, E. and McKenzie, P.J. (2002), “Time is of the essence: social theory of time and its implications for LIS research”, FIMS Presentations Paper No. 26, Toronto, available at: https://ir.lib.uwo.ca/fimspres/26/ (accessed 25 March 2019).

Dredze, M. and Paul, M.J. (2014), “Natural language processing for health and social media”, IEEE Intelligent Systems, Vol. 29 No. 2, pp. 64-67.

Dutton, W., Blank, G. and Groselj, D. (2013), Cultures of the Internet: The Internet in Britain, Oxford Internet Survey 2013, Oxford Internet Institute, University of Oxford, Oxford, available at: http://oxis.oii.ox.ac.uk/wp-content/uploads/2014/11/OxIS-2013.pdf (accessed 4 January 2019).

Ek, S. (2013), “Gender differences in health information behaviour: a Finnish population-based survey”, Health Promotion International, Vol. 30 No. 3, pp. 736-745.

Ek, S. and Niemelä, R. (2010), “Onko internetistä tullut suomalaisten tärkein terveystiedonlähde? Deskriptiivistä tutkimustietoa vuosilta 2001 ja 2009”, Informaatiotutkimus, Vol. 29 No. 4, available at: https://journal.fi/inf/article/view/3856 (accessed 25 March 2019).

Ek, S., Eriksson-Backa, K. and Niemelä, R. (2013), “Use of and trust in health information on the internet: a nationwide eight-year follow-up survey”, Informatics for Health and Social Care, Vol. 38 No. 3, pp. 236-245.

Eysenbach, G. (2006), “Infodemiology: tracking flu-related searches on the web for syndromic surveillance”, AMIA 2006 Symposium Proceedings, pp. 244-248.

Eysenbach, G. (2009), “Infodemiology and infoveillance: framework for an emerging set of public health informatics methods to analyze search, communication and publication behavior on the Internet”, Journal of Medical Internet Research, Vol. 11 No. 1.

Eysenbach, G. (2011), “Infodemiology and infoveillance: tracking online health information and cyberbehavior for public health”, American Journal of Preventive Medicine, Vol. 40 No. 5, pp. S154-S158.

FIAM (2018), “Tulokset”, Finnish Internet Audience Measurement, available at: http://fiam.fi/tulokset/ (accessed 21 December 2018).

Flynn, K.E., Smith, M.A. and Freese, J. (2006), “When do older adults turn to the internet for health information? Findings from the Wisconsin longitudinal study”, Journal of General Internal Medicine, Vol. 21 No. 12, pp. 1295-1301.

Gabarron, E., Lau, A.Y. and Wynn, R. (2015), “Is there a weekly pattern for health searches on Wikipedia and is the pattern unique to health topics?”, Journal of Medical Internet Research, Vol. 17 No. 12.

Giddens, A. (1979), Central Problems in Social Theory – Action, Structure and Contradiction in Social Analysis, Macmillan, London.

Gostin, L. (2006), “Public health strategies for pandemic influenza: ethics and the law”, JAMA, Vol. 295 No. 14, pp. 1700-1704.

Guy, S., Ratzki-Leewing, A., Bahati, R. and Gwadry-Sridhar, F. (2011), “Social media: a systematic review to understand the evidence and application in infodemiology”, International Conference on Electronic Healthcare, Springer, Berlin and Heidelberg, pp. 1-8.

Hallyburton, A. and Evarts, L. (2014), “Gender and online health information seeking: a five survey meta-analysis”, Journal of Consumer Health on the Internet, Vol. 18 No. 2, pp. 128-142.

Jadhav, A., Andrews, D., Fiksdal, A., Kumbamu, A., McCormick, J.B., Misitano, A., Nelsen, L., Ryu, E., Sheth, A., Wu, S. and Pathak, J. (2014), “Comparative analysis of online health queries originating from personal computers and smart devices on a consumer health information portal”, Journal of Medical Internet Research, Vol. 16 No. 7.

Kim, S.U. and Syn, S.Y. (2014), “Research trends in teens’ health information behaviour: a review of the literature”, Health Information & Libraries Journal, Vol. 31 No. 1, pp. 4-19.

Kraut, R.Y., Snedeker, K.G., Babenko, O. and Honish, L. (2017), “Influence of school year on seasonality of norovirus outbreaks in developed countries”, Canadian Journal of Infectious Diseases and Medical Microbiology, Vol. 2017.

Lambert, S.D. and Loiselle, C.G. (2007), “Health information-seeking behavior”, Qualitative Health Research, Vol. 17 No. 8, pp. 1006-1019.

Lee, K., Hoti, K., Hughes, J.D. and Emmerton, L. (2014), “Dr Google and the consumer: a qualitative study exploring the navigational needs and online health information-seeking behaviors of consumers with chronic health conditions”, Journal of Medical Internet Research, Vol. 16 No. 12.

Luckmann, T. (1991), “The constitution of human life in time”, in Bender, J.B. and Wellbery, D.E. (Eds), Chronotypes: The Construction of Time, Stanford University Press, Stanford, pp. 151-167.

McKee, R. (2013), “Ethical issues in using social media for health and health care research”, Health Policy, Vol. 110 Nos 2/3, pp. 298-301.

Madden, K.M. (2017), “The seasonal periodicity of healthy contemplations about exercise and weight loss: ecological correlational study”, JMIR Public Health and Surveillance, Vol. 3 No. 4.

Mamiya, H., Shaban-Nejad, A. and Buckeridge, D.L. (2017), “Online public health intelligence: ethical considerations at the big data era”, in Shaban-Nejad, A., Brownstein, J. and Buckeridge, D.L. (Eds), Public Health Intelligence and the Internet, Springer, Cham, pp. 129-148.

Marton, C. and Wei Choo, C. (2012), “A review of theoretical models of health information seeking on the web”, Journal of Documentation, Vol. 68 No. 3, pp. 330-352.

Matsuda, S., Aoki, K., Tomizawa, S., Sone, M., Tanaka, R., Kuriki, H. and Takahashi, Y. (2017), “Analysis of patient narratives in disease blogs on the internet: an exploratory study of social pharmacovigilance”, JMIR Public Health and Surveillance, Vol. 3 No. 1.

Mavragani, A., Ochoa, G. and Tsagarakis, K.P. (2018), “Assessing the methods, tools and statistical approaches in Google trends research: systematic review”, Journal of Medical Internet Research, Vol. 20 No. 11.

Mogilner, C., Hershfield, H.E. and Aaker, J. (2018), “Rethinking time: implications for well-being”, Consumer Psychology Review, Vol. 1 No. 1, pp. 41-53.

Mustonen, H., Metso, L. and Mäkelä, P. (2010), “Milloin suomalaiset juovat”, in Mäkelä, P., Mustonen, H. and Tigerstedt, C. (Eds), Suomi juo - Suomalaisten alkoholinkäyttö ja sen muutokset 1968-2008, Terveyden ja hyvinvoinnin laitos, Helsinki, pp. 55-69.

Ormandy, P. (2011), “Defining information need in health-assimilating complex theories derived from information science”, Health Expectations, Vol. 14 No. 1, pp. 92-104.

Ortiz-Martínez, Y. and Jiménez-Arcia, L.F. (2017), “Yellow fever outbreaks and Twitter: rumors and misinformation”, American Journal of Infection Control, Vol. 45 No. 7, pp. 816-817.

Osuka, H., Hall, A.J., Wikswo, M.E., Baker, J.M. and Lopman, B.A. (2018), “Temporal relationship between healthcare-associated and nonhealthcare-associated norovirus outbreaks and Google trends data in the United States”, Infection Control & Hospital Epidemiology, Vol. 39 No. 3, pp. 355-358.

Paul, M.J. and Dredze, M. (2011), “You are what you Tweet: analyzing Twitter for public health”, Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, pp. 265-272.

Pendry, L.F. and Salvatore, J. (2015), “Individual and social benefits of online discussion forums”, Computers in Human Behavior, Vol. 50, pp. 211-220.

Pesälä, S., Virtanen, M.J., Sane, J., Mustonen, P., Kaila, M. and Helve, O. (2017), “Health information-seeking patterns of the general public and indications for disease surveillance: register-based study using Lyme disease”, JMIR Public Health and Surveillance, Vol. 3 No. 4.

Ramsey, I., Corsini, N., Peters, M.D. and Eckert, M. (2017), “A rapid review of consumer health information needs and preferences”, Patient Education and Counseling, Vol. 100 No. 9, pp. 1634-1642.

Reinberg, A.E., Dejardin, L., Smolensky, M.H. and Touitou, Y. (2017), “Seven-day human biological rhythms: an expedition in search of their origin, synchronization, functional advantage, adaptive value and clinical relevance”, Chronobiology International, Vol. 34 No. 2, pp. 162-191.

Renahy, E., Parizot, I. and Chauvin, P. (2010), “Determinants of the frequency of online health information seeking: results of a web-based survey conducted in France in 2007”, Informatics for Health and Social Care, Vol. 35 No. 1, pp. 25-39.

Roenneberg, T. (2012), Internal Time: Chronotypes, Social Jet Lag, and Why You’re So Tired, Harvard University Press, Cambridge.

Rothman, K.J. and Greenland, S. (2014), “Basic concepts”, in Ahrens, W. and Pigeot, I. (Eds), Handbook of Epidemiology, Springer, New York, NY, pp. 70-122.

Savolainen, R. (1993), “The sense-making theory: reviewing the interests of a user-centered approach to information seeking and use”, Information Processing & Management, Vol. 29 No. 1, pp. 13-28.

Savolainen, R. (2006), “Time as a context of information seeking”, Library & Information Science Research, Vol. 28 No. 1, pp. 110-127.

Savolainen, R. (2011), “Requesting and providing information in blogs and internet discussion forums”, Journal of Documentation, Vol. 67 No. 5, pp. 863-886.

Savolainen, R. (2018), “Information-seeking processes as temporal developments: comparison of stage-based and cyclic approaches”, Journal of the Association for Information Science and Technology, Vol. 69 No. 6, pp. 787-797.

Sbaffi, L. and Rowley, J. (2017), “Trust and credibility in web-based health information: a review and agenda for future research”, Journal of Medical Internet Research, Vol. 19 No. 6.

Seifter, A., Schwarzwalder, A., Geis, K. and Aucott, J. (2010), “The utility of ‘Google trends’ for epidemiological research: Lyme disease as an example”, Geospatial Health, Vol. 4 No. 2, pp. 135-137.

Seo, D.W. and Shin, S.Y. (2017), “Methods using social media and search queries to predict infectious disease outbreaks”, Healthcare Informatics Research, Vol. 23 No. 4, pp. 343-348.

Tana, J. (2018), “An infodemiological study using search engine query data to explore the temporal variations of depression in Finland”, Finnish Journal of eHealth and eWelfare, Vol. 10 No. 1, pp. 133-142.

Tana, J.C., Kettunen, J., Eirola, E. and Paakkonen, H. (2018), “Diurnal variations of depression-related health information seeking: case study in Finland using Google Trends data”, JMIR Mental Health, Vol. 5 No. 2.

Tilastokeskus (2017), “Väestön tieto- ja viestintätekniikan käyttö 2017”, available at: www.stat.fi/til/sutivi/2017/13/sutivi_2017_13_2017-11-22_fi.pdf (accessed 19 November 2018).

Tkachenko, N., Chotvijit, S., Gupta, N., Bradley, E., Gilks, C., Guo, W., Crosby, H., Shore, E., Thiarai, M., Procter, R. and Jarvis, S. (2017), “Google trends can improve surveillance of type 2 diabetes”, Scientific Reports, Vol. 7 No. 1.

Torrent-Sellens, J., Díaz-Chao, Á., Soler-Ramos, I. and Saigí-Rubió, F. (2016), “Modelling and predicting eHealth usage in Europe: a multidimensional approach from an online survey of 13,000 European union internet users”, Journal of Medical Internet Research, Vol. 18 No. 7.

Wong, P.W.C., Fu, K.W., Yau, R.S.P., Ma, H.H.M., Law, Y.W., Chang, S.S. and Yip, P.S.F. (2013), “Accessing suicide-related information on the internet: a retrospective observational study of search behavior”, Journal of Medical Internet Research, Vol. 15 No. 1.

Yan, Y.Y. (2010), “Online health information seeking behavior in Hong Kong: an exploratory study”, Journal of Medical Systems, Vol. 34 No. 2, pp. 147-153.

Ybarra, M. and Suman, M. (2006), “Reasons, assessments and actions taken: sex and age differences in uses of internet health information”, Health Education Research, Vol. 23 No. 3, pp. 512-521.

Zeraatkar, K. and Ahmadi, M. (2018), “Trends of infodemiology studies: a scoping review”, Health Information & Libraries Journal, Vol. 35 No. 2, pp. 91-120.

Zhao, Y. and Zhang, J. (2017), “Consumer health information seeking in social media: a literature review”, Health Information & Libraries Journal, Vol. 34 No. 4, pp. 268-283.

Corresponding author

Jonas Tana can be contacted at: jonas.tana@arcada.fi