The potential of Google Analytics for tracking the reading behavior in web books

Purpose – The purpose of this paper is to introduce Google Analytics as a format suitable for advanced tracking of reading behavior within web books, set the metrics for measuring the reading behavior of web books and describe the first results of a pilot study. This paper offers suggestions for further deployment of web books and web analytics in digital libraries and evaluating web books' performance.
Design/methodology/approach – To understand the reading behavior of web book users, researchers use quantitative research methods based on custom and advanced metrics at Google Analytics.
Findings – Google Analytics is a valuable tool for tracking access to individual books and tracking entire web book collections, mainly if researchers use the combination of unique custom and advanced metrics. A pilot study with 190 users uncovered significant results on reading behavior, for example, the strong preference for scrolling over navigation buttons.
Research limitations/implications – This pilot study is limited to measuring two web books and 190 users. This study demonstrated a workable setup of metrics for measuring reading behavior; it would be helpful to continue measurement with a larger sample of books and users.
Originality/value – Researchers in library and information science currently use web analytics mainly to understand user behavior on the website and in the catalog. This paper presents the possibilities of deploying Google Analytics directly in web books to understand reading behavior.


Introduction
Digital literary reading (Mangen et al., 2019) has become almost an everyday research practice. Reading behavior as an embodied human-technology interaction (Mangen and Weel, 2016) has become the subject of much research that focuses on different aspects of digital books and digital reading. The body of research is both quantitative and qualitative and follows a wide range of research questions and variables. However, reading and reading behavior are often studied indirectly, through observation, cognitive walkthrough, instrumentation (e.g. eye trackers) or retrospective questioning. Direct data collection methods that collect real-time data on actual reader behavior are still relatively uncommon in scientific research.
Major e-reader manufacturers and ebook publishers have been tracking users of reading applications for the past 15 years. Unfortunately, those who continually gather readers' data and possess the largest data collections have been silent about their methodologies and methods, which is not surprising given that those organizations are commercial entities. Hidalgo and Malagón (2014) described the concept of "book-as-a-service" and stressed its potential for text mining and measuring how users of web books behave. Since then, very little information on actual use cases has been publicly available, and a deep understanding of ebook readers' behavior is still missing. Public libraries could benefit from such knowledge, as they increasingly act as intermediaries or publishers of literature. Web books offer libraries expanded opportunities to experiment with book design and format (Wilson et al., 2003; Landoni et al., 2001) and to better understand reading behavior.
So far, measuring reading behavior in web books has received minimal attention. This paper examines the implementation of the Google Analytics tracking tool in web books: the categories of measured data, actions, values and labels and the description of the measurement process relevant to libraries. Google Analytics is a standard website usage tracking tool used daily in libraries (Nelson, 2016). Using Google Analytics is relatively easy and incurs no additional financial costs for the library, which is why this solution is a popular option for online library analytics (Barba et al., 2013). However, librarians use Google Analytics mainly to evaluate library websites (Fang, 2007; Khoo et al., 2008; Arendt and Wagner, 2010; Vecchione et al., 2016) or catalogs (Fu et al., 2021).
Several initiatives that track ebook readers publish their methods and results and provide at least a basic insight into this field (Rowberry, 2019; Lynch, 2017; Fagan, 2014). Measuring and evaluating ebook use in libraries has usually been based on simple metrics: the usage statistics obtained from a provider's platform (Sprague and Hunter, 2008), the number of times a web page with a book is viewed or the number of downloads of an ebook (Yuan and Jurczyk, 2019). This data can potentially be helpful for designers of reading apps, electronic bookshops, publishers, authors and libraries.
The objectives of this study are:
To introduce the next-book format and Google Analytics as a format suitable for advanced tracking of reading behavior.
To set the metrics for measuring the reading behavior of web books.
To understand the first results of a pilot study.
To offer suggestions for further deployment of Google Analytics as a measurement tool for online reading analytics in libraries.
The context: the deployment of web books in the Municipal Library of Prague
In 2020, the Municipal Library of Prague, which maintains an extensive collection of electronic books (Prokop and Stejskal, 2021), began experimenting with a new technology of web books. Web books are accessible via the World Wide Web and exist in several forms (Wilson, 2000). They may be borrowable, free or available for purchase. There are many examples of web book initiatives, especially for books in the public domain (e.g. Project Gutenberg, Bartleby.com, Internet Public Library).

Google Analytics for tracking
The Municipal Library of Prague started to use free web books as an alternative to PDF, EPUB, PRC and other traditional formats. Next-book is an open platform for publishing and reading on the web. The basic design principle of the next-book format is optimization for focused reading of books, which allows full use of the digital platform on both an individual and a social level. These functions influence both the interface design and the tools for publishers. The application is based on accepted web standards, including current standards developed by the W3C (WPUB manifest), and uses advanced web tools (React, Hugo, Node.js). From a reader's perspective, the next-book format is designed for use on any device that allows web browsing (see Figure 1); the interface is adapted to reading (scrolling by page, automatic switching between chapters) and supports text navigation (chapter and verse chunking, which replaces page numbers for easily finding a place in the book, and display of the current position in the text and chapter). It also allows creative work with books, such as writing footnotes, annotations (highlighting notes), bookmarks and font size adjustment. Once a book is first opened, the entire book becomes available without an internet connection, as it is automatically stored on the user's device. For this article, electronic books published on this platform are referred to as web books.
The deployment of web books in the electronic library of the Municipal Library of Prague allowed the research team to measure more types of actions and behavior than is usual for electronic library collections.

Methodology
The adoption of the new web book format allows the library to collect new data on ebooks and the reading behavior of their users. The team decided to implement a tracking tool to probe technical possibilities and gather data that can be used in further development and optimization of the next-book platform and in measuring web books used in the Municipal Library of Prague. We have identified two primary use cases for the data on web books provided by the library: (1) tracking users' behavior to gather data on the next-book platform as such, that is, to obtain valuable information for further development and optimization of the next-book interface and functions regardless of a concrete book; and (2) tracking users to gather data on individual titles to learn about the differences between them, that is, to obtain information on concrete books, their use and actual readers' attitudes to them.

Web analytics
Web analytics is a quantitative method that allows researchers to collect behavioral data using an application that logs user behavior on the website and other associated measures (Jansen, 2009). In our case, the website equals the web book.

Metrics for readers' tracking with Google Analytics
Google Analytics works by including a block of JavaScript code on pages in a website. It provides several predefined dimensions and metrics and allows tracking of additional custom-defined events for a specific website. As next-book itself is an HTML/JavaScript-based platform, we chose Google Analytics for its ease of implementation and the good results it provides in web analytics (Cutroni, 2010; Clifton, 2012). Methods of working with Google Analytics and correct interpretation of data gathered from websites are widely known; extensive sources are available in this regard. The context in this particular case is different. Although technically a website (or a series of consecutive HTML files), the next-book format differs from a regular website in anticipated and observed user behavior. Readers know that they are going to open an ebook and, given they find the book interesting, they are going to interact with it in a way that differs from a typical interaction with a website; ebooks often allow the digital equivalents of the act of traditional reading (Spence, 2020). At least a certain number of users read the ebook regularly and read it from start to finish, that is, visit all or nearly all pages of the website one after the other. However, this might differ depending on the genre. Google Analytics is available in several versions; in this case, we use Universal Analytics. Each next-book title with Google Analytics code implemented was published and made publicly available on GitHub. Specific web books were tracked as different Views in Google Analytics within a single Google Analytics Property, that is, each View equals a web book unit. On the first page of a web book (an envelope), readers are presented with information about the ongoing tracking. Only those who provided consent were tracked. Data on those who refused were not collected.
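As a rough illustration of the consent rule described above, the following Python sketch builds the parameters of a single Universal Analytics pageview hit in the Measurement Protocol parameter format and produces no hit when the reader has not consented. The tracking ID `UA-XXXXXXX-1`, the client ID, the page path and the helper name `build_pageview_hit` are all placeholders or hypothetical names introduced for this example; the next-book code published on GitHub runs in the browser and may be organized quite differently.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder property ID; a real Universal Analytics ID has the form UA-XXXXXXX-Y.
TRACKING_ID = "UA-XXXXXXX-1"


def build_pageview_hit(consented: bool, client_id: str,
                       page_path: str, title: str) -> Optional[str]:
    """Return a URL-encoded pageview hit, or None when consent is missing."""
    if not consented:
        # Readers who refused tracking generate no hit at all.
        return None
    params = {
        "v": "1",            # Measurement Protocol version
        "tid": TRACKING_ID,  # property the hit belongs to
        "cid": client_id,    # anonymous client ID (no User-ID is sent)
        "t": "pageview",     # hit type
        "dp": page_path,     # document path, here a page of the web book
        "dt": title,         # document title
    }
    return urlencode(params)


# Hypothetical reader who consented on the envelope page.
hit = build_pageview_hit(True, "555", "/chapter-1/", "Web book 1")
print(hit)
```

Keeping the consent check inside the single function that creates hits means that no other part of the code can send data for a reader who refused.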

Setting up the metrics
As long as the library only offered ebooks in PDF, EPUB and similar formats, it could only track basic metrics such as the number of downloads of a book or the number of times a PDF file was opened in a browser. With the introduction of the web book format, the library gained the ability to track other metrics and get answers to new questions, such as: How many users opened a web book? How long did the reader spend with the ebook? How many pages did they read? How many users left the book without starting to read it? Do they move through the text using the keyboard or finger movements on the screen, or do they prefer buttons? Do they create annotations and notes in the ebook? What color settings do they prefer when reading? Do they flip through the book? How do they move back and forth in the book?

To obtain data on the interaction with the next-book format and its features, new custom events were set up and tracked in Google Analytics. Of the dimensions and metrics that Google Analytics tracks by default, the ones listed in Table 1 are useful in developing and optimizing the web book interface.
In connection with the library's strategic goals in access to electronic books, we also defined a structured list of custom events for readers' tracking with Google Analytics, each consisting of a Category (interaction, UI, Offline), an Action taken by the user and/or a Value/Label. We used these metrics to capture and measure reading behavior and the use of web book features (scrolling, using the keyboard to move between pages, swiping, using buttons in the next-book interface to turn a page, creating an annotation, returning to the last opened position or reading from a new position, reading in offline mode) and modifications to the user interface (changing the color scheme, font size, opening a menu). Table 2 offers a complete list of custom events with a detailed description.
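The Category/Action/Label/Value structure can be sketched as a small data type. The Python example below is only an illustration: the three category names come from the text, whereas the action and label strings (`scroll`, `color-scheme`, `dark`) and the class name `ReadingEvent` are hypothetical stand-ins, since the authoritative list is the one in Table 2. The parameter names `ec`, `ea`, `el` and `ev` are the event fields of the Universal Analytics Measurement Protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Categories named in the text; the real actions and labels are defined in Table 2.
CATEGORIES = {"interaction", "UI", "Offline"}


@dataclass(frozen=True)
class ReadingEvent:
    category: str
    action: str
    label: Optional[str] = None
    value: Optional[int] = None

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.value is not None and self.value < 0:
            # Universal Analytics event values must be non-negative integers.
            raise ValueError("event value must be non-negative")

    def to_params(self) -> Dict[str, str]:
        """Map the event to Measurement Protocol event parameters."""
        params = {"t": "event", "ec": self.category, "ea": self.action}
        if self.label is not None:
            params["el"] = self.label
        if self.value is not None:
            params["ev"] = str(self.value)
        return params


# Hypothetical examples of a reading interaction and a UI change.
scroll = ReadingEvent("interaction", "scroll")
theme = ReadingEvent("UI", "color-scheme", label="dark")
print(scroll.to_params())
print(theme.to_params())
```

Validating the category at construction time keeps misspelled categories out of the reports, where they would otherwise appear as separate rows.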

Adoption and validation of the Google Analytics-based reading metrics for web books
The pilot testing of Google Analytics-based reading metrics for web books took place from July 26 to November 30, 2021, and the data was measured on two books: Karel Capek: The Absolute at Large (Web book 1) and Karel Capek: The White Disease (Web book 2). IP addresses of next-book team members were excluded from tracking so that Google Analytics only gathered data on external readers, not on people involved in the project.
It is important to note that this stage of testing aimed to probe the possibilities that Google Analytics offers and answer the question of whether the tool is suitable for measuring web books. The overall activity aimed to find dimensions, metrics and events tracked in Google Analytics to provide a basis for future exploration once a sufficient number of ebooks in next-book format becomes available. However, in the Results section of this paper, real data, albeit limited, are presented as proof of the suitability of Google Analytics for tracking reading behavior.

Privacy and security issues of measuring
The legalities and ethics of using Google Analytics for library services have been the subject of many studies (O'Brien et al., 2018; Hwang and Hanson, 2021). Google Analytics is a cookie-based analytical tool, that is, it stores small files called cookies on a user's device. In some countries, including those of the European Union, a website owner is obliged to inform users about cookies and let them decide if they consent to tracking. Google has been criticized for collecting data about internet users and aggressive tracking; however, Google Analytics does not store any personally identifiable information, for example, email, name, company and information of similar nature. It is possible to enable the User-ID feature that associates engagement data from different devices and multiple sessions and tracks a concrete user for a longer time period. Because we did not aim to work with this type of data, we left the User-ID feature disabled. Ebook users were informed about data collection; only those who consented were tracked. Users could also learn details on tracking on the Privacy Policy page. Thus, we ensured that tracking was legal and in accordance with Google policy.

Results
The results presented here serve as proof that the data gathering method featured in this paper is a functional way of collecting readers' data and that it can lead to informed decision-making. One hundred ninety users visited the two books during the testing period and consented to measurement. Less than 12% of users saw only one page and left without any interaction; others proceeded into the book. The average number of pages viewed during a session was 2.55 and 4.01, respectively; the average length of a session was 1 h 7 min and 10 min. As is apparent, there were substantial differences in metrics between the two books. The numbers of users, sessions and pageviews can help understand the popularity of a given title; here, Web book 1 drew more attention (1 h 7 min) than Web book 2 (10 min). Also, the former's lower bounce rate is in line with these findings. The average session duration in Web book 1 is significantly longer than in Web book 2, indicating that some (unknown) readers of the former put the book aside only after long periods of reading (see Table 3). Both titles were read on computers and mobiles and only exceptionally on tablets (see Table 4). Most visitors used computers to access the books (55.79%), 40.53% used mobile phones and only 3.68% used tablets. If data gathered from more books prove this to be the case, it can lead the next-book designers to focus more on computer and mobile users. The most common browser was Chrome (47.37%), followed by Android WebView (17.89%) and Safari (13.68%).
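For readers unfamiliar with how such session metrics are derived, the Python sketch below computes bounce rate, pages per session and average session duration from a handful of invented sessions. The data and the `Session` type are hypothetical and unrelated to the pilot results; the definitions follow the usual Universal Analytics conventions in simplified form (a bounce is taken to be a single-page session without any interaction event).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    pageviews: int          # pages viewed in the session
    interactions: int       # interaction events (scroll, swipe, ...)
    duration_seconds: int   # time between the first and the last hit


def bounce_rate(sessions: List[Session]) -> float:
    """Share of sessions with one pageview and no interaction event, in %."""
    bounces = sum(1 for s in sessions
                  if s.pageviews == 1 and s.interactions == 0)
    return 100 * bounces / len(sessions)


def pages_per_session(sessions: List[Session]) -> float:
    """Average number of pageviews per session."""
    return sum(s.pageviews for s in sessions) / len(sessions)


def average_duration(sessions: List[Session]) -> float:
    """Average session duration in seconds."""
    return sum(s.duration_seconds for s in sessions) / len(sessions)


# Invented sample: one bounce and three reading sessions.
sample = [
    Session(1, 0, 0),
    Session(3, 12, 900),
    Session(2, 5, 300),
    Session(6, 40, 3600),
]
print(bounce_rate(sample), pages_per_session(sample), average_duration(sample))
```

In this invented sample a single long session raises the average duration to 20 minutes, which is why averages from a small number of sessions should be read with caution.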

Interactions and user interface
Table 5 provides helpful insights into readers' adoption of the next-book platform and readers' behavior in both books. The higher popularity of one book may have been due to the order in which it was offered to readers. Regardless of popularity, however, the books show similar results when observing reader behavior. Scrolling was the most common way of moving through the book (91.67% overall), followed by swiping (33.33%). Keyboard navigation was used minimally (5.56%). An annotation was created by 13.33% of readers, whereas only one user (0.56%) created a note. Of the user interface features, the most used were the font resizing and menu opening functions. During the testing period, nobody used navigational buttons to navigate the books. If this observation is confirmed repeatedly and with larger amounts of data, it would imply that these interaction elements are redundant and could be removed.
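Shares of this kind appear to count readers rather than events, so a reader who scrolls a hundred times still counts once, and the shares of different actions can add up to more than 100%. A minimal Python sketch of that aggregation follows; the event log, the action names and the function `share_of_users` are invented for illustration (the actual event names are listed in Table 2).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def share_of_users(events: Iterable[Tuple[str, str]],
                   total_users: int) -> Dict[str, float]:
    """Percentage of users who triggered each action at least once."""
    users_by_action: Dict[str, Set[str]] = defaultdict(set)
    for client_id, action in events:
        # A set keeps each reader only once per action.
        users_by_action[action].add(client_id)
    return {action: round(100 * len(users) / total_users, 2)
            for action, users in users_by_action.items()}


# Invented log of (client ID, action) pairs for four readers; reader "d"
# triggers no events at all but still counts in the denominator.
log = [
    ("a", "scroll"), ("a", "scroll"), ("a", "swipe"),
    ("b", "scroll"),
    ("c", "scroll"), ("c", "annotation"),
]
shares = share_of_users(log, total_users=4)
print(shares)
```

An action that never occurs, such as a hypothetical `button` event, is simply absent from the result, which corresponds to a share of 0%.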

Discussion
A pilot deployment of the method confirmed that Google Analytics-based reading metrics for web books are suitable for measuring web book users' reading behavior. The method can answer research questions related to reading itself (e.g. how many readers finish a book, how many readers read the first chapter, how many readers close the web book on the first page or how long it takes on average to read a chapter). With the use of custom metrics, researchers can answer many questions related to user interface behavior (font size adjustment, screen color, use of annotation features). In the future, libraries will be able to make data-driven decisions about, for example, what web books to include in their digital collections, how to provide them and what added functionality (such as annotation or social reading features) readers need. They can thus enhance the social aspects of digital reading (Pianzola, 2021). Until now, the library lacked this data because the books available in PDF, EPUB or PRC formats do not allow gathering data about user behavior in real time.
Based on our experience with measuring reading behavior in web books, we formulated the following recommendations:
Custom metrics, such as number of users, number of sessions, number of pageviews, bounce rate, pages/session, average session duration, device category and type of browser, should be the basis for tracking basic patterns in user behavior when acquiring and reading ebooks.
Advanced web book formats such as next-book also allow tracking other features and greater variability in reading behavior (scrolling, using the keyboard to move between pages, swiping, using buttons in the next-book interface to turn a page, creating an annotation, returning to the last opened position or reading from a new position, reading in offline mode) as well as modifications to the user interface (changing the color scheme, font size, opening a menu).
Google Analytics can be used to track access to individual books and to track entire ebook collections. However, the privacy of library users should remain assured (e.g. through a disabled User-ID feature).
Based on the data from Google Analytics, librarians can make decisions regarding the design and development of web books.

To explain and deeply understand analytics results, we recommend combining Google Analytics with other methods, predominantly qualitative and mixed methods. In case of questionable results, methods such as user testing, A/B testing or eye-tracking could help explain the causes of unusual behavior in the web book.
Results from user testing will be presented in follow-up papers.
Currently, in cooperation with the Municipal Library of Prague, the authors of the method are planning a quick deployment of the new web book format in the catalog, so we expect a larger amount of data and the possibility of subsequently publishing the results of measuring reading behavior.
Figure 1. View of the next-book format on a small mobile device