Emerald Group Publishing Limited
Copyright © 2008, Emerald Group Publishing Limited
Article Type: Professional literature. From: Library Hi Tech News, Volume 25, Issue 5.
Viewing and reading behaviour in a virtual environment: the full-text download and what can be read into it
David Nicholas, Paul Huntington, Hamid R. Jamali, Ian Rowlands, Tom Dobrowolski, and Carol Tenopir, in Aslib Proceedings, v. 60 (2008) issue 3, pp. 185-98
Having devoted considerable proportions of their budgets to electronic journal subscriptions, librarians have been keen to evaluate and confirm the effectiveness of their decisions. A favoured and obvious way of going about this is to analyse the transactional data provided by their digital library systems, the number of full-text downloads being taken as evidence not only of use but also of user satisfaction. But is this really the case? What does a statistic showing the download of the full text of a given article actually show? Crucially, does it demonstrate the occurrence of “reading” as defined in, and agreed since, the 1970s' studies by Tenopir et al., namely “going beyond the table of contents, title and abstract to the body of the article”?
To begin with, some recent studies recognise that downloads, unlike citations, are open to self-interested abuse, whether by authors, publishers or, indeed, librarians. Another factor may be the publisher's interface and whether, depending on the route followed, it interposes an abstract or takes the user directly to the full text of the article, as with a gateway like PubMed. The possibility of double counting also needs to be borne in mind, as when a user first checks the relevance of an article online by skimming through it in HTML format and then opens it again as a PDF file for printing now and reading later. Learning from the findings of earlier studies, the researchers opted for deep log analysis (DLA). This technique can provide more sophisticated data, for example on the number of pages viewed and the time for which the full-text file was open on screen before being replaced by other information.
In this particular study, a DLA was conducted on e-journal usage at four universities which are members of OhioLINK, and the different page views were classified as shown in Table I.
In fact, if one eliminated the students from the results, the figures showed that, on average, faculty members spent only 77 seconds viewing each article. With two-thirds of full-text downloads lasting less than three minutes, it is clear that articles are not being read right through online. Instead the data point strongly in the direction of “bouncing”, a phenomenon whereby users tend to view only a few of the total pages available, generally plucking information in a more horizontal way rather than reading in depth. In other words, the full-text download is often used merely as an alternative to the abstract, giving quick confirmation of the relevance, or otherwise, of a listed article to the user's own needs.
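Deep log analysis of this kind essentially reduces to classifying logged view events by duration. The sketch below is purely illustrative: the class, the three-minute threshold taken from the figures above, and the labels are this column's assumptions, not the researchers' actual method.

```python
# Illustrative only: classifying full-text view durations in the spirit
# of deep log analysis (DLA). Threshold and labels are hypothetical.
from dataclasses import dataclass

@dataclass
class FullTextView:
    user_type: str       # e.g. "faculty" or "student"
    seconds_open: int    # time the full text stayed on screen

def classify(view: FullTextView) -> str:
    """Label a logged view by how the article was probably used."""
    if view.seconds_open < 180:      # under three minutes on screen
        return "checking relevance"  # the download acts as an extended abstract
    return "possible reading"        # long enough to go beyond the abstract

# A faculty member's 77-second view versus a longer student session
views = [FullTextView("faculty", 77), FullTextView("student", 600)]
labels = [classify(v) for v in views]
```

On such a scheme, two-thirds of the logged downloads would fall into the first category, which is the basis of the “bouncing” interpretation above.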
To corroborate these findings, the researchers used a questionnaire, in which they asked users about their article reading habits, for example: how they identified articles; whether they read online or in print; and the average time taken to read an article. This exercise showed that online searching generates the vast majority of references for further reading (faculty 63 per cent; students 73 per cent). Students were far more likely to read online than faculty, perhaps because they are more at ease with reading on screen, or else because they were reluctant to pay for paper prints. Triangulating the findings of the questionnaire with those of the DLA suggested that only around 12 per cent of full-text views could be assumed to have resulted in online reading as defined above (i.e. going beyond the title, table of contents and abstract). The researchers concluded that “It is thus impossible to isolate reading from navigating, people are reading as part of searching, not searching for reading”.
Contextualisation of web searching: a grounded theory approach
Yazdan Mansourian, in The Electronic Library, v. 26 (2008), issue 2, pp. 202-14
Although the web is used for a great variety of purposes, of all web-based applications, information searching and retrieval appear to be the most common. In general, however, search engines still tend to treat queries in isolation, returning identical results for the same query regardless of the particular user or of the context in which the user made the request. Since the issue of context is likely to take on greater importance, this research seeks to address two key questions in relation to web searching:
What factors might influence different conceptualisations of success/failure?
What components form the context of a search, and how does each component play a role in the final outcome of the search session?
These questions were addressed through a qualitative analysis of open-ended, face-to-face interviews with 37 academics, research fellows and PhD students in life sciences at a university in the north of England.
The study categorised five key contextual elements in web searching: web user's characteristics; type of search tool used; search topic; search situation; and features in the retrieved information resources (see Figure 1). The validity of these categories is supported by the findings of previous studies, though none reports all five.
Figure 1. Web search context
Each category could in turn be subdivided. The first, the web user's characteristics, comprises the user's feelings (e.g. irritation; inadequacy; perplexity), their thoughts and attitudes (e.g. success depends on choice of keywords; searching is like playing chess), and their actions (e.g. changing database; asking colleagues; checking spelling; reverting to print sources).
The research suggests that such contextual elements as the importance of the search or the time allocated to it have a major influence on the user's judgment about its success or failure. From the librarian's perspective, the significance of this research is the framework it provides for evaluating the impact of contextual factors on the effectiveness of information literacy training, enabling the development of modules which encourage users to employ different strategies optimised for various search situations, whilst raising awareness of the likely outcomes of each.
The digital library in 100 years: damage control
Michael Seadle, in Library Hi Tech, v. 26 (2008) issue 1, pp. 5-10
Today's choices will affect libraries in 100 years' time. Digital library systems bought, like shelving, on price and performance from commercial vendors are unlikely to stand the test of time in the absence of stable international standards on a par with MARC, the ISBN and the ISSN. In Europe librarians still tend to look to national solutions and central funding to develop digital libraries, but they may eventually have to accept the American approach of funding only initial costs.
Librarians have an interest in the distant-future readability of today's many digital documents, broadly understood to include seemingly vacuous computer games and multimedia works. The rapid changes in ICT over the last 50 years undermine confidence in our ability to migrate or emulate programs and data from obsolete hardware or operating systems. Yet the survival of ASCII from 1963 as a subset within Unicode proves that not everything has changed. Likewise the simple mail transfer protocol dating from 1971 still exists, and the basic internet protocol of the late 1970s still manages contemporary networks. What is required for the mass migration of documents to work are conversion programs, accepting that some of the distinguishing features of the original format may be lost.
As for the risks of archiving, systems are being built for long-term storage, but these systems are themselves as susceptible to obsolescence as the digital documents they contain. Proprietary systems are unlikely to receive sustained support once their profitable life span comes to an end, which leaves either: (a) self-financing systems, which distribute the costs between libraries and publishers; (b) government-funded systems, though even currently stable governments cannot guarantee the survival of their country's wealth over 100 years; or finally (c) open systems, which tend to be less expensive and avoid many intellectual property issues, but equally cannot guarantee ongoing maintenance and support when a user base dwindles.
When an archiving system itself ceases to function, there are a number of options, the most attractive of which entails interoperability. This implies that, as with OPACs, the data are coded and exported in a standard format (e.g. MARC21) for shifting between systems with more or fewer functions, as budgets permit. Another, less wieldy method, within an Open Archival Information System (OAIS) environment, is to extract the digital objects, metadata, etc. by means of a dissemination information package and then, possibly with considerable human intervention and restructuring, re-ingest them into a second OAIS using a submission information package. A third method entails loading documents into a variety of systems so that, if one installation fails, the others survive. This, however, raises questions of authenticity. It is possible to imagine two further options, but these appear the most complicated, and therefore least attractive, of all: first, to mine dead archive systems for artifacts and metadata with a view to reconstruction from segments potentially scattered right across the original storage device; and second, to create an archiving system for storing obsolete archiving systems, which might permit the archived systems to run under some form of emulation.
In conclusion, guessing which information systems will survive is risky, especially when recent history shows apparently technologically and financially robust systems collapsing while weaker ones persist. Librarians, however, must think long-term, with 100 years being the shortest period worth considering. After all, as information technologies go, printed books have been around in Europe for 500 years. Why should the lifespan of digital publications be any shorter? While a handful of academic institutions can boast a continuous heritage of over 500 years, companies, even governments, spring up and fade away. Many struggle to survive more than a few decades, let alone a single century; and the same goes for their products and laws. The safest route for librarians therefore is not to tie the survival of intellectually valuable documents to a single system or a single means of financial support.
John Maxymuk, in The Bottom Line: Managing Library Finances, v. 20 (2007) issue 4, pp. 153-6
Two years after an earlier article on http://books.google.com, the author considers where it and other book digitisation projects stand with regard to the 30 million monographs held in libraries worldwide. With an estimated 10 per cent of all titles in print, 15 per cent in the public domain (i.e. out of both print and copyright), and 75 per cent in a murky limbo, the state of play appears as shown in Table II.
In conclusion, the author questions the idealism of Kevin Kelly's seminal article “Scan this book” (2006), which asserted that all digitised knowledge was becoming connected for the human community. Librarians should be pleased about the new focus on the content of so many long-forgotten texts, while at the same time noting that a significant proportion of what is now searchable is “of little value”.
OPAC integration in the era of mass digitisation: the MBooks experience
Christina Kelleher Powell, in Library Hi Tech, v. 26 (2008) issue 1, pp. 24-32
Cataloguers at the University of Michigan first began including electronic texts in their OPAC in 1995. Initially this entailed copying records for printed versions of the works, making the necessary amendments, and then adding them to the catalogue in the same way as one would a record for a more recent edition of a printed book. This practice lasted the best part of ten years. However, the University of Michigan's experience in this field goes beyond merely adding individual e-books to its collections; it is built on two major projects. The first was Making of America (MoA – http://moa.umdl.umich.edu), which focused on materials published between 1850 and 1877, and was funded by the Mellon Foundation. The second is its participation in the Google Books initiative. Along with other products, the output of both projects is added to the University's own digital store, called “MBooks”.
From the outset, it was clear that past practice would not cope with the scale of the Google project. Whereas MoA began with subject specialists individually selecting titles for digitisation and then choosing the best of any multiple copies, the approach adopted by Google Books is to scan every available copy. Furthermore, the target output in volumes per day was expected to equal what had earlier been achieved in a whole year. So, where previously a combination of letters from the author and title, or the item's own record control number, would have served as a unique identifier, scanning each printed item's barcode number, with a distinguishing extension added, became the most sensible option. The circulation module in the library system, meanwhile, became the principal means of managing the mass internal migration of items through the digitisation process.
Once scanned, the stored digital versions had to be linked in some way to the record displayed in the OPAC. The orthodox route would have been to add a MARC21 856 field with a URL for each e-copy to the bibliographic record, with the attendant problem of what to do should the URL change for any reason. Instead, a decision was taken to exploit the second call number field for each item in the holdings record, using it to capture a persistent link to the object in the MBooks digital store. An expand routine in the library system, Aleph, then largely automates the process of adding the required additional MARC21 fields, including the 533 reproduction note field and a “virtual 856 field”.
Besides the circulation of physical items for scanning and the linking of digital versions to the catalogue, one further major issue has been copyright. Here, building on the experience of the earlier MoA project, which moved from the individual selection of titles to use of the fixed-length data elements chiefly in the 008 field of the MARC records, the Digital Library Production Service has again turned to data encoded in that field, using algorithms to check whether or not a work is likely to be in copyright based on its date, country and type of publication, taking into account, for example, whether it was published before 1920 or by a US government body. Because the OPAC is not programmed to store encoded copyright information, a separate database has been generated, with a record added summarising the rights for each volume. These data are then linked back to the holdings records in the OPAC, where they are used to determine the extent to which users may access the digital versions, whether “full text” or “search only”. Mass identification of copyright status in this manner obviously leads to a proportion of unnecessary restrictions.
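This rights-checking step lends itself to a small illustration. The function below is a hypothetical sketch, not the Digital Library Production Service's actual algorithm: the field positions are standard MARC 21, but the function name, the pre-1920 cut-off (echoing the example above) and the decision rules are this column's assumptions.

```python
# Hypothetical sketch of rights determination driven by the MARC 008
# fixed field. Field positions are standard MARC 21; the rules and the
# cut-off year merely mirror the examples described in the text.

def probable_copyright_status(field_008: str) -> str:
    """Return "full text" when a volume is probably out of copyright,
    otherwise the restrictive default "search only"."""
    date1 = field_008[7:11]             # positions 07-10: first publication date
    country = field_008[15:18].strip()  # positions 15-17: place of publication
    govt = field_008[28]                # position 28: government publication code
    year = int(date1) if date1.isdigit() else None
    if year is not None and year < 1920:
        return "full text"              # old enough to assume public domain
    if country.endswith("u") and govt == "f":
        return "full text"              # US federal government publication
    return "search only"                # when in doubt, restrict access
```

In MARC 21, country codes for US states end in “u” (e.g. “miu” for Michigan), and an “f” at position 28 flags a federal government publication. A production system would have to handle uncertain dates and many more cases, which is why a proportion of unnecessary “search only” restrictions is inevitable.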
Usage of MBooks is encouraging, especially locally, with around 200,000 pages viewed in the six months following its launch in April 2007, and development of the relationship between MBooks and the library's OPAC continues. One interesting recent feature provides enhanced access for people registered with the University's Office of Services for Students with Disabilities as having a visual impairment. When such users check out a book, an email is automatically sent notifying them of the URL of a digitised copy, enabling them either to enlarge the images of the book or to make use of the OCR'd text reading facilities.
The library and the internet
Boria Sax, in On the Horizon, v. 16 (2008) issue 1, pp. 6-12
“There is an old enmity between books and reality,” wrote the German philosopher Hans Blumenberg. “What is written usurps the place of reality, to finally render it anachronistic and superfluous. The tradition of writing and finally printing constantly leads to a reduction in the authenticity of experience”. Writing in 1981, well before the massive expansion of the internet, of which he therefore made no mention, Blumenberg went on to challenge the status of the university library as the repository of human knowledge, recalling the revolts of the young against the artificiality of the world of books and the notion that it is pointless to revisit and revise our understanding of things already accounted for.
The problem with the culture of the book, according to Blumenberg, lay not only in its claim to capture learning but also in the way it defined and structured our understanding of what knowledge is, leading to a conception of the whole of nature as itself a massive book, waiting to be reformatted into print for wider study. A recent expression of this urge to reduce life to the level of metaphor can be found in the perception of the genes in DNA as a form of encoded language.
However, the rapid shift towards the digitisation of knowledge is not simply driven by technology but arises no less, it is argued, from the exhaustion of the traditional culture of the book, whose only defence appears to come from a fear of barbarism. For most people the library is synonymous with drudgery. Hence, as Edward Castronova (2005) asks in considering synthetic worlds, “what are the conditions under which daily life would be the best game to play, better than any computer-generated fantasy, and for how many people will these conditions apply going into the future”?
Nevertheless it is equally reasonable to view digital technologies as in some ways slowing the end of literary culture and postponing the inevitable upheaval in academia. This is because, in their present state of development, these technologies have yet to overcome the tension between print and reality; in the main, all they have achieved is to render the same tension in a different form. Is digital technology truly new, with the potential to transform human culture? Or is it no more than a sophisticated extension of the prevalent print technology of the last five centuries? Or perhaps it is something in between.
Whatever digital technology is, it is already affecting our colleges and universities. For some, the advent of online learning has brought a resurgence of John Dewey's concept of “progressive education” from the first half of the last century, melding book learning with experience. One of its effects is to transform the university into an institution centred not on its library but on the internet.
Yet this massive structural change in universities is still some way off. As evidence, one can start with the most prevalent learning management systems currently in use in the USA. Rather than encouraging a constructivist approach, in which student and teacher actively collaborate, these tend to support a more behaviourist approach to learning, replicating the traditional classroom setting in cyberspace. To facilitate the monitoring of outcomes in response to the growing pressure for “accountability”, the present systems tend to strengthen those elements which maximise faculty and administrator control. In true constructivism, however, knowledge is created within the learning environment, not simply loaded into it in advance or week by week. It could even be that what is learnt, and also how it is learnt, differ significantly from one student to another.
The essential innovation lies in viewing the learning process as a three-dimensional activity rather than one of linear progression. Increased use of technology on its own is as unlikely to achieve that goal as the publication of a single book. Even when, in the history of traditional education, such change did occur, as for example with Copernicus' work on the solar system, the older doctrine persisted for some time. In the same way, although educationalists largely agree on the need for a constructivist approach, the new cultural expectations coupled with what is technologically possible have yet to slough off the baggage of the past. So, alluding to an earlier comment, given that even the most extensive library contains only a fraction of all the information found in cyberspace, it seems that the internet has not supplanted the library so much as absorbed it. And this is true even for those libraries with the most exotic collections.
The organisation of universities today by faculties and departments preserves the traditional divisions of knowledge by discipline, despite the fact that such tightly delineated boundaries are as meaningless on the internet as, arguably, they tend to be in real life. By accepting this, universities will find that their purpose changes from the generation of knowledge, inherited from the nineteenth century, to furnishing knowledge with a necessary structure and significance.
While learning management systems such as Blackboard and WebCT may have solved the logistical problems posed by online learning, they have not addressed the broader question of the goals of education and its future role in society. Like Communism, the current organisational structures of higher education may persist for a time, basking in past glories and tradition, even though the ideology that brought them into being is defunct. But do they retain the vigour to survive a crisis, such as a major recession? Already the task of incorporating the vast array of new technologies, such as podcasts, into learning is proving too great even for the most sophisticated of management systems.