Human Information Retrieval

Christine Urquhart (University of Aberystwyth, Aberystwyth, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 26 July 2011

Citation

Urquhart, C. (2011), "Human Information Retrieval", Journal of Documentation, Vol. 67 No. 4, pp. 740-742. https://doi.org/10.1108/00220411111145098

Publisher: Emerald Group Publishing Limited

Copyright © 2011, Emerald Group Publishing Limited


This book is based around previously published papers, mostly from the Journal of the American Society for Information Science and Technology. The introduction claims that the book:

[…] adopts an inclusive understanding of information retrieval systems, developed from common understandings and conveyed by ostensive exemplification rather than restrictive definition. In particular, the common antithesis between experimental and operational systems is dissolved. The real source of contrast between the types of system likely has been different forms of the description process, particularly the experimental preference for machine generation rather than human selection of index terms and for non‐Boolean searches for those descriptions. When made explicit, the basis for the distinctions between experimental and operational information retrieval systems appears theoretically weak. The distinction is being increasingly eroded in practice, with operational systems possibly selecting records or documents by directly Boolean operations but ordering retrieved documents on the basis of other indicators.

This extract left me, and perhaps you, puzzled. If it is any consolation, my Microsoft Word calculator gives that extract a reading ease score of zero (in other words, extremely difficult to read) and a grade level of 19, which would mean (apparently) that one would need 19 years of schooling to be able to read the extract with ease. Readability calculations are only an indicator, and the Microsoft Word calculations would possibly work more accurately with larger amounts of text, but much of the book is very hard to read, with the result that the expected audience are likely to be unsure whether they have understood some of the claims correctly or not. There is no need for academic gobbledygook, and one reasonable expectation of a book that reflects and integrates the thinking behind previously published journal articles would be a simpler account, with clearer definitions, more examples, and more discussion. Unfortunately, I am not convinced that this book supplies that level of debate or the clear explanations that would make it suitable for final year undergraduate or postgraduate students. In the small doses that journal articles provide, comprehension is perhaps easier. Integrated into a book, the effect is overwhelmingly turgid, particularly in the later chapters. Of course it is easier to spew out the jargon, comforted that a substantial proportion of your audience will assume that their difficulties in comprehension are their fault, not yours, as the writer. I have spent a substantial proportion of my career in library and information science writing distance learning materials for students, composing abstracts of highly technical literature and also attempting to write lay summaries for Cochrane reviews. And yes, it is more difficult to write clear text than to resort to strings of long words, perhaps intended to impress. For this book, the problems of readability detract from some interesting ideas.
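For readers who want to try this kind of check themselves, the two scores mentioned above are usually computed with the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas. The following is only a rough sketch: the function names `count_syllables` and `readability` are invented for illustration, and the vowel-group syllable counter is a crude assumption, so the results will not match the Microsoft Word figures exactly.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not Word's method): count vowel groups."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Discount a probable silent final 'e' (but not '-le' or '-ee' endings).
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> Tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    # Sentences are split naively on ., ! and ? (abbreviations will be miscounted).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)

    # Standard published coefficients for the two Flesch formulas.
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade


if __name__ == "__main__":
    ease, grade = readability("The cat sat on the mat.")
    print(f"reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Long sentences and polysyllabic words both push the reading ease down and the grade level up, which is why a passage like the extract quoted above scores so poorly. As noted, short samples give unstable results.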

Chapters 1‐3 introduce the idea of selection power, “the human ability to make informed choices between objects or representations of objects” (p. 18), adopted for this book as the primary aim of information retrieval systems. To support that ability, work (labour) is required, whether human or machine (computer). Warner appears to be basing his thesis on the Marxist distinctions of universal and communal labour, and the diagram on p. 26 illustrates how products (e.g. catalogues, bibliographic databases) may result from human labour and process (human or machine). It is not the aim of the book to delve into the more recent debates about labour process theory, although considerations of the value of labour, in particular the unique value of human labour for some aspects of information retrieval work, are proposed in the chapter 4 synthesis. As a perspective, it is workable, although there seems to be little place for the mid‐way position where humans and machine work together – perhaps related to the theory of flexibility of labour. Personally, I find Durkheim's functionalist views of the division of labour an easier way of following Warner's arguments about the division of labour, as these lead to debate about the value of professional (service‐related) input to society at large. It is a little surprising that no reference was made to the use of activity theory (also based on Marxist principles), which has been proposed as a framework for information behaviour research and information systems design (Wilson, 2006). Warner proposes that human mental labour can be transferred to information technology, and that human mental labour separates into semantic labour and syntactic labour – and that it is the latter of these two that can be transferred to information technology (becoming process). Selection labour separates into description labour and search labour, and humans need to deal with the semantic components of such labour.
If I have understood this correctly, then these ideas can be applied to the problems of searching for relevant items for systematic reviews of the health literature. Normally the highly sensitive search strategies produce hundreds, if not thousands, of items, and screening the search outputs is a labour‐intensive process. Search aids, such as the RCT publication tag, are syntactic search labour (machine aided – the labour becomes process). Similarly, the syntactic component of description labour, in the form of the PubMed related article algorithm, can be transferred to information technology, although it is not something that humans could do in the first place – do we have a problem here? However, if Warner's views hold, then the semantic description labour (by MEDLINE indexers) and the semantic search labour (by retrieval experts for systematic reviews) cannot be transferred to machine. It might be possible to save on search labour time by increasing description labour (but only where the knowledge community has agreed, precise, and stable definitions of terms). Some types of machine learning (e.g. text mining) may help to speed up screening – these are “semi‐automated”, but the Warner framework does not seem to have a good place for these.

Chapters 5 to 8 deal with full text retrieval; Chapter 6, for example, discusses the semantics for retrieval from full text. This chapter makes heavy use of de Saussure's work on linguistics. Saussure has been called the father of modern linguistics, but given the fact that the 1916 work was edited after his death by students, he might be better termed the grandfather of modern linguistics. There has been later work on semantics (see Lyons, 1977, for example) and I would expect to find some reference to critiques of Saussurean views of syntagma. I am uneasy about this chapter, as there has been so much later cognitive psychological research on the way humans categorise. The discussion bears little resemblance to practical problems of information retrieval that have concerned me. Similarly, Chapter 7, while carefully trying to relate information theory (Shannon) to the work of de Saussure, did not convince me that there was anything of interest. I checked recent issues of Information Processing and Management. Topics included management of suffixes, structured queries in probabilistic XML retrieval systems, epistemic modality (the writer's estimation of the validity of propositional content in texts), fuzzy models for text summarisation, combining evidence with a probabilistic framework for answer ranking, and identification of semantic relations between nominals (nouns). I do not see how Chapters 7 and 8 would illuminate many, if any, of these articles. Unless the Warner framework could be applied easily to questions around probabilistic information retrieval, or the methods used for text summarisation, it is hard to see how the signifier/signified and syntagma ideas apply. Often the cheap and cheerful methods of text summarisation – the shallow approaches, based on very simple text analysis – work just as well as deeper approaches, based on discourse analysis that takes participants' knowledge and purposes into account (Urquhart and Urquhart, 2000).
If I were faced with the dates provided on the list of references at the back of the book in a student assignment, I would probably be phrasing some comments about the need to read the more recent literature. That applies particularly to the sections dealing with linguistics. In summary, some interesting ideas but more work is required. Just for the record, this review, excluding the quote from the book, has a reading ease of 30.2.

References

Lyons, J. (1977), Semantics, Vol. 1, CUP, Cambridge.

Urquhart, C. and Urquhart, A. (2000), “Review of Mani, I. and Maybury, M.T. (Eds), Advances in Automatic Text Summarisation”, Education for Information, Vol. 18, pp. 2235.

Wilson, T.D. (2006), “A re‐examination of information seeking behaviour in the context of activity theory”, Information Research, Vol. 11 No. 4, paper 260, available at: http://InformationR.net/ir/11‐4/paper260.html (accessed 13 January 2011).
