What's in a Word‐List? Integrating Word Frequency and Keyword Extraction

Madely du Preez (University of South Africa, South Africa)

The Electronic Library

ISSN: 0264-0473

Article publication date: 13 November 2009

Citation

du Preez, M. (2009), "What's in a Word‐List? Integrating Word Frequency and Keyword Extraction", The Electronic Library, Vol. 27 No. 6, pp. 1045-1046. https://doi.org/10.1108/02640470911004129

Publisher: Emerald Group Publishing Limited

Copyright © 2009, Emerald Group Publishing Limited


What's in a Word‐List? is the third volume in the Digital Research in the Arts and Humanities series. Every title in this series critically examines the application of advanced information and communication technologies (ICTs) in the Arts and Humanities and is a compilation of papers presented at Expert Seminars. These seminars focus on the impact of new technologies on academic research. The publications are the legacy of the British Arts and Humanities Research Council (AHRC) ICT Methods Network.

In the introduction to this volume, editor Dawn Archer contends that the frequency with which particular words are used in a text can reveal something meaningful about that text and about its author. Researchers interested in frequency and keyword analysis therefore use automatic computational techniques, such as text‐mining procedures, to construct word lists that can be analysed in various ways. These lists are mainly used to identify linguistic items that are likely to be of interest in terms of the texts' aboutness and structuring, and likely to repay further study.
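To make the idea of a word list concrete, the following is a minimal sketch of how one might be built. It is an illustration only, not a method taken from the book: the function name `word_list`, the sample sentence, and the crude tokenisation rule (lowercased runs of letters, with internal apostrophes allowed) are all assumptions.

```python
import re
from collections import Counter

def word_list(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Assumed tokenisation: runs of letters, optionally with internal
    # apostrophes (so "don't" counts as one word). Real tools offer
    # far more configurable definitions of what counts as a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

# Hypothetical sample text, for illustration only.
sample = "The fox ran. The hounds ran after the fox."
freqs = word_list(sample)

# Print the word list, most frequent items first.
for word, count in freqs.most_common():
    print(word, count)
```

Even this small sketch shows why the definition of a word matters: changing the tokenisation rule changes the list and therefore any analysis built on it.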

What's in a Word‐List? is an edited collection of papers written by a group of internationally renowned experts involved in the promotion of ICT methods. The aims of the volume are to demonstrate:

  • the benefits that can be gained from engaging in linguistic techniques such as frequency and keyword analysis; and

  • the very broad applicability of these techniques both within and outside the academic world.

The contributors identify issues that are crucial to the successful application of corpus linguistic techniques within and beyond the field of linguistics. The collection thus brings together cutting‐edge research on the construction and use of word‐lists for both frequency and keyword analysis.

The collection begins with an assessment of the concept of word. A discussion of high versus low frequency words and authorship attribution then follows. Subsequently, some alternatives to techniques based on word searching are introduced. Variable spelling within historical texts, and the difficulties that this occasions when seeking to “catch a word” in corpora, are the focus of chapter 5. Chapter 6 tackles the issue of reference corpora by looking at how bad a reference corpus can be before it becomes unusable.
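The role of a reference corpus in keyword extraction can be sketched as follows. This example assumes the log‐likelihood statistic, one commonly used keyness measure in corpus linguistics; the review does not say which statistics the book's chapters use, and the function names and toy corpora here are hypothetical. Both corpora are assumed to be non-empty.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Keyness of a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def keywords(study, reference, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    c, d = sum(study.values()), sum(reference.values())
    scored = [
        (word, log_likelihood(n, reference.get(word, 0), c, d))
        for word, n in study.items()
        if n / c > reference.get(word, 0) / d  # keep overused words only
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Toy corpora, invented for illustration.
study = Counter("fox hunting fox debate the the".split())
reference = Counter("the the the the the debate weather news".split())

for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```

Because every score depends on the reference counts, a poorly chosen reference corpus shifts which words surface as key, which is the concern chapter 6 is described as addressing.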

Three chapters discuss the use of WordSmith Tools and USAS. Two of them use WordSmith Tools, one to determine the extent of moral panic in Mary Whitehouse's books and the other to examine a small corpus of debates on fox hunting. The third uses the USAS system to explore the concept of love in three Shakespearean love‐tragedies. The final chapter reports on several AHRC ICT Methods Network promotional events that have helped to bring frequency and keyword extraction techniques to a wider community of users.

What's in a Word‐List? is an impressive and exciting collection of papers exploring issues that are fundamental to corpus linguistics. It provides a comprehensive and up‐to‐date survey of current research in this subject. Researchers interested in word frequency issues will therefore find this book a valuable addition to their academic literature collection.
